Dare you buy a Henry Moore on eBay? - Statistics can tell you what to avoid
When the rarefied world of modern art sales meets the digital age, almost anything is possible. You, too, can buy a Henry Moore on eBay. But it is risky. The old, high-commission auction-houses have rivals, but you will need statistics to guide you. Joseph Gastwirth and Wesley Johnson tell you where the fakes may be lurking.

March 2011. © 2011 The Royal Statistical Society

[Figure: Mother and Child II (1983), Cramer, Grant and Mitchinson (CGM) catalogue 672]

Henry Moore’s sculptures – huge works in bronze or carved marble – are iconic. The public know them from museums and public places. Less well known is that Moore, the pre-eminent British sculptor of the 20th century, produced many smaller-scale sculptures, drawings, etchings and lithographs, and that these frequently come up for sale. One perhaps surprising place to find them is on eBay. Guide prices there range from £250 to tens of thousands of pounds. And such eBay Henry Moores are not at all uncommon. On one day in December 2010, no fewer than five were on offer, from apparently separate sellers, and all described as original.

The internet has provided consumers with new and easy ways to purchase goods – and the commissions charged by an internet auction host are a fraction of those of the major art houses. But it has allowed less scrupulous businesses and individuals to offer poor-quality or mislabelled items. Ideally, before buying works at auction one would have experts examine them, as is done at the big auction-houses like Sotheby’s and
Christie’s, but for items listed on eBay, which come from all over the world, this is impractical. One of us (JLG) has long been interested in art by Moore. After noticing on eBay a small sculpture, supposedly made by Henry Moore, that did not “look right” he checked with friends at the Moore Foundation.

They had received inquiries from buyers who have purchased works incorrectly attributed to Moore; so the question of estimating the prevalence of counterfeit art work arose. From our informal correspondence it appeared that a much higher percentage of “drawings” or “small sculptures” were dubious than was the case for signed etchings and lithographs (prints). This last detail suggested a statistical approach that we could use to estimate the proportion of fake Henry Moores – or “questionable works”, in the more cautious language of the art world – that were out there.

In medical and social science applications, where even the best method of classification is not a “gold standard”, the Hui–Walter method [1] can be used to estimate the accuracy rates of clinical tests [2] and survey classifications [3]. That method requires one to study two subpopulations, with a different prevalence of the trait in each. The high prevalence group might be individuals who had symptoms of deep vein thrombosis (DVT) while the low prevalence group consists of individuals at risk of DVT because they recently had surgery. The method also needs at least two evaluators of classifications or tests of success or failure. Furthermore, the evaluations should be independent of each other. In the case of screening for DVT, one test measures the level of antibodies while the other is based on different technologies [4].

[Figure: Mother and Child VIII (1983), CGM catalogue 678]

The data and statistical model

Here we had our two subgroups – drawings and sculpture, where questionable works are common, versus lithographs and etchings, where they are rarer. In order to obtain a sufficient and representative sample of Moore’s work, almost all of the objects described as having been created by Henry Moore that appeared on eBay during the period from March 2005 to November 2007 (239 of them in all) were assessed. We needed not one but two independent evaluators: the first was Stephen Gabriel, an expert on Moore, and the second was one of the authors (JLG). Both have had a long-time interest in Moore’s art and have extensive libraries. A third collaborator, Dr H. Hikawa, downloaded the descriptions of each item, which typically included a digital photo, and provided the two evaluators with copies. The files were e-mailed to the first evaluator, while the second evaluator was given a printed version. To further ensure the independence of the evaluations, the two assessors did not discuss any of the art for sale during this period. Because a major objective of the study is to protect consumers against “misleading” descriptions, which suggest that an item is authentic when it is not, objects that were described as “similar” to or “related” to Moore were excluded. We analysed only objects that were claimed to be actually by Henry Moore.

The results of the study were summarized in two 2 × 2 tables reporting the matched pair classifications for the two groups of artwork (Table 1). The drawings and sculpture data, as we have said, were combined because the background information indicated that the prevalence of non-genuine items of both types was similar. Furthermore, the fractions of these two items that both evaluators thought dubious were similar. Table 1 shows, for each group, the items that both evaluators thought questionable; that Stephen Gabriel thought questionable but Gastwirth thought genuine; that Gabriel thought genuine but Gastwirth thought questionable; and that both evaluators thought genuine.

Table 1. Assessments of genuineness of Henry Moore’s art offered on eBay from March 2005 to November 2007 [5]

Prints
                              Evaluator 2
                              Questionable   Genuine   Total
Evaluator 1   Questionable          6           10        16
              Genuine               1          149       150
              Total                 7          159       166

Sculptures and drawings
                              Evaluator 2
                              Questionable   Genuine   Total
Evaluator 1   Questionable         59            6        65
              Genuine               2            6         8
              Total                61           12        73

Two things affect the numbers that appear in the tables: the actual prevalence of non-genuine objects in each of the two groups, and the accuracy of the evaluators. This is where having two independent evaluators is so vital: they provide a mutual cross-reference. Suitably statistically treated, each can provide a standard by which to judge the other. Furthermore, the evaluators’ accuracy has two parts. The first is their sensitivity. This is the probability that a
non-genuine object will be classified correctly – that they will spot a fake. The second part is their specificity, which is in some ways the reverse – that they will know a genuine article when they see one. These are very far from being the same thing. An evaluator who classified every item as genuine would have a very high specificity, but a sensitivity of zero.

If one considers the classification of objects in the framework of classical statistics, where the null hypothesis is that the object is genuine and the alternative is that it is not, the Type I error rate equals 1 minus specificity and the Type II error rate is 1 minus sensitivity.

The Hui–Walter method takes the data of Table 1 and calculates probabilities of genuineness for objects in each category, and also calculates estimates of the accuracies of the evaluators. The virtue of the method is that it gives information both about the evaluators and the evaluated. It assumes that the accuracy rates of each evaluator are the same for both categories of art and that, conditional on the true status of an object, the evaluations are independent. Given that, it provides statistical estimates of the specificity and the sensitivity of each evaluator, and of the fraction of prints and the fraction of drawings that are questionable. The results, with their confidence intervals, are given in Table 2.

Table 2. Maximum likelihood estimates of the two prevalence parameters and accuracy rates of the two evaluators. Maximum likelihood estimates were obtained using the EM algorithm with standard errors based on the bootstrap using the program TAGS [6]

Parameter                                                       Mean    95% Confidence interval
Se1, sensitivity of evaluator 1 (Stephen Gabriel)               0.968   (0.877, 0.992)
Se2, sensitivity of evaluator 2 (Joseph Gastwirth)              0.913   (0.810, 0.962)
Sp1, specificity of evaluator 1                                 0.941   (0.889, 0.969)
Sp2, specificity of evaluator 2                                 0.995   (0.939, 0.999)
Prev1, fraction of prints that are dubious                      0.041   (0.018, 0.089)
Prev2, fraction of sculptures and drawings that are dubious     0.915   (0.818, 0.962)

Although the confidence intervals for the accuracy rates overlap, they suggest that the evaluators had similar but not identical rates of accuracy. The first evaluator, Stephen Gabriel, was more sensitive, detecting more counterfeit items, while the second, Joseph Gastwirth, had a slightly higher specificity, correctly classifying legitimate items. What is more remarkable is the estimated prevalence of dubious drawings and sculptures: 91.5% of them are questionable. Even taking the lower end of a 95% confidence interval gives 82% of them questionable. This clearly means that government agencies concerned with consumer protection are justified in informing the public of potential authenticity issues. In contrast, only 4.1% of the signed prints appear to be of doubtful authenticity. The obvious first lesson is: if you are thinking of buying a Henry Moore on eBay, buy a print rather than a drawing or small sculpture.

While a number of authors have raised questions about the validity of the estimates from latent class models such as the Hui–Walter [7], most of the studies indicate that it is the estimates of sensitivity and specificity that are most affected by modest violations of its assumptions; the estimates of prevalence are more sturdy. Furthermore, the greater the difference in the prevalence of the characteristic in the two groups, the greater is the robustness of the prevalence estimate [8] – and here our difference is indeed great: between fake rates of 91.5% in drawings and 4.1% in prints lies a difference of 87.4 percentage points. We may therefore place some reliance on our conclusions. The key assumption is that each evaluator has the same sensitivity and specificity for artworks of both types, and that they are independent.

Although we took pains to ensure that the evaluators worked independently, there are two ways in which a modest degree of dependence could arise. Some sellers may offer multiple objects and, whether by design or ignorance, there is likely to be correlation in the status of the items put on eBay by the same seller. Also, both evaluators probably consulted many of the same definitive catalogues and books and might have compared the photograph on eBay with the same “reference photo”. To check the potential sensitivity of the results to possible dependence, a model allowing for such correlation was also fitted to the data [9]. The estimated correlation was 0.29, which is insufficient to result in a serious bias in the prevalence estimates.

Implications for buyers of artwork

Clearly the results indicate that consumers should not take for granted the authenticity of works by Moore, and probably other major artists, that are offered on eBay or by other internet sellers, and that they should carefully compare the digital photographs and related information provided by sellers with the corresponding information in the major catalogues. This also applies to Moore’s prints because several that were classified as non-genuine were from an unsigned version where a questionable signature was added. As in all observational studies, there is a possibility that some important covariates, such as provenance or prior ownership of the item, were not available. It is difficult to think, however, of a realistic covariate that could explain the very low prevalence of genuine drawings and small sculptures. The very high proportion of dubious drawings and small sculptures by Moore offered on eBay indicates that prospective buyers of art by other major artists, such as Picasso or Chagall, should also be very careful.

Potential applications in legal cases

After we began the project we became aware of several legal decisions in cases where eBay was sued for assisting the sale of counterfeit products. All the suits involved possible violations of intellectual property and trademark infringement, but the legal criteria used in different nations are not uniform. Moreover, eBay did have a process that allowed firms to report counterfeit items. Statistical evidence had a key role in many of the cases.

In the United States, eBay was found not to have contributed to trademark infringement in Tiffany v. eBay [10]. Tiffany presented a survey which claimed that about 75% of the items labelled as its product were counterfeit, while only 5% were surely genuine [11, 12]. The courts decided that, even though eBay had general knowledge that counterfeit Tiffany silver jewellery was being sold, it was only required to take action if it had contemporary knowledge of which particular listings were infringing or would infringe in the future. Furthermore, the trial court noted significant flaws in Tiffany’s
survey. It was not probability-based, so one could not calculate a confidence interval for the fraction of non-genuine items. Furthermore, the search used to identify the items that were purchased and examined by two Tiffany experts included non-silver jewellery as well as the silver items that were at issue in the case. The sample sizes (186 in 2004 and 139 in 2005) were less than those specified by the survey designer. One reason for this shortfall was that Tiffany was unable to purchase some of the items that were supposed to be in their sample. It was quite likely that those “missing items” had a higher probability of being genuine than those that they were able to acquire, as knowledgeable individual buyers were also bidding for the genuine pieces but not for the fakes. Finally, Tiffany did not participate in eBay’s monitoring programme during this time, so that items that could have been removed from the site were not. Although eBay’s statistical expert agreed that a substantial amount (at least 30%) of Tiffany jewellery was counterfeit, this only helped establish that eBay had general knowledge that counterfeit products were being sold on its site.

In France, however, a study that was submitted by Christian Dior and Louis Vuitton estimated that 90% of items allegedly made by these designers were not genuine. This study was accepted by the court. This estimate is surprisingly similar to the 91.5% prevalence estimate for non-genuine Henry Moore drawings and small sculptures in our study. Partly on this basis, eBay was found liable for contributing to trademark infringement.

Although surveys have been used to estimate the proportion of potential consumers who are “confused” – a polite word for “deceived” – as to the source of a product because of the design or packaging or are misled by advertising, statisticians may not fully appreciate the potential for using statistical surveys and studies similar to ours in trademark infringement cases. The method we have used here might be adapted to help monitor the authenticity of items offered for sale on the internet.

[Figure: Two Women Seated on Beach (1984), CGM catalogue 719]

Potential refinements and improvements to the study design

Our work can be regarded as a proof of principle: it is possible to obtain reasonable estimates of the prevalence of counterfeit items even when the evaluators do not examine the pieces individually. During the time the data were collected, the evaluators observed that some particular art objects came up repeatedly, and that items from some particular sellers, especially those who sold many items, were more likely not to be authentic. The approach could be improved by incorporating knowledge that is gained during a first phase, either about the type of items that are non-genuine or sellers of those products, into
a second phase study. That study might be a probability-based buying programme that is focused on a smaller group of likely sellers of problematic objects.

When it is possible to obtain a third independent evaluation the latent class approach does not require two subpopulations and has been successfully used to evaluate screening tests and estimate the prevalence of disease in animals. The three-evaluator version is well suited to estimating the prevalence of counterfeit jewellery, as a second subpopulation with a low prevalence of fakes might not exist.

One possible limitation of the method is that an infringing seller might purchase one expensive handbag, say, make counterfeit versions, but put a picture of the genuine bag on the internet. Presumably, a disappointed purchaser would complain to eBay, which would inform the company about a particular seller of infringing items. Trademark holders and consumer protection agencies might still find a broad-based study or survey that provided a statistically reliable estimate of the fraction of counterfeit products sold by internet sites useful both in legal cases and to inform policy-makers and the public of the magnitude of the problem.

[Figure: Two Reclining Figures in Yellow and Green (1967), CGM catalogue 74]

References

1. Hui, S. L. and Walter, S. D. (1980) Estimating the error rates of diagnostic tests. Biometrics, 36, 167–171.
2. Pepe, M. and Janes, H. (2007) Insights into latent class analysis of diagnostic test performance. Biostatistics, 8, 474–484.
3. Sinclair, M. D. and Gastwirth, J. L. (1996) On procedures for evaluating the effectiveness of reinterview survey methods: application to labor force data. Journal of the American Statistical Association, 91, 961–969.
4. Line, B. R., Peters, T. L. and Keenan, J. (1997) Diagnostic test comparisons in patients with Deep Venous Thrombosis. Journal of Nuclear Medicine, 38, 89–92.
5. Gastwirth, J. L., Johnson, W. O. and Hikawa, H. (2011) Estimating the fraction of “non-genuine” artwork by Henry Moore on eBay: application of latent class screening test methodology. Journal of the Royal Statistical Society, Series A, 174 (in press).
6. Pouillot, R., Gerbier, G. and Gardner, I. A. (2002) “TAGS”, a program for the evaluation of test accuracy in the absence of a gold standard. Preventive Veterinary Medicine, 53, 67–71.
7. Spencer, B. (2010) When do latent class models overstate accuracy for binary classifiers? (in press).
8. Sinclair, M. D. and Gastwirth, J. L. (2000) Properties of the Hui and Walter and related methods for estimating prevalence rates and error rates of diagnostic testing procedures. Drug Information Journal, 34, 605–615.
9. Dendukuri, N. and Joseph, L. (2001) Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics, 57, 158–167.
10. 576 F. Supp. 2d 463 (S.D.N.Y. 2008) and 600 F. 3d 93 (2d. Cir. 2010).
11. Goldwasser, K. (2010) Knock it off: An analysis of trademark counterfeit goods regulation in the United States, France and Belgium. Cardozo Journal of International and Comparative Law, 18, 207–238.
12. Levin, E. K. (2009) A safe harbor for trademark: Reevaluating secondary trademark liability after Tiffany v. eBay. Berkeley Technology Law Journal, 24, 491–527.

Joseph Gastwirth is Professor of Statistics and Economics at the George Washington University, Washington, DC, and Wesley Johnson is Professor of Statistics at the University of California at Irvine.

Acknowledgements

Grateful thanks are due to the Henry Moore Foundation (www.henry-moore.org) for their generosity in providing digital images of the artwork. A fuller, more technical version is to appear in the Journal of the Royal Statistical Society, Series A.
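For readers who would like to see how the Hui–Walter calculation in "The data and statistical model" can work in practice, here is a minimal sketch in Python. It is not the TAGS program the authors used, and it produces no bootstrap intervals; it is only a plain EM iteration for the two-evaluator, two-group latent class model, under the article's stated assumptions that each evaluator's sensitivity and specificity are the same in both groups and that the two calls are independent given an item's true status. The function name `hui_walter_em`, the starting values and the iteration count are illustrative assumptions; the counts are those of Table 1.

```python
# Counts from Table 1, keyed by (evaluator 1 call, evaluator 2 call);
# 1 = questionable, 0 = genuine.
PRINTS = {(1, 1): 6, (1, 0): 10, (0, 1): 1, (0, 0): 149}
SCULPTURES_DRAWINGS = {(1, 1): 59, (1, 0): 6, (0, 1): 2, (0, 0): 6}


def hui_walter_em(groups, n_iter=5000):
    """EM sketch for the two-evaluator latent class model.

    groups: one dict of 2 x 2 counts per subpopulation.
    Returns (sensitivities, specificities, prevalences).
    """
    # Illustrative starting values (an assumption, not from the article).
    se = [0.9, 0.9]
    sp = [0.9, 0.9]
    prev = [0.1, 0.8]
    group_sizes = [sum(c.values()) for c in groups]

    for _ in range(n_iter):
        fake_total = [0.0] * len(groups)  # expected non-genuine items per group
        se_num = [0.0, 0.0]               # expected fakes called questionable
        sp_num = [0.0, 0.0]               # expected genuine items called genuine

        for g, counts in enumerate(groups):
            for calls, n in counts.items():
                # E-step: probability that an item with this pair of
                # calls is non-genuine, given the current parameters.
                p_fake = prev[g]
                p_gen = 1.0 - prev[g]
                for j in range(2):
                    p_fake *= se[j] if calls[j] else 1.0 - se[j]
                    p_gen *= 1.0 - sp[j] if calls[j] else sp[j]
                w = p_fake / (p_fake + p_gen)

                fake_total[g] += n * w
                for j in range(2):
                    if calls[j]:
                        se_num[j] += n * w
                    else:
                        sp_num[j] += n * (1.0 - w)

        # M-step: update the parameters from the expected counts.
        n_fake = sum(fake_total)
        n_gen = sum(group_sizes) - n_fake
        se = [x / n_fake for x in se_num]
        sp = [x / n_gen for x in sp_num]
        prev = [f / size for f, size in zip(fake_total, group_sizes)]

    return se, sp, prev


if __name__ == "__main__":
    se, sp, prev = hui_walter_em([PRINTS, SCULPTURES_DRAWINGS])
    print("sensitivities:", [round(x, 3) for x in se])
    print("specificities:", [round(x, 3) for x in sp])
    print("prevalences:  ", [round(x, 3) for x in prev])
```

Run on the Table 1 counts, the estimates should land close to the point estimates reported in Table 2. Latent class models of this kind have a mirror-image solution in which "questionable" and "genuine" swap roles, so the starting values are deliberately placed on the side where both evaluators are better than chance.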