Quantifying the Tacit: The Imitation Game and Social Fluency


                                      Article

                                      Sociology
                                      2014, Vol. 48(1) 3–19
                                      © The Author(s) 2013
                                      Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
                                      DOI: 10.1177/0038038512455735
                                      soc.sagepub.com

                                      Harry Collins
                                      Cardiff University, UK

                                      Robert Evans
                                      Cardiff University, UK

                                      Abstract
                                      This article describes a new research method called the Imitation Game. The method is based
                                      on the idea of ‘interactional expertise’, which distinguishes discursive performance from practical
                                      expertise and can be used to investigate the relationship between groups that diverge culturally
                                      or experientially. We explain the theory that underpins the method and report results from a
                                      number of empirical trials. These include ‘proof of concept’ research with the colour blind, the
                                      blind and those with perfect pitch, as well as Imitation Games on more conventional sociological
                                      topics such as the social relationships between men and women, homosexuals and heterosexuals,
                                      and active Christians and secular students. These studies demonstrate the potential of the
                                      method and its distinctive features. We conclude by suggesting that the Imitation Game could
                                      complement existing techniques by providing a new way to compare social relationships across
                                      social and temporal distances in both a qualitative and a quantitative way.

                                      Keywords
                                      comparative research, Imitation Game, interactional expertise, research methods

                                      Corresponding author:
                                      Robert Evans, Centre for the Study of Knowledge Expertise and Science (KES), Cardiff School of Social
                                      Sciences, Glamorgan Building, King Edward VII Avenue, Cardiff CF10 3WT, UK.
                                      Email: evansrj1@cardiff.ac.uk

                                      Downloaded from soc.sagepub.com by guest on September 7, 2015

Investigating Cultural Differences
The way different social groups understand the world is a central topic of sociological
inquiry. One indicator of this is the number of different terms that have been used to
describe the phenomenon. These include: ‘taken-for-granted reality’ (Schutz, 1964);
‘form-of-life’ (Winch, 1958; Wittgenstein, 1953); ‘social collectivity’ (Durkheim, 1915);
‘paradigm’ (Kuhn, 1962); culture (Geertz, 1973; Kluckhohn, 1962); subculture (Yinger,
1982); ‘microculture’, ‘ideoculture’ (Fine, 2007); and technical expertise (MacKenzie
and Spinardi, 1995). What makes these different approaches sociological is the emphasis
placed on participation as the mechanism by which understanding is acquired.
    This interpretivist approach has brought many benefits. In our home discipline of
Science and Technology Studies (STS), studying ‘science-as-practice’ has revealed the
importance of tacit knowledge in the successful replication of scientific knowledge and
shown science to be a skillful, craft-based activity that has much in common with other
social activities. Emphasising the socialisation of scientists within disciplinary
paradigms also explains the difficulties that can arise when science is applied outside its own
domain (e.g. Irwin and Wynne, 1996). Here the absence of relevant socialisation, and
consequent lack of tacit knowledge, can lead experts to overlook important features of
the local context that are well known to other social groups. This, in turn, creates the
potential for dissatisfaction and controversy as those affected seek to have their
knowledge recognised. The practical implication of this work has been the development of
more participatory forms of decision-making in which dialogue and learning are
promoted in order to increase shared understanding and develop more legitimate and robust
solutions.
    In this article we describe a new method – the Imitation Game – that can be used to
explore the extent to which members of one social group have developed an understanding
of a different social world. The method is new because it uses the acquisition of
linguistic understanding to explore participants’ knowledge of practice and culture and,
in particular, the extent to which an authentic account can be produced without directly
experiencing the practices being described. The method is also unusual in that it generates
qualitative and quantitative data simultaneously. At the time of writing, we are in the
early stages of a five-year research project that should develop the method further and
train a new cohort of researchers in its use. Fortunately, we are already in a position to
sketch out the method’s potential applications, report some striking results and identify
ways in which its principles make it different to other research methods. We begin by
summarising the ideas that underpin the Imitation Game and describing how the method is
used in practice.

Interactional Expertise
One way in which the differences in social experience are expressed is in language. At
the crudest level, persons brought up in different societies often speak different natural
languages and it is widely accepted that there is always something ‘lost in translation’.
The same applies to the various groups within the same society who speak the different
‘practice-languages’ associated with distinct domains of expertise (Collins, 2011). For
example, although many scientists may share a natural language (e.g. English) they also
have a specialist lexicon that relates to their own disciplinary area (e.g. physics). This
specialist language is an example of a practice-language.1
   Language is part of the ‘collective tacit knowledge’ of a social group and captures
many features of that knowledge.2 The ability to speak a practice-language fluently,
either with or without the ability to execute the corresponding practices, is known as
‘interactional expertise’ (Collins and Evans, 2002, 2007).3 That interactional expertise
can be acquired in the absence of practice has been argued at length (e.g. Selinger et al.,
2007) and the ‘proof of concept’ Imitation Games described here provide empirical
support for the claim.4 They also show how the idea of interactional expertise need not
be restricted to the case of practice-languages, such as those associated with scientific
disciplines, but can be used with any social group that has a distinctive set of experiences
represented in a particular form of language-use.
   The practical implications of interactional expertise are wide ranging, particularly
for social scientists concerned with how to understand social groups. Because
interactional expertise can be acquired without experiencing the embodied practice directly,
it may not be necessary for, say, men to live as women, or gay people to live as
straight people, or anthropologists actually to ‘eat the eye of the sheep’, or
criminologists to commit murders in order for the relevant other groups to be understood. Instead,
deep and extended immersion in the linguistic discourse can be enough. Interactional
expertise is also central to committee work, peer review, multi-agency teams,
interdisciplinary science and many other elements of advanced societies where a complex
division of labour is required. In these settings, interactional expertise is needed
because it enables members of different groups to coordinate their joint actions without
each individual member needing to practise or experience everyone else’s task. In
other words, interactional expertise makes it possible for there to be coordinated groups
made up of specialists (Collins, 2011).
   The Imitation Game enables these claims to be examined empirically. It does so by
investigating whether members of one social group are able to demonstrate linguistic
fluency in the practice-language of a different social group. In the game, the person
who is selected to play the role of the ‘judge’ will be fluent in the ‘target language’ and
‘target expertise’ and is tasked with trying to identify the other players by gauging their
levels of expertise. Thus the Imitation Game can test for the extent to which the tacit
knowledge of the judge’s group has been acquired by non-members of that group who
are asked to pretend to be members. If the persons who are doing the pretending are
ordinary members of a society whereas the judge is a member of some more esoteric
group, Imitation Games can show the extent to which the language of the specialist
group has entered the language of the society as a whole. To the extent that it has, the
ordinary members of society will be better able to imitate specialists; where it has not,
ordinary members will have little chance of passing as members of the esoteric
group.
   Given that language-use is linked to social interaction, the extent to which ordinary
members have learnt the specialist discourse of the other social group says something
about the ways in which the different groups interact and feel it necessary to absorb the
ways of being of the other culture. Though the Imitation Game uses the medium of
language, it is still testing for the possession of tacit knowledge about the target culture. The
judges are instructed to ask the kind of question that requires a skilled judgement to
answer or to require players to describe experiences typical of the target group. The
respondents answer in plain language but the content of their answers will be authentic
only if it reflects the appropriate tacit knowledge.5

Interactional Expertise and Socio-cultural Difference
The possession of interactional expertise is, then, an indicator of the depth and quality of
social interactions. Therefore it can be used as a measure of socio-cultural difference and
integration. The idea can be illustrated by applying it retrospectively to Du
Bois’s notion of ‘double consciousness’ (1994). Double consciousness refers to the way
in which Black Americans experienced both their own culture and the dominant White
culture that enslaved them. Although the term has several different connotations, here we
are particularly interested in the fact that Black Americans could see their lives from two
perspectives: their own African identity and the discriminatory perspective of the White
American society. To the extent that Black Americans were able to articulate the norms
and values of White America – that is to correctly anticipate how a White American
would speak about a situation and reproduce this discourse – then they can be seen as
having interactional expertise in the practice-language of White America.
   If this framework were to be applied in this setting, it would differ from the existing
sociolinguistic analyses in three ways. First, rather than focus on language-use in natural
settings, it would address the rather different question of how Black Americans were able
to develop a rich and sophisticated understanding of a society from which they were
largely excluded. Used in this way the Imitation Game is not concerned with how Black
Americans spoke amongst themselves, or with White Americans. Instead, it is concerned
with how Black Americans were able to know and understand the world as a member of
the White society would see it even though they were denied membership of White
cultural institutions and most of the physical experiences enjoyed by members of the
dominant society. That is to say, despite not having the actual, physical experience of being a
White American, the immersion of Black Americans in the discourse of White America
was enough for them to attain fluency in the practice-language of the dominant society.
Without recourse to the idea of interactional expertise it is difficult to explain how such
an understanding could have arisen.
   Second, the Imitation Game could explore the differences in understanding between
the groups and hence the social distribution of interactional expertise. Again, the focus
would not be on how members of one community spoke ‘to’ or ‘about’ the other but the
extent to which they were able to speak ‘as’ the other. In this case, it seems likely that
most White Americans had little understanding of African heritage or culture and would
have been unable to ‘see’ or ‘speak’ from the perspective of a Black American. In other
words, the fluency in practice-languages is not symmetrical, as Black Americans can
reproduce the discourse of White Americans but not vice versa. Running Imitation
Games directed at the two target expertises with White judges and Black judges
respectively would reveal this asymmetry through a comparative analysis.
   Third, because the Imitation Game produces quantitative data about the extent to
which different groups can use each other’s discourse correctly, any differences in
the distribution of interactional expertise can be measured and monitored. This in turn
creates the possibility for longitudinal research that tracks how the distribution of
interactional expertise changes over time, and comparative research that examines the
distribution of different kinds of interactional expertise within a single area or compares how
the same kind of interactional expertise is expressed in different places.

Figure 1. The concept of the Imitation Game.

The Imitation Game Method
The aim of the Imitation Game method is to investigate these kinds of phenomena by
measuring the distribution of interactional expertise within and between societies. It is a
more rigorous version of the parlour game that inspired Alan Turing’s famous ‘Turing
Test’ for the intelligence of computers (Turing, 1950).6 The method as we have
developed it is represented in Figure 1. Each Imitation Game consists of three participants
who are linked by computers and none of whom should know the identity of the others.
One participant, drawn from the ‘target group’ whose abilities are being investigated, acts
as the judge. The judge creates questions and sends these to the other two participants.
Of these, one is another member of the target group and is asked to answer the questions
naturally. The other is drawn from a different group and charged with answering as if
they were a member of the target group. The judge then compares the answers and tries
to work out which comes from the person who is pretending and which from the person
who is answering naturally. The hypothesis is that, where the person who is pretending
has interactional expertise (i.e. sustained linguistic immersion leading to fluency in the
relevant practice-language), the judge will find it difficult or impossible to distinguish
between the two sets of answers.
   Initial ‘proof of concept’ research involved minority groups – the blind, the colour
blind, and those with perfect pitch – where clear prior predictions could be made. In each
case, members of the minority groups have spent their lives immersed in the majority
discourse and should have the interactional expertise needed to ‘pass’ linguistically as a
member of the majority culture. On the other hand, there was no reason to expect
members of the majority culture to have been immersed in the minority discourse so they
should find it hard to pretend to belong to the ‘target group’. If the theory is correct, and
the method works as expected, then the success rates of majority and minority judges
should be different. Specifically, the expectation is that:

   •   the blind will be more successful at pretending to be sighted than the sighted will
       be at pretending to be blind;
   •   the colour blind will be more successful at pretending to be colour perceivers than
       vice versa; and
   •   those with perfect pitch will be more successful at pretending to be ‘pitch-blind’
       (i.e. not to have perfect pitch) than vice versa.

   More generally, we can say that, where we expect the person charged with pretending
to have interactional expertise, judges’ guesses will, over a series of individual
Imitation Games, tend towards chance. Imitation Games with this configuration are
therefore called chance conditions. In contrast, where the person pretending is not
expected to possess the relevant expertise, judges should be able to work out who is
who. Imitation Games with this configuration are called identify conditions.

The Imitation Game Tests on the Blind7
Taking the tests on the blind as the exemplar, we now explain the method in more detail.
Participants were volunteers recruited from the area around Cardiff University.
In the case of the blind, we recruited five volunteers who had university degrees and
professional experience that broadly matched those of the sighted participants.8 We could not
recruit any congenitally blind participants but all five volunteers had lost all or nearly all
of their sight in early childhood. To take account of this, sighted judges were asked to
devise questions pertaining to experiences that belong to adult life and that the blind
persons could not have practised or watched.
    Each Imitation Game began with the judge composing a question that was then
relayed to the other two participants. Judges were told that one of the other two partici-
pants was pretending and that their task was to work out who was the pretender based on
the answers. Judges were asked to invent their own questions and were free to ask about
any topic they thought would distinguish between the two participants. When both
participants had answered the question their answers were displayed simultaneously on the
judge’s computer screen.
    On being presented with each pair of answers, the judge was asked to indicate which
answer came from the person who was pretending and which from the person with the
target expertise. The judge then recorded how confident they felt about this judgement
using the following four-point scale:

    1   I have little or no idea who is who.
    2   I am more unsure than sure.
    3   I am more sure than unsure.
    4   I am pretty sure I know who is who.

Judges were also asked to give their reasons for their view. This completes a ‘turn’. After
each turn the judge had the option of asking a new question or ending the game. If
the game was ended, the judge’s final decision and confidence level were counted as the
‘outcome’. A ‘Don’t Know’ option was not provided but any final guess at confidence
level 1 or 2 was counted as a ‘Don’t Know’. Confidence levels 3 or 4 were taken as
indicating a true guess or judgement, which was either right or wrong.
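
The coding rule for a game's outcome can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `code_outcome` and the three string labels are assumptions introduced here for the example.

```python
def code_outcome(judge_correct: bool, confidence: int) -> str:
    """Code the final turn of one Imitation Game.

    Hypothetical helper: `confidence` is the judge's final rating on the
    four-point scale (1 to 4) and `judge_correct` records whether the
    final identification of the pretender was right.
    """
    if confidence not in (1, 2, 3, 4):
        raise ValueError("confidence must be 1, 2, 3 or 4")
    # Levels 1 and 2 are counted as 'Don't Know', whatever the guess was.
    if confidence <= 2:
        return "dont_know"
    # Levels 3 and 4 count as a true judgement, which is right or wrong.
    return "right" if judge_correct else "wrong"
```

Under this rule a correct guess made at confidence level 2 is still coded as a 'Don't Know', which is why the later analysis works with three outcome categories rather than two.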
    All the Imitation Games described here were conducted in two formats: Phase 1 and
Phase 2. Phase 1 Imitation Games are conducted in the manner described above. In Phase
2, transcripts are made of each Phase 1 Imitation Game and are distributed to new judges
who also possess the target expertise. As Phase 2 is much less resource intensive than
Phase 1, more Phase 2 runs can be carried out.9

   In the case of the blind, a total of 70 Imitation Games were completed. These
comprised five Phase 1 and 51 Phase 2 chance condition runs (i.e. with sighted judges) and
four Phase 1 and 10 Phase 2 identify condition runs (i.e. with blind judges). The
differences between the numbers in the two conditions arise from the difficulty of recruiting
suitable blind volunteers, the need to have two blind volunteers for each identify
condition run, and the fact that the chance condition is the test of the counter-intuitive claim
that the blind can pass as sighted.

Analysis
The results are shown graphically in Figure 2. The left-hand pair of columns, headed
‘Raw data’, show the frequencies for each final guess in Phase 1 and Phase 2. The left
column of the pair is the identify condition and, as hypothesized, the result is obviously
different from the chance condition on the right.
   To quantify this difference and make it possible to compare Imitation Games in
different domains, the data must be recoded. First, Phase 1 and Phase 2 are aggregated by
adding the frequencies together.10 Next, any cultural variation in the willingness of
judges to make ‘high confidence’ guesses must be eliminated. For this reason, the
fundamental measure of successful identifications is neither the absolute number of correct
guesses, nor the proportion of correct guesses. This is because both of these are affected
by the number of Don’t Knows, which may, in turn, be affected by the willingness of
judges to make high confidence guesses. Instead, the correct measure of success is the
excess of right guesses over wrong guesses (i.e. right guesses minus wrong guesses) as a
proportion of the total number of guesses. This procedure reduces the three categories
into two: the excess of right guesses and all other guesses including ‘Don’t Knows’. The
effect of this transformation is shown by the right-hand pair of columns in Figure 2,
labelled ‘Data after recoding’.
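
The recoding step can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function name `recode`, the dictionary keys, and the counts used in the usage note are all assumptions, and the total is taken to include 'Don't Knows', as the description of the two recoded categories suggests.

```python
from collections import Counter


def recode(phase1: dict, phase2: dict) -> dict:
    """Aggregate Phase 1 and Phase 2 counts, then recode three categories into two.

    Each input maps the hypothetical labels 'right', 'wrong' and
    'dont_know' to the number of final guesses in that category.
    """
    # Step 1: aggregate the two phases by adding the frequencies together.
    total = Counter(phase1) + Counter(phase2)
    right, wrong, dont_know = total["right"], total["wrong"], total["dont_know"]
    n_guesses = right + wrong + dont_know

    # Step 2: the measure of success is right guesses minus wrong guesses.
    excess_right = right - wrong

    return {
        "excess_right": excess_right,
        # Everything else, including 'Don't Knows', forms the second category.
        "other": n_guesses - excess_right,
        # Excess of right guesses as a proportion of all guesses.
        "proportion": excess_right / n_guesses if n_guesses else 0.0,
    }
```

For example, with invented counts `recode({"right": 3, "wrong": 0, "dont_know": 1}, {"right": 9, "wrong": 1, "dont_know": 0})` gives an excess of 11 right guesses out of 14 guesses in total.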

Figure 2. Imitation Game results for blind and sighted persons.

   Finally, to compare results across topics the number of excess correct guesses needs
to be standardised. This is done by dividing the number of excess right guesses by the
total number of guesses, creating what we call the ‘Identification Ratio’ (IR). The
numbers at the border of dotted black and white regions are the IRs – in this case 0.86 for the
identify condition and 0.13 for the chance condition. To see if the difference is
statistically significant, we use a newly developed Monte Carlo simulation of the sampling
error of the difference between two identification ratios. In this case, the difference
between the two IRs is statistically significant at the 99 per cent level (p < 0.001).11 Thus
the experiment shows that, as expected, the blind are able to pass as sighted but not vice
versa and, even if we were unable to inspect the dialogues, we would still be able to say
that this result is highly unlikely to have come about by chance.
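
The details of the Monte Carlo procedure are not given in this section, so the sketch below substitutes a standard two-sample permutation test to show the general idea of simulating the sampling error of a difference between two IRs. It should not be read as the authors' method. The function names, the coding of outcomes as +1 (right), -1 (wrong) and 0 (Don't Know), and the example data are assumptions made for illustration.

```python
import random


def identification_ratio(outcomes):
    """IR = (right - wrong) / total, with outcomes coded +1, -1 and 0."""
    return sum(outcomes) / len(outcomes)


def permutation_p_value(identify, chance, n_sims=10_000, seed=0):
    """Two-tailed permutation test for the difference between two IRs.

    Assumed stand-in for the article's Monte Carlo simulation: condition
    labels are shuffled at random and the simulated differences are
    compared with the observed one.
    """
    rng = random.Random(seed)
    observed = abs(identification_ratio(identify) - identification_ratio(chance))
    pooled = list(identify) + list(chance)
    k = len(identify)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)
        diff = abs(identification_ratio(pooled[:k]) - identification_ratio(pooled[k:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)


# Invented outcome lists: only the sample sizes (14 identify runs and
# 56 chance runs) come from the text; the split between right, wrong
# and Don't Know is made up for the example.
identify_runs = [1] * 12 + [0] * 2
chance_runs = [1] * 10 + [-1] * 3 + [0] * 43
p = permutation_p_value(identify_runs, chance_runs)
```

The add-one correction means the smallest p-value the simulation can report is 1 / (n_sims + 1), so a simulated result is better stated as falling below a threshold than as zero.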

Imitation Game Results with Social Significance
The results of Imitation Games on other topics are summarised in Table 1. Columns 1
and 2 show the results for the colour blind and those with perfect pitch.12 These are
consistent with those for the blind (column 3), with statistically significant differences in
each case. The other columns deal with topics of more direct sociological interest.13
Three different types of comparison can be made, each of which is described below.
    First, in the case of columns 4 and 5, which concern sexuality and religion, it is the
identify condition that provides the most interesting result. It is not particularly
surprising that gay participants can pretend to be straight, or that active Christians can pretend
to be non-church-goers, but the extent to which the majority can pretend to be members of
the minority is revealing of the extent to which these minority worlds are comparatively
unknown within mainstream society.
    Second, the IR can be used to compare the ‘success rates’ of different groups allowing
comparison between topics rather than within topics. In these Imitation Games, the IR
for the identify condition was around 0.4 in the case of the gay community and around
0.7 in the case of the Christians. This suggests that knowledge of gay culture might be
more widespread amongst our majority sample than knowledge of Christianity. Though,
in this case, the (two tailed) result is just below the 5 per cent level of statistical
significance, comparing IRs in this way lays the foundation for comparative research (with
larger numbers). For example, one might expect that the IR in the Christianity Imitation
Game would be lower in those countries where Christianity is a much more prominent
part of mainstream culture as non-Christians would be better at pretending. Similarly, the
IR in Imitation Games with gays and lesbians could be compared across countries or
within countries over time to investigate the impact of equality legislation on mutual
understanding. For example, does the IR drop, as gay culture becomes more mainstream,
or does the IR increase as more legitimacy leads to the proliferation of gay-only spaces?

Table 1. Results of Imitation Game tests on six topics.

               1        2         3       4           5          6             7
               Colour   Perfect   Blind   Sexuality   Religion   Gender        Gender
               blind    pitch                                    female/male   old/young
Chance IR      0.05     0.00      0.13    0.00        0.00       0.10          0.0
Identify IR    0.33     0.73      0.86    0.44        0.68       0.16          0.28
Effect size    0.3      0.7       0.7     0.4         0.7        0.1           0.3
Two tail sig   0.04     0.00      0.00    0.00        0.00       0.64          0.01

    Finally, Table 1 shows the results of Imitation Game runs on gender (columns 6 and 7).
Here we stratified the sample and recruited participants who were either student-age or
from their parents’ generation. When the results are compared by gender (column 6) we
discover that though women seemed slightly better at guessing when a man pretended to
be a woman than vice versa, the success rates were both quite similar to those achieved
in chance conditions and the difference between them was not statistically significant.
The implication is that, nowadays, men and women appear to be well integrated, at least
as far as cultural understanding is concerned.
    A more interesting finding emerges from comparing the results of runs conducted
among students with runs conducted among their parents’ generation (column 7). The
students were worse at pretending than the older participants. The effect size was low
(0.3) but similar to that found in the colour blind tests, and the difference between the
young and old groups was statistically significant (p < 0.000).14
    We originally set up this comparison because we believed that men and women of the
60s and 70s generation would be better integrated than contemporary men and women. The
result provides encouragement that the kind of longitudinal study that we have in mind might
work. Unfortunately, we cannot eliminate a second possible cause for this outcome: it might
be that the older group have simply had more time in their longer lives to socialise with their
counterparts. Only a longitudinal study proper could hope to separate these effects.
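
The text does not state how the ‘Effect size’ row of Table 1 is calculated. The reported values are, however, consistent with the difference between the identify IR and the chance IR rounded to one decimal place. The short check below illustrates that reading; it is an inference from the table, not a definition given by the authors, and the names `TABLE_1` and `ir_difference` are introduced here only for the example.

```python
# Values copied from Table 1: (chance IR, identify IR, reported effect size).
TABLE_1 = {
    "Colour blind": (0.05, 0.33, 0.3),
    "Perfect pitch": (0.00, 0.73, 0.7),
    "Blind": (0.13, 0.86, 0.7),
    "Sexuality": (0.00, 0.44, 0.4),
    "Religion": (0.00, 0.68, 0.7),
    "Gender female/male": (0.10, 0.16, 0.1),
    "Gender old/young": (0.0, 0.28, 0.3),
}


def ir_difference(chance_ir: float, identify_ir: float) -> float:
    """Assumed effect-size measure: identify IR minus chance IR."""
    return identify_ir - chance_ir


for topic, (chance, identify, reported) in TABLE_1.items():
    diff = ir_difference(chance, identify)
    # Print the unrounded difference next to the value reported in the table.
    print(f"{topic:20s} difference = {diff:.2f}  reported effect size = {reported}")
```

On this reading, the difference between the IRs in a chance condition and an identify condition is the quantity that would be tracked in the comparative and longitudinal designs discussed above.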

Qualitative Results
Thus far, the quantitative analysis has been treated as the principal outcome but the
dialogues themselves are also a potentially rich source of data. Returning to the Imitation
Games with the blind, examination of the dialogues shows that the questions asked by
the sighted judges were more numerous and more elaborate than those asked by blind
judges, but their judgements remained tentative. In contrast, blind judges could quickly
identify who was who, sometimes apologising when they thought they had
misunderstood what they expected to be a far more difficult task.
   The tables give a sense of how these results are generated. Table 2 sets out the four
questions and answers which deal with the topic of tennis out of the nine substantive
questions and answers from an Imitation Game involving a sighted judge. The fourth
column shows four judgements from four different Phase 2 respondents, two of whom
judged one way and two the other.

Table 2. Five out of nine questions from a sighted condition dialogue.

Respondent 1                Judge                             Respondent 2                    4 Phase 2 judges
I watch Wimbledon           So let me start                   I like tennis but               1) I think respondent
a little bit on the         with sport. Are                   only watch big                  1 gives himself away
television and              you interested in                 tournaments like                when he discusses the
occasionally the            tennis and do you                 Wimbledon.                      human judgements
Australian Open in          ever watch it on                                                  on the flight of a
January.                    the television?                                                   tennis ball.
Not being a tennis          So tell me what                   It adds an other                2) I cannot believe a
professional it is not      you think about                   element to the game             sighted person saying
for me to say if it         the Hawk-Eye line                 which could make it             that Hawk-Eye does
should or should not        judging system.                   more interesting.               not alter the viewing.
be used. It does not
really alter viewing.
I assume it’s the same      But I want to                     There is always a               3) The Hawk-Eye
technology in cricket       know whether                      degree of uncertainty           questions reveal
and in cricket, Hawk-       you think that                    with both people and            some quite specific
Eye is between two          the umpire or the                 technology.                     information that
and four mm out. If it      players could ever                                                I don’t think was
is the same for tennis,     make a better                                                     published in audio
then it is probably still   judgment than                                                     media. Also, the story
more accurate than          Hawk-Eye.                                                         wasn’t that important
the human eye. If the                                                                         that I’d expect it to
players are happy with                                                                        be picked up by the
it and the umpires are                                                                        audio news services
happy with it then they                                                                       provided to the blind.
should continue using
Hawk-Eye.
I think often a tennis      How accurately                    It would depend on              4) Person 2 seems
player is not in a          would you say                     the speed the ball              really unfamiliar with
position to judge           a human can                       was travelling and              Hawk-Eye, given that
accurately as they are      judge the flight                  the position of the             they say they watch
not usually parallel with   of a tennis-ball? I               judge relative to the           Wimbledon.
the line. I think that if   mean, would you                   line and obviously
you set up a test for a     say they could                    the closer the ball
line judge with two balls   tell the difference               is to the line the
one which landed on         between touch                     harder it would be to
the line and one which      the line and 1 mm                 make a judgement.
landed 1 mm away            out 2 mm out 1                    So you would have
from the line, I don’t      cm out, 2 cm out,                 to judge each call on
think they could tell the   or what, and what                 an individual basis
difference. If you think    would it depend                   as there are a lot of
how small 1 mm is then      on?                               factors.
it would be so hard for
them to judge.

   Table 3 exemplifies the contrasting style of a complete Phase 1 identify condition
dialogue with the comments from the judge, who was blind, in the fourth column. With
one exception, the substance of Phase 1 judges’ comments was confirmed by all
subsequent Phase 2 judges.15

Table 3. Example of a complete blind condition dialogue.

Respondent 1        Judge                            Respondent 2                      Phase 1 Judge
I’m 50 and have     Could you tell                   I’m 30 and I’ve                   The second person is not black
been blind since    me roughly                       been registered                   and white and you do not
I was 10.           how old you are                  blind since I was                 usually lose your sight overnight,
                    and whether                      twelve.                           so the fact they mention being
                    you have been                                                      registered suggest that they
                    registered blind                                                   are blind. If the first one was
                    since birth.                                                       blind they would normally say
                                                                                       how they became blind if it was
                                                                                       sudden. (level 2)
No I have no        Do you have                      I’ve got light and                I have both white stick and
residual sight. I   any residual                     dark and colour                   dog but would never use
use a white stick   sight and what                   perception in one                 both at same time. Therefore
and have a guide    mobility aids do                 eye and I use a                   if I was responding I would
dog.                you use?                         guide dog.                        say something like I use a
                                                                                       guide dog predominantly but
                                                                                       sometimes use a white stick
                                                                                       – but if you are blind you would
                                                                                       call it a cane normally. Also,
                                                                                       number 2 was much less black
                                                                                       and white. It’s always grades of
                                                                                       blindness. (level 4)

   The dialogues show how the game requires the judge to reflect upon the special char-
acteristics of their own culture or expertise and ask questions that bring out its esoteric
qualities – things they believe will be known by members but not outsiders. The genuine
respondent and the one with interactional expertise (i.e. chance condition) will ‘fill out’
the content of the special culture by using the appropriate practice-language to provide
authentic or realistic answers to the judge’s probing questions. In the identify condition,
however, the person pretending lacks interactional expertise and their response often
draws on a stereotypical image of the target culture and is seen by the judge as inauthentic
or unrealistic; these unsuccessful answers can show what the majority culture thinks the
minority culture is like. Whether an answer to a question is successful or unsuccessful is
shown by changes in the judge’s confidence levels and the associated comments made by
the judge.

Imitation Games as Comparative Research
Although our research to date has been based in a single location, the data we have col-
lected so far shows the potential value of the method for comparative research. We do not
claim that Imitation Games are a substitute for in-depth field work but they do provide
an alternative to methods such as focus groups or semi-structured interviews. The advan-
tage of the Imitation Game is that its ‘proxy researcher’ philosophy should mean that
comparative data from culturally diverse groups can be collected without having to make
significant ‘front-end’ investments in acquiring local cultural repertoires. It is the judges,
the genuine respondents, and those who pass the test who are the domain experts. The
method uses judges as proxy researchers, who, in turn, use the genuine respondents as
‘informants’. The respondents who do not succeed in pretending to be members of the
target group act as ‘proxy strangers’; proxy strangers are like ethnographers or anthro-
pologists at the beginning of their sojourns in a strange society, inadvertently carrying
out ‘breaching experiments’ – in this case by providing inappropriate answers – which
reveal the characteristics of the native culture – in this case via the judges’ comments.16
    For these reasons the Imitation Game could be especially useful when it comes to
cross-cultural or longitudinal comparisons. The ordinary native members of society who
ask the questions are continually immersed in their own changing cultures and automati-
cally adjust their investigation to take account of the changing or differing ways in which
cultural divides are expressed. Differences in cross-national and longitudinal expressions
of cultural difference are, as it were, ‘factored out’ of the judgements by using ordinary
people as proxy researchers. At the same time the qualitative data this process generates
provides a detailed record of how these differences are manifested and hence how what
it means to be ‘gay’ or ‘female’ differs across time and space.
    In the case of quantitative data, a similar argument can be made for the Identification
Ratio (IR). If comparative statistics are to be useful across large social distances, the
quantitative measure has to be unaffected as far as possible by the effects of social change
other than the change of interest. In this respect, the IR has two advantages over other
measures. First, the questions are always phrased in the appropriate cultural idiom and,
second, it automatically ‘factors out’ any broad change in the propensity to be more or
less certain when asked for an opinion. Historians have implied that this propensity
might change with a new weltanschauung.17 Analogously, it has been shown that what
counts as a publishable result for a scientist varies from group to group.18 In contrast, IR
remains constant in the face of a changing proportion of ‘Don’t Knows’ so long as the
difference in the proportions of right and wrong guesses stays the same.
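
A small worked example may make this invariance concrete. The sketch below assumes that the IR is the number of right guesses minus the number of wrong guesses, divided by the total number of guesses including ‘Don’t Knows’, which is consistent with the –1, 0, 1 coding described in note 11. The function name and the counts are hypothetical and are not data from the games reported here.

```python
def identification_ratio(right, wrong, dont_know):
    """Assumed definition of the IR: (right - wrong) / total guesses."""
    total = right + wrong + dont_know
    if total == 0:
        raise ValueError("at least one guess is required")
    return (right - wrong) / total

# Hypothetical counts: two sets of 100 guesses with different shares of
# 'Don't Knows' but the same difference between right and wrong guesses.
confident_group = identification_ratio(right=60, wrong=20, dont_know=20)
cautious_group = identification_ratio(right=50, wrong=10, dont_know=40)

print(confident_group, cautious_group)  # prints 0.4 0.4
```

In this illustration the cautious group gives twice as many ‘Don’t Knows’, yet both groups yield the same IR because the gap between right and wrong guesses is 40 out of 100 in each case.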
    To exemplify the use of IR for comparative purposes, examination of Table 1 shows
that it was easier for the blind to identify sighted people (column 3, IR=0.86) than it was
for the colour blind to identify those who were not colour blind (column 1, IR=0.33).
Similar comparisons can also be made for the other topics, such as the IRs for ‘straights’
pretending to be gay and secular students pretending to be Christian and so on. It is also
possible to examine the relationship between different IRs by running Imitation Games
in different countries. For example, it is possible that the IRs for religion and sexuality
are related; where the IR in religion Imitation Games is low (i.e. religious culture is quite
mainstream) it might be the case that the IR in sexuality Imitation Games is high (i.e.
homosexuality is not widely understood). Conversely, where the IR for religion is rela-
tively high (i.e. religious culture is not widespread) it might be that the IR for sexuality
is low (i.e. homosexuality is more open). These are the kinds of thing that can be tested
with the more extensive use of the Imitation Game in which we are now engaged.
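
One way such a comparison between two IRs might be checked is by resampling. The sketch below is a generic percentile bootstrap, assuming the –1, 0, 1 coding of ‘wrong’, ‘don’t know’ and ‘right’ guesses mentioned in note 11; it is not a reproduction of the procedure used for the results in this article. The function names, the number of resamples and the counts are all hypothetical.

```python
import random

def code_guesses(right, wrong, dont_know):
    """Code guesses as 1 (right), -1 (wrong) and 0 (don't know)."""
    return [1] * right + [-1] * wrong + [0] * dont_know

def mean(values):
    return sum(values) / len(values)

def bootstrap_ir_difference(a, b, n_resamples=5000, seed=0):
    """Return the observed difference in mean codes and a 95% percentile interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(a) - mean(b)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return observed, lower, upper

# Hypothetical counts for two topics, 40 guesses each (not data from our games).
topic_a = code_guesses(right=30, wrong=5, dont_know=5)    # mean code 0.625
topic_b = code_guesses(right=15, wrong=12, dont_know=13)  # mean code 0.075

observed, lower, upper = bootstrap_ir_difference(topic_a, topic_b)
print(observed, lower, upper)
```

With this coding the mean of each list is that group’s IR under the assumed definition, so the interval describes the plausible range of the difference between the two IRs. An interval that excludes zero would suggest that the difference is unlikely to be due to sampling variation alone.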
    To look further forward, more fine-grained analysis of religion could be carried out
with the IR used to compare the integration of different minority groups – Christians,
Muslims, Jews and so on – into the mainstream culture of a single country or to compare
the differential integration of a single religious group in different countries. Alternatively,
if Imitation Games were conducted over longer periods, then the way degrees of integra-
tion changed over time could be compared. Such work would be particularly interesting
in domains such as sexuality where dramatic changes in social policy have taken place.
Using the Imitation Game, it would be possible to measure how the social isolation or
integration of the gay community varies from country to country and, should work on the
method be supported over a longer period, over time. Andrew Hodges’ (1985) biography
suggests that Turing’s special interest in the gender parlour game was related to his
homosexuality. It would be poignant if the game were used to monitor the relaxation of
the very intolerance that led to Turing’s suicide.

Conclusion
In this article, we have described a new research method and set out some of its pos-
sible uses and applications. Its potential is, however, best illustrated by considering
the gender-based parlour game on which it is based. That Turing could find the gender
parlour game interesting, even when played with paper and pencil, suggests that in the
1930s men and women knew sufficiently little of each other’s worlds that
trying to identify them gave rise to a ‘frisson’. Nowadays, when we use the same
gender-based game to demonstrate the Imitation Game it is much more of a challenge
than participants expect – men and women know so much about each other that they
find it easy to pretend to be each other and hard to spot who is pretending. What would
be really interesting, however, would be to have comparative data for both the 1930s
and the current day; just think how revealing of social and cultural change it would be
to see both the change in quantitative outcomes and the changing content of the dis-
course. The material in this article can be thought of as a demonstration of what might
be achieved should the Imitation Game become a routine part of the social scientist’s
repertoire.

Acknowledgements
We are grateful to the Cardiff School of Social Sciences for enabling the initial Imitation Games
to be conducted and to the colleagues and students who have contributed, by volunteering as
research participants, by collecting data or being willing to discuss ideas. We are also grateful to
the participants at the SEESHOP conferences where the ideas that underpin this paper have been
developed.

Funding
We are grateful to the European Research Council for two grants supporting the continued devel-
opment of this work. These are a €2.26M Advanced Grant (269463 IMGAME) and a €150K Proof
of Concept Grant (297467 IMCOM). The larger grant provides funding for comparative and
cross-national Imitation Game research on a number of different topics. The aim is to explore the
robustness of the method and to develop the research protocols needed to enable the method to be
used by others. The smaller grant supports the development of robust Imitation Game software
suitable for wide distribution. This article sets out the methodological and conceptual work on
which the awards are based.

Notes
 1. Although based on language-use, the Imitation Game differs from comparative language
    studies as summarised in, for example, Hock and Joseph (1996). In particular, dialects, pro-
    nunciation and the origins of words are of no particular interest. Instead, the focus of the
    Imitation Game is on expertise and what different groups of people are able to say about
    different kinds of experience. Whilst these differences may be marked by the presence of
    different dialects, this seems unlikely to be significant, as the differences that matter in the
    Imitation Game are differences of content rather than form.
 2. For the concept of collective tacit knowledge see Collins, 2010.
 3. The idea of interactional expertise (Collins and Evans, 2002, 2007) needs to be distinguished
    from expertise in interacting: interactional expertise means grasping the conceptual structure
    of another’s world. It is not simply skill in interacting, nor does it seek to explain how
    successful social interaction is accomplished and rendered accountable.
 4. See also Giles, 2006.
 5. This is not the same as conveying that tacit knowledge. For example, someone who can ride
    a bike can say things pertaining to bike riding that only another bike rider will recognise as
    authentic but that does not render a non-bike rider who hears the answer able to ride a bike.
 6. The first discussion of the Turing Test is Turing, 1950, and there is now a large literature on
    it. Saygin et al., 2000, provide an indicative collection of pieces that examines the Turing Test
    from an AI perspective, though see Collins, 1990, for a detailed analysis of the methodology
    of the test. Our research is based on bespoke software written by Martin Hall. The software
    enables participants to communicate with each other using a standard web browser, automati-
    cally records the dialogue, ensures that judges make a provisional guess after each question
    and answer, records a confidence level at each ‘turn’, and prompts for an explanation of the
    reason for each guess, provisional or final. The software also ensures that both answers appear
    to the judge at the same time. Our studies are analysed statistically and a summary statistic
    allows for comparative analyses across time or across cultural groups. The Turing Test has
    also inspired other kinds of social research: see Berman and Bruckman, 2001; Herring and
    Martinson, 2004; and Nyboe, 2004.
 7. The only change from the standard protocol was that, in order for the blind participants to
    use the software, we recruited additional assistants to read out the questions and type in the
    answers.
 8. Sighted participants were recruited from within the University. All had degrees or similar
    qualifications.
 9. Checks have shown that Phase 2 is independent from Phase 1 in so far as the chance condi-
    tion Phase 2 judges’ guesses do not match the guesses of Phase 1 judges. There is, then, no
    obvious ‘stacking effect’ or its equivalent. Nevertheless, we are exploring improved methods
    for generating larger numbers of results in which questions generated in Phase 1 are used as
    the basis of more games involving only respondents not judges. This way the numbers of
    games required to give statistical significance where differences are small can be played in a
    way which more closely resembles a much larger Phase 1.
10. Though we present a statistical analysis of these results (below), where there are large differ-
    ences between the heights of the two columns, statistical analysis is almost a formality and, in
    any case, the recorded dialogues show that these differences were generated for the expected
    reasons. On the other hand, the difference in the ease with which non-gays could imitate gays
      (IR=0.4) and non-Christians could imitate Christians (IR=0.7) is just below the margin of
      statistical significance (two-tailed) given the number of games in our data. In this case the
      significance test is important and the result explains why we have looked hard for better
      ways of generating large numbers of games.
11.   Other statistical tests that could be used include Fisher’s Exact test and the Chi Square test,
      both of which support the result reported in the main text but which we believe are less exact
      for the purposes at hand. When ‘wrong’, ‘don’t know’ and ‘right’ answers are coded as –1,
      0, and 1 respectively, our Monte Carlo method gives similar results to a t-test applied to the
      difference between means so long as numbers are large. With this coding, our method is
      arithmetically identical to bootstrapping. We are grateful to the late Professor Tony Coxon
      for an early affirmation of our Monte Carlo method and to Professor Bernard Silverman for
      explaining its relationship to the t-test; only later did we come to understand the identity of
      our conceptually simple approach with the less intuitive idea of bootstrapping.
12.   Collins et al., 2006, provide a full analysis.
13.   Imitation Games on sexuality, religion and gender were all carried out with the help of
      students taking part in a social research methods course at Cardiff University. Unless other-
      wise stated, all participants were drawn from the student population.
14.   This is a brand new technique and a more thorough analysis of possible sources of non-
      random error is also in hand.
15.   This sort of variation is inevitable when using human participants.
16.   For breaching experiments see Garfinkel, 1967. For earlier use of the idea of the ‘proxy stran-
      ger’ see Collins and Kusch, 1998; Hartland, 1996.
17.   Forman (1971) considers that Weimar Germany, characterised by the political and cultural
      uncertainties engendered by defeat in the First World War, may have been an especially fertile
      ground for the rise of quantum theory, based, as it is, on the positing of uncertainty in the
      world of physical phenomena.
18.   For the case of physicists’ varying standards see Collins, 1998, 2004 (Chapter 22).

References
Berman J and Bruckman A (2001) The Turing Game: Exploring identity in an online environment.
   Convergence 7(3): 83–102.
Collins H (1990) Artificial Experts: Social Knowledge and Intelligent Machines. Cambridge, MA:
   MIT Press.
Collins H (1998) The meaning of data: Open and closed evidential cultures in the search for gravi-
   tational waves. American Journal of Sociology 104(2): 293–337.
Collins H (2004) Gravity’s Shadow: The Search for Gravitational Waves. Chicago, IL: University
   of Chicago Press.
Collins H (2010) Tacit and Explicit Knowledge. Chicago, IL: University of Chicago Press.
Collins H (2011) Language and practice. Social Studies of Science 41(2): 271–300.
Collins H and Evans R (2002) The third wave of science studies: Studies of expertise and experi-
   ence. Social Studies of Science 32(2): 235–96.
Collins H and Evans R (2007) Rethinking Expertise. Chicago, IL: University of Chicago Press.
Collins H and Kusch M (1998) The Shape of Actions: What Humans and Machines Can Do. Cam-
   bridge, MA: MIT Press.
Collins H, Evans R, Ribeiro R and Hall M (2006) Experiments with interactional expertise. Studies
   in History and Philosophy of Science 37(A/4): 656–74.
Du Bois WEB (1994) The Souls of Black Folk. Avenel, NJ: Gramercy Books.
Durkheim E (1915) Elementary Forms of the Religious Life. London: George Allen and Unwin.
Forman P (1971) Weimar culture, causality and quantum theory, 1918–1927: Adaptation by Ger-
    man physicists and mathematicians to a hostile intellectual environment. In: McCormack R
    (ed.) Historical Studies in the Physical Sciences, No 3. Philadelphia: University of Pennsyl-
    vania Press, 1–115.
Garfinkel H (1967) Studies in Ethnomethodology. Upper Saddle River, NJ: Prentice-Hall.
Geertz C (1973) The Interpretation of Cultures. New York: Basic Books.
Giles J (2006) Sociologist fools physics judges. Nature 442: 8.
Hartland J (1996) Automating blood pressure measurements: The division of labour and the trans-
    formation of method. Social Studies of Science 26: 71–94.
Herring SC and Martinson A (2004) Assessing gender authenticity in computer-mediated lan-
    guage use. Journal of Language and Social Psychology 23(4): 424–46.
Hock HH and Joseph BD (1996) Language History, Language Change and Language Relation-
    ship. Berlin: Mouton de Gruyter.
Hodges A (1985) Alan Turing: The Enigma of Intelligence. London: Unwin.
Irwin A and Wynne B (eds) (1996) Misunderstanding Science? The Public Reconstruction of Sci-
    ence and Technology. Cambridge: Cambridge University Press.
Kluckhohn R (ed.) (1962) Culture and Behavior: Collected Essays of Clyde Kluckhohn. Glencoe,
    IL: Free Press of Glencoe.
Kuhn TS (1962) The Structure of Scientific Revolutions. Chicago, IL: University of Chicago Press.
MacKenzie D and Spinardi G (1995) Tacit knowledge, weapons design and the uninvention of
    nuclear weapons. American Journal of Sociology 101: 44–99.
Nyboe L (2004) ‘You said I was not a man’: Performing gender and sexuality on the internet.
    Convergence 10: 2.
Saygin PS, Cicekli I and Akman V (2000) Turing Test: 50 years later. Minds and Machines 10:
    463–518.
Schutz A (1964) Collected Papers II: Studies in Social Theory. The Hague: Martinus Nijhoff.
Selinger E, Dreyfus H and Collins H (2007) Embodiment and interactional expertise. In: Collins H
    (ed.) Case Studies of Expertise and Experience: Special Issue. Studies in History and Philoso-
    phy of Science 38(4): 722–40.
Turing AM (1950) Computing machinery and intelligence. Mind LIX (236): 433–60.
Winch PG (1958) The Idea of a Social Science. London: Routledge and Kegan Paul.
Wittgenstein L (1953) Philosophical Investigations. Oxford: Blackwell.
Yinger JM (1982) Countercultures: The Promise and the Peril of a World Turned Upside Down.
    New York: Free Press.

Harry Collins is Distinguished Research Professor of Sociology and Director of the
Centre for the Study of Knowledge, Expertise and Science (KES) at Cardiff University.
His books include two on artificial intelligence (MIT Press), the award-winning Golem
Series (with Trevor Pinch – Cambridge and Chicago), Changing Order (1985, 1992),
three books on the sociology of gravitational wave detection, the latest being Gravity’s
Ghost and Big Dog: Scientific Discovery and Social Analysis in the Twenty-first Century
(2013, forthcoming), Rethinking Expertise (2007 with Robert Evans), Tacit and Explicit
Knowledge (2010) all the latter being published by University of Chicago Press. He is
currently the recipient of a five-year Advanced Research Grant from the European
Research Council (ERC) that will be used to develop the Imitation Game method. In
2012 he was elected a Fellow of the British Academy.

Robert Evans is a Reader in sociology at the Cardiff School of Social Sciences, where his
research focuses on research methods, the sociology of science and technology and the
nature of expertise. Previous projects have examined economic forecasting, GIS models
for sustainable development and genetics. His current research is devoted to developing
the ideas set out in the ‘Third Wave of Science Studies’ paper co-authored with Harry
Collins (Social Studies of Science, 2002) and, in particular, the use of the Imitation Game
method for comparative research.
Date submitted February 2011
Date accepted June 2012
