Social TV: Designing for Distributed, Sociable Television Viewing

Nicolas Ducheneaut¹, Robert J. Moore¹, Lora Oehlberg², James D. Thornton¹, Eric Nickell¹
¹Palo Alto Research Center; ²UC Berkeley, Mechanical Engineering
[nicolas, jthornton, bobmoore, nickell]@parc.com

Abstract
Media research has shown that people enjoy watching television as a part of socializing in groups. However, many constraints in daily life limit the opportunities for doing so. The Social TV project builds on the increasing integration of television and computer technology to support sociable, computer-mediated group viewing experiences. In this paper, we describe the initial results from a series of studies illustrating how people interact in front of a television set. Based on these results, we propose guidelines as well as specific features to inform the design of future “social television” prototypes.

1. Introduction
Television has often been criticized as an isolating, anti-social experience. However, early ethnographic studies (Lull, 1990) showed that

    “TV and other mass media, rarely mentioned as vital forces in the construction or maintenance of interpersonal relations, can now be seen to play central roles in the methods which families and other social units employ to interact normatively.”

And indeed, television viewing appears to be largely a social activity, often conducted in groups (Morrison, 2001). In fact, the worth of a particular television program is often gauged according to the amount of social interaction it generates (White, 1986). Television can foster multiple forms of sociability: direct (e.g. when chatting with friends and family during a “movie night” at home) or indirect (e.g. when discussing previously viewed programs with colleagues at the office water cooler), and both are equally worthy of attention. Previous research (Morrison, 2001) highlighted a similar distinction between the “internal” social functions of television viewing (when family members watch television together) and its “external” functions (e.g. television programs as topics of conversation at work or elsewhere; special events organized at home such as inviting friends over for watching the Super Bowl).

While there is now little doubt that watching television can be a “ticket to talk” (Sacks, 1992) encouraging interaction between groups of viewers, there is little research available on the exact practices surrounding sociable television viewing. Media research has focused essentially on the role that television plays in social groups like the family, examining for instance how programs are interpreted outside the viewing context and how this contributes (or not) to the maintenance of the family system (Alexander, 1990). This research tends to be based on surveys, and ethnographic observations of group viewing like Lull’s (1990) remain rare. Audiences have been generally considered as “a category rather than a way of being” (Casey et al., 2002) – that is, past research has investigated how a given audience’s socio-demographic characteristics affect their choice of programs, rather than how they behave as active participants engaged in social activities while watching television.

As such, the actual mechanics of joint viewing (e.g. When do people talk during a show? About what? To what effect?) remain largely unexplored (an interesting exception can be found in Walker and Bellamy, 2001, a survey of remote control use during family viewing). This lack of data has important design implications: a better knowledge of joint viewing practices could help develop new technology to better support television-mediated sociability.

It might seem that practices around a technology as well-established as television are not amenable to redesign, but in fact both societal change and technological innovation have been affecting viewing, as well as entertainment in general, for a long time. Many characteristics of contemporary life in wealthy societies conspire against traditional joint television viewing, despite its social appeal. Urban sprawl, for instance, can make travelling to a friend’s house or a public space for a movie night inconvenient (Oldenburg, 1989); domestic isolation and scheduling constraints prevent gatherings (Putnam, 2000); and increasing mobility often separates family members (e.g. a child living away from his family to attend university). Sociability is becoming more and more distributed in this context
as technology enables diverse remote interactions. Online computer games, for instance, are displacing television among young viewers, in part because they offer an interactive, social, and location-independent experience (Subrahmanyam et al., 2002; Schwartz, 2004).

Technology is changing television practices also, through integration of computing capabilities as in the case of digital video recorders such as TiVo¹. Our Social TV project is about leveraging this kind of computing integration to remove the increasing barriers to sociable interaction around video content. In focusing on the direct form of sociable viewing we will be talking about design for distributed, shared television viewing. While this necessarily changes the television experience, our concern is to preserve the ‘natural’, familiar social atmosphere of watching television in a collocated group. This goal of enhancing social opportunity while fitting existing practice frames the central design challenge.

¹ http://www.tivo.com

In this paper, we report on preliminary studies and design concepts that serve our goal. We begin with observation of what television viewers do while they are watching television in groups, both when they are physically co-present and geographically distributed. Based on our observations, we then articulate in the second part of this paper a series of design guidelines and concepts for future Social TV prototypes.

2. Related Work
Surprisingly, very few systems have been developed to support social interactions of any kind among television viewers. Two projects are closest to ours. The first one is Chuah’s (2002) “reality instant messenger”, which provides both “buddy surfing” (an awareness that friends are watching the same television program) and an IM-based communication channel between viewers. The second related project is Alcatel’s Amigo TV (Coppens et al., 2005), which allows television viewers to share opinions and feelings with friends via an interactive broadband link. Amigo TV shares a similar design philosophy to ours but, as we shall see later in this paper, the range of features offered to users is quite different.

Another approach has been to transform television into an inhabited virtual world. In Benford et al. (1998), audience members control avatars in a 3D space and can interact with the performers of the show they are watching. The focus is on breaking down the barriers between audience members and performers, as opposed to facilitating group interaction while watching television.

3. Understanding TV-Mediated Sociability
In order to understand what can make group television viewing so enjoyable, we invited participants to watch various television shows together in a specially equipped room in our laboratory. Our experiments were meant to reproduce one prevalent form of group television viewing: the “viewing party” where, for instance, a group of friends get together to watch the latest episode of the show “Desperate Housewives.” It is important to recognize, however, that other forms of group television viewing are equally interesting and that they would probably affect viewing patterns differently. For instance, we plan to investigate “everyday viewing” (e.g. married couples watching television together in their own home, without planning) at a later date.

We began with three exploratory sessions where viewers were all located in the same room and could interact face-to-face. The smallest group size was 5 and the largest 8. Participants were between 20 and 50 years old, with more males than females (70% and 30% respectively). Most were colleagues but a few shared social activities outside of work. Participants watched a soccer game or one of two episodes of a documentary (the BBC’s “1900 House”), each lasting approximately 2 hours overall. We used two cameras to videotape both the participants’ behaviour and the content of the television show. The tapes were later reviewed and coded (Glaser, 1998) by two of the authors, with particular attention to the local organization of talk among the participants (Sacks et al., 1974). Our observations were complemented by a questionnaire asking participants about their group television viewing habits (e.g. how often they viewed TV in groups, the kinds of content watched, etc.), as well as exit-interviews after the viewing session to reflect on their experience.

In the second phase of our study we held six additional viewing sessions, this time separating the participants into two groups located in two different rooms. A social audio link was established between the rooms using two computers running Robust Audio Tool (RAT², see Figure 1). By relaying all audio between the two rooms, each group could hear what the other was saying at all times. While the audio quality was adequate, it was certainly degraded compared to co-present interactions. However, the quality was good enough to allow our experiments to proceed.

² http://www.netsys.com/mbone/software/rat/

Figure 1 – Equipment used during Phase 2

These later experiments were meant to simulate some of the conditions users might encounter while watching television in a distributed setting. We chose not to transmit video streams of the participants’ faces between the rooms, as this has already been tried elsewhere (Huijnen et al., 2004). Our focus was on the most basic aspect of sociability: conversation.

As before, we used cameras (this time, three of them) to videotape both the participants and the audience. We assembled and synchronized the three video feeds such that they could be reviewed and coded on a single screen (see Figure 2). We administered the same questionnaire after each session as was used in phase 1.

Figure 2 – Merged video feeds used for coding our participants’ interactions

In the second phase, there were at least 2 and up to 6 participants in each room. The gender ratio was the same as before, and so was the age range. We invited again some of the participants from the first phase to allow them to compare and contrast the two conditions. Participants watched a wide range of content ranging from sporting events (the Olympics) to cartoons (the Simpsons) and reality-TV shows (the Real World).

3.1. Survey Results
The majority of the survey respondents reported that they routinely held viewing parties with their friends, and among the reasons given was that “TV [is] an excuse for sociality… [we watch] what[ever] TV is "big" enough to justify having someone come over.” This confirmed both our intuitions and earlier research about the attractiveness of group television viewing.

We also asked people what kinds of television programs they watched in groups. Among the responses the most popular genres included Animation, Sports Events, Documentaries, Action-Adventure, and Reality Television. It is clear from our participants’ comments that certain qualities in TV shows encourage sociability more than others. In particular, shows with bursty rhythms or redundant content (such as sporting events) provide plenty of pauses and opportunities for interaction. People-centered content (such as reality TV shows) provides audiences with many “conversational props” (Lull, 1990). Indeed, as one of our participants humorously mentioned during an exit interview, “It’s fun to make fun of people.” This parallels earlier research on the social psychology of entertainment. Stone (1981), for instance, noted that sports spectacles “are conversation-pieces, and conversations about them before, during, and after the event bring people together in an emotional rapport” (p. 222). The talk around the event itself may be interpreted as “a loosely structured game […] a form of nonutilitarian play” (Crabb and Goldstein, 1991, p.367).

Poor quality movies were also often mentioned as a good way to foster social interaction. Our respondents did not only find it fun to talk about poor casting, acting, or effects: a consensus about a lack of important or relevant dialogue in a show seems to provide viewers with as many opportunities to comment as they would like. Shows such as “Mystery Science Theater 3000”³ have capitalized on this phenomenon. All the characteristics above encourage a kind of “vicarious audience play” (Sutton-Smith, 2001) that is central to sociable television viewing.

³ http://www.mst3kinfo.com/mstfaq/basics.html

As we were collecting survey results, it became clear that content selection would make a difference to the level of social interaction independent of other features of the experimental condition. This was indeed supported by our observations. For instance, the documentary we later used during the first phase of our experiments clearly generated less lively conversations than other genres (although interesting sociable exchanges occurred nonetheless, as we illustrate later in this paper). To make sure our participants had enough material and incentive to communicate, we therefore purposefully selected the most popular genres for most of our experiments.

3.2. Empirical Observations
The most striking finding to emerge from our observations was the surprising similarity in the nature and structure of the participants’ conversations across the two experimental conditions. We identified clear interaction rules that participants respected when they interacted during a TV show, whether or not they were collocated.

These rules were never openly discussed by the participants. Instead, they seem to be part of a set of ingrained cultural practices dictating proper behaviour when watching television in groups.

3.2.1. Interactional Practices When Watching Television in Groups
Critics of television often point out that the nature of television programs encourages passivity (Casey et al., 2002). This passivity could be explained, at least in part, by the fact that the television set tends to dominate most channels of communication (both audio and visual) and, as such, it might not be conducive to interactive exchanges between audience members. However, our experiments revealed that television viewers are quite adept at communicating with each other during a show. To do so, they rely on a set of interactional practices that allow them to simultaneously socialize with each other around the TV and to follow the ongoing program with sufficient attention.

      01   Voice:       in nineteen hundred it must have been (0.5)
      02                fantastic to be able to whiz down the roa:d
      03                (0.4)
      04   Voice:       on your bicycle.=
      05   Music:       =[((gets louder))
      06                 [((P1 begins to take a drink))
      07                (2.2)
      08                (1.2) ((P2 turns toward P1 and P3))
      09   P2:          °(       musta been    [    ) °
      10   Voice:                              [HE::Y I’M FREE
      11                (0.2)
      12   P3:          °yeah°
      13                (0.1)
      14   P3:          [°yeah°]
      15   P4:          [wha:t?]
      16                (0.8)
      17   P4:           [what ro:b?]
      18   Music:        [ Fade Out ]
      19                (0.5)
      20   P2:          well it really [mus]t have been ju[st=
      21   Video:                       [FO ]              [FI
      22   Music:                                          [FI
      23   P2:          =liber[ating]
      24   P4:                 [Oh:: ] yeah::
      25               (0.2)
      26   Voice:      but not everyone [(in the)] home can=
      27   P4:                           [ yeah:: ]
      28   Voice:      =enjoy their freedom...

        Transcript 1 – The participants’ interactions are carefully timed to exploit gaps in the show

To illustrate this phenomenon, let us consider the transcript above. The excerpt is based on a three-minute-long segment from one of our Phase 1 sessions, during which seven co-located participants were viewing “The 1900 House”, a historical documentary. The transcript is coded using the conventions of Conversation Analysis (Sacks et al., 1974), emphasizing the sequential organization of turn-taking in social exchanges. The excerpt features four of the seven participants, identified as P1-P4 (their talk appears in bold). FI and FO indicate fade-in and fade-out in the video stream. At this moment of the show, the main protagonist (coded as “Voice” and in italics) describes her joy at being able to ride a bicycle.

In the transcript, we can see how the viewers are finely attuned to the structure of the show – a show that none of them has watched before – by the way they insert their talk precisely in the gaps in dialogue and transitions between scenes. This is made possible by the fact that such gaps are more or less projectable, much like turns-at-talk (Sacks et al., 1974). There are several gaps in the TV dialogue, but they are not all appropriate places to talk. For example, the 0.5 second silence at the end of line 01 occurs at a place at which the narrator’s utterance is hearably incomplete. In contrast, the 0.4 second silence at line 03 is a possible completion point, although the actual completion point occurs at the end of line 04. Now at the end of line 04, there is a cinematic cue that projects an upcoming scene transition: the music gets louder (line 05). P1 orients to this projected transition nonverbally by beginning to take a drink at precisely this point.

However, it is not until after an additional 2.2 seconds of silence (line 07) in the TV dialogue that P2 turns toward P1 and P3 and makes a quiet comment about the show (line 09). Although this would seem an ideal place to talk, it actually overlaps slightly with the final utterance in the scene (line 10) that appears as a surprise. P2 then receives two quiet agreement tokens from P3 (lines 12 and 14), short responses that minimize the length of the conversational sequence. But in overlap with the second, P2 also receives a request for a repeat from P4 (lines 15 and 17) who is across the room and who apparently could not hear P2’s comment. P4 thus expands the sequence. As P2 repeats his comment louder and directs it to P4, the scene fades to black and then fades back in (line 21) right in the middle of his turn. P4 then marks her recognition of the repeat with “Oh::” in overlap with his turn and produces a minimal agreement token (line 24) just before the dialogue of the next scene begins (line 26). She then repeats the token in overlap with the TV dialogue (line 27) and the participants then refrain from speaking as the new scene opens. Thus, the participants work hard to fit their conversation into the transition between scenes.

Beyond the kind of exchanges illustrated above, we also observed that lulls in programming are used to help newcomers, who may have arrived late in the program, catch up on what happened and is currently going on in a program. The composition of the audience is often fluid during group television viewing, with participants constantly leaving, arriving, or coming back. Our participants used gaps in the show to progressively bring their co-viewers up to speed.

Moreover, it is also important to note that, even when the participants had access to a remote control allowing them to pause or stop the show at any time, it was almost never used. Instead, the participants preferred to let the show continue at its own pace and relied on their aforementioned ability to predict gaps in order to communicate with each other. This confirms earlier research showing that group viewing creates a certain pressure “to just leave the clicker alone” (Walker and Bellamy, 2001, p.83). Participants structure their interactions to avoid using technological control over the show’s pacing and structure.

Although viewers tend to try to fit their talk within gaps in the program, they are not constrained to do so and in fact sometimes choose not to. An example of such overlap is illustrated in the following case (Transcript 2, below). In this excerpt, from Phase 2 of our study (three viewers in room A and two in room B), the viewers watched The Real World, a reality-TV show.

In the first part of this excerpt (line 05), A2 makes a joke about the car that one of the Real World roommates is being driven to the airport in by her father. The joke is precisely timed to fit in a gap in the TV conversation, like the conversation in Transcript 1. However, because the joke is not placed near a scene transition, there is no gap for a recipient to respond without overlapping with the program. Thus, in overlap with the next utterance on the TV, B1 responds nonverbally with a smile (line 08) that A2, in the other room, cannot see; A3 responds with a short utterance that overlaps with the TV only minimally (line 09); and B2 responds with a longer joke (suggesting the backwardness of Alabama) that occurs entirely in overlap with the TV (line 11). Thus, without a ready gap for a response, the participants nonetheless choose to interrupt the TV dialogue to respond to A2.

      01    TV:     =an if you jus' stay in your little neighbuhood there
      02             an it looks tuh me like everythang would be:
      03             (0.5)
      04             work out gr[eat.
      05    A2:                   [they still make cars like that?
      06            (0.1)
      07    TV:     {what ti:me} do you think [would be good] ti:mes=
      08            {B1 smiles }                      [                   ]
      09    A3:                                       [ not anymore ]
      10    TV:     =f[or you tuh [ca:ll ]me eleven uh] clock?
      11    B2:       [    in     al[abama?]          yeah.     ]
      12    A1:                      [ hhuh ]
      13            (0.3) ((B2 smiles))
      14    TV:     .hh (0.1) i- (0.3) in the mornin' an eleven o'clock at night?
      15            (1.2) ((scene transition))
      16            (0.9) ((TV: voice over PA system))
      17    TV:     tell me bah!
      18            (0.9)
      19    TV:     °hm not like ol[(                                                   ]
      20    A3:                        [i ain't ne{ver been on the air ee oh pla:ne.]
      21    B1:                                       {NO WONDER SHE WANTED TO GO TO NEW]=
      22                                              {A2 turns head toward A3 & smiles
      23    TV:            )]}(oh my god)°
      24    B1:      =YORK!]}
      25                      }
                            Transcript 2 – Competing for opportunities to comment

Then at line 15 a scene transition occurs, and the Real World roommate is shown at the airport saying goodbye to her aunt and her baby niece, to whom she says, “tell me bah!” or rather “bye” with a southern accent (line 17). Two of the participants then treat this as an opportunity to make additional jokes. In complete overlap with the TV dialogue, A3 makes a joke picking up the earlier backwardness theme, and mocks her southern accent (line 20). Just after A3 begins his joke, B1 comes in with another joke (lines 21 and 24) in overlap with A3 and the TV, suggesting perhaps that the girl’s family situation is driving her out of her small town to the big city. B1 not only produces her joke in overlap with A3’s but produces it with a loud voice (indicated by caps), to be heard over him. Thus we see the viewers doing a couple of things: prioritizing joking about the program over hearing parts of the program itself and competing for limited opportunities immediately following events in the program to comment on them.

In addition to talking around and over the TV program, viewers also used another practice for talking during a TV program in a group setting. In Phase 2, the viewers occasionally directed comments only to a subset of their fellow viewers, usually for only those in the same room, by producing them too quietly for those in the other room to hear. Transcript 3 offers an example of such a this-side comment that develops into an extended conversation.

Transcript 3 (below), also from The Real World, begins during a scene transition in which an image of the World Trade Center in New York briefly appears. After 2 seconds but still during the scene transition, A2 quietly comments that it is a “bizarre sight” post 9/11 (line 03). She produces the comment quietly enough that only A1, who is sitting next to her on the couch, can hear it; it is not loud enough for the viewers in room B to hear. After 4 seconds, A1 expands the topic by quietly commenting on whether such a scene might get edited out (line 05), and A2 quietly produces an agreement token (line 08). At this point, the program is still transitioning to the next scene with a montage of images from the city that are slowly narrowing in on the Real World apartment. The TV dialogue has not yet resumed. After waiting another 2.1 seconds, A2 expands the topic further by mentioning that an image of the World Trade Center was edited out of another TV show (lines 10-12). After another 2.4 seconds, A1 adds a different example of a movie preview in which the World Trade Center was removed. As he is describing the scene (lines 19-28), the TV dialogue finally begins (line 23). He completes his description (lines 24-28), which includes a hand gesture depicting the Twin Towers, in overlap with the TV dialogue, and then they drop the topic to listen to the program.

This entire conversation is produced by A1 and A2 at a low volume and thus only for their room. A1’s descriptive hand gesture further shows that he was designing his turns only for A2. Such this-side comments and conversations minimize disruption of the TV program by engaging only a subset of the viewing audience. Because the verbal part of A1’s description (lines 24 and 27) is spoken softly, it doesn’t disrupt the ability of the room B viewers to hear the show. In addition to minimizing disruption through the loudness of their talk, A1 and A2 also do so by placing it in a scene transition. They appear to expand the topic cautiously by pausing for somewhat long periods after each expansion (lines 04, 09 and 14) in anticipation of the resumption of TV dialogue.
the TV program by engaging only a subset of the

     01            ((TV: 3-second image of World Trade Center, New York))
     02            (2.0)
     03    A2:     °that's a bizarre sight°
     04            (4.0)
     05    A1:     °i wonder if they would cut {something like that out°}
     06                                         { A2 turns to look at A1 }
     07            (0.7)
     08    A2:     °°yeahhh°°
     09            (2.1)
     10    A2:     °.hhh they {took it out of the sopranos.}°
     11                        {A2 turns to look at A1 }
     12    A2:     °°didn't they. they took it from thee {uh::°°
     13                                                   {A2 turns back to TV
     14            (2.4)
     15    A1:     °i remember there was uh: °
     16            (1.2)
     17    A1:     °there was a preview for that spiderman movie°
     18            (0.5) ((A1 & A2 look at each other))
     19    A1:      {that had
     20    A1:      {raises
     21            (0.1)}
     22    A1:     hands}
     23    TV:     [{hello: / hi mo::m / how are you? /
     24    A1:     [{°the twin towers on it}{and they- they removed it}
     25    A1:      { traces Twin Towers }{ swirls hands around }
     26    TV:      fi:ne,    how   are     you: ]
     27    A1:     {and did some other cg stuff°}]=
     28    A1:     {R-hand chop & crosses hands }
     29    TV:     oka::y
     30            (0.2)
     31    TV:     are you going to...

                 Transcript 3 – Use of body orientation and whispers during side comments
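
The transcripts above mark silences as parenthesized durations in seconds, such as (0.5) or (2.2). As a rough illustration of how such timings could be pulled out of a transcript for further analysis, here is a minimal Python sketch. The function names and the 1.0 second threshold are assumptions made for this example; they are not part of the study's coding procedure, and the threshold is not a value reported in the paper.

```python
import re

# Matches a timed silence such as "(0.5)" or "(2.2)".
# Double-parenthesis annotations like "((gets louder))" do not match.
SILENCE = re.compile(r"\((\d+(?:\.\d+)?)\)")


def silences(lines):
    """Yield (line_label, seconds) for each timed silence in the lines."""
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        label = parts[0]  # the two-digit transcript line number
        for match in SILENCE.finditer(line):
            yield label, float(match.group(1))


def candidate_gaps(lines, min_seconds=1.0):
    """Return silences at least min_seconds long (hypothetical threshold)."""
    return [(label, s) for label, s in silences(lines) if s >= min_seconds]


# A few lines copied from Transcript 1.
excerpt = [
    "01   Voice:       in nineteen hundred it must have been (0.5)",
    "03                (0.4)",
    "07                (2.2)",
    "08                (1.2) ((P2 turns toward P1 and P3))",
]

print(candidate_gaps(excerpt))
```

With this excerpt, only the silences at lines 07 and 08 pass the illustrative threshold, which are the ones that precede P2's comment at line 09. A duration filter alone cannot tell whether an utterance is complete, so it would not replace the manual analysis described in the text.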

Thus, in the preceding transcripts we see viewers using a variety of practices for coordinating their talk with the dialogue in TV programs to avoid or reduce disruption: fitting their talk in the gaps, using minimal responses, and producing quiet comments for a subset of their fellow viewers. However, we saw also that viewers are by no means constrained to avoid disruptions and will at times talk over the TV dialogue, as well as over their fellow viewers, to compete for opportunities to make time-sensitive comments.

3.2.2. Typology of comments
While we described how the viewers in our experiment coordinate their talk with TV programs and each other in the previous section, in this section we focus on content and offer a typology of the different kinds of comments viewers make.

 Type of Comment     Examples edited from data
 Content-based       “I love the way Homer dances in this scene.”
 Context-based       “Didn’t Conan O’Brien write for the Simpsons?”
 Logistical          “Could you turn up the volume?”
 Non-Sequitur        “Did you hear about the tornado in Germany?”
 Phatic              “Whoa!”, laughter, gasps, groans, etc.

   Table 1 – Typology of comments observed during group television viewing sessions

At a general level, we can characterize the comments exchanged by our participants using five broad types (Table 1):
• Content-based comments directly reference the content that is on or recently shown on the screen (as in the example in Transcript 1).
• Context-based comments are relevant to the show in its greater context, but perhaps not the specific episode or moment that is being viewed. Examples are references to the actors, past episodes, show trivia, etc.
• Non-sequitur comments are social exchanges such as asking about one’s family, or talking about events unrelated to the TV program. These are usually more common in groups who already have some social connection with each other. They often take the form of side conversations (usually whispered, or at least toned down) between two participants and rarely follow the structure of the show.
• Logistical comments are relevant to the television watching experience, but are independent of the programming. Tasks like changing channels, adjusting volume, etc. must be verbally communicated to the group so that whoever has control of the set can respond.
• Phatic responses are almost involuntary reactions from the audience like laughter, gasps, groans, “Whoa!”, etc. Nearly content-free (Schneider, 1998), phatic comments are not complicated interactions, but are vital to the social atmosphere in the room. Unlike other interactions which require turn-taking in order to make any sense, phatic responses have a “more the merrier” effect where additional “audience participation” adds to the sociable atmosphere. Phatic responses also do not require a pre-existing social connection to participate; both socially unfamiliar and socially gelled groups can share in a good laugh.

Our attempt at categorizing comments reveals that some exchanges are more disruptive to group television viewing than others. Non-sequiturs, in particular, can negatively affect a group’s experience since they often lead a small subset of participants to “talk over” the show about topics that may not be relevant to a majority of the group. In contrast, phatic responses do not affect the flow of the program and do not need to be so finely timed. They also scale well with the number of people: many participants can easily laugh (but not talk) at the same time.

To summarize, our observations reveal that interactions between television viewers are tightly interwoven with the structure of the show they are watching. In other words, the television program is both a resource and a constraint: it provides a rich conversational context that, to be properly exploited, requires the audience members to “read” the upcoming structure of the show in order to time their comments. Reaching a “flow experience” (Csikszentmihalyi, 1990) is important and viewers appeared reluctant to interrupt the show to communicate – the show itself has to be structured such that opportunities for communication exist. Moreover, only certain kinds of conversations contribute positively to sociability and private side-conversations can “derail” the audience from the show. The clearest contributors to sociability are phatic utterances and any kind of comment related to the show that elicits only a quick back-and-forth between a few of the audience members, such that the entire interaction sequence fits within the gaps provided by the television program.

3.2.3. Visual Interaction and Presence
During our observations, we also paid close attention to the body movements of our participants. One of the most surprising moments in our experiments was during a soccer game when a player almost scored a goal. Out of excitement, one member of the audience jumped out of his seat completely to react to the event, and crashed to the floor. Surprisingly, no one in the room shifted their focus from the TV to watch him. This illustrates another important feature of group television viewing, namely, that the interactions between the participants are visually peripheral: the televised content remains the visual focus. In most cases, it looks as if participants talk to the television itself instead of addressing any specific person by turning their head. The sense of audience presence is mostly conveyed via subtle cues in the room such as gestures perceived at one’s visual periphery, movement in the room, and environmental audio (sound of people shifting in their seats, putting things on a coffee table, etc.).

Side conversations are a very different matter, however (Transcript 3). They are most often held with nearby audience members and, so as to establish a more personal connection, the participants will turn their heads towards each other to establish some visual contact (although as the conversation tapers off, the visual connection between the members deteriorates and the focus returns to the television). As we mentioned earlier, side conversations can negatively affect a group’s experience but our observations of body movements show that they can be easily detected using the orientation of someone’s face. This suggests potential avenues to dynamically move sub-groups in and out of the main audience – we discuss these possibilities and others in the next section of this paper.

4. Designing for Sociability
We believe that the subtle aspects of group television viewing we have just described can powerfully shape the design of future social television technology. Moving from observations to design, we will now briefly describe some possible features that could be implemented to support and encourage sociable, distributed television viewing. We want to emphasize that, while we plan to implement and test some of these features, our main focus so far has been on observing group behaviour, not prototyping. We hope, however, that the design possibilities we describe might inspire future research in this domain.

Our experiments using two rooms show that groups can socialize remotely while watching TV using a simple, always-on audio channel. Indeed, as we mentioned earlier, there were no major differences in behaviour whether or not our participants were co-located. Therefore, this basic communication infrastructure provided us with a foundation to start from.

Currently, we envision Social TV as a communication module that could be added to a PVR (e.g. TiVo), another piece of audio/video equipment (e.g. a receiver), or maybe the television set itself. It would allow viewers to establish connections with as many of their remote friends as they wish (probably using a mechanism similar to Chuah’s (2002) “buddy surfing”), opening up a shared audio channel between these locations (using, for instance, voice-over-IP). Participants would communicate with each other simply by talking into a microphone that could be placed in the room (Figure 3) or, alternatively, on a small headset worn by each viewer. The main value of the technology, however, would be in the software available in each Social TV system. By processing each participant’s utterances and transmitting the appropriate mix of social audio content to all viewers, Social TV would act as a “clearing house” facilitating distributed television viewing.

Figure 3 – A possible design for Social TV

Based on the data obtained from our experiments, we believe such software would be particularly useful if designed to:
• Support the proper timing of social interaction during group television viewing;
• Minimize disruptions in the television program’s flow;
• Isolate exchanges that are beneficial to the group from side conversations and non-sequiturs;
• Allow viewers to move in and out of the audience smoothly;
• Avoid drawing viewers’ attention away from the television screen.

Following these guidelines, we are considering the features listed below for Social TV system prototypes. It is important to note that, while we aim at simulating a conventional group television experience for distributed viewers, some of the features we describe will unavoidably change this experience. We have tried to minimize such disruptions and the number of new skills required from the television viewers as much as possible. Note also that the list is far from exhaustive and we believe there is much room for future work:

• A preview of the oncoming show structure (for instance, a curve moving up and down depending on the amount of dialogue in upcoming scenes). Audience members would be able to call up this preview to “glance ahead” and accurately time when to start and wrap up their comments, so as not to interrupt the program. This would be particularly useful since, in the absence of the subtle cues given off by co-located viewers (e.g. heads turning), the show’s structure becomes the only reliable indicator of when to start and stop a conversation.
• Allow for variable-rate video when the Social TV system senses social interaction. When the audience keeps conversing close to the end of a break, our system could slow down and eventually pause the show if the conversation appears to continue after the show has resumed. This would prevent the ongoing television programming from stifling more developed social interaction. While this satisfies one design constraint (“encourage interactions”) it does, however, directly break another (“do not change the flow of the program”). User testing would allow us to explore whether or not the gain in sociability offsets the loss of continuity in the program.
  Note also that such a feature offers the intriguing possibility of adjusting the length of advertising breaks dynamically. Since ad breaks are often used for conversations that have the potential to last longer than the break itself, our system could automatically keep adding more 30-second ads until the conversation is over. This opens up the door to alternative revenue streams and business models for content providers and broadcasters.
• Automatically isolate private side conversations. When audience members are not co-located, we cannot rely on body orientation to detect who is talking to whom. Technology developed in our laboratory, however, leverages insights from the same conversation analytic techniques we used earlier (Transcript 1) to automatically detect and isolate multiple, independent conversations in a shared audio space (Aoki et al., 2003). We plan to use such technology to facilitate interactions with Social TV.
• “What did they say?”: Often people want to hear every word of content but miss parts of it. Asking others about what was just said disrupts the flow of the program. Instead, our system will offer the possibility to turn on (temporarily and on demand) “delayed closed captioning,” so that viewers can read what was just said without interrupting the flow of the content.
• “Catch-me-up”: Since breaks are often used for bringing a new member of the audience up to speed, we plan to offer a “Catch Me Up” button to facilitate the process. The Social TV system will automatically generate a one-minute visual synopsis of the show, displaying the scenes and moments that caused the biggest reaction from the participants (in a first approximation, we will use the general audio level in the room as an indicator of audience engagement with the show). New participants will be able to join the group, quickly (and silently) watching the synopsis without disturbing other viewers. This would allow them to gain some shared background with the other viewers, which they can later use to participate in social interactions.

Figure 4 – An unobtrusive interface to convey the presence of other viewers

We have argued that Social TV should be as visually unobtrusive as possible, in order to keep the audience’s focus on the show. To convey a subtle sense of audience presence, we argue in favor of an interface building off of the visual scheme of the television show “Mystery Science Theater 3000.” It uses a movie theater metaphor to visually indicate the presence of other characters as they are watching movies. Similarly, the interface for the Social TV system could be a row of theater seats at the bottom of the television screen. When new members join in the Social TV session, their shadows would appear in the seats along the bottom of the screen. Whenever viewers want to judge how many other viewers there are, they can call up this visualization which remains briefly superimposed over the lower third of the television screen (Figure 4). In other words, the interface only appears when a “presence” event occurs, such as a viewer joining or a speaker being identified, or at the user’s explicit request. This is less distracting and uses far less screen space than full-bandwidth video showing each remote viewer, as in Huijnen et al. (2004). Of course, the theater interface does not convey as many social cues as full video, but Social TV compensates for this absence with the features described earlier.

Moreover, this interface could also be useful for selective sub-conversations. By selecting another user’s shadow, one could activate a more private, secondary audio channel where the two could exchange comments that the main group may find uninteresting or distracting.
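The “Catch-me-up” feature proposed in this section relies, in a first approximation, on the general audio level in the room as an indicator of audience engagement. The following is a minimal sketch of that selection step only, not a description of any existing Social TV implementation. It assumes the room audio has already been reduced to one loudness value per fixed-length segment of the show; the function name, the 10-second segment length, and the sample values are all hypothetical.

```python
from typing import List, Tuple


def pick_synopsis_segments(
    loudness: List[float],
    segment_seconds: float = 10.0,
    synopsis_seconds: float = 60.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times of the loudest segments, in show order.

    loudness[i] is an assumed per-segment room audio level (any consistent
    unit); it stands in for audience engagement as a first approximation.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    # Number of whole segments that fit in the synopsis budget.
    budget = int(synopsis_seconds // segment_seconds)
    # Rank segment indices by loudness, loudest first; ties keep show order
    # because sorted() is stable.
    ranked = sorted(range(len(loudness)), key=lambda i: -loudness[i])
    # Replay the selected segments chronologically.
    chosen = sorted(ranked[:budget])
    return [(i * segment_seconds, (i + 1) * segment_seconds) for i in chosen]


if __name__ == "__main__":
    # Hypothetical room levels for eight 10-second segments.
    levels = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.05, 0.6]
    # A 30-second synopsis keeps the three loudest segments.
    print(pick_synopsis_segments(levels, 10.0, 30.0))
    # [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
```

Selecting whole segments and replaying them in chronological order keeps the synopsis easy to follow. A real system would probably also need to separate audience audio from program audio before measuring loudness, which this sketch does not attempt.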

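The preview of upcoming show structure described in Section 4 (a curve moving up and down with the amount of dialogue) could be derived in several ways. The sketch below is one possibility under stated assumptions: it supposes that dialogue timing is available as non-overlapping (start, end) intervals in seconds, for example from caption timestamps, which the paper does not specify. The function names, window size, and threshold are hypothetical.

```python
from typing import List, Tuple

# (start_seconds, end_seconds) of one stretch of TV dialogue (assumed input).
Interval = Tuple[float, float]


def dialogue_density(
    dialogue: List[Interval], horizon: float, window: float = 5.0
) -> List[float]:
    """Fraction of each window (0..1) covered by dialogue, from t=0 to horizon.

    Assumes the dialogue intervals do not overlap one another.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    curve = []
    t = 0.0
    while t < horizon:
        end = min(t + window, horizon)
        # Sum the overlap between this window and every dialogue interval.
        covered = sum(max(0.0, min(e, end) - max(s, t)) for s, e in dialogue)
        curve.append(covered / (end - t))
        t = end
    return curve


def quiet_windows(curve: List[float], threshold: float = 0.2) -> List[int]:
    """Indices of windows with little dialogue (threshold is an arbitrary choice)."""
    return [i for i, density in enumerate(curve) if density <= threshold]


if __name__ == "__main__":
    upcoming = [(0.0, 4.0), (12.0, 15.0)]  # hypothetical dialogue timings
    curve = dialogue_density(upcoming, horizon=20.0, window=5.0)
    print(curve)                 # density per 5-second window
    print(quiet_windows(curve))  # windows where a comment might fit
```

Low points on the resulting curve correspond to the gaps in TV dialogue that viewers were observed to fit their talk into. How such a curve should be drawn on screen without distracting from the show is left open here.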
5. Future Work
5.1. Technical improvements
Future work on the project will involve implementing the aforementioned design suggestions in a working system, and later evaluating their impact and usability. To conduct further studies, however, we first need to make sure that the quality and stability of the audio between non-collocated participants is enhanced. In our phase 2 studies, the quality of the audio transmitted between rooms left much to be desired, since it was plagued by echo and feedback issues that required significant time and effort to be minimized. Such time-consuming set-up was the cost of a quick and simple experimental configuration using off-the-shelf components but would be unacceptable in real homes. In fact the problem might be worse in the context of home deployments, since audio would be carried over the Internet and might suffer from greater latency than in our laboratory connection. Providing a pleasant social audio experience in the presence of loud and diverse program audio and latency is a challenge that requires attention to signal processing.

Beyond the transmission of audio between locations, we also plan to investigate the effects of providing a more specific directionality to the social audio so that it is easier for users to discern between the comments originating from remote viewers and the audio coming directly from their television. In our initial set-up both audio streams were sent to the same set of speakers and our users repeatedly pointed out that it could be quite confusing.

The issue of controls is also something we have yet to explore. We need to consider how our system would resolve control requests being issued from two (or more) distributed locations that receive a common video feed. One possibility is a virtual remote control that can only be used by a single “host” at a time. The host would control the flow of the show (e.g. pausing) while other controls (e.g. volume, brightness) would remain local. A marker on the host’s shadow (Figure 4) would indicate who is holding the remote, and other users could issue requests to hand it over to them. This would further imitate the co-located viewing situation. But unlike co-located viewing, any participant should be able to wrest control back from any other viewer, in order to avoid some of the abusive behaviours described in previous research (Walker and Bellamy, 2001): “compulsive grazers” can be quite disruptive to a group’s experience, and the aforementioned research even describes users who claim to gain gratification from annoying others by controlling their viewing!

We hope that all of the above will eventually lead us to deploying our technology in more natural settings, as we discuss in our conclusion. We plan to provide families with Social TV systems so that we can see how daily television use is affected by the ability to share television viewing with remote friends and family members.

5.2. From synchronous to asynchronous use
In the introduction, we pointed out that television can foster two different kinds of sociability: direct (when viewers get together and watch the same show at the same time) and indirect (when viewers exchange comments after the show has already been seen by each, independently and at different times). While the former was the focus of this paper, the latter is rich in design possibilities and, in the course of our studies, some possibilities for asynchronous technologies emerged. In the interest of generating discussion in this space, we would like to briefly mention some possibilities offered by Social TV in the asynchronous realm.

It is possible to envision viewers running their Social TV device even when they are watching a show alone. The device would still record and track their reactions to the show but, instead of sending them to other viewers, it would store them for future use. As we mentioned earlier in our description of the “catch me up” feature, the reactions of the audience are rich in meaning and often indicate important moments in the show. Therefore, each individual viewer’s reaction could be used as a “tag” to mark important moments in a program. This could then be used by other viewers, at a different time, in the following ways (the list is, of course, not exhaustive):

• “My friends’ laugh track”: television shows often rely on edited laugh tracks to cue their audience about funny lines. However, these often feel highly artificial. Using the aforementioned tags, a solitary viewer could instead hear laughter only where his/her friends laughed when they watched the show earlier. The laugh track would sound much more natural and, moreover, it would be placed at the most appropriate moment – humor differs across viewers and it is probably better to rely on one’s friends’ reactions than the editing of the show’s producers.
• “My friends’ commentary”: commentary from movie directors and others is a popular feature on DVDs. Social TV suggests the additional possibility of recording a commentary about any show and sharing it with friends. A viewer would simply talk over the show and the voice stream would be recorded by the device. Each of his/her friends, when viewing the same show at a later date, would have the option of turning on the commentary track he/she produced. In fact, this feature need not be limited to friends: viewers could offer their commentaries on the Web for anybody to download. This would emulate the popular “user reviews” offered on Web sites such as Amazon.com, but this time in the audio realm and with the added advantage of being synchronized to a video stream.

6. Conclusion
While counter-intuitive to many, watching television can be a very sociable activity. In this paper, we have explored how groups of television viewers interact with each other in front of the TV set. It became clear that television-mediated sociability is governed by a set of cultural practices and interaction rules, which apparently evolved such that joint viewers can simultaneously enjoy each other’s company and preserve the structure and pacing of the show they are watching together.

However, the pressures of daily life make joint television viewing increasingly difficult. To counter this trend, we suggested design avenues for a Social TV prototype inspired by our studies. We envision a system integrated into standard audio-video equipment (e.g. a TV set, a PVR) and allowing geographically-distributed viewers to communicate with each other using an open audio channel. Social TV would facilitate distributed, sociable television viewing by processing each participant’s comments and ensuring that they fit within the interaction rules we inferred from our empirical observations.

While we strongly believe in the value of this approach, it is important to note that our studies are not without limitations. First and foremost, our observations were conducted in a laboratory setting. While we tried as much as possible to emulate a real living room, the environment remains by nature somewhat artificial. Moreover, the participants were mostly recruited individually, without any attempt at reproducing the structure of social groups that most often watch TV together (e.g. groups of friends, family members). It could be that such prior relationships greatly affect interaction patterns in front of the television, but our data cannot shed light on this issue. More studies in “natural” environments (e.g. a participant’s own home) would certainly be enlightening.

The above could be most important when considering the form-factor of future Social TV systems and, in particular, the input devices they offer to their users. During our observations, participants most often “talked to the television” when exchanging comments with each other, and changes in body orientation were rare and most often connoted private side-conversations. This, however, could be an artefact of our laboratory setting. Studies of domestic life (e.g. Hughes et al., 2000) have shown that home dwellers can be very mobile, conducting a variety of activities while moving about the house (e.g. alternating between television watching and trips to the kitchen to watch over a cooking dish). Television might also not be the exclusive focus of attention, with other tasks and devices asking for simultaneous attention (e.g. taking care of a child and watching television only peripherally). If these were the dominant patterns of use, Social TV would need to find a way to “follow its users around the house”, so to speak. This would imply decoupling the device from static A/V equipment and integrating it instead with another, more mobile platform (e.g. a headset, as mentioned earlier, maybe connected to a mobile phone equipped with the Social TV software). This way, users would be able to communicate about an ongoing show whether or not they are facing the television.

It is clear much work remains to be done to redesign television and make sure it continues to play a role in the ways people socialize and interact with each other. We hope this paper will help stimulate future research in this domain.

Acknowledgements
The authors express their gratitude to the three anonymous reviewers who provided insightful comments and ideas that have been incorporated in the paper. We also thank our study participants for their time and enthusiasm.

References
Alexander, A. (1990). Television and family interaction. In J. Bryant (Ed.), Television and the American family (pp. 211-225). Hillsdale, NJ: Lawrence Erlbaum Associates.

Aoki, P. M., Romaine, M., Szymanski, M. H., Thornton, J. D., Wilson, D., & Woodruff, A. (2003). The Mad Hatter's Cocktail Party: A Mobile Social Audio Space Supporting Multiple Simultaneous Conversations. Proc. ACM SIGCHI Conf. on Human Factors in Computing Systems (CHI '03), Ft. Lauderdale, FL, Apr. 2003, 425-432.

Benford, S., Greenhalgh, C., Brown, C., Walker, G., Regan, T., Rea, P., Morphett, J., & Wyver, J. (1998). Experiments in inhabited TV. In Proceedings of CHI98 (pp. 289-290). New York, NY: ACM.

Casey, B., Casey, N., Calvert, B., French, L., & Lewis, J. (2002). Television studies: The key concepts. London: Routledge.

Chuah, M. (2002). Reality instant messenger. In Proceedings of the 2nd Workshop on Personalization in Future TV (TV02). Malaga, Spain.

Coppens, T., Handekyn, K., & Vanparijs, F. (2005). Amigo TV, Alcatel White Paper, available at http://tinyurl.com/9h68d

Crabb, P. B., & Goldstein, J. H. (1991). The social psychology of watching sports: From Ilium to living room. In J. Bryant & D. Zillmann (Eds.), Responding to the screen: reception and reaction processes (pp. 355-365). Hillsdale, NJ: Lawrence Erlbaum Associates.

Csikszentmihalyi, M. (1990). Flow. Harper Perennial.

Glaser, B. G. (1998). Doing grounded theory: issues and discussions. Mill Valley, CA: Sociology Press.

Hughes, J., O'Brien, J., Rodden, T., Rouncefield, M., & Viller, S. (2000). Patterns of home life: informing design for domestic environments. Personal Technologies, 4(1), 25-38.

Huijnen, C., IJsselsteijn, W., Markopoulos, P., & de Ruyter, B. (2004). Social presence and group attraction: exploring the effects of awareness systems in the home. Cogn Tech Work, 6, 41-44.

Lull, J. (1990). Inside family viewing: Ethnographic research on television's audiences. London: Routledge.

Morrison, M. (2001). A look at mass and computer mediated technologies: understanding the roles of television and computers in the home. Journal of Broadcasting and Electronic Media, 45(1), 135-161.

Oldenburg, R. (1989). The great good place. New York, NY: Marlowe & Company.

Putnam, R. (2000). Bowling alone: the collapse and revival of American community. New York: Simon & Schuster.

Sacks, H. (1992). Lectures on Conversation. Oxford: Basil Blackwell.

Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50, 696-735.

Schneider, K. (1998). Small Talk: Analysing Phatic Discourse. Marburg: Hitzeroth.

Schwartz, J. (2004). Leisure pursuits of today’s young man. The New York Times, March 29, 2004. http://tinyurl.com/csztu

Stone, G. P. (1981). Sport as community representation. In G. Luschen & G. Sage (Eds.), Handbook of social science of sport (pp. 214-245). Champaign, IL: Stipes.

Subrahmanyam, K., Kraut, R., Greenfield, P., & Gross, E. (2002). New forms of electronic media. In D. G. Singer & J. L. Singer (Eds.), Handbook of Children and the Media (pp. 73-99). London: Sage.

Sutton-Smith, B. (2001). The ambiguity of play. Harvard University Press.

Walker, J. R., & Bellamy, R. V. (2001). Remote control devices and family viewing. In J. Bryant & J. A. Bryant (Eds.), Television and the American family, second edition (pp. 75-89). Mahwah, NJ: Lawrence Erlbaum Associates.

White, E. S. (1986). Interpersonal bias in television and interactive media. In G. Gumpert & R. Cathcart (Eds.), Inter/Media: Interpersonal Communication in a Media World (pp. 110-120). Oxford: Oxford University Press.
