Valuing New Goods in a Model with Complementarity: Online Newspapers


                                               By MATTHEW GENTZKOW*

          Many important economic questions hinge on the extent to which new goods either
          crowd out or complement consumption of existing products. Recent methods for
          studying new goods rule out complementarity by assumption, so their applicability
          to these questions has been limited. I develop a new model that relaxes this
          restriction, and use it to study competition between print and online newspapers.
          Using new micro data from Washington, DC, I estimate the relationship between the
          print and online papers in demand, the welfare impact of the online paper’s
          introduction, and the expected impact of charging positive online prices. (JEL C25,
          L11, L82)

   The effect of new goods on demand for ex-                            cited Business Week article anticipated that
isting products is often uncertain. Convinced                           computers would create a “paperless office.”
that radio broadcasts were crowding out music                           Instead, the spread of information technology
sales, record companies in the 1920s waged a                            has sharply increased consumption of paper
series of court battles demanding high royalties                        (Abigail J. Sellen and Richard H. R. Harper
for songs, leading some networks to stop play-                          2002). Debate continues in the economics liter-
ing major-label music altogether (Christopher                           ature about the relationships between free file-
H. Sterling and John M. Kittross 2001, 214;                             sharing services and recorded music (Alejandro
Paul Starr 2004, 339). It soon became apparent,                         Zentner 2003; David Blackburn 2004; Felix
however, that radio airplay dramatically in-                            Oberholzer and Koleman Strumpf 2007; Rafael
creased record sales, and by the 1950s record                           Rob and Joel Waldfogel 2004), file-sharing ser-
companies were paying large bribes to get their                         vices and live concerts (Julie Holland Mortimer
songs onto disk jockeys’ playlists (Sterling and                        and Alan Sorensen 2005), public and private
Kittross 2001, 294).1 More recently, a much-                            broadcast channels (Steven Berry and Waldfo-
                                                                        gel 1999; Andrea Prat and David Stromberg
                                                                        2005), and online and offline retailing (Austan
   * Graduate School of Business, University of Chicago,                Goolsbee 2001; Todd Sinai and Waldfogel
5807 South Woodlawn Avenue, Chicago, IL 60637 (e-mail:                  2004).
gentzkow@chicagogsb.edu). I would like to offer special
thanks to Bob Cohen and Jim Collins of Scarborough Re-
                                                                           Measuring the impact of new goods in such
search for giving me access to the data for this study. I thank         settings is important for several reasons. First, it
two anonymous referees for insightful comments. I am also               directly affects firm decisions. A record compa-
grateful to John Asker, Richard Caves, Gary Chamberlain,                ny’s decision to start licensing music for sale
Karen Clay, Liran Einav, Gautam Gowrisankaran, Ulrich                   online, a publisher’s decision to sell the film
Kaiser, Larry Katz, Julie Mortimer, Jesse Shapiro, Andrei
Shleifer, Minjae Song, and especially to Ariel Pakes for                rights to a novel, a discount retailer’s decision
advice and encouragement. Jennifer Paniza provided out-                 to open a new line of more upscale stores, and
standing research assistance. I thank the Social Science                many other choices about entry, product posi-
Research Council and the Centel Foundation/Robert P. Re-                tioning, and pricing depend critically on the
uss Faculty Research Fund at the University of Chicago
Graduate School of Business for financial support.
                                                                        demand-side relationships between new and old
   1 A similar example concerns the introduction of movies.             products. Estimating these relationships is thus
An 1894 article in Scribners predicted that the availability
of motion-picture and audio versions of novels would lead
to the disappearance of printed books (Octave Uzanne
1894). Today, film adaptations and novels are widely per-               companied by order-of-magnitude increases in sales of the
ceived to be complements, and film releases are often ac-               associated book (Kera Bolonik 2001).
714                                      THE AMERICAN ECONOMIC REVIEW                                          JUNE 2007

important for both firms themselves and econ-                 ship between each pair of products to be
omists seeking to understand firm behavior.                   freely estimated from the data.
Second, new goods are a major component of                       Second, I apply the model to study the impact
increases in the standard of living, and their                of online newspapers, a good whose relation-
omission is a leading source of bias in standard              ship with affiliated print newspapers has been
price indices (Timothy F. Bresnahan and Robert                hotly debated.4 I estimate the model using new
J. Gordon 1997). Correcting these biases re-                  individual-level data on the print and online
quires accurate estimates of the effect of new                newspaper readership of consumers in Wash-
goods on consumer welfare, which cannot be                    ington, DC, and look at the interaction among
constructed without knowing the relevant de-                  the Washington Post, the Post’s online edition
mand elasticities. Finally, the degree of substi-             (the post.com), and the city’s competing daily
tutability between old and new products is an                 (the Washington Times). I then use the fitted
important input to many policy debates, includ-               model to ask whether the print and online news-
ing those surrounding cable price regulation                  papers are substitutes or complements, and how
(Goolsbee and Amil Petrin 2004), deregulation                 the introduction of online news has affected the
of local phone markets (Robert G. Harris and C.               welfare of consumers and newspaper firms. I
Jeffrey Kraft 1997), and the allowability of                  also address a question of immediate interest to
cross-media mergers (Federal Communications                   firms: how profits would change if they were to
Commission 2001).                                             charge positive prices for online content that is
   This paper has two goals. First, I extend                  currently free.
existing techniques for estimating the impact                    A central empirical challenge in evaluating
of new goods to allow for the possibility that                the impact of a new good is separating true
goods could be either substitutes or comple-                  substitutability or complementarity of goods
ments. Although a large recent literature stud-               from correlation in consumer preferences.
ies the effect of new goods,2 it has been built               Observing that frequent online readers are
on discrete-choice demand models whose                        also frequent print readers, that file sharers
starting assumption is that consumers choose                  buy more CDs, or that computer users con-
exactly one product from the set available.3                  sume large volumes of paper might be evi-
This means that all goods are restricted a                    dence that the products in question are
priori to be perfect substitutes at the individ-              complementary. It might also reflect the fact
ual level. Although this is a reasonable                      that unobservable tastes for the goods are
starting point for looking at demand for au-
tomobiles or satellite television, it makes
these techniques inappropriate for cases such                    4 According to the Wall Street Journal, “Newspaper
as those described above, where the degree                    executives are increasingly debating whether free Web ac-
                                                              cess [to their papers’ content] is siphoning off readers from
of substitutability or complementarity among                  their print operations” (Mike Esterl, “New York Times Sets
products is a key parameter of interest. The                  an Online Fee,” Wall Street Journal, May 17, 2005). See
new discrete demand model I develop permits                   also Leslie Walker, “News Groups Wrestle with Online
consumers to choose multiple goods simulta-                   Fees,” Washington Post, May 26, 2005; Katharine Q.
neously and allows the demand-side relation-                  Seelye, “Can Papers End the Free Ride Online?” New York
                                                              Times, March 14, 2005; and Julia Angwin and Joseph T.
                                                              Hallinan, “Newspaper Circulation Continues Decline, Forc-
                                                              ing Tough Decisions,” Wall Street Journal, May 2, 2005.
    2 See, for example, Jerry A. Hausman (1997) on the        Others have argued that an online edition need not crowd
effect of Apple Cinnamon Cheerios, Shane M. Greenstein        out its affiliated print edition and could even complement it
(1997) on the effect of PCs, Petrin (2002) on the effect of   (Rob Runnett 2001, 2002). The print-online relationship has
minivans, and Goolsbee and Petrin (2004) on the effect of     been central to the debate surrounding online pricing: “A
direct broadcast satellites.                                  big part of the motivation for newspapers to charge for their
    3 Several existing papers do estimate discrete choice     online content is not the revenue it will generate, but the
models in which consumers can choose multiple goods. The      revenue it will save, by slowing the erosion of their print
model developed here differs by allowing goods to range       subscriptions” (Seelye 2005). The print-online relationship
freely from substitutes to complements, and also allowing a   also looms large in the debate about the long-run viability of
flexible form of unobserved consumer heterogeneity. Exist-    print newspapers (Dan Okrent 1999; Gates 2000; David
ing models and their relationship to the present model are    Henry, “Is Buffet too Quick to Write off Newspapers?” USA
discussed in detail below.                                    Today, May 4, 2000).
VOL. 97 NO. 3                        GENTZKOW: PRINT AND ONLINE NEWSPAPERS                                              715

correlated—for example, that some consum-                        make welfare statements, we also need to know
ers just have a greater taste for news or music                  how consumers trade off these utils of news con-
overall. In the first section below, I analyze                   sumption against dollars. This would be straight-
this identification problem in the context of a                  forward to estimate if we could observe how
simple two-good model. I show that the key                       demand responds to exogenous variation in prices.
elasticities are unidentified with data on con-                  I propose an alternative strategy that exploits in-
sumer choices and characteristics alone. I                       formation from the supply-side of the market and
then point out two natural sources of addi-                      is valid in the absence of price variation. It is
tional information that can aid identification.                  based on a simple observation: the less sensitive
The first is variables that can be excluded a                    consumers are to prices, the higher the price a
priori from the utility of one or more goods.                    profit-maximizing firm would set for its products.
In many settings, price is the obvious candi-                    Given observable data on marginal costs and ad-
date. The identification argument is also valid                  vertising revenue, and a model of the firm’s ob-
for nonprice variables, however, and so can                      jective function, I can therefore calculate the value
be applied where prices do not vary or where                     of the price elasticity that would equate the profit-
the variation is not exogenous. This is the                      maximizing price of the print newspaper with the
case in the newspaper market I study, where                      price we actually observe.6 This strategy depends
the price of the online paper is zero through-                   on strong assumptions about the form of the firm’s
out the sample. In the estimation, I exploit                     profit function, as well as the accuracy of the
variables, such as whether consumers have                        observed cost data. But sensitivity analysis con-
Internet access at work or a fast connection at                  firms that the qualitative conclusions are robust to
home, which shift the utility of the online                      reasonable alternative assumptions.
edition without affecting the utility of the                        The results show that properly accounting for
print edition.5 The second potential source                      consumer heterogeneity changes the conclu-
of identification is panel data. If correlated                   sions substantially. Both reduced-form OLS re-
unobservables such as taste for news are                         gressions and a structural model without
constant for a given consumer over time, ob-                     heterogeneity suggest that the print and online
serving repeated choices by the same con-                        editions of the Post are strong complements,
sumer can allow us to separate correlation and                   with the addition of the post.com to the market
complementarity. For example, a consumer                         increasing profits from the Post print edition by
who views the content of two papers as com-                      $10.5 million per year. In contrast, when I es-
plementary would tend to read both of them                       timate the full model with both observed and
on some days and neither on other days. A                        unobserved heterogeneity, I find that the print
news junkie who views the papers as substi-                      and online editions are significant substitutes. I
tutes, on the other hand, would also read both                   estimate that raising the price of the Post by
with high frequency, but would be more                           $.10 would increase post.com readership by
likely to read them on alternate days. In the                    about 2 percent, and that removing the post.com
application, I have data on which newspapers                     from the market entirely would increase read-
consumers read in the last 24 hours, and also                    ership of the Post by 27,000 readers per day, or
in the last five weekdays, a limited form of                     1.5 percent. The estimated $33.2 million of rev-
panel data I exploit in the estimation.                          enue generated by the post.com comes at a cost
   A further challenge is how to translate the util-             of about $5.5 million in lost Post readership.
ity estimates from the demand model into dollars.                For consumers, the online edition generated a
Intuitively, data on consumer choices (combined                  per-reader surplus of $.30 per day, implying a
with exclusion restrictions and panel data) allow                total welfare gain of $45 million per year.
us to estimate how consuming one good affects                       The model also informs the debate about the
the marginal utility of consuming another. To                    sustainability of free online content (see foot-
                                                                 note 4). I take two approaches to this question.

   5 Zentner (2003) also uses broadband connections as a
shifter of Internet use in studying the impact of file sharing      6 Howard Smith (2004) uses a related technique in study-
on music sales.                                                  ing consumer shopping behavior.
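
The logic of the supply-side strategy described on this page can be illustrated with a stylized single-product example. This is a sketch under simplifying assumptions, not the profit function specified in the paper: the symbols $Q(p)$ (readers at price $p$), $a$ (advertising revenue per reader), and $c$ (marginal cost per reader) are introduced here purely for illustration, and interactions with the online edition are ignored.

```latex
% Stylized one-product profit function (an assumption for illustration only)
\[
  \pi(p) = (p + a - c)\, Q(p)
\]
% First-order condition for a profit-maximizing price
\[
  \pi'(p) = Q(p) + (p + a - c)\, Q'(p) = 0
\]
% Solving for the demand slope implied by the observed price, valid when p + a - c > 0
\[
  Q'(p) = -\frac{Q(p)}{p + a - c},
  \qquad
  \varepsilon \equiv \frac{p\, Q'(p)}{Q(p)} = -\frac{p}{p + a - c}
\]
```

Holding $Q$, $a$, and $c$ fixed, a higher observed price implies a smaller $|Q'(p)|$, which is the sense in which less price-sensitive demand rationalizes a higher profit-maximizing price. Given data on the observed price, costs, and advertising revenue, the implied slope can be read off directly in this simplified setting.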

The first is to assume that the Post Company                      fication I exploit constitutes an ideal natural ex-
may be setting the price of the online edition                    periment. Taken together, however, they provide a
suboptimally, and ask whether profits could be                    substantial improvement on the information avail-
increased by charging positive prices.7 I find                    able in the raw data, lead to sharply different
that, for the period under study, the optimal                     conclusions than would be obtained from naive
price is indeed positive, at $.20 per day, and that               analysis, and allow us to make progress in under-
the loss from charging the suboptimal price of                    standing a market where the lack of price variation
zero is about $8.8 million per year. The second                   limits the applicability of standard tools.
approach is to suppose that the zero price is                        The next section analyzes the general prob-
optimal and ask how large transactions costs                      lem of identifying substitution patterns in a
would have to be to rationalize it. I show that a                 discrete demand model with multiple choices,
zero price would be optimal for any transaction                   and provides a brief discussion of related discrete-
cost greater than or equal to $.13 per day. I also                choice methods. Section II introduces the data
show that because of growth in online advertis-                   and presents reduced-form results on the re-
ing demand, the gain to raising online prices                     lationship between print and online demand.
was virtually eliminated by 2004. This suggests                   Section III specifies the empirical model and
that the zero price may have been part of a                       estimation strategy, Section IV presents the
rational forward-looking strategy and is approx-                  results, and Section V concludes.
imately optimal today.
   Estimating a structural model of the newspa-                      I. Substitution Patterns and Identification
per market is not, of course, the only possible
approach to studying the impact of online news-                               A. An Illustrative Model
papers. I show below that valuable information
can be gleaned by looking at both time series of                     In this section, I use a simple example to ex-
aggregate newspaper circulation and reduced-                      amine identification of substitution patterns in a
form regressions using micro data.8 There are                     discrete-choice setting where consumers can
two major benefits to estimating the complete                     choose multiple goods. Suppose there are two
model, however.9 First, because the model is                      goods, labeled A and B, and that consumers can
derived from utility maximization, it takes on                    choose at most one unit of each. We observe the
all of the restrictions implied by consumer the-                  choices of a large population of consumers. For
ory. This means that the estimated parameters                     simplicity, I will not write the dependence of the
can be used to calculate welfare effects. It also                 model on observable characteristics, assuming
allows us to obtain meaningful answers to coun-                   that all the consumers in the data are ex ante
terfactual experiments, such as changing the                      identical from the econometrician’s point of view.
online price, that are outside the variation ob-                  The terms below can easily be rewritten as func-
served directly in the data. Second, the model                    tions of a vector of observables, and the identifi-
allows multiple forms of identification to be                     cation arguments interpreted as identification of
brought to bear and combined efficiently in a                     parameters conditional on this vector.
single estimate. None of the sources of identi-                      We can potentially measure three quantities:
                                                                  PA (the probability of choosing A but not B); PB
                                                                  (the probability of choosing B but not A); and
    7 Note that the method for calculating the price elasticity   PAB (the probability of choosing both). The final
described above is based on the assumption that the price of      probability— choosing neither—is linearly de-
the print edition is set optimally. The alternative assump-       pendent so does not provide any additional
tions I entertain are then (a) that only the zero online price
is suboptimal and (b) that all prices are set optimally. These
                                                                  information.
assumptions are discussed in more detail below.                      The goal is to estimate the various own- and
    8 In particular, linear instrumental variables alone pro-     cross-price elasticities. These may in turn be in-
vide strong evidence that the print and online papers are         puts into the analysis of the welfare from new
substitutes rather than complements (as the raw correlations      goods, the effect of a merger, or the change in
would suggest).
    9 See also Nevo (2000) and Peter C. Reiss and Frank A.
                                                                  profits from offering a different mix of products.
Wolak (2005) for a general discussion of the advantages of           Denote the prices of discrete goods A and B
structural demand models.                                         by p A and p B . Income not spent on A or B is

used to purchase a continuous composite commodity. Utility from $q$ units of this
commodity is $\alpha q$ which enters overall utility linearly. Denote the utility of
consuming a bundle $r$ by $u'_r$. A natural quantity to define is the double difference:

$$\Gamma = (u'_{AB} - u'_B) - (u'_A - u'_0).$$

This is the discrete analogue of the cross-partial of utility, and measures the extent
to which the added utility of consuming good A increases if good B is consumed as well.
   Normalizing utility by $u'_0$, we can define:

$$
\begin{aligned}
u_0 &= 0, \\
u_A &= \delta_A - \alpha p_A + \nu_A, \\
u_B &= \delta_B - \alpha p_B + \nu_B, \\
u_{AB} &= u_A + u_B + \Gamma.
\end{aligned}
\tag{1}
$$

Here, $u_r = u'_r - u'_0$, $\delta_A$ and $\delta_B$ are mean utilities, and $\nu_A$ and
$\nu_B$ represent unobservable variation in utility. I assume that $\delta_A$, $\delta_B$,
and $\Gamma$ are all constant across consumers. Note that these expressions use the fact
that the difference between the utility from the composite commodity when good $j$ is
purchased ($\alpha(y - p_j)$) and when neither good is purchased ($\alpha y$) is just
$-\alpha p_j$.
   To make the discussion concrete, I assume the unobservables are distributed as

$$
\begin{bmatrix} \nu_A \\ \nu_B \end{bmatrix}
\sim N\!\left( 0, \begin{bmatrix} 1 & \sigma \\ \sigma & 1 \end{bmatrix} \right).
$$

The normalization of one of the variance terms to one is without loss of generality,
since we can divide all utilities by a constant and not change any of the choice
probabilities. The normalization of the other is purely to simplify exposition.

            B. Substitution Patterns

   Let $F(u)$ be the distribution of $u = (u_A, u_B, u_{AB})$ implied by the assumptions
above. Assuming consumers maximize utility, choice probabilities will be given by:

$$
\begin{aligned}
P_A &= \int_u I(u_A \ge 0)\, I(u_A \ge u_B)\, I(u_A \ge u_{AB})\, dF(u), \\
P_B &= \int_u I(u_B \ge 0)\, I(u_B \ge u_A)\, I(u_B \ge u_{AB})\, dF(u), \\
P_{AB} &= \int_u I(u_{AB} \ge 0)\, I(u_{AB} \ge u_A)\, I(u_{AB} \ge u_B)\, dF(u).
\end{aligned}
\tag{2}
$$

   A central focus of this paper will be estimating the degree of substitutability or
complementarity among products. Throughout the analysis, I will use the standard modern
definition of complements (substitutes): a negative (positive) compensated cross-price
elasticity of demand. Note that the definition is not based directly on properties of
the utility function (see Paul A. Samuelson 1974 for an extended discussion). I show in
this section, however, that in the simple model with two goods there is an intuitive
relationship between complementarity and the sign of the interaction term, $\Gamma$.
   Denote expected demand per consumer for goods A and B by $Q_A = P_A + P_{AB}$ and
$Q_B = P_B + P_{AB}$. Because the quasilinear specification of utility causes income to
drop out, there are no wealth effects. The elements of the Slutsky matrix are then just
the cross-derivatives of demand, and so by the standard definition:

DEFINITION 1: Goods A and B are substitutes if $\partial Q_A/\partial p_B > 0$,
independent if $\partial Q_A/\partial p_B = 0$, and complements if
$\partial Q_A/\partial p_B < 0$.

   Figure 1 shows demand for the goods as regions of $(u_A, u_B)$ space. The first panel
shows the case of $\Gamma = 0$, the second panel shows the case of $\Gamma > 0$, and the
third panel shows the case of $\Gamma < 0$. To see how the model determines the
cross-price derivatives, observe first that increasing $p_B$ is equivalent to shifting
probability mass downward. That is, for any point $(a, b)$ in this space, it increases
the probability that $u_B \le b$ given that $u_A = a$.
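
As a rough numerical companion to equations (1) and (2), the following Python sketch simulates the two-good model by Monte Carlo. It is an illustration under stated assumptions, not the estimation procedure used in the paper: the function name `choice_probabilities`, the default parameter values, and the number of draws are hypothetical choices made here. The sketch draws the unobservables from the bivariate normal above, builds the four bundle utilities, and records the share of simulated consumers for whom each bundle yields the highest utility.

```python
import numpy as np


def choice_probabilities(delta_a, delta_b, gamma, sigma, alpha=1.0,
                         p_a=0.0, p_b=0.0, n_draws=200_000, seed=0):
    """Simulate bundle shares in the two-good model of equation (1).

    All argument names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Unit variances and covariance sigma, as in the normalization in the text.
    cov = np.array([[1.0, sigma], [sigma, 1.0]])
    nu = rng.multivariate_normal(np.zeros(2), cov, size=n_draws)

    u_a = delta_a - alpha * p_a + nu[:, 0]
    u_b = delta_b - alpha * p_b + nu[:, 1]
    u_ab = u_a + u_b + gamma          # bundle utility includes the interaction term
    u_0 = np.zeros(n_draws)           # outside option normalized to zero

    # Columns: 0 = neither, 1 = A only, 2 = B only, 3 = both.
    utilities = np.column_stack([u_0, u_a, u_b, u_ab])
    choice = utilities.argmax(axis=1)
    shares = np.bincount(choice, minlength=4) / n_draws
    return {"P0": shares[0], "PA": shares[1], "PB": shares[2], "PAB": shares[3]}


if __name__ == "__main__":
    # Correlated tastes without complementarity, then the reverse.
    print(choice_probabilities(0.0, 0.0, gamma=0.0, sigma=0.5))
    print(choice_probabilities(0.0, 0.0, gamma=0.5, sigma=0.0))
```

Relative to a baseline with no interaction and uncorrelated tastes, raising the correlation alone or raising the interaction term alone both increase the simulated share choosing both goods. This is the identification problem described in the introduction: joint consumption by itself cannot separate correlated tastes from complementarity.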

           FIGURE 1. ILLUSTRATION OF SUBSTITUTION PATTERNS IN A MODEL WITH TWO GOODS
Notes: Figures show the regions of UA–UB space in which the consumer would choose the bundles A and B, B alone, A alone,
or neither good. The first panel shows the case where the interaction between the two goods in utility is zero, the second panel
the case where it is positive, and the third panel the case where it is negative.
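
The three panels can also be explored numerically. The sketch below is an illustration with hypothetical names (`demand_a`, `cross_price_effect`) and arbitrary parameter values, not anything taken from the paper. It approximates the derivative of demand for good A with respect to the price of good B by a central finite difference, reusing the same simulated draws at both prices so that simulation noise largely cancels. It assumes the utility specification in equation (1) with both mean utilities set to zero, a price coefficient of one, and a zero price for good A.

```python
import numpy as np


def demand_a(p_b, gamma, nu, delta_a=0.0, delta_b=0.0, alpha=1.0, p_a=0.0):
    """Share of simulated consumers whose best bundle contains good A."""
    u_a = delta_a - alpha * p_a + nu[:, 0]
    u_b = delta_b - alpha * p_b + nu[:, 1]
    u_ab = u_a + u_b + gamma
    best_with_a = np.maximum(u_a, u_ab)      # A alone or the bundle AB
    best_without_a = np.maximum(0.0, u_b)    # neither good or B alone
    return np.mean(best_with_a > best_without_a)


def cross_price_effect(gamma, sigma=0.0, step=0.1, n_draws=400_000, seed=0):
    """Central finite-difference estimate of dQ_A/dp_B around p_B = 0."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, sigma], [sigma, 1.0]]
    nu = rng.multivariate_normal([0.0, 0.0], cov, size=n_draws)
    q_high = demand_a(+step / 2, gamma, nu)  # same draws at both prices
    q_low = demand_a(-step / 2, gamma, nu)
    return (q_high - q_low) / step


if __name__ == "__main__":
    for g in (0.0, 1.0, -1.0):
        print(f"gamma = {g:+.1f}: dQ_A/dp_B ~ {cross_price_effect(g):+.4f}")
```

Under these assumptions the estimate should be close to zero when the interaction term is zero, negative when it is positive, and positive when it is negative, corresponding to the first, second, and third panels respectively.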

   Consider the first panel. Increasing $p_B$ causes marginal consumers such as m to
switch from buying the bundle AB to buying A alone. It also causes marginal consumers
such as n to switch from buying B alone to buying neither good. Neither of these changes
has any effect on the demand for good A, however: the increase in $P_A$ is exactly offset
by a decrease in $P_{AB}$. This implies that when $\Gamma = 0$, the cross-derivatives of
demand for the products will be $\partial Q_A/\partial p_B = 0$, and they are therefore
independent.
   Next, consider the second panel. Increasing $p_B$ causes consumers m and n to switch
as before. There will now be consumers such as o, however, who will switch from buying
the bundle AB to buying nothing. This means that the drop in $P_{AB}$ will be larger than
the increase in $P_A$, and so $\partial Q_A/\partial p_B < 0$. In the case of
$\Gamma > 0$, therefore, the goods are complements.
   In the third panel, there are no consumers indifferent between buying AB and buying
neither good, but consumers such as o are indifferent between buying A alone and buying
B alone.

Increasing $p_B$ causes them to switch from buy-      bility mass in the figure upward just as reducing
ing B to buying A, so that the increase in $P_A$      $p_B$ would, and the effect of such a change on $Q_A$
is larger than the drop in $P_{AB}$. We therefore     will be determined by $\Gamma$.
find that $\Gamma < 0$ implies the goods must be         For clarity of exposition, this example was
substitutes.                                          restricted to the case of two goods. Gentzkow
   This discussion suggests the quite intuitive       (2005) shows how the intuition extends to the
result that the interaction term $\Gamma$ is the key  multi-good case. The situation becomes more
parameter for determining the substitutability of     complex in a way analogous to standard (con-
goods in a multivariate discrete choice model.        tinuous) demand theory, but an intuitive link
Formally, we can substitute into the definition       between interaction terms in utility such as $\Gamma$
of $Q_A$ and take the derivative with respect to $p_B$   and substitution patterns continues to hold.
to show that
                                                                   C. The Outside Option

$$
\frac{\partial Q_A}{\partial p_B} = \int_u \Big[ I(u_A = u_B)\, I(-\Gamma \ge u_A, u_B \ge 0)
  - I(u_A + u_B = -\Gamma)\, I(u_A \le 0)\, I(u_B \le 0) \Big]\, dF(u).
\tag{3}
$$
                                                         The interpretation of the outside good in this
                                                      setting is different from its interpretation in the
                                                      standard multinomial model. In the standard
                                                      case, the utility of consuming none of the
                                                      modeled goods, typically indexed as choice
                                                      zero, is implicitly maximized over all goods
The first term inside the integral represents         excluded from the model. If we are modeling
points on the dark diagonal line segment in the       demand for cars, for example, the utility of good
third panel of Figure 1, along which consumers        zero for consumer i would capture the utility of
are indifferent between buying A alone and B          that consumer’s best non-car transportation op-
alone. The second term represents points on the       tion. It would be the maximum of utility from
dark diagonal segment in the second panel,            taking the bus, riding the subway, walking, and
along which consumers are indifferent between         so forth.
the bundle AB and buying neither good.                   In a model where choosing multiple goods
   Inspection of equation (3) immediately im-         simultaneously is possible, on the other hand,
plies the following result.                           all choices in the model include such an implicit
                                                      maximization. In the newspaper application, the
PROPOSITION 1: Goods A and B are substi-              data do not include consumers’ consumption of
tutes if ⌫ ⬍ 0, independent if ⌫ ⫽ 0, and             many news sources, such as cable television,
complements if ⌫ ⬎ 0.                                 radio, Yahoo! news, and so forth. When a con-
                                                      sumer in the data is observed to have read the
   While I motivate this result in terms of the       Washington Post on a particular day, it may be
thought experiment of changing prices, the ap-        that the Washington Post was her only source of
plication below will be to a situation in which       news on that day, or it may be that she both read
the price of one product—the online paper—is          the Washington Post and watched half an hour
fixed at zero. This does not cause any problems       of CNN. What the econometrician observes is
in terms of Definition 1, since a price change        that the maximum utility of bundles that include
around zero is well defined. Furthermore, be-         only the Post is greater for this consumer than
cause utility is quasi-linear, the sign of the        the maximum utility of bundles that include any
cross-price derivatives will be the same as the       other combination of the observed goods.
cross-derivatives with respect to other compo-           One might ask how these unobserved goods
nents of utility. This means we could run             will affect the estimated substitution patterns. Sup-
through the same intuition from Figure 1 for a        pose, for example, that having watched CNN dra-
shift in nonprice dimensions of utility. Suppose,     matically reduces the marginal utility of reading
for example, that good A is a print paper and         the post.com (so that the two are never consumed
good B is an online paper. Increasing the utility     together) and dramatically increases the mar-
of B by improving connection speed or making          ginal utility of reading the Post print edition (so
the Internet available at work will shift proba-      that the two are always consumed together).
720                                           THE AMERICAN ECONOMIC REVIEW                                            JUNE 2007

Suppose, further, that reading the Post has no effect on the marginal utility of reading the post.com. From the discussion above, we know that if the Post and the post.com were the only two goods in the market, they would be independent in demand. If CNN is present but unobserved, however, we would never see the Post and the post.com consumed together, and so would estimate that they are strong substitutes.

What is important to recognize is that the model’s answer in both cases would be correct. In a world without CNN, increasing the price of the Post would have no effect on demand for the post.com. In a world with CNN, on the other hand, increasing the price of the Post would reduce consumption of both it and CNN, which in turn would increase consumption of the post.com. The fact that the true substitutability of a pair of products will depend on both their direct interaction in utility and their indirect interaction via other goods in the market has long been recognized in classical demand theory (Samuelson 1974; Masao Ogaki 1990). The data on consumption of the Post and the post.com will allow us to estimate accurately their relationship in demand, whether or not we have data on consumption of other related goods. These estimates, however, will still be conditional on the set of alternative goods available in the market. The estimates provide the correct quantity for evaluating the effect of a price change on firm profits. The estimated response to removing the post.com from the choice set will also be correct. The effects could change, however, if the prices or characteristics of important unobserved goods changed dramatically, and the data will of course allow us to say nothing about the relationship between the observed and the unobserved products. Note that these latter limitations are shared by all discrete-choice demand models.10

D. Identification

Under the assumptions made so far, the model is not identified. There are three observable data points and five independent parameters: δA, δB, Γ, α, and σ.

The price coefficient α is identified from choice data alone if and only if there is variation in prices. To see this, note that all predicted probabilities would be the same if we replace the parameters (δA, δB, α) by (δA + αpA, δB + αpB, 0). With two observed price vectors, on the other hand, we gain three additional moments—any one of these would be sufficient to identify α given the other parameters of the model.

In situations where there is no usable variation in prices, α must be inferred by introducing an additional moment from some other source. In the application below, this comes from one firm’s first-order condition. Although only the sums δA + αpA and δB + αpB are identified from demand data, there will be a unique α such that the first-order condition is satisfied at the observed price.

The remaining issue is how to separately identify the interaction term, Γ, and the covariance of the unobservables, σ. Intuitively, the mean utilities δA and δB will be identified by the marginal probabilities QA and QB. The remaining moment in the data will be how often the goods are consumed together (whether PAB is high relative to PA and PB). A high value of PAB can be explained by either a high value of Γ or a high value of σ, and there is nothing left in the data to separate these.

Furthermore, Proposition 1 shows that this leaves the substitution patterns in the model severely unidentified. Without some additional information, the same data could be fit by assuming that the goods are nearly perfect substitutes (Γ ≈ −∞ and σ high) or nearly perfect complements (Γ ≈ ∞ and σ low). A model that “solves” the problem by imposing an ad hoc restriction on one of these two parameters will

10 A more subtle issue is how the correct functional form of equation (1) will change in the presence of unobserved third goods. Suppose, for example, that there are three goods A, B, and C, but that only consumption of A and B is observed. If the underlying utilities u′A, u′B, etc., are linear in price, the terms such as max{u′A, u′AC} that will actually be estimated will be linear as well. Beyond this, however, there is no obvious relationship between the functional form of utility with and without the implicit maximization over consumption of C. Of course, we really have no more prior information about the functional form u′A than we do about max{u′A, u′AC}. Also, it is equally true in standard discrete choice models that the “true” functional form of utilities changes in complex ways as we vary the set of outside goods. The question in the current setting as always is whether the functional form is sufficiently flexible to capture the important variation in the data.
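The identification problem described in Section D can be illustrated with a small Monte Carlo sketch. It is not the paper's estimation code. It assumes that the two-good utilities take the form u_A = δA + νA, u_B = δB + νB, and u_AB = u_A + u_B + Γ, with the outside option normalized to zero, prices absorbed into the mean utilities, and (νA, νB) standard normal with a correlation `rho` standing in for σ. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_shares(delta_a, delta_b, gamma, rho, n=100_000, seed=0):
    """Simulate bundle shares for the assumed two-good model.

    Assumption: u_0 = 0, u_A = delta_a + nu_A, u_B = delta_b + nu_B,
    u_AB = u_A + u_B + gamma, with (nu_A, nu_B) standard normal and
    correlation rho.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho ** 2)
    counts = {"0": 0, "A": 0, "B": 0, "AB": 0}
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u_a = delta_a + z1
        u_b = delta_b + rho * z1 + scale * z2
        utilities = {"0": 0.0, "A": u_a, "B": u_b, "AB": u_a + u_b + gamma}
        # Each simulated consumer picks the bundle with the highest utility.
        counts[max(utilities, key=utilities.get)] += 1
    return {bundle: count / n for bundle, count in counts.items()}


def excess_joint(shares):
    """P_AB minus the product of the marginals Q_A and Q_B."""
    q_a = shares["A"] + shares["AB"]
    q_b = shares["B"] + shares["AB"]
    return shares["AB"] - q_a * q_b


# Hypothetical parameter values: interaction only versus correlation only.
complementarity = simulate_shares(-0.5, -0.5, gamma=1.0, rho=0.0)
correlation = simulate_shares(-0.5, -0.5, gamma=0.0, rho=0.8)

print("interaction only:", round(excess_joint(complementarity), 3))
print("correlation only:", round(excess_joint(correlation), 3))
```

Under these assumptions both configurations yield joint consumption well above the product of the marginal shares, which is why a high PAB by itself cannot separate Γ from σ.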

be unlikely to provide a basis for reliable inference about any quantity in which substitutability of the goods plays an important role.

There are, of course, many ways that more moments could be added to the data in order to identify the model. I will briefly discuss two that seem likely to arise frequently in practice and will play a key role in the application. I assume that the necessary technical conditions are satisfied such that the model is identified if and only if the number of moments is greater than or equal to the number of parameters.

The first possible source of identification is exclusion restrictions. Suppose, in particular, that there is some variable x which is allowed to enter the utility of one good, making the mean utility of good A, say, δA(x), but does not enter either δB or Γ. One obvious candidate is the price of good A. In the newspaper application considered in this paper, there is no price variation, but there are consumer-specific observables such as having Internet access at work that affect the utility of online but not print newspapers. Having observations at a second value of such an x (call this new vector x′) would add three new moments (PA(x′), PB(x′), and PAB(x′)) but only one new parameter (δA(x′)). The model would therefore be formally identified.

Furthermore, the intuitive basis of the identification is quite strong. Suppose, for example, that the goods are frequently consumed together (PAB is high relative to PA and PB). If this is the result of a high Γ, the goods are complements, and shifting up the utility of good A by moving x should also increase the probability of consuming good B. If Γ is zero and the observed pattern is the result of correlation, the probability of consuming good B should remain unchanged.11

The second possible source of identification is panel data. Extending the model slightly to allow for repeated choices over time, assume that the unobservables (νA, νB) are made up of two components—a possibly correlated random effect term (ν̃A, ν̃B), which is constant within consumers over time, and an additional time-varying component (εA, εB), which is assumed to be i.i.d. across products and time. In the newspaper application, this model would amount to assuming that unobserved correlation in the utilities of different papers is driven by consumer characteristics such as a general taste for news that are constant over the course of a week, and that the additional shocks that lead consumers to read on Monday but not Tuesday are uncorrelated.

Now, if we observe each consumer’s choice at two different points in time, we have increased the number of moments from 3 to 15.12 Under the assumption that (ν̃A, ν̃B) is constant over time, this is sufficient for formal identification of the model parameters, including the full covariance matrix of the random effects. Intuitively, the argument is just a variant of the usual one for the identification of random effects from panel data. Suppose again that goods A and B are frequently consumed together. If this is the result of correlated random effects, we should see some consumers likely to consume both and some consumers likely to consume neither, but conditional on a consumer’s average propensity to consume each good, the day-to-day variation should be uncorrelated across goods. If it is the result of a high Γ, on the other hand, the day-to-day variation should be strongly correlated—a given consumer might consume both on one day and neither on another day but would be unlikely to consume either one alone.

A special case that will be relevant to the application below is one where the data are not a true panel but include observations on both a single day’s purchases and a summary of purchases over a longer period of time. For example, suppose that consumers in the two-good model make choices on two consecutive days. Suppose we observe the actual choice made on day 1, but not on day 2. We also observe two dummy variables dA and dB, where dj = 1 if product j was chosen at least once over the two days. This clearly contains less information than

11 Michael P. Keane (1992) presents Monte Carlo evidence on the role of this kind of exclusion restriction in identifying the covariance parameters in a multinomial probit model. Since a multinomial probit model defined over bundles effectively nests the model of equation (1), this evidence is relevant. He shows that including exclusion restrictions greatly improves the accuracy of the model.

12 With observations at two points in time, the moments would be the probability of each possible combination of choices over the two periods. When there are 4 choices, this gives 16 possible combinations. The number of moments is one less than this because the probabilities must sum to one.
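The exclusion-restriction argument in this section can be checked with a short simulation. The sketch below is a hedged illustration rather than the paper's procedure. It assumes utilities u_A = δA + νA, u_B = δB + νB, and u_AB = u_A + u_B + Γ with the outside option normalized to zero, draws (νA, νB) as standard normals with correlation `rho`, and treats a shift in the hypothetical argument `delta_a` as the effect of moving the excluded variable x. All names and parameter values are assumptions made for illustration.

```python
import math
import random


def share_buying_b(delta_a, delta_b, gamma, rho, n=50_000, seed=1):
    """Share of simulated consumers whose chosen bundle includes good B.

    Assumption: u_0 = 0, u_A = delta_a + nu_A, u_B = delta_b + nu_B,
    u_AB = u_A + u_B + gamma, with (nu_A, nu_B) standard normal and
    correlation rho. A fixed seed reuses the same draws across calls.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho ** 2)
    buys_b = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u_a = delta_a + z1
        u_b = delta_b + rho * z1 + scale * z2
        utilities = {"0": 0.0, "A": u_a, "B": u_b, "AB": u_a + u_b + gamma}
        chosen = max(utilities, key=utilities.get)
        buys_b += chosen in ("B", "AB")
    return buys_b / n


# Move x so that only the mean utility of good A shifts (from -0.5 to 0.5).
shift_with_interaction = (
    share_buying_b(0.5, -0.5, gamma=1.0, rho=0.0)
    - share_buying_b(-0.5, -0.5, gamma=1.0, rho=0.0)
)
shift_with_correlation = (
    share_buying_b(0.5, -0.5, gamma=0.0, rho=0.8)
    - share_buying_b(-0.5, -0.5, gamma=0.0, rho=0.8)
)

print("change in Q_B, interaction only:", round(shift_with_interaction, 3))
print("change in Q_B, correlation only:", round(shift_with_correlation, 3))
```

In this sketch the share consuming B rises when Γ > 0 and stays put when Γ = 0, even though both parameter settings make the goods appear together often. That differential response is the extra information the exclusion restriction supplies.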

a true panel would—if both A and B are chosen on day 1, we will have dA = 1 and dB = 1 regardless of the choice on day 2, and the data therefore provide no information on the day 2 choice. On the other hand, if neither good was chosen on day 1, dA and dB will tell us what was chosen on day 2 exactly.

Although this is a more limited form of information about choices over time, it can still separately identify the covariance matrix of the random effects and thus distinguish true complementarity from correlation. To see the intuition for this, consider, first, observations on consumers who chose neither product on day 1. The data will allow us to observe exactly what these consumers chose on day 2. If the variance of (ν̃A, ν̃B) is small, conditioning on the fact that they chose neither good on the first day does not change their choice probabilities on day 2—we should expect the latter to be exactly the same as the choice probabilities in the sample as a whole for day 1. If the variance of the random effects is large, on the other hand, the fact that these consumers did not purchase on day 1 would predict that they would also be less likely to purchase on day 2. We can therefore think of these consumers as identifying the variance of the random effects. The correlation term will then be identified by consumers who chose either A or B, but not both, on day 1. For a consumer who chose A only on day 1, we will see dB = 1 if and only if B was chosen on day 2. If the random effects are strongly positively correlated, observing a choice of A on day 1 suggests that the consumer will be relatively more likely to choose B on day 2. If they are negatively correlated, such a consumer should be less likely to choose B on day 2.

E. Relationship to Past Literature

The model of equation (1) provides a useful starting point for understanding the existing approaches in the literature to estimating discrete choices when multiple goods are chosen simultaneously. To make the discussion concrete, suppose we have micro data on demand for two goods, A and B. Suppose that the frequency with which the goods are consumed together is high relative to the frequency with which either is consumed alone. I will discuss what these data would imply for several existing approaches in the literature.

One approach is the multiple-discrete choice model pioneered by Igal Hendel (1999) and applied by Jean-Pierre Dubé (2004). These models assume that the data are generated by an aggregation over a number of individual choice problems, or “tasks.” For example, Hendel (1999) estimates demand for PCs by corporations. In this case, a task might represent a single employee’s computing needs. Each agent chooses a single good for each task, which makes the task-level problem analogous to equation (1) with ΓAB = −∞.13 Because the utility from using a given good in one task does not depend on what goods were chosen for other tasks, aggregating over a large number of these tasks is similar to aggregating over a population of heterogeneous consumers in a standard multinomial discrete choice model. The model therefore restricts the goods to be substitutes.14

A second approach is the multivariate probit (applied, for example, by Angelique Augereau, Shane Greenstein, and Marc Rysman forthcoming). Here, consumption of each good is assumed to be driven by a separate probit equation, with errors possibly correlated across equations. This is exactly equivalent to equation (1) with ΓAB = 0, and so restricts all goods to be

13 Both papers allow consumers to choose multiple units of each good, so the task-level choice is more complicated than a standard multinomial discrete choice problem. But the utility specification implies that consumers will choose at most one type of good for each task.

14 A different parametric restriction on the Γ interaction terms underlies the model of Tat Y. Chan (2006). He defines goods to be a bundle of characteristics, and assumes that the utility of a bundle is a function of the sum of each characteristic across the different goods. The bundle consisting of a bottle of Diet Coke and a bottle of Diet Pepsi, for example, consists of two units of the characteristic “cola,” two units of the characteristic “diet,” and one unit each of the characteristics “Coke” and “Pepsi.” Because utility is assumed in the main specification to be concave in the total of each characteristic, it is subadditive across goods, meaning ΓAB < 0. This would again imply that the products must be substitutes. (Chan does find complementarity among some products in a specification with many goods, which appears to result from indirect substitution effects as described above.)
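The information content of the limited panel described in Section D (a day-1 choice plus the two-day dummies dA and dB) can be made concrete with a few lines of code. This is only a sketch of the logic stated in the text: the function name `consistent_day2` and the string encoding of bundles are hypothetical, and nothing here is taken from the paper's estimation routine.

```python
# Bundles are encoded as strings: "" is neither good, "AB" is both goods.
BUNDLES = ("", "A", "B", "AB")


def consistent_day2(day1, d_a, d_b):
    """List the day-2 bundles consistent with the observed data.

    day1 is the observed day-1 bundle. d_a and d_b equal 1 if the good
    was chosen at least once over the two days, and 0 otherwise.
    """
    return [
        day2
        for day2 in BUNDLES
        if ("A" in day1 or "A" in day2) == bool(d_a)
        and ("B" in day1 or "B" in day2) == bool(d_b)
    ]


# Both goods chosen on day 1: the dummies reveal nothing about day 2.
print(consistent_day2("AB", 1, 1))
# Neither good chosen on day 1: the dummies pin down day 2 exactly.
print(consistent_day2("", 1, 0))
# A only on day 1: d_b = 1 if and only if B was chosen on day 2.
print(consistent_day2("A", 1, 1), consistent_day2("A", 1, 0))
```

The three calls correspond to the three cases discussed in the text: no information, full information, and partial information about the day-2 choice.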

independent in demand (all cross-elasticities are zero).15

A third approach is to estimate a logit or nested logit model defined over the set of all possible bundles. Papers that take this approach include Charles F. Manski and Leonard Sherman (1980) and Kenneth E. Train, Daniel L. McFadden, and Moshe Ben-Akiva (1987). Because each bundle’s utility is parameterized separately, the ΓAB term could be estimated freely (although both of these papers restrict the interactions as a parametric function of the goods’ characteristics). The unobservables, on the other hand, are either assumed to be uncorrelated (in the case of the logit) or have a correlation structure dictated by the nests, which is too restrictive to allow the kind of correlation implied by equation (1) with σ ≠ 0. Given the hypothetical data, we would expect such a model to find ΓAB > 0, implying that the goods would be complements.

The main difference between the current framework and those that exist in the literature is thus a more flexible specification of the way goods interact in utility and the correlation of unobservable tastes. The functional forms for observable and unobservable utility that have been used in the past impose strong restrictions on substitution patterns: for a given set of observations, one could choose models from the literature that would imply that the goods are strong substitutes, independent, or strong complements. In certain settings, such assumptions will be justified, and making them has the obvious benefit of allowing the researcher to analyze larger choice sets than the one considered here. In other settings, the necessary prior information is not available, and it will be critical to allow a more flexible structure and address directly how substitution patterns are identified by the data.

II. A First Look at the Data

A. The Scarborough Survey

The empirical analysis is based on a survey of 16,179 adults in the Washington, DC, Designated Market Area (DMA), conducted between March 2000 and February 2003 by Scarborough Research. The Washington, DC, DMA includes the District of Columbia itself, as well as neighboring counties in Virginia, West Virginia, Pennsylvania, and Maryland. The data include a range of individual and household characteristics of the respondents, as well as information on various consumption decisions. Most importantly for the current application, these include an enumeration of all local print newspapers read over the last 24 hours and 5 weekdays, as well as readership of the major local online newspapers over the same periods.

Washington, DC, has two major daily newspapers, the Washington Post and the Washington Times. The former is dominant: average daily readership of the Post was 1.8 million in 2000–2003, compared to 256,000 for the Times. The two papers also differ in their perceived political stance, with the Times generally thought to be more conservative than the Post. The main online newspaper is the post.com, which had an average of 406,000 area readers per day.16

I will define the goods in the model to be daily editions of the Post, the Times, and the post.com. The outside alternative will include other print and online newspapers, other news sources such as television and radio, and the choice not to consume news at all. As noted above, all choices in the model represent an implicit maximization over these outside goods—the observed choice to read the Post only, for example, includes consumers who

15 The discrete-continuous framework of Jaehwan Kim, Greg M. Allenby, and Peter E. Rossi (2002) also assumes the equivalent of ΓAB = 0 (that the utility of a bundle is simply the sum of the utilities of the underlying goods). The conclusion that the goods must be independent does not hold here, however, because the utility of the outside composite commodity is allowed to be concave rather than linear. This implies that all goods will be substitutes, though with a single curvature parameter governing all the cross-elasticities as well as the elasticity of total expenditure on the inside goods.

16 Readership figures are based on the Scarborough survey. Note that these readership numbers are larger than circulation figures for the same papers, reflecting the fact that multiple consumers read each copy. The Times also has an online edition, the washingtontimes.com, but its readership is very small and there are only 373 readers in my sample. In practice, this turns out to be too few to accurately estimate utility parameters for the washingtontimes.com, and so I omit it from the analysis.

read both the Post and the New York Times, or read the Post and watch TV news.

Table 1 gives summary statistics for the Scarborough data along with corresponding census figures for the Washington, DC, DMA. The survey is approximately a 0.4 percent sample, and is broadly representative, with some overrepresentation of older, more educated, and more wealthy individuals, and some underrepresentation of minorities. The survey includes sampling weights to correct for this overrepresentation. I will use the unweighted data for estimation, and use weights when I simulate aggregate effects.17

TABLE 1—SUMMARY STATISTICS

                       Scarborough     Washington, DC, DMA
                         survey             (census)
N                        16,179            4,203,621
Median income           $62,500              $60,774
Black                     20.6%                23.5%
Hispanic                   6.4%                 7.9%
Female                    57.9%                52.1%
Age distribution:
  18–29                   17.5%                21.3%
  30–39                   22.6%                23.4%
  40–49                   22.2%                21.7%
  50–59                   17.9%                15.8%
  60+                     19.8%                17.7%
Highest schooling:
  <High school             7.7%                14.4%
  High school             47.0%                42.1%
  College                 27.2%                26.3%
  Graduate                18.0%                17.2%

Notes: The Scarborough survey is a randomized sample of residents of the Washington, DC, DMA 18 years of age and older. All census figures refer to the population of individuals 18 years of age and older, except percent black and Hispanic, which are proportions of all residents. Median income is the population-weighted mean of the median incomes of counties in the Washington, DC, DMA.

FIGURE 2. READERSHIP OF NEWSPAPERS IN WASHINGTON, DC (1961–present)
Notes: Scarborough Research Readership figures are derived by using historical circulation data and the ratio of readership to circulation in the 2000–2003 Scarborough data.
Source: Audit Bureau of Circulations.

B. Reduced-Form Results

Figure 2 displays the daily readership of Washington, DC’s print and online newspapers since 1961. The first thing to note is that the rapid increase in post.com readership since its introduction in 1996 has been accompanied by a drop in Post readership. A simple OLS regression of Post readership since 1984 on post.com readership and a time trend gives a significantly negative coefficient, and suggests that it takes four post.com readers to reduce Post readership by one. Although it might be tempting to take this as direct evidence that the print and online editions are substitutes, several factors make such a conclusion dubious. First, the downward trend in Post readership begins in 1994, two years before the post.com was introduced, and it does not accelerate significantly thereafter. Second, newspaper readership has been declining for many years nationally and there are many demand-side trends that could account for the downward slide of the Post. Finally, the downward trend in Post readership coincides with a series of increases in the Post’s subscription price, and it would be difficult to separate these price effects from the effect of the post.com using aggregate time series alone. For these reasons, getting a handle on the impact of the post.com will require bringing additional information to bear on the problem.

Figure 2 also provides evidence about the extent of substitutability among different print papers. The exit of the Washington Star in 1981

17 In addition to including weights, the raw Scarborough data also correct for respondents who filled out an initial questionnaire but not the longer survey by filling in a small number of these consumers’ survey responses using the responses of other consumers matched by demographics. These “ascribed” observations are easy to identify because the probability of two respondents with the same sampling weight matching perfectly on all survey responses by random chance is very low. I omit these observations (about 6 percent of the initial sample) in all estimation, but include them in the policy simulations in order to get the correct match to aggregate demographics.

      TABLE 2—CROSS TABULATION OF POST AND POST.COM READERSHIP

24-hour:           Didn’t read post.com    Read post.com
Didn’t read Post          8,771                  622
Read Post                 5,829                  877

5-day:             Didn’t read post.com    Read post.com
Didn’t read Post          6,012                  680
Read Post                 7,203                2,204

and the Washington News in 1973 led to increases in the readership of the
remaining papers, suggesting some substitutability. In both cases, however,
the exit led to declines in total readership, and fewer than half of the
readers of the exiting paper appear to have switched to one of the remaining
papers. In terms of the Post and the Times, the time-series provides no
evidence of a negative relationship. A linear regression of Post readership on
Times readership actually gives a positive coefficient (though insignificant),
even when a time trend is included. Of course, these regressions do not
distinguish substitutability from changes in demand or characteristics of the
products over time.
   Turning to the Scarborough micro data, the first thing to note is that
readership of multiple papers is common. Forty-eight percent of consumers
reported reading at least one of the Post, Times, or post.com in the last 24
hours. Of these consumers, 18 percent reported reading two of the papers, and
1 percent reported reading all three. Over a five-day window, 65 percent of
consumers read at least one of the papers; of these, 27 percent read two
papers and 3 percent read all three. Table 2 reports the number of consumers
reading the Post and the post.com over 24-hour and 5-day windows. It is
immediately clear from this table that combined readership of print and online
news is common. In fact, the fraction of online readers who read print is
higher than the fraction of those who do not read online.
   Table 3 reports raw and partial correlation coefficients for each pair of
papers. The partial correlations control for age, sex, education, industry of
employment, employment status, income, political party, date of survey,
location of residence within the DMA, and number of missing values in the
survey.18 Readership of the Post and post.com are significantly positively
correlated over both 24-hour and 5-day windows. Controlling for observable
characteristics reduces this correlation by about two-thirds, but it remains
significant at the 0.1 percent level. The correlation between readership of
the Post and the Times is also significantly positive in the raw data, but
this disappears when controls are added. The partial correlation is zero over
a 24-hour window, and significantly negative over a 5-day window. The
correlation between the Times and the post.com is never significantly
different from zero.
   What can we conclude from these results? The basic fact in the raw data is
that a consumer who reads any one paper is on average more likely to have also
read a second paper. If all heterogeneity in utilities were uncorrelated
across papers, this would be strong evidence that all three are complements.
An alternative explanation is that the kind of consumers who get a lot of
value from reading the Post also get a lot of value from reading the post.com
and the Times. The fact that the positive correlation decreases dramatically
when we partial out the effect of observables provides direct evidence for
this. The question is whether the remaining correlation, in particular the
positive correlation between the Post and the post.com, represents true
complementarity or additional correlation in tastes which is unobserved.
   To separate these stories, I will exploit variables that should have a
strong effect on the utility of reading the online newspaper, but should have
no direct effect on the utility from reading in print. First, I include a
dummy variable measuring whether the consumer has Internet access at work.
Being able to access the Internet at work clearly reduces the time cost of
reading online, but should not directly affect the utility from reading in
print. Second, I include two dummy variables indicating whether the consumer
uses the Internet for either work-related or education-related tasks.
Performing these tasks should

   18 These correlations drop consumers for whom either print or online papers
were excluded from the choice set as discussed in the demand specification
below.
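
As an illustration that is not part of the original analysis, the comparison drawn from Table 2 can be checked directly from its cell counts. The sketch below computes, for each window, the share of online readers who also read print, the share of non-online readers who read print, and the phi coefficient (the Pearson correlation between two 0/1 variables). The names `TABLE2` and `summarize` are hypothetical. The phi values need not match the raw correlations in Table 3, which drop some observations.

```python
from math import sqrt

# Cell counts copied from Table 2, in the order
# (neither, post.com only, Post only, both).
TABLE2 = {
    "24-hour": (8771, 622, 5829, 877),
    "5-day": (6012, 680, 7203, 2204),
}


def summarize(neither, online_only, print_only, both):
    """Return print-readership shares by online status and the phi coefficient."""
    online = online_only + both        # read post.com
    offline = neither + print_only     # did not read post.com
    share_print_if_online = both / online
    share_print_if_offline = print_only / offline
    # Phi coefficient for a 2x2 table: (ad - bc) / sqrt(product of margins).
    numerator = both * neither - online_only * print_only
    denominator = sqrt(
        (neither + online_only) * (print_only + both) * offline * online
    )
    return share_print_if_online, share_print_if_offline, numerator / denominator


results = {window: summarize(*cells) for window, cells in TABLE2.items()}
for window, (p_on, p_off, phi) in results.items():
    print(f"{window}: P(print | online) = {p_on:.3f}, "
          f"P(print | not online) = {p_off:.3f}, phi = {phi:.4f}")
```

Over the 24-hour window, 877 of 1,499 online readers (about 59 percent) also read the Post, against 5,829 of 14,600 non-online readers (about 40 percent), which is the comparison made in the text.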
726                                     THE AMERICAN ECONOMIC REVIEW                                              JUNE 2007

                                          TABLE 3—CORRELATION COEFFICIENTS

                                                   24-hour                                  5-day
                                          Raw                 Partial             Raw                Partial
              Post-post.com             0.0989**             0.0364**          0.1579**              0.0673**
              Post-Times                0.0632**             0.0035            0.0450**             -0.0623**
              Times-post.com            0.0146               0.0090            0.0184                0.0066

              Notes: The table displays correlation coefficients between dummy variables for reading the
              Post, post.com, and Times. In the first two columns, the variable is equal to one if a respondent
              read in the last 24 hours. In the second two columns, the variable is equal to one if a
              respondent read in the last five weekdays. Partial correlations are correlations in the residuals
              from regressions of each consumption dummy on controls for age, sex, education (four
              categories), white-collar worker, computer worker, employment status, income, political
              party, date of survey, location of residence within the DMA (six categories), and dummy
              variables for the number of missing values. Observations where either print or online
              newspapers were not in the choice set (consumer reports that she generally reads no
              newspaper sections or did not use the Internet in the last 30 days) were dropped.
                 ** Significant at 1 percent.
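
The partial correlations in Table 3 are correlations between regression residuals. The following minimal sketch shows that construction on simulated data, not on the Scarborough survey. It assumes a single hypothetical observable (`educ`) that raises the value of both print and online reading, whereas the paper's regressions use the full list of controls in the table notes. All names and parameter values are illustrative assumptions.

```python
import random
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, z):
    """Residuals from an OLS regression of y on a constant and one control z."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


random.seed(0)
n = 20000
# Hypothetical observable that shifts the utility of both products.
educ = [random.gauss(0, 1) for _ in range(n)]
read_print = [1 if e + random.gauss(0, 1) > 0.0 else 0 for e in educ]
read_online = [1 if e + random.gauss(0, 1) > 0.5 else 0 for e in educ]

raw = corr(read_print, read_online)
partial = corr(residuals(read_print, educ), residuals(read_online, educ))
print(f"raw correlation     = {raw:.4f}")
print(f"partial correlation = {partial:.4f}")
```

In this simulation the two reading decisions are related only through the observable, so partialling it out removes most of the raw correlation. In Table 3, by contrast, a positive Post-post.com correlation survives the controls, which is what leaves open the choice between complementarity and unobserved taste correlation.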

lead consumers to be more familiar with the Internet and spend more time at
their computers, both of which should decrease the effective cost of reading
news online, but not directly affect the utility of print reading. Finally, I
include a dummy variable indicating whether the consumer has a high-speed
Internet connection at home. This, too, should increase the utility from
reading online without directly affecting the utility from reading in print.
   Note that an important limitation of the data is that I do not have
variables that could be assumed to shift the utility of the Post print edition
but not the Times print edition, or vice versa. I discuss below how the model
is identified despite this limitation. In the robustness section, I also show
that the results remain qualitatively unchanged when the Times is excluded
from the analysis completely, suggesting that limited identification on the
print side does not bias the estimates of the Post-post.com relationship. The
reader should bear in mind, however, that the lack of such excluded variables
means that the print-print substitution patterns should be interpreted with
more caution than the print-online substitution patterns.
   One way to see the effect of these excluded variables and perform some
checks on their validity is to use them to instrument for online reading in a
linear probability model of print reading. There are several problems with
such a specification: it does not restrict probabilities to be between zero
and one, it restricts the cross-derivative between print and online to be the
same for all consumers, and it does not use information from the other choice
equations. It shows in an intuitive way, however, how the exclusion
restrictions contribute to identification.
   The first column of Table 4 shows estimates from a linear probability model
of readership over the five-day window, using the same controls as in the
partial correlations above. Reflecting the positive correlation noted earlier,
the first column shows that the coefficient on reading the post.com is
positive and significant in an OLS regression. The second column presents
two-stage least squares (2SLS) estimates using the excluded variables as
instruments. The coefficient on online reading becomes significantly negative.
The magnitude suggests that if we could do an experiment and randomly assign
individuals to read the online paper (at zero time cost), they would be on
average 40 percent less likely to read the print paper, though the limitations
of the linear probability model mean this magnitude must be interpreted with
caution. The F statistic on the instruments in the first stage is 33.05,
suggesting that weak instruments are not a problem (James H. Stock and
Motohiro Yogo 2002). The χ² statistic for the overidentification test in this
regression is 3.65, with a p value of 0.302, meaning the validity of the
instruments cannot be rejected.
   A possible concern is that even if the excluded variables do not affect the
utility of reading print newspapers directly, they might be correlated with
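
The sign reversal between the OLS and 2SLS columns can be reproduced in a stylized simulation. The sketch below is an illustration under assumed parameter values, not a replication of Table 4. An unobserved taste for news raises both print and online reading, online reading lowers the utility of print (the two are substitutes by construction), and one hypothetical excluded variable (`work_net`, standing in for Internet access at work) shifts online reading only. With a single instrument and no controls, 2SLS reduces to the ratio cov(z, y) / cov(z, x).

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov_sum(a, b):
    """Sum of cross-products of deviations from the mean (unnormalized covariance)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    return cov_sum(x, y) / cov_sum(x, x)


def iv_slope(z, x, y):
    """Just-identified 2SLS slope with one instrument z for x."""
    return cov_sum(z, y) / cov_sum(z, x)


random.seed(1)
n = 40000
work_net, read_online, read_print = [], [], []
for _ in range(n):
    taste = random.gauss(0, 1)                 # unobserved taste for news
    z = 1 if random.random() < 0.5 else 0      # hypothetical excluded variable
    # Online reading depends on taste and on the excluded variable.
    x = 1 if 2.0 * taste + 2.0 * z - 1.0 + random.gauss(0, 1) > 0 else 0
    # Print reading depends on taste and falls with online reading; z is excluded.
    y = 1 if 2.0 * taste - 1.5 * x + random.gauss(0, 1) > 0 else 0
    work_net.append(z)
    read_online.append(x)
    read_print.append(y)

ols = ols_slope(read_online, read_print)
iv = iv_slope(work_net, read_online, read_print)
print(f"OLS coefficient on online reading: {ols:.3f}")
print(f"IV  coefficient on online reading: {iv:.3f}")
```

Under these assumptions OLS picks up the shared taste and returns a positive coefficient, while the instrumented estimate is negative. The simulation imposes the exclusion restriction by construction, so it says nothing about whether the restriction holds in the survey data.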