Privacy and Children: What drives digital data protection for very young children?
Privacy and Children: What drives digital data protection for very young children?

G. Cecere∗, F. Le Guel†, V. Lefrere‡, C. Tucker§, P.L. Yin¶

June 4, 2019

Abstract

Mobile platforms have provided children and their parents widespread access to educational and other helpful apps in the past decade. The easy entry into mobile apps also creates opportunities for developers to take advantage of the increasing time children spend playing on apps. We use an original dataset of apps commercialized in the USA and targeting children to explore the types and scope of the data collected via children's use of online mobile applications. Rules related to the collection of sensitive data vary with the developer's geographical location and size. We find that apps that opt in to Google's "Designed for Families" program generally comply with US children's privacy regulation unless the developer is located in a country with weak privacy regulation. However, big developers from countries with weak regulation collect less sensitive data, suggesting that they may be better able to bear the costs of privacy regulation.

JEL CODES: D82, D83, M31, M37

∗ Institut Mines Telecom, Business School. Email: grazia.cecere@imt-bs.eu
† University of Paris Sud. Email: fabrice.le-guel@u-psud.fr
‡ Institut Mines Telecom, Business School-University of Paris Sud. Email: vincent.lefrere@u-psud.fr
§ Massachusetts Institute of Technology (MIT) - Management Science (MS). Email: cetucker@mit.edu
¶ Greif Center for Entrepreneurial Studies, Marshall School of Business, University of Southern California. Email: pailingy@marshall.usc.edu

We are grateful to Qinhou Li, Ginger Zhe for helpful comments and suggestions. An earlier version of this paper was presented at IIOC Indiana 2018, AFSE Paris 2018, the Peex Lab, the ZEW ICT Conference 2018, the 11th Conference on Digital Economics Paris 2019, and the Summer Institute Munich 2019. The underlying work for this paper was supported by the DAPCODS/IOTics ANR 2016 project (ANR-16-CE25-0015). We are grateful to Theo Marquis, Hugo Allouard and Enxhi Leka for their research assistance. All errors are the sole responsibility of the authors.
1 Introduction

Many mobile applications are targeted at very young children, including toddlers (i.e. 2- to 3-year-olds) and preschool-age children. While mobile applications offer valuable learning opportunities, they are also able to collect large amounts of user data, since children do not have a good grasp of safety and privacy. Both the USA and the EU have specific regulations in place designed to protect children's privacy. While they may protect children, these regulations may constrain developers' opportunities to monetize their apps. Since consumer and developer access to the Android app market is international, the effect of regulation is unclear. To encourage developers to comply with the USA Children's Online Privacy Protection Act (COPPA), Google Play (the largest app store worldwide) introduced a self-regulation program called "Designed for Families" (DfF). This program helps parents identify child-appropriate content.1 Given children's widespread use of mobile applications, what sensitive data are developers collecting from children? What institutions influence whether and how much sensitive data developers collect? To our knowledge, there is no empirical study on the extent and influence of the digital data collected on child users.2

The US COPPA and the EU General Data Protection Regulation (GDPR) are both instruments designed to protect children's privacy in digital markets. The USA Federal Trade Commission (FTC) has stated that, in order to collect data from a child, the app developer must have documented consent from the parents. Developers must also obtain parental consent if they want to target children with ads based on their behaviour (behavioural ads).

1 This program was introduced in May 2015 by Google Play (see https://developer.android.com/google-play/guides/families for more information, last retrieved May 12, 2019). In 2013, the iOS App Store introduced a kids app category (Apple's WWDC 2013 Keynote).
2 The recent Mobile Kids Report published by Nielsen (2017) shows that 59% of the children interviewed use mobile devices to download apps (retrieved January 8, 2018).
The FTC has brought law enforcement actions against several app developers located both in the USA and in other countries. The most recent action involved the Chinese app "TikTok" and resulted in a fine of $5.7 million for failure to seek and obtain parental consent to collect children's sensitive data.3 This paper provides empirical evidence on how US COPPA law has spillover effects on developers originating from countries with weak regulation enforcement. We also explore whether app compliance with COPPA legislation varies with the developer's size and location.

We collected weekly data from July to September 2017 on apps available in Google Play in the DfF category. We compared them to apps produced by developers which had chosen not to adopt this categorization but which targeted children based on app descriptions that included keywords such as "preschool" and "toddler". Our dataset includes 9,799 apps corresponding to 4,442 different developers located in 89 countries, generating a panel of 92,746 observations. To measure the effects of the developer country's regulation, we identify developer location based on the address provided by the developer. The dynamic structure of our data, combined with the large set of controls and fixed effects included in our main models, allows us to interpret the effect of USA regulation on the features of apps commercialized in this market.

We find that developers located in regions with no privacy regulation collect more sensitive child data relative to developers based in the USA or European countries. However, developers that opt in to the Google self-certification program are less likely to collect child data, which suggests that the program is at least an informative label for parents and may even promote privacy protection spillovers from US regulation to developers from other countries. We also find that big developers located in countries with weak legislation are less likely to collect sensitive data.

3 , last retrieved, March 3, 2019.
If the cost of compliance with COPPA legislation differs between big and small developers, and the marginal costs of compliance and security control are decreasing, larger developers may be more able to bear the cost of compliance with COPPA. The results are robust to a broader definition of sensitive data and to the granularity of the location data. However, developers from Hong Kong and from countries with no privacy laws are likely to collect sensitive user data aimed at accurately identifying unique users - e.g. the IMEI (International Mobile Equipment Identity) - and to take children's photos and record their voices, although they are less likely to collect location data.

We contribute to three literature streams: the economics of privacy regulation, the economics of mobile applications, and more general work on children's Internet usage. The literature on the economic effects of privacy regulation highlights a trade-off between protecting individuals and developing further innovations. Goldfarb and Tucker (2012) focus on the effects of privacy regulation on firm performance. Campbell et al. (2015) examine how regulation influences competition and show that privacy regulation is likely to affect small and new firms adversely. An important regulatory enforcement tool in the context of privacy legislation is industry self-regulation, which can affect the competitive structure (Acquisti et al., 2016; Brill, 2011). Johnson et al. (2017) evaluate the disadvantages of self-regulation initiatives in the ad industry. Jia et al. (2018) show that regulation increases firms' costs and reduces their performance. Miller and Tucker (2009) examine the welfare outcomes of privacy regulation. To our knowledge, the present study is the first to document the effects of privacy regulation in the context of protecting children's privacy. It builds on Rochelandet and Tai (2016), who find that privacy regulation and location are related. We show that in the global app economy, developers are influenced by the existence or lack of regulation, and that there can be international spillovers from privacy regulation on behavior.
Our findings have direct relevance for the second literature stream, on the economics of mobile applications. Bresnahan, Davis, and Yin (2015) and Li, Bresnahan, and Yin (2017) characterize the monetization challenges facing app developers. Yin et al. (2014) find that successful developer strategies differ by app category (game vs. non-game). Ghose and Han (2014) use a structural model to assess the factors influencing consumers' demand for apps. Their results suggest that demand for children's apps is higher than demand for adults' apps. They also show that children's apps have lower marginal production costs compared to apps designed for other age categories. These papers suggest that developers have strong incentives to find fast ways to monetize apps outside of the app stores (to avoid the 30% revenue share with Google Play), so we may find conflicts between privacy and commercialization. Furthermore, the extent of this behavior may differ across apps, so children's apps should be examined as a distinct category.

We also build on the body of work demonstrating the role that platform design plays in shaping the strategies of app developers. Ershov (2017) investigates how the design of the Google Play platform changed entry dynamics, and shows that splitting the game category into different subcategories reduced search costs and lowered the quality of new entrants. Kummer and Schulte (2018) provide evidence of a trade-off between app demand and supply: the amount of personal information collected to monetize a given app reduces app success as measured by the number of downloads. Kesler et al. (2017) show that apps targeting the 13+ and 16+ age categories are more intrusive compared to apps targeting the "everyone" category (which includes children and adults). While there is empirical evidence showing the importance of the game category in the smartphone market, there is almost no published economics or management research on the characteristics of apps aimed at children, or on how platform policies can support regulation and influence individual behavior.
The exception (in computer science) is the recent paper by Reyes et al. (2018), which analyzes popular free mobile apps. The authors show that the majority of apps do not comply with USA child privacy regulation. Our paper extends this analysis by using a larger, pooled sample, and by taking account of both developers' location and the impact of the DfF program on compliance with COPPA.

Finally, our research contributes to two broader streams of research on children's use of the Internet. Internet access has mixed effects on education outcomes (Bulman and Fairlie, 2016; Belo et al., 2013). There is empirical evidence showing that Internet use in schools affects the level of household Internet penetration (Belo et al., 2016). We contribute to this work by highlighting the participation of children in the mobile app economy.

In the context of the existing literature, our study makes an important contribution by estimating the effect of USA child privacy regulation on national and foreign developers that want to sell their applications in the USA market. First, our statistics on the scope and depth of the data collected on children are an improvement on those used in several existing examinations of policy. To our knowledge, the FTC policy reports (FTC, 2012a,b) provide only initial summary statistics on data collection by apps and focus on the extent to which these apps disclose their data collection activity via privacy policies. A study conducted by the Global Privacy Enforcement Network (GPEN, 2015) analyzes the privacy practices of 1,494 websites worldwide that target children.4 We show that in the mobile applications economy (which is increasingly replacing desktop access to websites), the collection of data on very young children may be even more pervasive.

4 GPEN includes 29 Data Protection Authorities worldwide - "2015 GPEN Sweep - Children's Privacy": , last retrieved, January 8, 2018.
Indeed, compared to websites, data collection in apps is automated and makes no distinction between users, because users implicitly accept this collection when they download the app; as a result, data can be collected even on very young children. As well as providing preliminary but comprehensive information on automated data collection practices related to very young children, our empirical analysis provides evidence that should be informative for future policy. Second, we identify spillover effects of USA policy regulation, channeled through the platform's self-regulation program, on the behavior of foreign developers. Our findings suggest that the effect is extremely heterogeneous across countries, national regulation, and developer size. Third, our analysis suggests that in the global app economy, although some developers are subject to regulation, the collection of child data is pervasive among developers in non-regulated countries. Many international developers appear not to comply with any child privacy regulation.

The paper is structured as follows. Section 2 describes the data sources and presents the descriptive statistics. Section 3 presents the econometric model. Section 4 discusses the econometric results based on different country specifications. Section 5 provides robustness checks related to the probability of collecting sensitive data and of providing targeted ads via third parties. Section 6 concludes.

2 Description of the sample

We collected weekly data on smartphone applications for children from the USA Google Play. Thus, we are able to study apps commercialized in the US, but which may have been developed elsewhere. Developers who produce children's apps can decide to opt in to DfF, or they can commercialize their apps on Google Play without belonging to the DfF program.
Our data collection strategy allows us to collect both groups of children's apps. First, we collected the characteristics of apps in the DfF program aimed at children aged under 13.5 The apps indicate appropriate age categories: children aged 5 & under, children aged 6-8 years, children aged 9 years & over, or mixed audience. Within the DfF program, we collected 4,679 different apps. Developers who opt in to this program self-declare compliance with COPPA, along with other requirements specified by Google. Apps submitted to DfF must comply with all Google requirements, which are directly in line with COPPA legislation.

Second, we constructed a benchmark group of applications aimed at children by simulating the user's (parent's) likely keyword search process in Google Play. Using the Google Adwords keyword planner tool, we identified the list of keywords most frequently associated with children's applications: children, children's, kids, baby, babies, toddlers, educational, toddler, preschool, preschoolers, child monitoring, kindergarten, kindergartners, boys, girls, kid monitoring, 2 year old, 3 year old, 4 year old, 5 year old, 6 year old, 7 year old, 8 year old, 9 year old, 10 year old, 11 year old, 12 year old. Apps identified at least once by keyword search in Google Play during the study period generated our list of apps targeted towards children that do not belong to the DfF program (a sketch of this assembly procedure follows below). The benchmark group includes 4,031 apps. There are 1,089 apps that were collected both within the DfF program and via the keyword search. Our sample consists of the apps included in the Google Play DfF program and our benchmark apps. We tracked each application over a period of 12 weeks, starting from its first appearance to the end of the sample period.

5 The DfF program also includes six broad categories: Action & Adventure, Brain Games, Creativity, Education, Music and Video, and Pretend Play.
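As a rough illustration of how such a benchmark group can be assembled, the sketch below takes the union of the app identifiers returned by each keyword query and removes apps already collected through DfF. The `search_play` helper is hypothetical - a stand-in for whatever scraper queries the US Google Play search results - and this is not the authors' actual crawler.

```python
# Illustrative sketch of the benchmark-group construction described above.
KEYWORDS = [
    "children", "children's", "kids", "baby", "babies", "toddlers",
    "educational", "toddler", "preschool", "preschoolers",
    "child monitoring", "kindergarten", "kindergartners",
    "boys", "girls", "kid monitoring",
] + [f"{n} year old" for n in range(2, 13)]  # "2 year old" ... "12 year old"

def build_benchmark(dff_app_ids: set) -> set:
    """Union of app ids returned by any keyword search, net of DfF apps."""
    found = set()
    for kw in KEYWORDS:
        found |= set(search_play(kw))  # hypothetical scraper call
    return found - dff_app_ids
```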
New apps appear over time while others become unavailable; the number of apps available in the DfF program or identified by the keyword searches increased from 5,137 to 9,799. We excluded apps found only in the last period.6 Our sample includes 92,746 observations;7 79.65% of the applications included a clear developer address. Developers originated from 89 different countries.

Table 1 presents the statistics for the overall sample. Table 2 presents breakdown statistics for apps and developers. Column (1) presents the statistics for the apps that do not collect sensitive data and shows that, in this sub-sample, 69% of apps are inside DfF. Column (2) shows the statistics for the apps that collect sensitive data; only 31% of them are inside DfF. Column (4) presents the statistics for the apps that do not opt in to DfF, and Column (5) presents the statistics for the apps that do. The USA, China and Hong Kong are not included in the institutional and geographical groupings because they are the largest producer countries after the European Union. China is one of the largest markets; it accounted for nearly 50% of total downloads across iOS and other Android stores in 2018 (App Annie, 2019).

Our empirical strategy allows us to measure whether the platform policy related to children's content provides effective protection for their personal data compared to the benchmark group. We collected over time all publicly available data, such as app characteristics (e.g. user ratings, freemium, free, paid), the developer's name and address, the type of interactive elements utilized by the app, and the number and type of permissions required by developers. We are interested in 1) measuring the effectiveness of the platform policy in protecting children, and 2) testing whether national privacy regimes correspond to developers' collection of less sensitive data.

6 We exclude 481 applications that appear only once.
7 We delete from our sample the applications which only appeared in the last week of the data collection. An observation is at the week and app level.
Table 1: Summary statistics for the full sample of apps

Variable                           Mean       SD         Min.   Max.   N
Sensitive Data                     0.358      (0.479)    0      1      92746
DfF                                0.722      (0.448)    0      1      92746
USA                                0.225      (0.418)    0      1      92746
Europe                             0.266      (0.442)    0      1      92746
Recognized by the EU               0.060      (0.238)    0      1      92746
Independent authority              0.040      (0.197)    0      1      92746
With legislation                   0.125      (0.331)    0      1      92746
No privacy law                     0.027      (0.161)    0      1      92746
China                              0.021      (0.142)    0      1      92746
Hong Kong                          0.033      (0.178)    0      1      92746
Without developer address          0.203      (0.402)    0      1      92746
Child specialization               0.440      (0.335)    0      1      92746
Log number of reviews              5.318      (3.480)    0      18     92746
User rating                        3.820      (1.228)    0      5      92746
App age (Month)                    20200.854  (672.065)  18263  21075  89375
Users interact                     0.048      (0.213)    0      1      92746
Contains ad                        0.578      (0.494)    0      1      92746
Freemium                           0.318      (0.466)    0      1      92746
Price                              0.799      (2.317)    0      100    92746
Top ranking                        0.361      (0.348)    0      1      92746
High Download                      0.580      (0.494)    0      1      92746
Number of developers by country    15.55      (29.417)   1      156    92746
Developer: Downloads               13.033     (5.213)    0      23     92746
Developer: User ratings            3.560      (1.378)    0      5      92746
Developer: No User ratings         0.060      (0.238)    0      1      92746
Developer: Missing information     0.088      (0.283)    0      1      92746

Notes: Descriptive statistics of the full sample. Standard deviations in parentheses.
Table 2: Summary statistics

                             Do not collect   Collect   Not in DfF   In DfF
DfF                          0.69             0.31
Sensitive Data                                          0.38         0.62
High download                0.60             0.40      0.35         0.65
Users interact               0.39             0.61      0.55         0.45
Contains ad                  0.61             0.39      0.32         0.68
Presence                     0.64             0.36      0.30         0.70
Freemium                     0.62             0.38      0.25         0.75
Child Specialization         0.46             0.41      0.35         0.47
China                        0.33             0.67      0.12         0.88
Hong Kong                    0.29             0.71      0.11         0.89
Without developer address    0.63             0.37      0.53         0.47
Europe                       0.71             0.29      0.19         0.81
Recognized by the EU         0.61             0.39      0.20         0.80
Independent authority        0.65             0.35      0.22         0.78
With legislation             0.65             0.35      0.37         0.63
No privacy law               0.56             0.44      0.29         0.71
United States                0.67             0.33      0.18         0.82
Log number of reviews        4.90             6.06      6.47         4.87
User rating                  3.81             3.85      3.87         3.80
Top ranking                  0.36             0.37      0.33         0.37
Price                        0.87             0.66      0.22         1.02
App age                      20216.81         20174.31  19963.72     20292.16

2.1 Dependent variables: Sensitive data and IMEI

COPPA regulation defines the list of child-sensitive data covered by the law. It includes geolocation details (sufficiently precise to identify street name and city); photos, videos, and audio files that contain children's images or voices; and usernames and persistent identifiers that make it possible to recognize an app user over time and across different applications.8 To measure whether children's apps possibly violate USA children's privacy legislation, we identify the Google Play permissions that allow apps to collect these child-sensitive data.

8 The complete list of children's personal data is available at , last retrieved, November 6, 2018.
We created the binary variable Sensitive Data, which indicates whether the app collects any sensitive data covered by COPPA regulation. More precisely, the law requires verifiable parental consent for the collection, use, and disclosure of personal information on children aged under 13. Whether such consent was actually obtained is not observable to researchers: only developers and users who actually use the app have access to that information. Thus, we are only able to measure whether any personal data covered by the law is requested by each app.

We identify seven permissions and one interactive element that request personal data covered by the COPPA regulation. The permission Read Phone Status and Identity allows developers to identify a smartphone's unique IMEI, which is considered a persistent unique identifier by COPPA and the GDPR (Reyes et al., 2018). The IMEI can be used to recognize a user over time and across different online services,9 and it could be used to log all kinds of personal data and target the consumer. The IMEI number also permits developers to know which advertisements a user has already seen. A child's image and voice can be captured via the permissions Take Pictures and Videos and Record Audio. Several permissions allow location data collection: ALEC (Access Location Extra Commands) is used to determine users' locations based on various device capabilities; ANBL (Approximate Network Based Location) is used to access approximate location derived from network sources such as cell towers and Wi-Fi; MLST (Mock Location Sources for Testing) is used to facilitate developers' testing of geolocation applications; and Precise GPS Location gives access to precise location. We also use the interactive element Share Location. Appendix Table 17 shows the permissions and interactive elements used to construct the dependent variable Sensitive Data.

9 Complying with COPPA: Frequently Asked Questions. Available at , last retrieved, December 12, 2018.
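To make this construction concrete, here is a minimal sketch in Python, assuming a data frame `apps` with one row per app-week and one 0/1 indicator per permission or interactive element; the column names are our own shorthand, not Google Play's identifiers.

```python
import pandas as pd

# Permissions and the interactive element covered by COPPA, as listed above.
COPPA_ITEMS = [
    "read_phone_status_and_identity",      # IMEI, a persistent identifier
    "take_pictures_and_videos",            # child's image
    "record_audio",                        # child's voice
    "access_location_extra_commands",      # ALEC
    "approximate_network_based_location",  # ANBL
    "mock_location_sources_for_testing",   # MLST
    "precise_gps_location",
    "share_location",                      # interactive element
]

# Sensitive Data = 1 if the app requests any of the items above.
apps["sensitive_data"] = apps[COPPA_ITEMS].any(axis=1).astype(int)

# The alternative dependent variable used in Section 4.2.
apps["imei"] = apps["read_phone_status_and_identity"]
```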
Column (1) of Appendix Table 17 presents the statistics for the entire sample, Columns (2) and (3) present the respective app statistics for USA and EU developers, Column (4) presents the statistics for China, and Column (5) presents the statistics for Hong Kong.

2.2 App characteristics

Google Play provides a large set of information for each app: the number of reviews, the users' rating, the price, whether the app contains ads, etc. (Table 2). To measure app success, we include the variable Log Number of Reviews. To indicate whether the app has a high number of downloads, we create a dummy variable High Download, which takes the value 1 if the app has more than 10,000 downloads. Table 20 shows the breakdown statistics of all download categories, from 0 downloads to more than 1 million. The variable Top ranking measures the highest search rank of each app in Google Play. To measure app popularity, we use users' ratings of app quality (variable User Rating) on a 0 to 5 scale, where 0 indicates that the app has no rating; we also include a dummy variable No app rating for apps without any rating. To measure the age of the applications, we include the variable App age, which measures the age of the app in months since its launch on Google Play. The variable Price indicates the price of the app, ranging from zero for free apps to the maximum price of $99.99.10 While many apps are free, paid apps represent 26.8% of applications. The binary variable Freemium indicates whether the application offers IAP (in-app purchases); this applies to 31.8% of the applications in the sample. Finally, to measure whether apps display advertising, we include the binary variable Contains Ad, which takes the value 1 if the app displays advertisements to users. Overall, 57.8% of apps include ads.

10 In our sample, paid apps are over-represented compared to the overall Play Store. Indeed, within the DfF category we also collected the Top Paid and Top New Paid categories.
The binary variable Users Interact measures whether the app allows users to interact with one another. This feature exposes the app's users to unfiltered/uncensored user-generated content, including user-to-user communications and media sharing via social media and networks.11 A dummy variable, Presence in the Search, indicates whether the app is present each week either in the DfF program or in the benchmark search.

2.3 Developers' geographical location

To explore regulation spillovers to other countries, we retrieved the geographical information disclosed by developers of apps available in Google Play. Each app developer's address has to indicate its country. Although the platform has mandated a developer address (since September 30, 2014)12 for those offering paid apps, in-app purchases, or payment through the app, there are several apps without a geographical address. To retrieve developers' countries, we use different strategies. First, we use Maps APIs to collect the latitude and longitude of the given address and identify the country.13 Second, we use a Python library (Libpostal) to search for a country name in the developer's address. Third, we check the match between the location identified using the Google Maps APIs and the country name identified via Libpostal. Fourth, among the subset of apps without any developer address, we identify their location using the email domain extension.14 Finally, we manually check certain addresses. To flag apps without any developer location information, we created the variable Without developer address; 20.3% of apps fall into this category.

11 , last retrieved, January 8, 2018.
12 Among the developers not declaring an address, 16.7% of those in our dataset were registered on the platform before September 2014.
13 We retrieve the geographical address using four different geographical APIs: Google Maps, Bing Maps, OpenStreetMap, and Geo Map.
14 Using this procedure, we identify the origin countries of 100 apps.
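As an illustration of the country-identification steps just described, the sketch below uses the `postal` Python bindings to Libpostal to pull a country component out of a raw address and cross-checks it against a geocoding service. `geocode_country` is a hypothetical wrapper for whichever Maps API is used, not a real library call, and this is not the authors' actual pipeline.

```python
from typing import Optional
from postal.parser import parse_address  # Python bindings to Libpostal

def country_from_address(address: str) -> Optional[str]:
    """Extract a country name from a raw developer address via Libpostal."""
    # parse_address returns (value, label) pairs, e.g. ("france", "country").
    for value, label in parse_address(address):
        if label == "country":
            return value
    return None

def resolved_country(address: str) -> Optional[str]:
    """Keep a country only when Libpostal and the geocoder agree."""
    libpostal_guess = country_from_address(address)
    api_guess = geocode_country(address)  # hypothetical Maps API wrapper
    return libpostal_guess if libpostal_guess == api_guess else None
```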
Regarding apps without a developer address, it is important to emphasize that COPPA legislation requires that parents be informed about the companies that collect children's data and, in particular, that companies indicate contact details such as an email or geographical address.

2.3.1 National privacy regulation

Privacy regulation rules vary across countries, and we exploit this variation to characterize countries' privacy policies. To assess differences in national regulatory frameworks, we augment our data with a vector of institutional framework measures associated with the developer's address. Since developers in the US develop apps for their domestic market, it is reasonable to believe that their behavior may differ from that of others. We created the dummy variable USA to indicate whether the developer is located in the US; US developers produced 22.5% of the apps in our sample. We use a measure of privacy regulation to indicate a country's level of compliance with EU privacy legislation. Appendix Table 11 presents countries categorized according to their level of compliance with EU privacy legislation. This index was computed by the French privacy regulation authority (CNIL).15 The dummy variable Europe (Table 2) identifies developer countries belonging to the EU, whose privacy laws are by construction compatible with EU legislation; these represent 26.6% of apps (21.44% of which are produced by developers in the United Kingdom). The variable Recognized by the EU indicates that the privacy laws of a country outside the EU are compatible with EU privacy laws; this corresponds to 6% of the sample.

15 , last retrieved, January 8, 2018.
The binary variable Independent authority indicates the presence in the app developer's country of both an independent privacy regulation authority and a privacy legislation framework, which corresponds to 4% of apps. The binary variable With Legislation indicates that the country has privacy legislation only (but no independent authority regulating privacy), which accounts for 12.5% of apps, and the dummy variable No Privacy Law indicates the absence of privacy laws in the developer's country (2.6% of apps). We also created a dummy variable China, which takes the value 1 if the developer is located in China, and a dummy variable Hong Kong, which takes the value 1 if the developer is located in Hong Kong. After European countries such as Germany, the United Kingdom, and France, the largest producers of apps commercialized in the USA are located in Hong Kong and China, with 3.3% and 2.1% of the whole sample, respectively. Apps in Google Play are automatically released worldwide, with automated translation of the app's description, unless the developer specifies otherwise.

The developer's strategy might also be associated with the home institutional framework. We therefore consider whether developers in OECD countries (excluding the USA and Europe) behave differently from developers located in non-OECD countries (excluding China and Hong Kong), which have weaker institutions and regulation.

2.3.2 Graphical evidence: Collection of sensitive data by developer location

Figure 1 depicts the percentage of apps per country group that collect children's personal data.16 The graphical evidence shows that, overall, developers located in China and Hong Kong collect more data compared to developers in the USA and other country groups. Bar graph (1) in Figure 1 shows the distribution of Sensitive Data items for OECD and non-OECD countries, with the USA, China and Hong Kong separated from the group of countries.

16 For example, considering Figure 1, bar graph 1, among developers with no address about 37% collect sensitive data.
Figure 1: Collection of sensitive data by developer location

[Two bar-graph panels, (1) and (2). The vertical axis in both is the percentage of apps collecting sensitive data (0 to 0.8). Panel (1) categories: Without developer address, OECD, No OECD, United States, China, Hong Kong. Panel (2) categories: Without developer address, Member of the EU, Recognized by the EU, Independent authority & law, With legislation, No privacy law, USA, China, Hong Kong.]

Notes: The vertical axis is the percentage of apps collecting sensitive data.

Developers in Hong Kong and China collect more sensitive data compared to all other locations, followed by developers who do not list an address. Bar graph (2) in Figure 1 shows the distribution of the number of Sensitive Data items collected according to the EU privacy regulation regime, again with the USA, China and Hong Kong separated out. Developers in Hong Kong and China collect more sensitive data compared to all other locations, followed by developers in countries with no privacy laws.
2.4 Developer characteristics

Besides geographical data, we collected several variables to measure developer characteristics (Table 2). In particular, it is useful to differentiate experienced developers. These data include the overall rating associated with each developer (Developer: User ratings), the overall number of downloads (Developer: Downloads), and the date of entry into Google Play (Developer: Entry). We also collect the total number of apps produced by each developer and create the variable Number of apps by developer. To measure developer specialization in children's apps, we compute an index, Developer: Child specialization, defined as the number of apps that each developer produces in the children's market divided by the overall number of apps they have produced. We avoid dropping observations in the regressions by setting missing values equal to 0 and defining a corresponding dummy variable, Developer: Missing information, equal to 1. The main results are very similar even if we exclude these observations.

3 Model specification

Our econometric analysis estimates the effect of the regulation in the developer's country of origin on the amount of child-sensitive data collected. Our dependent variable is binary, and we use probit estimation with robust standard errors clustered at the app level. We model the probability of collecting sensitive data using the following specification:

$$P(\text{Sensitive Data}_{it}) = \alpha_0 + X_{it}\beta + Z_{ijt}\gamma + D_{ijt}\theta + \rho_t + \varepsilon_{ijt} \qquad (1)$$
Our primary vector of interest is country privacy regulation, $X_{it}$. We include the level of privacy protection using the distinction between OECD and non-OECD countries. We also identify the stringency of regulation relative to European legislation by using a set of dummy variables that capture the country's level of compliance with EU regulation, one of the strictest regulatory frameworks. We also include dummies for developers from China, Hong Kong and the USA. $Z_{ijt}$ is the vector of characteristics of app i at time t developed in country j, and $D_{ijt}$ is the corresponding vector of developer characteristics. The regressions are weighted by the number of developers in each country. The term $\varepsilon_{ijt}$ is an independent and identically distributed random error. The equation also includes time (week) effects $\rho_t$. Since we have very few time-varying variables, we estimate the model using pooled cross-section data.
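For concreteness, a minimal sketch of how Equation (1) can be estimated in Python with statsmodels is shown below, assuming a data frame `apps` with our shorthand variable names. The country-level weighting used in the paper is omitted for simplicity, and this is not the authors' actual estimation code.

```python
import statsmodels.formula.api as smf

# Pooled probit of Sensitive Data on regulation dummies, selected controls,
# and week fixed effects, with robust SEs clustered at the app level.
formula = (
    "sensitive_data ~ dff + usa + china + hong_kong + no_privacy_law"
    " + high_download + dff:high_download + dff:china + dff:hong_kong"
    " + log_reviews + C(week)"
)
probit_res = smf.probit(formula, data=apps).fit(
    cov_type="cluster", cov_kwds={"groups": apps["app_id"]}
)
print(probit_res.summary())
```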
4 Estimation of the probability of collecting sensitive data

4.1 Baseline model: Effect of DfF and developer home country regulation

In order to estimate the impact of COPPA regulation on foreign developers, we begin with a pooled cross-section framework. COPPA precisely defines the sensitive data covered by the legislation and requires that each company, and any third parties that collect user data, provide information such as a name and address to allow parents to contact them. We investigate the impact of privacy regulation and macro-economic characteristics on the probability that developers request sensitive data. Table 3 presents the coefficients of the probit estimation. All regressions include app characteristics to control for observable differences in the apps underlying the probability of collecting sensitive data. By including the full set of app characteristics in these models, we can abstract away from app heterogeneity. We also include developer characteristics in order to measure the developer's experience and size. All specifications include time fixed effects. The omitted category for the institutional variables is Europe; thus, the OECD dummy variable does not include Europe.

To investigate how DfF moderates the effect of the developer's country, and thereby measure USA legislation spillovers, Column (2) of Table 3 adds a set of interaction terms between developer country and DfF. Developers in both China and Hong Kong are more likely to collect sensitive data once they opt in to the DfF program. Column (3) of Table 3 includes the measures of institutional privacy regulation. Column (4) adds the interaction terms between the regulation measures and DfF. Overall, there is a negative association between the decision to opt in to the DfF program and sensitive data collection, which is statistically significant in all specifications. This finding is aligned with the platform's intention to encourage compliance with COPPA legislation. Apps with a large number of downloads are less likely to collect sensitive data when they join the DfF program, suggesting that the self-regulation program is overall effective for popular apps: Columns (2) and (4) show that the interaction term High Download × DfF is negative and statistically significant. Developers that decide to comply with the DfF program show a lower probability of collecting users' sensitive data. Columns (2) and (4) also show that developers located in China, in Hong Kong, and in countries without privacy legislation have a higher probability of collecting users' data relative to developers in Europe. Table 4 reports the marginal effects at the mean for the main specifications.
Column (4) shows that developers who opt in to the DfF program and originate from China, from Hong Kong, or from countries with no privacy law are more likely to collect sensitive data by 19.3, 60.6 and 20.6 percentage points, respectively. The result for Chinese apps is in line with the finding of Wang et al. (2018), who show that apps commercialized in the Chinese market are more intrusive than other apps available in Google Play.
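Marginal effects at the mean, as reported in Table 4, can be obtained from a fitted probit in statsmodels as in the short sketch below, continuing from the earlier estimation sketch (`probit_res` is the fitted result object).

```python
# Marginal effects evaluated at the sample mean of the covariates.
margeff = probit_res.get_margeff(at="mean")
print(margeff.summary())
```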
Table 3: Probit estimation: Sensitive Data

                                     (1)         (2)         (3)         (4)
High download                     -0.194***   -0.030      -0.194***   -0.031
                                  (0.066)     (0.089)     (0.066)     (0.089)
OECD                               0.276***    0.137
                                  (0.058)     (0.116)
No OECD                            0.358***    0.258**
                                  (0.062)     (0.109)
USA                                0.190***    0.335***    0.189***    0.335***
                                  (0.048)     (0.102)     (0.048)     (0.102)
China                              1.123***    0.623**     1.124***    0.623**
                                  (0.106)     (0.266)     (0.106)     (0.266)
Hong Kong                          1.139***   -0.420*      1.139***   -0.420*
                                  (0.092)     (0.243)     (0.092)     (0.243)
Without developer address          0.208***    0.100       0.208***    0.101
                                  (0.057)     (0.092)     (0.057)     (0.093)
DfF                               -0.462***   -0.386***   -0.461***   -0.385***
                                  (0.047)     (0.109)     (0.047)     (0.109)
DfF × High Download                           -0.267***               -0.267***
                                              (0.086)                 (0.086)
DfF × USA                                     -0.198*                 -0.198*
                                              (0.115)                 (0.115)
DfF × China                                    0.596**                 0.597**
                                              (0.289)                 (0.289)
DfF × Hong Kong                                1.830***                1.830***
                                              (0.260)                 (0.260)
DfF × OECD                                     0.178
                                              (0.134)
DfF × No OECD                                  0.129
                                              (0.130)
DfF × Without developer address                0.176                   0.176
                                              (0.107)                 (0.107)
Recognized by the EU                                       0.250***    0.105
                                                          (0.073)     (0.159)
Independent authority & law                                0.239***   -0.027
                                                          (0.084)     (0.174)
With legislation                                           0.347***    0.271**
                                                          (0.062)     (0.108)
No privacy law                                             0.635***    0.201
                                                          (0.107)     (0.187)
DfF × Recognized by the EU                                             0.180
                                                                      (0.179)
DfF × Independent authority & law                                      0.348*
                                                                      (0.197)
DfF × With legislation                                                 0.088
                                                                      (0.131)
DfF × No privacy law                                                   0.591***
                                                                      (0.220)
Constant                          -0.024       0.044      -0.024       0.043
                                  (0.192)     (0.213)     (0.192)     (0.213)
Week fixed effect                  Yes         Yes         Yes         Yes
App characteristics                Yes         Yes         Yes         Yes
Developer characteristics          Yes         Yes         Yes         Yes
Observations                       92746       92746       92746       92746
Log-likelihood                    -3.154e+07  -3.137e+07  -3.154e+07  -3.137e+07

Notes: Probit estimation. Sensitive Data is the dependent variable. Robust standard errors clustered at the app level reported in parentheses. The omitted category in all specifications is Europe. Weighted by the number of developers by country. Significance levels: * p < .10, ** p < .05, *** p < .01
Table 4: Probit marginal effects: Sensitive Data, main equation

                                     (1)         (2)         (3)         (4)
High download                     -0.041       0.030      -0.041       0.030
                                  (0.025)     (0.034)     (0.025)     (0.034)
DfF                               -0.236***   -0.184***   -0.236***   -0.181***
                                  (0.020)     (0.054)     (0.020)     (0.054)
OECD                               0.092***    0.012
                                  (0.023)     (0.050)
No OECD                            0.102***    0.058
                                  (0.025)     (0.048)
USA                                0.084***    0.098**     0.084***    0.099**
                                  (0.018)     (0.039)     (0.018)     (0.039)
China                              0.384***    0.199**     0.384***    0.201**
                                  (0.034)     (0.097)     (0.034)     (0.097)
Hong Kong                          0.401***   -0.148**     0.401***   -0.146**
                                  (0.030)     (0.058)     (0.030)     (0.058)
Without developer address          0.050**     0.001       0.050**     0.003
                                  (0.023)     (0.040)     (0.023)     (0.040)
High download × DfF                           -0.085***               -0.085***
                                              (0.033)                 (0.033)
Without developer address × DfF                0.104**                 0.101**
                                              (0.045)                 (0.044)
OECD × DfF                                     0.098*
                                              (0.055)
No OECD × DfF                                  0.059
                                              (0.052)
United States × DfF                           -0.023                  -0.025
                                              (0.047)                 (0.046)
China × DfF                                    0.196**                 0.194**
                                              (0.094)                 (0.094)
Hong Kong × DfF                                0.627***                0.625***
                                              (0.087)                 (0.087)
Recognized by the EU                                       0.096***    0.018
                                                          (0.027)     (0.058)
Independent authority & law                                0.073**    -0.042
                                                          (0.033)     (0.060)
With legislation                                           0.096***    0.061
                                                          (0.025)     (0.047)
No privacy law                                             0.203***    0.202***
                                                          (0.038)     (0.038)
Recognized by the EU × DfF                                             0.093
                                                                      (0.063)
Independent authority × DfF                                            0.148**
                                                                      (0.072)
With legislation × DfF                                                 0.046
                                                                      (0.052)
App characteristics                Yes         Yes         Yes         Yes
Developer characteristics          Yes         Yes         Yes         Yes
Week fixed effect                  Yes         Yes         Yes         Yes
Observations                       92746       92746       92746       92746

Notes: Probit marginal effects. Sensitive Data is the dependent variable. Robust standard errors clustered at the app level reported in parentheses. The omitted category in all specifications is Europe. Weighted by the number of developers by country. Significance levels: * p < .10, ** p < .05, *** p < .01
4.2 Probability of collecting Read Phone Status and Identity

We show the robustness of our results to an alternative dependent variable: IMEI. Table 5 shows the results of estimating the econometric specification presented in Equation (1). The permission Read Phone Status and Identity allows the developer to collect the IMEI number of the user's smartphone. If there were unobserved heterogeneity issues related to the collection of all sensitive data, we would expect to see different results for these estimations. Column (1) includes the vector of variables measuring the OECD institutional framework. Column (2) adds the interaction terms between the OECD institutional variables and DfF. The interaction term DfF × USA is negative and statistically significant, suggesting that within the DfF program, developers in the USA are less likely to collect personal data. Column (3) includes the set of dummies measuring compliance with EU legislation. Developers in countries recognized by the EU, or in countries whose privacy laws are compatible with EU legislation, are more likely to collect IMEI data compared to Europe.

Across all specifications, the probability of collecting IMEI data decreases with participation in DfF. Overall, developers in the USA are less likely to collect IMEI information. Importantly, developers in China and Hong Kong are more likely to collect this information. This corroborates the statistical evidence presented in Appendix Table 17. There are several alternative explanations. First, several applications targeting adults and produced in China, such as Meitu,17 WeChat and Taobao,18 have raised privacy concerns in the USA and in Europe because they collect users' IMEI information; this suggests that this permission is commonly required by Chinese developers. Second, while USA and European regulation considers IMEI data to be personal information, the Hong
Kong regulation ("Personal Data Privacy Ordinance") does not consider persistent identifiers to be personal data (Hargreaves and Tsui, 2017). Third, the Chinese government's current plans to expand surveillance might encourage app developers to collect unique identifiers such as the IMEI number; IMEI collection may also serve security purposes.19 Fourth, to our knowledge there have been very few recent FTC allegations against foreign developers for violating COPPA law, so enforcement might not be considered a credible threat by foreign developers.

19 , last retrieved, December 9, 2018.
Table 5: Probit estimation: IMEI sensitive data

                                     (1)         (2)         (3)         (4)
High download                     -0.158**    -0.085      -0.159**    -0.085
                                  (0.069)     (0.094)     (0.069)     (0.094)
OECD                               0.243***    0.095
                                  (0.061)     (0.120)
No OECD                            0.378***    0.430***
                                  (0.064)     (0.112)
USA                                0.119**     0.301***    0.118**     0.301***
                                  (0.051)     (0.105)     (0.051)     (0.105)
China                              1.300***    0.561**     1.301***    0.560**
                                  (0.105)     (0.262)     (0.105)     (0.262)
Hong Kong                          1.141***    0.017       1.142***    0.018
                                  (0.089)     (0.240)     (0.089)     (0.240)
Without developer address          0.220***    0.058       0.221***    0.060
                                  (0.061)     (0.098)     (0.061)     (0.098)
DfF                               -0.272***   -0.313***   -0.271***   -0.312***
                                  (0.050)     (0.117)     (0.050)     (0.117)
DfF × High download                           -0.113                  -0.114
                                              (0.092)                 (0.092)
DfF × USA                                     -0.261**                -0.260**
                                              (0.120)                 (0.120)
DfF × China                                    0.866***                0.868***
                                              (0.286)                 (0.286)
DfF × Hong Kong                                1.318***                1.318***
                                              (0.257)                 (0.257)
DfF × OECD                                     0.194
                                              (0.139)
DfF × No OECD                                 -0.128
                                              (0.136)
DfF × Without developer address                0.272**                 0.272**
                                              (0.113)                 (0.113)
Recognized by the EU                                       0.195**    -0.176
                                                          (0.078)     (0.168)
Independent authority & law                                0.187**     0.034
                                                          (0.089)     (0.178)
With legislation                                           0.364***    0.444***
                                                          (0.065)     (0.111)
No privacy law                                             0.743***    0.433**
                                                          (0.107)     (0.190)
DfF × Recognized by the EU                                             0.479**
                                                                      (0.189)
DfF × Independent authority & law                                      0.198
                                                                      (0.203)
DfF × With legislation                                                -0.193
                                                                      (0.138)
DfF × No privacy law                                                   0.428*
                                                                      (0.224)
Constant                          -0.441**    -0.226      -0.441**    -0.228
                                  (0.209)     (0.229)     (0.209)     (0.229)
App characteristics                Yes         Yes         Yes         Yes
Developer characteristics          Yes         Yes         Yes         Yes
Week fixed effect                  Yes         Yes         Yes         Yes
Observations                       92746       92746       92746       92746
Log-likelihood                    -2.684e+07  -2.666e+07  -2.683e+07  -2.665e+07

Notes: Probit estimation. The dependent variable is the dummy variable IMEI. Robust standard errors clustered at the app level reported in parentheses. The omitted category in all specifications is Europe. Weighted by the number of developers by country. Significance levels: * p < .10, ** p < .05, *** p < .01
4.3 Third parties and targeted ads

COPPA legislation also regulates the distribution of targeted ads. To determine whether an app offers targeted ads, we identify the third parties that enable their distribution. For this purpose, we collected data on third parties and categorized them.20 To measure whether the inclusion of targeted ads in apps is associated with the collection of sensitive user data, we created an alternative dependent variable to Sensitive Data, named Sensitive Data Ad Targeting, which takes the value 1 if the app collects sensitive data (as defined by the variable Sensitive Data) and at the same time uses third parties that offer targeted ads. Table 16 shows the percentage of third-party ads used by developers in each group of countries and by apps collecting sensitive data.

Table 6 shows the results of the main estimation with the alternative dependent variable Sensitive Data Ad Targeting. Overall, the coefficient on DfF is negative in all specifications, suggesting that the platform design aimed at COPPA compliance is also effective with respect to personalized ads. Column (2) shows the estimation with the interaction terms between the set of institutional variables and DfF. While developers in non-OECD countries that opt in to the DfF program are likely to collect sensitive data, only developers in Hong Kong and China do not comply with COPPA when they opt in to the DfF program. Column (4) shows the estimation with the interaction terms between the set of regulation variables and DfF. The interaction term Without developer address × DfF is positive and significant, suggesting that developers without an address collect more sensitive data and are more likely to offer

20 The third parties providing targeted ads are: AdMarvel, AdMob, AdsMogo, AdWhirl, AirPush, AppLovin, CaulyAds, Chartboost, InMobi, Inneractive, Jumptap, LeadBolt, Madhouse SmartMAD, MdotM, Mediba Admaker, Millennial Media, MobClix, MobFox, MobWIN, MoPub, Nexage, Noqoush AdFalcon, Revmob, Smaato, Smart AdServer, Sponsorpay, Tap for Tap, Tapjoy, YuMe.
targeted ads.
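As an illustration of how this alternative dependent variable can be built, the sketch below combines the Sensitive Data flag with membership of at least one targeted-ad third party, continuing the earlier pandas sketches; `third_parties` is assumed to hold a list of SDK names per app-week, and the provider set shown is only an excerpt of the full list in footnote 20.

```python
# Excerpt of the targeted-ad third parties listed in footnote 20.
TARGETED_AD_PROVIDERS = {"AdMob", "InMobi", "MoPub", "Tapjoy"}

# Sensitive Data Ad Targeting = 1 if the app both collects sensitive data
# and embeds at least one third party that offers targeted ads.
apps["sensitive_data_ad_targeting"] = (
    (apps["sensitive_data"] == 1)
    & apps["third_parties"].apply(
        lambda sdks: any(s in TARGETED_AD_PROVIDERS for s in sdks)
    )
).astype(int)
```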
Table 6: Probit estimation: Sensitive Data Ad Targeting

                                     (1)         (2)         (3)         (4)
High download                      0.353***    0.541***    0.353***    0.542***
                                  (0.107)     (0.134)     (0.107)     (0.134)
DfF                               -0.607***   -0.440**    -0.606***   -0.438**
                                  (0.070)     (0.202)     (0.070)     (0.202)
OECD                               0.275***    0.062
                                  (0.083)     (0.163)
No OECD                            0.358***    0.394***
                                  (0.087)     (0.149)
USA                                0.110       0.115       0.110       0.115
                                  (0.078)     (0.141)     (0.078)     (0.141)
China                              1.342***    0.585**     1.343***    0.585**
                                  (0.116)     (0.298)     (0.116)     (0.298)
Hong Kong                          1.232***   -0.046       1.231***   -0.045
                                  (0.098)     (0.258)     (0.098)     (0.258)
Without developer address          0.093      -0.036       0.092      -0.036
                                  (0.092)     (0.137)     (0.092)     (0.137)
DfF × High download                           -0.265                  -0.265
                                              (0.168)                 (0.168)
DfF × USA                                     -0.035                  -0.035
                                              (0.169)                 (0.169)
DfF × China                                    0.891***                0.892***
                                              (0.322)                 (0.322)
DfF × Hong Kong                                1.464***                1.464***
                                              (0.277)                 (0.277)
DfF × OECD                                     0.302
                                              (0.191)
DfF × No OECD                                 -0.105
                                              (0.186)
DfF × Without developer address                0.336**                 0.335**
                                              (0.164)                 (0.164)
Recognized by the EU                                       0.325***    0.034
                                                          (0.094)     (0.187)
Independent authority & law                               -0.165      -0.196
                                                          (0.133)     (0.237)
With legislation                                           0.347***    0.398***
                                                          (0.087)     (0.148)
No privacy law                                             0.676***    0.430**
                                                          (0.116)     (0.207)
DfF × Recognized by the EU                                             0.407*
                                                                      (0.215)
DfF × Independent authority & law                                      0.046
                                                                      (0.285)
DfF × With legislation                                                -0.142
                                                                      (0.187)
DfF × No privacy law                                                   0.351
                                                                      (0.245)
Constant                          -1.253***   -1.265***   -1.253***   -1.265***
                                  (0.287)     (0.305)     (0.287)     (0.305)
App characteristics                Yes         Yes         Yes         Yes
Developer characteristics          Yes         Yes         Yes         Yes
Week fixed effect                  Yes         Yes         Yes         Yes
Observations                       92746       92746       92746       92746
Log-likelihood                    -3.454e+09  -3.432e+09  -3.452e+09  -3.430e+09

Notes: Probit estimation. Sensitive Data Ad Targeting is the dependent variable. Robust standard errors clustered at the app level reported in parentheses. The omitted category in all specifications is Europe. Weighted by the number of developers per country. Significance levels: * p < .10, ** p < .05, *** p < .01
4.4 Tax haven countries

In order to investigate whether the probability of collecting sensitive data might be affected by a country's legal framework, we identify which of our sample countries were considered "tax havens" at the time of data collection; Hong Kong is among them. We construct a binary variable called Tax haven, which takes the value 1 if the country is considered a tax haven jurisdiction. Tax regulation can affect the collection of various types of personal data. Developing an app in a tax haven country may indicate that the developer is intentionally trying to circumvent regulation: collecting personal information can be profitable and is likely to increase the incentive to violate COPPA. Column (1) of Table 7 presents the estimates of the tax haven specification. The estimation in Column (2) includes the interaction terms between the tax regulation variables and DfF. Developers in countries considered tax havens are more likely to collect COPPA-sensitive data even when they opt in to DfF. The other results are consistent with our previous comments.
5 Developers and app heterogeneity

5.1 Developer specialization in children's apps

Compliance with regulation can entail additional costs for companies, especially in the children's app market. To assess whether developers specializing in children's apps are willing to bear these regulation costs when they join DfF, we include in our regression the variable Developer: Child specialization, which measures the ratio of the number of children's apps each developer produces in Google Play to the overall number of apps she produces. Table 8 presents the estimates. All specifications include the interaction term DfF × Child specialization, along with variables for app and developer characteristics and time fixed effects. The regressions are weighted by the number of apps produced in each country. Column (1) of Table 8 presents the estimation for the overall sample. The interaction term DfF × Child specialization is negative and significant, suggesting that children's-app specialists who opt in to DfF are willing to bear regulation costs. To further study the heterogeneity of the specialization effect, we estimate the same specification on different sub-samples. Columns (2), (3) and (4) present the estimates for the sub-samples of apps produced in the USA, China and Hong Kong, respectively; Columns (5) and (6) present the estimates for the sub-samples of apps produced in OECD and non-OECD countries. Column (7) shows estimates for countries without a privacy law. In Columns (1), (2) and (5), the interaction terms DfF × Child specialization are negative and significant, which could indicate that children's app developers in the most advanced countries are able to cover the costs of compliance. In Column (6), the interaction term DfF
× Child specialization is positive and significant, suggesting that developers in non-OECD countries are less likely to bear the costs of regulation.
Table 7: Probit estimation: The effect of tax havens on compliance

                                     (1)         (2)
DfF                               -0.463***   -0.449***
                                  (0.047)     (0.112)
Other countries                    0.388***    0.233**
                                  (0.054)     (0.101)
United States                      0.234***    0.333***
                                  (0.050)     (0.104)
China                              1.169***    0.621**
                                  (0.107)     (0.266)
Hong Kong                          1.185***   -0.421*
                                  (0.093)     (0.244)
Tax haven countries                0.372***   -0.107
                                  (0.081)     (0.218)
Without developer address          0.253***    0.099
                                  (0.059)     (0.094)
High download                     -0.195***   -0.030
                                  (0.066)     (0.088)
DfF × Other countries                          0.205*
                                              (0.118)
DfF × United States                           -0.135
                                              (0.118)
DfF × China                                    0.663**
                                              (0.291)
DfF × Hong Kong                                1.894***
                                              (0.262)
DfF × Tax haven countries                      0.565**
                                              (0.233)
DfF × Without developer address                0.241**
                                              (0.111)
DfF × High download                           -0.268***
                                              (0.086)
Constant                          -0.069       0.047
                                  (0.193)     (0.214)
App characteristics                Yes         Yes
Developer characteristics          Yes         Yes
Week fixed effect                  Yes         Yes
Observations                       92746       92746
Log-likelihood                    -3.153e+07  -3.136e+07

Notes: Probit estimation. Sensitive Data is the dependent variable. Robust standard errors clustered at the app level reported in parentheses. The omitted category in all specifications is Europe. Weighted by the number of developers per country. Significance levels: * p < .10, ** p < .05, *** p < .01
Table 8: Is developers' children specialization reducing costs of compliance?

                            Overall     USA        China     Hong Kong  OECD        Non-OECD    No privacy law
                            (1)         (2)        (3)       (4)        (5)         (6)         (7)
Child Specialization         0.066     -0.045      1.937     -0.707      0.077      -0.370*      1.577**
                            (0.115)    (0.196)    (2.684)    (0.914)    (0.123)     (0.203)     (0.668)
DfF                         -0.360***  -0.508***  -1.136      1.137***  -0.328***   -0.271***   -0.411*
                            (0.076)    (0.130)    (1.054)    (0.387)    (0.072)     (0.099)     (0.250)
DfF × Child Specialization  -0.509***  -0.558**   -0.488      0.586     -0.617***    0.417**    -1.139*
                            (0.134)    (0.221)    (2.797)    (0.927)    (0.137)     (0.211)     (0.689)
High download               -0.130*    -0.144      1.389*     0.642*    -0.179***   -0.063       0.371
                            (0.076)    (0.108)    (0.714)    (0.356)    (0.067)     (0.097)     (0.253)
Constant                     0.117      0.700**  -11.574*** -7.230***    0.261      -0.507       1.854
                            (0.211)    (0.325)    (4.006)    (2.792)    (0.205)     (0.344)     (1.775)
App characteristics          Yes        Yes        Yes        Yes        Yes         Yes         Yes
Developer characteristics    Yes        Yes        Yes        Yes        Yes         Yes         Yes
Week fixed effect            Yes        Yes        Yes        Yes        Yes         Yes         Yes
Observations                 92746      20863      1876       2985       52816       21040       2484
Log-likelihood              -7.021e+09 -11686.934 -588.358   -1264.556  -31147.419  -13375.824  -1446.375

Notes: Probit estimation. Sensitive Data is the dependent variable. App characteristics include all download categories. Robust standard errors clustered at the app level reported in parentheses. Weighted by the number of developers per country. Significance levels: * p < .10, ** p < .05, *** p < .01
5.2 Do big developers collect more data?

We explore developer size to investigate whether it is associated with a different probability of collecting sensitive data. To do this, we add to our regression the variable Number of apps by developer, the number of the developer's apps available on Google Play. This measure allows us to determine whether the behavior of large professional developers differs from that of small developers. Table 9 presents the results. Column (1) includes the interaction terms between Number of apps by developer and the regulation variables. Column (2) presents the full specification, including the interaction term DfF × Number of apps by developer and the vector of regulation measures. For developers located in countries with strong regulation, subscription to the DfF program is unlikely to affect the probability of their apps collecting sensitive data. In countries with weak privacy legislation, the size of the developer has an effect: large developers, including those in countries with weak privacy regulation, seem better able to bear the costs of regulation. This result is confirmed in the case of big developers originating in China and is in line with the theoretical finding in Campbell et al. (2015) that privacy regulation imposes costs on all firms but that small firms are less likely to be able to bear these costs.