The TV is Smart and Full of Trackers: Measuring Smart TV Advertising and Tracking - Privacy Enhancing ...
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Proceedings on Privacy Enhancing Technologies ; 2020 (2):129–154 Janus Varmarken† *, Hieu Le† , Anastasia Shuba, Athina Markopoulou, and Zubair Shafiq The TV is Smart and Full of Trackers: Measuring Smart TV Advertising and Tracking Abstract: In this paper, we present a large-scale mea- surement study of the smart TV advertising and track- 1 Introduction ing ecosystem. First, we illuminate the network behav- Smart TV adoption has steadily grown over the last few ior of smart TVs as used in the wild by analyzing net- years, with more than 37% of US households owning at work traffic collected from residential gateways. We find least one smart TV in 2018, a 16% increase over the that smart TVs connect to well-known and platform- previous year [1]. This growth is driven by two trends. specific advertising and tracking services (ATSes). Sec- First, over-the top (OTT) video streaming services such ond, we design and implement software tools that sys- as Hulu and Netflix have become popular, with more tematically explore and collect traffic from the top-1000 than 28 million and 60 million subscribers in the US, re- apps on two popular smart TV platforms, Roku and spectively [2]. Second, smart TV solutions are available Amazon Fire TV. We discover that a subset of apps at relatively affordable prices, with many of the external communicate with a large number of ATSes, and that smart TV boxes/sticks priced less than $50, while built- some ATS organizations only appear on certain plat- in smart TVs now cost only a few hundreds dollars [3]. forms, showing a possible segmentation of the smart As a result, a diverse set of smart TV platforms have TV ATS ecosystem across platforms. Third, we evaluate emerged, and have been integrated into various smart the (in)effectiveness of DNS-based blocklists in prevent- TV products. For example, Apple TV integrates tvOS, ing smart TVs from accessing ATSes. We highlight that and TCL and Sharp TVs integrate Roku. even smart TV-specific blocklists suffer from missed ads Many of these platforms have their own respective and incur functionality breakage. Finally, we examine app store, where the vast majority of smart TV apps our Roku and Fire TV datasets for exposure of person- are ad-supported [4]. OTT advertising, which includes ally identifiable information (PII) and find that hun- smart TV, is expected to increase by 40% to $2 billion in dreds of apps exfiltrate PII to third parties and plat- 2018 [5]. Roku and Fire TV are two of the leading smart form domains. We also find evidence that some apps TV platforms in number of ad requests [6]. Despite their send the advertising ID alongside static PII values, ef- increasing popularity, the advertising and tracking ser- fectively eliminating the user’s ability to opt out of ad vices (“ATSes”) on smart TVs are currently not well personalization. understood by users, researchers, and regulators. In this Keywords: Smart TV; privacy; tracking; advertising; paper, we present one of the first large-scale measure- blocklists ment studies of the emerging smart TV advertising and DOI 10.2478/popets-2020-0021 tracking ecosystem. Received 2019-08-31; revised 2019-12-15; accepted 2019-12-16. In the Wild Measurements (§3). First, we analyze the network traffic of smart TV devices in the wild. We instrument residential gateways of 41 homes and collect flow-level summary logs of the network traffic gener- *Corresponding Author: Janus Varmarken† : University ated by 57 smart TVs from seven different platforms. of California, Irvine, E-mail: jvarmark@uci.edu The comparative analysis of network traffic by differ- Hieu Le† : University of California, Irvine, E-mail: ent smart TV platforms uncovers similarities and differ- hieul@uci.edu. († The first two authors made equal contribu- ences in their characteristics. As expected, we find that tions and share the first authorship). Anastasia Shuba: Broadcom Inc. (The author was a student a substantial fraction of the traffic is related to popular at the University of California, Irvine at the time the work was video streaming services such as Netflix and Hulu. More conducted), E-mail: ashuba@uci.edu importantly, we find generic as well as platform-specific Athina Markopoulou: University of California, Irvine, E- ATSes. Although realistic, the in the wild dataset does mail: athina@uci.edu not provide app-level visibility, i.e., we cannot deter- Zubair Shafiq: University of Iowa, E-mail: zubair- shafiq@uiowa.edu mine which apps generate traffic to ATSes. To address this limitation, we undertake the following major effort.
The TV is Smart and Full of Trackers 130 Controlled Testbed Measurements (§4). We de- by multiple apps (“app prevalence”) and keyword search sign and implement two software tools, Rokustic for (based on ATS related words like “ads” and “measure”). Roku and Firetastic for Amazon Fire TV, which sys- PII Exposures (§6). We further examine the net- tematically explore apps and collect their network traf- work traces of our testbed datasets and find that hun- fic. We use Rokustic and Firetastic to exercise the top- dreds of apps exfiltrate personally identifiable informa- 1000 apps of their respective platforms, and refer to the tion (PII) to third parties and platform-specific parties, collected network traffic as our testbed datasets. We an- mostly for non-functional advertising and tracking pur- alyze the testbed datasets w.r.t. the top Internet des- poses. Alarmingly, we find that many apps send the ad- tinations the apps contact, at the granularity of fully vertising ID alongside static PII values such as the de- qualified domain names (FQDNs), effective second level vice’s serial number. This eliminates the user’s ability domains (eSLDs), and organizations. We use “domain”, to opt out of personalized advertisements by resetting “endpoint”, and “destination” interchangeably in place the advertising ID, since the ATS can simply link an old of FQDN and eSLD when the distinction is clear from advertising ID to its new value by joining on the serial the context. We further separate destinations as first, number. We evaluate the blocklists’ ability to prevent third, and platform-specific party, w.r.t. to the app that exposures of PII and find that they generally perform contacts them. well for Roku, but struggle to prevent exfiltration of the First, we find that the majority of apps contact few device’s serial number and the device ID to third parties ATSes, while about 10% of the apps contact a large and the platform-specific party for Fire TV. number of ATSes. Interestingly, many of these more Contributions. In this paper, we analyze the network concerning apps come from a small set of developers. behavior of smart TVs, both in the wild and in the lab. Second, we find what appears to be a segmentation of Our contributions include the following: (1) providing the smart TV ATS ecosystem across Roku and Fire TV an in-depth comparative analysis of the ATS ecosystems as (1) the two datasets have little overlap in terms of of Roku, Fire TV, and Android; (2) illuminating the ATS domains; (2) some third party ATSes are among key players within the Roku and Fire TV ATS ecosys- the key players on one platform, but completely absent tems by mapping domains to eSLDs and parent orga- on the other; and (3) apps that are present on both plat- nizations; (3) evaluating the effectiveness and adverse forms have little overlap in terms of the domains they effects of an extensive set of blocklists, including smart contact. Third, we compare the top third party ATS TV specific blocklists; (4) instrumenting long experi- domains of the testbed datasets to those of Android. ments per app to uncover approximately twice as many Evaluation of DNS-Based Blocklists (§5). Users domains as [11]; and (5) making our tools, Rokustic and typically rely on DNS-based blocking solutions such as Firetastic, and our testbed datasets available [12]. Pi-hole [7] to prevent in-home devices such as smart Outline. The structure of the rest of the paper is as fol- TVs from accessing ATSes. Thus, we evaluate the effec- lows. Section 2 provides background on smart TVs and tiveness of DNS-based blocklists, selecting those that reviews related work. Section 3 presents the in the wild are most relevant to smart TVs. Specifically, we ex- measurement and analysis of 57 smart TVs from seven amine and test four popular blocklists: (1) Pi-hole De- different platforms in 41 homes. Section 4 presents our fault blocklist (PD) [7], (2) Firebog’s recommended ad- systematic testing approach and comparative analysis vertising and tracking lists (TF) [8], (3) Mother of all of the top-1000 Roku and Fire TV apps. Section 5 eval- Ad-Blocking (MoaAB) [9], and (4) StopAd’s smart TV uates four well-known DNS-based blocklists and shows specific blocklist (SATV) [10]. Our comparative analysis their limitations through analysis of missed ATSes and shows that block rates vary, with Firebog having the app breakage. Section 6 investigates exfiltration of PII. highest coverage across different platforms and StopAd Section 7 concludes the paper, discusses limitations, and blocking the least. We further investigate potential false outlines directions for future work. The appendix pro- negatives (FN) and false positives (FP). We discover vides additional details and results, as well as an evalu- that blocklists miss different ATSes (FN), some of which ation and discussion of the limitations of our approach. are missed by all blocklists, while more aggressive block- lists can suffer from false positives that result in break- ing app functionality. We discuss two ways to discover false negatives, through observing domains contacted
The TV is Smart and Full of Trackers 131 the main ways for smart TV platforms and app devel- 2 Background & Related Work opers to generate revenue [4]. Roku’s advertising rev- enue exceeded $250 million in 2018 and is expected to Background on Smart TVs. A smart TV is es- more than double by 2020 [13]. Both Roku and Fire TV sentially an Internet-connected TV that has apps for take a 30% cut of the advertising revenue from apps on users to stream content, play games, and even browse their platforms [14]. The smart TV advertising ecosys- the web. There are two types of smart TV products tem mirrors many aspects of the web advertising ecosys- in the market: (1) built-in smart TVs, and (2) exter- tem. Most importantly, smart TV advertising uses pro- nal smart TV boxes/sticks, also referred to as over-the- grammatic mechanisms that allow apps to sell their ad top (OTT) streaming devices. Some TV manufacturers, inventory in an automated fashion using behavioral tar- such as Samsung and Sony, offer TVs with built-in smart geting [15, 16]. TV functionality. While several others provide external The rapidly growing smart TV advertising and asso- box/stick solutions, such as Roku (by Roku), Fire TV ciated tracking ecosystem has already warranted privacy (by Amazon), and Apple TV (by Apple), that convert a and security investigations into different smart TV plat- regular TV into a smart TV. In addition, hybrid prod- forms. Consumer Reports examined privacy policies of ucts exist, as some TV manufacturers now integrate ex- various smart TV platforms including Roku, LG, Sony, ternal box/stick solutions directly into their smart TVs. and Vizio [17]. They found that privacy policies are of- For example, TCL and Sharp offer smart TVs that inte- ten challenging to understand and it is difficult for users grate Roku TV, while Insignia and Toshiba offer smart to opt out of different types of tracking. For instance, TVs that integrate Fire TV. We use “smart TV” as an many smart TVs use Automatic Content Recognition umbrella term for TVs with built-in smart TV function- (ACR) to track their users’ viewing data and use it to ality and OTT streaming devices. serve targeted ads [18]. Vizio paid $2.2 million to settle There is a diverse set of smart TV platforms, each the charges by the Federal Trade Commission (FTC) with its own set of apps that users can install on their that they were using ACR to track users’ viewing data TVs. Many smart TVs use an Android-based operating without their knowledge or consent [19]. While smart system (e.g., Sony, AirTV, Philips) or a modified version TV platforms now allow users to opt out of such track- of it (e.g., Fire TV). Regular Android TVs have access ing, it is not straightforward for users to turn it off [20]. to apps from the Google Play Store, while Fire TV has Further, even with ACR turned off, users still must its own app store controlled by Amazon. In both cases, agree to a basic privacy policy that asks for the right to applications for such TVs are built in a manner simi- collect data about users’ location, choice of apps, etc. lar to regular Android applications. Likewise, Apple TV apps are built using technologies and frameworks that Related Work. While the desktop [21–23] and mo- are also available for iOS apps, and both types of apps bile [24–26] ATS ecosystems have been thoroughly stud- can be downloaded from Apple’s App Store. ied, the smart TV ATS ecosystem has not been exam- Some smart TV platforms are distinct as compared ined at scale until recently. to traditional Android or iOS. For example, Samsung Three concurrent papers studied the network be- smart TV apps are built for their own custom platform havior and privacy implications of smart TVs [11, 27, called Tizen and are downloadable from the Tizen app 28]. Ren et al. [27] studied a large set of IoT de- store. Likewise, applications for the Roku platform are vices, spanning multiple device categories. Their re- built using a customized language called BrightScript, sults showed that smart TVs were the category of de- and are accessible via the Roku Channel Store. Yet an- vices that contacted the largest number of third parties, other line of smart TVs such as LG smart TV and Hy- which further motivates our in-depth study of the smart brid broadcast broadband TV (HbbTV) follow a web- TV ATS ecosystem. Huang et al. [28] used crowdsourc- based ecosystem where applications are developed using ing to collect network traffic for IoT devices in the wild HTML, CSS, and JavaScript. Finally, some smart TV and showed that smart TVs contact many trackers by platforms, such as Chromecast, do not have app stores matching the contacted domains against the Disconnect of their own, but are only meant to “cast” content from blocklist. Finally, Moghaddam et al. [11] also instru- other devices such as smartphones. mented the Roku and Fire TV platforms to map the As with mobile apps, smart TV apps can integrate ATS endpoints and the exposure of PII. Our work in- third-party libraries and services, often for advertising dependently confirms the findings of these works w.r.t. and tracking purposes. Serving advertisements is one of the smart TV ATS ecosystem, both by analyzing seven
The TV is Smart and Full of Trackers 132 different smart TV platforms in the wild and by per- example, hulu.com belongs to the Walt Disney Com- forming systematic tests of two platforms (Roku and pany and youtube.com belongs to Alphabet. Fire TV) in the lab. In addition, we further contribute App-Level Party Categorization. The app-level vis- along two fronts. First, we show that even the same app ibility in our testbed experiments (Sec. 4) enables cate- across different smart TV platforms contact different gorization of an Internet destination as a first party or a ATSes, which shows the fragmentation of the smart TV third party w.r.t. app generating the traffic. We provide ATS ecosystem. Second, we evaluate the effectiveness of an overview of the technique here and defer details to different sets of blocklists, including smart TV specific Appendix C.1. blocklists, in terms of their ability to prevent ads and We adopt a technique similar to prior work [24], and their adverse effects on app functionality. We also sug- we augment it to also include a platform-specific party gest ways to aid blocklist curation through analysis of for traffic to platform-related destinations. We match domain usage across apps and PII exposures. tokenized eSLDs with tokenized package/app names Earlier work in this space includes [29] by Ghiglieri and developer names. If the tokens match, we label the and Tews, who studied how broadcasting stations could domain as first party. Otherwise, if the traffic originated track viewing behavior of users in the HbbTV plat- from platform activity rather than app activity, we la- form. In contrast to the rich app-based platforms we bel it as platform-specific party: for Fire TV, AntMoni- study, the HbbTV platform studied in [29] contained tor [37] labels connections with the responsible process; only one HbbTV app that uses HTML5-based overlays for Roku, we check if the eSLD contains “roku”. Other- to provide interactive content. Related to our work, they wise, if the domain is contacted by at least two different found that the HbbTV app loaded third-party tracking apps from different developers, we label it as third party. scripts from Google Analytics. Malkin et al. [30] sur- Lastly, we resort to labeling it as other to capture do- veyed 591 U.S. Internet users about their expectations mains that are only contacted by a single app. on data collection by smart TVs. They found that users would rather enjoy new technology than worry about privacy, and users thus over rely on existing laws and regulations to protect their data. 3 Smart TV Traffic in the Wild In this section, we study the network behavior of smart 2.1 Labeling Methodology TV devices when used by real users by analyzing a dataset collected at residential gateways of tens of Throughout this paper, we provide insight into the homes (we refer to this dataset as the in the wild smart TV ATS ecosystems by labeling a domain ac- dataset). We compare the number of flows and traffic cording to (1) its purpose (ATS or non-ATS); (2) its volumes generated by smart TVs from seven different parent organization (i.e., the domain owner); and (3) its platforms. We analyze the most frequently used domains relation to the app that uses it (first, third, or platform- of each platform by identifying ATS domains and map- specific party). We detail this methodology below. ping each domain to its parent organization. ATS Domains. We identify ATS domains as follows. Data Collection. To study smart TV traffic character- For figures that denote top domains, we check if the istics in the wild, we monitor network traffic of 41 homes FQDN is labeled as ads or tracking by VirusTotal, in a major metropolitan area in the United States. We McAfee, OpenDNS [31–33], or if it is blocked by any sniff network traffic of smart TV devices at the resi- of the blocklists considered in Sec. 5. For figures and dential gateways using off-the-shelf OpenWRT-capable tables that involve entire datasets, we only consider the commodity routers. We collect flow-level summary in- blocklists due to the impracticality of manually labeling formation for network traffic. For each flow, we collect thousands of data points. its start time, FQDN of the external endpoint (using DNS), and the internal device identifier. We identify Parent Organizations. To understand the presence of smart TVs using heuristics that rely on DNS, DHCP, different organizations on smart TV platforms, we map and SSDP traffic and also manually verify the identi- each FQDN to its effective second level domain (eSLD) fied smart TVs by contacting users. Our data collection using Mozilla’s Public Suffix List [34, 35], and use covers a total of 57 smart TVs across 41 homes over Crunchbase [36]’s acquisition and sub-organization in- the duration of approximately 3 weeks in 2018. Note formation to find the parent company of the eSLD. For that we obtained written consent from users, informing
The TV is Smart and Full of Trackers 133 Number of Flows Number of Flows vortex.hulu.com insights-collector.newrelic.com api-global.netflix.com api-global.netflix.com cws-us-east.conviva.com vortex.hulu.com api-global.[1].netflix.com home.hulu.com time-ios.apple.com aeg-personalization.quickplay.com e673.e9.akamaiedge.net uwp-aeg-hbs.quickplay.com play.hulu.com http-v-darwin.hulustream.com home.hulu.com occ-1-1736-999.1.nflxso.net http-v-darwin.hulustream.com d2lkq7nlcrdi7q.cloudfront.net occ-1-1736-999.1.nflxso.net cws-us-east.conviva.com time-ios.g.aaplimg.com cws-110*57.[2].amazonaws.com 1-courier.push.apple.com init-p01st.push.apple.com http-e-darwin.hulustream.com giga.logs.roku.com e1042.b.akamaiedge.net liberty.logs.roku.com occ-0-1736-999.1.nflxso.net occ-0-1736-999.1.nflxso.net itunes.apple.com.edgekey.net ichnaea.netflix.com occ-2-1736-999.1.nflxso.net dtvn-live-sponsored.akamaized.net pt.hulu.com midland.logs.roku.com cws-110*57.[2].amazonaws.com cws-eu-west-1.conviva.com cws-189*47.[2].amazonaws.com cdn-0.nflximg.com d2hzeyj6b557bu.cloudfront.net http-e-darwin.hulustream.com http-v-[3].footprint.net occ-2-1736-999.1.nflxso.net t2.hulu.com atv-ext.amazon.com oca-api.geo.netflix.com init-p01st.push.apple.com b.scorecardresearch.com xp.itunes-apple.com.akadns.net cws.conviva.com auth.hulu.com stream.nbcsports.com e1042.e12.akamaiedge.net tp.akam.nflximg.com mt-ingestion-[4].akadns.net scribe.logs.roku.com nrdp.nccp.netflix.com pubads.g.doubleclick.net a1910.b.akamai.net 0 10k 20k 30k 0 10k 20k 30k (a) Roku (b) Apple Number of Flows api-global.netflix.com mobile-collector.newrelic.com feed.theplatform.com Number of Flows hh.prod.sony[6].com log-ingestion.samsungacr.com android.clients.google.com www.youtube.com www.fox.com api-global.netflix.com flingo.tv lcprd1.samsungcloudsolution.net ichnaea.netflix.com android.clients.google.com pubads.g.doubleclick.net log-2.samsungacr.com clients3.google.com i.ytimg.com www.lookingglass.rocks vortex.hulu.com i.ytimg.com occ-1-1736-999.1.nflxso.net clients4.google.com youtube-ui.l.google.com api.twitter.com play.googleapis.com clients1.google.com artist.api.lv3.cdn.hbo.com ypu.samsungelectronics.com js-agent.newrelic.com i9.ytimg.com www.googleapis.com occ-0-1736-999.1.nflxso.net mtalk.google.com nrdp.nccp.netflix.com comet.api.hbo.com ytimg.l.google.com profile.localytics.com s.youtube.com occ-0-586-590.1.nflxso.net clients4.google.com video-stats.l.google.com connectivitycheck.gstatic.com occ-2-1736-999.1.nflxso.net occ-1-586-590.1.nflxso.net osb.samsungqbe.com assets.fox.com t2.hulu.com sp.auth.adobe.com ichnaea.netflix.com occ-2-586-590.1.nflxso.net http-v-darwin.hulustream.com api.meta.ndmdhs.com upu.samsungelectronics.com api.fox.com tv.deezer.com youtubei.googleapis.com dpu.samsungelectronics.com ocfconnect-[7].samsungiotcloud.com cdn.meta.ndmdhs.com googleads.g.doubleclick.net 0 2k 4k 0 50k 100k (c) Sony (d) Samsung Fig. 1. Top-30 fully qualified domain names in terms of number of flows per device for a subset of the smart TVs in the “in the wild” dataset. See Appendix C.2 for the other brands. Domains identified as ATS are highlighted with red, dashed bars. them of our data collection and research objectives, in external smart TV devices. Thus, we do not differentiate accordance with our institution’s IRB guidelines. between built-in vs. external Roku smart TV devices. Dataset Statistics. Table 1 lists basic statistics of We expect smart TV devices to generate significant smart TV devices observed in our dataset. Overall, we traffic because they are typically used for OTT video note 57 smart TVs from 7 different vendors/platforms streaming [38]. Chromecast devices generate the highest using a variety of technologies. number of flows (exceeding 200 thousand flows) on aver- Devices can be built-in smart TVs such as Samsung age, while Samsung, Apple, and Roku devices generate and Sony, others like Chromecast and Apple TV can nearly 50 thousand flows on average. Roku devices gen- be external stick/box solutions, while devices like Roku erate the highest volume of flows (exceeding 80 GB) on can have both forms. For example, 7 out 9 Roku devices average, with one Roku generating as much as 283 GB. in our dataset were built-in Roku smart TVs, while the Except for LG and Sony devices, all smart TV devices remaining two were external Roku sticks. Note that a generate at least tens of GBs worth of traffic on aver- smart TV platform such as Roku supports the same age. Finally, we note that smart TV devices typically set of apps and a similar interface for both built-in and connect to hundreds of different endpoints on average.
The TV is Smart and Full of Trackers 134 Apple Smart Device Average Average Average NBCUniversal Media comScore TV Count Flow Flow eSLD AT&T Amazon Platform Count Volume Count AEG /Device /Device /Device Roku Conviva (x 1000) (GB) Apple TV Unknown/CDN Apple 16 49.3 46.6 536 Samsung 11 62.6 33.2 369 Walt Disney Company Chromecast 10 201.9 26.3 354 Roku Twitter Deezer Roku 9 48.1 83.0 543 Samsung Vizio 6 43.4 63.4 278 Samsung Netflix LG 4 10.9 0.9 1893 New Relic Sony 1 33.1 0.1 186 Vizio NFL Table 1. Traffic statistics of 57 smart TV devices observed across LG Vizio Verizon 41 homes (“in the wild” dataset). LG Kodi Sony Bravia Sony Localytics Flingo Comcast Endpoint Analysis. Fig. 1 plots the top-30 FQDNs in Chromecast Adobe Systems Time Warner Fox terms of flow count for Roku, Apple, Sony, and Samsung smart TV platforms. The plots for the remaining smart TV platforms are in Appendix C.2. Alphabet We note several similarities in the domains accessed by different smart TV devices. First, video streaming Fig. 2. Mapping of platforms measured in the wild to the par- services such as Netflix and Hulu are popular across the ent organizations of the endpoints they contact (for the top-30 board, as evident from domains such api-global.netflix. FQDNs of each platform). The width of an edge indicates the com and vortex.hulu.com. Second, cloud/CDN services number of distinct FQDNs within that organization that was ac- such as Akamai and AWS (Amazon) also appear for cessed by the platform. different smart TV platforms. Smart TVs likely connect zations such as Apple on the other end of the spectrum. to cloud/CDN services because popular video stream- Furthermore, it reveals the presence of organizations like ing services typically rely on third party CDNs [39, 40]. Conviva, comScore, and Localytics, whose main busi- Third, we note the prevalence of well-known adver- ness is in the advertising and tracking space. We note tising and tracking services (ATSes). For example, that Samsung, Deezer, Roku, LG, and Flingo have the *.scorecardresearch.com and *.newrelic.com are third majority of their domains labeled as ATSes while Net- party tracking services, and pubads.g.doubleclick.net is flix and Walt Disney Company have less than half of a third party advertising service. their domains labeled as ATSes. We notice several platform-specific differences in the domains accessed by different smart TV platforms. Takeaway & Limitations. Traffic analysis of differ- For example, giga.logs.roku.com (Roku), time-ios.apple. ent smart TV platforms in the wild highlights interest- com (Apple), hh.prod.sonyentertainmentnetwork.com ing similarities and differences. As expected, all devices (Sony), and log-ingestion.samsungacr.com (Samsung) generate traffic related to popular video streaming ser- are unique to different types of smart TVs. In addition, vices. In addition, they also access ATSes, both well- we notice platform-specific ATSes. For example, the fol- known and platform-specific. While our vantage point lowing advertising-related domains are not in the top-30 at the residential gateway provides a real-world view of (and therefore not pictured in Fig. 1), but are unique to the behavior of smart TV devices, it lacks granular in- different smart TV platforms: p.ads.roku.com (Roku), formation beyond flows (e.g., packet-level information) ads.samsungads.com (Samsung), and us.info.lgsmartad. and does not tie traffic to the app that generate it. An- com (LG). other limitation of this analysis is that the findings may be biased by the viewing habits of users in these 41 Organizational Analysis. Figure 2 illustrates the mix households. It is unclear how to normalize to provide of different parent organizations contacted by the seven a fair comparison of endpoints accessed by the differ- smart TV platforms in our dataset. It shows the preva- ent smart TV platforms. We address these limitations lence of Alphabet in smart TV platforms like Chrome- cast, Sony, and LG, while revealing competing organi-
The TV is Smart and Full of Trackers 135 next by systematically analyzing two popular smart TV libraries is still possible. For example, the Ooyala IQ platforms in a controlled testbed. SDK [46] provides various analytics services that can be integrated into a Roku app. Thus, such libraries can help ATSes learn the viewing habits of users by collect- ing data from multiple apps. In terms of permissions, 4 Systematic Testing of the Roku Roku only protects microphone access with a permis- and Fire TV Platforms sion and does not require any permission to access the advertising ID. Users can choose to reset this ID and opt In this section, we perform an in-depth, systematic out of targeted advertising at any time [45]. However, study of two smart TV platforms, Roku and Amazon apps and libraries can easily create other IDs or use fin- Fire TV, which we chose since they are popular [41], gerprinting techniques to continue tracking users even affordable ($25), and among the leading smart TV plat- after opt-out. We further elaborate on this in Sec 6. forms in terms of number of ad requests [6]. Sections 4.1 App Selection. The RCS provides a web (and on- and 4.2 present our measurement approach for system- device) interface for browsing the available Roku apps. atically testing the top-1000 apps in each platform while To the best of our knowledge, Roku does not pro- collecting their network traffic. Since app exploration is vide public documentation on how to programmatically automated, no real users are involved, thus no IRB is query the RCS. We therefore reverse-engineer the REST needed. Our measurement approach provides visibility API backing the RCS web interface by inspecting the into the behaviors of individual apps, which was not HTTP(S) requests sent while browsing the RCS, and possible from the vantage point used for the in the wild use this insight to write a script that crawls the RCS dataset. In Section 4.3, we analyze the two testbed for the metadata of all (8515 as of April 2019) apps. datasets, and compare them to each other and to the To test the most relevant apps, we select the Android ATS ecosystem. top-50 apps in 30 out of the 32 categories. We exclude “Themes” and “Screensavers” since these apps do not show up among the regular apps on the Roku and there- 4.1 Roku Data Collection fore cannot be operated using our automation software. We base our selection on the “star rating count”, which In this section, we describe the Roku platform and our we interpret as the review count. Roku apps can be app selection methodology, and present an overview of labeled with multiple categories, thus some apps con- Rokustic—our software tool that automatically explores tribute to the top-50 of multiple categories. This places Roku apps. We use Rokustic to explore and collect traf- the final count of apps in our dataset at 1044. fic from 1044 Roku apps. The resulting network traces are analyzed in Sec. 4.3. Automation (Rokustic). To scale testing of apps, we implement a software tool, Rokustic, that automatically Roku Platform. We start by describing the Roku plat- installs and exercises Roku apps. Due to lack of space, form, which has its own app store—the Roku Chan- we only provide a brief overview here and defer addi- nel Store [42] (RCS)—that offers more than 8500 apps, tional details to Appendix A.1. called “channels”. For security purposes, Roku sand- We run Rokustic on a Raspberry Pi that acts as a boxes each app (apps are not allowed to interact or ac- router and hosts a local wireless network that the Roku cess the data of other apps) and provides limited access is connected to. Rokustic utilizes ECP [47], a REST-like to system resources [43]. Furthermore, Roku apps can- API exposed by the Roku device, to control the Roku. not run in the background. Specifically, app scripts are Given a set of apps to exercise, Rokustic installs each only executed when the user selects a particular app, app by invoking the ECP endpoint that opens up the and when the user exits the app, the script is halted, on-device version of the RCS page for the app, and then and the system resumes control [44]. sends a virtual key press to click the “Add Channel” To display ads, apps typically rely on the Roku Ad- button. To exercise apps, Rokustic first invokes the ECP vertising Framework which is integrated into the Roku endpoint that returns the set of installed apps. For each SDK [45]. The framework allows developers to use ad app, Rokustic then (1) starts tcpdump on the Raspberry servers of their preference and updates automatically Pi’s wireless interface; (2) uses ECP to launch the app without requiring the developer to rebuild the app. Even and invoke a series of virtual key presses in an attempt though such a framework eliminates the need for third to incur content playback; (3) pauses for five minutes to party ATS libraries, the development and usage of such
The TV is Smart and Full of Trackers 136 let the content play; (4) exits the app and repeats from Number of Roku Fire TV Both step 2 an additional two times; (5) terminates tcpdump. Apps exercised 1044 1010 128 Since Roku apps cannot execute in the background Fully qualified domain names (FQDN) 2191 1734 578 FQDNs accessed by multiple apps 669 603 199 (see “Roku Platform”), all captured traffic will belong to URL paths 13899 240713 74 the exercised app and the Roku system. The total inter- Table 2. Summary of the Roku and Fire TV testbed datasets. action time with each app is approximately 16 minutes. The rightmost column summarizes the intersection between the We do not attempt to decrypt TLS traffic as we cannot two testbed datasets. For example, there are 128 apps that are install our own self-signed certificates on the Roku. present both in the Roku dataset and the Fire TV dataset. 4.2 Fire TV Data Collection We rely on AntMonitor [37, 48], an open-source VPN-based library, to intercept all outgoing network In this section, we describe the Fire TV platform, our traffic from the Fire TV, and to label each packet with app selection methodology, and present an overview the package name of the application (or system pro- of Firetastic—our software tool that automatically ex- cess) that generated it. We enable AntMonitor’s TLS plores Fire TV apps. By using Firetastic to control six decryption for added visibility into PII exposures. We Fire TV devices in parallel, we explore and collect traf- analyze the success of TLS decryption in Appendix B. fic from 1010 Fire TV apps within a one week period. In summary, TLS decryption was generally successful The resulting network traces are analyzed in Sec. 4.3. with 10% or fewer failures for 55% of all apps, and 20% Fire TV Platform. Although Fire TV is made by or fewer failures for 80% of all apps. Amazon, its underlying operating system, Fire OS, is For app exploration, we utilize DroidBot [49], a a modified version of Android. This allows apps for Python tool that dynamically maps the UI and sim- Fire TV to be developed in a similar fashion to An- ulates user inputs such as button presses using the An- droid apps. Therefore, all third-party libraries that are droid Debug Bridge (ADB). To increase the probability available for Android apps can also be integrated into of content playback, we configure DroidBot to utilize Fire TV apps. Similarly, application sandboxing and its breadth first search algorithm to explore each app. permissions in Fire TV are analogous to those of An- The intuition is that the main content is often made droid, and any permission requested by the app is inher- available from top-level UI elements. ited by all libraries that the app includes. This allows In summary, for each app, Firetastic: (1) starts third party libraries to track users across apps using AntMonitor; (2) explores the app for 15 minutes; (3) a variety of identifiers, such as Advertising ID, Serial stops AntMonitor; and (4) extracts the .pcapng files Number, and Device ID, etc. We further discuss track- that were generated during testing. We use Firetastic ing through PII exposure in Section 6. to explore apps on six Fire TV devices in parallel. Our test setup is resource-efficient and scalable, using only App Selection. To test the most relevant apps, we pick one computer to send commands to multiple Fire TVs. the top-1000 apps from Amazon’s curated list of “Top Featured” apps. We ignore some apps that use a local VPN (as they would conflict with AntMonitor), that could not be installed manually, and utility apps that 4.3 Comparing Roku and Fire TV can change the device settings (which would affect the test environment). As a result, we ignore around 200 In this section, we analyze the (ATS) domains accessed apps while including 1010 testable applications. Ama- by the apps in the Roku and Fire TV testbed datasets. zon’s app store offers around 4,000 free apps at the time We first provide an overview of the datasets, and analyze of writing, thus our dataset covers approximately 25%. how many (ATS) domains apps contact. We then look Automation (Firetastic). We design and implement closer at the eSLDs and third party ATS domains that a software tool, Firetastic, that integrates the capabil- are contacted by the most apps, including which parent ities of two open source tools for Android: an SDK for organizations they belong to. Furthermore, we compare network traffic collection and a tool for input automa- the top third party ATS domains to those of Android. tion. We provide a brief overview of Firetastic here, and Finally, we compare the domains accessed by apps that defer additional details to Appendix A.2. are present on both testbed platforms.
The TV is Smart and Full of Trackers 137 1.0 1.0 CDF of Tested Apps 0.8 CDF of Tested Apps 0.8 0.6 0.6 0.4 0.4 0.2 0.2 Roku Roku Fire TV Fire TV 0.0 0.0 0 20 40 60 80 100 0 20 40 60 80 Number of Distinct Domains Number of Distinct ATS Domains (a) Roku & Fire TV: Distinct domains per app. (b) Roku & Fire TV: Distinct ATS domains per app. A do- main is considered an ATS if it is labeled as so by any of the blocklists considered in Sec. 5. Number of Apps Number of Apps roku.com amazon.com doubleclick.net amazonaws.com googlesyndication.com amazon-adsystem.com google-analytics.com amazonvideo.com scorecardresearch.com media-amazon.com amazonaws.com doubleclick.net spotxchange.com amazon-dss.com ifood.tv cloudfront.net tremorhub.com crashlytics.com irchan.com googleapis.com vimeo.com amazonalexa.com 2mdn.net google-analytics.com 1rx.io googlesyndication.com ravm.tv ssl-images-amazon.com vimeocdn.com facebook.com imrworldwide.com google.com adrta.com flurry.com akamaized.net scorecardresearch.com innovid.com ifood.tv insightexpressai.com gstatic.com demdex.net youtube.com gvt1.com uplynk.com adsrvr.org ytimg.com akamaihd.net serving-sys.com monarchads.com unity3d.com agkn.com Platform Party akamaihd.net Platform Party cloudfront.net googlevideo.com springserve.com Third Party moatads.com Third Party stickyadstv.com First Party adobe.com First Party ewscloud.com titantv.com 0 250 500 750 1000 0 500 1000 (c) Roku: Top-30 eSLDs. (d) Fire TV: Top-30 eSLDs. Number of Apps Number of Apps pubads.g.doubleclick.net e.crashlytics.com tpc.googlesyndication.com pubads.g.doubleclick.net www.google-analytics.com ssl.google-analytics.com search.spotxchange.com pagead2.googlesyndication.com securepubads.g.doubleclick.net googleads.g.doubleclick.net ad.doubleclick.net imasdk.googleapis.com event.spotxchange.com settings.crashlytics.com b.scorecardresearch.com data.flurry.com sb.scorecardresearch.com graph.facebook.com googleads4.g.doubleclick.net b.scorecardresearch.com ade.googlesyndication.com reports.crashlytics.com gcdn.2mdn.net csi.gstatic.com adrta.com www.google-analytics.com PD[0].ads.tremorhub.com config.uca.cloud.unity3d.com secure.insightexpressai.com applab-sdk.amazon.com events.tremorhub.com ad.doubleclick.net dpm.demdex.net z.moatads.com cm.g.doubleclick.net ade.googlesyndication.com insight.adsrvr.org cdp.cloud.unity3d.com ctv.monarchads.com dpm.demdex.net 0 250 500 0 50 100 (e) Roku: Top-20 third party ATS domains. (f) Fire TV: Top-20 third party ATS domains. Fig. 3. Analysis of domain usage per app and the top domains across all apps in the Roku and Fire TV testbed datasets.
The TV is Smart and Full of Trackers 138 Oracle domains contacted per app in Fig. 3b, the vast majority Amazon Unity Tech (80%) of apps from the two platforms behave similarly. Numitas On the positive side, for both platforms, around Facebook Adobe Systems 60% of the apps contact only a small handful ATS comScore domains. Yet, about 10% of the Roku and Fire TV Fire TV The Trade Desk apps contact 20+ and 10+ ATS domains, respectively. Barons Media PIxalate These concerning apps come from a small set of devel- Bain Capital opers. For instance, for Roku, “Future Today Inc.” [50], Telaria “8ctave ITV”, and “StuffWeLike” [51] are responsible RTL Group for 51%, 13%, and 11%, respectively, of these apps. Roku On the Fire TV side, “HTVMA Solutions, Inc.” [52] is responsible for 15% of the apps, and “Gray Televi- sion, Inc.” [53] is responsible for 12% of the apps. Alphabet Key Players. Figures 3c (Roku) and 3d (Fire TV) present the top-30 eSLDs in terms of the number of apps that contact a subdomain of the eSLD. We define an eSLD’s app penetration as the percentage of apps in the dataset that contact the eSLD. Fig. 4. Mapping of platforms measured in our testbed environ- Platform. The top eSLD of each platform has 100% ment to the parent organizations of the top-20 third-party ATS app penetration and belongs to the platform operator. domains their apps contact. The width of an edge indicates the While these eSLDs also cover subdomains that provide number of apps that contact each organization. functionality, we note that both platform operators are engaged in advertising and tracking, as shown in Fig. 9 Overview. The datasets collected using Rokustic and in Appendix C.3, which separates traffic to the eSLDs Firetastic are summarized in Table 2. For Roku, we dis- in Figs. 3c and 3d by ATS and non-ATS FQDNs. cover 2191 distinct FQDNs, 699 of which are contacted Third Parties. As evident from Figs. 3c and 3d, Al- by multiple apps. For Fire TV, we discover 1734 dis- phabet has a strong presence in the ATS space of both tinct FQDNs, 603 of which are contacted by multiple platforms, with *.doubleclick.net, an ad delivery end- apps. We also find 578 FQDNs that appear in both point, achieving 58% and 35% app penetration for Roku datasets, 199 of which are contacted by multiple apps. and Fire TV, respectively. Its analytic services also rank Our automation uncovers approximately twice as many high, with google-analytics.com in the top-20 for both FQDNs as [11], possibly due to longer experiments and platforms and crashlytics.com in the top-10 for Fire TV. different app exploration goals. We further detail this in To better understand additional key third party ATSes, Appendix A.3. we strip away the platform-specific endpoints and re- Leveraging the blocklists from Sec. 5, we identify port the top-20 third party ATS FQDNs for Roku and 314 ATS domains that are unique to the Roku dataset, Fire TV in Figs. 3e and 3f, respectively. We note that 285 that are unique to the Fire TV dataset, and an over- both platforms use distinct third party ATSes. For ex- lap of 227 between the two datasets. When considering ample, SpotX (*.spotxchange.com), which serves video eSLDs of the ATS domains, we find 68 eSLDs that are ads, is a significant player in the Roku ATS space with unique to the Roku dataset, 100 that are unique to the 17% app penetration, but only maintains 1% app pene- Fire TV dataset, and an overlap of 138 eSLDs. These tration for Fire TV. Even when considering the smaller numbers suggest that the ATS ecosystems of the two players, we see little overlap between the two platforms, platforms have substantial differences, which we ana- suggesting these players focus their efforts on a single lyze in further detail later in this section. platform. For example, Kantar Group’s insightexpres- sai.com analytics service has 7% app penetration on the Number of (ATS) Domains Contacted. Figure 3a Roku platform, but only 0.01% on the Fire TV platform. presents the empirical CDF of the number of distinct domains each app in the two testbed datasets contacts. Parent Organization Analysis. We further analyze Fire TV apps appear more “chatty”—most Fire TV the parent organizations of Roku and Fire TV third apps contact about twice as many domains as the Roku party ATS endpoints in Fig. 4, using the method de- apps. However, when we consider the number of ATS scribed earlier in Section 2.1. Interestingly, the set of top
The TV is Smart and Full of Trackers 139 120 Common FireTV 100 Number of Hostnames Roku 80 60 40 20 0 Mediterranean Food CNNgo VH1 Pluto TV P. Allen Smith Free Movies Now Relax My Dog Italian Recipes BET Philo Baking by ifood.tv Hallmark Channel Bloomberg Thai recipes Comedy Central Cool School Paramount Network MTV Lifetime Chinese Recipes A&E BestCooks iFood.tv AMC Stand-up Comedy Low Carb 360 CBS Sports Stream HISTORY Fun With Roblox by CraftSmart Telemundo Deportes Watch TBS Watch TNT The Holy Tales Bible BBC America Blippi Drinks Fox Business Network Pilates Red Bull TV Om Nom Stories Haystack TV TechSmart.tv MOOVIMEX NFL Sunday Ticket PlayStation Vue ABC News Smithsonian Channel Baby By HappyKids.tv Sling TV Now You Know JTV Live iHeartRadio Wow, I Never Knew That WSB-TV Channel 2 Outside TV Features UP Faith & Family The BILLIARD Channel FilmRise Kids NRA TV HappyKids Fig. 5. Top-60 common apps (apps present in both testbed datasets) ordered by the number of domains that each app contacts. Con- sidering all 128 common apps, there are 597 domains which are exclusive to Roku apps, 496 domains which are exclusive to Fire TV apps, and 155 domains which are contacted by both the Roku and the Fire TV versions of the same app. third party organizations is rather diverse, with only a (data.flurry.com) both have a strong presence on both slight overlap in the shape of Adobe Systems and com- Fire TV and Android. Some of the third party ATS ob- Score, possibly suggesting that the remaining organiza- served for Fire TV, which were not present for Android, tions focus their efforts on a single platform. Fire TV include comScore, Adobe (dpm.demdex.net), and Ama- shows gaming and social media ATSes from Unity Tech zon (applab-sdk.amazon.com). and Facebook, whereas Roku exhibits more traffic to Common Apps in Roku and Fire TV Next, we ATSes from companies that focus on video ads such as compare the Roku and Fire TV datasets at the app- The Trade Desk, Telaria, and RTL Group. Similar to the level by analyzing the FQDNs accessed by the set of in the wild organization analysis in Fig 2, we again note apps that appear on both platforms, referred to as com- that Alphabet dominates the set of third party ATSes mon apps. Recall from Table 2 that the datasets col- on both Roku and Fire TV. lected using Rokustic and Firetastic contain a total of Comparing to Android ATS Ecosystem. Next, we 128 common apps. We identified common apps by fuzzy compare the top-20 third party ATS endpoints in our matching app names since they sometimes vary slightly Roku and Fire TV datasets (Figs. 3e and 3f) with those for each platform (e.g., “TechSmart.tv” on Roku vs. reported for Android [24]. “TechSmart” on Fire TV). We further cross-referenced Roku vs. Android. The key third party ATSes with the developer’s name to validate that the apps were in Roku (Fig. 3e) differ from the Android platform. indeed the same (e.g., both TechSmart apps are created For example, SpotX (*.spotxchange.com) and comScore by “Future Today”). The 128 common apps contact a (*.scorecardresearch.com) both have a strong presence total of 1248 distinct FQDNs. Out of these, 597 FQDNs on Roku, but are not among the key players for An- are exclusively contacted by Roku apps, 496 are exclu- droid. In contrast, Facebook’s graph.facebook.com is sively contacted by Fire TV apps, and only 155 FQDNs the second most popular ATS domain on Android, are contacted by both Roku and Fire TV apps. but insignificant on Roku. The set of top third party Figure 5 reports the amount of overlapping and non- ATSes in Roku is also more diverse and includes overlapping FQDNs for the top-60 common apps (in smaller organizations such as Pixalate (adrta.com) terms of the number of distinct FQDNs that each app and Telaria (*.tremorhub.com). While Alphabet has a contacts). In general, the set of FQDNs contacted by strong foothold in both ATS ecosystems, it is less sig- both the Roku and the Fire TV versions of the same nificant for Roku (9 out 20 ATS FQDNs are Alphabet- app is much smaller than the set of platform-specific owned, vs. 16 out of 20 for Android). FQDNs. From inspecting the common FQDNs for some Fire TV vs. Android. In contrast to Roku, Fire TV of the apps in Fig. 5, we find that these generally include is more similar to Android: we see an overlap of 9 endpoints that serve content. For example, for Mediter- FQDNs, 7 of which are owned by Alphabet. This ranean Food, the only two common FQDNs are subdo- is expected, given that Fire TV is based off of An- mains of ifood.tv, which belong to the parent organiza- droid and thus natively supports the ATS services of tion behind the app. This makes intuitive sense as the Android. Facebook (graph.facebook.com) and Verizon same app presumably offers the same content on both
The TV is Smart and Full of Trackers 140 platforms and must therefore access the same servers to Block Rate (%) download said content. On the other hand, the platform- Platform # Domains PD TF MoaAB SATV specific domains contain obvious ATS endpoints such Dataset obtained “in the wild” Apple 3179 10% 13% 12% 5% as ads.yahoo.com and ads.stickyadstv.com for the Roku Samsumg 1765 14% 19% 15% 8% version of the app, and aax-us-east.amazon-adsystem. Chromecast 1576 9% 15% 15% 5% com and mobileanalytics.us-east-1.amazonaws.com for Roku 2312 15% 19% 18% 7% the Fire TV version of the app. In conclusion, our anal- Vizio 942 16% 18% 16% 11% ysis of common apps reveals (to our surprise) little over- LG 627 45% 54% 50% 27% lap in the ATS endpoints they access, which further sug- Sony 119 16% 24% 16% 7% Dataset obtained in our testbed gests that the smart TV ATS ecosystem is segmented Roku 2191 17% 22% 20% 9% across platforms. Fire TV 1734 22% 27% 22% 9% Takeaway. The ATS ecosystems of the Roku and Table 3. Block rates of the four blocklists when applied to the Fire TV platforms seem to differ substantially: (1) the domains in our datasets. full set of ATS domains contacted by apps in the two datasets have little overlap; (2) some organizations are key players on one platform, but almost absent on the Setup. We evaluate the following blocklists: other (e.g., SpotX has a significant presence on Roku, 1. Pi-hole Default (PD): We test blocklists included but is almost absent on Fire TV, whereas Facebook has in Pi-hole’s default configuration [56] to imitate the a reasonable foothold Fire TV, but almost zero presence experience of a typical Pi-hole user. This set has on Roku); and (3) apps present in both datasets have seven hosts files including Disconnect.me ads, Dis- little overlap in terms of the ATS domains they contact. connect.me tracking, hpHosts, CAMELEON, Mal- Finally, we find that the key third party ATS players on wareDomains, StevenBlack, and Zeustracker. PD Android have little overlap with Roku, but substantial contains a total of about 133K entries. overlap with Fire TV, which intuitively makes sense as 2. The Firebog (TF): We test nine advertising and Fire TV is built on top of Android. five tracking blocklists recommended by “The Big Blocklist Collection” [8], to emulate the experience of an advanced Pi-hole user. This includes: Discon- 5 Blocklists for Smart TVs nect.me ads, hpHosts, a dedicated blocklist target- ing smart TVs, and hosts versions of EasyList and In this section, we evaluate four well-known DNS-based EasyPrivacy. TF contains 162K entries total. blocklists’ ability to prevent smart TVs from accessing 3. Mother of all Ad-Blocking (MoaAB): We test ATSes and their adverse effects on app functionality. We this curated hosts file [9] that targets a wide-range further demonstrate how the datasets resulting from au- of unwanted services including advertising, track- tomated app exploration may aid in curating new can- ing, (cookies, page counters, web bugs), and mal- didate rules for blocklists. ware (phishing, spyware) to again imitate the expe- rience of an advanced Pi-hole user. MoaAB contains a total of about 255K entries. 5.1 Evaluating Popular DNS Blocklists 4. StopAd (SATV): We test a commercial smart TV focused blocklist by StopAd [10]. This list partic- DNS-based blocking solutions such as Pi-hole [7] are ularly targets Android based smart TV platforms used to prevent in-home devices, including smart TVs, such as Fire TV. We extract StopAd’s list by analyz- from accessing ATS domains [54]. To block advertis- ing its APK using Android Studio’s APK Analyzer ing and tracking traffic, they essentially “blackhole” [57]. SATV contains a total of about 3K entries. DNS requests to known ATS domains. Specifically, they match the domain name in the DNS request against a We applied these blocklists to both our in the wild and set of blocklists, which are essentially curated hosts files testbed datasets and we report the results next. that contain rules for well-known ATS domains. If the Block Rates. We start our analysis by comparing how domain name is found in one of the blocklists, it is typi- many FQDNs are blocked by the different blocklists. cally mapped to 0.0.0.0 or 127.0.0.1 to prevent outbound We define a blocklist’s block rate as the number of dis- traffic to that domain [55]. tinct FQDNs that are blocked by the list, over the total
The TV is Smart and Full of Trackers 141 App Name No List PD TF MoaAB SATV No No No No No No No No No No Ads Breakage Ads Breakage Ads Breakage Ads Breakage Ads Breakage Common Pluto TV 5 5 5 5 5 iFood.tv 5 5 Tubi — 6 YouTube 5 5 5 5 5 Roku CBS News Live 5 6 6 6 Top The Roku Channel 5 6 5 Sony Crackle 5 5 5 5 Common Random WatchFreeComedyFlix 5 5 5 Live Past 100 Well SmartWoman 5 Pluto TV 5 5 5 5 5 5 6 iFood.tv 5 — 6 — 6 Tubi Downloader Fire TV The CW for Fire TV 5 — 6 — 6 — 6 5 Top FoxNow 5 — 6 — 6 5 5 Watch TNT Random KCRA3 Sacramento 5 — 6 — 6 5 5 Watch the Weather Channel 5 5 Jackpot Pokers by PokerStars 5 Table 4. Missed ads and functionality breakage for different blocklists when employed during manual interaction with 10 Roku apps and 10 Fire TV apps. For “No Ads”, a checkmark ( ) indicates that no ads were shown during the experiment, a cross (5) indicates that some ad(s) appeared during the experiment, and a dash (—) indicates that breakage prevented interaction with the app alto- gether. For “No Breakage”, a checkmark ( ) indicates that the app functioned correctly, a cross (5) indicates minor breakage, and a bold cross (6) indicates major breakage. number of distinct FQDNs in the dataset. Table 3 com- acting with a sample of apps from our testbed datasets pares the block rates of the aforementioned blocklists on while coding for ads and app breakage. We sample 10 our in the wild and testbed datasets. Overall, we note Roku apps and 10 Fire TV apps, including the top- that TF, closely followed by MoaAB and PD, blocks the 4 free apps, three apps that are present on both plat- highest fraction of domains across all of the platforms forms, and an additional three randomly selected apps. in both the in the wild and testbed datasets. SATV is We test each app five times: once without any blocklist the distant last in terms of block rate. It is noteworthy and four times where we individually deploy each of the that TF blocks more domains than MoaAB despite being aforementioned blocklists. During each experiment, we about one-third shorter. We surmise this is because TF attempt to trigger ads by playing multiple videos and/or includes a smart TV focused hosts file, and thus catches live TV channels and fast-forwarding through video con- more relevant smart TV ATSes. This finding shows that tent. We take note of any functionality breakage (due to the size of a blocklist does not necessarily translate to false positives) and visually observable missed ads (due its coverage. to false negatives). We differentiate between minor and Blocklist Mistakes. Motivated by the differences in major functionality breakage as follows: minor breakage the block rates of the four blocklists, we next compare when the app’s main content remains available but the them in terms of false negatives (FN) and false posi- application suffers from minor user interface glitches or tives (FP). False negatives occur when a blocklist does occasional freezes; and major breakage when the app’s not block requests to ATSes and may result in (visually content becomes completely unavailable or the app fails observable) ads or (visually unobservable) PII exfiltra- to launch. tion. False positives occur when a blocklist blocks re- Missed Ads vs. Functionality Breakage. Table 4 quests that enable app functionality and may result in summarizes the results of our manual analysis for missed (visually observable) app breakage. ads and functionality breakage. Overall, we find that We first systematically quantify visually observable none of the blocklists are able to block ads from all of false negatives and false positives of blocklists by inter- the sampled apps while avoiding breakage. In particular,
You can also read