An Extensive Evaluation of the Internet's Open Proxies - arXiv
Last updated: June 28, 2018 DRAFT – Please visit https://bit.ly/2ItDbPE for latest version

An Extensive Evaluation of the Internet’s Open Proxies

Akshaya Mani∗1, Tavish Vaidya†1, David Dworken2, and Micah Sherr1
1 Georgetown University
2 Northeastern University
∗, † Co-first authors.

arXiv:1806.10258v1 [cs.CR] 27 Jun 2018

Abstract

Open proxies forward traffic on behalf of any Internet user. Listed on open proxy aggregator sites, they are often used to bypass geographic region restrictions or circumvent censorship. Open proxies sometimes also provide a weak form of anonymity by concealing the requestor’s IP address.

To better understand their behavior and performance, we conducted a comprehensive study of open proxies, encompassing more than 107,000 listed open proxies and 13M proxy requests over a 50-day period. While previous studies have focused on malicious open proxies’ manipulation of HTML content to insert/modify ads, we provide a broader study that examines the availability, success rates, diversity, and (mis)behavior of proxies.

Our results show that listed open proxies suffer poor availability: more than 92% of open proxies that appear on aggregator sites are unresponsive to proxy requests. Much more troubling, we find numerous examples of malicious open proxies in which HTML content is manipulated to mine cryptocurrency (that is, cryptojacking). We additionally detect TLS man-in-the-middle (MitM) attacks, and discover numerous instances in which binaries fetched through proxies were modified to include remote access trojans and other forms of malware. As a point of comparison, we conduct and discuss a similar measurement study of the behavior of Tor exit relays. We find no instances in which Tor relays performed TLS MitM or manipulated content, suggesting that Tor offers a far more reliable and safe form of proxied communication.

1 Introduction

Open proxy servers are unrestricted proxies that allow access from any Internet user. Open proxies are fairly prevalent on the Internet, and several websites are devoted to maintaining large lists of available open proxies. In contrast to VPNs [29] and some (but not all) anonymity systems, open proxies are generally easy for users to use, requiring only minimal configuration changes (e.g., adjusting “network settings”) rather than the installation of new software.

As surveyed in this paper, there is a variety of types of open proxies, with differing capabilities. This leads to variation in how proxies are used in practice. Clearly, open proxies have been (mis)used for malicious purposes, including (but not limited to) sending spam, injecting ads, and serving as stepping stones for various attacks [38, 46]. But open proxies have also been used for far less nefarious reasons, including circumventing censorship efforts. More generally, they permit accessing otherwise inaccessible information and, controversially, can be used to bypass regional content filters (e.g., to watch a sporting event that, due to licensing restrictions, would otherwise not be viewable). Importantly, along with VPNs, proxies have been suggested as a means of enhancing Internet privacy and protecting browsing history in the face of the recent expansion of data collection rights by U.S. Internet service providers [30].

An interesting related question, and one that unfortunately has not undergone rigorous study, is why the owners of open proxy servers run such services. In some instances, the proxies are due to misconfiguration or even compromise. There is a rich history of accidental open proxies, particularly for open proxies that operate as SMTP relays [43]. Still, many proxies are operated by choice. Although the rationales here are not well understood, we posit that some operators may run proxies as a political statement about privacy and the freedom to access information.

A more menacing motivation for running a proxy is that it provides its operator with an expanded view of network traffic. Proxy operators are well situated to eavesdrop on communication, perform man-in-the-middle (MitM) attacks, and monetize their service by injecting spurious ads [46]. We note that while it is relatively difficult to operate a backbone Internet router and monitor traffic, it is not difficult to act as an open proxy and increase one’s view of others’ communication.

We are of course not the first to suggest that open proxies should be treated with a healthy supply of skepticism, due in large part to the ease with which proxies can be instantiated and configured to eavesdrop on and manipulate communication. Most noteworthy is the recent study of open proxies by Tsirantonakis et al. [46]. There, the authors explore the extent to which open proxies modify proxied HTML pages to inject ads, collect user information, and redirect users to malware-containing pages. Similar to Tsirantonakis et al.’s study, we also consider the modification of retrieved webpages. This paper complements their work by conducting a larger and broader study of the Internet’s open proxies.

We perform an extensive study of the open proxy ecosystem, consisting of more than 107,000 publicly listed open proxies. We find that the vast majority (92%) of open proxies that
are publicly listed on aggregator websites are either unavailable or otherwise do not allow proxy traffic. Surprisingly, the open proxies that do allow proxy traffic have very little geographic or network diversity: five countries account for nearly 60% of the Internet’s working open proxies, and 41% of such proxies are hosted by just ten autonomous systems (ASes). We also evaluate the performance of the proxies, and find a wide distribution of effective goodputs.

To understand the security implications of relaying traffic through open proxies, we present a series of experiments designed to understand a broad range of proxy behaviors. We compare the results of traffic received via direct communication with traffic that traverses proxies. While our results indicate that the majority of open proxies seemingly operate correctly (that is, they forward the traffic unimpeded), we find a surprisingly large number of instances of misbehavior. In particular, as we discuss in more detail in what follows, we discover many instances in which proxies manipulate HTML content, not only to insert ads, but also to mine cryptocurrency (i.e., cryptojacking). Our study also discloses numerous attacks in which proxies return trojan Windows executables and remote access trojans (RATs), and conduct TLS MitM attacks.

As a point of comparison, we compare the level of manipulation observed using open proxies to that when content is fetched over Tor [23]. Over the course of our nearly month-long experiment in which we fetched files from every available Tor exit relay, we found no instances in which the requested content was manipulated. Our results suggest that Tor offers a more reliable and trustworthy form of proxied communication.

In summary, this paper describes a large-scale and broad study of the Internet’s open proxies that (i) examines the open proxy ecosystem, (ii) measures open proxy performance, (iii) measures the prevalence of previously undisclosed attacks, and (iv) relates its findings to a similar study (which we conduct) of Tor exit relay behavior. Our results demonstrate that misbehavior abounds on the Internet’s open proxies and that the use of such proxies carries substantial risk.

2 Background

The Socket Secure (SOCKS) protocol [34] was introduced in 1992 as a means to ease the configuration of network firewalls [33]. In SOCKS, a server listens for incoming TCP connections from a client. Once connected, a client can then tunnel TCP and/or UDP traffic through the SOCKS proxy to its chosen destination(s). Since SOCKS forwards arbitrary TCP and UDP traffic, clients can use end-to-end (e2e) security protocols (in particular, TLS/SSL) through SOCKS proxies. Currently, SOCKSv4 and SOCKSv5 servers are both deployed on the Internet, with the latter adding authentication features that enforce access control. In this paper, we do not distinguish between SOCKSv4 and SOCKSv5, and use the general term SOCKS proxies to describe proxies running either version.

More commonly (see §5.1), open proxies use HTTP as a transport mechanism. This has the benefit of supporting clients whose local firewall policies prohibit non-HTTP traffic (e.g., as is sometimes the case on corporate networks).¹

¹ Although, as we report in §5.1, most proxies that use HTTP as a transport mechanism do not listen on ports 80 or 443, potentially nullifying this advantage.

We distinguish between two types of proxies that rely on HTTP. The first, which we refer to simply as HTTP proxies, allows clients to specify a fully-qualified URL (with any domain name) as part of an HTTP GET request to the proxy. The proxy parses the GET URL and, if the requested URL is on a different host, the proxy makes its own request to fetch the URL and forwards the response back to the client. Conceptually, an HTTP proxy is a web server that serves pages that are hosted elsewhere.

A significant disadvantage of HTTP proxies is that they are incompatible with e2e security protocols such as TLS, since it is the HTTP proxy (not the user’s browser) that initiates the connection to the proxied URL. The inability to support fetching HTTPS URLs increasingly limits the use of HTTP proxies due to the growing adoption rate of HTTPS [28] on the web.

In contrast, CONNECT proxies use the HTTP CONNECT method [32] to establish e2e tunnels between the client (i.e., the browser) and the destination webserver. CONNECT proxies allow the client to specify a host and port. The proxy then initiates a TCP connection to the specified target and transparently forwards all future TCP data from the client to the destination, and vice versa. Importantly, CONNECT proxies forward traffic at the TCP level (layer 4) and are not limited to any particular application-layer protocol. CONNECT proxies support TLS/HTTPS connections since the browser can effectively communicate unencumbered with the destination webserver and perform an ordinary TLS handshake. Example protocol interactions for HTTP and CONNECT proxies are provided in Figures 14 and 15 in Appendix A.

The network addresses (IPs and ports) of open proxies are listed on proxy aggregator sites. These sites also sometimes use the term transparent, meaning that the proxy supports at least e2e TCP tunneling; SOCKS and CONNECT proxies meet this criterion. As another dimension, aggregator sites also sometimes categorize proxies as anonymous proxies. These proxies purportedly do not reveal the client’s IP address to the destination. We explore the degree to which open proxies provide anonymity in §5.1.

3 Related Work

We are not the first to investigate the use of proxies on the Internet. Weaver et al. [52] performed a measurement study to detect the presence of transparent HTTP proxies on connections from clients to servers. They found that 14% of tested connections were proxied somewhere along the network path between the client and destination. Our work focuses on the intentional use of freely available proxies and gathers evidence of misbehavior by such proxies.

Scott et al. [40] provide insights about the use, distribution, and traffic characteristics of open proxies on the Internet. However, they do not focus on detecting any malicious behavior by open proxies, which is a core focus of this work.

Other work has investigated the misuse of proxy servers. Of note, Jessen et al. [41] deploy low-interaction honeypots to explore the (ab)use of open proxies to send spam. Researchers have also examined the abuse of the CoDeeN CDN network [50] by looking at the requests that were forwarded by the network and mitigation strategies employed to minimize the forwarding of malicious requests [38, 51]. While we also investigate misbehavior, we focus on uncovering malicious behavior by open proxies rather than malicious use of proxies by users.

Tyson et al. [47] study HTTP header manipulation on the Internet by various open proxies and discuss various observed factors affecting HTTP header manipulation. Huang et al. [31] show that ASes sometimes use middleboxes that interfere with HTTP traffic and inject HTTP headers across a wide range of networks including mobile and data centers. Durumeric et al. [26] examine the degree of HTTPS interception by middleboxes and argue that such interception significantly reduces security. Our work has greater scope, and focuses not only on header manipulation or HTTPS interception, but also uncovers several other forms of suspicious behavior exhibited by open proxies. Waked et al. [49] show that commercially used TLS middleboxes do not perform sufficient validation checks on SSL certificates and expose their clients to trivial attacks. We also found that some open proxies exhibit similar behavior.

Previous work has also looked at ad injection in HTML content [21, 39, 44]. Similarly, we look at content manipulation by open proxies, but consider other forms of misbehavior beyond ad injection.

Most related to this work is the recent study by Tsirantonakis et al. [46] that also looks at content manipulation by examining roughly 66K open HTTP proxies on the Internet. Their work provides an in-depth analysis of different types of malicious behavior exhibited by proxies by injecting JavaScript code. Their analysis shows that proxies inject JavaScript aimed at tracking users, stealing user information, fingerprinting browsers, and replacing ads. Complementary to their work, we provide a broad analysis of 107K unique open proxies by evaluating their availability, performance, and behavior across a large spectrum of potential attacks. We look at manipulation of content by open proxies, not only through the injection of JavaScript, but also by modification of different types of requested content (binary files, etc.) and through TLS MitM. Additionally, unlike existing work, we evaluate the behavior of open proxies based on different client locations. Laudably, Tsirantonakis et al. [46] built a service to detect malicious open proxies on a daily basis and to publicly report the proxies deemed unsafe. We compare our findings using this service. As an alternative to open proxies, we compare the performance of open proxies to Tor exit relays (§9), and argue that Tor provides a safer means of proxied communication.

4 Methodology & Experimental Setup

Our study of the Internet’s open proxy servers has two main goals: (i) to measure the proxies’ availability, composition, and performance and (ii) to assess the degree to which proxies exhibit malicious behavior. We begin by describing the methodology used to perform our study.

We conducted our study over a 50-day period, beginning on 2018-04-12 and ending on 2018-05-31. Our measurement apparatus, described next, was installed at 16 locations (listed in Table 8 in Appendix B); these included 15 geographically diverse AWS regions and an installation at our local institution. We term each instance a client location. Having multiple client locations allows us to determine whether proxies behave differently based on the network and/or geographic locations of the requesting client.

From each client location, an automated process performed the following steps once per day:

1. Populate: We collect and combine lists of advertised proxies from a number of proxy aggregator sites. We augment this list with the results of running ProxyBroker [22], an open source tool for finding open proxies. In all cases, proxies were listed by aggregator sites as tuples containing an IPv4 address and a TCP port number. The complete list of sources of proxies is given in Table 1. We emphasize that the list of proxies is (re)fetched daily since, as we show in §5.1, open proxies are subject to high levels of churn.

2. Classify: Next, from each client location, we attempt to connect to each proxy and, if successful, determine whether the proxy is an HTTP, CONNECT, or SOCKS proxy.

3. Fetch: Finally, we request several files (URLs) from the set of proxies that we were successfully able to classify.

Table 1: Sources of open proxies. A given proxy may be listed by more than one aggregator.

| Proxy Aggregator | Proxies |
|---|---|
| clarketm [5] | 6,343 |
| multiproxy.org (all) [9] | 1,524 |
| multiproxy.org (anon) [10] | 373 |
| NordVPN [11] | 29,194 |
| ProxyBroker [22] | 73,905 |
| workingproxies.org [15] | 1,250 |
| xroxy [16] | 345 |
| Total (unique proxies) | 107,034 |

In more detail, during the Fetch step, we retrieve the following files from each proxy that we were able to classify, using unencrypted HTTP connections: an HTML page, a Flash object (.swf), a Windows executable (.exe), a JPEG image, a ZIP file, a Windows batch (.bat) file, a Linux/UNIX shell script (.sh), and a Java JAR archive. With the exception of the .exe file (explained in more detail in §7), the URLs are hosted on web servers at our institution.
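The Classify step above has to tell HTTP, CONNECT, and SOCKS proxies apart, and §2 describes how their requests differ. The sketch below is one hedged illustration of the first bytes a client could send to probe each type; it is not the authors’ tooling, and the paper does not describe its classifier at this level. The function names and the placeholder User-Agent value are hypothetical. The SOCKS5 greeting uses the version, method-count, method layout from the SOCKS5 specification, and SOCKSv4 is omitted for brevity.

```python
"""Sketch: first bytes a client could send to probe each proxy type."""
from urllib.parse import urlsplit


def http_proxy_request(url: str, user_agent: str = "example-agent") -> bytes:
    """Build a GET with an absolute URL, as sent to a plain HTTP proxy."""
    host = urlsplit(url).netloc
    lines = [
        f"GET {url} HTTP/1.1",   # full URL, not just the path
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",                      # two empty items yield the final CRLF CRLF
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def connect_request(host: str, port: int) -> bytes:
    """Build a CONNECT request asking the proxy for a TCP tunnel to host:port."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def socks5_greeting() -> bytes:
    """SOCKS5 client greeting: version 5, one method offered, 'no authentication'."""
    return bytes([0x05, 0x01, 0x00])
```

A probe along these lines would send each payload to a listed IP address and port and inspect the reply: an HTTP status line for the first two requests, or a two-byte method-selection reply for SOCKS5. How replies are interpreted is left out because the paper does not specify it.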
For CONNECT and SOCKS proxies (i.e., the proxies that support TLS), we also request files over HTTPS from a properly configured (i.e., with a valid certificate) web server running at our institution. We additionally request HTML files from https://revoked.badssl.com/ and https://self-signed.badssl.com/ which, respectively, use revoked and self-signed certificates. The rationale for fetching content from sites with invalid certificates is discussed in §8. In all cases, we set the User-Agent HTTP request header to match that of Google Chrome version 62.0.3202.94 running on Mac OS X.

For each proxy request, we record whether the request completed. If we received a response from the proxy, we also record the HTTP status code and response string (e.g., “200 OK”) returned by the proxy, the size of the response, the content of the response, the MIME-type of the response (as determined by filemagic), the time-to-last-byte (TTLB) for receiving the response, the HTTP response headers, and (in the case of HTTPS requests) the certificate received.

Throughout the remainder of this paper, we use the term expected content to refer to the correct contents of the file and unexpected content to refer to content returned by a proxy that does not match the file indicated by the requested URL. A correctly functioning open proxy should thus return the expected content. As we explore in more detail below, unexpected content does not necessarily indicate malicious behavior. For instance, unexpected content could be an HTML page indicating that the proxy is misconfigured or that the proxy requires user authentication.

We remark that a weakness of our study, and one that we share with studies that examine similar (mis)behavior in anonymity and proxy services [20, 46, 54], is that we cannot easily differentiate between manipulation that occurs at a proxy and manipulation that occurs somewhere along the path between the proxy and the destination. Our findings of misbehavior can be viewed as indications that a proxy should not be used, either because it was itself malicious or because its network location makes traffic routed through it routinely vulnerable.

5 Proxy Availability & Performance

The number of unique proxies listed by proxy aggregator sites, over time, is shown in Figure 1. The median number of proxies listed on the aggregator sites over all days in the measurement period is 41,520, with a range of [38,843; 48,296]. In total, during the course of our study, we indexed approximately 107,000 unique proxies that were listed on aggregator sites.

We find that more than 92% of open proxies that are advertised on proxy aggregator sites are offline or otherwise unavailable. Figure 2 plots the number of responsive proxies over time for which we were able to establish at least one connection and retrieve content. We remark that the Figure includes proxies that returned unexpected content. Across our measurement study, the median daily number of responsive proxies is 3,283; the medians for HTTP, CONNECT, and SOCKS proxies are 1,613; 1,525; and 74, respectively.

We use the MaxMind GeoLite2 City and ASN databases [36] to resolve each proxy’s IP address to a physical location and an autonomous system (AS). Tables 2 and 4 respectively report the most frequent locations and ASes from among the proxies that are listed, responsive (i.e., respond to proxy requests), and successfully deliver the expected content at least once. There is surprisingly little geographic and network diversity among the proxies. Ten countries are responsible for nearly three-quarters of the world’s working proxies, while Brazil alone is home to nearly 20% of the proxies that forward expected content.

Similarly, a handful of ASes are privy to traffic from a disproportionate share of the open proxies. In particular, U.S.-based DigitalOcean and the Chinese No. 31 Jin-rong Street AS each host approximately 7% of the working proxies. In general, the distribution of open proxies on the Internet is very skewed, with roughly 40% of proxies confined to only 10 ASes. Moreover, open proxies are found on only a small fraction of the Internet: although more than 31,000 proxies accepted proxy requests during the course of our experiment, they resided on just 2,971 (5.8%) of the Internet’s approximately 51,500 autonomous systems [4].

We identified Squid, an open-source caching proxy, as the most frequent proxy software among the relays that (i) responded to our client requests and (ii) inserted a self-identifying Via or X-Via HTTP header. As shown in Table 3, Squid was used by over 85% of such proxies. Overall, we identified 130 different self-reported proxy software systems, although we note that not all proxies include Via or X-Via HTTP headers (84.43% do not) and that such headers are easily forged and may not reflect the actual software.

As reported in Figure 4, approximately 62% of working proxies listened on TCP ports 8080, 3128, or 20183. Port 8080 is an alternative port frequently used for web traffic or caching servers, and 3128 is the default port used by Squid. The popularity of port 20183 is surprising; it is not listed as a standard port by the Internet Assigned Numbers Authority (IANA). The standard web port, port 80, is used by 7% of open proxies. The frequent use of ports 8080 and 3128 by open proxies provides a potential means of discovering additional proxies: Censys, which is based on ZMap [25] Internet scans, reports 7.3M Internet hosts listening on port 8080; Shodan lists 1.1M hosts with services listening on port 3128 [12]. As we discuss in §10, we limit our study to only the proxies whose addresses are listed publicly by aggregator sites, and thus for ethical reasons do not probe these additional 8.4M hosts to discover unlisted proxies.

5.1 Performance

We evaluate the performance of proxies by considering goodput, computed as the size of a fetched file (we use a 1MiB file) divided by the time taken to download it through a proxy. We consider only instances in which the response reflects the expected content (i.e., the 1MiB file). Figure 5 shows the range of proxies’ daily average goodput for requests that yielded the expected content. Overall, the average goodput for such
proxies is 128.5 KiBps, with an interquartile range (IQR) of [39.5; 160.9] KiBps.

Figure 1: Number of unique open proxies listed on aggregator sites, over time.

Figure 2: Working proxies, by type (HTTP, CONNECT, SOCKS, and total), over time.

Figure 3: Total fetches and fetches with expected and unexpected content, by client location.

Table 2: Locations of proxies listed on aggregator sites (left), capable of accepting proxy connections (center), and forwarding correct content at least once (right).

| Listed: Country | Percent | Responsive: Country | Percent | Correct: Country | Percent |
|---|---|---|---|---|---|
| China | 18.81% | Brazil | 17.97% | Brazil | 19.48% |
| Brazil | 15.55% | China | 16.75% | China | 14.74% |
| United States | 15.19% | United States | 12.12% | United States | 12.17% |
| Indonesia | 5.95% | Indonesia | 6.77% | Indonesia | 7.15% |
| Thailand | 5.14% | Thailand | 5.73% | Thailand | 6.06% |
| Russian Federation | 4.60% | Russian Federation | 5.11% | Russian Federation | 5.40% |
| Germany | 2.53% | Singapore | 2.61% | Singapore | 2.78% |
| Singapore | 2.34% | Germany | 2.59% | India | 2.52% |
| India | 2.21% | India | 2.54% | Germany | 2.44% |
| Italy | 1.79% | Canada | 1.93% | Canada | 1.91% |
| All others (count: 145) | 25.69% | All others (count: 136) | 25.65% | All others (count: 131) | 25.17% |

Table 3: Proxy software.

| Software | Percent |
|---|---|
| squid | 87.07% |
| http_scan_by | 2.89% |
| 1.1 www.santillana.com.mx | 2.29% |
| 1.0 PCDN | 1.78% |
| HTTP/1.1 sophos.http.proxy:3128 | 1.74% |
| swproxy | 0.57% |
| Cdn Cache Server | 0.40% |
| MusicEdgeServer | 0.34% |
| 1.1 Pxanony | 0.22% |
| 1.1 j5k-8 (jaguar/3.0-11) | 0.21% |
| All others (count: 120) | 2.50% |

We also consider proxies’ utility over time. A proxy that offers high goodput but functions only sporadically is not particularly useful. We define the success rate to be the fraction of proxy requests that were successfully completed and yielded the expected content. Figure 6 plots proxies’ success rates as a function of their average goodput. We note that, generally, the proxies that offer the highest goodput (highlighted in the Figure with the oval) also tend to offer the highest success rates. The existence of stratified “lines” when the success rate is 1/3, 1/2, and 2/3 is somewhat surprising: this indicates a regular periodicity or schedule during which these proxies are available.

5.2 Expected vs. Unexpected Content

Proxies do not always return the expected content. However, not all unexpected content is malicious. For example, many tested proxies return a login page or a page conveying an authentication error, regardless of the requested URL. These indicate that the listed open proxy is either misconfigured, actually a private proxy, or some other misconfigured service. Here, we answer the question: to what extent do listed open proxies return the expected content?

For our analysis, we consider the proxies that have responded to at least one proxy request with a non-zero byte response and a 2xx HTTP response code that indicates success (i.e., in [200, 299]). For each such proxy, we determine its failure rate, which we define as 1 − success rate. That is, a proxy’s failure rate is the fraction of returned responses that constitute unexpected content.

Figure 7 shows the cumulative distribution (y-axis) of these proxies’ failure rates (x-axis). Of proxies that respond to proxy requests with 2xx HTTP success codes, we find that 92.0% consistently deliver the expected content. Alarmingly, approximately 8% of the proxies at least sometimes provided unexpected content, and 3.6% of the proxies consistently returned unexpected content, all with HTTP response codes that indicate success. In §6 and §7, we explore cases in which the content has been purposefully and maliciously manipulated, for example to return a trojan .exe file or to insert spurious ads in retrieved HTML content. Further, as shown in Figure 3, the behavior of proxies (that is, whether they returned expected or unexpected content) does not significantly vary with the location of the requesting client.

5.3 Anonymity

Open proxies are sometimes used as a simple method of hiding a user’s IP address. We find that such a strategy is mostly ineffective, with nearly two-thirds of tested proxies exposing the requestor’s IP address.

In more detail, we inspect the HTTP request headers that are sent by a proxy to the destination webserver. These include the headers added by our client toolchain (specifically: User-Agent and Accept) and those inserted by the proxy. To accomplish this, we constructed and hosted a simple web application that records HTTP request headers. We then examined the request headers that were produced when we used a client at our local institution to access our web application through each proxy. Of the proxies that were able to connect to our web application (13,740), we found that 66.08% (9,079) inserted
Last updated: June 28, 2018 DRAFT – Please visit https://bit.ly/2ItDbPE for latest version Listed Responsive Correct ASN Name Percent ASN Name Percent ASN Name Percent 4134 No.31,Jin-rong Street 9.34% 14061 DigitalOcean, LLC 7.81% 14061 DigitalOcean, LLC 7.47% 14061 DigitalOcean, LLC 6.69% 4134 No.31,Jin-rong Street 7.73% 4134 No.31,Jin-rong Street 6.65% 18881 TELEFÃŤNICA BRASIL S.A 3.55% 18881 TELEFÃŤNICA BRASIL S.A 4.10% 18881 TELEFÃŤNICA BRASIL S.A 4.52% 4837 CHINA UNICOM China169 Backbone 2.51% 17974 PT Telekomunikasi Indonesia 2.63% 17974 PT Telekomunikasi Indonesia 2.77% 17974 PT Telekomunikasi Indonesia 2.26% 4837 CHINA UNICOM China169 Backbone 2.53% 4837 CHINA UNICOM China169 Backbone 2.40% 13335 Cloudflare Inc 1.81% 45758 Triple T Internet/Triple T Broadband 1.97% 45758 Triple T Internet/Triple T Broadband 2.10% 16276 OVH SAS 1.73% 20473 Choopa, LLC 1.66% 20473 Choopa, LLC 1.82% 45758 Triple T Internet/Triple T Broadband 1.73% 53246 Cyber Info Provedor de Acesso LTDA ME 1.62% 53246 Cyber Info Provedor de Acesso LTDA ME 1.79% 20473 Choopa, LLC 1.44% 16276 OVH SAS 1.52% 16276 OVH SAS 1.56% 53246 Cyber Info Provedor de Acesso LTDA ME 1.38% 31034 Aruba S.p.A. 1.33% 31034 Aruba S.p.A. 1.45% — All others (count: 3419) 64.27% — All others (count: 2961) 61.17% — All others (count: 2759) 59.06% Table 4: Most popular ASNs for proxies listed on aggregator sites (left), proxies capable of accepting connections (center), and proxies that forwarded correct content at least once (right). 
Figure 4: Proxy ports, by popularity.

Figure 5: Proxy goodput (log-scale). Box plots depict the range of proxies' average goodput over its daily requests. The box depicts the IQR with the median; the whiskers denote 1.5 × IQR. Points beyond the whiskers are outliers. [Axes: date, 2018-04-12 to 2018-05-31; goodput (Bps).]

Figure 6: Proxy success rates and average goodput (log-scale). The highlighted oval denotes the densest region of the graph. [Axes: goodput (Bps); success rate.]

Figure 7: Cumulative distribution of the proxies' failure rate (i.e., the fraction of responses that constituted unexpected content). [Axes: fraction of incorrect responses; cumulative distribution.]

Figure 8: Classification of modified HTML retrieved on 2018-05-07. [Axes: class; count and percentage. Legend: Benign, Not Benign, Unable to Determine.]

at least one header (most commonly, X-Forwarded-For) that contained the IP address of our client.

6 HTML Manipulation

We begin our study of malicious behavior among open proxies by considering the manipulation of fetched HTML content. In the absence of end-to-end SSL/TLS (i.e., the use of HTTPS), a malicious proxy can trivially either modify the web server's response or respond with arbitrary content of its choosing.

To detect such misbehavior, we fetch an HTML page via the open proxies each day between 2018-05-04 and 2018-05-31. For simplicity, we focus on the HTML pages fetched through the client at our local institution. We observe a median of 1,133 successful requests (i.e., completed requests with a 2xx HTTP response code) per day.

Overall, we find that 10.5% of requests for HTML pages produced unexpected HTML content.

As noted above, not all unexpected content necessarily corresponds to malicious behavior. Since the observed fraction of unexpected HTML content is almost the same for each day of our measurement study, we perform an in-depth analysis for a random day, 2018-05-07, to determine whether the HTML manipulation could be categorized as malicious. On that date, we observe requests through 1,259 proxies, of which 169 (about 13.4%) return unexpected HTML content. (We discard requests that did not yield 2xx success HTTP response codes.)

We manually analyze these 169 modified HTML files. Our methodology is as follows. First, we inspect each file and divide them into two categories: benign (non-malicious) files and suspicious files. The latter category represents HTML files that contain Javascript code (either directly in the file or through the inclusion of an external Javascript file).

We then collectively inspect each of these benign and suspicious files and classify them into five benign classes and four non-benign classes. This inspection process involves manually examining the files, rendering them in a browser on a virtual machine, and potentially visiting Javascript URLs that were inserted. The five classes that we posit do not indicate maliciousness are as follows:

• Equivalent: the fetched HTML renders equivalently in a web browser to the expected content. Oddly, we found instances in which HTML tag attributes were inconsequentially reordered.
• Misconfiguration: the retrieved content constituted error pages, often displaying "not accessible" messages.
• No Content: the page contained no content.
• Truncated: the retrieved page was a truncated version of the expected content.
• Unauthorized: the pages showed error messages such as "invalid user" or "access denied".

The identified classes of proxy misbehavior are:

• Ad Injection: the proxy replaced HTML content with Javascript that rendered extraneous advertisements.
• Original + Ad Injection: the returned HTML contained the expected content, but also included a single ad injection.
• Cryptojacking: the returned HTML included Javascript code that would cause the user's browser to mine cryptocurrencies on behalf of the proxy.
• Potential Eavesdropping: the retrieved page contained Javascript that caused the user's browser to visit pages from its history, potentially revealing these pages to the proxy if they were previously visited without the use of the proxy.

The classifications of the modified HTML files with their counts and percentages are shown in Figure 8. We were unable to classify six HTML responses, described in the Figure as "Unlabeled".

Overall, we find that approximately 80% of the unexpected HTML responses are benign. Approximately 72% of the pages contained errors, likely due to private access proxies or due to misconfigured proxy software.

We find that 16.6% of unexpected HTML responses (and 2.2% of overall proxy requests for HTML content) correspond to malicious activity, including ad injection, cryptojacking, and eavesdropping. Among the malicious activity, most prominent (13%) is ad injection.

Perhaps most interestingly, we find that about 3% of the files include Javascript that performs cryptojacking, that is, the unauthorized use of the proxy user's processor to mine cryptocurrencies in the background. Upon further inspection, we determined that each such instance uses the same injected Javascript code, shown in Figure 9.

    var miner = new Client.Anonymous(
        '[redacted]', { throttle: 0.1 }
    );
    miner.start();

Figure 9: Injected HTML code with link to remote cryptojacking Javascript. The user identifier has been redacted.

The referenced min.js script is obfuscated Javascript. After decoding it, we determined that it is similar to Coinhive's [6] Monero (see footnote 2) cryptocurrency [8] mining Javascript. The 64-character argument to the Client constructor serves as the identifier for the user to be paid in the original Coinhive setting. (We redact this argument in Figure 9 because it potentially identifies a criminal actor.) Finally, we note that all Coinhive endpoints described in the copied Coinhive script are replaced with other domains, indicating that whoever is running the infrastructure to collect the mining results is not using Coinhive's service.

Footnote 2: Monero is advertised as a "secure, private, and untraceable" cryptocurrency [8].

Figure 10: Per day total requests (top), unexpected responses and malicious responses (middle), and percentage of malicious responses relative to total responses (bottom). [x-axis: date, Apr 18 to May 30, 2018.]

7 File Manipulation

Over the duration of our 50-day experiment, we made more than 4.8M successful requests through 21,385 proxies to fetch a variety of non-HTML files (specifically, Windows .exe, Java .jar, Flash, .zip, and Windows .bat files). As before, we define a successful request to be one in which the proxy returns a 2xx HTTP response code and a non-zero content size. Furthermore, we exclude HTML, plaintext, and PHP responses (as determined
by filemagic) from our analysis to avoid uninteresting instances of error pages or unauthorized access pages (see §6).

Figure 10 (top) shows the frequency of total successful requests made per day, averaging approximately 97K requests per day throughout the measurement period. Overall, 29,484 (0.61%) of such requests (made by 6.76% of the proxies) constitute unexpected content.

To check whether unexpected content is malicious, we submitted all 29,484 unexpected responses to VirusTotal [14] for scanning. VirusTotal scans uploaded files using multiple antivirus tools and returns detailed analysis results, including the uploaded files' determined file types. Table 5 provides a summary of our findings. Each entry for a given request and response type (determined by VirusTotal) indicates the fraction of responses with unexpected content that were flagged as malicious by VirusTotal. Note that VirusTotal (correctly) determined various responses to be HTML that were wrongly classified as non-HTML by filemagic in our initial pass of filtering out HTML pages.

Table 5: Requested file types and various response file types determined by VirusTotal for unexpected responses. Each entry for a given request and response file type shows the fraction of malicious responses.

Requested   Returned file type (VirusTotal)
file type   EXE        Flash   JAR     ZIP      HTML     ISO      GZIP  XML   Unknown
EXE         6614/6713  -       -       -        410/423  -        0/39  0/34  12/234
Flash       -          0/1546  -       -        352/362  -        0/3   0/34  0/32
JAR         -          -       0/1075  2/400    338/351  -        0/2   0/34  0/55
ZIP         -          -       -       12/4162  380/395  -        0/18  0/34  0/176
TEXT        -          -       -       -        436/446  545/545  0/2   0/34  23/10910
BAT         -          -       -       -        556/574  -        0/98  0/34  2/719

To establish a baseline, we verified that none of the expected responses are classified by VirusTotal as being malicious. That is, no single antivirus tool used by VirusTotal marks the expected content as being malicious, as expected.

In contrast, VirusTotal flags 32.84% (9,682/29,484) of the unexpected responses as malicious; that is, at least one of the antivirus systems used by VirusTotal flags the content as malicious. Figure 10 (middle) shows the number of unexpected and malicious responses per day. The percentage of malicious responses (relative to total successful fetches per day) remained fairly consistent throughout our study (Figure 10 (bottom)). Across our experiment period, we find that, on average, 0.2% of daily proxy responses are classified as malicious by at least one antivirus system used by VirusTotal.

Table 6 shows the number of unexpected responses that are flagged as malicious, broken down by the number of antivirus tools used by VirusTotal that classified it as malicious. At the extreme, for example, we find 545 retrieved files that are considered malicious by 18 different antivirus systems.

Table 6: The number of files, and the percentage relative to all unexpected content files, flagged as malicious by varying numbers of the antivirus systems used by VirusTotal.

No. of antivirus systems
flagging as malicious     0       1      2      3      4       5      6      18
No. of files              19,802  587    1,274  1,508  4,232   1,513  23     545
Percentage                67.16%  1.99%  4.32%  5.11%  14.35%  5.13%  0.08%  1.85%

7.1 Detailed Findings

In what follows, we discuss in more detail our findings of malicious proxy activity, organized by file type.

Windows Executables (.exe). Almost all (98.5%) of unexpected responses for .exe files are classified as malicious. We find that 1.93% (413/21,385) of the proxies maliciously modified the .exe file at least once during our measurement period. The infections include malware from the Expiro malware family that can be used to steal personal information and provide remote access to the attacker [2], and flavors of malware from the Crypt and Artemis trojan families [1, 3]. Table 9 in Appendix C lists the top 10 reported infections for .exe files.

Flash and .jar Files. VirusTotal did not flag any of the modified Flash or Java .jar files as malicious. We are unsure of whether this indicates (i) benign (but unexplained) instances in which proxies rewrite these files, or (ii) a limitation of the scanners used by VirusTotal.

ZIP Files. For ZIP file responses with unexpected content, only a single antivirus system (McAfee-GW-Edition, v2017.2786) flagged 0.28% (14/4562) of them as malicious. VirusTotal did not provide any details about the infection.

HTML Files. We received HTML responses from proxies (as determined by VirusTotal) irrespective of the requested content type. Roughly 0.2% (44/21,385) of the proxies returned unexpected content at least once, with 43 of them returning malicious content as well. Almost 97% (2,472/2,551) of these HTML responses were labeled as malicious. Table 10 in Appendix C shows the reported infections and the number of HTML responses with malicious code. Upon further manual examination, we found that all of the 2,472 malicious responses detected by VirusTotal contain the same Monero cryptocurrency [8] mining Javascript shown in Figure 9.

ISO Files. Surprisingly, on 545 occasions, we received ISO files when requesting a 1 MiB text file. All of the 545 ISO responses were exactly 1 MiB in size, but were flagged as malicious by VirusTotal. Table 11 in Appendix C shows the various infections reported by VirusTotal. We note that all 545 files were infected with the Vittalia Trojan, a rootkit for Windows. The content of 520 of the 545 ISO images was identical, while the remaining 25 responses were identical to one another. Although the ISO files were clearly malicious, only 0.04% (9/21,385) of the proxies returned a malicious ISO response at least once.

Shell Scripts. We fetched shell scripts to determine if malicious proxies would modify (or replace) the scripts in transit. We find that 22.75% (211,288/928,431) of the requests for shell scripts result in responses with unexpected content. To home in on malicious activity, we discarded responses whose MIME type was not "text/x-shellscript"; we found 1,020 instances in which the responses were unexpected but were determined to be shell scripts. Oddly, all 1,020 instances of unexpected content correspond to just four unique responses, summarized in Table 12 in Appendix C. All of these modifications appear to be non-malicious (strangely, one frequent modification inserts the text "Pop HerePop HerePop HerePop HerePop HerePop HerePop HerePop Here" and little else) and are most likely due to misconfiguration. Overall, we did not find any evidence of malicious shell script manipulation during our measurements.

7.2 Network Diversity and Consistency of Malicious Proxies

For the responses that we deemed malicious, we also looked at the distribution of responsible proxies. Table 7 shows the five ASes with the largest number of proxies that performed malicious manipulations.

Table 7: The five ASes with the largest number of malicious proxies.

No. Malicious Proxies  AS Number  AS Name
72                     4134       No.31,Jin-rong Street
45                     14061      DigitalOcean, LLC
25                     4837       CHINA UNICOM China169 Backbone
14                     17974      PT Telekomunikasi Indonesia
9                      56046      China Mobile communications corporation

We also looked at the daily behavior of the top three malicious proxies over the duration of our measurement, as determined by the number of malicious responses returned. The most malicious proxy returned malicious content 100% of the time. The second and third most malicious proxies returned malicious responses 97.7% and 87.9% of the time, respectively, when they were reachable. This trend is apparent in Figure 11, which plots (in log-scale) the cumulative distribution of the number of times that malicious proxies (defined as proxies that ever behave maliciously) exhibit misbehavior. Here, we see that half of the proxies that return malicious content do so at least twice. More than 25% of malicious proxies return at least 10 files with malicious content, while the top 10% of malicious proxies return 56 or more malicious files.

Figure 11: Consistency of malicious proxy behavior. [Axes: number of malicious responses, per malicious proxy (log-scale); CDF.]

Surprisingly, none of the 469 discovered proxies that return malicious content (§7.1) are listed on the service run by Tsirantonakis et al. [46] that reports misbehaving proxies. This suggests that correctly identifying misbehaving proxies is very challenging, since proxy misbehavior may be transient and may take different forms.

8 SSL/TLS Analysis

We find that 70% (14,607/20,893) of the proxies that fetch expected content at least once allow TLS traffic to pass through them. We emphasize that supporting HTTPS incurs no overhead for the proxy, since the proxy is not involved in the (end-to-end) cryptography and is merely transporting the ciphertext. It is worth noting that as of early June 2018, more than 70% of loaded web pages were retrieved using HTTPS [35]. The lack of universal HTTPS support among the open proxies effectively forces some users to downgrade their security, bucking the trend of moving towards a more secure web.

We consider misbehavior along two dimensions: attempts at decreasing the security of the communication through TLS MitM or downgrade attacks, and alteration of the content.

SSL/TLS Stripping. We first examine whether any of the proxies rewrite HTML link tags to downgrade the transport from HTTPS to HTTP. For example, a malicious proxy could attempt to increase its ability to eavesdrop and modify unencrypted communication by replacing all links to https://example.com with http://example.com on all webpages retrieved over HTTP. To perform this test, we fetched via each proxy an HTML page hosted on a web server at our institution over HTTP. This HTML page contained six HTTPS links. We did not find evidence of proxies stripping SSL by replacing the included links.

SSL Certificate Manipulation. In this experiment, we search for instances of SSL/TLS certificate manipulation. While interfering with an HTTPS connection would cause browser warning messages, numerous studies have shown that even
security-conscious users regularly ignore such warnings [17, 42].

We test the proxies against two categories of domains: one with a valid and verifiable SSL certificate hosted on a web server at our institution; and domains with incorrect or invalid SSL certificates (https://revoked.badssl.com, https://self-signed.badssl.com/). We include the latter category since we posit that a smart attacker might perform SSL MitM only in instances in which the connection would otherwise use revoked or self-signed certificates; in such cases, the browser would issue a warning even in the absence of the proxy's manipulation.

To detect SSL/TLS certificate manipulation, we fetch the three domains (listed above) via the open proxies each day between 2018-04-12 and 2018-05-31. For simplicity, we focus on the pages fetched through the client at our local institution.

Overall, we find that 1.06% (102/9,625) of proxies that support HTTPS perform TLS/SSL MitM by inserting a modified certificate. To determine whether any of the proxies performing SSL/TLS MitM might belong to a known botnet, we searched the SSL Fingerprint Blacklist and Dyre SSL Fingerprint Blacklist [13] for the modified certificates' fingerprints. We did not find any blacklisted certificates.

We next consider proxies that fetch the expected content but modify the SSL/TLS certificate. Such behavior suggests that the proxy is eavesdropping on HTTPS connections. The percentage of such eavesdropping proxies, per day, for the different categories of certificates is plotted in Figure 12. Overall, 0.85% (82/9,607) of the proxies that return expected responses appear to be eavesdropping. We did not find any evidence of proxies selectively targeting incorrect or invalid SSL certificates.

Figure 12: Percentage of proxies that return expected content but perform TLS/SSL MitM, for websites that use revoked (top), self-signed (middle), and valid and unexpired certificates (bottom). [Axes: date, Apr 12 to May 31, 2018; percentage.]

Finally, we analyze the modified TLS certificates inserted by the eavesdropping proxies when the requested site is https://revoked.badssl.com or https://self-signed.badssl.com/ (i.e., when the genuine certificate is revoked or self-signed, respectively). We find that there were 435 modified certificates from 21 unique issuers. The issuer common name (CN) strongly suggests that 19% (4/21) of the issuers were schools. We posit that the proxies that modified these certificates were operated by schools and were incorrectly configured to serve requests from any network location. Interestingly, all of the certificates inserted by these school proxies had the expected subject common name and were valid (but not normally verifiable, since the school is not a root CA). This leads to an interesting result: if the schools pre-installed root CA certificates on students' or employees' computers, then they are significantly degrading the security of their users. That is, they are masking the fact that requested webpages have expired or revoked certificates by replacing these invalid certificates with ones that would be verified. This is similar to the effects observed by Durumeric et al. [26] in their examination of enterprise middleware boxes, which also degraded security by replacing invalid certificates with valid ones signed by the enterprise's CA.

Performing TLS MitM also allows a malicious proxy to modify page content. To detect such behavior, using the same approach as described in §6, we analyzed HTML pages fetched over TLS via the open proxies. We did not find any malicious activity.

9 Comparison With Tor

Tor [23] provides anonymous TCP communication by routing user traffic through multiple relays (typically three) using layered encryption. The first relay in the path (or circuit) is the guard relay, and the final relay through which traffic exits is the exit relay. The original data transmitted is visible only at the exit relays. Therefore, unless e2e encryption is used, data can be eavesdropped by malicious or compromised exits. Prior research studies have found evidence of malicious behavior, such as interception of credentials, by a small fraction of Tor exit relays, especially when the traffic was not e2e encrypted [20, 37, 54]. We now compare the level of manipulation observed using open proxies to that observed when content is fetched over Tor.

We modify Exitmap [54], a fast scanner that fetches files through all Tor exit relays. To maintain consistency with our earlier experiments, we fetched the same set of files (e.g., HTML, .exe, etc.) as described in §6 and §7 over HTTP, and accessed the same HTTPS URLs as described in §8. We fetched these files each day through every available Tor exit relay between 2018-05-06 and 2018-05-31, during which the median number of available exit relays was 722.

Approximately 13.8% of connections and 1.8% of fetches timed out when using Exitmap. This is unsurprising since Exitmap does not perform the same (and necessary) bandwidth-weighted relay selection as the standard Tor client [23, 54].

Over our 26-day Tor experiment, we found no instance in which a Tor exit relay manipulated either file contents or SSL/TLS certificates. Comparing our results to §6-8, this strongly suggests that Tor is a more trustworthy network for retrieving forwarded content. However, since Tor exits may still passively eavesdrop (which we would not detect), we concur with the conventional wisdom that e2e encryption (i.e., HTTPS) is appropriate when using Tor.
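The manipulation checks applied to open proxies in §6 to §8, and repeated above for Tor exits, come down to comparing what arrives through an intermediary with what the origin is known to serve. The sketch below illustrates that comparison under stated assumptions: the `Observation` record, the `classify` function, and the outcome labels are hypothetical names chosen here, and hashing the raw bytes is a simplification rather than the authors' actual pipeline (which, for HTML, also involved manual inspection).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    body: bytes      # payload received for the requested URL
    cert_der: bytes  # DER bytes of the presented leaf certificate (b"" for plain HTTP)


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, used here as a fingerprint for bodies and certificates."""
    return hashlib.sha256(data).hexdigest()


def classify(expected: Observation, seen: Observation) -> str:
    """Compare a fetch made through an intermediary against the known-good fetch."""
    content_ok = sha256_hex(expected.body) == sha256_hex(seen.body)
    cert_ok = sha256_hex(expected.cert_der) == sha256_hex(seen.cert_der)

    if content_ok and cert_ok:
        return "unmodified"
    if content_ok:
        # Expected content behind a different certificate: the pattern the
        # text associates with possible eavesdropping (TLS MitM).
        return "certificate replaced, content intact"
    if cert_ok:
        return "content modified"
    return "certificate replaced and content modified"
```

In this framing, the Tor result reported above corresponds to every exit relay yielding "unmodified", while the 0.85% of open proxies discussed in §8 would fall under "certificate replaced, content intact".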
We also compare the performance of Tor to that of the open proxies. We compute the open proxies' throughput as described in §5.1. For Tor, we rely on data from the Tor Metrics Portal [45]; specifically, we use the median time taken per day to download a static file of size 1 MiB and derive Tor's median throughput between 2018-05-06 and 2018-05-31. Figure 13 shows the Tor and open proxies' median throughput per day. We find that the Tor median throughput is roughly twice the open proxy throughput. This suggests that Tor is faster than the open proxies.

Figure 13: Tor vs. open proxy median throughput per day while fetching a static file of size 1 MiB. [Axes: date, May 2018; median throughput (bytes/sec, ×10^5). Series: Tor, Proxy.]

10 Ethical considerations

We designed and conducted our measurements to minimize risk. In particular, we believe our study's design and implementation meet the criteria described in the Menlo Report [24], an ethical framework for conducting network measurements that has been widely adopted by the networking and computer security communities. The Menlo Report describes four principles of ethical research.

We achieve the principle of Respect for Persons by avoiding the collection of data belonging to individual users. In our experiments, we use open proxies largely as they are intended: we issue standard well-formed proxy requests and request only benign (non-malicious) traffic from non-controversial websites. We record only our own traffic, and in no instance do we monitor or capture the behavior of other proxy users. We do not attempt to discover non-publicly listed proxies by scanning the Internet. By focusing exclusively on proxies that are already publicly listed, we do not risk disclosing the existence of any previously unknown proxy.

We achieve Beneficence by minimizing potential harms while providing societal benefits (i.e., exposing the dangers of using open proxies). Unlike other studies that explicitly probe for instances of Internet censorship by requesting potentially objectionable content [18, 19, 48, 53], our measurements avoid exposing proxy operators to risk (e.g., government sanction) by retrieving only URLs that are very unlikely to be censored. Nor do our measurements consume significant resources: we request a small handful of URLs from each proxy; the average size of the content is just 177 KiB.

Our study meets the Menlo Report's Justice criterion by distributing our measurements equally across all identified open proxies.

Finally, we achieve the Respect for Law and Public Interest criterion by (i) conducting only legal queries (i.e., we are not requesting any content that is likely to be illegal or censored) and (ii) being transparent in our methods (see §4).

11 Conclusion

Open proxies provide a free and simple way to bypass regional content filters and achieve a limited degree of anonymity. However, the absence of any security guarantees for traffic passing through these proxies makes their use highly risky: users may unintentionally expose their traffic to malicious manipulations, especially when no end-to-end security mechanisms (e.g., TLS) are present.

Our study of the Internet's open proxies, the largest conducted to date, discloses and quantifies new forms of misbehavior, reinforcing the notion that open proxies should be used with extreme caution. We found numerous instances of misbehavior, including the insertion of spurious ads and cryptocurrency-mining Javascript, TLS MitM, and the injection of RATs and other forms of malware. Moreover, we found that 92% of advertised proxies listed on open proxy aggregator sites are nonfunctional. In contrast, our nearly monthlong study of the Tor network found zero instances of misbehavior and far greater stability and goodput, indicating that Tor offers a safer and more reliable form of proxied communication.

While we remain wary about the use of open proxies, some of the risks we identify can be at least partially mitigated. Tools such as HTTPS Everywhere [7] can help reduce the risk of traffic manipulation by forcing end-to-end protections. The continued rollout of certificate transparency and similar measures will also likely reduce (but not eliminate) risk, as they thwart the certificate manipulation attacks described above.

We emphasize, however, that where e2e integrity and authenticity guarantees are not possible (e.g., for unencrypted web traffic), the use of open proxies still carries substantial risk. Given users' difficulty in adhering to safe browsing habits even in the absence of proxies [17, 27, 42], we are hesitant to recommend relying on browser-based protections to defend against malicious proxy behavior. Our findings suggest that the risks of using open proxies are plentiful, and likely far outweigh their benefits.

References

[1] Win32/Expiro, 2011. Available at https://www.securitystronghold.com/gates/win32.trojan.crypt.html.
[2] Win32/Expiro, 2011. Available at https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Win32%2FExpiro.