Quantifying the collateral damage of Blacklisting
Sivaramakrishnan Ramanathan (USC), Anushah Hossain (UC Berkeley, ICSI), Jelena Mirkovic (USC), Minlan Yu (Harvard University), Sadia Afroz (ICSI, Avast)

Submission to IMC'20, Oct 20, Pittsburgh, PA, USA

ABSTRACT

Blacklists, consisting of known malicious IP addresses, can be used as a simple method to block malicious traffic. However, blacklists can lead to unjust blocking of legitimate users due to IP address reuse, where more users could be blocked than intended. IP addresses can be reused either at the same time (Network Address Translation) or over time (dynamic addressing). We propose two new techniques to identify reused addresses. We built a crawler using the BitTorrent Distributed Hash Table to detect NATed addresses and use the RIPE Atlas measurement logs to detect dynamically allocated address spaces. We then analyze 151 publicly available blacklists to show the implications of reused addresses and find that 53–60% of blacklists contain reused addresses, with about 30.6K–45.1K listings of reused addresses. We also find that reused addresses can potentially affect as many as 78 legitimate users for as many as 44 days.

1 INTRODUCTION

Consider one user's experience with Cloudflare, discussed in a trouble ticket [23]. When the user tried to access any website hosted by Cloudflare, they were unnecessarily blocked and subjected to CAPTCHAs. On further inspection, the user found that their public IP address was shared with many other users via Network Address Translation (NAT). One of the NAT users was running a spam campaign, leading to the NAT's IP address being listed in many blacklists. It is known that Cloudflare builds its own IP reputation with the help of blacklists [20] to protect its customers. Thus users behind reused addresses will be unjustly blocked whenever they access websites hosted on Cloudflare [22], and similar problems are found with other hosting providers [32, 33]. There is very little recourse for legitimate users that are unjustly blocked. In fact, Cloudflare's best practice [24] recommends that users obtain a new IP address, either by resetting their device or by contacting their ISP. In reality, obtaining a new untainted static IP address may be impossible or too costly for many users [55, 61, 72].

From the perspective of a network operator, there is no way to measure the excessive blocking caused by a blocking mechanism. How would a network operator know how many users are unjustly blocked?

This paper proposes two techniques to measure unjust blocking by IP blacklisting. Blacklists are lists of identifiers (most often IP addresses) that are associated with malicious activities. For a network operator, blacklists provide a simple method to quickly block malicious traffic entering their network. Blacklists can cause unjust blocking due to two forms of address reuse: 1) NATed addresses, where several users share the same IP address at the same time, and 2) dynamic addressing, where the same IP address is allocated to multiple users over time.

In this study, we make the following contributions:

Detecting reused addresses: We propose two new techniques to identify reused addresses that provide high-confidence detection, leverage only public datasets, and can be replicated by other researchers. While extensive prior work exists on detecting NATed [7, 8, 42, 46, 48, 57, 64, 68] and dynamic addresses [37, 56, 73], we find that these approaches either do not provide sufficiently accurate and fine-grained information per IP address or do not publicly release the final list of reused addresses or prefixes. To detect NATed IP addresses, we implement a crawler using BitTorrent's Distributed Hash Table (DHT). Our crawler detects when BitTorrent users simultaneously use the same IP address (Section 3.1), and thus we can measure the lower bound of users behind that IP address that would be adversely affected if the address were blacklisted. To detect dynamic addresses, we use the RIPE Atlas probe measurement logs to identify probes whose IP addresses change frequently and thus determine IP prefixes that are dynamically allocated (Section 3.2). By determining dynamic prefixes, we can identify users that would be affected by address reuse when allocated a previously blacklisted IP address.

Measuring the impact of reused addresses: We apply our detection techniques to identify reused addresses in 151 publicly available blacklists (Section 4). About 60% of blacklists list at least one NATed address and about 53% list at least one dynamic address that will lead to unjust blocking. We find 45.1K and 30.6K listings in blacklists for each type of reuse, respectively. NATed and dynamic addresses in blacklists can have an impact on end-users by blocking as many as 78 users for as long as 44 days.

Finally, we survey 65 network operators (Section 6) on their blacklisting practices and find that IP blacklists are often used to directly block traffic. To help network operators avoid unjust blocking, we make our techniques publicly available and also publish a new address list that has all reused addresses we detect (https://archive.org/details/IMC_407).

2 RELATED WORK

Existing studies identify autonomous systems (ASes) or IP prefixes that may be reused (e.g., that use carrier-grade NAT) using heuristics. However, to estimate the impact of blacklisting reused addresses, we need to accurately identify IP addresses that are reused. Müeller et al. [48] use traceroutes to a dedicated server in an ISP to detect middleboxes (including NAT). Other techniques use IPid [7], OS fingerprinting [8] or UDP hole punching [64] to detect NATed addresses. Netalyzr [42] and NetPiculet [68], on the other hand, require users to install Android applications that carry out measurements from the client's device. Though these techniques are effective in detecting NATs, they require many users to install custom applications to achieve significant coverage, and must continuously incentivize them to conduct measurements. These measurements are also no longer active.

Other approaches to reused address identification use private data that is not released, so we cannot replicate them. Richter et al. [56] and Casado et al. [14] observed IP addresses using NAT or dynamic addressing by monitoring the server connection logs of a CDN. Xie et al. [73] analyzed a Hotmail user-login trace to determine dynamic addresses. Cai et al. [37] present an ongoing survey that sends ICMP ECHO messages to 1% of the IPv4 address space. Based on the responses, they define metrics on availability, volatility, and median up-time to determine address blocks that are potentially dynamically allocated. This work produces a public dataset, which we compare against our approach in Section 5. However, this work has several limitations. An ICMP reply from an IP address need not uniquely identify the host using the IP address, since firewalls and middleboxes can reply on behalf of hosts. Further, some networks filter outgoing ICMP traffic, potentially leading to undercounting. Finally, this work introduces an ad-hoc estimate of dynamically allocated prefixes based on the address uptime, and we cannot establish its accuracy.

The BitTorrent network has been used to identify carrier-grade NATs (CGNs) in autonomous systems [46, 57]. These techniques leverage the fact that CGN public-facing IP addresses are likely to appear more frequently in a time window than non-CGN addresses. However, identifying ASes that use CGN is not useful for our research, since blacklists list IP addresses and we cannot tell which of these are CGN.

3 TECHNIQUES

We propose two novel techniques to identify reused IP addresses. We use a BitTorrent-based crawler to identify NATed addresses and to estimate a lower bound on the number of users behind a NATed address. To identify dynamically addressed /24 prefixes, we extend Padmanabhan et al.'s [54] idea of using the RIPE Atlas measurement logs. Our priorities in designing these approaches were: (1) IP address granularity, (2) high accuracy of a positive detection, and (3) reasonable coverage. In other words, we accept some loss in coverage to achieve the first two goals: accuracy and fine granularity of detection. Our findings are therefore a lower bound on reused addresses.

3.1 Identifying NATed Addresses

We crawl the BitTorrent network to identify NATed addresses among BitTorrent users. BitTorrent is a popular peer-to-peer network for content exchange. A new BitTorrent user learns about other users as it joins the network. Every user generates its own unique 160-bit node_id that is obtained by hashing the (possibly private) IP address of the user and a random number. A new user learns the IP addresses and port numbers of eight other users through the BitTorrent protocol; these users become the neighbors of the new user. The protocol supports two messages: bt_ping to periodically ping active neighbors, and get_nodes to get a list of neighbors of any given node. We built a crawler that uses bt_ping and get_nodes messages to crawl the BitTorrent network and identify BitTorrent users using the same IP address at the same time, indicating that such users are likely behind a NAT.

Initially, the crawler sends a get_nodes message to the BitTorrent bootstrap node, which returns a set of its neighbors. The crawler maintains a list of discovered BitTorrent users, and issues further get_nodes messages to the users on the list. The messages are issued in the order of discovery time. Replies to the get_nodes messages include the IP address, port number, node_id and BitTorrent version of the node. To prevent unnecessary probing, the BitTorrent crawler is restricted to address spaces where blacklisted addresses are present (Section 4). If the crawler finds a new user with an already discovered IP address but a different port number, it tries to establish the reason behind this occurrence: (1) multiple BitTorrent users are using the same IP address (NATed address), or (2) the BitTorrent user has changed the port number and the crawler encountered stale information. We do not rely on node_id alone to determine whether multiple BitTorrent nodes use the same IP address, because a BitTorrent user can regenerate a new node_id every time their machine reboots.
To determine whether more than one active BitTorrent user shares the same IP address at the same time, the crawler issues bt_ping messages to all discovered ports behind a given IP address, and waits for responses. If the crawler gets at least two responses with two different node_id's and two different port numbers, we conclude that the IP address is shared by multiple BitTorrent users. Our technique underestimates the number of NATed IP addresses and users, but it gives us high assurance that positive identifications are correct.

Figure 1: BitTorrent crawler to detect NATed reused addresses.

Figure 1 shows an example of how our BitTorrent crawler finds NATed addresses. In step 1, the crawler encounters two IP addresses (IP 1 and IP 2) that each appear with two different ports (ports 2215 and 12281 with IP 1, and ports 155 and 1821 with IP 2). To verify whether multiple active BitTorrent users are using the same IP address, the crawler issues four bt_ping messages in step 2, one for each port across the two IP addresses, and waits for responses. The crawler receives two replies from IP 2 and one reply from IP 1 (in step 3), and therefore determines that IP 2 is shared by multiple BitTorrent users and IP 1 is not (in step 4).

BitTorrent bt_ping messages are sent over UDP, which means that they could be lost in transit. To compensate for this, the crawler sends bt_ping messages every hour to all the IP addresses that have more than one discovered port. The crawler logs all messages sent (bt_ping or get_nodes) and all messages received, with timestamps, and these logs are then processed to determine NATed IP addresses. The BitTorrent crawler is rate-limited to minimize disturbance to the users we probe. Once the crawler sends out a message (get_nodes or bt_ping) to all discovered ports associated with an IP address, the crawler does not contact that same IP address for the next 20 minutes.

Limitations: Our crawler can only detect NATed addresses that have more than one BitTorrent user. We will miss NATs in networks where BitTorrent is not popular, or where BitTorrent traffic is filtered. Further, we can only detect NATed addresses that are reused by more than one user using BitTorrent at the same time. We are thus likely to grossly underestimate the number of users behind a NAT.

3.2 Identifying Dynamic Addresses

The RIPE NCC's Atlas project deploys custom devices that conduct various measurement tasks. Every RIPE Atlas probe connects to a central infrastructure to get instructions on new measurement tasks and to upload measurement data. All measurements are logged to include the unique probe ID and the IP address through which the measurement was made. Probes are usually deployed within the customer premises equipment (CPE) of the user. We use the measurement logs to infer the dynamics of IP address allocation in the covering /24 address prefix. We identify three properties of measurement logs that help us find dynamic addresses.

1) Dynamic addressing observed over time: We observe RIPE Atlas probe measurement logs for 16 months to determine dynamic addressing. Observing logs for a long duration allows us to understand the frequency of IP address reallocation in probes.

2) Frequency of IP address change: We consider probes that have gone through multiple IP address allocations within the same AS during the monitoring period. This eliminates probes that rarely experienced IP address changes and removes probes that have changed locations. Among the remaining probes, we consider those whose average duration between IP address changes is within 1 day. This helps to estimate the risk involved in blacklisting these IP addresses, since blacklisting may be effective only for one day, and will lead to unjust blocking afterward.

3) Extent of dynamic addressing: If we detect that an IP address is dynamic according to the criteria described above, we consider the entire /24 prefix covering this IP address to be dynamically allocated. Usually, IP address reallocation occurs from a pool of IP addresses. It is hard to determine this address pool, and network operators do not make such information public. A conservative approach is
to consider the entire /24 prefix as dynamic, as contiguous addresses are usually administered together and /24 prefixes have previously been used to identify network characteristics [18, 27, 28, 37, 56].

We use RIPE Atlas connection logs from 1 Jan 2019 to 11 May 2020. In this period, we observe 15,703 RIPE Atlas probes that were allocated 311K IP addresses (referred to as RIPE addresses). About 13.1% (or 2K) of probes go through address changes but have addresses allocated across multiple autonomous systems. Figure 2 shows the remaining 13.6K probes and the number of addresses allocated to them. The majority of the probes (59% or 9.3K) did not go through any IP address change in this period, and the remaining 27% (or 4.2K) of probes go through multiple address changes. To determine if a probe belongs to a dynamically allocated address space, we choose an appropriate threshold of address changes. We determine this threshold by finding the knee point of this graph, using a technique proposed by Satopää et al. [58], and estimate the threshold to be 8 addresses.

Figure 2: IP addresses allocated to RIPE Atlas probes. (Plot: number of allocated addresses per RIPE Atlas probe, with the threshold marked.)

Figure 3: CDF of blacklisted and reused addresses from each AS. (Plot: CDF over the number of ASes for blacklisted addresses, blacklisted BitTorrent addresses, and blacklisted RIPE addresses.)

This leaves us with 16.6% (2.6K) of the probes that have at least eight IP address allocations during our monitoring period, and where all reallocated IP addresses belong to the same autonomous system. These 2.6K probes had a total of 204K IP addresses, with an average of 78 IP addresses allocated to each probe. In other words, 16.6% of all RIPE Atlas probes contributed 65.5% of all IP addresses allocated to all RIPE Atlas probes. As a final step, we keep only probes that regularly change addresses within one day, and end up with 4% (629) of probes that are in dynamically allocated address spaces.

Limitations: Our detection of dynamic addresses is limited to IP prefixes that host a RIPE Atlas probe. Another limitation is that our detection technique involves choosing RIPE probes that have been allocated IP addresses from the same autonomous system. However, ISPs may own several autonomous systems and can allocate addresses from multiple autonomous systems. Even within the IP address spaces covered by RIPE Atlas probes, our technique only presents a lower bound of prefixes that are dynamically allocated. For probes that change their IP addresses frequently, we consider the entire /24 prefix to be potentially dynamically allocated. However, we may incorrectly gauge these boundaries. Network operators may deploy dynamic addressing within larger prefixes, leading us to underestimate the extent of unjust blocking. In some cases, they may deploy dynamic addressing within smaller prefixes, leading us to overcount the number of dynamic addresses. Estimating boundaries is difficult because ISPs have their own private policies for dynamic addressing. These limitations exist in other related works [37, 56, 73] as well.

4 DETECTION

Blacklists typically list entities that have sent spam, DDoS attacks, dictionary attacks, or malicious scans. To quantify unjust blocking by blacklists, we use 151 public blacklists, shown in Table 2 in Appendix B. These lists are actively maintained and include popular lists like DShield [40], NixSpam [53], Spamhaus [4], Alienvault [2] and Abuse.ch [1]. We collect blacklist data for 83 days over two measurement periods, from 03 Aug 2019 to 10 Sep 2019 (39 days) and from 29 Mar 2020 to 11 May 2020 (44 days). During our measurement periods, we observed 2.2M blacklisted IP addresses. Each blacklist, on average, has 30K IP addresses.

Figure 4 shows the number of reused addresses discovered by our techniques. We ran our BitTorrent crawler at the same time as the blacklist collection period. To prevent unnecessary probing, we restrict the crawler to the blacklisted address space of 899K /24 prefixes (as discussed in Section 3.1). During this period, the crawler sent 1.6 billion
bt_ping messages and received 779M responses (48.6% response rate). The crawler identified a total of 48.7M unique IP addresses that use BitTorrent, belonging to 203M unique node_id's. Among the discovered BitTorrent IP addresses, 2M are NATed; out of these, 29.7K are blacklisted. We use the 311K RIPE addresses obtained in Section 3.2, and convert them into 90.5K /24 RIPE prefixes. About 53.7K blacklisted addresses are in RIPE prefixes. The overlap with RIPE prefixes decreases as we apply our technique. First, by eliminating all probes that change addresses across multiple ASes, the number of blacklisted addresses reduces to 34.4K. Considering only probes that have at least eight address changes further reduces the number of blacklisted addresses to 33.1K. Finally, after considering only probes that change their addresses daily, we have 22.7K blacklisted addresses that are dynamically allocated.

Figure 4: Detecting NATed and dynamic addresses. (NATed addresses: BitTorrent IPs (48.7M), NATed IPs (2M), NATed + blacklisted IPs (29.7K). Dynamic addresses: blacklisted addresses in RIPE prefixes (53.7K), probes with address changes in same AS (34.4K), probes with frequent address changes (33.1K), probes that change address daily (22.7K).)

To determine the feasibility of our techniques for discovering reused addresses, we estimate (shown in Figure 3) the extent of overlap between the address spaces where BitTorrent or RIPE addresses are present and the address spaces where blacklisted addresses are present. The ASes in Figure 3 are arranged in increasing order of the number of blacklisted addresses present in them. Blacklisted addresses are present in about 26K autonomous systems. Blacklisted addresses using BitTorrent are present in 7.7K (or 29.6%) ASes and blacklisted addresses among RIPE prefixes are present in 1.9K (or 17.1%) ASes. Our technique has significant coverage in the ten most blacklisted ASes. These ASes contribute 606K (or 27.7%) of all blacklisted addresses. Among these addresses, 38K (6.4%) use BitTorrent and 4.4K (0.7%) are among RIPE prefixes. AS4134, belonging to China Telecom Backbone, has the highest number of blacklisted addresses (202K or 9%). Among the blacklisted addresses from AS4134, about 3% (or 6.2K) use BitTorrent and 0.4% (or 817) are in RIPE prefixes.

Table 1: Summary of survey responses on usage of blacklists. Questions marked (*) were answered by only 34 of the 65 respondents.

                  Question                      Response
Blacklist usage   External blacklists           85%
                  Paid-for blacklists           Avg: 2, Max: 39
                  Public blacklists             Avg: 10, Max: 68
Active defense    Directly block IPs            59%
                  Threat intelligence system    35%
Issues            Dynamic addressing*           76%
                  Carrier-grade NATs*           56%

5 ANALYSIS

In this section, we estimate the potential extent of unjust blocking caused by blacklisting reused addresses. Although we identify only 29.7K blacklisted NATed addresses and 22.7K blacklisted dynamic addresses, we observe that reused addresses are present in many blacklists (up to 60%). Blacklisting reused addresses can have a serious impact on users. Blacklisting NATed addresses can lead to unjust blocking of multiple users having the same IP address, and blacklisting dynamically allocated addresses can lead to unjust blocking of a user that has been newly allocated a previously blacklisted address. We find that a reused address can be present in blacklists for as many as 44 days and can potentially block as many as 78 users.

Reused addresses are present in many blacklists: Figure 5 and Figure 6 show blacklists that list reused addresses. There are 61 blacklists (40% of all blacklists) that do not list any NATed addresses and 72 blacklists (47% of all blacklists) that do not list any dynamic addresses. We discover 45.1K listings that include 29.7K IP addresses that are NATed. We also discover 30.6K listings that include 22.7K IP addresses that are dynamic addresses. An IP address can be present in different blacklists, so the number of listings need not be equal to the number of reused IP addresses. On average, a blacklist lists 501 NATed IP addresses and 387 dynamic addresses.

Some blacklists list more reused addresses than others: The top 10 blacklists contribute 65.9% of all listings of NATed addresses and 72.6% of all listings of dynamic addresses. The three blacklists with the highest presence of NATed addresses are spam/reputation blacklists: Stopforumspam, Nixspam and Alienvault, which list about 3.3K–8.6K (or 0.01%–0.03% of blacklists) such IP addresses. Similarly, the three blacklists with the highest presence of dynamic addresses are Stopforumspam, Nixspam and Bad IPs, which are primarily used to mitigate spam and identify IP addresses with a poor reputation. These blacklists list about 2.6K–5.9K (or 0.01%–0.02% of blacklists) dynamic addresses.

Most reused addresses are removed quickly: Figure 7 shows the CDF of blacklisted NATed and dynamic addresses and the duration in days that they were present in a blacklist. On average, NATed IP addresses are removed within ten days,
and dynamic addresses are removed within three days from blacklists. Dynamic addresses are removed much faster than NATed IP addresses. Within two days, 77.5% of all dynamic addresses are removed from blacklists, compared to only 60% of NATed IP addresses. In the worst case, reused addresses are present for the entire monitoring period of 44 days.

Figure 5: NATed addresses in blacklists. (Plot: number of NATed addresses, log scale, per blacklist.)

Figure 6: Dynamic addresses in blacklists. (Plot: number of dynamic addresses, log scale, per blacklist, for RIPE and for Cai et al.)

Figure 7: Duration distribution of reused addresses. (Plot: CDF of IP addresses over the number of days in blacklists, for NATed addresses and dynamic addresses.)

Figure 8: Number of users behind NATed addresses in blacklists. (Plot: CDF of IP addresses over the number of users with the same IP address.)

Blacklisting NATed addresses impacts many users: The BitTorrent crawler described in Section 3.1 helps us quantify the lower bound of active users using the same IP address that would be affected by blacklisting. Figure 8 shows the CDF of NATed blacklisted addresses and the lower bound of affected users. For most of these IP addresses (68.5%), we detect only two simultaneous users. 97.8% of the IP addresses have fewer than ten simultaneous users. The remaining 2.2% of the IP addresses are shared by many more users. At the maximum, we detect 78 simultaneous users behind an IP address.

Exploring other techniques: As discussed in Section 2, there are many other techniques for detecting reused addresses. The only technique that can be reproduced at scale is Cai et al. [37]. We use their techniques and the datasets (IT86c and IT89w) that most closely match our monitoring periods to compare dynamic addresses. Figure 6 (black line) shows the number of blacklisted addresses that overlap with their study. Cai et al. have broader coverage in some blacklists compared to this study; this is likely due to the absence of RIPE Atlas probes in those address regions. We find that the total number of listings discovered by Cai et al. is roughly the same as with our technique: they detect 29.8K listings compared to 30.6K listings using our technique.

6 UNDERSTANDING BLACKLISTS USAGE

We survey 65 network operators (see Appendix A for the full survey) on how they use blacklists to identify malicious traffic, the role blacklists play in traffic filtering, and their perceptions of the limitations of blacklists due to reused addresses. Our survey indicates that 85% of respondents use blacklists, and 59% of respondents use blacklists to directly block malicious traffic (Table 1). Thirty-four survey participants responded to direct questions about the impact of reused addresses. Of these, 56% believe that blacklists are inaccurate due to NAT and 76% believe dynamic addressing introduces inaccuracies. It is evident that network operators use blacklists and are aware of unjust blocking caused by reused addresses. To assist network operators, we make our crawler and the scripts that determine reused addresses public (https://archive.org/details/IMC_407). Network operators using blacklists can use our list to divert traffic from these sources to more sophisticated systems that use other identifiers (cookies, payload signatures) to filter unwanted traffic and avoid unjust blocking. Network operators using application-specific blacklists (such as spam blacklists) can also use our list to reduce false positives with greylisting [43], which is already built into popular spam filtering systems like Spamassassin [69] or Spamd [13].
7 CONCLUSION

We present two techniques to identify reused addresses and analyze 151 publicly available blacklists to quantify their impact. We find that 53–60% of blacklists list at least one reused address. We also find 30.6K–45.1K listings of reused addresses in blacklists. Reused addresses can be present in blacklists for as long as 44 days, affecting as many as 78 users. Finally, to help blacklist maintainers reduce unjust blocking, we made our discovered reused addresses public.

REFERENCES

[1] Abuse.ch. 2020. Swiss Security Blog - Abuse.ch. https://www.abuse.ch/. (May 2020).
[2] Alienvault. 2020. Alienvault Reputation System. https://www.alienvault.com/. (May 2020).
[3] Antispam. 2020. ImproWare. http://antispam.imp.ch/. (May 2020). (Accessed on 05/13/2020).
[4] Charles Arthur. 2006. Can an American judge take a British company offline? (October 2006). https://www.theguardian.com/technology/2006/oct/19/guardianweeklytechnologysection3
[5] BadIPs. 2020. badips.com | an IP based abuse tracker. https://www.badips.com/. (May 2020).
[6] Bambenek. 2020. Bambenek Consulting Feeds. http://osint.bambenekconsulting.com/feeds/. (May 2020).
[7] Steven M Bellovin. 2002. A technique for counting NATted hosts. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurement. ACM, 267–272.
[8] Robert Beverly. 2004. A robust classifier for passive TCP/IP fingerprinting. In International Workshop on Passive and Active Network Measurement. Springer, 158–167.
[9] Blocklist.de. 2020. Blocklist.de fail2ban reporting service. https://www.blocklist.de/en/index.html. (May 2020).
[10] Botscout. 2020. We catch bots so that you don't have to. https://www.botscout.com. (May 2020).
[11] Botvrij. 2020. botvrij.eu - powered by MISP. http://www.botvrij.eu/. (May 2020).
[12] Malware Bytes. 2020. hpHosts - by Malware Bytes. https://hosts-file.net/. (May 2020).
[13] Calomel. 2017. Spamd tarpit and greylisting daemon. https://calomel.org/spamd_config.html. (Jan 2017).
[14] Martin Casado and Michael J Freedman. 2007. Peering through the shroud: The effect of edge opacity on IP-based client identification. In 4th USENIX Symposium on Networked Systems Design & Implementation (NSDI 07).
[15] Taichung Education Center. 2020. Taichung Education Center. https://www.tc.edu.tw/net/netflow/lkout/recent/30. (May 2020).
[16] CIArmy. 2020. CINSscore. http://ciarmy.com/. (May 2020).
[17] Cisco. 2020. Cisco Talos - Additional Resources. http://www.talosintelligence.com/. (May 2020).
[18] Kimberly Claffy, Young Hyun, Ken Keys, Marina Fomenkov, and Dmitri Krioukov. 2009. Internet mapping: from art to science. In 2009 Cybersecurity Applications & Technology Conference for Homeland Security.
[21] GPF Comics. 2020. The GPF DNS Block List. https://www.gpf-comics.com/dnsbl/. (May 2020).
[22] Cloudflare Community. 2018. Getting Cloudflare capcha on almost every website I visit for my home network. Help! https://community.cloudflare.com/t/getting-cloudflare-capcha-on-almost-every-website-i-visit-for-my-home-network-help/42534. (Nov 2018).
[23] Cloudflare Community. 2019. Blocked IP address: Sharing IPs. https://community.cloudflare.com/t/cloudflare-blocking-my-ip/65453/57. (Mar 2019).
[24] Cloudflare Community. 2019. Community Tip - Best Practices For Captcha Challenges. https://community.cloudflare.com/t/community-tip-best-practices-for-captcha-challenges/56301. (Jan 2019).
[25] CruzIt. 2020. Server Blocklist / Blacklist - CruzIT.com - PHP, Linux & DNS Tools, Apache, MySQL, Postfix, Web & Email Spam Prevention Information. http://www.cruzit.com/wbl.php. (May 2020).
[26] Cybercrime. 2020. CyberCrime Tracker. http://cybercrime-tracker.net/. (May 2020).
[27] Alberto Dainotti, Karyn Benson, Alistair King, KC Claffy, Michael Kallitsis, Eduard Glatz, and Xenofontas Dimitropoulos. 2013. Estimating internet address space usage through passive measurements. ACM SIGCOMM Computer Communication Review 44, 1 (2013), 42–49.
[28] Alberto Dainotti, Karyn Benson, Alistair King, Bradley Huffaker, Eduard Glatz, Xenofontas Dimitropoulos, Philipp Richter, Alessandro Finamore, and Alex C Snoeren. 2016. Lost in space: improving inference of IPv4 address space utilization. IEEE Journal on Selected Areas in Communications 34, 6 (2016), 1862–1876.
[29] Binary Defense. 2020. Binary Defense Systems | Defend. Protect. Secure. https://www.binarydefense.com/. (May 2020).
[30] DYN. 2020. Index of /pub/malware-feeds/. http://security-research.dyndns.org/pub/malware-feeds/. (May 2020).
[31] IP finder. 2020. IP Blacklist Cloud - Protect your website. https://www.ip-finder.me/. (May 2020).
[32] Comcast Forums. 2018. Dirty (blacklisted) IPs issued to Comcast Business Account holders. https://forums.businesshelp.comcast.com/t5/Connectivity/Dirty-blacklisted-IPs-issued-to-Comcast-Business-Account-holders/td-p/34297. (Mar 2018).
[33] Verizon Forums. 2020. IP address blocked by SORBS, Verizon will do nothing. https://forums.verizon.com/t5/Fios-Internet/IP-address-blocked-by-SORBS-Verizon-will-do-nothing/td-p/892536. (Feb 2020).
[34] Daniel Gerzo. 2020. Daniel Gerzo BruteForceBlocker. http://danger.rulez.sk/index.php/bruteforceblocker/. (May 2020).
[35] Greensnow. 2020. Greensnow Statistics. https://greensnow.co/. (May 2020).
[36] Charles B. Haley. 2020. SSH Dictionary Attacks. http://charles.the-haleys.org/. (May 2020).
[37] John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopoulos, Genevieve Bartlett, and Joseph Bannister. 2008. Census and survey of the visible internet. In Proceedings of the 8th ACM SIGCOMM conference on Internet measurement. 169–182.
[38] Project Honeypot. 2020. Project Honeypot. https://www.projecthoneypot.org/. (May 2020).
[39] IBM. 2020. IBM X-Force Exchange. https://exchange.xforce.ibmcloud.com/. (May 2020).
[40] SANS Institute. 2019. Internet Storm Center. https://dshield.org/
about.html. (Sept 2019). IEEE, 205–211. [41] My IP. 2020. My IP - Blacklist Checks. https://www.myip.ms/info/ [19] Cleantalk. 2020. Cloud spam protection for forums, boards, blogs and about. (May 2020). sites. https://www.cleantalk.org. (May 2020). [42] Christian Kreibich, Nicholas Weaver, Boris Nechaev, and Vern Paxson. [20] Cloudflare. 2020. Understanding Cloudflare Challenge Passage 2010. Netalyzr: illuminating the edge network. In Proceedings of the (Captcha). https://support.cloudflare.com/hc/en-us/articles/200170136. 10th ACM SIGCOMM conference on Internet measurement. ACM, 246– (Feb 2020). 259. 7
[43] M Kucherawy and D Crocker. 2012. Email greylisting: An applicability statement for smtp. Technical Report. RFC 6647, June.
[44] Snort Labs. 2020. Sourcefire VRT Labs. https://labs.snort.org/. (May 2020).
[45] Malware Domain List. 2020. Malware Domain List. http://www.malwaredomainlist.com/. (May 2020).
[46] I. Livadariu, K. Benson, A. Elmokashfi, A. Dainotti, and A. Dhamdhere. 2018. Inferring Carrier-Grade NAT Deployment in the Wild. In IEEE Conference on Computer Communications (INFOCOM).
[47] Malc0de. 2020. Malc0de Database. http://malc0de.com/database/. (May 2020).
[48] Andreas Müller, Florian Wohlfart, and Georg Carle. 2013. Analysis and topology-based traversal of cascaded large scale NATs. In Proceedings of the 2013 workshop on Hot topics in middleboxes and network function virtualization. ACM, 43–48.
[49] Blocklist NET. 2020. BlockList.net.ua. https://blocklist.net.ua/. (May 2020).
[50] Normshield. 2020. Normshield - Cyber Risk Scorecard. https://www.normshield.com/. (May 2020).
[51] NoThink. 2020. NoThink Individual Blacklist Maintainer. http://www.nothink.org/. (May 2020).
[52] Nullsecure. 2020. nullsecure. https://nullsecure.org/. (May 2020).
[53] Heise Online. 2020. Nixspam Blacklist. https://goo.gl/jsyksA. (May 2020).
[54] R. Padmanabhan, A. Dhamdhere, E. Aben, k. claffy, and N. Spring. 2016. Reasons Dynamic Addresses Change. In Internet Measurement Conference (IMC).
[55] Spectrum Partners. 2020. Spectrum Static IP. https://partners.spectrum.com/content/spectrum/business/en/internet/staticip.html. (May 2020).
[56] Philipp Richter, Georgios Smaragdakis, David Plonka, and Arthur Berger. 2016. Beyond Counting: New Perspectives on the Active IPv4 Address Space. In Proceedings of ACM IMC 2016. Santa Monica, CA.
[57] Philipp Richter, Florian Wohlfart, Narseo Vallina-Rodriguez, Mark Allman, Randy Bush, Anja Feldmann, Christian Kreibich, Nicholas Weaver, and Vern Paxson. 2016. A Multi-perspective Analysis of Carrier-Grade NAT Deployment. In Proceedings of ACM IMC 2016. Santa Monica, CA.
[58] Ville Satopaa, Jeannie Albrecht, David Irwin, and Barath Raghavan. 2011. Finding a "kneedle" in a haystack: Detecting knee points in system behavior. In 2011 31st international conference on distributed computing systems workshops. IEEE, 166–171.
[59] Sblam. 2020. Sblam! http://sblam.com/. (May 2020).
[60] Stop Forum Spam. 2020. Stop Forum Spam. https://stopforumspam.com/. (May 2020).
[61] ARS Technica. 2020. ATT raises prices 7% by making its customers pay ATT’s property taxes. https://arstechnica.com/tech-policy/2019/10/att-raises-prices-7-by-making-its-customers-pay-atts-property-taxes/. (Oct 2020).
[62] Threatcrowd. 2020. Threat Crowd - Open Source Threat Intelligence. https://threatcrowd.org/. (May 2020).
[63] Emerging Threats. 2020. Emerging Threats Rules. https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt. (May 2020).
[64] Kazuhiro Tobe, Akihiro Shimoda, and Shegeki Goto. 2010. Extended UDP Multiple Hole Punching Method to Traverse Large Scale NATs. Proceedings of the Asia-Pacific Advanced Network 30 (2010), 30–36.
[65] Turris. 2020. Greylist :: Project:Turris. https://www.turris.cz/en/greylist. (May 2020).
[66] URLVir. 2020. URLVir: Monitor Malicious Executable Urls. http://www.urlvir.com/. (May 2020). (Accessed on 05/13/2020).
[67] VX Vault. 2020. VX Vault ViriList. http://vxvault.net/ViriList.php. (May 2020).
[68] Zhaoguang Wang, Zhiyun Qian, Qiang Xu, Zhuoqing Mao, and Ming Zhang. 2011. An untold story of middleboxes in cellular networks. In ACM SIGCOMM Computer Communication Review, Vol. 41. ACM, 374–385.
[69] Apache Wiki. 2019. Other Trick For Blocking Spam. https://cwiki.apache.org/confluence/display/SPAMASSASSIN/OtherTricks#OtherTricks-Greylisting. (Jul 2019).
[70] Wikipedia. 2019. Internet network operators’ group - Wikipedia, The Free Encyclopedia. (June 2019). https://en.wikipedia.org/w/index.php?title=Internet_network_operators%27_group&oldid=906511356
[71] Chris Wilcox, Christos Papadopoulos, and John Heidemann. 2010. Correlating spam activity with ip address characteristics. In 2010 INFOCOM IEEE Conference on Computer Communications Workshops. IEEE, 1–6.
[72] Xfinity. 2020. Business Class Internet at Home. https://www.xfinity.com/hub/business/internet-for-home-business. (May 2020).
[73] Yinglian Xie, Fang Yu, Kannan Achan, Eliot Gillum, Moises Goldszmidt, and Ted Wobber. 2007. How dynamic are IP addresses?. In ACM SIGCOMM Computer Communication Review, Vol. 37. ACM, 301–312.
[74] ZeroDot1. 2020. CoinBlockerLists. https://gitlab.com/ZeroDot1/CoinBlockerLists. (May 2020).

A USAGE AND PERCEPTIONS OF BLACKLISTS

In this section, we survey network operators to understand how they use blacklists to filter suspicious traffic, since this is when users of reused addresses could be unjustly blocked. Further, we try to understand operators’ anecdotal experiences with the blacklisting of reused addresses. This helps us establish the importance of this issue from a human perspective.

Figure 9: Types of blacklists used by operators that have faced issues with reused addresses in blacklists. [Bar chart; x-axis: (%) of operators, 0 to 100; categories: Spam, Reputation, DDoS, Bruteforce, Ransomware, SSH, HTTP, Backdoor, FTP, Banking, VOIP.]
We circulated an online questionnaire to regional network operator groups by posting to all groups that published open-access mailing lists (identified via [70]). We received responses from members of forty groups. Network operators’ mailing lists are forums for network engineers, operators, and other technical professionals to coordinate and disseminate information about network security, peering, routing, and other operational Internet issues. We chose this strategy to maximize outreach to the relevant communities, offering apologies in advance for potential overlap, as operators may subscribe to more than one list (e.g., NYNOG for New York, USA, in addition to NANOG for North America). Mailing list subscribers were informed that the purpose of the study was to better understand current blacklisting practices and challenges. Participants could complete the survey anonymously and were offered the option to subscribe to receive findings from the completed study.

The survey included 24 questions on what blacklists are used, the role they play in filtering (e.g., direct blocking or as an input to a threat intelligence system), and their perceived benefits and limitations (see Section C of Appendix). Between July and August 2019, 65 respondents finished and submitted survey answers. Survey participants operate networks in five continents, including end-user and enterprise ISPs and content providers. Sizes of networks vary from 100 to over 10 million users. Our key findings are as follows:

Blacklists are widely used and are used for active defense: Network operators use two types of blacklists: operator-curated internal blacklists and external blacklists, which include paid-for and publicly available blacklists. About 70% of operators maintained internal blacklists and 85% used external blacklists. Network operators often use multiple blacklists to defend their networks. 55% of respondents used two or more different types of blacklists. On average, network operators subscribed to 2 paid-for lists and 10 publicly available blacklists, and some use up to 39 paid-for and 68 public blacklists.

Usage of blacklists can have consequences, as network operators often use them to block traffic. 59% of surveyed network operators use blacklisted addresses to directly block traffic, and fewer than 35% of network operators use blacklisted addresses as an input to other threat intelligence systems. Therefore, depending on the blacklists used, network operators could unjustly block users in reused addresses. Our survey also finds that some network operators set manual filters to override blacklisting in address spaces that they believed were dynamically allocated. We also find that blacklists have uses beyond blocking. One of the surveyed network operators checks its own addresses on blacklists before assigning them to new customers, to avoid unjust blocking.

Perceived inaccuracies due to reused addresses: When asked directly, only 34 of the survey respondents answered this question. 56% of the respondents (19 out of 34) believed carrier-grade NAT (CGN) affected the accuracy of blacklists, citing cases where legitimate users were getting blocked because of a shared address. About 76% of respondents (26 out of 34) said that dynamic addressing affected the accuracy of blacklists. As a part of the survey, operators identified the type of external blacklists used in their network. Figure 9 shows the types of blacklists used by operators that have faced issues with blacklists due to reused addresses. Among the blacklists subscribed to by these operators, we find spam and reputation blacklists to have the highest consequences of blocking reused addresses. Although our findings are anecdotal, previous studies have shown malicious activities such as spamming to be correlated with dynamically allocated address spaces [71, 73].

B BLACKLIST DATASET

A network operator could use its own set of blacklists to determine reused addresses. We use the 151 public blacklists shown in Table 2. We collected blacklist data for 83 days over two measurement periods, from 03 Aug 2019 to 10 Sep 2019 (39 days) and from 29 Mar 2020 to 11 May 2020 (44 days). As a part of the survey (Section A), some network operators manually listed the external blacklists they use. There are 27 such blacklists, indicated by a * in Table 2.

C QUESTIONNAIRE ON PERCEPTIONS OF BLACKLISTS

Questions that permitted open-ended responses are denoted with asterisks.

(1) What is your company’s name and AS number if available?*
(2) What is your position / your role in network management?*
(3) What is your email address?*
(4) May we reach out to you via email: to inform you once the results of this survey are publicly available
(5) May we reach out to you via email: with further questions
(6) What type of network do you run? (more than one choice possible)
(7) How many subscribers do you connect to the Internet?
(8) In what geographic region(s) do you operate?
(9) Do you maintain internal blacklists?
(10) How and why did you develop internal blacklists? How do they compare to third-party blacklists?*
(11) How many third-party blacklists do you use?
(12) Which of the following types of third-party blacklists do you use? (Please select all that apply)
(13) What factors determine which third-party blacklists you use?*
(14) Do you use third-party blacklists to directly block malicious activity?
(15) Do you use third-party blacklists as an input to a threat intelligence system?
(16) In your experience, do third-party blacklists provide accurate information on threats?
(17) What are the shortcomings of any third-party blacklists you are familiar with?*
(18) What are the strengths of any third-party blacklists you are familiar with?*
(19) How do your filtering practices vary according to type of attack or blacklist?*
(20) To help us map your responses to the blacklists we are monitoring, please list the third-party blacklists you use.*
(21) Do you see the quality of blacklists being affected by: Dynamic addressing
(22) Do you see the quality of blacklists being affected by: Carrier grade NATs
(23) Do you see the quality of blacklists being affected by: Other*
(24) How could blacklists be improved?*
(25) Do you donate data from your network to community blacklist sources (such as Project Honeypot or DShield)?
(26) Is there anything else you would like to share with us?*

Maintainer                  # of blacklists
Bad IPs [5]                 44
Bambenek [6]                22
*Abuse.ch [1]               10
Normshield [50]             9
*Blocklist.de [9]           9
Malware bytes [12]          9
*Project Honeypot [38]      4
CoinBlockerLists [74]       4
NoThink [51]                3
Emerging threats [63]       2
ImproWare [3]               2
Botvrij.EU [11]             2
IP Finder [31]              1
*Cleantalk [19]             1
Sblam! [59]                 1
*Nixspam [53]               1
Blocklist Project [49]      1
BruteforceBlocker [34]      1
Cruzit [25]                 1
Haley [36]                  1
Botscout [10]               1
My IP [41]                  1
Taichung [15]               1
*Cisco Talos [17]           1
Alienvault [2]              1
Binary Defense [29]         1
GreenSnow [35]              1
Snort Labs [44]             1
GPF Comics [21]             1
Turris [65]                 1
CINSscore [16]              1
Nullsecure [52]             1
DYN [30]                    1
Malware domain list [45]    1
Malc0de [47]                1
URLVir [66]                 1
Threatcrowd [62]            1
CyberCrime [26]             1
IBM X-Force [39]            1
VXVault [67]                1
*Stopforumspam [60]         1
Total                       151

Table 2: Each row shows the number of blacklists provided by the blacklist maintainer. We monitor 151 blacklists to determine the number of blacklisted addresses that use NAT or are dynamically allocated. Blacklists used by network operators who took our survey are marked with (*).
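
Appendix B notes that a network operator could use its own set of blacklists to determine reused addresses. A minimal sketch of that check is given here, assuming the operator already holds a list of NATed addresses and a list of dynamically allocated prefixes. The function name, input formats, and sample addresses are illustrative assumptions, not part of our measurement pipeline, and the sketch does not reproduce how reused addresses are discovered.

```python
import ipaddress


def find_reused_listings(blacklist, nated_addresses, dynamic_prefixes):
    """Return blacklist entries that are known NATed addresses or that
    fall inside a dynamically allocated prefix.

    All three inputs are iterables of strings; this format is an
    assumption made for the sketch.
    """
    nated = {ipaddress.ip_address(a) for a in nated_addresses}
    prefixes = [ipaddress.ip_network(p) for p in dynamic_prefixes]
    reused = []
    for entry in blacklist:
        addr = ipaddress.ip_address(entry)
        # An entry counts as reused if either form of reuse applies.
        if addr in nated or any(addr in net for net in prefixes):
            reused.append(entry)
    return reused


# Illustrative data only: documentation address ranges, not real listings.
blacklist = ["192.0.2.10", "198.51.100.7", "203.0.113.99"]
nated_addresses = ["192.0.2.10"]
dynamic_prefixes = ["203.0.113.0/24"]

reused = find_reused_listings(blacklist, nated_addresses, dynamic_prefixes)
print(reused)  # ['192.0.2.10', '203.0.113.99']
```

The linear scan over prefixes keeps the sketch short. For blacklists and prefix sets of realistic size, a prefix trie or sorted interval lookup would likely be a better fit.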