THREATINGESTOR DOCUMENTATION - INQUEST LABS - JAN 21, 2020 - READ THE DOCS
Contents
1 Welcome to ThreatIngestor
  1.1 What is ThreatIngestor?
  1.2 Try it out
  1.3 Support
2 Installation
3 Basic Usage
  3.1 Minimal Case
  3.2 Standard Case
4 Example Workflows
  4.1 Multiple Operators
  4.2 Full-Circle
  4.3 Queue Workers
  4.4 Automate as Much as Possible
5 Source Plugins
  5.1 Available Plugins
6 Operator Plugins
  6.1 Available Plugins
7 Artifacts
  7.1 Domains
  7.2 Hashes
  7.3 IP Addresses
  7.4 Tasks
  7.5 URLs
  7.6 YARA Signatures
8 Extras
  8.1 Quick Webapp
  8.2 Queue Workers
9 Observability
  9.1 Logging
  9.2 Metrics
  9.3 Notifications
10 Developing
  10.1 Overview
  10.2 Source Plugins
  10.3 Operator Plugins
11 API Documentation
  11.1 threatingestor
12 Glossary
Python Module Index
Index
ThreatIngestor is a flexible, configuration-driven, extensible framework for consuming threat intelligence. It can watch Twitter, RSS feeds, and other sources, extract meaningful information like C2 IPs/domains and YARA signatures, and send that information to other systems for analysis. Use ThreatIngestor alongside ThreatKB or MISP to automate importing public C2s and YARA signatures, or integrate it into your existing workflow with custom operator plugins.
CHAPTER 1 Welcome to ThreatIngestor

1.1 What is ThreatIngestor?

ThreatIngestor helps you collect threat intelligence from public feeds, and gives you context on that intelligence so you can research it further and put it to use protecting yourself or your organization.

There is a never-ending stream of publicly available information on malicious activities online, but compiling all that information by hand takes a lot of time and effort. ThreatIngestor automates as much of that work as possible, so you can focus on more important things.

Fig. 1: A screenshot of Twitter user @MalwareConfig's feed, showing two tweets with defanged C2 domains and IP addresses.

Because it is completely modular and configuration-driven, ThreatIngestor is super flexible, and should fit easily into any threat intel workflow.
1.2 Try it out

If you want to try ThreatIngestor right now, here's the quickest way to get up and running.

First, make sure you have Python 3.6+ and pip installed:

    $ python3 -V
    Python 3.6.6
    $ python3 -m pip -V
    pip from (python 3.6)

(If you don't, you'll want to find installation instructions for Python and pip specific to your operating system.)

Next, install ThreatIngestor and the dependencies we'll be using:

    python3 -m pip install threatingestor feedparser hug

Download this example configuration file, and run ThreatIngestor:

    threatingestor inquest-blog-sqlite.yml

After several seconds, the command should exit without any errors, and you should see a new file, artifacts.db, in the same folder where you ran the command. That's where all the intel we gathered is stored.

Fire up the quick web interface that comes with ThreatIngestor:

    hug -m threatingestor.extras.webapp

Then open http://localhost:8000/ in your web browser. You should see something like this:

Fig. 2: The ThreatIngestor quick web interface, index.

Click on one of the links to view all the artifacts of that type that were collected. That's it!
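If you would rather look inside artifacts.db directly than through the web interface, Python's built-in sqlite3 module can open it. The sketch below is only an illustration: summarize_db is a hypothetical helper (not part of ThreatIngestor), and it deliberately assumes nothing about ThreatIngestor's table or column names. It just lists every table and its row count, opening the file read-only so a mistyped path raises an error instead of creating an empty database.

```python
import sqlite3
from pathlib import Path


def summarize_db(path):
    """Return {table_name: row_count} for every table in an SQLite file.

    The file is opened read-only (SQLite URI mode=ro), so a missing file
    raises sqlite3.OperationalError instead of creating an empty database.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for table in tables:
            # Quote the identifier in case a table name contains odd characters.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        conn.close()
```

From the folder where you ran ThreatIngestor, summarize_db("artifacts.db") returns one entry per table; which tables appear depends on how ThreatIngestor lays out the database.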
Fig. 3: The ThreatIngestor quick web interface, domains table.

In a real environment, you would probably use something like ThreatKB or MISP to store your artifacts, instead of just an SQLite database like the one this quick web interface is reading from. If you wanted to do some automated investigation of the things you find, instead of just tossing them into a database, you could do that too.

For more ThreatIngestor tutorials, take a look at the InQuest blog.

1.3 Support

If you need help getting set up, or run into any issues, feel free to open an Issue. You can also reach out to @InQuest on Twitter.

We'd love to hear any feedback you have on ThreatIngestor, its documentation, or how you're putting it to work for you!
CHAPTER 2 Installation

ThreatIngestor requires Python 3.6+.

You may need to install the Python development headers separately. On Ubuntu/Debian-based systems, try:

    sudo apt-get install python3-dev

Then install threatingestor from pip:

    pip install threatingestor

By default, threatingestor does not pull in the dependencies for plugins you may not use. If you want to use a certain plugin, you'll need to install its dependencies as well. For example, if you want to use SQS queues:

    pip install threatingestor[sqs]

If you want to use Beanstalk and Twitter:

    pip install threatingestor[beanstalk,twitter]

Or, if you don't know what you might need and just want to pull in everything:

    pip install threatingestor[all]

Note: If you'd like to use the git source, you will also need to have Git installed. If you want to use the notification support, install Notifiers separately:

    pip install notifiers
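Before moving on, it can help to confirm that your interpreter meets the Python 3.6+ requirement and that the packages you installed are importable. The sketch below uses only the standard library; check_environment is a hypothetical helper, not part of ThreatIngestor, and the module names in the usage line are simply the ones installed in "Try it out". Substitute whatever plugin dependencies you chose.

```python
import importlib.util
import sys


def check_environment(modules, minimum=(3, 6)):
    """Return a list of problems; an empty list means everything was found."""
    problems = []
    if sys.version_info[:2] < minimum:
        problems.append("Python %d.%d+ required, found %d.%d"
                        % (minimum + tuple(sys.version_info[:2])))
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


# Usage: check the packages installed in "Try it out".
for problem in check_environment(["threatingestor", "feedparser", "hug"]):
    print(problem)
```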
CHAPTER 3 Basic Usage

All ThreatIngestor configuration is done via YAML. If you're not familiar with YAML, Ansible has a YAML syntax guide that goes over some of the basics. For the purposes of this documentation, we'll assume no prior knowledge of YAML. In the use cases below, we'll go into detail on how ThreatIngestor config is laid out, and give some concrete examples you can use right away.

3.1 Minimal Case

For the most basic ThreatIngestor setup, you will want to configure at least one source and one operator, and set the general settings (as shown below).

First create a new config.yml file, and add the general section:

    general:
      daemon: true
      sleep: 900
      state_path: state.db

Configure ThreatIngestor to run continuously or manually. If you set daemon to true, ThreatIngestor will watch your sources in a loop; set it to false to run manually, or via cron or some other scheduler.

Set sleep to the number of seconds to wait between each check; this is ignored if daemon is set to false. Don't set sleep too low, or you may run into rate limits or other issues. If in doubt, keep it at 900 (fifteen minutes) or higher.

The state_path should be a relative or absolute path where ThreatIngestor will write out the state database, which is used internally to track where it left off in each source (e.g. the most recent blog post processed from an RSS feed).

Next, create the sources section, and add your sources. Give each source a unique name, like inquest-rss. Each source also uses a module, like twitter, rss, or sqs; choose the module that matches the format of the source data. For easy testing, we'll use an RSS source and a CSV operator for this example:

    sources:
      - name: inquest-rss
        module: rss
        url: http://blog.inquest.net/atom.xml
        feed_type: messy

Note the dash before the name key, signifying that this and the following keys are part of a single list element. We'll circle back to this distinction below in the "Standard Case" walkthrough.

For this source, we assign the name inquest-rss, tell it to use the rss module, and fill in the required options for the rss module, which are url and feed_type.

Note: To see what configuration options each module allows, check out the corresponding documentation on the Source Plugins and Operator Plugins pages.

Similarly, each operator specifies a name, a module, and other settings that control the output of information extracted from the sources:

    operators:
      - name: csv
        module: csv
        filename: output.csv

Here we create an operator using the csv module, name it csv, and specify a filename where we want to store the output. Note again the dash before the name key.

Putting it all together, here's our completed config.yml file:

    general:
      daemon: true
      sleep: 900
      state_path: state.db

    sources:
      - name: inquest-rss
        module: rss
        url: http://blog.inquest.net/atom.xml
        feed_type: messy

    operators:
      - name: csv
        module: csv
        filename: output.csv

Now that the config file is all set up, run ThreatIngestor:

    threatingestor config.yml

It should write out an output.csv file that looks something like this:

    URL,http://purl.org/dc/dcmitype/,http://blog.inquest.net/blog/2018/02/07/cve-2018-4878-adobe-flash-0day-itw/,"\n On February 1st, Adobe published bulletin APSA18-01 for CVE-2018-4878 describing a use-after-free (UAF) vulnerability affecting Flash ve..."
    Domain,purl.org,http://blog.inquest.net/blog/2018/02/07/cve-2018-4878-adobe-flash-0day-itw/,"\n On February 1st, Adobe published bulletin APSA18-01 for CVE-2018-4878 describing a use-after-free (UAF) vulnerability affecting Flash ve..."
    URL,http://purl.org/dc/elements/1.1,http://blog.inquest.net/blog/2018/02/07/cve-2018-4878-adobe-flash-0day-itw/,"\n On February 1st, Adobe published bulletin APSA18-01 for CVE-2018-4878 describing a use-after-free (UAF) vulnerability affecting Flash ve..."
    ...

Assuming you are running in daemon mode, ThreatIngestor will continue to check the blog and append new artifacts to the CSV as it finds them.

For further configuration, continue to the Standard Case section, or see the detailed sections about source plugins and operator plugins.

3.2 Standard Case

Generally, you are going to want multiple sources feeding into one or more operators. Let's consider this standard use case:

Create your config.yml:

    general:
      daemon: true
      sleep: 900
      state_path: state.db

For Twitter integration, you'll need to grab the tokens, keys, and secrets for your Twitter account. Follow these steps from the Twitter documentation: https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.

For ThreatKB, while logged in to your ThreatKB instance, click the profile dropdown in the top right of the page, then choose "My API Keys". Click the "+" to generate a new token/key pair, and copy them somewhere safe.

Once you have all the secrets you need, create a new section in your config file called credentials, with two list elements inside it for Twitter and ThreatKB:
    credentials:
      - name: twitter-auth
        # https://dev.twitter.com/oauth/overview/application-owner-access-tokens
        api_key:
        api_secret_key:
        access_token:
        access_token_secret:

      - name: threatkb-auth
        url: https://mythreatkb
        token: MYTOKEN
        secret_key: MYKEY

The dash before each name key signifies the start of a new element in the credentials list. This allows us to define an unlimited number of reusable credential sets, which we can reference by name in the sources and operators we'll define next.

Fill out the rest of the ThreatIngestor configuration file with the sources and operators:

    sources:
      - name: twitter-inquest-c2-list
        module: twitter
        credentials: twitter-auth
        # https://dev.twitter.com/rest/reference/get/lists/statuses
        owner_screen_name: InQuest
        slug: c2-feed

      - name: twitter-hxxp-no-opendir
        module: twitter
        credentials: twitter-auth
        # https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
        q: hxxp -open

      - name: rss-vendor-x
        module: rss
        url: https://example.com/rss.xml
        feed_type: messy

      - name: rss-vendor-y
        module: rss
        url: https://example.com/rss.xml
        feed_type: messy

    operators:
      - name: mythreatkb  # Send artifacts to a ThreatKB instance
        module: threatkb
        credentials: threatkb-auth
        state: Inbox

Now that everything is all set up, run the ingestor:

    threatingestor config.yml

You should see your ThreatKB Inbox start filling up with newly extracted C2 IPs and domains.
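Conceptually, a credentials: reference behaves like a lookup-and-merge: the keys of the named credential set are supplied to the source or operator alongside its own options. The sketch below shows that idea on plain dictionaries, assuming they are what a YAML parser would produce from the config above. It is an illustration, not ThreatIngestor's actual config loader; resolve_credentials is a hypothetical name, and letting a section's own keys win over credential keys is a choice made for this sketch.

```python
def resolve_credentials(config):
    """Return sources and operators with their named credentials merged in.

    `config` is assumed to be the dict a YAML parser produces from config.yml.
    The parsed config itself is not modified.
    """
    credentials = {entry["name"]: entry for entry in config.get("credentials", [])}
    resolved = {}
    for section in ("sources", "operators"):
        items = []
        for item in config.get(section, []):
            item = dict(item)  # shallow copy of this source/operator
            ref = item.pop("credentials", None)
            if ref is not None:
                if ref not in credentials:
                    raise KeyError("%s: unknown credentials %r" % (item.get("name"), ref))
                creds = {k: v for k, v in credentials[ref].items() if k != "name"}
                item = {**creds, **item}  # the item's own keys win on conflicts
            items.append(item)
        resolved[section] = items
    return resolved


# The Standard Case's ThreatKB credentials and operator, as parsed dicts.
config = {
    "credentials": [
        {"name": "threatkb-auth", "url": "https://mythreatkb",
         "token": "MYTOKEN", "secret_key": "MYKEY"},
    ],
    "operators": [
        {"name": "mythreatkb", "module": "threatkb",
         "credentials": "threatkb-auth", "state": "Inbox"},
    ],
}
operator = resolve_credentials(config)["operators"][0]
```

Here operator ends up with url, token, and secret_key next to name, module, and state, which is the sense in which a single credential set can be shared by any number of sources and operators.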
CHAPTER 4 Example Workflows

The standard use case for ThreatIngestor is pretty simple: just pull from Twitter and RSS, extract IOCs, and send them to ThreatKB. That said, there is a lot more you can do with just a few changes to the configuration file. Here, we'll go over some more advanced use cases, to give you an idea of what this tool can do.

4.1 Multiple Operators

By adding more than one operator, you can tell ThreatIngestor to send artifacts to multiple locations. This might be useful if you want to send to ThreatKB while also writing out a local log file. Combine this with a few operator options, though, and you can send specific artifacts to different operators depending on type, source, or advanced filters.

Consider the following workflow:
We want artifacts from "Twitter C2 List" and "Vendor X Blog" to go directly to ThreatKB. URLs and domains from "Twitter Search: #opendir" and "Domain Masquerade Feed" should go to our crawler, which will look for malicious content or evidence of phishing attacks. Any URLs from "Twitter Search: virustotal.com" that match the filter for a direct URL to a sample should be sent to our "Automated Analysis" system, which will log in to VirusTotal, download the sample, and analyze it. We don't want to see VirusTotal links or open directories in ThreatKB, though, because those aren't C2s.

This config accomplishes all of that:

    general:
      daemon: true
      sleep: 900
      state_path: state.db

    credentials:
      - name: twitter-auth
        api_key:
        api_secret_key:
        access_token:
        access_token_secret:

      - name: threatkb-auth
        url: http://mythreatkb
        token: MYTOKEN
        secret_key: MYKEY

      - name: aws-auth
        aws_access_key_id: MYKEY
        aws_secret_access_key: MYSECRET
        aws_region: MYREGION
    sources:
      - name: twitter-feed-c2
        module: twitter
        credentials: twitter-auth
        owner_screen_name: InQuest
        slug: c2-feed

      - name: twitter-search-opendir
        module: twitter
        credentials: twitter-auth
        q: '#opendir'

      - name: twitter-search-vt
        module: twitter
        credentials: twitter-auth
        q: virustotal.com

      - name: vendor-x
        module: rss
        url: http://example.com/rss.xml
        feed_type: messy

      - name: domain-masq-feed
        module: web
        url: http://example.com/feed.txt

    operators:
      - name: my-threatkb
        module: threatkb
        credentials: threatkb-auth
        allowed_sources: [twitter-feed-c2, vendor-x]
        state: Ingestor

      - name: my-crawler
        module: sqs
        credentials: aws-auth
        allowed_sources: [twitter-search-opendir, domain-masq-feed]
        artifact_types: [URL]
        queue_name: crawler
        domain: {domain}
        url: {url}
        source_type: url

      - name: my-analyzer
        module: sqs
        credentials: aws-auth
        allowed_sources: [twitter-search-vt]
        filter: https?://virustotal.com/.*/analysis
        artifact_types: [URL]
        queue_name: analyzer
        url: {url}
        source_type: virustotal

Note that in this example, our Crawler and Automated Analysis systems will be watching the configured SQS queues for new artifacts. You can use SQS, or add your own custom operator plugins to send artifacts wherever you want.
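One way to picture how allowed_sources, artifact_types, and filter combine is as three independent checks that must all pass before an operator receives an artifact. The following is a simplified sketch written for this page, not ThreatIngestor's actual matching code. It assumes that allowed_sources entries are regular expressions matched against the whole source name (the rss-.* pattern in the "Doing it all" example suggests regex support), that filter is a regular expression searched anywhere in the artifact text, and that an omitted option means "allow everything". It does not model special filters such as is_domain. The function name operator_accepts is hypothetical.

```python
import re


def operator_accepts(operator, source_name, artifact_type, artifact_text):
    """Return True if an operator config would accept this artifact (sketch)."""
    allowed = operator.get("allowed_sources")
    if allowed and not any(re.fullmatch(p, source_name) for p in allowed):
        return False  # artifact came from a source this operator ignores
    types = operator.get("artifact_types")
    if types and artifact_type not in types:
        return False  # wrong kind of artifact (e.g. a Domain for a URL-only operator)
    pattern = operator.get("filter")
    if pattern and not re.search(pattern, artifact_text):
        return False  # artifact text does not match the operator's filter
    return True


# The my-analyzer operator from the config above, as a parsed dict.
my_analyzer = {
    "allowed_sources": ["twitter-search-vt"],
    "artifact_types": ["URL"],
    "filter": r"https?://virustotal.com/.*/analysis",
}
```

Under these rules, a URL such as https://virustotal.com/gui/file/abc/analysis arriving from twitter-search-vt passes all three checks, while the same URL arriving from vendor-x is rejected by allowed_sources.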
4.2 Full-Circle

ThreatIngestor can both read from and write to SQS queues, which allows us to set up a "full circle" workflow. (Note that you can also replace SQS with Beanstalk or custom plugins to achieve the same effect.) In this workflow, we can extract artifacts from a source and send them off to some SQS listener for processing, and that listener can send the processed content back into ThreatIngestor's input queue for extraction.

Consider the following workflow:

Here, we have two Twitter sources (our C2 list and a search for "pastebin.com ioc") and one SQS source (the input queue). We then have two operators: ThreatKB, and an SQS Pastebin Processor application.

We want all the C2s we pull from the Twitter C2 list to go directly to ThreatKB. We also want any pastebin links from either Twitter source to be sent to the SQS Pastebin Processor. That Processor will grab the raw text from the pastebin link and send it to the ThreatIngestor input queue, where all the IOCs will be extracted and sent to ThreatKB for further analysis.

Here's an example config file that accomplishes all that:

    general:
      daemon: true
      sleep: 900
      state_path: state.db

    credentials:
      - name: twitter-auth
        api_key:
        api_secret_key:
        access_token:
        access_token_secret:

      - name: threatkb-auth
        url: http://mythreatkb
        token: MYTOKEN
        secret_key: MYKEY

      - name: aws-auth
        aws_access_key_id: MYKEY
        aws_secret_access_key: MYSECRET
        aws_region: MYREGION

    sources:
      - name: twitter-feed-c2
        module: twitter
        credentials: twitter-auth
        owner_screen_name: InQuest
        slug: c2-feed
      - name: twitter-search-pastebin
        module: twitter
        credentials: twitter-auth
        q: pastebin.com ioc

      - name: sqs-input
        module: sqs
        credentials: aws-auth
        queue_name: threatingestor

    operators:
      - name: my-threatkb
        module: threatkb
        credentials: threatkb-auth
        allowed_sources: [sqs-input, twitter-feed-c2]
        state: Ingestor

      - name: pastebin-processor
        module: sqs
        credentials: aws-auth
        allowed_sources: [twitter-feed-c2, twitter-search-pastebin]
        artifact_types: [URL]
        filter: https?://pastebin.com/.+
        queue_name: pastebin-processor
        url: {url}

4.3 Queue Workers

The ThreatIngestor plugin architecture lets developers integrate with external systems with relative ease, but not everything makes sense as a plugin. Both source and operator plugins are expected to run to completion quickly, then exit and wait for the next run before working again. For long-running tasks (think VirusTotal / MultiAV scan, malware sandbox, web crawler, domain brute force, etc.), implementing them as plugins that block until completion would break the workflow. Instead, consider using a queue workflow.

In a typical queue workflow, an operator queues up a job for each artifact it receives (typically with SQS or Beanstalk), and an external tool, which we'll call a queue worker, reads from that queue and performs any necessary long-running tasks. When the tasks are complete, the queue worker sends a job to another queue, where it can be picked up by a ThreatIngestor queue source (like the SQS and Beanstalk sources).

Note: In the "Full-Circle" workflow above, the "SQS Pastebin Processor" is a queue worker.

Let's look at an example of a queue workflow using one of the provided queue workers, the File System Watcher. Let's say we want to watch a directory for new YARA rules, and automatically send them to our MISP server. Here's how the ThreatIngestor config would look:
    general:
      daemon: true
      sleep: 900
      state_path: state.db

    credentials:
      - name: misp-auth
        url: http://mymisp
        key: MYKEY
        ssl: false

      - name: aws-auth
        aws_access_key_id: MYKEY
        aws_secret_access_key: MYSECRET
        aws_region: MYREGION

    sources:
      - name: fs-watcher
        module: sqs
        credentials: aws-auth
        queue_name: yara-rules
        paths: [content]
        reference: filename

    operators:
      - name: misp
        module: misp
        credentials: misp-auth
        artifact_types: [YARASignature]

In a separate file (we'll use fswatcher.yml), set up the config for the queue worker:

    module: sqs
    aws_access_key_id: MYKEY
    aws_secret_access_key: MYSECRET
    aws_region: MYREGION
    out_queue: yara-rules
    watch_path: MY_RULES_FOLDER

Run the included File System Watcher:

    python3 -m threatingestor.extras.fswatcher fswatcher.yml

When new YARA rules are added to MY_RULES_FOLDER, the File System Watcher sends jobs to the yara-rules queue:

    {
        "rules": "rule myNewRule { condition: false }",
        "filename": "mynewrule.yara"
    }

Run ThreatIngestor, and it'll read from the yara-rules queue, extracting artifacts from the content field in the job and using the filename as the artifact's reference text. When it finds YARA rules, it will send them off through the MISP operator.

By combining custom plugins with custom queue workers, developers can extend ThreatIngestor functionality to fit arbitrarily complex intel workflows.
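To make the queue-worker pattern concrete, here is a minimal sketch of a worker loop. It uses Python's in-process queue.Queue as a stand-in for SQS or Beanstalk so it runs anywhere; a real worker would use the corresponding client library instead. The names run_worker and read_rule_file, and the {"path": ...} input job format, are hypothetical. The output job carries content and filename fields so that it would line up with the paths: [content] and reference: filename settings of the fs-watcher source above.

```python
def run_worker(in_queue, out_queue, process, stop=None):
    """Move jobs from in_queue through `process` into out_queue.

    `process` takes one job dict and returns the job to send onward,
    or None to drop it. The loop ends when the `stop` sentinel arrives.
    Returns the number of jobs forwarded.
    """
    forwarded = 0
    while True:
        job = in_queue.get()  # blocks until a job is available
        if job is stop:
            break
        result = process(job)  # the long-running task lives here
        if result is not None:
            out_queue.put(result)
            forwarded += 1
    return forwarded


def read_rule_file(job):
    """Example task: load the YARA rule file named in a (hypothetical) input job."""
    with open(job["path"]) as handle:
        return {"content": handle.read(), "filename": job["path"]}
```

To try it, pass two queue.Queue objects, put one or more {"path": ...} jobs on the first followed by None, and call run_worker(inbox, outbox, read_rule_file). Keeping the task as a plain callback means the same loop can drive a crawler, a sandbox submission, or any other slow step.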
4.4 Automate as Much as Possible

Everything in ThreatIngestor is built around the basic idea that some intel tasks can be automated, and some can't. The goal, then, is to automate everything that can be, and give as much information as possible to the person doing the analysis.

Up to this point, all our workflows have followed pretty similar patterns: we read in a bunch of information, extract what looks interesting, and send it off for storage somewhere. We're assuming there's an analyst at the end of that process, looking at the information we've extracted, weeding out false positives, and making decisions on what is actually important. ThreatIngestor provides the artifacts, and some context to give the analyst a starting point for their research.

But could we go a step further, and automate some of the repetitive research tasks too? Let's see how far we can take this...

4.4.1 Investigating network artifacts

URLs, domains, and IP addresses all represent some kind of network resource, but what we want to do with them can be completely different depending on the context.

Suppose we're getting some network artifacts that we know are C2 endpoints. For these, the end goal is to verify they're malicious, and block any communication with them to prevent malicious activity.

We have some feeds that tell us about active attacks coming from certain IPs. These could be from something like failed SSH login attempts in our server logs, public honeypots, or sites like DShield that monitor global attack patterns. Depending on the severity and trustworthiness of the source, we might want to just block these, or dig up some extra information to see if we need to take more specific action.

We're also getting another set of network artifacts that we know are "open directories": publicly accessible links a malicious actor might have used as a drop site for data exfiltration, or to host tools to help them carry out attacks.
These can be a treasure trove of new malware samples, stolen information, and clues to help explore the methods of malicious actors, but they often disappear quickly after they've been discovered by a security researcher. For these, the end goal is to clone all the content as quickly and safely as possible, and save it for later investigation.

Other sources are feeding us links to live malicious content: maybe a malware sample we can download from a sandbox or multi-AV service, an exploit being used to deliver malicious content, or a second-stage payload being downloaded by a dropper. Whatever it is, the end goal for us is to download and analyze the content, and figure out how we can protect against it.

Finally, we're also getting some artifacts that look like "suspicious masquerades": websites pretending to be a login page for a bank, a Google account, or some other legitimate resource. For these, the end goal is to crawl the contents and save them for comparison (we can use this information for attribution, linking them back to malicious actors or phishing toolkits), then make sure we're blocking them so no one accidentally falls victim to the phishing attempts.

In all of these cases, the automatable actions boil down to a few things:

• Collect metadata (whois, GeoIP, dig, ...)
• Collect content (download, crawl)
• Enrich from public resources (check block lists, reputation databases, network scans like Shodan, ...)
• Block the resource (modify firewalls, generate rules for IDS/IPS, ...)
• Share intelligence (publish intel feeds, push to a ThreatKB/MISP instance, post to places like Twitter and Slack, ...)

Some of these, like the intel sharing, can be set up as simple operators. Others, like checking whois records or kicking off a crawler, can be queue workers that know what to do with the enrichment information after they gather it.
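As a small illustration of the "check block lists" kind of enrichment, and of weeding out false positives with a known-good list, the sketch below labels a domain artifact as known-good, blocked, or unknown. The two lists are plain Python sets you would load from whatever sources you trust; classify_domain and its labels are hypothetical. Walking from the most specific name up through its parent domains, and letting the first list hit decide, is a design choice of this sketch: it means a blocked subdomain still gets flagged even when its parent domain is known-good.

```python
def classify_domain(domain, known_good, blocked):
    """Return 'blocked', 'known-good', or 'unknown' for a domain artifact."""
    labels = domain.lower().rstrip(".").split(".")
    # Candidates from most to least specific: a.b.example.com, b.example.com, ...
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    for candidate in candidates:
        if candidate in blocked:  # on an exact tie, blocked wins
            return "blocked"
        if candidate in known_good:
            return "known-good"
    return "unknown"
```

An artifact labeled known-good could then be dropped, or flagged as a probable false positive with the matching list entry given as context.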
Often, we'll be enriching artifacts with this additional information. But with the right sources, we can help weed out false positives too! Decreasing the amount of noise the analyst sees saves time and effort for more important things. If we see a domain in a list of known-good sites, maybe we just delete the artifacts altogether, or flag them as probable false positives and provide context as to why.

4.4.2 Investigating file artifacts

Hashes, YARA signatures, and sometimes URLs can all carry information about interesting files.

When we're using Twitter and RSS sources, the most common file artifacts will most likely be hashes. These are typically either malicious software samples (executables, PDF or Word documents, etc.), or "dropped files" that were left behind as traces of a sample's execution. Obtaining the original hashed file is sometimes possible through paywalled services like VirusTotal Enterprise, searching free malware corpora, or simply asking the threat intel community if anyone has a copy of the file. If those methods fail, the hash can still be used as a universally understandable reference to uniquely identify the file and perhaps find scan results or existing research describing the file's capabilities.

YARA signatures can be run over existing malware corpora, or used with threat hunting services like those provided by VirusTotal Enterprise or Hybrid Analysis YARA search, to find matching files.

URLs to "open directories," direct downloads, or mirrored samples hosted by threat intel sites are a great way to get copies of a file for more detailed analysis.

When working with files, the automatable actions look something like this:

• Find samples (download from a URL, find public samples from a hash, run YARA signatures over a corpus to find matches, ...)
• Enrich from public resources (search for a hash on multi-AV and sandbox sites, check reputation databases, ...)
• Perform automated static analysis (AV scan, metadata extraction, ...)
• Perform automated dynamic analysis (run in a sandbox)
• Save the file somewhere for manual analysis
• Block the file (generate YARA signatures, add hashes to a block list, ...)
• Share intelligence (publish intel feeds, push to a ThreatKB/MISP instance, mirror content for download, post to places like Twitter and Slack, ...)

Again, some of these can be accomplished with operator plugins, while others will require custom queue workers.

4.4.3 Doing it all

The filtering capabilities of ThreatIngestor mean that no matter what your workflow looks like, you should be able to automate everything with a single config file. Let's see what it looks like if we put everything together in one place:
And the ThreatIngestor config file:

    general:
      daemon: true
      sleep: 900
      state_path: state.db

    credentials:
      - name: twitter-auth
        # https://dev.twitter.com/oauth/overview/application-owner-access-tokens
        api_key:
        api_secret_key:
        access_token:
        access_token_secret:

      - name: github-auth
        username: user
        # Could also use password instead https://github.blog/2013-05-16-personal-api-tokens/
        # https://github.com/settings/tokens
        token: TOKEN_OR_PASSWORD

      - name: threatkb-auth
        url: http://mythreatkb
        token: MYTOKEN
        secret_key: MYKEY
      - name: misp-auth
        url: http://mymisp
        key: MYKEY
        ssl: false

      - name: aws-auth
        aws_access_key_id: MY_KEY
        aws_secret_access_key: MY_SECRET
        aws_region: MY_REGION

    sources:
      - name: twitter-feed-c2
        module: twitter
        credentials: twitter-auth
        owner_screen_name: InQuest
        slug: c2-feed

      - name: twitter-open-directory
        module: twitter
        credentials: twitter-auth
        # https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
        q: '"open directory" #malware'

      - name: twitter-search-opendir
        module: twitter
        credentials: twitter-auth
        q: '#opendir'

      - name: twitter-masq
        module: twitter
        credentials: twitter-auth
        q: "domain masquerade"

      - name: twitter-search-vt
        module: twitter
        credentials: twitter-auth
        q: virustotal.com

      - name: twitter-search-pastebin
        module: twitter
        credentials: twitter-auth
        q: pastebin.com ioc

      - name: github-cve18
        module: github
        credentials: github-auth
        search: CVE-2018-

      - name: git-yara-rules
        module: git
        url: https://github.com/InQuest/yara-rules.git
        local_path: /opt/threatingestor/git/yara-rules

      - name: rss-myiocfeed
        module: rss
        url: https://example.com/rss.xml
        feed_type: messy

      - name: rss-vendor-x
        module: rss
        url: http://example.com/rss.xml
        feed_type: messy

      - name: sqs-input
        module: sqs
        credentials: aws-auth
        queue_name: threatingestor
        paths: [content]
        reference: reference

      - name: sqs-fswatcher
        module: sqs
        credentials: aws-auth
        queue_name: fswatcher
        paths: [content]
        reference: filename

      - name: domain-masq-feed
        module: web
        url: http://example.com/masquerades.txt

      - name: attack-feed
        module: web
        url: http://example.com/attacks.txt

    operators:
      - name: mythreatkb
        module: threatkb
        credentials: threatkb-auth
        allowed_sources: [twitter-feed-c2, rss-.*, git-.*, sqs-.*]
        state: Inbox

      - name: mymisp
        module: misp
        credentials: misp-auth

      - name: pastebin-processor
        module: sqs
        credentials: aws-auth
        allowed_sources: [twitter-feed-c2, twitter-search-pastebin]
        artifact_types: [URL]
        filter: https?://pastebin.com/.+
        queue_name: pastebin-processor
        url: {url}

      - name: my-crawler
        module: sqs
        credentials: aws-auth
        allowed_sources: [twitter-search-opendir, domain-masq-feed]
        artifact_types: [URL]
        queue_name: crawler
        domain: {domain}
        url: {url}
        source_type: url
      - name: my-analyzer
        module: sqs
        credentials: aws-auth
        allowed_sources: [twitter-search-vt]
        filter: https?://virustotal.com/.*/analysis
        artifact_types: [URL]
        queue_name: analyzer
        url: {url}
        source_type: virustotal
      - name: osint-enrich-domain
        module: sqs
        credentials: aws-auth
        artifact_types: [URL]
        filter: is_domain
        queue_name: osint-enrich-domain
        domain: {domain}
      - name: osint-enrich-ip
        module: sqs
        credentials: aws-auth
        artifact_types: [URL]
        filter: is_ip
        queue_name: osint-enrich-ip
        ip: {domain}
      - name: repdb-check
        module: sqs
        credentials: aws-auth
        artifact_types: [URL, IPAddress, Domain]
        queue_name: repdb-check
        artifact: {artifact}
      - name: yara-scan
        module: sqs
        credentials: aws-auth
        artifact_types: [YARASignature]
        queue_name: yara-scan
        rule: {artifact}
      - name: virustotal-downloader
        module: sqs
        credentials: aws-auth
        artifact_types: [Hash, URL]
        allowed_sources: [twitter-search-vt]
        queue_name: vt-downloader
        content: {artifact}

Hopefully, this gives some idea of what ThreatIngestor is capable of. Whether you are looking to detect and respond to zero-day threats, keep up with the intel community, share your own research, or just block phishing domains on your home network, ThreatIngestor can be configured to support that workflow.
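The mythreatkb operator above mixes exact source names with patterns such as rss-.* in allowed_sources. The sketch below shows one way such a list could be evaluated. It assumes each entry is a Python regular expression that must match the whole source name, and that an empty list allows every source; ThreatIngestor's actual matching rule may differ (for example, it may accept partial matches). The is_allowed helper is hypothetical.

```python
import re
from typing import Iterable, Optional


def is_allowed(source_name: str, allowed_sources: Optional[Iterable[str]]) -> bool:
    """Return True if source_name passes an allowed_sources list.

    Assumptions (not taken from ThreatIngestor itself):
    - each entry is a regex that must match the entire source name;
    - an empty or missing list places no restriction on sources.
    """
    patterns = list(allowed_sources or [])
    if not patterns:
        return True
    return any(re.fullmatch(pattern, source_name) for pattern in patterns)


# The allowed_sources list from the mythreatkb operator above.
THREATKB_SOURCES = ["twitter-feed-c2", "rss-.*", "git-.*", "sqs-.*"]

if __name__ == "__main__":
    for name in ["rss-vendor-x", "git-yara-rules", "twitter-search-vt"]:
        print(name, is_allowed(name, THREATKB_SOURCES))
```

Under this assumption, artifacts from rss-vendor-x and git-yara-rules would reach mythreatkb, while artifacts from twitter-search-vt would not.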
CHAPTER 5

Source Plugins

For each source specified, ThreatIngestor handles artifact import. Sources can be Twitter feeds, blogs, and more, and the artifacts imported from them may include URLs, IP addresses, YARA signatures, etc. All source plugins maintain state between runs, allowing them to skip previously processed artifacts and get right to work finding new indicators.

To add a source to your configuration file, include a section like this:

    sources:
      - name: mysource
        module: mysourcemodule

You can add as many sources as you need, all under the same sources: list.

    sources:
      - name: mysource
        module: mysourcemodule
      - name: myothersource
        module: mysourcemodule

Note the use of dashes to signify the start of each item in the list, and matching indentation for all the keys within each item.

The module option must match one of the sources listed below, or your custom source. The name is freeform.

All sources allow credentials such as usernames, passwords, OAuth tokens, etc. to be defined in a separate credentials section and referenced by name with a credentials keyword. Consider a plugin that accepts a token and a secret. In config.yml, you would set up the credentials and sources sections like this:

    credentials:
      - name: mysource-auth
        token: MYTOKEN
        secret: MYSECRET
    sources:
      - name: mysource
        credentials: mysource-auth

This allows the same credentials to be reused for several different sources (or operators) without having to duplicate them in each source definition.

5.1 Available Plugins

The available source plugins are:

5.1.1 Beanstalk

The Beanstalk source can be used to read content from Beanstalk queues. This, combined with the Beanstalk Operator, allows a full-circle workflow.

Configuration Options

• module (required): beanstalk
• paths (required): A list of XPath-like expressions representing the JSON fields you want to extract from.
• reference: An XPath-like expression representing the JSON field you want to use as a reference (default: source name).
• host (required): Host to connect to.
• port (required): Port to connect over.
• queue_name (required): The name of the Beanstalk queue you want to use.

Example Configuration

Inside the sources section of your configuration file:

    - name: beanstalk-input
      module: beanstalk
      paths: [content]
      reference: reference
      host: 127.0.0.1
      port: 11300
      queue_name: MYQUEUENAME

If you are expecting JSON jobs in the Beanstalk queue that look like this:

    {
        "content": "freeform text",
        "reference": "http://example.com"
    }

The above config will extract artifacts from the value of the content key, and use the value of the reference key as the artifact's reference.

If you instead had JSON jobs like this:
    {
        "data": {
            "text": "freeform text",
            "more": "more text",
            "ref": "http://example.com"
        }
    }

and wanted to extract artifacts from text and more, using ref as the reference, you could set up your config to account for the more complex JSON structure:

    paths: [data.text, data.more]
    reference: data.ref

This flexibility allows easier integration with arbitrary systems.

5.1.2 Git

The first time it runs, each Git source will clone the configured repository, look for any files matching *.{rule,rules,yar,yara}, and extract YARA rules. On subsequent runs, it will run git pull, check for new and updated files matching the same patterns, and extract YARA rules from those files.

Configuration Options

• module (required): git
• url (required): URL (https, git, ssh, etc.) of the remote to clone.
• local_path (required): Folder on disk (relative or absolute) to clone into.

Example Configuration

Inside the sources section of your configuration file:

    - name: inquest-yara-rules
      module: git
      url: https://github.com/InQuest/yara-rules.git
      local_path: /opt/threatingestor/git/yara-rules

5.1.3 GitHub Repository Search

The GitHub source plugin uses GitHub's repository search API to find new interesting repos, and creates a Task artifact for each one.

Configuration Options

• module (required): github
• search (required): Search term(s).
• username: Optional username for authentication.
• token: Optional token or password for authentication.
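As a rough illustration of what the GitHub source does, the sketch below builds a request URL for GitHub's public repository search endpoint (GET https://api.github.com/search/repositories, with the search term in the q parameter) and turns result items into task records, skipping repositories already seen on an earlier run. The Task class, the helper names, and the in-memory seen set are hypothetical stand-ins for ThreatIngestor's Task artifact and saved state; the request parameters the plugin really sends are not documented here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set
from urllib.parse import urlencode

# GitHub's public REST endpoint for repository search.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for ThreatIngestor's Task artifact."""
    title: str
    reference_link: str


def build_search_url(search: str) -> str:
    """Put the configured `search` term into the `q` query parameter."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': search})}"


def tasks_from_items(items: Iterable[Dict], seen: Set[str]) -> List[Task]:
    """Create one Task per repository that was not seen on a previous run.

    `items` mirrors the "items" array of a search response; each entry
    carries fields such as "full_name" and "html_url".
    """
    tasks = []
    for repo in items:
        url = repo["html_url"]
        if url in seen:
            continue  # already reported on an earlier run
        seen.add(url)
        tasks.append(Task(title=repo["full_name"], reference_link=url))
    return tasks
```

For the search term CVE-2018- used in the example configuration, build_search_url returns https://api.github.com/search/repositories?q=CVE-2018-. A real source would persist the seen set between runs rather than keep it in memory.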
Example Configuration

The following example assumes GitHub credentials have already been configured in the credentials section of the config, like this:

    credentials:
      - name: github-auth
        username: myuser
        token: MYTOKEN

Note: GitHub credentials are optional, but they significantly increase the rate limit for API requests. If you are doing more than one or two low-volume searches, you should set up the credentials.

Inside the sources section of your configuration file:

    - name: github-cve-repos
      credentials: github-auth
      module: github
      search: CVE-2018-

5.1.4 RSS

The RSS source pulls from standard RSS and Atom feeds, and extracts artifacts from within the feed content. It does not follow links to full blog posts.

For each RSS feed, you'll need to define a feed_type for IOC extraction. Valid feed types are:

• messy: Only look at obfuscated URLs; assume all IPs are valid.
• clean: Treat everything as a valid C2 URL/IP.
• afterioc: Treat everything after the last occurrence of the string "Indicators of Compromise" as a valid C2 URL/IP.

Configuration Options

• module (required): rss
• feed_type (required): See above; if unsure, use messy.
• url (required): URL to the RSS or Atom feed.

Example Configuration

Inside the sources section of your configuration file:

    - name: rss-myiocfeed
      module: rss
      url: https://example.com/rss.xml
      feed_type: messy
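To make the afterioc feed type concrete, the sketch below isolates the part of a feed entry that follows the last occurrence of "Indicators of Compromise". Two behaviors are assumptions rather than documented facts: the match is case-sensitive, and an entry without the marker yields no text to extract from. The afterioc_text helper is hypothetical.

```python
MARKER = "Indicators of Compromise"


def afterioc_text(entry: str) -> str:
    """Return the text after the last occurrence of MARKER.

    Assumptions: matching is case-sensitive, and an entry that never
    mentions MARKER contributes nothing (empty string).
    """
    _, found, tail = entry.rpartition(MARKER)
    return tail if found else ""


if __name__ == "__main__":
    post = (
        "We saw Indicators of Compromise discussed earlier. "
        "Indicators of Compromise: hxxp://evil[.]example/payload 203.0.113.7"
    )
    print(afterioc_text(post))
```

Only the text after the second marker (": hxxp://evil[.]example/payload 203.0.113.7") would then be scanned for C2 URLs and IPs; the earlier mention of the phrase is ignored.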
5.1.5 SQS

The SQS source can be used to read content from Amazon SQS queues. This, combined with the SQS Operator, allows a full-circle workflow.

Configuration Options

• module (required): sqs
• paths (required): A list of XPath-like expressions representing the JSON fields you want to extract from.
• reference: An XPath-like expression representing the JSON field you want to use as a reference (default: source name).
• aws_access_key_id (required): Your AWS access key ID.
• aws_secret_access_key (required): Your AWS secret access key.
• aws_region (required): Your AWS region name.
• queue_name (required): The name of the SQS queue you want to use.

Example Configuration

The following example assumes AWS credentials have already been configured in the credentials section of the config, like this:

    credentials:
      - name: aws-auth
        aws_access_key_id: MYKEY
        aws_secret_access_key: MYSECRET
        aws_region: MYREGION

Inside the sources section of your configuration file:

    - name: sqs-input
      module: sqs
      paths: [content]
      reference: reference
      credentials: aws-auth
      queue_name: MYQUEUENAME

If you are expecting JSON jobs in the SQS queue that look like this:

    {
        "content": "freeform text",
        "reference": "http://example.com"
    }

The above config will extract artifacts from the value of the content key, and use the value of the reference key as the artifact's reference.

If you instead had JSON jobs like this:

    {
        "data": {
            "text": "freeform text",
            "more": "more text",
            "ref": "http://example.com"
        }
    }

and wanted to extract artifacts from text and more, using ref as the reference, you could set up your config to account for the more complex JSON structure:

    paths: [data.text, data.more]
    reference: data.ref

This flexibility allows easier integration with arbitrary systems.

5.1.6 Twitter

The Twitter source can use several Twitter API endpoints out of the box: @mentions, Twitter lists, user timeline, and standard search.

Configuration Options

• module (required): twitter
• api_key (required): Consumer API key (see Twitter OAuth docs).
• api_secret_key (required): Consumer API secret key (see Twitter OAuth docs).
• access_token (required): Twitter access token (see Twitter OAuth docs).
• access_token_secret (required): Twitter access token secret (see Twitter OAuth docs).
• defanged_only: Defaults to true. If set to false, the Twitter source will include all expanded links found in Tweets. If set to true, it will include only defanged links.

After the above general options, you may include valid options for one of the supported Twitter endpoints, as described below. (If you do not include any extra options, the Twitter plugin will default to reading from your @mentions.)

Any extra options defined in the config will be passed directly to the Twitter endpoint, so you can configure some extra options not shown here. See the relevant Twitter documentation for more information on supported parameters.

Mentions:

This is the default behavior.

Twitter list:

• owner_screen_name: Twitter user who owns the list.
• slug: The name of the Twitter list.

Twitter user timeline:

• screen_name: Twitter user to watch.

Twitter search:

• q: Twitter search term; can be multiple words, including hashtags.
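The defanged_only option refers to "defanged" links: indicators rewritten so they cannot be clicked or resolved by accident, for example hxxp://evil[.]example/. The sketch below illustrates the idea behind the option using a small, assumed set of common defanging conventions (hxxp, [.], (.), [:]). The helper names and the recognized conventions are hypothetical; this is not the Twitter plugin's actual detection logic.

```python
import re
from typing import Iterable, List

# A few common defanging conventions; real-world feeds use many more.
DEFANG_MARKERS = re.compile(r"hxxp|\[\.\]|\(\.\)|\[:\]", re.IGNORECASE)


def is_defanged(link: str) -> bool:
    """True if the link uses at least one of the conventions above."""
    return bool(DEFANG_MARKERS.search(link))


def refang(link: str) -> str:
    """Undo the conventions above so the link can be used as an artifact."""
    link = re.sub(r"hxxp", "http", link, flags=re.IGNORECASE)
    return link.replace("[.]", ".").replace("(.)", ".").replace("[:]", ":")


def select_links(links: Iterable[str], defanged_only: bool = True) -> List[str]:
    """Mimic defanged_only: keep only defanged links unless it is False."""
    if defanged_only:
        chosen = [link for link in links if is_defanged(link)]
    else:
        chosen = list(links)
    return [refang(link) for link in chosen]
```

With the default defanged_only: true, a tweet containing both hxxp://evil[.]example/a and https://example.com/ would contribute only http://evil.example/a under this sketch; setting it to false would keep both links.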
Example Configuration

The following examples all assume Twitter credentials have already been configured in the credentials section of the config, like this:

    credentials:
      - name: twitter-auth
        api_key: MY_KEY
        api_secret_key: MY_SECRET_KEY
        access_token: MY_TOKEN
        access_token_secret: MY_TOKEN_SECRET

Inside the sources section of the config, create a new item for the source you wish to define. Examples for each of the supported Twitter endpoints are provided below.

Mentions:

    - name: twitter-my-mentions
      module: twitter
      credentials: twitter-auth

Twitter list:

    - name: twitter-inquest-c2-list
      module: twitter
      credentials: twitter-auth
      owner_screen_name: InQuest
      slug: c2-feed

Twitter user timeline:

    - name: twitter-inquest-timeline
      module: twitter
      credentials: twitter-auth
      screen_name: InQuest

Twitter search:

    - name: twitter-open-directory
      module: twitter
      credentials: twitter-auth
      q: '"open directory" #malware'

Note: When searching for Twitter hashtags, be sure to put quotes around your search term, as shown in the example above. Otherwise, the # character will be treated as the beginning of a YAML comment.

5.1.7 Web

The Web source will periodically check a URL for changes, and extract any artifacts it finds. This is useful for ingesting threat intel feeds that don't already have a ThreatIngestor source plugin, without having to write your own custom plugin. Use it for plaintext IP blacklists, C2 URL CSVs, and more.
Configuration Options

• module (required): web
• url (required): URL of the web content you want to poll.

Example Configuration

Inside the sources section of your configuration file:

    - name: mylist
      module: web
      url: http://example.com/feed.txt
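The Web source is described as checking a URL for changes, but this section does not say how changes are detected. One simple approach, assumed here purely for illustration, is to keep a digest of the last body fetched from each URL and re-extract artifacts only when the digest differs. HTTP conditional requests (ETag with If-None-Match, or Last-Modified with If-Modified-Since) are a common alternative that lets the server skip sending an unchanged body. The ChangeTracker class is hypothetical.

```python
import hashlib
from typing import Dict


class ChangeTracker:
    """Hypothetical helper: remembers a SHA-256 digest per polled URL."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}

    def has_changed(self, url: str, body: bytes) -> bool:
        """Return True when `body` differs from the last body seen for `url`.

        The first fetch of a URL always counts as a change.
        """
        digest = hashlib.sha256(body).hexdigest()
        changed = self._digests.get(url) != digest
        self._digests[url] = digest
        return changed


if __name__ == "__main__":
    tracker = ChangeTracker()
    feed = "http://example.com/feed.txt"
    print(tracker.has_changed(feed, b"198.51.100.1\n"))  # True: first fetch
    print(tracker.has_changed(feed, b"198.51.100.1\n"))  # False: unchanged
```

This tracker only lives in memory; to match the documented behavior of keeping state between runs, the digests would need to be saved somewhere persistent.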
CHAPTER 6

Operator Plugins

Operator plugins handle artifact export. They can be configured to send only certain artifact types, to send only artifacts from certain sources, to filter artifacts down to those matching a certain regex, and more.

To add an operator to your configuration file, include a section like this:

    operators:
      - name: myoperator
        module: myoperatormodule

The module option must match one of the operators listed below, or your custom operator.

The following options are accepted by all operators:

• allowed_sources: List (in YAML syntax) of source names to allow.
• artifact_types: List (in YAML syntax) of artifact types to allow.
• filter: A regex, or a comma-separated list (not in YAML syntax) of special keywords such as is_domain and is_ip.

All of these options are inclusive, so only artifacts matching the restrictions will be sent through the operator.

Example:

    sources:
      - name: mysource
        module: mysourcemodule
      - name: myothersource
        module: mysourcemodule
    operators:
      - name: non-ip-based-urls
        module: myoperatormodule
        allowed_sources: [mysource]
        filter: is_domain
        artifact_types: [URL]
      - name: google-domain-masquerade
        module: myoperatormodule
        allowed_sources: [mysource, myothersource]
        filter: ([^\.]google.com$|google.com[^/])
        artifact_types: [URL, Domain]

By combining these three options, you can include any number of different sources and operators in your config and still send each operator exactly the artifacts you want.

All operators allow credentials such as usernames, passwords, OAuth tokens, etc. to be defined in a separate credentials section and referenced by name with a credentials keyword. Consider a plugin that accepts a token and a secret. In config.yml, you would set up the credentials and operators sections like this:

    credentials:
      - name: myoperator-auth
        token: MYTOKEN
        secret: MYSECRET
    operators:
      - name: myoperator
        credentials: myoperator-auth

This allows the same credentials to be reused for several different operators (or sources) without having to duplicate them in each operator definition.

6.1 Available Plugins

The available operator plugins are:

6.1.1 Beanstalk

Beanstalk is a simple work queue server that may be easier to get started with than Amazon SQS for those who don't already have AWS accounts. The Beanstalk operator enables you to send output to work queues, which you can then consume from Beanstalk sources or external applications.

This operator is extremely flexible, as it accepts arbitrary config options and passes them through to the queue.

Configuration Options

• module (required): beanstalk
• host (required): Host to connect to.
• port (required): Port to connect over.
• queue_name (required): The name of the Beanstalk tube you want to use.

Any other options defined in the Beanstalk operator section will be passed to your queue as part of a JSON object, after string interpolation to fill in artifact content. For example, {domain} will be replaced with the C2 domain being exported.
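The interpolation step can be pictured as follows: every option that is not one of the operator's own settings is run through Python's str.format with the artifact's values, and the result is serialized as JSON. This is a hypothetical sketch; the RESERVED set of option names, the build_job helper, and the placeholder values passed in (domain, url) are assumptions based on the examples in this chapter, not the operator's source.

```python
import json
from typing import Any, Dict

# Settings consumed by the operator itself (assumed list); never sent to the queue.
RESERVED = {
    "name", "module", "credentials", "host", "port", "queue_name",
    "allowed_sources", "artifact_types", "filter",
}


def build_job(options: Dict[str, Any], values: Dict[str, str]) -> str:
    """Interpolate extra options with artifact values and encode them as JSON."""
    body = {}
    for key, value in options.items():
        if key in RESERVED:
            continue
        # Strings may hold placeholders such as "{domain}"; other values pass through.
        body[key] = value.format(**values) if isinstance(value, str) else value
    return json.dumps(body)


if __name__ == "__main__":
    options = {
        "name": "my-beanstalk-queue", "module": "beanstalk",
        "host": "127.0.0.1", "port": 11300, "queue_name": "my-queue",
        "domain": "{domain}", "url": "{url}",
        "source_type": "url", "download_path": "/data/ingestor",
    }
    print(build_job(options, {"domain": "example.com", "url": "http://example.com/"}))
```

Fed the options from the my-beanstalk-queue example in this section, the sketch produces the same four keys as the documented JSON object. Because it relies on str.format, a literal brace in an option value would have to be doubled ({{ or }}).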
Example Configuration

Inside the operators section of your configuration file:

    - name: my-beanstalk-queue
      module: beanstalk
      host: 127.0.0.1
      port: 11300
      queue_name: my-queue
      domain: {domain}
      url: {url}
      source_type: url
      download_path: /data/ingestor

In this example, the resulting JSON object sent to the Beanstalk queue for a URL artifact of http://example.com/ would be:

    {
        "domain": "example.com",
        "url": "http://example.com/",
        "source_type": "url",
        "download_path": "/data/ingestor"
    }

6.1.2 CSV File

The most basic of the included operators, the CSV operator simply writes extracted artifacts to a CSV file. The columns in the file are, in order:

1. Artifact type (URL, Domain, IPAddress, etc.)
2. Artifact content (example.com, 1.1.1.1)
3. Reference link (URL of the source tweet, blog post, etc.)
4. Reference text (Tweet text, snippet from a blog post, etc.)

This operator often comes in handy if you want to quickly and easily test that your ThreatIngestor configuration is working as expected.

Configuration Options

• module (required): csv
• filename (required): Filename with relative or absolute path.

Example Configuration

Inside the operators section of your configuration file:

    - name: mycsv
      module: csv
      filename: output.csv
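Since the CSV operator is often used to check that a configuration works, a quick way to inspect its output is to count rows by artifact type. The sketch below assumes the column order listed above and no header row (whether the file has a header is not covered here); Python's csv module takes care of reference text that contains commas or line breaks. The count_by_type helper is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_type(lines: Iterable[str]) -> Counter:
    """Count artifacts per type in the CSV operator's output.

    Column 0 is the artifact type, per the documented column order.
    Assumption: the file has no header row.
    """
    return Counter(row[0] for row in csv.reader(lines) if row)
```

To check the file from the example configuration, open it with newline="" (so quoted line breaks are handled by the csv module) and pass the file object in, e.g. count_by_type(open("output.csv", newline="")). If a source that should produce URLs shows none, its allowed_sources, artifact_types, or filter settings are a good place to look.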
6.1.3 MISP

The MISP operator will send extracted artifacts to your MISP instance, as objects attached to events.

When this plugin is configured, events should show up on your MISP instance with the name "ThreatIngestor Event: {SOURCE}", where "{SOURCE}" is the name of the source plugin that extracted the attached objects. Artifact context (reference link and text, if any) will also be attached to the event, as "internal" objects.

The following artifacts are supported by the MISP plugin:

• Domains
• Hashes (MD5, SHA1, SHA256)
• IP Addresses
• URLs
• YARA Signatures

If other artifact types are sent through this plugin, they will be ignored.

Configuration Options

• module (required): misp
• url (required): Base URL for your MISP instance.
• key (required): Your MISP authentication key.
• ssl: Verify SSL certificate? (default: true)
• tags: List of tags to attach to events (default: [type:OSINT])

Example Configuration

The following example assumes MISP credentials have already been configured in the credentials section of the config, like this:

    credentials:
      - name: misp-auth
        url: http://mymisp
        key: MYKEY
        ssl: false

Inside the operators section of your configuration file:

    - name: mymisp
      module: misp
      credentials: misp-auth

6.1.4 MySQL

The MySQL operator feeds artifacts into a single MySQL table. The table defined in the config will be created if it does not exist. The columns in the table are:

1. artifact: Artifact content (example.com, 1.1.1.1, etc.).
2. artifact_type: Artifact type (domain, yarasignature, etc.).
3. reference_link: URL of the source tweet, blog post, etc.
4. reference_text: Tweet text, snippet from a blog post, etc.
5. created_date: MySQL DATETIME.
6. state: For external use, always NULL. You can use this to keep track of the current investigation status of artifacts, if you so choose.

Configuration Options

• module (required): mysql
• host (required): Database host.
• port: Database port (default: 3306).
• user (required): Database user (must have table create permission, or insert permission on the existing artifacts table defined below).
• password: Password for user.
• table (required): Artifacts table (will be created if it does not exist; must follow the required schema).

Example Configuration

The following example assumes MySQL credentials have already been configured in the credentials section of the config, like this:

    credentials:
      - name: mysql-auth
        host: MYHOST
        port: MYPORT
        user: MYUSER
        password: MYPASSWORD
        database: MYDATABASE

Inside the operators section of your configuration file:

    - name: my-db
      module: mysql
      credentials: mysql-auth
      table: artifacts

6.1.5 SQLite

The SQLite operator feeds artifacts into a simple database, with zero setup required. Like the CSV operator, it comes in handy if you want to quickly and easily test that your ThreatIngestor configuration is working as expected, but it scales better than the CSV operator.

One table will be created per artifact type. The columns in each table are, in order:

1. artifact: Artifact content (example.com, 1.1.1.1, etc.).
2. reference_link: URL of the source tweet, blog post, etc.
3. reference_text: Tweet text, snippet from a blog post, etc.
4. created_date: ISO-8601 date string, always UTC.
5. state: For external use, always NULL. You can use this to keep track of the current investigation status of artifacts, if you so choose.
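The per-type layout described above can be mimicked with Python's built-in sqlite3 module. In this sketch the table name is simply the lowercased artifact type; that naming scheme and the store helper are assumptions for illustration, since this section does not give the operator's actual table names. Because SQLite cannot bind table names as query parameters, the name is validated before it is placed in the SQL text.

```python
import sqlite3
from datetime import datetime, timezone

# Column layout from the list above.
COLUMNS = (
    "artifact TEXT, reference_link TEXT, reference_text TEXT, "
    "created_date TEXT, state TEXT"
)


def store(conn: sqlite3.Connection, artifact_type: str, artifact: str,
          reference_link: str = "", reference_text: str = "") -> None:
    """Insert one artifact into a table named after its type (hypothetical naming)."""
    table = artifact_type.lower()
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {artifact_type!r}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({COLUMNS})")
    created = datetime.now(timezone.utc).isoformat()  # ISO-8601 string in UTC
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, NULL)",  # state starts as NULL
        (artifact, reference_link, reference_text, created),
    )
    conn.commit()
```

For example, store(sqlite3.connect("output.db"), "Domain", "example.com", "https://example.com/post", "snippet") would add one row to a domain table. Your own tooling can later fill in the state column, for instance with UPDATE domain SET state = ? WHERE artifact = ?.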
You can also use the included ThreatIngestor "quick web interface" to get an easier overview of the artifacts in your database, or set up a JSON API with a single command:

    hug -m threatingestor.extras.webapp

Note: Don't have hug? pip install hug!

If you want to use the webapp, make sure your SQLite database is named artifacts.db and is located in the folder where you're running hug.

Configuration Options

• module (required): sqlite
• filename (required): Filename with relative or absolute path.

Example Configuration

Inside the operators section of your configuration file:

    - name: mysqlite
      module: sqlite
      filename: output.db

6.1.6 Amazon SQS

The SQS operator allows ThreatIngestor to integrate out of the box with any system that supports reading from SQS queues.

This operator is extremely flexible, as it accepts arbitrary config options and passes them through to the queue.

Configuration Options

• module (required): sqs
• aws_access_key_id (required): Your AWS access key ID.
• aws_secret_access_key (required): Your AWS secret access key.
• aws_region (required): Your AWS region name.
• queue_name (required): The name of the SQS queue you want to use.

Any other options defined in the SQS operator section will be passed to your queue as part of a JSON object, after string interpolation to fill in artifact content. For example, {domain} will be replaced with the C2 domain being exported.

Example Configuration

The following example assumes AWS credentials have already been configured in the credentials section of the config, like this:
    credentials:
      - name: aws-auth
        aws_access_key_id: MYKEY
        aws_secret_access_key: MYSECRET
        aws_region: MYREGION

Inside the operators section of your configuration file:

    - name: myqueue
      module: sqs
      credentials: aws-auth
      queue_name: my-queue
      domain: {domain}
      url: {url}
      source_type: url
      download_path: /data/ingestor

In this example, the resulting JSON object sent to the SQS queue for a URL artifact of http://example.com/ would be:

    {
        "domain": "example.com",
        "url": "http://example.com/",
        "source_type": "url",
        "download_path": "/data/ingestor"
    }

6.1.7 ThreatKB

The ThreatKB operator will send extracted artifacts to your ThreatKB instance.

Configuration Options

• module (required): threatkb
• url (required): Base URL for your ThreatKB instance.
• token (required): Your ThreatKB authentication token.
• secret_key (required): Your ThreatKB authentication secret key.
• state (required): The State you want assigned to created artifacts.

Example Configuration

The following example assumes ThreatKB credentials have already been configured in the credentials section of the config, like this:

    credentials:
      - name: threatkb-auth
        url: http://mythreatkb
        token: MYTOKEN
        secret_key: MYKEY

Inside the operators section of your configuration file: