THREATINGESTOR DOCUMENTATION - INQUEST LABS - JAN 21, 2020 - READ THE DOCS

Page created by Derek Reyes


ThreatIngestor Documentation

                   InQuest Labs

                       Jan 21, 2020

Contents

1 Welcome to ThreatIngestor
    1.1 What is ThreatIngestor?
    1.2 Try it out
    1.3 Support

2 Installation

3 Basic Usage
    3.1 Minimal Case
    3.2 Standard Case

4 Example Workflows
    4.1 Multiple Operators
    4.2 Full-Circle
    4.3 Queue Workers
    4.4 Automate as Much as Possible

5 Source Plugins
    5.1 Available Plugins

6 Operator Plugins
    6.1 Available Plugins

7 Artifacts
    7.1 Domains
    7.2 Hashes
    7.3 IP Addresses
    7.4 Tasks
    7.5 URLs
    7.6 YARA Signatures

8 Extras
    8.1 Quick Webapp
    8.2 Queue Workers

9 Observability
    9.1 Logging
    9.2 Metrics
    9.3 Notifications

10 Developing
    10.1 Overview
    10.2 Source Plugins
    10.3 Operator Plugins

11 API Documentation
    11.1 threatingestor

12 Glossary

Python Module Index

Index

ThreatIngestor Documentation

ThreatIngestor is a flexible, configuration-driven, extensible framework for consuming threat intelligence.
It can watch Twitter, RSS feeds, and other sources, extract meaningful information like C2 IPs/domains and YARA
signatures, and send that information to other systems for analysis.
Use ThreatIngestor alongside ThreatKB or MISP to automate importing public C2s and YARA signatures, or integrate
it into your existing workflow with custom operator plugins.


CHAPTER 1

Welcome to ThreatIngestor

1.1 What is ThreatIngestor?

ThreatIngestor helps you collect threat intelligence from public feeds, and gives you context on that intelligence so
you can research it further, and put it to use protecting yourself or your organization.
There is a never-ending stream of publicly available information on malicious activities online, but compiling all of it
by hand takes a lot of time and effort. ThreatIngestor automates as much of that work as
possible, so you can focus on more important things.

Fig. 1: A screenshot of Twitter user @MalwareConfig’s feed, showing two tweets with defanged C2 domains and IP
addresses.

Because it is completely modular and configuration-driven, ThreatIngestor is super flexible, and should fit easily into
any threat intel workflow.


1.2 Try it out

If you want to try ThreatIngestor right now, here’s the quickest way to get up and running:
First, make sure you have Python 3.6+ and pip installed:

$ python3 -V
Python 3.6.6
$ python3 -m pip -V
pip  from  (python 3.6)

(If you don’t, you’ll want to find installation instructions for Python and pip specific to your operating system.)
Next, install ThreatIngestor and the dependencies we’ll be using:

python3 -m pip install threatingestor feedparser hug

Download this example configuration file, and run ThreatIngestor:

threatingestor inquest-blog-sqlite.yml

After several seconds, the command should exit without any errors, and you should see a new file artifacts.db
in the same folder where you ran the command. That’s where all the intel we gathered is stored.
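If you would like to peek at the database without the web interface, Python's built-in sqlite3 module is enough. The sketch below only lists each table and its row count, so it makes no assumption about the schema ThreatIngestor uses; the helper name summarize_db is ours, not part of ThreatIngestor.

```python
import sqlite3


def summarize_db(path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            # Table names come from sqlite_master; quote them as identifiers.
            quoted = '"{}"'.format(table.replace('"', '""'))
            counts[table] = conn.execute(
                "SELECT COUNT(*) FROM {}".format(quoted)
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling summarize_db("artifacts.db") from the folder where you ran ThreatIngestor should return one entry per table it created.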
Fire up the quick web interface that comes with ThreatIngestor:

hug -m threatingestor.extras.webapp

And open http://localhost:8000/ in your web browser. You should see something like this:

                                 Fig. 2: The ThreatIngestor quick web interface, index.

Click on one of the links to view all the artifacts of that type that were collected.
That’s it!


                           Fig. 3: The ThreatIngestor quick web interface, domains table.

In a real environment, you would probably use something like ThreatKB or MISP to store your artifacts, instead of
just an SQLite database like the one this quick web interface is reading from. If you wanted to do some automated
investigation of the things you find, instead of just tossing them into a database, you could do that too.
For more ThreatIngestor tutorials, take a look at the InQuest blog.

1.3 Support

If you need help getting set up, or run into any issues, feel free to open an Issue. You can also reach out to @InQuest
on Twitter.
We’d love to hear any feedback you have on ThreatIngestor, its documentation, or how you’re putting it to work for
you!


CHAPTER 2

Installation

ThreatIngestor requires Python 3.6+.
You may need to install the Python development headers separately. On Ubuntu/Debian-based systems, try:

sudo apt-get install python3-dev

Then install threatingestor from pip:

pip install threatingestor

By default, threatingestor does not pull all dependencies for plugins you may not use. If you want to use a certain
plugin, you’ll need to pull in its dependencies as well. For example, if you want to use SQS queues:

pip install threatingestor[sqs]

If you want to use Beanstalk and Twitter:

pip install threatingestor[beanstalk,twitter]

Or if you don’t know what you might need, and want to just pull in everything:

pip install threatingestor[all]

Note: If you’d like to use the git source, you will also need to have Git installed.
If you want to use the notification support, install Notifiers separately: pip install notifiers.
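Since plugin dependencies are optional, it can help to check what is importable before enabling a plugin. This is a minimal sketch using only the standard library; missing_modules is a hypothetical helper, and the names passed in are import names of packages mentioned in this documentation, not a complete list of what each plugin needs.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# The extra packages used in "Try it out", plus the notification library.
print(missing_modules(["feedparser", "hug", "notifiers"]))
```

An empty list means all three are installed in the current environment.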


CHAPTER 3

Basic Usage

All ThreatIngestor configuration is done via YAML. If you’re not familiar with YAML, Ansible has a YAML syntax
guide that goes over some of the basics. For the purposes of this documentation, we’ll assume no prior knowledge of
YAML.
In the use cases below, we’ll go into detail on how ThreatIngestor config is laid out, and give some concrete examples
you can use right away.

3.1 Minimal Case

For the most basic ThreatIngestor setup, you will want to configure at least one source, one operator, and set the
general settings (as shown below).
First create a new config.yml file, and add the general section:

general:
    daemon: true
    sleep: 900
    state_path: state.db

Configure ThreatIngestor to run continuously or manually. If you set daemon to true, ThreatIngestor will watch
your sources in a loop; set it to false to run manually, or via cron or some other scheduler. Set sleep to the number
of seconds to wait between each check - this will be ignored if daemon is set to false. Don’t set the sleep too low,
or you may run into rate limits or other issues. If in doubt, keep this at 900 (fifteen minutes) or above. The state_path
should be a relative or absolute path where ThreatIngestor will write out the state database, which is used internally to
track where it left off in each source (e.g. the most recent blog post processed from an RSS feed).
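The rules above can be sketched as a small sanity check on the parsed general section. This is not ThreatIngestor's own validation: check_general is a hypothetical helper, it takes the section as a plain dict (what a YAML parser would produce), and treating a missing daemon key as false is an assumption.

```python
def check_general(general):
    """Return a list of warnings for a parsed `general` section (a dict)."""
    warnings = []
    daemon = general.get("daemon", False)  # assumed default, not confirmed
    sleep = general.get("sleep")

    if daemon:
        if not isinstance(sleep, int) or sleep <= 0:
            warnings.append("daemon mode needs a positive sleep value in seconds")
        elif sleep < 900:
            warnings.append("sleep is below 900 seconds; you may hit rate limits")
    elif sleep is not None:
        warnings.append("sleep is ignored when daemon is false")

    if not general.get("state_path"):
        warnings.append("state_path is missing")
    return warnings


print(check_general({"daemon": True, "sleep": 900, "state_path": "state.db"}))
```
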
Next, create the sources section, and add your sources. To configure the source, you should give it a unique name
like inquest-rss. Each source also uses a module like twitter, rss, or sqs. Choose the module for the expected
format of the source data. For easy testing, we’ll use an RSS source and a CSV operator for this example:

sources:
  - name: inquest-rss
    module: rss
    url: http://blog.inquest.net/atom.xml
    feed_type: messy

Note the dash before the name key, signifying this and the following keys are part of a single list element. We’ll circle
back to this distinction below in the “Standard Case” walkthrough. For this source, we assign a name inquest-rss,
tell it to use the rss module, and fill in the required options for the rss module, which are url and feed_type.
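To make the role of the dash concrete: a YAML parser reads the sources section as a list with one mapping per dash. The Python literal below is the equivalent structure for the single source configured so far, written by hand rather than parsed from the file.

```python
# Equivalent of the YAML `sources` section: a list of dicts.
# Each dash in the YAML starts a new dict in this list.
sources = [
    {
        "name": "inquest-rss",
        "module": "rss",
        "url": "http://blog.inquest.net/atom.xml",
        "feed_type": "messy",
    },
]

for source in sources:
    print(source["name"], "uses module", source["module"])
```

Adding a second source in YAML (another dash) would simply append a second dict to this list.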

Note: To see what configuration options each module allows, check out the corresponding documentation on the
Source Plugins and Operator Plugins pages.

Similarly, each operator gets a name, a module, and any other settings that control how information extracted from the
sources is output.

operators:
  - name: csv
    module: csv
    filename: output.csv

Here we create an operator using the csv module, name it csv, and specify a filename where we want to store the
output. Note again the dash before the name key.
Putting it all together, here’s our completed config.yml file:
general:
    daemon: true
    sleep: 900
    state_path: state.db

sources:
  - name: inquest-rss
    module: rss
    url: http://blog.inquest.net/atom.xml
    feed_type: messy

operators:
  - name: csv
    module: csv
    filename: output.csv

Now that the config file is all set up, run ThreatIngestor:
threatingestor config.yml

It should write out an output.csv file that looks something like this:

URL,http://purl.org/dc/dcmitype/,http://blog.inquest.net/blog/2018/02/07/cve-2018-4878-adobe-flash-0day-itw/,"\n On February 1st, Adobe published bulletin APSA18-01 for CVE-2018-4878 describing a use-after-free (UAF) vulnerability affecting Flash ve..."
Domain,purl.org,http://blog.inquest.net/blog/2018/02/07/cve-2018-4878-adobe-flash-0day-itw/,"\n On February 1st, Adobe published bulletin APSA18-01 for CVE-2018-4878 describing a use-after-free (UAF) vulnerability affecting Flash ve..."
URL,http://purl.org/dc/elements/1.1,http://blog.inquest.net/blog/2018/02/07/cve-2018-4878-adobe-flash-0day-itw/,"\n On February 1st, Adobe published bulletin APSA18-01 for CVE-2018-4878 describing a use-after-free (UAF) vulnerability affecting Flash ve..."
...

Assuming you are running in daemon mode, ThreatIngestor will continue to check the blog and append new artifacts to
the CSV as it finds them. For further configuration, continue to the Standard Case section or see the detailed sections
about source plugins, and operator plugins.
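Once the CSV exists, a few lines of Python can summarize it. The sketch below groups artifact values by type using the standard csv module. The column names are our own labels, inferred from the sample rows (artifact type, artifact, reference link, reference text); the file has no header row to confirm them.

```python
import csv
from collections import defaultdict

# Our own labels for the four columns seen in the sample output.
COLUMNS = ("artifact_type", "artifact", "reference_link", "reference_text")


def group_artifacts(csv_file):
    """Group unique artifact values by type from an open CSV file object."""
    grouped = defaultdict(set)
    for row in csv.reader(csv_file):
        if len(row) < len(COLUMNS):
            continue  # skip blank or malformed rows
        record = dict(zip(COLUMNS, row))
        grouped[record["artifact_type"]].add(record["artifact"])
    return {kind: sorted(values) for kind, values in grouped.items()}
```

To use it on the file written above, open it with open("output.csv", newline="") and pass the file object to group_artifacts.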

3.2 Standard Case

Generally, you are going to want multiple sources feeding into one or more operators. Let’s consider this standard use
case:

Create your config.yml:

general:
    daemon: true
    sleep: 900
    state_path: state.db

For Twitter integration, you’ll need to grab the tokens, keys, and secrets for your Twitter account. Follow these steps
from the Twitter documentation: https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.
For ThreatKB, while logged in to your ThreatKB instance, click the profile dropdown in the top right of the page, then
choose “My API Keys”. Click the “+” to generate a new token/key pair, and copy them somewhere safe.
Once you have all the secrets you need, create a new section in your config file called credentials, and two list
elements inside it for Twitter and ThreatKB:


credentials:
  - name: twitter-auth
    # https://dev.twitter.com/oauth/overview/application-owner-access-tokens
    api_key:
    api_secret_key:
    access_token:
    access_token_secret:

  - name: threatkb-auth
    url: https://mythreatkb
    token: MYTOKEN
    secret_key: MYKEY

The dash before each name key signifies the start of a new element in the credentials list. This allows us to
define an unlimited number of reusable credential sets, which we can reference by name in the sources and operators
we’ll define next.
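One way to picture this by-name lookup is the sketch below, which merges each referenced credential set into the plugin entry that names it. It is an illustration only: resolve_credentials is a hypothetical helper working on the parsed config as a plain dict, and the merge order (plugin options win over credential options) is an assumption, not ThreatIngestor's documented behavior.

```python
def resolve_credentials(config):
    """Return source/operator dicts with their named credentials merged in."""
    by_name = {cred["name"]: cred for cred in config.get("credentials", [])}
    resolved = []
    for section in ("sources", "operators"):
        for plugin in config.get(section, []):
            merged = dict(plugin)
            cred_name = merged.pop("credentials", None)
            if cred_name is not None:
                if cred_name not in by_name:
                    raise KeyError("unknown credentials: " + cred_name)
                # Drop the credential set's own name so it does not
                # overwrite the plugin's name.
                creds = {
                    key: value
                    for key, value in by_name[cred_name].items()
                    if key != "name"
                }
                merged = {**creds, **merged}
            resolved.append(merged)
    return resolved
```

Because the lookup is by name, the same credential set can be shared by any number of sources and operators without repeating the secrets.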
Fill out the rest of the ThreatIngestor configuration file with the sources and operators:

sources:
  - name: twitter-inquest-c2-list
    module: twitter
    credentials: twitter-auth
    # https://dev.twitter.com/rest/reference/get/lists/statuses
    owner_screen_name: InQuest
    slug: c2-feed

  - name: twitter-hxxp-no-opendir
    module: twitter
    credentials: twitter-auth
    # https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
    q: hxxp -open

  - name: rss-vendor-x
    module: rss
    url: https://example.com/rss.xml
    feed_type: messy

  - name: rss-vendor-y
    module: rss
    url: https://example.com/rss.xml
    feed_type: messy

operators:
  - name: mythreatkb
    # Send artifacts to a ThreatKB instance
    module: threatkb
    credentials: threatkb-auth
    state: Inbox

Now that everything is all set up, run the ingestor:

threatingestor config.yml

You should see your ThreatKB Inbox start filling up with newly extracted C2 IPs and domains.


CHAPTER 4

Example Workflows

The standard use case for ThreatIngestor is pretty simple - just pull from Twitter and RSS, extract IOCs, and send
them to ThreatKB. That said, there is a lot more you can do with just a few changes to the configuration file. Here,
we’ll go over some more advanced use cases, to give you an idea what this tool can do.

4.1 Multiple Operators

By adding more than one operator, you can tell ThreatIngestor to send artifacts to multiple locations. This might be
useful if you want to send to ThreatKB while also writing out a local log file. Combine this with a few operator
options though, and you can now send specific artifacts to different operators depending on type, source, or advanced
filters. Consider the following workflow:


We want artifacts from “Twitter C2 List” and “Vendor X Blog” to go directly to ThreatKB. URLs and domains from
“Twitter Search: #opendir” and “Domain Masquerade Feed” should go to our crawler, which will look for malicious
content or evidence of phishing attacks. Any URLs from “Twitter Search: virustotal.com” that match the filter for a
direct URL to a sample should be sent to our “Automated Analysis” system, which will log in to VirusTotal, download
the sample, and analyze it. We don’t want to see VirusTotal links or open directories in ThreatKB though, because
those aren’t C2s. This config accomplishes all of that:

general:
    daemon: true
    sleep: 900
    state_path: state.db

credentials:
  - name: twitter-auth
    api_key:
    api_secret_key:
    access_token:
    access_token_secret:

  - name: threatkb-auth
    url: http://mythreatkb
    token: MYTOKEN
    secret_key: MYKEY

  - name: aws-auth
    aws_access_key_id: MYKEY
    aws_secret_access_key: MYSECRET
    aws_region: MYREGION

sources:
  - name: twitter-feed-c2
    module: twitter
    credentials: twitter-auth
    owner_screen_name: InQuest
    slug: c2-feed

  - name: twitter-search-opendir
    module: twitter
    credentials: twitter-auth
    q: '#opendir'

  - name: twitter-search-vt
    module: twitter
    credentials: twitter-auth
    q: virustotal.com

  - name: vendor-x
    module: rss
    url: http://example.com/rss.xml
    feed_type: messy

  - name: domain-masq-feed
    module: web
    url: http://example.com/feed.txt

operators:
  - name: my-threatkb
    module: threatkb
    credentials: threatkb-auth
    allowed_sources: [twitter-feed-c2, vendor-x]
    state: Ingestor

  - name: my-crawler
    module: sqs
    credentials: aws-auth
    allowed_sources: [twitter-search-opendir, domain-masq-feed]
    artifact_types: [URL]
    queue_name: crawler
    domain: {domain}
    url: {url}
    source_type: url

  - name: my-analyzer
    module: sqs
    credentials: aws-auth
    allowed_sources: [twitter-search-vt]
    filter: https?://virustotal.com/.*/analysis
    artifact_types: [URL]
    queue_name: analyzer
    url: {url}
    source_type: virustotal

Note that in this example, our Crawler and Automated Analysis systems will be watching the configured SQS queues
for new artifacts. You can use SQS, or add your own custom operator plugins to send artifacts wherever you want.
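The routing in this config can be sketched as a single predicate per operator. The function below is a simplified illustration, not ThreatIngestor's implementation: operator_accepts is a hypothetical name, a missing option is assumed to mean "no restriction", and the filter is assumed to be applied as a regular expression search against the artifact's text.

```python
import re


def operator_accepts(operator, artifact_type, artifact_value, source_name):
    """Decide whether an operator config (a dict) should handle an artifact."""
    allowed = operator.get("allowed_sources")
    if allowed and source_name not in allowed:
        return False

    types = operator.get("artifact_types")
    if types and artifact_type not in types:
        return False

    pattern = operator.get("filter")
    if pattern and not re.search(pattern, artifact_value):
        return False

    return True


my_analyzer = {
    "name": "my-analyzer",
    "allowed_sources": ["twitter-search-vt"],
    "filter": r"https?://virustotal.com/.*/analysis",
    "artifact_types": ["URL"],
}
print(operator_accepts(
    my_analyzer, "URL",
    "https://virustotal.com/file/abc/analysis", "twitter-search-vt",
))
```

With these three checks, the same artifact can be accepted by several operators or by none, which is what lets VirusTotal links reach the analyzer without ever appearing in ThreatKB.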


4.2 Full-Circle

ThreatIngestor can both read from and write to SQS queues, which allows us to set up a “full circle” workflow. (Note
that you can also replace SQS with Beanstalk or custom plugins to achieve the same effect.) In this workflow, we
can extract artifacts from a source, send them off to some SQS listener for processing, and that listener can send the
processed content back into ThreatIngestor’s input queue for extraction. Consider the following workflow:

Here, we have two Twitter sources: our C2 list and a search for “pastebin.com ioc”, and one SQS source: the input
queue. We then have two operators: ThreatKB, and an SQS Pastebin Processor application. We want all the C2s we
pull from the Twitter C2 list to go directly to ThreatKB. We also want any pastebin links from either Twitter source to
be sent to the SQS Pastebin Processor. That Processor will grab the raw text from the pastebin link, and send it to the
ThreatIngestor input queue, where all the IOCs will be extracted and sent to ThreatKB for further analysis. Here’s an
example config file that accomplishes all that:
general:
    daemon: true
    sleep: 900
    state_path: state.db

credentials:
  - name: twitter-auth
    api_key:
    api_secret_key:
    access_token:
    access_token_secret:

  - name: threatkb-auth
    url: http://mythreatkb
    token: MYTOKEN
    secret_key: MYKEY

  - name: aws-auth
    aws_access_key_id: MYKEY
    aws_secret_access_key: MYSECRET
    aws_region: MYREGION

sources:
  - name: twitter-feed-c2
    module: twitter
    credentials: twitter-auth
    owner_screen_name: InQuest
    slug: c2-feed

  - name: twitter-search-pastebin
    module: twitter
    credentials: twitter-auth
    q: pastebin.com ioc

  - name: sqs-input
    module: sqs
    credentials: aws-auth
    queue_name: threatingestor

operators:
  - name: my-threatkb
    module: threatkb
    credentials: threatkb-auth
    allowed_sources: [sqs-input, twitter-feed-c2]
    state: Ingestor

  - name: pastebin-processor
    module: sqs
    credentials: aws-auth
    allowed_sources: [twitter-feed-c2, twitter-search-pastebin]
    artifact_types: [URL]
    filter: https?://pastebin.com/.+
    queue_name: pastebin-processor
    url: {url}

4.3 Queue Workers

The ThreatIngestor plugin architecture lets developers integrate with external systems with relative ease - but not
everything makes sense as a plugin. Both source and operator plugins are expected to run to completion quickly,
then exit and wait for the next run before working again. For long-running tasks (think VirusTotal / MultiAV scan,
malware sandbox, web crawler, domain brute force, etc), implementing them as plugins that block until completion
would break the workflow. Instead, consider using a queue workflow.
In a typical queue workflow, an operator should queue up jobs for each artifact it receives (typically with SQS or
Beanstalk), and an external tool we’ll call a queue worker should read from that queue and perform any necessary
long-running tasks. When the tasks are complete, the queue worker should send a job to another queue, where it can
be picked up by a ThreatIngestor queue source (like the SQS and Beanstalk sources).

Note: In the “Full-Circle” workflow above, the “SQS Pastebin Processor” is a queue worker.
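The shape of a queue worker can be sketched with Python's in-memory queue module standing in for SQS or Beanstalk. Everything here is illustrative: run_worker and fetch_text are hypothetical names, the job fields are made up, and a real worker would poll a remote queue and do actual long-running work instead of building a string.

```python
import queue


def run_worker(in_queue, out_queue, process):
    """Drain in_queue, process each job, and put results on out_queue.

    Returns the number of jobs handled.
    """
    handled = 0
    while True:
        try:
            job = in_queue.get_nowait()
        except queue.Empty:
            return handled
        out_queue.put(process(job))
        handled += 1


def fetch_text(job):
    # Placeholder for a long-running task such as a crawl or sandbox run.
    return {"content": "fetched from " + job["url"], "reference": job["url"]}


jobs, results = queue.Queue(), queue.Queue()
jobs.put({"url": "https://pastebin.com/abc"})  # as queued by an operator
print(run_worker(jobs, results, fetch_text), "job(s) handled")
```

The operator only has to enqueue a job and return, so ThreatIngestor's run finishes quickly while the slow work happens in the worker.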

Let’s look at an example of a queue workflow using one of the provided queue workers, the File System Watcher.

Let’s say we want to watch a directory for new YARA rules, and automatically send them to our MISP server. Here’s
how the ThreatIngestor config would look:


general:
    daemon: true
    sleep: 900
    state_path: state.db

credentials:
  - name: misp-auth
    url: http://mymisp
    key: MYKEY
    ssl: false

  - name: aws-auth
    aws_access_key_id: MYKEY
    aws_secret_access_key: MYSECRET
    aws_region: MYREGION

sources:
  - name: fs-watcher
    module: sqs
    credentials: aws-auth
    queue_name: yara-rules
    paths: [content]
    reference: filename

operators:
  - name: misp
    module: misp
    credentials: misp-auth
    artifact_types: [YARASignature]

In a separate file (we’ll use fswatcher.yml), set up the config for the queue worker:

module: sqs
aws_access_key_id: MYKEY
aws_secret_access_key: MYSECRET
aws_region: MYREGION
out_queue: yara-rules
watch_path: MY_RULES_FOLDER

Run the included File System Watcher:

python3 -m threatingestor.extras.fswatcher fswatcher.yml

When new YARA rules are added to MY_RULES_FOLDER, the File System Watcher sends jobs to the yara-rules
queue:

{
     "rules": "rule myNewRule { condition: false }",
     "filename": "mynewrule.yara"
}

Run ThreatIngestor, and it’ll read from the yara-rules queue, extracting artifacts from the content field in the
job, and using the filename as the artifact’s reference text. When it finds YARA rules, it will send them off through
the MISP operator.
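As a rough picture of what the paths and reference options do, the sketch below pulls the text to scan and the reference text out of a decoded job. It is a simplification under stated assumptions: job_to_text is a hypothetical helper, each paths entry is treated as a plain top-level key (the real option may accept richer path expressions), and the job shown is a made-up one whose text field is named content to match the paths option in the config above.

```python
def job_to_text(job, paths, reference=None):
    """Return (text_to_scan, reference_text) for a decoded queue job (a dict)."""
    # Collect the value of every configured path that is present in the job.
    parts = [str(job[key]) for key in paths if key in job]
    reference_text = str(job.get(reference, "")) if reference else ""
    return "\n".join(parts), reference_text


job = {
    "content": "rule myNewRule { condition: false }",
    "filename": "mynewrule.yara",
}
text, ref = job_to_text(job, paths=["content"], reference="filename")
print(ref)
print(text)
```

The text returned here is what artifact extraction would run over, and the reference text is what would travel with each artifact to the operator.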
By combining custom plugins with custom queue workers, developers can extend ThreatIngestor functionality to fit
arbitrarily complex intel workflows.


4.4 Automate as Much as Possible

Everything in ThreatIngestor is built around the basic idea that some intel tasks can be automated, and some can’t.
The goal, then, is to automate everything that can be, and give as much information to the person doing the analysis
as possible.
Up to this point, all our workflows have followed pretty similar patterns: we read in a bunch of information, extract
what looks interesting, and send it off for storage somewhere. We’re assuming there’s an analyst at the end of that
process, looking at the information we’ve extracted, weeding out false positives, and making decisions on what is
actually important. ThreatIngestor provides the artifacts, and some context to give the analyst a starting point to begin
their research. But could we go a step further, and automate some of the repetitive research tasks too? Let’s see how
far we can take this...

4.4.1 Investigating network artifacts

URLs, domains, and IP addresses all represent some kind of network resource, but what we want to do with them can
be completely different depending on the context.
Suppose we’re getting some network artifacts that we know are C2 endpoints. For these, the end goal is to verify
they’re malicious, and block any communication with them to prevent malicious activity.
We have some feeds that tell us about active attacks coming from certain IPs. These could be from something like
failed SSH login attempts in our server logs, public honeypots, or sites like DShield that monitor global attack patterns.
Depending on the severity and trustworthiness of the source, we might want to just block these, or dig up some extra
information to see if we need to take more specific action.
We’re also getting another set of network artifacts that we know are “open directories” - publicly accessible links a
malicious actor might have used as a drop site for data exfiltration, or to host tools to help them carry out attacks.
These can be a treasure trove of new malware samples, stolen information, and clues to help explore the methods of
malicious actors; but they often disappear quickly after they’ve been discovered by a security researcher. For these,
the end goal is to clone all the content as quickly and safely as possible, and save it for later investigation.
Other sources are feeding us links to live malicious content: maybe a malware sample we can download from a
sandbox or multi-AV, an exploit being used to deliver malicious content, or a second-stage payload being downloaded
by a dropper. Whatever it is, the end goal for us is to download and analyze the content, and figure out how we can
protect against it.
Finally, we’re also getting some artifacts that look like “suspicious masquerades” - websites pretending to be a login
page for a bank, a Google account, or some other legitimate resource. For these, the end goal is to crawl the contents
and save them for comparison (we can use this information for attribution - linking them back to malicious actors or
phishing toolkits), then make sure we’re blocking them so no one accidentally falls victim to the phishing attempts.
In all of these cases, the automatable actions boil down to a few things:
    • Collect metadata (whois, GeoIP, dig, . . . )
    • Collect content (download, crawl)
    • Enrich from public resources (check block lists, reputation databases, network scans like Shodan, . . . )
    • Block the resource (modify firewalls, generate rules for IDS/IPS, . . . )
    • Share intelligence (publish intel feeds, push to a ThreatKB/MISP instance, post to places like Twitter and Slack,
      ...)
Some of these, like the intel sharing, can be set up as simple operators. Others, like checking whois records, or kicking
off a crawler, can be queue workers that know what to do with the enrichment information after they gather it.
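
As a rough illustration of the routing decision behind such queue workers, the sketch below sorts URL artifacts into IP-based and domain-based ones using only the Python standard library. The helper names `host_kind` and `route` are hypothetical and are not part of ThreatIngestor; the queue names are borrowed from the example config later in this chapter.

```python
import ipaddress
from urllib.parse import urlsplit


def host_kind(url):
    """Return 'ip', 'domain', or 'unknown' for the host part of a URL."""
    host = urlsplit(url).hostname
    if not host:
        return "unknown"
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        # Not parseable as IPv4/IPv6, so treat it as a domain name.
        return "domain"


def route(url):
    """Pick a worker queue name for a URL artifact (names are illustrative)."""
    kind = host_kind(url)
    if kind == "ip":
        return "osint-enrich-ip"
    if kind == "domain":
        return "osint-enrich-domain"
    return None


if __name__ == "__main__":
    for candidate in ("http://192.0.2.10/gate.php", "https://example.com/login"):
        print(candidate, "->", route(candidate))
```

A real worker would go on to run whois, GeoIP, or reputation lookups for whichever queue it serves; that part is left out here.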


Often, we’ll be enriching artifacts with this additional information. But with the right sources, we can help weed out
false positives too! Decreasing the amount of noise the analyst sees saves time and effort for more important things. If
we see a domain in a list of known-good sites, maybe we just delete the artifacts altogether, or flag them as probable
false positives and provide context as to why.
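
A minimal sketch of that kind of false-positive check, assuming a small in-memory allowlist. The `KNOWN_GOOD` set and the `triage` helper are hypothetical; a real deployment would more likely load a maintained list of popular or trusted domains.

```python
# Hypothetical allowlist; in practice this would be loaded from a file or service.
KNOWN_GOOD = {"google.com", "microsoft.com"}


def triage(domain, known_good=KNOWN_GOOD):
    """Flag a domain as a probable false positive if it, or a parent
    domain, appears in the known-good list, and record why."""
    domain = domain.lower().rstrip(".")
    parts = domain.split(".")
    # Check the full name first, then each parent (but never a bare TLD).
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in known_good:
            return {
                "domain": domain,
                "probable_false_positive": True,
                "reason": "matches known-good entry " + candidate,
            }
    return {"domain": domain, "probable_false_positive": False, "reason": ""}


if __name__ == "__main__":
    print(triage("mail.google.com"))
    print(triage("google.com.evil.example"))
```

Matching on parent domains rather than substrings keeps masquerades such as `google.com.evil.example` from being waved through.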

4.4.2 Investigating file artifacts

Hashes, YARA signatures, and sometimes URLs can all carry information about interesting files.
When we’re using Twitter and RSS sources, the most common file artifacts will most likely be hashes. These are typically
either malicious software samples (executables, PDF or Word documents, etc.), or “dropped files” that were left
behind as traces of a sample’s execution. Obtaining the original hashed file is sometimes possible through paywalled
services like VirusTotal Enterprise, searching free malware corpora, or simply asking the threat intel community if
anyone has a copy of the file. If those methods fail, the hash can still be used as a universally understandable reference
to uniquely identify the file and perhaps find scan results or existing research describing the file’s capabilities.
YARA signatures can be run over existing malware corpora, or used with threat hunting services like those provided
by VirusTotal Enterprise or Hybrid Analysis YARA search, to find matching files.
URLs to “open directories,” direct downloads, or mirrored samples hosted by threat intel sites are a great way to get
copies of a file for more detailed analysis.
When working with files, the automatable actions look something like this:
     • Find samples (download from a URL, find public samples from a hash, run YARA signatures over a corpus to
       find matches, . . . )
     • Enrich from public resources (search for a hash on multi-AV and sandbox sites, check reputation databases, . . . )
     • Perform automated static analysis (AV scan, metadata extraction, . . . )
     • Perform automated dynamic analysis (run in a sandbox)
     • Save the file somewhere for manual analysis
     • Block the file (generate YARA signatures, add hashes to a block list, . . . )
     • Share intelligence (publish intel feeds, push to ThreatKB/MISP instance, mirror content for download, post to
       places like Twitter and Slack, . . . )
Again, some of these can be accomplished with operator plugins, while others will require custom queue workers.
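
Before a worker can look up a hash on a multi-AV or sandbox site, it usually needs to know which kind of hash it has. The sketch below guesses the algorithm from the length of the hex string; `hash_type` is a hypothetical helper and not a ThreatIngestor function.

```python
import re

# Hex digest lengths for the hash types most often seen in intel feeds.
_HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def hash_type(value):
    """Return 'md5', 'sha1', 'sha256', or None if value is not a known hex digest."""
    value = value.strip().lower()
    if not re.fullmatch(r"[0-9a-f]+", value):
        return None
    return _HASH_LENGTHS.get(len(value))


if __name__ == "__main__":
    print(hash_type("d41d8cd98f00b204e9800998ecf8427e"))
```

Length alone cannot distinguish algorithms that share a digest size, so treat the result as a hint for choosing a lookup API rather than as proof.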

4.4.3 Doing it all

The filtering capabilities of ThreatIngestor mean that no matter what your workflow looks like, you should always be
able to automate everything with a single config file.
Let’s see what it looks like if we put everything together in one place:


And the ThreatIngestor config file:

general:
    daemon: true
    sleep: 900
    state_path: state.db

credentials:
  - name: twitter-auth
    # https://dev.twitter.com/oauth/overview/application-owner-access-tokens
    api_key:
    api_secret_key:
    access_token:
    access_token_secret:

  - name: github-auth
    username: user
    # Could also use password instead https://github.blog/2013-05-16-personal-api-tokens/
    # https://github.com/settings/tokens
    token: TOKEN_OR_PASSWORD

  - name: threatkb-auth
    url: http://mythreatkb
    token: MYTOKEN
    secret_key: MYKEY

  - name: misp-auth
    url: http://mymisp
    key: MYKEY
    ssl: false

  - name: aws-auth
    aws_access_key_id: MY_KEY
    aws_secret_access_key: MY_SECRET
    aws_region: MY_REGION

sources:
  - name: twitter-feed-c2
    module: twitter
    credentials: twitter-auth
    owner_screen_name: InQuest
    slug: c2-feed

  - name: twitter-open-directory
    module: twitter
    credentials: twitter-auth
    # https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
    q: '"open directory" #malware'

  - name: twitter-search-opendir
    module: twitter
    credentials: twitter-auth
    q: '#opendir'

  - name: twitter-masq
    module: twitter
    credentials: twitter-auth
    q: "domain masquerade"

  - name: twitter-search-vt
    module: twitter
    credentials: twitter-auth
    q: virustotal.com

  - name: twitter-search-pastebin
    module: twitter
    credentials: twitter-auth
    q: pastebin.com ioc

  - name: github-cve18
    module: github
    credentials: github-auth
    search: CVE-2018-

  - name: git-yara-rules
    module: git
    url: https://github.com/InQuest/yara-rules.git
    local_path: /opt/threatingestor/git/yara-rules

  - name: rss-myiocfeed
    module: rss
    url: https://example.com/rss.xml
    feed_type: messy

  - name: rss-vendor-x
    module: rss
    url: http://example.com/rss.xml
    feed_type: messy

  - name: sqs-input
    module: sqs
    credentials: aws-auth
    queue_name: threatingestor
    paths: [content]
    reference: reference

  - name: sqs-fswatcher
    module: sqs
    credentials: aws-auth
    queue_name: fswatcher
    paths: [content]
    reference: filename

  - name: domain-masq-feed
    module: web
    url: http://example.com/masquerades.txt

  - name: attack-feed
    module: web
    url: http://example.com/attacks.txt

operators:
  - name: mythreatkb
    module: threatkb
    credentials: threatkb-auth
    allowed_sources: [twitter-feed-c2, rss-.*, git-.*, sqs-.*]
    state: Inbox

  - name: mymisp
    module: misp
    credentials: misp-auth

  - name: pastebin-processor
    module: sqs
    credentials: aws-auth
    allowed_sources: [twitter-feed-c2, twitter-search-pastebin]
    artifact_types: [URL]
    filter: https?://pastebin.com/.+
    queue_name: pastebin-processor
    url: {url}

  - name: my-crawler
    module: sqs
    credentials: aws-auth
    allowed_sources: [twitter-search-opendir, domain-masq-feed]
    artifact_types: [URL]
    queue_name: crawler
    domain: {domain}
    url: {url}
    source_type: url

  - name: my-analyzer
    module: sqs
    credentials: aws-auth
    allowed_sources: [twitter-search-vt]
    filter: https?://virustotal.com/.*/analysis
    artifact_types: [URL]
    queue_name: analyzer
    url: {url}
    source_type: virustotal

  - name: osint-enrich-domain
    module: sqs
    credentials: aws-auth
    artifact_types: [URL]
    filter: is_domain
    queue_name: osint-enrich-domain
    domain: {domain}

  - name: osint-enrich-ip
    module: sqs
    credentials: aws-auth
    artifact_types: [URL]
    filter: is_ip
    queue_name: osint-enrich-ip
    ip: {domain}

  - name: repdb-check
    module: sqs
    credentials: aws-auth
    artifact_types: [URL, IPAddress, Domain]
    queue_name: repdb-check
    artifact: {artifact}

  - name: yara-scan
    module: sqs
    credentials: aws-auth
    artifact_types: [YARASignature]
    queue_name: yara-scan
    rule: {artifact}

  - name: virustotal-downloader
    module: sqs
    credentials: aws-auth
    artifact_types: [Hash, URL]
    allowed_sources: [twitter-search-vt]
    queue_name: vt-downloader
    content: {artifact}

Hopefully, this gives some idea what exactly ThreatIngestor is capable of. Whether you are looking to detect and
respond to zero-day threats, keep up with the intel community, share your own research, or just block phishing domains
on your home network, anything is possible.


CHAPTER 5

Source Plugins

For each source specified, ThreatIngestor handles artifact import. Sources may link to Twitter, Blogs, etc.
Artifacts are imported from those sources and could include URLs, IP Addresses, YARA Signatures, etc. All source
plugins maintain state between runs, allowing them to skip previously processed artifacts and get right to work finding
new indicators.
To add a source to your configuration file, include a section like this:
sources:
  - name: mysource
    module: mysourcemodule

You can add as many sources as you need, all under the same sources: list.
sources:
  - name: mysource
    module: mysourcemodule

  - name: myothersource
    module: mysourcemodule

Note the use of dashes to signify the start of each item in the list, and matching indentation for all the keys within each
item.
The module option must match one of the sources listed below, or your custom source. The name is freeform.
All sources allow credentials such as usernames, passwords, OAuth tokens, etc. to be defined in a separate
credentials section and referenced by name with a credentials keyword. Consider a plugin that accepts
a token and a secret. In config.yml, you would set up the credentials and sources sections like this:
credentials:
  - name: mysource-auth
    token: MYTOKEN
    secret: MYSECRET

sources:
  - name: mysource
    credentials: mysource-auth

This allows the same credentials to be reused for several different sources (or operators) without having to duplicate
them in each source definition.
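
To make the reference-by-name idea concrete, here is a minimal sketch of how a loader might merge a named credentials entry into each source that refers to it. It works on an already-parsed config dictionary; `resolve_credentials` is a hypothetical helper, and ThreatIngestor's own config handling may differ.

```python
def resolve_credentials(config):
    """Return a copy of each source with its named credentials merged in.

    `config` is the parsed config file as a plain dict. Keys set directly
    on the source win over keys from the credentials entry (an assumption
    made for this sketch).
    """
    by_name = {c["name"]: c for c in config.get("credentials", [])}
    resolved = []
    for source in config.get("sources", []):
        merged = dict(source)
        cred_name = merged.pop("credentials", None)
        if cred_name is not None:
            creds = by_name[cred_name]  # KeyError if the name is not defined
            for key, value in creds.items():
                if key != "name":
                    merged.setdefault(key, value)
        resolved.append(merged)
    return resolved


if __name__ == "__main__":
    example = {
        "credentials": [
            {"name": "mysource-auth", "token": "MYTOKEN", "secret": "MYSECRET"}
        ],
        "sources": [
            {"name": "mysource", "module": "mysourcemodule",
             "credentials": "mysource-auth"}
        ],
    }
    print(resolve_credentials(example))
```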

5.1 Available Plugins

The available source plugins are:

5.1.1 Beanstalk

The Beanstalk source can be used to read content from Beanstalk queues. This, combined with the Beanstalk Operator,
allows a full-circle workflow.

Configuration Options

• module (required): beanstalk
• paths (required): A list of XPath-like expressions representing the JSON fields you want to extract from.
• reference: An XPath-like expression representing the JSON field you want to use as a reference. (default:
source name).
• host (required): Host to connect to.
• port (required): Port to connect over.
• queue_name (required): The name of the Beanstalk queue you want to use.

Example Configuration

Inside the sources section of your configuration file:

- name: beanstalk-input
  module: beanstalk
  paths: [content]
  reference: reference
  host: 127.0.0.1
  port: 11300
  queue_name: MYQUEUENAME

If you are expecting JSON jobs in the Beanstalk queue that look like this:

{
    "content": "freeform text",
    "reference": "http://example.com"
}

The above config will extract artifacts from the value of the content key, and use the value of the reference key
as the artifact’s reference.
If you instead had JSON jobs like this:


{
    "data": {
        "text": "freeform text",
        "more": "more text",
        "ref": "http://example.com"
    }
}

And you want to extract from text and more, with ref as a reference, you could set up your config to account for
the more complex JSON structure:

paths: [data.text, data.more]
reference: data.ref

This flexibility allows easier integration with arbitrary systems.
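
The dotted paths above can be read as "follow each key in turn". A minimal sketch of that lookup, assuming plain nested JSON objects; `lookup` and `extract_job` are hypothetical names, not the plugin's actual functions.

```python
import json


def lookup(obj, path):
    """Follow a dotted path such as 'data.text' through nested dicts."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def extract_job(raw_job, paths, reference=None):
    """Return (texts to extract artifacts from, reference value or None)."""
    job = json.loads(raw_job)
    texts = [lookup(job, path) for path in paths]
    ref = lookup(job, reference) if reference else None
    return texts, ref


if __name__ == "__main__":
    raw = json.dumps({"data": {"text": "freeform text",
                               "more": "more text",
                               "ref": "http://example.com"}})
    print(extract_job(raw, ["data.text", "data.more"], "data.ref"))
```

The same idea applies to the SQS source described later, which uses the same paths and reference options.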

5.1.2 Git

The first time it’s run, each Git source will clone the configured repository, look for any files matching
*.{rule,rules,yar,yara}, and extract YARA rules. On any subsequent runs, it will run git pull, check for new and
updated files matching the same patterns, and extract YARA rules from those files.
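
For a sense of what the file-matching step involves, the sketch below walks a cloned repository and collects files with the four extensions named above. It is an illustration using pathlib only; `find_rule_files` is a hypothetical helper, and it does not clone, pull, or parse rules.

```python
from pathlib import Path

# Extensions taken from the *.{rule,rules,yar,yara} pattern above.
YARA_SUFFIXES = {".rule", ".rules", ".yar", ".yara"}


def find_rule_files(repo_path):
    """Return a sorted list of candidate YARA rule files under repo_path."""
    return sorted(
        path
        for path in Path(repo_path).rglob("*")
        if path.is_file() and path.suffix in YARA_SUFFIXES
    )


if __name__ == "__main__":
    for rule_file in find_rule_files("."):
        print(rule_file)
```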

Configuration Options

• module (required): git
• url (required): URL (can be https, git, ssh, etc) of remote to clone.
• local_path (required): folder on disk (relative or absolute) to clone into.

Example Configuration

Inside the sources section of your configuration file:

- name: inquest-yara-rules
  module: git
  url: https://github.com/InQuest/yara-rules.git
  local_path: /opt/threatingestor/git/yara-rules

5.1.3 GitHub Repository Search

The GitHub source plugin uses GitHub’s repository search API to find new interesting repos, and create a Task artifact
for each.

Configuration Options

• module (required): github
• search (required): search term(s).
• username: Optional username for authentication.
• token: Optional token or password for authentication.


Example Configuration

The following examples all assume GitHub credentials have already been configured in the credentials section
of the config, like this:

credentials:
  - name: github-auth
    username: myuser
    token: MYTOKEN

Note: GitHub credentials are optional, but increase the rate limit for API requests significantly. If you are doing more
than one or two low-volume searches, you should set up the credentials.

Inside the sources section of your configuration file:

- name: github-cve-repos
  credentials: github-auth
  module: github
  search: CVE-2018-

5.1.4 RSS

The RSS source pulls from standard RSS and Atom feeds, and extracts artifacts from within the feed content. It does
not follow links to full blog posts.
For each RSS feed, you’ll need to define a feed_type for IOC extraction. Valid feed types are:
• messy: Only look at obfuscated URLs, assume all IPs are valid.
• clean: Treat everything as valid C2 URL/IP.
• afterioc: Treat everything after the last occurrence of the string “Indicators of Compromise” as valid C2
URL/IP.
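
The afterioc behavior can be pictured as a simple string split. The sketch below keeps only the text after the last occurrence of the marker; `text_after_last_marker` is a hypothetical helper, and returning an empty string when the marker is absent is an assumption made here, not documented plugin behavior.

```python
MARKER = "Indicators of Compromise"


def text_after_last_marker(content, marker=MARKER):
    """Return the part of content after the last occurrence of marker.

    Returns an empty string if the marker does not appear (assumption).
    """
    index = content.rfind(marker)
    if index == -1:
        return ""
    return content[index + len(marker):]


if __name__ == "__main__":
    post = ("We discuss Indicators of Compromise below. "
            "Indicators of Compromise: evil.example, 192.0.2.10")
    print(text_after_last_marker(post))
```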

Configuration Options

• module (required): rss
• feed_type (required): see above; if unsure, use messy.
• url (required): URL to the RSS or Atom feed.

Example Configuration

Inside the sources section of your configuration file:

- name: rss-myiocfeed
  module: rss
  url: https://example.com/rss.xml
  feed_type: messy


5.1.5 SQS

The SQS source can be used to read content from Amazon SQS queues. This, combined with the SQS Operator,
allows a full-circle workflow.

Configuration Options

• module (required): sqs
• paths (required): A list of XPath-like expressions representing the JSON fields you want to extract from.
• reference: An XPath-like expression representing the JSON field you want to use as a reference. (default:
source name).
• aws_access_key_id (required): Your AWS access key ID.
• aws_secret_access_key (required): Your AWS secret access key.
• aws_region (required): Your AWS region name.
• queue_name (required): The name of the SQS queue you want to use.

Example Configuration

The following example assumes AWS credentials have already been configured in the credentials section of the
config, like this:
credentials:
  - name: aws-auth
    aws_access_key_id: MYKEY
    aws_secret_access_key: MYSECRET
    aws_region: MYREGION

Inside the sources section of your configuration file:
- name: sqs-input
  module: sqs
  paths: [content]
  reference: reference
  credentials: aws-auth
  queue_name: MYQUEUENAME

If you are expecting JSON jobs in the SQS queue that look like this:
{
    "content": "freeform text",
    "reference": "http://example.com"
}

The above config will extract artifacts from the value of the content key, and use the value of the reference key
as the artifact’s reference.
If you instead had JSON jobs like this:
{
    "data": {
        "text": "freeform text",
        "more": "more text",
        "ref": "http://example.com"
    }
}

And you want to extract from text and more, with ref as a reference, you could set up your config to account for
the more complex JSON structure:

paths: [data.text, data.more]
reference: data.ref

This flexibility allows easier integration with arbitrary systems.

5.1.6 Twitter

The Twitter source can use several Twitter API endpoints out of the box: @mentions, Twitter lists, user timeline, and
standard search.

Configuration Options

• module (required): twitter
• api_key (required): Consumer API key (See Twitter oauth docs).
• api_secret_key (required): Consumer API secret key (See Twitter oauth docs).
• access_token (required): Twitter access token (See Twitter oauth docs).
• access_token_secret (required): Twitter access token secret (See Twitter oauth docs).
• defanged_only: Defaults to true. If set to false, the Twitter source will include all expanded links found
in Tweets. If set to true, it will include only defanged links.
After the above general options, you may include valid options for one of the supported Twitter endpoints, as described
below. (If you do not include any extra options, the Twitter plugin will default to reading from your @mentions.) Any
extra options defined in the config will be passed in directly to the Twitter endpoint, so you can configure some extra
options not shown here. See the relevant Twitter documentation for more information on supported parameters.
Mentions:
This is the default behavior.
Twitter list:
• owner_screen_name: Twitter user who owns the list.
• slug: The name of the Twitter list.
Twitter user timeline:
• screen_name: Twitter user to watch.
Twitter search:
• q: Twitter search term, can be multiple words including hashtags.
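
For readers unfamiliar with the "defanged" links mentioned under defanged_only above: researchers often rewrite malicious URLs (for example hxxp://example[.]com) so they cannot be clicked by accident. The sketch below reverses a few common defanging styles. It is a simplified standard-library illustration with hypothetical helper names, not ThreatIngestor's actual extraction logic, and it covers only the patterns shown.

```python
import re


def refang(text):
    """Undo a few common defanging styles: hxxp://, [.], (.), and [:]."""
    text = re.sub(r"\bhxxp(s?)://", r"http\1://", text, flags=re.IGNORECASE)
    text = text.replace("[.]", ".").replace("(.)", ".")
    text = text.replace("[:]", ":")
    return text


def is_defanged(text):
    """True if refanging changes the text, i.e. it contained defanged parts."""
    return refang(text) != text


if __name__ == "__main__":
    print(refang("hxxps://evil[.]example/payload"))
```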


Example Configuration

The following examples all assume Twitter credentials have already been configured in the credentials section of
the config, like this:

credentials:
  - name: twitter-auth
    api_key: MY_KEY
    api_secret_key: MY_SECRET_KEY
    access_token: MY_TOKEN
    access_token_secret: MY_TOKEN_SECRET

Inside the sources section of the config, create a new item for the source you wish to define. Examples for each of
the supported Twitter endpoints are provided below.
Mentions:

- name: twitter-my-mentions
  module: twitter
  credentials: twitter-auth

Twitter list:

- name: twitter-inquest-c2-list
  module: twitter
  credentials: twitter-auth
  owner_screen_name: InQuest
  slug: c2-feed

Twitter user timeline:

- name: twitter-inquest-timeline
  module: twitter
  credentials: twitter-auth
  screen_name: InQuest

Twitter search:

- name: twitter-open-directory
  module: twitter
  credentials: twitter-auth
  q: '"open directory" #malware'

Note: When searching for Twitter hashtags, be sure to put quotes around your search term, as shown in the example
above. Otherwise, the # character will be treated as the beginning of a YAML comment.

5.1.7 Web

The Web source will periodically check a URL for changes, and extract any artifacts it finds. This is useful for
ingesting threat intel feeds that don’t already have a ThreatIngestor source plugin, without having to write your own
custom plugin. Use it for plaintext IP blacklists, C2 URL CSVs, and more.


Configuration Options

     • module (required): web
     • url (required): URL of the web content you want to poll.

Example Configuration

Inside the sources section of your configuration file:

- name: mylist
  module: web
  url: http://example.com/feed.txt


CHAPTER 6

Operator Plugins

Operator plugins handle artifact export. They can be configured to only send certain artifact types, only send artifacts
from certain sources, filter down artifacts to only those matching a certain regex, and more.
To add an operator to your configuration file, include a section like this:
operators:
  - name: myoperator
    module: myoperatormodule

The module option must match one of the operators listed below, or your custom operator.
The following options are globally accepted by all operators:
    • allowed_sources: List (in YAML syntax) of source names to allow.
    • artifact_types: List (in YAML syntax) of artifact types to allow.
    • filter: A regex, or comma-separated list (not in YAML syntax) of some special keywords.
All of these options are inclusive, so only artifacts matching the restrictions will be sent through the operator.
Example:
sources:
  - name: mysource
    module: mysourcemodule

  - name: myothersource
    module: mysourcemodule

operators:
  - name: non-ip-based-urls
    module: myoperatormodule
    allowed_sources: [mysource]
    filter: is_domain
    artifact_types: [URL]

  - name: google-domain-masquerade
    module: myoperatormodule
    allowed_sources: [mysource, myothersource]
    filter: ([^\.]google.com$|google.com[^/])
    artifact_types: [URL, Domain]

By combining these three options, you can include any number of different sources and operators in your config, and
still only send exactly the artifacts you want to each operator.
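
The way the three options combine can be sketched as a single predicate: an artifact is sent only if it passes every restriction that is set. The code below is an illustration under stated assumptions, not the operator base class itself. Artifacts are modeled as plain dicts, allowed_sources entries are treated as full-match regexes (as suggested by entries like rss-.* in the earlier example config), and special filter keywords such as is_domain are not handled.

```python
import re


def should_send(artifact, operator):
    """Return True if the artifact passes all restrictions set on the operator.

    artifact: dict with 'source', 'type', and 'content' keys (hypothetical shape).
    operator: dict that may contain allowed_sources, artifact_types, and filter.
    """
    allowed = operator.get("allowed_sources")
    if allowed is not None and not any(
        re.fullmatch(pattern, artifact["source"]) for pattern in allowed
    ):
        return False

    types = operator.get("artifact_types")
    if types is not None and artifact["type"] not in types:
        return False

    pattern = operator.get("filter")
    if pattern is not None and not re.search(pattern, artifact["content"]):
        return False

    return True


if __name__ == "__main__":
    op = {"allowed_sources": ["twitter-feed-c2", "rss-.*"],
          "artifact_types": ["URL"],
          "filter": r"https?://pastebin.com/.+"}
    art = {"source": "rss-vendor-x", "type": "URL",
           "content": "https://pastebin.com/abc123"}
    print(should_send(art, op))
```

An operator with none of the three options set accepts everything, which matches the idea that each option only narrows what gets through.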
All operators allow credentials such as usernames, passwords, OAuth tokens, etc. to be defined in a separate
credentials section and referenced by name with a credentials keyword. Consider a plugin that accepts
a token and a secret. In config.yml, you would set up the credentials and operators sections like
this:

credentials:
  - name: myoperator-auth
    token: MYTOKEN
    secret: MYSECRET

operators:
  - name: myoperator
    credentials: myoperator-auth

This allows the same credentials to be reused for several different operators (or sources), without having to duplicate
them in each operator definition.

6.1 Available Plugins

The available operator plugins are:

6.1.1 Beanstalk

Beanstalk is a simple work queue server that may be easier to get started with than Amazon SQS for those who don’t
already have AWS accounts.
The Beanstalk operator enables you to send output to work queues, which you can then consume from Beanstalk
sources, or external applications. This operator is extremely flexible, as it accepts arbitrary config options and passes
them through to the queue.

Configuration Options

• module (required): beanstalk
• host (required): Host to connect to.
• port (required): Port to connect over.
• queue_name (required): The name of the Beanstalk tube you want to use.
Any other options defined in the Beanstalk operator section will be passed in to your queue as part of a JSON object,
after string interpolation to fill in artifact content. For example, {domain} will be replaced with the C2 domain being
exported.


Example Configuration

Inside the operators section of your configuration file:

- name: my-beanstalk-queue
  module: beanstalk
  host: 127.0.0.1
  port: 11300
  queue_name: my-queue
  domain: {domain}
  url: {url}
  source_type: url
  download_path: /data/ingestor

In this example, the resulting JSON object for a URL artifact of http://example.com/ sent to the Beanstalk
queue would be:

{
      "domain": "example.com",
      "url": "http://example.com/",
      "source_type": "url",
      "download_path": "/data/ingestor"
}
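
The interpolation step can be approximated with Python's str.format. The sketch below rebuilds the JSON object above from the extra config options and a URL artifact. `build_message` is a hypothetical helper that supports only the {url} and {domain} placeholders; the real operator may offer more.

```python
import json
from urllib.parse import urlsplit


def build_message(options, url):
    """Fill {url} and {domain} placeholders in each option value and
    return the resulting JSON string."""
    values = {"url": url, "domain": urlsplit(url).hostname or ""}
    return json.dumps(
        {key: str(value).format(**values) for key, value in options.items()}
    )


if __name__ == "__main__":
    extra_options = {
        "domain": "{domain}",
        "url": "{url}",
        "source_type": "url",
        "download_path": "/data/ingestor",
    }
    print(build_message(extra_options, "http://example.com/"))
```

A queue worker on the other end only has to json.loads the message body to get these fields back. The same pattern applies to the SQS operator described later.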

6.1.2 CSV File

The most basic of the included operators, the CSV operator simply writes extracted artifacts to a CSV file. The
columns in the file are, in order:
    1. Artifact type (URL, Domain, IPAddress, etc)
    2. Artifact content (example.com, 1.1.1.1)
    3. Reference link (URL of the source tweet, blog post, etc)
    4. Reference text (Tweet text, snippet from a blog post, etc)
This operator often comes in handy if you want to quickly and easily test that your ThreatIngestor configuration is working
as expected.
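
For a quick sanity check of the output, the sketch below counts rows per artifact type using the column order listed above. It assumes the file has no header row, which is an assumption of this sketch; the column names in `COLUMNS` are labels chosen here for readability.

```python
import csv
from collections import Counter

# Labels for the four columns, in the order documented above.
COLUMNS = ["artifact_type", "content", "reference_link", "reference_text"]


def count_by_type(lines):
    """Count rows per artifact type from an iterable of CSV lines."""
    reader = csv.DictReader(lines, fieldnames=COLUMNS)
    return Counter(row["artifact_type"] for row in reader)


if __name__ == "__main__":
    sample = [
        'URL,http://example.com/,https://example.com/post,"some, quoted text"',
        "Domain,example.com,https://example.com/post,snippet",
    ]
    print(count_by_type(sample))
```

To run it against a real file, pass an open file object instead of a list, for example `with open("output.csv", newline="") as f: count_by_type(f)`.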

Configuration Options

     • module (required): csv
     • filename (required): filename with relative or absolute path.

Example Configuration

Inside the operators section of your configuration file:

- name: mycsv
  module: csv
  filename: output.csv


6.1.3 MISP

The MISP operator will send extracted artifacts to your MISP instance, as objects attached to events.
When this plugin is configured, events should show up on your MISP instance with the name “ThreatIngestor Event:
{SOURCE}”, where “{SOURCE}” is the name of the source plugin that extracted the attached objects. Artifact
context (reference link and text, if any) will also be attached to the event, as “internal” objects.
The following artifacts are supported by the MISP plugin:
• Domains
• Hashes (MD5, SHA1, SHA256)
• IP Addresses
• URLs
• YARA Signatures
If other artifact types are sent through this plugin, the artifacts will be ignored.

Configuration Options

• module (required): misp
• url (required): Base URL for your MISP instance.
• key (required): Your MISP authentication key.
• ssl: Verify SSL certificate? (default: true)
• tags: List of tags to attach to events (default: [type:OSINT])

Example Configuration

The following example assumes MISP credentials have already been configured in the credentials section of the
config, like this:

credentials:
  - name: misp-auth
    url: http://mymisp
    key: MYKEY
    ssl: false

Inside the operators section of your configuration file:

- name: mymisp
  module: misp
  credentials: misp-auth

6.1.4 MySQL

The MySQL operator feeds artifacts into a single MySQL table.
The table defined in the config will be created if it does not exist. The columns in the table are:
1. artifact: Artifact content (example.com, 1.1.1.1, etc).
2. artifact_type: Artifact type (domain, yarasignature, etc).
3. reference_link: URL of the source tweet, blog post, etc.
4. reference_text: Tweet text, snippet from a blog post, etc.
5. created_date: MySQL DATETIME.
6. state: For external use, always NULL. You can use this to keep track of the current investigation status of
artifacts, if you so choose.

Configuration Options

• module (required): mysql
• host (required): Database host.
• port: Database port (default: 3306).
• user (required): Database user (must have table create permission, or insert permission on the existing artifacts
table defined below).
• password: Password for user.
• table (required): Artifacts table (will be created if it does not exist; must follow the required schema).

Example Configuration

The following example assumes MySQL credentials have already been configured in the credentials section of
the config, like this:

credentials:
  - name: mysql-auth
    host: MYHOST
    port: MYPORT
    user: MYUSER
    password: MYPASSWORD
    database: MYDATABASE

Inside the operators section of your configuration file:

- name: my-db
  module: mysql
  credentials: mysql-auth
  table: artifacts

6.1.5 SQLite

The SQLite operator feeds artifacts into a simple database, with zero setup required.
This operator often comes in handy if you want to quickly and easily test that your ThreatIngestor configuration is working
as expected, and it scales better than the CSV operator.
One table will be created per artifact type. The columns in each table are, in order:
1. artifact: Artifact content (example.com, 1.1.1.1, etc).
2. reference_link: URL of the source tweet, blog post, etc.
3. reference_text: Tweet text, snippet from a blog post, etc.
4. created_date: ISO-8601 date string, always UTC.
5. state: For external use, always NULL. You can use this to keep track of the current investigation status of
artifacts, if you so choose.
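
Since the state column is left for external use, a small script can record triage decisions in it. The sketch below uses Python's built-in sqlite3 module; the table name `domain` and the helper `mark_reviewed` are assumptions for illustration, so check the actual table names in your database first.

```python
import sqlite3


def mark_reviewed(conn, table, artifact, state="reviewed"):
    """Set the state column for one artifact; return the number of rows updated."""
    # Table names cannot be bound as SQL parameters, so validate before use.
    if not table.isidentifier():
        raise ValueError("unexpected table name: " + repr(table))
    cursor = conn.execute(
        'UPDATE "{}" SET state = ? WHERE artifact = ?'.format(table),
        (state, artifact),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    # In-memory stand-in using the columns documented above.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domain (artifact TEXT, reference_link TEXT, "
        "reference_text TEXT, created_date TEXT, state TEXT)"
    )
    conn.execute(
        "INSERT INTO domain VALUES (?, ?, ?, ?, NULL)",
        ("example.com", "https://example.com/post", "snippet",
         "2020-01-21T00:00:00"),
    )
    print(mark_reviewed(conn, "domain", "example.com"))
```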


You can also use the included ThreatIngestor “quick web interface” to get an easier overview of the artifacts in your
database, or set up a JSON API with a single command:

hug -m threatingestor.extras.webapp

Note: Don’t have hug? pip install hug!

If you want to use the webapp, make sure your SQLite database is called artifacts.db and in the same folder
where you’re running hug.

Configuration Options

• module (required): sqlite
• filename (required): filename with relative or absolute path.

Example Configuration

Inside the operators section of your configuration file:

- name: mysqlite
  module: sqlite
  filename: output.db

6.1.6 Amazon SQS

The SQS operator allows ThreatIngestor to integrate out-of-the-box with any system that supports reading from SQS
queues. This operator is extremely flexible, as it accepts arbitrary config options and passes them through to the queue.

Configuration Options

• module (required): sqs
• aws_access_key_id (required): Your AWS access key ID.
• aws_secret_access_key (required): Your AWS secret access key.
• aws_region (required): Your AWS region name.
• queue_name (required): The name of the SQS queue you want to use.
Any other options defined in the SQS operator section will be passed in to your queue as part of a JSON object, after
string interpolation to fill in artifact content. For example, {domain} will be replaced with the C2 domain being
exported.

Example Configuration

The following example assumes AWS credentials have already been configured in the credentials section of the
config, like this:


credentials:
  - name: aws-auth
    aws_access_key_id: MYKEY
    aws_secret_access_key: MYSECRET
    aws_region: MYREGION

Inside the operators section of your configuration file:

- name: myqueue
  module: sqs
  credentials: aws-auth
  queue_name: my-queue
  domain: {domain}
  url: {url}
  source_type: url
  download_path: /data/ingestor

In this example, the resulting JSON object for a URL artifact of http://example.com/ sent to the SQS queue
would be:

{
     "domain": "example.com",
     "url": "http://example.com/",
     "source_type": "url",
     "download_path": "/data/ingestor"
}

6.1.7 ThreatKB

The ThreatKB operator will send extracted artifacts to your ThreatKB instance.

Configuration Options

    • module (required): threatkb
    • url (required): Base URL for your ThreatKB instance.
    • token (required): Your ThreatKB authentication token.
    • secret_key (required): Your ThreatKB authentication secret key.
    • state (required): The State you want assigned to created artifacts.

Example Configuration

The following example assumes ThreatKB credentials have already been configured in the credentials section of
the config, like this:

credentials:
  - name: threatkb-auth
    url: http://mythreatkb
    token: MYTOKEN
    secret_key: MYKEY

Inside the operators section of your configuration file:
