Bolster Documentation
Release 0.1.1

Andrew Bolster

Aug 20, 2021
CONTENTS:

1   Bolster
    1.1 Features
    1.2 Credits

2   Installation
    2.1 Stable release
    2.2 From sources

3   Usage

4   Contributing
    4.1 Types of Contributions
    4.2 Get Started!
    4.3 Pull Request Guidelines
    4.4 Tips
    4.5 Deploying

5   Credits
    5.1 Development Lead
    5.2 Contributors

6   History
    6.1 0.1.0 (2021-03-11)
    6.2 0.1.1 (2021-05-18)

7   API Reference
    7.1 bolster

8   Indices and tables

Python Module Index

Index
CHAPTER ONE

BOLSTER

Bolster’s Brain, you’ve been warned
   • Free software: GNU General Public License v3
   • Documentation: https://bolster.readthedocs.io.

1.1 Features

   • Efficient tree/node traversal and iteration
   • Datetime helpers
   • Concurrency helpers
   • Web safe Encapsulation/Decapsulation helpers
   • pandas-esque aggregate/transform_r functions
   • “Best Practice” AWS service handling

1.2 Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

CHAPTER TWO

INSTALLATION

2.1 Stable release

To install Bolster, run this command in your terminal:

$ pip install bolster

This is the preferred method to install Bolster, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.

2.2 From sources

The sources for Bolster can be downloaded from the GitHub repo.
You can either clone the public repository:

$ git clone https://github.com/andrewbolster/bolster

Or download the tarball:

$ curl -OJL https://github.com/andrewbolster/bolster/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

CHAPTER THREE

USAGE

To use Bolster in a project:

import bolster

CHAPTER FOUR

CONTRIBUTING

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:

4.1 Types of Contributions

4.1.1 Report Bugs

Report bugs at https://github.com/andrewbolster/bolster/issues.
If you are reporting a bug, please include:
    • Your operating system name and version.
    • Any details about your local setup that might be helpful in troubleshooting.
    • Detailed steps to reproduce the bug.

4.1.2 Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants
to implement it.

4.1.3 Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to
whoever wants to implement it.

4.1.4 Write Documentation

Bolster could always use more documentation, whether as part of the official Bolster docs, in docstrings, or even on
the web in blog posts, articles, and such.

4.1.5 Submit Feedback

The best way to send feedback is to file an issue at https://github.com/andrewbolster/bolster/issues.
If you are proposing a feature:
     • Explain in detail how it would work.
     • Keep the scope as narrow as possible, to make it easier to implement.
     • Remember that this is a volunteer-driven project, and that contributions are welcome :)

4.2 Get Started!

Ready to contribute? Here’s how to set up bolster for local development.
    1. Fork the bolster repo on GitHub.
    2. Clone your fork locally:

       $ git clone git@github.com:your_name_here/bolster.git

    3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up
       your fork for local development:

       $ mkvirtualenv bolster
       $ cd bolster/
       $ python setup.py develop

    4. Create a branch for local development:

       $ git checkout -b name-of-your-bugfix-or-feature

       Now you can make your changes locally.
    5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other
       Python versions with tox:

       $ flake8 bolster tests
       $ pytest  # or: python setup.py test
       $ tox

       To get flake8 and tox, just pip install them into your virtualenv.
    6. Commit your changes and push your branch to GitHub:

       $ git add .
       $ git commit -m "Your detailed description of your changes."
       $ git push origin name-of-your-bugfix-or-feature

    7. Submit a pull request through the GitHub website.

4.3 Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:
   1. The pull request should include tests.
   2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function
      with a docstring, and add the feature to the list in README.rst.
   3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check
      https://travis-ci.com/andrewbolster/bolster/pull_requests and make sure that the tests pass for all supported
      Python versions.

4.4 Tips

To run a subset of tests:

$ pytest tests/test_bolster.py

4.5 Deploying

A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in
HISTORY.rst). Then run:

$ bump2version patch # possible: major / minor / patch
$ git push
$ git push --tags

Travis will then deploy to PyPI if tests pass.

CHAPTER FIVE

CREDITS

5.1 Development Lead

    • Andrew Bolster 

5.2 Contributors

None yet. Why not be the first?

CHAPTER SIX

HISTORY

6.1 0.1.0 (2021-03-11)

  • First release on PyPI.

6.2 0.1.1 (2021-05-18)

  • Decouple from pg_config requirement (now using psycopg2-binary, which while less than ideal, can run in zee
    clouds)
  • Dependency updates for security vulns
  • Companies House API

CHAPTER SEVEN

API REFERENCE

This page contains auto-generated API reference documentation (created with sphinx-autoapi).

7.1 bolster

Top-level package for Bolster.

7.1.1 Subpackages

bolster.data_sources

Submodules

bolster.data_sources.companies_house

Module Contents

Functions

 get_basic_company_data_url()
     Parse the companies house website to get the current URL for the ‘BasicCompanyData’
 query_basic_company_data(query_func=always)
     Grab the url for the basic company data, and walk through the CSV files within, yielding each row for which
     query_func(row) is True
 companies_house_record_might_be_farset(r)
     A heuristic function for working out if a record in the companies house registry might be based in Farset Labs
 get_companies_house_records_that_might_be_in_farset()

bolster.data_sources.companies_house.get_basic_company_data_url()
    Parse the companies house website to get the current URL for the ‘BasicCompanyData’
        Currently uses the ‘one file’ method but it could be split into the multi files for memory efficiency
               Return type AnyStr
bolster.data_sources.companies_house.query_basic_company_data(query_func=always)
    Grab the url for the basic company data, and walk through the CSV files within, and for each row in each CSV
    file, parse the row data through the given query_func such that if query_func(row) is True it will be yielded
           Parameters query_func (Callable[Ellipsis, bool]) –
           Return type Iterator[Dict]
bolster.data_sources.companies_house.companies_house_record_might_be_farset(r)
    A heuristic function for working out if a record in the companies house registry might be based in Farset Labs
    Almost certainly incomplete and needs more testing/validation
           Parameters r (Dict) –
           Return type bool
bolster.data_sources.companies_house.get_companies_house_records_that_might_be_in_farset()
           Return type Iterator[Dict]
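
The query_func pattern described above can be illustrated with a minimal, self-contained sketch. It does not download anything and is not the package's implementation: filter_rows, the always stand-in, and the row field names are hypothetical, chosen only to show how a predicate decides which rows are yielded.

```python
from typing import Callable, Dict, Iterable, Iterator


def always(row: Dict) -> bool:
    """Assumed behaviour of the default predicate: accept every row."""
    return True


def filter_rows(rows: Iterable[Dict], query_func: Callable[[Dict], bool] = always) -> Iterator[Dict]:
    """Yield only the rows for which query_func(row) is truthy."""
    for row in rows:
        if query_func(row):
            yield row


# Hypothetical rows; the real CSV column names may differ.
rows = [
    {"CompanyName": "EXAMPLE ONE LTD", "PostTown": "BELFAST"},
    {"CompanyName": "EXAMPLE TWO LTD", "PostTown": "LONDON"},
]

belfast_only = list(filter_rows(rows, lambda r: r["PostTown"] == "BELFAST"))
everything = list(filter_rows(rows))
```

Because the function is a generator, rows are tested one at a time and nothing is held in memory beyond the current row.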

bolster.stats

Submodules

bolster.stats.distributions

Module Contents

Functions

 best_fit_distribution(data, bins=200, ax=None, include_slow=False, discriminator='sse')
     Model data by finding best fit distribution to data

bolster.stats.distributions.best_fit_distribution(data, bins=200, ax=None, include_slow=False, discriminator='sse')
    Model data by finding best fit distribution to data

bolster.utils

Subpackages

bolster.utils.aws

AWS based Asset handling
Includes S3, Kinesis, SSM, SQS, Lambda self-invocation and Redshift querying helpers

Package Contents

Classes

 KinesisLoader
     Kinesis batchwise insertion handler with chunking and retry

Functions

 chunks(iterable, size=10)
     Outputs chunks of size N from an iterable (generator)
 start_session(*args, restart=False, **kwargs)
 get_s3_client()
 put_s3(obj, key, bucket, keys=None, gzip=True, client=None)
     Take either a list of dicts (and dump them as csv to s3) or a StringIO buffer (and dump-as-is to s3)
 get_s3(key, bucket, gzip=True, log_exception=True, client=None)
     Get Object from S3, generally with gzip decompression included.
 check_s3(key, bucket, client=None)
     https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3
 get_matching_s3_objects(bucket, prefix='', suffix='', client=None)
     Generate objects in an S3 bucket.
 get_matching_s3_keys(bucket, **kwargs)
     Generate the keys in an S3 bucket.
 select_from_csv(bucket, key, fields, client=None)
 get_latest_key(prefix, bucket, key=None, client=None)
     Walk a given S3 bucket for the lexicographically highest item in the given bucket (defaults to the analysis
     store defined in utils.env)
 get_sqs_client()
 send_to_sqs(records, queue, chunksize=1, client=None)
     Send records in chunks of chunksize for a given sqs queue in json-serialised format
 get_ssm_client()
 get_ssm_param(param_name, client=None)
     Locally memoized getter for configuration parameters stored in the AWS “Simple Systems Manager” (now just
     systems manager) Parameter Store
 fh_json_decode(content)
     Customised JSON Decoder for consuming Firehose batched records
 decapsulate_kinesis_payloads(event)
     Decapsulate base64 encoded kinesis data records to a list
 iterate_kinesis_payloads(event)
     Iterate over a base64 encoded kinesis data record, yielding entries
 send_to_kinesis(records, stream, partition_key=None)
     Accessory function for the KinesisLoader class
 get_sns_client()
 invoke_self_async(event, context)
     Have the Lambda invoke itself asynchronously, passing the same event it received originally, and tagging the
     event as ‘async’ so it’s actually processed
 query(q, redshift_conn_dict, named_cursor='bolster_query_cursor', **kwargs)
     Helper for making queries to redshift (or any postgres compatible backend)
 SQSWrapper(event, context, queuename, function, timeout=60000, reinvokelimit=10, maxmessages=1,
            raise_exceptions=True, deduplicate=False, fkwargs={}, client=None)

Attributes

 logger

 session

 _ssm_params

bolster.utils.aws.chunks(iterable, size=10)
    Outputs chunks of size N from an iterable (generator)
           Parameters
                  • iterable (Iterable) –
                  • size – (Default value = 10)
           Return type Generator[List, None, None]
      Returns:

      >>> next((b for b in chunks(range(10), 2)))
      [0, 1]
      >>> [b for b in chunks(list(range(10)), 2)]
      [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
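
For readers who want to see how such a chunking generator can work, here is a minimal sketch built on itertools.islice. It reproduces the documented behaviour above but is an assumption about the approach, not the package's actual source.

```python
from itertools import islice
from typing import Generator, Iterable, List


def chunks(iterable: Iterable, size: int = 10) -> Generator[List, None, None]:
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        # islice pulls up to `size` items without consuming the rest
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


first = next(chunks(range(10), 2))   # [0, 1]
uneven = list(chunks(range(5), 2))   # [[0, 1], [2, 3], [4]]
```

The final chunk is shorter than size when the input length is not an exact multiple.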

bolster.utils.aws.logger
bolster.utils.aws.session :Optional[boto3.Session]
bolster.utils.aws.start_session(*args, restart=False, **kwargs)
           Return type boto3.Session
bolster.utils.aws.get_s3_client()
bolster.utils.aws.put_s3(obj, key, bucket, keys=None, gzip=True, client=None)
    Take either a list of dicts (and dump them as csv to s3) or a StringIO buffer (and dump-as-is to s3)
           Parameters
                  • obj (Union[Sequence[Dict], io.StringIO]) – List of records to be written to
                    CSV (or StringIO for direct upload):
                  • key (str) – Destination Key
                  • bucket (str) – Destination Bucket (Default value = S3_ANALYSIS_STORE)

                  • keys – List of expected keys, can be used to filter or set the order of key entry in the
                    resultant file (Default value = None)
                  • gzip (bool) – Compress the object (Default value = True)
           Return type dict
      Returns:
bolster.utils.aws.get_s3(key, bucket, gzip=True, log_exception=True, client=None)
    Get Object from S3, generally with gzip decompression included.
           Parameters
                  • key (str) –
                  • bucket (str) – (Default value = S3_ANALYSIS_STORE)
                  • gzip (bool) – (Default value = True)
           Return type io.StringIO
      Returns:
bolster.utils.aws.check_s3(key, bucket, client=None)
    https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3
           Parameters
                  • key (str) –
                  • bucket (str) – (Default value = S3_ANALYSIS_STORE)
           Return type bool
      Returns:
bolster.utils.aws.get_matching_s3_objects(bucket, prefix='', suffix='', client=None)
    Generate objects in an S3 bucket.
      https://alexwlchan.net/2018/01/listing-s3-keys-redux/
           Parameters
                  • bucket (AnyStr) – Name of the S3 bucket.
                  • prefix – Only fetch objects whose key starts with this prefix (optional).
                  • suffix – Only fetch objects whose keys end with this suffix (optional).
           Return type Iterator
bolster.utils.aws.get_matching_s3_keys(bucket, **kwargs)
    Generate the keys in an S3 bucket. https://alexwlchan.net/2018/01/listing-s3-keys-redux/
           Parameters
                  • bucket (AnyStr) – Name of the S3 bucket.
                  • prefix – Only fetch keys that start with this prefix (optional).
                  • suffix – Only fetch keys that end with this suffix (optional).
           Return type Iterator
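
The paginated listing pattern behind these two generators can be sketched as follows. This is an illustration under assumptions, not the package's code: the function names are hypothetical, and the client is passed in explicitly as the first argument (unlike the documented signatures) so the sketch can be exercised with any object exposing a boto3-style list_objects_v2 method.

```python
from typing import Any, Dict, Iterator


def matching_s3_objects(client: Any, bucket: str, prefix: str = "", suffix: str = "") -> Iterator[Dict]:
    """Yield object records whose key starts with `prefix` and ends with `suffix`."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        # Each call returns one page of results
        response = client.list_objects_v2(**kwargs)
        for obj in response.get("Contents", []):
            if obj["Key"].endswith(suffix):
                yield obj
        # A continuation token is only present when more pages remain
        token = response.get("NextContinuationToken")
        if token is None:
            break
        kwargs["ContinuationToken"] = token


def matching_s3_keys(client: Any, bucket: str, **kwargs) -> Iterator[str]:
    """Yield only the key of each matching object."""
    for obj in matching_s3_objects(client, bucket, **kwargs):
        yield obj["Key"]
```

The prefix is filtered server-side by S3, while the suffix has to be checked locally on each returned key.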
bolster.utils.aws.select_from_csv(bucket, key, fields, client=None)

           Return type List
bolster.utils.aws.get_latest_key(prefix, bucket, key=None, client=None)
    Walk a given S3 bucket for the lexicographically highest item in the given bucket (defaults to the analysis store
    defined in utils.env)
      Accepts a key callable that can be used to decide how the candidate keys are sorted.
      For example, to use loose-versioning, distutils.version.LooseVersion can be passed as the key argument
           Parameters
                  • prefix (str) –
                  • bucket (str) – (Default value = S3_ANALYSIS_STORE)
                  • key (Optional[Callable]) – (Default value = None)
           Return type str
      Returns:
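
To show why the key callable matters, the sketch below compares plain lexicographic ordering with a numeric ordering on a list of candidate keys. It works on an in-memory list rather than S3, and latest_key and version_tuple are hypothetical helpers; version_tuple stands in for a loose-versioning comparator such as the one mentioned above.

```python
import re
from typing import Callable, Iterable, Optional, Tuple


def latest_key(keys: Iterable[str], key: Optional[Callable] = None) -> str:
    """Return the highest candidate, ordered by `key` (plain string order if None)."""
    return max(keys, key=key)


def version_tuple(s3_key: str) -> Tuple[int, ...]:
    """Hypothetical sort key: compare the numeric parts of the name as integers."""
    return tuple(int(part) for part in re.findall(r"\d+", s3_key))


candidates = ["models/v1.9.csv", "models/v1.10.csv", "models/v1.2.csv"]

lexicographic = latest_key(candidates)                 # "models/v1.9.csv"
versioned = latest_key(candidates, key=version_tuple)  # "models/v1.10.csv"
```

Plain string comparison ranks "9" above "10", which is why a version-aware key is useful for versioned artefacts.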
bolster.utils.aws.get_sqs_client()
bolster.utils.aws.send_to_sqs(records, queue, chunksize=1, client=None)
    Send records in chunks of chunksize for a given sqs queue in json-serialised format
           Parameters
                  • records (Iterator) –
                  • queue (str) –
                  • chunksize (int) – (Default value = 1)
           Return type None
      Returns:
bolster.utils.aws._ssm_params
bolster.utils.aws.get_ssm_client()
bolster.utils.aws.get_ssm_param(param_name, client=None)
    Locally memoized getter for configuration parameters stored in the AWS “Simple Systems Manager” (now just
    systems manager) Parameter Store
           Parameters
                  • param_name (str) –
           Return type str
      Returns:
bolster.utils.aws.fh_json_decode(content)
    Customised JSON Decoder for consuming Firehose batched records.
      Firehose doesn’t include entry separators between entries, so we intercept the raw_decoder on JSONDecodeError
      and ‘skip’ over the ‘where is my comma?’ issue and continue to parse the rest of the content until we reach
      the end of the given content string.
           Parameters content (AnyStr) –
           Return type Iterator[Union[Dict, List]]
      Returns:

      >>> list(fh_json_decode('{"test":"value"}{"test":"othervalue"}'))
      [{'test': 'value'}, {'test': 'othervalue'}]
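
One way to parse back-to-back JSON documents is sketched below. Note that it takes a slightly different route from the one described above: rather than catching JSONDecodeError, it calls json.JSONDecoder.raw_decode repeatedly, which reports where each document ends. The function name is hypothetical.

```python
import json
from typing import Iterator, Union


def decode_concatenated_json(content: str) -> Iterator[Union[dict, list]]:
    """Yield each JSON document found in a string with no separators between them."""
    decoder = json.JSONDecoder()
    position = 0
    end = len(content)
    while position < end:
        # raw_decode does not skip leading whitespace, so do it here
        while position < end and content[position].isspace():
            position += 1
        if position >= end:
            break
        # raw_decode returns the parsed object and the index just past it
        obj, position = decoder.raw_decode(content, position)
        yield obj


records = list(decode_concatenated_json('{"test":"value"}{"test":"othervalue"}'))
```

Malformed input still raises json.JSONDecodeError, so callers that need to tolerate bad records would have to add their own handling.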

bolster.utils.aws.decapsulate_kinesis_payloads(event)
    Decapsulate base64 encoded kinesis data records to a list
           Parameters event (Dict) –
           Return type List[Dict]
      Returns:
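
As an illustration of the decapsulation step, the sketch below decodes the standard shape of a Kinesis event delivered to Lambda, where each record carries its payload base64 encoded under record["kinesis"]["data"]. It assumes every payload is a single JSON document; the function name is hypothetical and this is not the package's implementation.

```python
import base64
import json
from typing import Dict, List


def decode_kinesis_records(event: Dict) -> List[Dict]:
    """Base64-decode and JSON-parse the payload of every record in the event."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event["Records"]
    ]


# Build a small example event in the same shape
payload = base64.b64encode(json.dumps({"id": 1}).encode("utf-8")).decode("ascii")
event = {"Records": [{"kinesis": {"data": payload}}]}

decoded = decode_kinesis_records(event)
```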
bolster.utils.aws.iterate_kinesis_payloads(event)
    Iterate over a base64 encoded kinesis data record, yielding entries
           Parameters
                  • event (Dict) –
           Return type Generator[Dict, None, None]
      Returns:
class bolster.utils.aws.KinesisLoader(batch_size=500, maximum_records=None, stream=None)
    Bases: object
      Kinesis batchwise insertion handler with chunking and retry
      generate_and_submit(self, items, partition_key=None)
          Submit batches of items to the configured stream
                 Parameters
                      • items (Iterator) –
                      • partition_key (str) – (Default value = None)
                 Return type SupportsInt
           Returns:
      submit_batch_until_successful(self, this_batch, response)
          If needed, retry a batch of records, backing off exponentially until it goes through
                 Parameters
                      • this_batch (List) –
                      • response (Dict) –
           Returns:
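
The retry-with-exponential-backoff idea can be sketched generically as below. This is not the KinesisLoader code: submit is a hypothetical callable that sends a batch and returns the items that failed, and base_delay, max_attempts and the injectable sleep are assumptions made so the sketch is easy to test.

```python
import time
from typing import Callable, List, Sequence


def submit_until_successful(
    batch: Sequence,
    submit: Callable[[List], List],
    base_delay: float = 0.1,
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Resubmit failed items, doubling the wait between attempts.

    Returns the number of attempts used, or raises if items still fail
    after `max_attempts`.
    """
    pending = list(batch)
    for attempt in range(max_attempts):
        if attempt:
            # Wait base_delay, then 2x, 4x, ... before each retry
            sleep(base_delay * 2 ** (attempt - 1))
        pending = submit(pending)
        if not pending:
            return attempt + 1
    raise RuntimeError(f"{len(pending)} items still failing after {max_attempts} attempts")
```

Only the failed items are resubmitted on each round, which keeps retries small when most of a batch went through.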

bolster.utils.aws.send_to_kinesis(records, stream, partition_key=None)
    Accessory function for the KinesisLoader class
           Parameters
                 • records (Iterator[Sequence]) –
                 • stream (str) –
                 • partition_key (str) – (Default value = None)
           Return type int
     Returns:
bolster.utils.aws.get_sns_client()
bolster.utils.aws.invoke_self_async(event, context)
    Have the Lambda invoke itself asynchronously, passing the same event it received originally, and tagging the
    event as ‘async’ so it’s actually processed
     THIS DOES NOT WORK FROM WITHIN A VPC! (There is no lambda-invoke endpoint accessible without
     poking lots of holes in the VPC.)
           Parameters
                 • event (Dict) –
                 • context (Any) –
     Returns:
bolster.utils.aws.query(q, redshift_conn_dict, named_cursor='bolster_query_cursor', **kwargs)
    Helper for making queries to redshift (or any postgres compatible backend). The redshift_conn_dict argument
    takes a dictionary of connection settings such as:

     {
         "user":"USERNAME",
         "host":"HOSTNAME",
         "connect_timeout":3,
         "dbname":"DATABASE",
         "port":5439,
         "password":"SUPERSECRETPASSWORD1111"
     }

     This function implements the ‘is_local’ check if it is getting its configuration dictionary from the parameter
     store, and will overwrite the ‘host’ in the store with a resolvable hostname for the ALDS datastore.
     Basically, if you’re not working on ALDS, in a few very specific locations, or are outside the ALDS VPC, give
     this a sensible dictionary.
     kwargs are passed through as vars to the SQL execution, i.e. to be used with substitution queries, e.g.:

     query("select * from table where id = %(my_id)s", my_id = 14228)

     NOTE! If you use % wildcards (i.e. LIKE ‘%string’), you’re gonna have a bad time... (Use the POSIX regex
     instead: https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions-posix.html)
           Parameters
                 • q (str) –
                 • redshift_conn_dict (dict) – (Default value = None)
                 • **kwargs –
           Return type Iterator[Dict]
      Returns:
bolster.utils.aws.SQSWrapper(event, context, queuename, function, timeout=60000, reinvokelimit=10, maxmessages=1,
                             raise_exceptions=True, deduplicate=False, fkwargs={}, client=None)

Submodules

bolster.utils.deco

Module Contents

Functions

 timed(func)
     This decorator prints the execution time for the decorated function.

bolster.utils.deco.timed(func)
    This decorator prints the execution time for the decorated function.
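
A timing decorator of this kind can be sketched as follows. The output format and the use of time.perf_counter are assumptions made for illustration; the package's own decorator may report the timing differently.

```python
import functools
import time


def timed(func):
    """Print how long each call to `func` takes, then return its result."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned or raised
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f}s")

    return wrapper


@timed
def add(a, b):
    return a + b


result = add(1, 2)
```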

bolster.utils.dt

Module Contents

Functions

 round_to_week(dt)
     Return a date for the Monday before the given date
 round_to_month(dt)
     Return a date for the first day of the month of a given date
 utc_midnight_on(dt)
     Some services don’t like timezones, so this helper function converts datetime.date and datetime.datetime
     objects to a datetime.datetime object corresponding to UTC Midnight on that date.

bolster.utils.dt.round_to_week(dt)
    Return a date for the Monday before the given date
           Parameters
                  • dt (Union[datetime.datetime, datetime.date]) –
           Return type datetime.date
      Returns:

      >>> round_to_week(datetime(2018,8,9,12,1))
      datetime.date(2018, 8, 6)
      >>> round_to_week(date(2018,8,9))
      datetime.date(2018, 8, 6)
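
A minimal sketch of how this rounding can be done is shown below, using date.weekday() (Monday is 0). It matches the documented examples, and it assumes that a date which already falls on a Monday is returned unchanged; the package's own behaviour in that case is not stated above.

```python
from datetime import date, datetime, timedelta
from typing import Union


def round_to_week(dt: Union[datetime, date]) -> date:
    """Return the Monday of the week containing `dt`."""
    # datetime is a subclass of date, so check for it first
    day = dt.date() if isinstance(dt, datetime) else dt
    return day - timedelta(days=day.weekday())


from_datetime = round_to_week(datetime(2018, 8, 9, 12, 1))  # date(2018, 8, 6)
from_date = round_to_week(date(2018, 8, 9))                 # date(2018, 8, 6)
```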

bolster.utils.dt.round_to_month(dt)
    Return a date for the first day of the month of a given date
           Parameters
                  • dt (Union[datetime.datetime, datetime.date]) –
           Return type datetime.date
      Returns:

      >>> round_to_month(datetime(2018,8,9,12,1))
      datetime.date(2018, 8, 1)
      >>> round_to_month(date(2018,8,9))
      datetime.date(2018, 8, 1)
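The two rounding helpers above can be approximated with the standard library alone. This is a sketch under assumptions: `week_start`, `month_start`, and `_as_date` are hypothetical names, and mapping a Monday to itself is this sketch's choice, since the documentation does not state how the library treats that case.

```python
from datetime import date, datetime, timedelta
from typing import Union

DateLike = Union[datetime, date]


def _as_date(dt: DateLike) -> date:
    # datetime is a subclass of date, so check for it first.
    return dt.date() if isinstance(dt, datetime) else dt


def week_start(dt: DateLike) -> date:
    """Monday of the week containing dt (weekday() is 0 for Monday)."""
    d = _as_date(dt)
    return d - timedelta(days=d.weekday())


def month_start(dt: DateLike) -> date:
    """First day of the month containing dt."""
    return _as_date(dt).replace(day=1)
```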

bolster.utils.dt.utc_midnight_on(dt)
    Some services don’t like timezones, so this helper function converts datetime.date and datetime.datetime objects
    to a datetime.datetime object corresponding to UTC midnight on that date.
      It pays attention only to the calendar date of the input, regardless of whether the combination of the given time
      and timezone would roll over into another date.
           Parameters dt (datetime.datetime) –
           Return type datetime.datetime

      >>> utc_midnight_on(datetime(2018,9,1,12,12))
      datetime.datetime(2018, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)

      >>> utc_midnight_on(datetime(2018,9,1,12,12, tzinfo=timezone(timedelta(hours=-13))))
      datetime.datetime(2018, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
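The "calendar date wins" behaviour described above might be sketched as follows. `utc_midnight_sketch` is a hypothetical name, and the approach (discarding the input's tzinfo instead of converting it) is an assumption inferred from the examples.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Union


def utc_midnight_sketch(dt: Union[datetime, date]) -> datetime:
    # Take the calendar date exactly as written, ignoring any tzinfo on the input.
    d = dt.date() if isinstance(dt, datetime) else dt
    # Attach UTC to 00:00 on that date without shifting the clock.
    return datetime.combine(d, time.min, tzinfo=timezone.utc)
```

Converting with `astimezone` first would move the second documented example to a different date, which is why the sketch never converts.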

bolster.utils.web

Module Contents

Functions

 download_extract_zip(url)                                  Download a ZIP file and extract its contents in memory

bolster.utils.web.download_extract_zip(url)
    Download a ZIP file and extract its contents in memory. Yields (filename, file-like object) pairs.
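The in-memory extraction step can be illustrated without any network access. In this sketch the archive bytes are assumed to be already downloaded, and `iter_zip_members` is a hypothetical helper, not the library function.

```python
import io
import zipfile
from typing import BinaryIO, Iterator, Tuple


def iter_zip_members(payload: bytes) -> Iterator[Tuple[str, BinaryIO]]:
    """Yield (filename, file-like object) for each member of an in-memory ZIP."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for name in archive.namelist():
            # Copy each member into its own buffer so it stays readable
            # after the archive is closed.
            yield name, io.BytesIO(archive.read(name))


# Build a small archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello world")

members = {name: fh.read() for name, fh in iter_zip_members(buffer.getvalue())}
```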


7.1.2 Submodules

bolster.cli

Console script for bolster.

Module Contents

Functions

 main(args=None)                                       Console script for bolster.

bolster.cli.main(args=None)
    Console script for bolster.

7.1.3 Package Contents

Classes

 memoize                                               cache the return value of a method

Functions

 _dumb_passthrough(x, **kwargs)                        Pointless passthrough replacement for tqdm (and
                                                       similar) fallback
 always(x, **kwargs)                                   Pointless passthrough replacement for ‘always true’
                                                       filtering
 poolmap(f, iterable, max_workers = None, progress =   Helper function to encapsulate a ThreadPoolExecutor
 None, **kwargs)                                       mapped function workflow
 batch(seq, n = 1)                                     Split a sequence into n-length batches (is still iterable,
                                                       not list)
 chunks(iterable, size=10)                             Outputs chunks of size N from an iterable (generator)
 arg_exception_logger(func)                            Helper Decorator to provide info on the arguments that
                                                       cause the exception of a wrapped function
 backoff(exception_to_check, tries = 5, delay = 0.2,   Retry calling the decorated function using an
 backoff = 2, logger = logger)                         exponential backoff.
 tag_gen(seq, **kwargs)                                Generator stream that adds the given kwargs to each
                                                       entry yielded
 exceptional_executor(futures,                         Generator for concurrent.futures handling
 exception_handler=None, timeout=None)
 working_directory(path)                               Contextmanager that changes working directory and
                                                       returns to previous on exit.
 compress_for_relay(obj)                               Compress json-serializable object to a gzipped base64
                                                       string
 decompress_from_relay(msg)                             Uncompress gzipped base64 string to a json-serializable
                                                        object
 pretty_print_request(req, expose_auth=False,           At this point it is completely built and ready to be
 authentication_header_blacklist = None)                fired; it is “prepared”.
 get_recursively(search_dict, field)                    Takes a dict with nested lists and dicts, and searches all
                                                        dicts for a key of the field provided.
 transform_(r, rule_keys)                               Generic item-wise transformation function
 diff(new, old, excluded_fields = None)                 Perform a one-depth diff of a pair of dictionaries
 aggregate(base, group_key, item_key,                   Abstracted groupby-sum for lists of dicts
 condition = None)
 breadth(d)                                             Get the total ‘width’ of a tree
 depth(d)                                               Get the maximum depth of a tree
 set_keys(d)                                            Extract the set of all keys of a nested dict/tree
 keys_at(d, n, i = 0)                                   Extract the keys of a tree at a given depth
 items_at(d, n, i = 0)                                  Extract the elements from a tree at a given depth
 leaves(d)                                              Iterate on the leaves of a tree
 leaf_paths(d, path = None)
 flatten_dict(d, head = '', sep = ':')
 uncollect_object(d)
 dict_concat_safe(d, keys, default = None)              Really Lazy Func because dict.get(‘key’,default) is a
                                                        pain in the ass for lists

Attributes

 __author__

 __email__

 __version__

 logger

bolster.__author__ = Andrew Bolster
bolster.__email__ = me@andrewbolster.info
bolster.__version__ = 0.1.1
bolster.logger
bolster._dumb_passthrough(x, **kwargs)
    Pointless passthrough replacement for tqdm (and similar) fallback
           Parameters x –
bolster.always(x, **kwargs)
    Pointless passthrough replacement for ‘always true’ filtering

      >>> always('false')
      True
      >>> always(False)
      True
      >>> always(True)
      True

            Return type bool

bolster.poolmap(f, iterable, max_workers=None, progress=None, **kwargs)
    Helper function to encapsulate a ThreadPoolExecutor mapped function workflow. Accepts a progress monitor
    callback (assumed to be tqdm-style).
      kwargs are passed identically to every f(i) call, for each i in iterable.
            Parameters
                   • f (Callable) – function to map across
                   • iterable (Iterable) –
                   • max_workers (Optional[int]) – (Default value = None)
                   • progress (Callable) – (Default value = None)
                   • **kwargs – passed as arguments to f
            Return type Dict
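The workflow this helper wraps might look roughly like the sketch below. `pool_map_sketch` is a hypothetical name, and keying the returned dict by input item is an assumption: the documentation states only that the return type is a Dict.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Hashable, Iterable, Optional


def pool_map_sketch(
    f: Callable,
    iterable: Iterable[Hashable],
    max_workers: Optional[int] = None,
    progress: Optional[Callable] = None,
    **kwargs: Any,
) -> Dict:
    # Fall back to a no-op wrapper when no progress callback is supplied.
    wrap = progress if progress is not None else (lambda it, **_: it)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input each future belongs to.
        futures = {executor.submit(f, item, **kwargs): item for item in iterable}
        results = {}
        for future in wrap(as_completed(futures), total=len(futures)):
            results[futures[future]] = future.result()
    return results


def scale(x, factor=1):
    return x * factor


doubled = pool_map_sketch(scale, [1, 2, 3], max_workers=2, factor=2)
```

Passing `tqdm` as `progress` would fit this shape, since `tqdm(iterable, total=...)` returns an iterable wrapper.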
bolster.batch(seq, n=1)
    Split a sequence into n-length batches (is still iterable, not list)
            Parameters
                   • seq (Sequence) –
                   • n (int) – (Default value = 1)
            Return type Generator[Iterable, None, None]

      >>> next((b for b in batch(range(10), 2)))
      range(0, 2)
      >>> [b for b in batch(list(range(10)), 2)]
      [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
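The examples above are consistent with plain slicing, which is why a range input yields ranges and a list input yields lists. A sketch under that assumption, with `batch_sketch` as a hypothetical name:

```python
from typing import Iterator, Sequence


def batch_sketch(seq: Sequence, n: int = 1) -> Iterator[Sequence]:
    # Slicing keeps the input's own type: a range yields ranges, a list yields lists.
    for start in range(0, len(seq), n):
        yield seq[start:start + n]
```

Because it relies on `len` and slicing, this approach needs a true sequence and would not work on a one-shot generator.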

bolster.chunks(iterable, size=10)
    Outputs chunks of size N from an iterable (generator)
            Parameters
                   • iterable (Iterable) –
                   • size – (Default value = 10)
            Return type Generator[List, None, None]

      >>> next((b for b in chunks(range(10), 2)))
      [0, 1]
      >>> [b for b in chunks(list(range(10)), 2)]
      [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
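Unlike slicing, chunking an arbitrary iterable has to consume it item by item. One common way to do that is `itertools.islice`, sketched here with `chunks_sketch` as a hypothetical name; whether the library uses this approach is not stated.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks_sketch(iterable: Iterable, size: int = 10) -> Iterator[List]:
    iterator = iter(iterable)
    while True:
        # islice pulls at most `size` items from the shared iterator.
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # input exhausted
        yield chunk
```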

bolster.arg_exception_logger(func)
    Helper Decorator to provide info on the arguments that cause the exception of a wrapped function
             Parameters func (Callable) –
             Return type Callable
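A decorator of this shape could be sketched as below. `log_args_on_exception` and the logger name are hypothetical, and re-raising after logging is an assumption about the intended behaviour.

```python
import functools
import logging

sketch_logger = logging.getLogger("arg_exception_sketch")


def log_args_on_exception(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Record the offending arguments with the traceback, then re-raise.
            sketch_logger.exception(
                "%s failed with args=%r kwargs=%r", func.__name__, args, kwargs
            )
            raise

    return wrapper


@log_args_on_exception
def reciprocal(x):
    return 1 / x
```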
bolster.backoff(exception_to_check, tries=5, delay=0.2, backoff=2, logger=logger)
    Retry calling the decorated function using an exponential backoff.
      Source: http://www.saltycrane.com/blog/2009/11/trying-out-retry-decorator-python/ (original from:
      http://wiki.python.org/moin/PythonDecoratorLibrary#Retry)
             Parameters
                   • exception_to_check (Union[BaseException, Sequence[BaseException]]) – the exception to check;
                     may be a tuple of exceptions to check
                   • tries (SupportsInt) – number of times to try (not retry) before giving up (Default value = 5)
                   • delay (SupportsFloat) – initial delay between retries in seconds (Default value = 0.2)
                   • backoff (SupportsFloat) – backoff multiplier, e.g. a value of 2 will double the delay each retry
                     (Default value = 2)
                   • logger (Optional[logging.Logger]) – logger to use; if None, print (Default value = local utils
                     logger)
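The retry pattern behind this decorator can be sketched in a few lines. `backoff_sketch` is a hypothetical name; the sketch omits the logger parameter and simply prints, so it illustrates the technique and is not the library's code.

```python
import functools
import time


def backoff_sketch(exception_to_check, tries=5, delay=0.2, backoff=2):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exception_to_check as exc:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    print(f"{exc!r}: retrying in {wait} seconds")
                    time.sleep(wait)
                    wait *= backoff  # grow the delay geometrically

        return wrapper

    return decorator


calls = []


@backoff_sketch(ValueError, tries=3, delay=0.01)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("not yet")
    return "ok"


result = flaky()
```

With `tries=3` the function is attempted at most three times in total, matching the "try (not retry)" wording of the parameter description.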
exception bolster.MultipleErrors(errors=None)
    Bases: BaseException
      Exception Class to enable the capturing of multiple exceptions without interrupting control flow, i.e. catch the
      exception, but carry on and report the exceptions at the end.
      E.g.

      exceptions = MultipleErrors()
      try:
           do_risky_thing_with(this) #raises ValueError
      except:
           exceptions.capture_current_exception()
      try:
           do_other_thing_with(this) #raises AttributeError
      except:
           exceptions.capture_current_exception()
      exceptions.do_raise()

      Traceback (most recent call last):
          ...
      ValueError

      Traceback (most recent call last):
          ...
      AttributeError

      classmethod _traceback_for(cls, exc_info)
          Formatting!
      __str__(self)
          Return str(self).
      capture_current_exception(self)
          Gathers exception info from the current context and retains it
      do_raise(self)
          Raises itself if it contains any errors
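The collect-then-raise pattern described for this class can be sketched with `sys.exc_info()` and the `traceback` module. `ErrorCollector` is a hypothetical stand-in written for illustration, not the library's MultipleErrors.

```python
import sys
import traceback


class ErrorCollector(Exception):
    """Hypothetical collector; not the library's MultipleErrors."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def capture_current_exception(self):
        # Must be called inside an except block, where exc_info() is populated.
        self.errors.append(sys.exc_info())

    def __str__(self):
        return "\n".join(
            "".join(traceback.format_exception(*info)) for info in self.errors
        )

    def do_raise(self):
        # Raise only if something was actually captured.
        if self.errors:
            raise self


collected = ErrorCollector()
for text in ("1", "x", "2", "y"):
    try:
        int(text)
    except ValueError:
        collected.capture_current_exception()
```

Calling `collected.do_raise()` after the loop raises once, with every captured traceback included in the message.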
bolster.tag_gen(seq, **kwargs)
    Generator stream that adds the given kwargs to each entry yielded
     The example below creates a generator of empty dicts and uses tag_gen to insert a new key/value pair
     (k=1) into each item on the fly

     >>> all([i['k'] == 1 for i in tag_gen(({} for _ in range(4)), k=1)])
     True

           Parameters
                 • seq (Iterator[Dict]) –
                 • **kwargs –
           Return type Iterator[Dict]

bolster.exceptional_executor(futures, exception_handler=None, timeout=None)
    Generator for concurrent.futures handling
     When an exception is raised in an executing Future, f.result() called on its own will raise that exception in the
     parent thread, killing execution and causing loss of ‘future local’ scope.
     Instead, query the future for its exception state first, and handle that separately, by default by logging it as an
     exception.
           Parameters
                 • futures (Sequence[concurrent.futures.Future]) –
                 • exception_handler –
                 • timeout –
           Return type Iterator
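The "check the exception state before asking for the result" idea can be sketched as follows. `results_skipping_failures` is a hypothetical name, and iterating with `as_completed` is an assumption about ordering that the documentation does not confirm.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed


def results_skipping_failures(futures, exception_handler=None, timeout=None):
    handler = exception_handler or (
        lambda exc: logging.getLogger(__name__).error("future failed: %r", exc)
    )
    for future in as_completed(futures, timeout=timeout):
        # exception() returns None on success instead of raising.
        exc = future.exception()
        if exc is not None:
            handler(exc)
            continue
        yield future.result()


def parse(text):
    return int(text)


failures = []
with ThreadPoolExecutor(max_workers=2) as pool:
    submitted = [pool.submit(parse, text) for text in ("1", "oops", "3")]
    parsed = sorted(results_skipping_failures(submitted, failures.append))
```

The failing future is routed to the handler while the remaining results keep flowing, so one bad input does not stop the loop.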
bolster.working_directory(path)
    Contextmanager that changes working directory and returns to previous on exit.

           Parameters path (Union[str, pathlib.Path]) –
           Return type contextlib.AbstractContextManager
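A context manager with this behaviour is commonly written with `contextlib.contextmanager`. The sketch below uses the hypothetical name `working_directory_sketch` and shows the general technique, which may differ from the library's code.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory_sketch(path):
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the body raised.
        os.chdir(previous)


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory_sketch(tmp):
        inside = os.getcwd()
after = os.getcwd()
```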
bolster.compress_for_relay(obj)
    Compress json-serializable object to a gzipped base64 string
           Parameters obj (Union[List, Dict]) –
           Return type AnyStr

      >>> decompress_from_relay(compress_for_relay(['test']))
      ['test']

      >>> decompress_from_relay(compress_for_relay({'test':'test'}))
      {'test': 'test'}

bolster.decompress_from_relay(msg)
    Uncompress gzipped base64 string to a json-serializable object
           Parameters msg (AnyStr) –
           Return type Union[List, Dict]
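The round trip these two functions describe (JSON, then gzip, then base64, and back) can be sketched with the standard library. The names `compress_sketch` and `decompress_sketch` are hypothetical, and details such as the text encoding are assumptions.

```python
import base64
import gzip
import json


def compress_sketch(obj) -> str:
    # Serialise, compress, then make the bytes safe for text transports.
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_sketch(msg: str):
    # Reverse the three steps in the opposite order.
    raw = gzip.decompress(base64.b64decode(msg))
    return json.loads(raw.decode("utf-8"))
```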
class bolster.memoize(func)
    Bases: object
      cache the return value of a method
      This class is meant to be used as a decorator of methods. The return value from a given method invocation
      will be cached on the instance whose method was invoked. All arguments passed to a method decorated with
      memoize must be hashable.
      If a memoized method is invoked directly on its class the result will not be cached. Instead the method will be
      invoked like a static method:

      class Obj(object):
          @memoize
          def add_to(self, arg):
              return self + arg

      Obj.add_to(1) # not enough arguments
      Obj.add_to(1, 2) # returns 3, result is not cached

      Source: http://code.activestate.com/recipes/577452-a-memoize-decorator-for-instance-methods/
      Augmented with cache hit/miss population Counters
      __get__(self, obj, objtype=None)
      __call__(self, *args, **kw)
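The descriptor mechanics behind a per-instance method cache can be sketched as below. `memoize_method`, the `_memo_cache` attribute, and the `stats` counter are hypothetical names chosen for illustration; the library's class may store its cache and counters differently.

```python
import functools
from collections import Counter


class memoize_method:
    """Hypothetical per-instance method cache, modelled on the description above."""

    def __init__(self, func):
        self.func = func
        self.stats = Counter()  # counts cache hits and misses

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self.func  # accessed on the class: no caching
        return functools.partial(self, obj)

    def __call__(self, obj, *args, **kw):
        # The cache lives on the instance, so it is discarded with it.
        cache = obj.__dict__.setdefault("_memo_cache", {})
        key = (self.func, args, frozenset(kw.items()))
        if key in cache:
            self.stats["hit"] += 1
        else:
            self.stats["miss"] += 1
            cache[key] = self.func(obj, *args, **kw)
        return cache[key]


class Adder:
    def __init__(self, base):
        self.base = base

    @memoize_method
    def add_to(self, arg):
        return self.base + arg


adder = Adder(1)
first, second = adder.add_to(2), adder.add_to(2)
```

Building the key from `args` and `kw` is what makes hashable arguments a requirement.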
bolster.pretty_print_request(req, expose_auth=False, authentication_header_blacklist=None)
    Pretty-print a request. At this point it is completely built and ready to be fired; it is “prepared”.
      Note, however, that the formatting used by this function is designed for pretty printing and may differ from
      the actual request.

            Parameters
                   • req –
                   • expose_auth – (Default value = False)
                   • authentication_header_blacklist (Optional[List]) –
            Return type None
bolster.get_recursively(search_dict, field)
    Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.
      Originally taken from https://stackoverflow.com/a/20254842
            Parameters
                   • search_dict (Dict) –
                   • field (str) –
            Return type List

      >>> get_recursively({'id' : 5,'children' : {'id' : 6,'children' : {'id' : 7,'children' : {}}}}, 'id')
      [5, 6, 7]
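A recursive search of this kind can be sketched in a few lines. `find_values` is a hypothetical name, and the choice not to search inside a matched value is this sketch's own and may differ from the library.

```python
from typing import Any, List


def find_values(node: Any, field: str) -> List[Any]:
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == field:
                found.append(value)  # a matched value is not searched further
            else:
                found.extend(find_values(value, field))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, field))
    return found
```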

bolster.transform_(r, rule_keys)
    Generic item-wise transformation function. The values in r are updated based on key-matching in rule_keys, i.e.
    out[k] = rule_keys[k](r[k])
      However, this can do more than straight callable mapping; it can also update the key. For a given rule
      R = rule_keys[k]:
      R can be used to select a field to be included in the output

      >>> r = {'a':'1','b':'2','c':'3'}
      >>> transform_(r, {'a':None})
      {'a': '1'}

      Rename a key

      >>> transform_(r, {'a':('A',None)})
      {'A': '1'}

      Apply a function to a key’s value

      >>> transform_(r, {'a':('a',int)})
      {'a': 1}

      Or a combination of these

      >>> transform_(r, {'a':('A',int), 'b':None})
      {'A': 1, 'b': '2'}
            Parameters
                   • r (Dict) –
                   • rule_keys (Dict[AnyStr, Optional[Tuple]]) –
            Return type Dict
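The rule forms shown in the examples (None, or a (new_key, function) tuple) can be handled by a short loop. `transform_sketch` is a hypothetical name, and the sketch covers only those two forms, which may be a subset of what the library accepts.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Rule = Optional[Tuple[str, Optional[Callable[[Any], Any]]]]


def transform_sketch(r: Dict[str, Any], rule_keys: Dict[str, Rule]) -> Dict[str, Any]:
    out = {}
    for key, rule in rule_keys.items():
        if rule is None:
            out[key] = r[key]  # select the field unchanged
        else:
            new_key, func = rule
            # Rename the key, and convert the value when a function is given.
            out[new_key] = r[key] if func is None else func(r[key])
    return out
```

Keys of r that have no rule are dropped, which matches the examples where 'c' never appears in the output.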
bolster.diff(new, old, excluded_fields=None)
    Perform a one-depth diff of a pair of dictionaries
            Parameters
                   • new (Dict) –
                   • old (Dict) –
                   • excluded_fields (Optional[set]) –
            Return type Dict

bolster.aggregate(base, group_key, item_key, condition=None)
    Abstracted groupby-sum for lists of dicts; operationally equivalent to

      df = pd.DataFrame(base)
      df.where(condition).groupby(group_key)[item_key].sum()
            Parameters
                  • base (List[Dict]) –
                  • group_key (Union[AnyStr, Tuple[AnyStr], List[AnyStr]]) –
                  • item_key (AnyStr) –
                  • condition (Optional[Callable]) –
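A groupby-sum over a list of dicts needs no pandas. The sketch below uses the hypothetical name `aggregate_sketch`, supports only a single group key, and assumes the condition is a per-row predicate; the library's return shape and multi-key handling are not documented here.

```python
from collections import defaultdict


def aggregate_sketch(base, group_key, item_key, condition=None):
    totals = defaultdict(int)
    for row in base:
        if condition is not None and not condition(row):
            continue  # skip rows that fail the filter
        totals[row[group_key]] += row[item_key]
    return dict(totals)


orders = [
    {"region": "north", "units": 3},
    {"region": "south", "units": 5},
    {"region": "north", "units": 4},
    {"region": "south", "units": -1},
]
totals = aggregate_sketch(orders, "region", "units", lambda row: row["units"] > 0)
```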
bolster.breadth(d)
    Get the total ‘width’ of a tree
      > Why was this a thing? No idea
bolster.depth(d)
    Get the maximum depth of a tree
            Parameters d (Dict) –
            Return type SupportsInt
bolster.set_keys(d)
    Extract the set of all keys of a nested dict/tree
            Parameters d (Dict) –
            Return type Set
bolster.keys_at(d, n, i=0)
    Extract the keys of a tree at a given depth
            Parameters
                  • d (Dict) –
                  • n (SupportsInt) –
                  • i (SupportsInt) –
            Return type Iterator
bolster.items_at(d, n, i=0)
    Extract the elements from a tree at a given depth
            Parameters
                  • d (Dict) –
                  • n (SupportsInt) –
                  • i (SupportsInt) –
            Return type Iterator[Tuple]
bolster.leaves(d)
    Iterate on the leaves of a tree
            Parameters d (Dict) –
            Return type Iterator
bolster.leaf_paths(d, path=None)

           Parameters
                  • d (Dict) –
                  • path (Optional[List]) –
           Return type Iterator[Tuple[List, Dict]]
bolster.flatten_dict(d, head='', sep=':')
           Parameters
                  • d (Dict) –
                  • head (AnyStr) –
                  • sep (AnyStr) –
           Return type Dict
bolster.uncollect_object(d)
           Parameters d (Dict) –
           Return type Dict
bolster.dict_concat_safe(d, keys, default=None)
    Really Lazy Func because dict.get(‘key’,default) is a pain in the ass for lists
           Parameters
                  • d (Dict) –
                  • keys (List[Hashable]) –
                  • default (Optional) –
           Return type Iterator

CHAPTER EIGHT

INDICES AND TABLES

• genindex
• modindex
• search

PYTHON MODULE INDEX

b
bolster
bolster.cli
bolster.data_sources
bolster.data_sources.companies_house
bolster.stats
bolster.stats.distributions
bolster.utils
bolster.utils.aws
bolster.utils.deco
bolster.utils.dt
bolster.utils.web