Bolster Documentation - Release 0.1.1 Andrew Bolster
Bolster Documentation Release 0.1.1 Andrew Bolster Aug 20, 2021
CONTENTS:

1 Bolster
    1.1 Features
    1.2 Credits
2 Installation
    2.1 Stable release
    2.2 From sources
3 Usage
4 Contributing
    4.1 Types of Contributions
    4.2 Get Started!
    4.3 Pull Request Guidelines
    4.4 Tips
    4.5 Deploying
5 Credits
    5.1 Development Lead
    5.2 Contributors
6 History
    6.1 0.1.0 (2021-03-11)
    6.2 0.1.1 (2021-05-18)
7 API Reference
    7.1 bolster
8 Indices and tables
Python Module Index
Index
CHAPTER ONE
BOLSTER

Bolster’s Brain, you’ve been warned

• Free software: GNU General Public License v3
• Documentation: https://bolster.readthedocs.io.

1.1 Features

• Efficient tree/node traversal and iteration
• Datetime helpers
• Concurrency helpers
• Web safe Encapsulation/Decapsulation helpers
• pandas-esque aggregate/transform_r functions
• “Best Practice” AWS service handling

1.2 Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
CHAPTER TWO
INSTALLATION

2.1 Stable release

To install Bolster, run this command in your terminal:

    $ pip install bolster

This is the preferred method to install Bolster, as it will always install the most recent stable release.

If you don’t have pip installed, this Python installation guide can guide you through the process.

2.2 From sources

The sources for Bolster can be downloaded from the GitHub repo.

You can either clone the public repository:

    $ git clone git://github.com/andrewbolster/bolster

Or download the tarball:

    $ curl -OJL https://github.com/andrewbolster/bolster/tarball/master

Once you have a copy of the source, you can install it with:

    $ python setup.py install
CHAPTER THREE
USAGE

To use Bolster in a project:

    import bolster
CHAPTER FOUR
CONTRIBUTING

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

4.1 Types of Contributions

4.1.1 Report Bugs

Report bugs at https://github.com/andrewbolster/bolster/issues.

If you are reporting a bug, please include:

• Your operating system name and version.
• Any details about your local setup that might be helpful in troubleshooting.
• Detailed steps to reproduce the bug.

4.1.2 Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

4.1.3 Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

4.1.4 Write Documentation

Bolster could always use more documentation, whether as part of the official Bolster docs, in docstrings, or even on the web in blog posts, articles, and such.
4.1.5 Submit Feedback

The best way to send feedback is to file an issue at https://github.com/andrewbolster/bolster/issues.

If you are proposing a feature:

• Explain in detail how it would work.
• Keep the scope as narrow as possible, to make it easier to implement.
• Remember that this is a volunteer-driven project, and that contributions are welcome :)

4.2 Get Started!

Ready to contribute? Here’s how to set up bolster for local development.

1. Fork the bolster repo on GitHub.

2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/bolster.git

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv bolster
    $ cd bolster/
    $ python setup.py develop

4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature

   Now you can make your changes locally.

5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 bolster tests
    $ python setup.py test or pytest
    $ tox

   To get flake8 and tox, just pip install them into your virtualenv.

6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature

7. Submit a pull request through the GitHub website.
4.3 Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check https://travis-ci.com/andrewbolster/bolster/pull_requests and make sure that the tests pass for all supported Python versions.

4.4 Tips

To run a subset of tests:

    $ pytest tests.test_bolster

4.5 Deploying

A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:

    $ bump2version patch # possible: major / minor / patch
    $ git push
    $ git push --tags

Travis will then deploy to PyPI if tests pass.
CHAPTER FIVE
CREDITS

5.1 Development Lead

• Andrew Bolster

5.2 Contributors

None yet. Why not be the first?
CHAPTER SIX
HISTORY

6.1 0.1.0 (2021-03-11)

• First release on PyPI.

6.2 0.1.1 (2021-05-18)

• Decouple from pg_config requirement (now using psycopg2-binary, which while less than ideal, can run in zee clouds)
• Dependency updates for security vulns
• Companies House API
CHAPTER SEVEN
API REFERENCE

This page contains auto-generated API reference documentation [1].

[1] Created with sphinx-autoapi

7.1 bolster

Top-level package for Bolster.

7.1.1 Subpackages

bolster.data_sources

Submodules

bolster.data_sources.companies_house

Module Contents

Functions
    get_basic_company_data_url() – Parse the companies house website to get the current URL for the ‘BasicCompanyData’
    query_basic_company_data(query_func=always) – Grab the url for the basic company data, and walk through the CSV files within
    companies_house_record_might_be_farset(r) – A heuristic function for working out if a record in the companies house registry might be based in Farset Labs
    get_companies_house_records_that_might_be_in_farset()

bolster.data_sources.companies_house.get_basic_company_data_url()
    Parse the companies house website to get the current URL for the ‘BasicCompanyData’
    Currently uses the ‘one file’ method but it could be split into the multi files for memory efficiency
    Return type: AnyStr

bolster.data_sources.companies_house.query_basic_company_data(query_func=always)
    Grab the url for the basic company data, and walk through the CSV files within, and for each row in each CSV file, parse the row data through the given query_func such that if query_func(row) is True it will be yielded
    Parameters
    • query_func (Callable[Ellipsis, bool])
    Return type: Iterator[Dict]

bolster.data_sources.companies_house.companies_house_record_might_be_farset(r)
    A heuristic function for working out if a record in the companies house registry might be based in Farset Labs
    Almost certainly incomplete and needs more testing/validation
    Parameters
    • r (Dict)
    Return type: bool

bolster.data_sources.companies_house.get_companies_house_records_that_might_be_in_farset()
    Return type: Iterator[Dict]

bolster.stats

Submodules

bolster.stats.distributions

Module Contents

Functions
    best_fit_distribution(data, bins=200, ax=None, include_slow=False, discriminator='sse') – Model data by finding best fit distribution to data

bolster.stats.distributions.best_fit_distribution(data, bins=200, ax=None, include_slow=False, discriminator='sse')
    Model data by finding best fit distribution to data

bolster.utils

Subpackages

bolster.utils.aws

AWS based Asset handling
Includes S3, Kinesis, SSM, SQS, Lambda self-invocation and Redshift querying helpers
Package Contents

Classes
    KinesisLoader – Kinesis batchwise insertion handler with chunking and retry

Functions
    chunks(iterable, size=10) – Outputs chunks of size N from an iterable (generator)
    start_session(*args, restart=False, **kwargs)
    get_s3_client()
    put_s3(obj, key, bucket, keys=None, gzip=True, client=None) – Take either a list of dicts (and dump them as csv to s3) or a StringIO buffer (and dump-as-is to s3)
    get_s3(key, bucket, gzip=True, log_exception=True, client=None) – Get Object from S3, generally with gzip decompression included.
    check_s3(key, bucket, client=None) – https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3
    get_matching_s3_objects(bucket, prefix='', suffix='', client=None) – Generate objects in an S3 bucket.
    get_matching_s3_keys(bucket, **kwargs) – Generate the keys in an S3 bucket.
    select_from_csv(bucket, key, fields, client=None)
    get_latest_key(prefix, bucket, key=None, client=None) – Walk a given S3 bucket for the lexicographically highest item in the given bucket (defaults to the analysis store defined in utils.env)
    get_sqs_client()
    send_to_sqs(records, queue, chunksize=1, client=None) – Send records in chunks of chunksize for a given sqs queue in json-serialised format
    get_ssm_client()
    get_ssm_param(param_name, client=None) – Locally memoized getter for configuration parameters stored in the AWS “Simple Systems Manager” (now just systems manager) Parameter Store
    fh_json_decode(content) – Customised JSON Decoder for consuming Firehose batched records
    decapsulate_kinesis_payloads(event) – Decapsulate base64 encoded kinesis data records to a list
    iterate_kinesis_payloads(event) – Iterate over a base64 encoded kinesis data record, yielding entries
    send_to_kinesis(records, stream, partition_key=None) – Accessory function for the KinesisLoader class
    get_sns_client()
    invoke_self_async(event, context) – Have the Lambda invoke itself asynchronously, passing the same event it received originally, and tagging the event as ‘async’ so it’s actually processed
    query(q, redshift_conn_dict, named_cursor='bolster_query_cursor', **kwargs) – Helper for making queries to redshift (or any postgres compatible backend)
    SQSWrapper(event, context, queuename, function, timeout=60000, reinvokelimit=10, maxmessages=1, raise_exceptions=True, deduplicate=False, fkwargs={}, client=None)

Attributes
    logger
    session
    _ssm_params

bolster.utils.aws.chunks(iterable, size=10)
    Outputs chunks of size N from an iterable (generator)
    Parameters
    • iterable (Iterable)
    • size – (Default value = 10)
    Return type: Generator[List, None, None]

    >>> next((b for b in chunks(range(10), 2)))
    [0, 1]
    >>> [b for b in chunks(list(range(10)), 2)]
    [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

bolster.utils.aws.logger

bolster.utils.aws.session: Optional[boto3.Session]

bolster.utils.aws.start_session(*args, restart=False, **kwargs)
    Return type: boto3.Session

bolster.utils.aws.get_s3_client()

bolster.utils.aws.put_s3(obj, key, bucket, keys=None, gzip=True, client=None)
    Take either a list of dicts (and dump them as csv to s3) or a StringIO buffer (and dump-as-is to s3)
    Parameters
    • obj (Union[Sequence[Dict], io.StringIO]) – List of records to be written to CSV (or StringIO for direct upload)
    • key (str) – Destination Key
    • bucket (str) – Destination Bucket (Default value = S3_ANALYSIS_STORE)
    • keys – List of expected keys, can be used to filter or set the order of key entry in the resultant file (Default value = None)
    • gzip (bool) – Compress the object (Default value = True)
    Return type: dict

bolster.utils.aws.get_s3(key, bucket, gzip=True, log_exception=True, client=None)
    Get Object from S3, generally with gzip decompression included.
    Parameters
    • key (str)
    • bucket (str) – (Default value = S3_ANALYSIS_STORE)
    • gzip (bool) – (Default value = True)
    Return type: io.StringIO

bolster.utils.aws.check_s3(key, bucket, client=None)
    https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3
    Parameters
    • key (str)
    • bucket (str) – (Default value = S3_ANALYSIS_STORE)
    Return type: bool

bolster.utils.aws.get_matching_s3_objects(bucket, prefix='', suffix='', client=None)
    Generate objects in an S3 bucket.
    https://alexwlchan.net/2018/01/listing-s3-keys-redux/
    Parameters
    • bucket (AnyStr) – Name of the S3 bucket.
    • prefix – Only fetch objects whose key starts with this prefix (optional).
    • suffix – Only fetch objects whose keys end with this suffix (optional).
    Return type: Iterator

bolster.utils.aws.get_matching_s3_keys(bucket, **kwargs)
    Generate the keys in an S3 bucket.
    https://alexwlchan.net/2018/01/listing-s3-keys-redux/
    Parameters
    • bucket (AnyStr) – Name of the S3 bucket.
    • prefix – Only fetch keys that start with this prefix (optional).
    • suffix – Only fetch keys that end with this suffix (optional).
    Return type: Iterator

bolster.utils.aws.select_from_csv(bucket, key, fields, client=None)
    Return type: List

bolster.utils.aws.get_latest_key(prefix, bucket, key=None, client=None)
    Walk a given S3 bucket for the lexicographically highest item in the given bucket (defaults to the analysis store defined in utils.env)
    Accepts a key callable that can be used to decide how the candidate keys are sorted. For example, to use loose-versioning, distutils.version.LooseVersion can be passed as the key argument
    Parameters
    • prefix (str)
    • bucket (str) – (Default value = S3_ANALYSIS_STORE)
    • key (Optional[Callable]) – (Default value = None)
    Return type: str

bolster.utils.aws.get_sqs_client()

bolster.utils.aws.send_to_sqs(records, queue, chunksize=1, client=None)
    Send records in chunks of chunksize for a given sqs queue in json-serialised format
    Parameters
    • records (Iterator)
    • queue (str)
    • chunksize (int) – (Default value = 1)
    Return type: None

bolster.utils.aws._ssm_params

bolster.utils.aws.get_ssm_client()

bolster.utils.aws.get_ssm_param(param_name, client=None)
    Locally memoized getter for configuration parameters stored in the AWS “Simple Systems Manager” (now just systems manager) Parameter Store
    Parameters
    • param_name (str)
    Return type: str

bolster.utils.aws.fh_json_decode(content)
    Customised JSON Decoder for consuming Firehose batched records;
    Firehose doesn’t include entry separators between entries, so we intercept the raw_decoder on JSONDecodeError and ‘skip’ over the ‘where is my comma?’ issue and continue to parse the rest of the content until we reach the end of the given content string.
    Parameters
    • content (AnyStr)
    Return type: Iterator[Union[Dict, List]]

    >>> list(fh_json_decode('{"test":"value"}{"test":"othervalue"}'))
    [{'test': 'value'}, {'test': 'othervalue'}]

bolster.utils.aws.decapsulate_kinesis_payloads(event)
    Decapsulate base64 encoded kinesis data records to a list
    Parameters
    • event (Dict)
    Return type: List[Dict]

bolster.utils.aws.iterate_kinesis_payloads(event)
    Iterate over a base64 encoded kinesis data record, yielding entries
    Parameters
    • event (Dict)
    Return type: Generator[Dict, None, None]

class bolster.utils.aws.KinesisLoader(batch_size=500, maximum_records=None, stream=None)
    Bases: object
    Kinesis batchwise insertion handler with chunking and retry

    generate_and_submit(self, items, partition_key=None)
        Submit batches of items to the configured stream
        Parameters
        • items (Iterator)
        • partition_key (str) – (Default value = None)
        Return type: SupportsInt

    submit_batch_until_successful(self, this_batch, response)
        If needed, retry a batch of records, backing off exponentially until it goes through
        Parameters
        • this_batch (List)
        • response (Dict)
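The decoding behaviour described for fh_json_decode above (JSON values concatenated with no separators) can be illustrated with the standard library alone. The following is a minimal sketch, not Bolster's implementation: the name decode_concatenated_json is hypothetical, and instead of intercepting JSONDecodeError as the docstring describes, it walks the string with json.JSONDecoder.raw_decode, which reports where each value ends.

```python
import json
from typing import Any, Iterator


def decode_concatenated_json(content: str) -> Iterator[Any]:
    """Yield each JSON value found back-to-back in `content`.

    Hypothetical helper for illustration; not part of bolster.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(content)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and content[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(content, pos)
        yield value


records = list(decode_concatenated_json('{"test":"value"}{"test":"othervalue"}'))
print(records)
```

Because raw_decode returns the end index of every value, no separator between records is needed, which is the property the Firehose use case relies on.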
bolster.utils.aws.send_to_kinesis(records, stream, partition_key=None)
    Accessory function for the KinesisLoader class
    Parameters
    • records (Iterator[Sequence])
    • stream (str)
    • partition_key (str) – (Default value = None)
    Return type: int

bolster.utils.aws.get_sns_client()

bolster.utils.aws.invoke_self_async(event, context)
    Have the Lambda invoke itself asynchronously, passing the same event it received originally, and tagging the event as ‘async’ so it’s actually processed
    THIS DOES NOT WORK FROM WITHIN A VPC! (There is no lambda-invoke endpoint accessible without poking lots of holes in the VPC.)
    Parameters
    • event (Dict)
    • context (Any)

bolster.utils.aws.query(q, redshift_conn_dict, named_cursor='bolster_query_cursor', **kwargs)
    Helper for making queries to redshift (or any postgres compatible backend)

        {
            "user":"USERNAME",
            "host":"HOSTNAME",
            "connect_timeout":3,
            "dbname":"DATABASE",
            "port":5439,
            "password":"SUPERSECRETPASSWORD1111"
        }

    This function implements the ‘is_local’ check if it is getting its configuration dictionary from the parameter store, and will overwrite the ‘host’ in the store with a resolvable hostname for the ALDS datastore.
    Basically, if you’re not working on ALDS, in a few very specific locations, or are outside the ALDS VPC, give this a sensible dictionary.
    kwargs are passed through as vars to the SQL execution, i.e. to be used with substitution queries, e.g.:

        query("select * from table where id = %(my_id)s", my_id = 14228)

    NOTE! If you use % wildcards (i.e. LIKE ‘%string’), you’re gonna have a bad time... (Use the POSIX regex instead: https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions-posix.html)
    Parameters
    • q (str)
    • redshift_conn_dict (dict) – (Default value = None)
    • **kwargs
    Return type: Iterator[Dict]

bolster.utils.aws.SQSWrapper(event, context, queuename, function, timeout=60000, reinvokelimit=10, maxmessages=1, raise_exceptions=True, deduplicate=False, fkwargs={}, client=None)

Submodules

bolster.utils.deco

Module Contents

Functions
    timed(func) – This decorator prints the execution time for the decorated function.

bolster.utils.deco.timed(func)
    This decorator prints the execution time for the decorated function.

bolster.utils.dt

Module Contents

Functions
    round_to_week(dt) – Return a date for the Monday before the given date
    round_to_month(dt) – Return a date for the first day of the month of a given date
    utc_midnight_on(dt) – Some services don’t like timezones, so this helper function converts datetime.date and datetime.datetime objects to a datetime.datetime object corresponding to UTC Midnight on that date.

bolster.utils.dt.round_to_week(dt)
    Return a date for the Monday before the given date
    Parameters
    • dt (Union[datetime.datetime, datetime.date])
    Return type: datetime.date

    >>> round_to_week(datetime(2018,8,9,12,1))
    datetime.date(2018, 8, 6)
    >>> round_to_week(date(2018,8,9))
    datetime.date(2018, 8, 6)

bolster.utils.dt.round_to_month(dt)
    Return a date for the first day of the month of a given date
    Parameters
    • dt (Union[datetime.datetime, datetime.date])
    Return type: datetime.date

    >>> round_to_month(datetime(2018,8,9,12,1))
    datetime.date(2018, 8, 1)
    >>> round_to_month(date(2018,8,9))
    datetime.date(2018, 8, 1)

bolster.utils.dt.utc_midnight_on(dt)
    Some services don’t like timezones, so this helper function converts datetime.date and datetime.datetime objects to a datetime.datetime object corresponding to UTC Midnight on that date.
    Pays primary attention to the actual Date of the input, regardless of whether the combination of given-time and timezone would roll over into another date.
    Parameters
    • dt (datetime.datetime)
    Return type: datetime.datetime

    >>> utc_midnight_on(datetime(2018,9,1,12,12))
    datetime.datetime(2018, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
    >>> utc_midnight_on(datetime(2018,9,1,12,12, tzinfo=timezone(timedelta(hours=-13))))
    datetime.datetime(2018, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)

bolster.utils.web

Module Contents

Functions
    download_extract_zip(url) – Download a ZIP file and extract its contents in memory

bolster.utils.web.download_extract_zip(url)
    Download a ZIP file and extract its contents in memory
    yields (filename, file-like object) pairs
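The date rounding documented for bolster.utils.dt above can be reproduced with the standard datetime module. This is an illustrative sketch only, not the package's source: the function names monday_of_week, first_of_month and utc_midnight are hypothetical, and the handling of an input that already falls on a Monday (returned unchanged here) is an assumption the documentation does not settle.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Union

DateLike = Union[datetime, date]


def _as_date(dt: DateLike) -> date:
    # datetime is a subclass of date, so check for datetime first.
    return dt.date() if isinstance(dt, datetime) else dt


def monday_of_week(dt: DateLike) -> date:
    """Monday of the week containing `dt` (assumed: Mondays map to themselves)."""
    d = _as_date(dt)
    return d - timedelta(days=d.weekday())  # weekday(): Monday == 0


def first_of_month(dt: DateLike) -> date:
    """First day of the month containing `dt`."""
    return _as_date(dt).replace(day=1)


def utc_midnight(dt: DateLike) -> datetime:
    """UTC midnight on the calendar date written in `dt`, ignoring its tzinfo."""
    return datetime(dt.year, dt.month, dt.day, tzinfo=timezone.utc)


print(monday_of_week(datetime(2018, 8, 9, 12, 1)))
print(first_of_month(date(2018, 8, 9)))
print(utc_midnight(datetime(2018, 9, 1, 12, 12, tzinfo=timezone(timedelta(hours=-13)))))
```

Building the result from the year, month and day fields, rather than converting the instant to UTC, mirrors the documented intent of paying attention to the date as written even when the timezone offset would roll it over.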
7.1.2 Submodules

bolster.cli

Console script for bolster.

Module Contents

Functions
    main(args=None) – Console script for bolster.

bolster.cli.main(args=None)
    Console script for bolster.

7.1.3 Package Contents

Classes
    memoize – cache the return value of a method

Functions
    _dumb_passthrough(x, **kwargs) – Pointless passthrough replacement for tqdm (and similar) fallback
    always(x, **kwargs) – Pointless passthrough replacement for ‘always true’ filtering
    poolmap(f, iterable, max_workers=None, progress=None, **kwargs) – Helper function to encapsulate a ThreadPoolExecutor mapped function workflow
    batch(seq, n=1) – Split a sequence into n-length batches (is still iterable, not list)
    chunks(iterable, size=10) – Outputs chunks of size N from an iterable (generator)
    arg_exception_logger(func) – Helper Decorator to provide info on the arguments that cause the exception of a wrapped function
    backoff(exception_to_check, tries=5, delay=0.2, backoff=2, logger=logger) – Retry calling the decorated function using an exponential backoff.
    tag_gen(seq, **kwargs) – Generator stream that adds a kwargs to each entry yielded
    exceptional_executor(futures, exception_handler=None, timeout=None) – Generator for concurrent.Futures handling
    working_directory(path) – Contextmanager that changes working directory and returns to previous on exit.
    compress_for_relay(obj) – Compress json-serializable object to a gzipped base64 string
    decompress_from_relay(msg) – Uncompress gzipped base64 string to a json-serializable object
    pretty_print_request(req, expose_auth=False, authentication_header_blacklist=None) – At this point it is completely built and ready to be fired; it is “prepared”.
    get_recursively(search_dict, field) – Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.
    transform_(r, rule_keys) – Generic Item-wise transformation function
    diff(new, old, excluded_fields=None) – Perform a one-depth diff of a pair of dictionaries
    aggregate(base, group_key, item_key, condition=None) – Abstracted groupby-sum for lists of dicts
    breadth(d) – Get the total ‘width’ of a tree
    depth(d) – Get the maximum depth of a tree
    set_keys(d) – Extract the set of all keys of a nested dict/tree
    keys_at(d, n, i=0) – Extract the keys of a tree at a given depth
    items_at(d, n, i=0) – Extract the elements from a tree at a given depth
    leaves(d) – Iterate on the leaves of a tree
    leaf_paths(d, path=None)
    flatten_dict(d, head='', sep=':')
    uncollect_object(d)
    dict_concat_safe(d, keys, default=None) – Really Lazy Func because dict.get(‘key’,default) is a pain in the ass for lists

Attributes
    __author__
    __email__
    __version__
    logger

bolster.__author__ = Andrew Bolster

bolster.__email__ = me@andrewbolster.info

bolster.__version__ = 0.1.1

bolster.logger

bolster._dumb_passthrough(x, **kwargs)
    Pointless passthrough replacement for tqdm (and similar) fallback
    Parameters
    • x

bolster.always(x, **kwargs)
    Pointless passthrough replacement for ‘always true’ filtering

    >>> always('false')
    True
    >>> always(False)
    True
    >>> always(True)
    True

    Return type: bool

bolster.poolmap(f, iterable, max_workers=None, progress=None, **kwargs)
    Helper function to encapsulate a ThreadPoolExecutor mapped function workflow
    Accepts (assumed to be tqdm style) progress monitor callback
    kwargs are passed identically to all f(i) calls for each i in iterable
    Parameters
    • f (Callable) – function to map across
    • iterable (Iterable)
    • max_workers (Optional[int]) – (Default value = None)
    • progress (Callable) – (Default value = None)
    • **kwargs – passed as arguments to f
    Return type: Dict

bolster.batch(seq, n=1)
    Split a sequence into n-length batches (is still iterable, not list)
    Parameters
    • seq (Sequence)
    • n (int) – (Default value = 1)
    Return type: Generator[Iterable, None, None]

    >>> next((b for b in batch(range(10), 2)))
    range(0, 2)
    >>> [b for b in batch(list(range(10)), 2)]
    [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

bolster.chunks(iterable, size=10)
    Outputs chunks of size N from an iterable (generator)
    Parameters
    • iterable (Iterable)
    • size – (Default value = 10)
    Return type: Generator[List, None, None]

    >>> next((b for b in chunks(range(10), 2)))
    [0, 1]
    >>> [b for b in chunks(list(range(10)), 2)]
    [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

bolster.arg_exception_logger(func)
    Helper Decorator to provide info on the arguments that cause the exception of a wrapped function
    Parameters
    • func (Callable)
    Return type: Callable

bolster.backoff(exception_to_check, tries=5, delay=0.2, backoff=2, logger=logger)
    Retry calling the decorated function using an exponential backoff.
    http://www.saltycrane.com/blog/2009/11/trying-out-retry-decorator-python/
    original from: http://wiki.python.org/moin/PythonDecoratorLibrary#Retry
    Parameters
    • exception_to_check (Union[BaseException, Sequence[BaseException]]) – the exception to check. May be a tuple of exceptions to check
    • tries (SupportsInt) – number of times to try (not retry) before giving up (Default value = 5)
    • delay (SupportsFloat) – initial delay between retries in seconds (the signature default is 0.2; the docstring states 0.4)
    • backoff (SupportsFloat) – backoff multiplier, e.g. a value of 2 will double the delay each retry (Default value = 2)
    • logger (Optional[logging.Logger]) – logger to use. If None, print (Default value = local utils logger)

exception bolster.MultipleErrors(errors=None)
    Bases: BaseException
    Exception Class to enable the capturing of multiple exceptions without interrupting control flow, i.e. catch the exception, but carry on and report the exceptions at the end.
    E.g.

        exceptions = MultipleErrors()
        try:
            do_risky_thing_with(this) #raises ValueError
        except:
            exceptions.capture_current_exception()
        try:
            do_other_thing_with(this) #raises AttributeError
        except:
            exceptions.capture_current_exception()
        exceptions.do_raise()

        Traceback (most recent call last):
        ...
        ValueError
        Traceback (most recent call last):
        ...
        AttributeError

    classmethod _traceback_for(cls, exc_info)
        Formatting!

    __str__(self)
        Return str(self).

    capture_current_exception(self)
        Gathers exception info from the current context and retains it

    do_raise(self)
        Raises itself if it contains any errors

bolster.tag_gen(seq, **kwargs)
    Generator stream that adds a kwargs to each entry yielded
    The below example shows the creation of an empty dict generator where tag_gen is used to insert a new key/value (k=1) in each item on the fly

    >>> all([i['k'] == 1 for i in tag_gen(({} for _ in range(4)), k=1)])
    True

    Parameters
    • seq (Iterator[Dict])
    • **kwargs
    Return type: Iterator[Dict]

bolster.exceptional_executor(futures, exception_handler=None, timeout=None)
    Generator for concurrent.Futures handling
    When an exception is raised in an executing Future, f.result() called on its own will raise that exception in the parent thread, killing execution and causing loss of ‘future local’ scope.
    Instead, query the future for its exception state first, and handle that separately, by default by logging it as an exception.
    Parameters
    • futures (Sequence[concurrent.futures.Future])
    • exception_handler
    • timeout
    Return type: Iterator

bolster.working_directory(path)
    Contextmanager that changes working directory and returns to previous on exit.
    Parameters
    • path (Union[str, pathlib.Path])
    Return type: contextlib.AbstractContextManager

bolster.compress_for_relay(obj)
    Compress json-serializable object to a gzipped base64 string
    Parameters
    • obj (Union[List, Dict])
    Return type: AnyStr

    >>> decompress_from_relay(compress_for_relay(['test']))
    ['test']
    >>> decompress_from_relay(compress_for_relay({'test':'test'}))
    {'test': 'test'}

bolster.decompress_from_relay(msg)
    Uncompress gzipped base64 string to a json-serializable object
    Parameters
    • msg (AnyStr)
    Return type: Union[List, Dict]

class bolster.memoize(func)
    Bases: object
    cache the return value of a method
    This class is meant to be used as a decorator of methods. The return value from a given method invocation will be cached on the instance whose method was invoked. All arguments passed to a method decorated with memoize must be hashable.
    If a memoized method is invoked directly on its class the result will not be cached. Instead the method will be invoked like a static method:

        class Obj(object):
            @memoize
            def add_to(self, arg):
                return self + arg

        Obj.add_to(1) # not enough arguments
        Obj.add_to(1, 2) # returns 3, result is not cached

    Source: http://code.activestate.com/recipes/577452-a-memoize-decorator-for-instance-methods/
    Augmented with cache hit/miss population Counters

    __get__(self, obj, objtype=None)

    __call__(self, *args, **kw)

bolster.pretty_print_request(req, expose_auth=False, authentication_header_blacklist=None)
    At this point it is completely built and ready to be fired; it is “prepared”.
    However, pay attention to the formatting used in this function because it is programmed to be pretty printed and may differ from the actual request.
    Parameters
    • req
    • expose_auth – (Default value = False)
    • authentication_header_blacklist (Optional[List])
    Return type: None

bolster.get_recursively(search_dict, field)
    Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.
    Originally taken from https://stackoverflow.com/a/20254842
    Parameters
    • search_dict (Dict)
    • field (str)
    Return type: List

    >>> get_recursively({'id' : 5,'children' : {'id' : 6,'children' : {'id' : 7,'children' : {}}}}, 'id')
    [5, 6, 7]

bolster.transform_(r, rule_keys)
    Generic Item-wise transformation function;
    The values in r are updated based on key-matching in rule_keys, i.e. out[k] = rule_keys[k](r[k])
    However, this can do more than straight callable mapping; it can also update the key. For a given rule R = rule_keys[k]:

    R can be used to select a field for inclusion in the output

    >>> r = {'a':'1','b':'2','c':'3'}
    >>> transform_(r, {'a':None})
    {'a': '1'}

    Rename a key

    >>> transform_(r, {'a':('A',None)})
    {'A': '1'}

    Apply a function to a key’s value

    >>> transform_(r, {'a':('a',int)})
    {'a': 1}

    Or a combination of these

    >>> transform_(r, {'a':('A',int), 'b':None})
    {'A': 1, 'b': '2'}

    Parameters
    • r (Dict)
    • rule_keys (Dict[AnyStr, Optional[Tuple]])
    Return type: Dict

bolster.diff(new, old, excluded_fields=None)
    Perform a one-depth diff of a pair of dictionaries
    Parameters
    • new (Dict)
    • old (Dict)
    • excluded_fields (Optional[set])
    Return type: Dict
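The rule format documented for transform_ above (None to select, a (new_key, None) tuple to rename, a (new_key, callable) tuple to rename and convert) can be sketched in a few lines. This is a hedged reconstruction from the doctests only: apply_rules is a hypothetical name, and silently skipping rule keys that are absent from the record is an assumption, since the documentation does not say what happens in that case.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Rule = Optional[Tuple[str, Optional[Callable[[Any], Any]]]]


def apply_rules(record: Dict[str, Any], rules: Dict[str, Rule]) -> Dict[str, Any]:
    """Build a new dict containing only the keys named in `rules`.

    Hypothetical helper for illustration; not part of bolster.
    """
    out: Dict[str, Any] = {}
    for key, rule in rules.items():
        if key not in record:
            continue  # assumption: missing keys are skipped, not an error
        # A rule of None means "keep the key and value unchanged".
        new_key, func = (key, None) if rule is None else rule
        value = record[key]
        out[new_key] = func(value) if func is not None else value
    return out


r = {"a": "1", "b": "2", "c": "3"}
print(apply_rules(r, {"a": ("A", int), "b": None}))
```

Keys that have no rule, such as 'c' here, are dropped from the output, which matches the selection behaviour shown in the doctests.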
bolster.aggregate(base, group_key, item_key, condition=None)
    Abstracted groupby-sum for lists of dicts, operationally equivalent to:

        df = pd.DataFrame(base)
        df.where(condition).groupby(group_key)[item_key].sum()

    Parameters
        • base (List[Dict])
        • group_key (Union[AnyStr, Tuple[AnyStr], List[AnyStr]])
        • item_key (AnyStr)
        • condition (Optional[Callable])

bolster.breadth(d)
    Get the total 'width' of a tree.

    Why was this a thing? No idea.

bolster.depth(d)
    Get the maximum depth of a tree.

    Parameters
        d (Dict)
    Return type
        SupportsInt

bolster.set_keys(d)
    Extract the set of all keys of a nested dict/tree.

    Parameters
        d (Dict)
    Return type
        Set

bolster.keys_at(d, n, i=0)
    Extract the keys of a tree at a given depth.

    Parameters
        • d (Dict)
        • n (SupportsInt)
        • i (SupportsInt)
    Return type
        Iterator

bolster.items_at(d, n, i=0)
    Extract the elements from a tree at a given depth.

    Parameters
        • d (Dict)
        • n (SupportsInt)
        • i (SupportsInt)
    Return type
        Iterator[Tuple]

bolster.leaves(d)
    Iterate on the leaves of a tree.

    Parameters
        d (Dict)
    Return type
        Iterator

bolster.leaf_paths(d, path=None)
    Parameters
        • d (Dict)
        • path (Optional[List])
    Return type
        Iterator[Tuple[List, Dict]]

bolster.flatten_dict(d, head='', sep=':')
    Parameters
        • d (Dict)
        • head (AnyStr)
        • sep (AnyStr)
    Return type
        Dict

bolster.uncollect_object(d)
    Parameters
        d (Dict)
    Return type
        Dict

bolster.dict_concat_safe(d, keys, default=None)
    Really Lazy Func because dict.get('key', default) is a pain in the ass for lists.

    Parameters
        • d (Dict)
        • keys (List[Hashable])
        • default (Optional)
    Return type
        Iterator
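The reference gives flatten_dict a signature but no description, so the following is only a guess at what a function with the parameters (d, head='', sep=':') might do: collapse a nested dict into a single level, joining the key path with sep. The name flatten_nested and all of its behaviour are assumptions for illustration, not the library's code.

```python
from typing import Any, Dict


def flatten_nested(d: Dict[str, Any], head: str = "", sep: str = ":") -> Dict[str, Any]:
    """Hypothetical sketch: flatten nested dicts into 'parent:child' keys."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        # Prefix the key with the path accumulated so far, if there is one.
        path = f"{head}{sep}{key}" if head else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the current path as the new head.
            flat.update(flatten_nested(value, head=path, sep=sep))
        else:
            flat[path] = value
    return flat


nested = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
flattened = flatten_nested(nested)
```

In this sketch an empty sub-dict contributes no keys at all; whether the real function preserves such entries is not stated in the reference.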
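The groupby-sum that aggregate is described as performing earlier in this chapter can be sketched in plain Python without pandas. The name groupby_sum is hypothetical, and the sketch assumes that condition is a predicate called once per row dict and that a tuple or list group_key produces tuple-valued groups; neither detail is confirmed by the reference.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Optional, Sequence, Union


def groupby_sum(
    base: List[Dict[str, Any]],
    group_key: Union[str, Sequence[str]],
    item_key: str,
    condition: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> Dict[Hashable, Any]:
    """Hypothetical sketch: sum item_key per group over a list of dicts."""
    totals: Dict[Hashable, Any] = defaultdict(int)
    for row in base:
        # Assumption: condition filters rows, like a boolean mask would.
        if condition is not None and not condition(row):
            continue
        if isinstance(group_key, (tuple, list)):
            group = tuple(row[k] for k in group_key)  # composite group key
        else:
            group = row[group_key]
        totals[group] += row[item_key]
    return dict(totals)


rows = [
    {"team": "a", "year": 2020, "n": 1},
    {"team": "b", "year": 2020, "n": 2},
    {"team": "a", "year": 2021, "n": 3},
]
per_team = groupby_sum(rows, "team", "n")
```

Working directly on the list of dicts avoids building a DataFrame when the input is small or already being streamed.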
CHAPTER EIGHT
INDICES AND TABLES

• genindex
• modindex
• search
PYTHON MODULE INDEX

b
    bolster, 15
    bolster.cli, 25
    bolster.data_sources, 15
    bolster.data_sources.companies_house, 15
    bolster.stats, 16
    bolster.stats.distributions, 16
    bolster.utils, 16
    bolster.utils.aws, 16
    bolster.utils.deco, 23
    bolster.utils.dt, 23
    bolster.utils.web, 24
INDEX

Symbols
    __author__ (in module bolster), 26
    __call__() (bolster.memoize method), 30
    __email__ (in module bolster), 26
    __get__() (bolster.memoize method), 30
    __str__() (bolster.MultipleErrors method), 29
    __version__ (in module bolster), 26
    _dumb_passthrough() (in module bolster), 26
    _ssm_params (in module bolster.utils.aws), 20
    _traceback_for() (bolster.MultipleErrors class method), 29

A
    aggregate() (in module bolster), 31
    always() (in module bolster), 26
    arg_exception_logger() (in module bolster), 28

B
    backoff() (in module bolster), 28
    batch() (in module bolster), 27
    best_fit_distribution() (in module bolster.stats.distributions), 16
    bolster
        module, 15
    bolster.cli
        module, 25
    bolster.data_sources
        module, 15
    bolster.data_sources.companies_house
        module, 15
    bolster.stats
        module, 16
    bolster.stats.distributions
        module, 16
    bolster.utils
        module, 16
    bolster.utils.aws
        module, 16
    bolster.utils.deco
        module, 23
    bolster.utils.dt
        module, 23
    bolster.utils.web
        module, 24
    breadth() (in module bolster), 32

C
    capture_current_exception() (bolster.MultipleErrors method), 29
    check_s3() (in module bolster.utils.aws), 19
    chunks() (in module bolster), 27
    chunks() (in module bolster.utils.aws), 18
    companies_house_record_might_be_farset() (in module bolster.data_sources.companies_house), 16
    compress_for_relay() (in module bolster), 30

D
    decapsulate_kinesis_payloads() (in module bolster.utils.aws), 21
    decompress_from_relay() (in module bolster), 30
    depth() (in module bolster), 32
    dict_concat_safe() (in module bolster), 33
    diff() (in module bolster), 31
    do_raise() (bolster.MultipleErrors method), 29
    download_extract_zip() (in module bolster.utils.web), 24

E
    exceptional_executor() (in module bolster), 29

F
    fh_json_decode() (in module bolster.utils.aws), 20
    flatten_dict() (in module bolster), 33

G
    generate_and_submit() (bolster.utils.aws.KinesisLoader method), 21
    get_basic_company_data_url() (in module bolster.data_sources.companies_house), 15
    get_companies_house_records_that_might_be_in_farset (in module bolster.data_sources.companies_house), 16
    get_latest_key() (in module bolster.utils.aws), 20
    get_matching_s3_keys() (in module bolster.utils.aws), 19
    get_matching_s3_objects() (in module bolster.utils.aws), 19
    get_recursively() (in module bolster), 31
    get_s3() (in module bolster.utils.aws), 19
    get_s3_client() (in module bolster.utils.aws), 18
    get_sns_client() (in module bolster.utils.aws), 22
    get_sqs_client() (in module bolster.utils.aws), 20
    get_ssm_client() (in module bolster.utils.aws), 20
    get_ssm_param() (in module bolster.utils.aws), 20

I
    invoke_self_async() (in module bolster.utils.aws), 22
    items_at() (in module bolster), 32
    iterate_kinesis_payloads() (in module bolster.utils.aws), 21

K
    keys_at() (in module bolster), 32
    KinesisLoader (class in bolster.utils.aws), 21

L
    leaf_paths() (in module bolster), 32
    leaves() (in module bolster), 32
    logger (in module bolster), 26
    logger (in module bolster.utils.aws), 18

M
    main() (in module bolster.cli), 25
    memoize (class in bolster), 30
    module
        bolster, 15
        bolster.cli, 25
        bolster.data_sources, 15
        bolster.data_sources.companies_house, 15
        bolster.stats, 16
        bolster.stats.distributions, 16
        bolster.utils, 16
        bolster.utils.aws, 16
        bolster.utils.deco, 23
        bolster.utils.dt, 23
        bolster.utils.web, 24
    MultipleErrors, 28

P
    poolmap() (in module bolster), 27
    pretty_print_request() (in module bolster), 30
    put_s3() (in module bolster.utils.aws), 18

Q
    query() (in module bolster.utils.aws), 22
    query_basic_company_data() (in module bolster.data_sources.companies_house), 15

R
    round_to_month() (in module bolster.utils.dt), 24
    round_to_week() (in module bolster.utils.dt), 23

S
    select_from_csv() (in module bolster.utils.aws), 19
    send_to_kinesis() (in module bolster.utils.aws), 21
    send_to_sqs() (in module bolster.utils.aws), 20
    session (in module bolster.utils.aws), 18
    set_keys() (in module bolster), 32
    SQSWrapper() (in module bolster.utils.aws), 23
    start_session() (in module bolster.utils.aws), 18
    submit_batch_until_successful() (bolster.utils.aws.KinesisLoader method), 21

T
    tag_gen() (in module bolster), 29
    timed() (in module bolster.utils.deco), 23
    transform_() (in module bolster), 31

U
    uncollect_object() (in module bolster), 33
    utc_midnight_on() (in module bolster.utils.dt), 24

W
    working_directory() (in module bolster), 29