initial commit
a-finocchiaro committed Aug 5, 2021
0 parents commit 763591a
Showing 8 changed files with 14,112 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
setup.py
21 changes: 21 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,21 @@
MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
147 changes: 147 additions & 0 deletions README.md
@@ -0,0 +1,147 @@
# PyBLS

PyBLS is a Python module designed to interact with the Bureau of Labor Statistics (BLS)
API and transform the results into a pandas DataFrame.

## Prerequisites

The following Python packages must be installed in your environment:

| Package | Version |
| ------- | ------- |
| pandas | 1.2.3+ |
| requests | 2.25.1+ |

Earlier versions may work, but they have not been tested.

## Setup

This tool only works with version 2 of the Bureau of Labor Statistics API, which *requires* an API key from the BLS. To obtain a key, [follow this link](https://www.bls.gov/developers/home.htm) and select 'registration' to sign up.

PyBLS expects your API key to be set in an environment variable in the terminal session you are working in. Once the BLS has issued your API key, set the environment variable using one of the two methods below, depending on your operating system:

Windows (PowerShell):
```powershell
$Env:BLS_API_KEY='{YOUR_API_KEY}'
```

Mac/Linux:
```sh
export BLS_API_KEY='{YOUR_API_KEY}'
```
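
Before constructing a `BlsData` object, it can help to confirm that the variable is actually visible to Python. The sketch below is not part of PyBLS: `get_bls_api_key` is a hypothetical helper, and the only thing it assumes is the `BLS_API_KEY` variable name described above.

```python
import os


def get_bls_api_key(env=None):
    """Return the BLS API key from the environment, or raise a clear error.

    Hypothetical helper for illustration only; PyBLS does not expose it.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("BLS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BLS_API_KEY is not set. Set it in the current terminal session first."
        )
    return key
```

Calling `get_bls_api_key()` in the same terminal session where the variable was set should return the key. A `RuntimeError` usually means the variable was set in a different session.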

There are several advantages to using an API key with version 2 of the Bureau of Labor Statistics API. The main one is that a user can query the API up to 500 times per day, as opposed to only 25 times with version 1. Version 2 also allows larger timeframes per query and more series IDs in a single query.
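
To make the per-query limits concrete, here is a rough sketch of how a version 2 request body could be assembled in batches. It does not show PyBLS internals, which this README does not describe. `build_payloads` and the two limit constants are assumptions for illustration, with the limits taken from the BLS API documentation; check them against the current BLS documentation before relying on them.

```python
import json

# Assumed version 2 limits from the BLS API documentation; verify before use.
MAX_SERIES_PER_QUERY = 50
MAX_YEARS_PER_QUERY = 20


def build_payloads(series_ids, start_year, end_year, api_key):
    """Split a request into JSON bodies that stay within the assumed limits.

    Hypothetical helper; not part of PyBLS.
    """
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    if end_year - start_year + 1 > MAX_YEARS_PER_QUERY:
        raise ValueError("year range exceeds the assumed per-query limit")

    payloads = []
    # One payload per batch of series IDs.
    for i in range(0, len(series_ids), MAX_SERIES_PER_QUERY):
        payloads.append(json.dumps({
            "seriesid": series_ids[i:i + MAX_SERIES_PER_QUERY],
            "startyear": str(start_year),
            "endyear": str(end_year),
            "registrationkey": api_key,
        }))
    return payloads
```

Each payload produced this way counts as one query against the daily quota, so batching series IDs keeps the number of requests low.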

## Usage

Below is a simple example of how PyBLS could be called:

```python
from pybls.bls_data import BlsData

my_bls_data = BlsData(
    ['ENUUS00040010', 'ENU0400040010'],
    2015,
    2020
)
```

From here, follow the API guide below to see what you can do with the BlsData object that has just been instantiated.

## API

### `BlsData.from_json`

Alternate constructor for BlsData that takes a JSON file of data returned from the BLS
API and uses it to create a BlsData object. It is mainly used for testing, to limit calls to the BLS API,
and to allow offline work from API data saved locally.

```python
from pybls.bls_data import BlsData

my_bls_data = BlsData.from_json('json_file_with_raw_bls_data.json')
```

### `BlsData.write_to_json`

Writes raw data from the BLS API out to a JSON file to avoid having to re-query the API for testing.

Arguments:
- file_name = str; name of the output file.

```python
from pybls.bls_data import BlsData

my_bls_data = BlsData(
    ['ENUUS00040010', 'ENU0400040010'],
    2015,
    2020
)

my_bls_data.write_to_json('bls_json_data.json')
```
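
The save-and-reload workflow behind `write_to_json` and `from_json` can be sketched with the standard library alone. The payload below is a hypothetical stand-in shaped like a BLS API response, and its values are placeholders. `save_raw` and `load_rows` are illustrative names, not PyBLS functions, and the exact file layout PyBLS writes is an assumption here.

```python
import json
import os
import tempfile

# Hypothetical payload shaped like a BLS API response; the values are placeholders.
raw_response = {
    "status": "REQUEST_SUCCEEDED",
    "Results": {
        "series": [
            {
                "seriesID": "ENUUS00040010",
                "data": [
                    {"year": "2020", "period": "Q01", "value": "100"},
                    {"year": "2019", "period": "Q04", "value": "99"},
                ],
            }
        ]
    },
}


def save_raw(response, file_name):
    """Write the raw response to disk so it can be reused without a new query."""
    with open(file_name, "w", encoding="utf-8") as handle:
        json.dump(response, handle)


def load_rows(file_name):
    """Read a saved response and flatten it into (series ID, year, period, value) rows."""
    with open(file_name, encoding="utf-8") as handle:
        response = json.load(handle)
    rows = []
    for series in response["Results"]["series"]:
        for point in series["data"]:
            rows.append(
                (series["seriesID"], int(point["year"]), point["period"], float(point["value"]))
            )
    return rows


# Round trip through a temporary file, with no network access needed.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bls_json_data.json")
    save_raw(raw_response, path)
    rows = load_rows(path)

print(rows)
```

Working from a saved file like this keeps test runs from consuming the daily query quota.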

### `BlsData.create_graph`

Returns a graphable Plotly object built from the given data and the constructed DataFrame. Columns are renamed based on the mapping of series IDs to locations from the BLS area codes.

Arguments:
- title = str; graph title
- graph_type = str; the style of graph to use **(only accepts `line` and `bar`)**
- custom_column_names = dict; mapping of series ID to custom column names. Default=`None`
- transpose = bool; transpose the DataFrame to graph correctly. Default=`False`
- short_location_names = bool; removes the state from the column names to shorten them. Default=`True`
- graph_labels = dict; a mapping of x- and y-axis labels for a graph with custom labels. Default=`None`

Returns a Plotly Express figure object.

```python
from pybls.bls_data import BlsData

my_bls_data = BlsData(
    ['ENUUS00040010', 'ENU0400040010'],
    2015,
    2020
)

fig = my_bls_data.create_graph(
    'BLS API Test Graph',
    'line',
    graph_labels={'date': 'Date', 'value': 'Amount in USD'}
)

fig.show()
```

### `BlsData.create_table`

Creates an HTML table from the DataFrame with cleaned columns.

Arguments:
- custom_column_names = dict; mapping of series ID to custom column name. Default=`None`
- short_location_names = bool; removes the state from the column names to shorten them. Default=`True`
- index_color = str; the color to apply to the index column and header row. Default=`None`
- descending = bool; sorts the index in descending order if `True`. Default=`False`
- index_label = str; adds a custom label to the index column in the table. Default=`''`
- lines = str; colors the borders between cells with the specified color.
- align = str; aligns the text inside cells to `right`, `left`, or `center`. Default=`None`

Returns a `plotly.graph_objects.Figure` object.

```python
from pybls.bls_data import BlsData

my_bls_data = BlsData(
    ['ENUUS00040010', 'ENU0400040010'],
    2015,
    2020
)

fig = my_bls_data.create_table(
    custom_column_names={'ENUUS00040010': 'Entire US', 'ENU0400040010': 'Arizona'},
    index_color='orange',
    descending=True,
    line_color='black',
    align='left'
)

fig.show()
```

### `BlsData.clean_df`

Cleans up the standard DataFrame by renaming columns with locations or by applying custom column names.

Arguments:
- custom_column_names = dict; mapping of series ID to custom column name. Default=`None`
- short_location_names = bool; removes the state from the column names to shorten them. Default=`True`
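
The renaming behavior described above can be illustrated with a small standalone sketch. Everything in it is an assumption for illustration rather than PyBLS's actual implementation: `AREA_TITLES` is a hypothetical three-row excerpt of an area-code lookup, `qcew_area_code` assumes the QCEW series ID layout (a two-character prefix, a one-character seasonal code, then a five-character area code), and shortening by cutting at the first comma is a guess at what "removes the state" means.

```python
# Hypothetical excerpt of an area-code lookup; the titles are illustrative.
AREA_TITLES = {
    "US000": "U.S. TOTAL",
    "04000": "Arizona -- Statewide",
    "04013": "Maricopa County, Arizona",
}


def qcew_area_code(series_id):
    """Extract the area code, assuming the QCEW layout: 'EN' + seasonal code + area code."""
    return series_id[3:8]


def column_name(series_id, custom_column_names=None, short_location_names=True):
    """Pick a column name: a custom name wins, otherwise the looked-up location."""
    if custom_column_names and series_id in custom_column_names:
        return custom_column_names[series_id]
    # Fall back to the raw series ID when the area code is unknown.
    title = AREA_TITLES.get(qcew_area_code(series_id), series_id)
    if short_location_names:
        # Assumed shortening rule: drop everything after the first comma.
        title = title.split(",")[0].strip()
    return title


print(column_name("ENU0401340010"))
print(column_name("ENU0401340010", short_location_names=False))
print(column_name("ENUUS00040010", {"ENUUS00040010": "Entire US"}))
```

Giving custom names priority over looked-up locations matches the order in which the two options are described, although the real precedence in PyBLS may differ.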
28 changes: 28 additions & 0 deletions bls_data/__init__.py
@@ -0,0 +1,28 @@
"""
Written by: Aaron Finocchiaro
3/2021
pybls module is designed to interact with the Bureau of Labor Statistics
API and translate returned json data into pandas dataframes.
In this init file, some code exists to initialize the dataframes for the
area codes that the BLS data uses to identify the region that a seriesID
pertains to.
"""
import pandas as pd
import pkg_resources

#Construct QCEW area codes DataFrame from area code csv
qcew_stream = pkg_resources.resource_stream(__name__, 'data/area_titles.csv')
qcew_area_codes_df = pd.read_csv(qcew_stream)
qcew_area_codes_df = qcew_area_codes_df.set_index('area_fips')

#Construct OES area codes DataFrame from area code csv
stream = pkg_resources.resource_stream(__name__, 'data/oes_areas.csv')
oes_area_codes_df = pd.read_csv(stream, dtype={'area_code':str})
oes_area_codes_df = oes_area_codes_df.set_index('area_code')

#Construct LA area codes for Local Area Employment Statistics locations
stream = pkg_resources.resource_stream(__name__, 'data/la_area.csv')
la_area_codes_df = pd.read_csv(stream)
la_area_codes_df = la_area_codes_df.set_index('area_code')