This copies everything in the bucket. If we only need a subset of the files, we could either put that subset in a separate bucket or add filtering here (see the prefix-filtering sketch after the script below).
I tested this against my own S3 storage, using an IAM user with the AmazonS3ReadOnlyAccess policy.
$ pip install boto3
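By default, boto3 picks up the IAM user's access key from the standard locations (environment variables, ~/.aws/credentials, etc.), so the script below does not pass credentials explicitly. A minimal sketch of using a named profile instead, assuming a hypothetical profile name that is not part of the original issue:

import boto3

# Assumes a profile named "nc-finance" was created with `aws configure --profile nc-finance`.
# Omit profile_name to fall back on environment variables or the default profile.
session = boto3.Session(profile_name='nc-finance')
s3_resource = session.resource('s3')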
import boto3
from pathlib import Path

BUCKET_NAME = "nc-campaign-finance-storage"
LOCAL_DIR = Path.cwd() / 'data'

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(BUCKET_NAME)

# Walk every object in the bucket and download anything we don't already have.
for obj in bucket.objects.all():
    s3_file = obj.Object()
    local_file = LOCAL_DIR / s3_file.key

    # Skip files that already exist locally and match the size reported by S3.
    if local_file.exists() and local_file.stat().st_size == s3_file.content_length:
        print(f'{s3_file.key} already downloaded')
        continue

    local_file.parent.mkdir(parents=True, exist_ok=True)
    s3_file.download_file(str(local_file))
    print(s3_file.key)

print("Done")
davidpeckham added a commit to davidpeckham/CampaignFinanceDataPipeline that referenced this issue on Jul 24, 2021.
The change needs to be dynamic, downloading any and all files not already located in the Docker image (a static list of files is not sufficient), and should require elevated privileges in the form of AWS secrets.