This copies everything in the bucket. If we only need a subset of the files, we could either put that subset in a separate bucket or add filtering here (see the prefix-filtering sketch after the script below).
I tested this against my own S3 storage, using an IAM user with the AmazonS3ReadOnlyAccess policy.
$ pip install boto3
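By default, boto3 picks up the IAM user's access key from the standard locations (environment variables, ~/.aws/credentials, etc.), so the script below does not pass credentials explicitly. A minimal sketch of using a named profile instead, assuming a hypothetical profile name that is not part of the original issue:

import boto3

# Assumes a profile named "nc-finance" was created with `aws configure --profile nc-finance`.
# Omit profile_name to fall back on environment variables or the default profile.
session = boto3.Session(profile_name='nc-finance')
s3_resource = session.resource('s3')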
import boto3
from pathlib import Path

BUCKET_NAME = "nc-campaign-finance-storage"
LOCAL_DIR = Path.cwd() / 'data'

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(BUCKET_NAME)

# Walk every object in the bucket and download anything we don't already have.
for obj in bucket.objects.all():
    s3_file = obj.Object()
    local_file = LOCAL_DIR / s3_file.key

    # Skip files that already exist locally and match the size reported by S3.
    if local_file.exists() and local_file.stat().st_size == s3_file.content_length:
        print(f'{s3_file.key} already downloaded')
        continue

    local_file.parent.mkdir(parents=True, exist_ok=True)
    s3_file.download_file(str(local_file))
    print(s3_file.key)

print("Done")
davidpeckham added a commit to davidpeckham/CampaignFinanceDataPipeline that referenced this issue on Jul 24, 2021.
The change needs to be dynamic, downloading any and all files not already located in the Docker image (a static list of files is not sufficient), and should require elevated privileges in the form of AWS secrets.