I have a ~5 GB blob that I'm consuming in Python code. The Google Compute Engine VM is in the same region as the Google Storage bucket.
When using gcloud storage cp gs://my-bucket/my-blob it takes less than a minute to download to the VM.
When using python -c "with open('/gcs/my-bucket/my-blob', 'rb') as f: f.read()" it takes several minutes to download the blob into the running process.
I haven't tried whether cp /gcs/my-bucket/my-blob ~ would be faster, but I assume it's also slower than gcloud storage.
Why is this, and can we expect the same excellent download speeds that gcloud storage offers from simple reads from /gcs in a future release of GCSFuse? The convenience of "just read like a regular file system" is very much appreciated, and we don't want to introduce additional bucket storage clients if we can avoid it.
@carlthome I think your Python test script could be written more efficiently. Instead of reading the entire ~5 GB into memory with a single f.read(), you could read the content in chunks:
with open('<file_path>', 'rb') as f:
    while f.read(1024 * 1024):
        pass
cp <file_path> <dest_path> should also not take much time.
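To put a number on the difference, a minimal sketch along these lines could time the chunked read. The helper name measure_read_throughput and the 1 MiB default chunk size are illustrative assumptions, not anything provided by GCSFuse:

```python
import time


def measure_read_throughput(path, chunk_size=1024 * 1024):
    """Read `path` sequentially in chunks and return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            total += len(chunk)  # count bytes without keeping them in memory
    elapsed = time.perf_counter() - start
    return total, elapsed


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py <file_path>
    if len(sys.argv) > 1:
        n_bytes, seconds = measure_read_throughput(sys.argv[1])
        print(f"read {n_bytes} bytes in {seconds:.2f} s")
```

Running it against the mounted path and dividing bytes by seconds gives a figure that can be set against the gcloud storage cp timing. Trying a few chunk sizes may also show whether the read size matters on your mount.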
You can also try a few of the perf-optimizations mentioned here. Optimizations such as the following could be useful: