Slow read of large blobs compared to gcloud storage #2726

Open
carlthome opened this issue Nov 29, 2024 · 1 comment
Labels
p2 (P2) · pending customer action · question (Customer Issue: question about how to use tool)

Comments


carlthome commented Nov 29, 2024

I have a ~5 GB blob that I'm consuming in Python code. The Google Compute Engine VM is in the same region as the Google Storage bucket.

When using gcloud storage cp gs://my-bucket/my-blob, it takes less than a minute to download to the VM.

When using python -c "with open('/gcs/my-bucket/my-blob', 'rb') as f: f.read()" it takes several minutes to download the blob into the running process.

I haven't tried whether cp /gcs/my-bucket/my-blob ~ would be faster, but I assume it's also slower than gcloud storage.

Why is this, and can we expect the same high download speeds that gcloud storage offers from simple reads of /gcs in a future GCSFuse release? The convenience of "just read it like a regular file system" is much appreciated, and we don't want to introduce additional bucket storage clients if we can avoid it.

carlthome added the p2 (P2) and question (Customer Issue: question about how to use tool) labels on Nov 29, 2024

kislaykishore (Collaborator) commented Nov 29, 2024

@carlthome I think your Python test script could be written more efficiently. Instead of downloading the entire 5 GiB into memory when you invoke f.read(), you could read the content in chunks:

# Read sequentially in fixed-size 1 MiB chunks so memory use stays flat.
with open('<file_path>', 'rb') as f:
    while f.read(1024 * 1024):
        pass
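
If you want to compare throughput directly, here's a minimal timing sketch of the chunked read (the path is a placeholder for your GCSFuse mount, and the 1 MiB chunk size is just an assumption worth experimenting with):

import time

path = '/gcs/my-bucket/my-blob'  # placeholder: path under your GCSFuse mount
chunk_size = 1024 * 1024         # 1 MiB; try larger values too

start = time.monotonic()
total = 0
with open(path, 'rb') as f:
    # Stream the blob chunk by chunk and count the bytes read.
    while chunk := f.read(chunk_size):
        total += len(chunk)
elapsed = time.monotonic() - start
print(f'{total / 1e9:.2f} GB in {elapsed:.1f} s '
      f'({total / 1e6 / elapsed:.0f} MB/s)')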

cp <file_path> <dest_path> should also not take much time.

You can also try a few of the performance optimizations mentioned here; some of them could be useful for this workload.

Lemme know how it goes.
