Streaming object insert and get, please #48

Open
nlfiedler opened this issue Mar 25, 2021 · 3 comments
@nlfiedler

I can use memmap to effectively stream a large file when calling create_object(), but get() always returns a Vec<u8> result. I see there are commented-out "writer" and "reader" functions, so I'm filing this request just to track the need for this feature. For my use case, I'm always going to be dealing with files that are 64 MB or larger, so streaming would be good.

P.S. The google_storage1 crate defines a ReadSeek trait that is used for uploading files. For downloads, I think they rely on hyper, which enables std::io::copy() directly into a file.
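
For reference, the memmap workaround looks roughly like this (using the memmap2 crate; the create_object call below is a hypothetical stand-in, since the real signature may take an owned Vec<u8> instead of a slice):

```rust
use std::fs::File;

use memmap2::Mmap;

// Hypothetical stand-in for the crate's create_object; the real
// signature may differ.
async fn create_object(_bucket: &str, _name: &str, _mime: &str, _data: &[u8]) {}

async fn upload_large_file(path: &str) -> std::io::Result<()> {
    let file = File::open(path)?;
    // Safety: the file must not be resized or modified while mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    // The OS faults pages in lazily, so the 64 MB+ file never has to
    // be copied into an owned buffer up front.
    create_object("my-bucket", "big.bin", "application/octet-stream", &mmap[..]).await;
    Ok(())
}
```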

@Hirevo Hirevo self-assigned this Mar 25, 2021
@Hirevo Hirevo added C-enhancement Category: Enhancement M-storage Module: Cloud Storage P-medium Priority: Medium labels Mar 25, 2021
@Hirevo
Member

Hirevo commented Mar 25, 2021

Hello!

I agree that the ability to read and write GCS objects in a streaming fashion is definitely valuable.
The main thing that has blocked the implementation so far is that it was unclear what the API for it should be.

I am currently considering the following API:

```rust
impl Object {
    // `ObjectReader` would implement `futures_io::AsyncRead`.
    pub async fn reader(&mut self) -> Result<ObjectReader, Error> {
        // ...
    }

    // `ObjectWriter` would implement `futures_io::AsyncWrite`.
    pub async fn writer(&mut self, mime_type: impl AsRef<str>) -> Result<ObjectWriter, Error> {
        // ...
    }
}
```
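
From the caller's side, the first design could be used like this (a sketch assuming ObjectReader implements futures_io::AsyncRead and is Unpin, and that Error converts from std::io::Error):

```rust
use futures::io::AsyncReadExt; // brings `read` for futures_io::AsyncRead types

async fn download(object: &mut Object) -> Result<(), Error> {
    let mut reader = object.reader().await?;
    let mut chunk = [0u8; 8192];
    loop {
        let n = reader.read(&mut chunk).await?;
        if n == 0 {
            break; // end of object
        }
        // The caller fully controls the iteration here: hash the
        // bytes, write them to disk, forward them somewhere, etc.
        handle_chunk(&chunk[..n]);
    }
    Ok(())
}

// Hypothetical stand-in for whatever the caller does with each chunk.
fn handle_chunk(_bytes: &[u8]) {}
```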

But other crates sometimes go for an API that resembles the following:

```rust
impl Object {
    // Asynchronously streams the bytes of the GCS object into the provided writer.
    pub async fn streaming_get<W: AsyncWrite>(&mut self, writer: W) -> Result<(), Error> {
        // ...
    }

    // Asynchronously streams the bytes from the provided reader into the GCS object.
    pub async fn streaming_put<R: AsyncRead>(&mut self, mime_type: impl AsRef<str>, reader: R) -> Result<(), Error> {
        // ...
    }
}
```

I was more inclined to implement the first design rather than the second because the second moves the iteration process out of the caller's control. That makes it harder to just iterate over the bytes manually, without needing some kind of in-memory AsyncRead/AsyncWrite IO pipe, like the one from sluice.

But I suspect that even the first design might require this kind of in-memory IO pipe to implement the writer method.
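
To illustrate, here is a very rough sketch of how writer could be built on top of sluice's pipe (the spawn_upload helper is a stand-in for the actual HTTP and executor machinery):

```rust
use sluice::pipe::{pipe, PipeReader, PipeWriter};

// Sketch only: wraps the pipe's write half; a real implementation
// would forward `futures_io::AsyncWrite` to `inner` and finalize the
// upload on close.
pub struct ObjectWriter {
    inner: PipeWriter,
}

impl Object {
    pub async fn writer(&mut self, mime_type: impl AsRef<str>) -> Result<ObjectWriter, Error> {
        let (reader, writer) = pipe();
        // The upload task consumes the read half as the streaming
        // request body while the caller writes into the write half.
        spawn_upload(self, mime_type.as_ref(), reader);
        Ok(ObjectWriter { inner: writer })
    }
}

// Hypothetical helper: hand `body` to the HTTP client as a streaming
// request body and drive the upload on some executor.
fn spawn_upload(_object: &Object, _mime: &str, _body: PipeReader) {}
```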

@Roba1993

Hi!

I'm currently implementing a storage API in the following project of mine: https://github.com/Roba1993/stow
Maybe you can get some ideas on how to solve it from there. I went with an AsyncRead for both file get and put, which works quite nicely.
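
The rough shape of that interface (simplified here, not stow's exact trait definition):

```rust
use futures::io::AsyncRead;

// Simplified sketch of the idea, not stow's actual trait.
#[async_trait::async_trait]
pub trait Storage {
    type Error;

    // `get` hands back a reader that the caller drains at its own pace.
    async fn get(&self, path: &str) -> Result<Box<dyn AsyncRead + Send + Unpin>, Self::Error>;

    // `put` consumes a reader and streams it into the backing store.
    async fn put(
        &self,
        path: &str,
        data: Box<dyn AsyncRead + Send + Unpin>,
    ) -> Result<(), Self::Error>;
}
```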

@abonander

We use rusoto_s3 for pushing to Cloud Storage using the S3-compatible API. It works pretty well and supports streaming bodies. Here's a decent example from their integration tests: https://github.com/rusoto/rusoto/blob/master/integration_tests/tests/s3.rs#L865
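
In the same spirit, a rough sketch of a streaming put against GCS's S3-compatible endpoint (bucket, key, and credentials setup are placeholders):

```rust
use futures::StreamExt;
use rusoto_core::{ByteStream, Region};
use rusoto_s3::{PutObjectRequest, S3, S3Client};
use tokio_util::codec::{BytesCodec, FramedRead};

async fn streaming_put(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // GCS's S3-compatible XML API lives at storage.googleapis.com
    // (authenticated with HMAC interoperability keys).
    let region = Region::Custom {
        name: "auto".to_string(),
        endpoint: "https://storage.googleapis.com".to_string(),
    };
    let client = S3Client::new(region);

    let file = tokio::fs::File::open(path).await?;
    let len = file.metadata().await?.len();
    // Frame the file into Bytes chunks; the request body is read
    // chunk by chunk instead of buffering the whole file in memory.
    let body = FramedRead::new(file, BytesCodec::new()).map(|chunk| chunk.map(|b| b.freeze()));

    client
        .put_object(PutObjectRequest {
            bucket: "my-bucket".to_string(), // placeholder
            key: "my-object".to_string(),    // placeholder
            content_length: Some(len as i64),
            body: Some(ByteStream::new(body)),
            ..Default::default()
        })
        .await?;
    Ok(())
}
```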
