Allow ingesting in-memory file-like objects #435

Open
dhirschfeld opened this issue Sep 2, 2024 · 3 comments

Comments

@dhirschfeld
Contributor

Writing large amounts of data to disk, only for databricks-sql-connector to then read it back in from disk, is incredibly inefficient.

It would be much more efficient to be able to provide a file-like object instead of a filepath. That way, a user could write the data to an in-memory io.BytesIO object rather than to disk.

@dhirschfeld
Contributor Author

That is, allow the caller to pass fh through rather than having it created internally by opening a file from the filesystem:

    def _handle_staging_put(
        self, presigned_url: str, local_file: str, headers: Optional[dict] = None
    ):
        """Make an HTTP PUT request

        Raise an exception if request fails. Returns no data.
        """
        if local_file is None:
            raise Error("Cannot perform PUT without specifying a local_file")

        with open(local_file, "rb") as fh:
            r = requests.put(url=presigned_url, data=fh, headers=headers)
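
A minimal sketch of what this could look like, not the connector's actual implementation. The standalone function name handle_staging_put, the source parameter, the injectable put callable, and the use of ValueError in place of the connector's Error class are all assumptions made for illustration. It relies on requests accepting a file-like object as data, which is what the existing code already does with the opened file.

```python
import io
from contextlib import nullcontext
from typing import Any, BinaryIO, Callable, Optional, Union


def handle_staging_put(
    presigned_url: str,
    source: Union[str, BinaryIO],
    headers: Optional[dict] = None,
    put: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical variant of _handle_staging_put.

    Accepts either a filesystem path or an already-open binary stream.
    """
    if source is None:
        # The connector raises its own Error type; ValueError stands in here.
        raise ValueError("Cannot perform PUT without specifying a source")

    if put is None:
        # Imported lazily so the sketch can be exercised with a stub instead.
        import requests

        put = requests.put

    # A path is opened and closed here. A stream supplied by the caller is
    # used as-is and left open, since the caller owns it.
    ctx = open(source, "rb") if isinstance(source, str) else nullcontext(source)
    with ctx as fh:
        return put(url=presigned_url, data=fh, headers=headers)


# Usage with an in-memory buffer and a stub in place of requests.put:
def _echo_put(url, data, headers):
    return len(data.read())


buffer = io.BytesIO(b"id,value\n1,foo\n")
uploaded_bytes = handle_staging_put("https://example.invalid/presigned", buffer, put=_echo_put)
```

Leaving a caller-supplied stream open is a design choice: whoever created the buffer decides when to release it, and existing callers that pass a path see no change in behaviour.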

@kravets-levko
Contributor

Hi @dhirschfeld! This does sound like an interesting feature, thank you for sharing it! I have to talk with the rest of the team first. Databricks SQL GET and PUT commands should have a local file path specified, but I don't know if we ever considered using streams instead of real files. If we agree that there are no risks with this approach, we would eventually have to implement it across all drivers.

@susodapop
Contributor

Some added context: @dhirschfeld's idea is exactly how the e2e tests for this feature behave (since we ran them in GitHub Actions, where we don't have a real file system to write to). This should be a straightforward modification.
