Handling a single file of over 100GB presents too many challenges, including network transfer limits, memory management constraints, VM capacity, and potential timeouts during processing. To mitigate these issues, I recommend partitioning the data into smaller chunks, ideally 10-20MB each. This approach will help reduce the likelihood of errors related to memory allocation or network interruptions and make the ingestion process more manageable.
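For illustration only, here is a rough sketch of what chunked ingestion could look like. It is not code from the linked repo: it assumes the decompressed bulk file is newline-delimited (one record per line) so chunks can be cut on record boundaries, it uses SharpZipLib for bzip2 decompression, and the endpoint, file names, and chunk size are placeholders.

```csharp
// Rough sketch only: assumes newline-delimited records and SharpZipLib for bzip2.
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.BZip2;
using Microsoft.KernelMemory;

const int TargetChunkBytes = 15 * 1024 * 1024; // aim for roughly 10-20MB per chunk

var memory = new MemoryWebClient("http://my-km-service:9001/"); // placeholder endpoint

using var compressed = File.OpenRead("opinions.bz2");
using var decompressed = new BZip2InputStream(compressed);
using var reader = new StreamReader(decompressed, Encoding.UTF8);

var buffer = new StringBuilder();
var chunkIndex = 0;
string? line;
while ((line = await reader.ReadLineAsync()) != null)
{
    buffer.AppendLine(line);
    if (buffer.Length >= TargetChunkBytes) // char count ~= bytes for mostly-ASCII text
    {
        await ImportChunkAsync(buffer.ToString(), chunkIndex++);
        buffer.Clear();
    }
}
if (buffer.Length > 0)
{
    await ImportChunkAsync(buffer.ToString(), chunkIndex);
}

async Task ImportChunkAsync(string content, int index)
{
    // Each chunk becomes its own small document, so a failure only affects
    // that chunk and can be retried independently.
    using var stream = new MemoryStream(Encoding.UTF8.GetBytes(content));
    await memory.ImportDocumentAsync(stream,
        fileName: $"opinions-{index:D6}.jsonl",
        documentId: $"opinions-{index:D6}");
}
```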
I'm trying to ingest a 100+GB file of legal data into a Kernel Memory service. The data I would like to access are the "opinions" files from this link (https://com-courtlistener-storage.s3-us-west-2.amazonaws.com/list.html?prefix=bulk-data/). They are zipped .bz2 files.
To ingest the data, I use azcopy to get a file into a container. I then have a function that triggers when a file lands in this container; it decompresses the .bz2 file and sends the content to Kernel Memory as a stream. The compressed file is about 30GB; once decompressed, it is over 100GB.
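Roughly, the function does the following (a simplified sketch, not the exact code in the repro repo; the container name, endpoint, and the SharpZipLib decompression are just illustrative):

```csharp
// Simplified sketch: a blob trigger fires when azcopy drops the .bz2 file, and
// the whole decompressed stream is handed to Kernel Memory as one document.
using System.IO;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.BZip2;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.KernelMemory;

public static class IngestBulkOpinions
{
    [FunctionName("IngestBulkOpinions")]
    public static async Task Run(
        [BlobTrigger("ingestion/{name}")] Stream blob,
        string name,
        ILogger log)
    {
        log.LogInformation("Ingesting blob {Name}", name);

        // Kernel Memory web service endpoint (placeholder).
        var memory = new MemoryWebClient("http://my-km-service:9001/");

        // Decompress on the fly and send the ~100GB stream as a single document.
        using var decompressed = new BZip2InputStream(blob);
        await memory.ImportDocumentAsync(decompressed, fileName: name);
    }
}
```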
This is the error message I get when I try to ingest the files into Kernel Memory:
The repository to repro the issue is here: https://github.com/Gpadh/KMFileIngestion/tree/master
Please let me know if I can provide any more details to help.