Support ingesting Excel and CSV files #659
Comments
@adamdougal Given that Azure Document Intelligence is used for some of the data loading, and that it already supports XLSX and PPT, is there a reason why these formats aren't supported here? Link to documentation: https://learn.microsoft.com/en-gb/azure/ai-services/document-intelligence/concept-layout?view=doc-intel-4.0.0 Maybe it's not using an up-to-date preview?
@ferrari-leo Heya, I'm not 100% familiar with the history, so I'm not sure whether that only started to be supported recently. However, given that it is now supported, the only things stopping this are priority and time to implement. We always welcome contributions, so if you need this feature and have time, we'd love a PR :)
@adamdougal I'll see where I can find some free time! But my main point is that if the Document Intelligence API already supports analysing the layout of an Excel file, why can't I drag an Excel file into the ingest data tab and have it processed? I can't pinpoint where the code distinguishes between an Excel file and a PDF to throw the error "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet files are not allowed." In theory, this feature shouldn't need much more development time, because it's inherent in the Document Intelligence API.
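For illustration only: an error of that shape is what a MIME-type allow-list check on upload would typically produce. The sketch below is hypothetical and not taken from this project's code; the function name `validate_upload` and the contents of `ALLOWED_MIME_TYPES` are assumptions made for the example.

```python
# Hypothetical allow-list check. The names and the list contents are
# illustrative assumptions, not this project's actual code.
ALLOWED_MIME_TYPES = frozenset({
    "application/pdf",
    "text/plain",
})

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def validate_upload(mime_type: str, allowed: frozenset = ALLOWED_MIME_TYPES) -> None:
    """Raise ValueError if the uploaded file's MIME type is not in the allow-list."""
    if mime_type not in allowed:
        raise ValueError(f"{mime_type} files are not allowed.")
```

If the restriction does work this way, accepting Excel uploads would start with adding the XLSX MIME type to the allow-list, independently of whether the downstream parser can already handle the file.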
From memory, I think we were using the old Form Recognizer API.
Hi, honestly, I'm surprised this is not supported. According to the docs at https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/data_ingestion.md#supported-document-formats (which seems to be a less sophisticated version of this repository), Excel and CSV files are supported. Looking at the code for CSVs:

    # These file formats can always be parsed:
    file_processors = {
        ".json": FileProcessor(JsonParser(), SimpleTextSplitter()),
        ".md": FileProcessor(TextParser(), sentence_text_splitter),
        ".txt": FileProcessor(TextParser(), sentence_text_splitter),
        ".csv": FileProcessor(CsvParser(), sentence_text_splitter),
    }

It references the following file.
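As a rough idea of what the CSV entry in a mapping like the one quoted above needs from its parser, here is a minimal sketch using only the Python standard library. It is not the `CsvParser` from the referenced repository; the function name `csv_rows_as_text` and the "header: value" output format are assumptions chosen for the example.

```python
import csv
import io
from typing import Iterator


def csv_rows_as_text(data: bytes, encoding: str = "utf-8") -> Iterator[str]:
    """Yield one text line per CSV data row, pairing each value with its column header."""
    # The first row is treated as the header; every later row becomes a dict.
    reader = csv.DictReader(io.StringIO(data.decode(encoding)))
    for row in reader:
        yield ", ".join(f"{key}: {value}" for key, value in row.items())


if __name__ == "__main__":
    sample = b"name,qty\napple,3\npear,5\n"
    for line in csv_rows_as_text(sample):
        print(line)
```

Keeping the column header next to each value means that a text splitter applied afterwards still produces chunks that can be understood without the header row.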
Motivation
To be able to query data in Excel and CSV file formats.
How would you feel if this feature request was implemented?
Requirements
Links
Tasks
To be filled in by the engineer picking up the issue