Improve error messages / documentation around scanning hive directories #17436

Closed
nameexhaustion opened this issue Jul 5, 2024 · 0 comments · Fixed by #17480
Labels: A-io-partitioning (Area: reading/writing (Hive) partitioned files) · accepted (Ready for implementation) · enhancement (New feature or an improvement of an existing feature) · P-medium (Priority: medium)

nameexhaustion (Collaborator) commented Jul 5, 2024

Description

Some Hive-partitioned datasets don't work out of the box when the directory is passed to `scan_parquet`, because the directory also contains non-data files. The error messages we currently print in that case are very cryptic (e.g. `parquet: File out of specification: The file must end with PAR1`). We should check the file extensions and raise a clearer error message if we see mixed extensions.
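A minimal sketch of the proposed check, written in plain Python for illustration rather than as Polars' actual implementation. It walks the directory, groups files by extension, and fails with a message naming each extension and an example file when more than one is present. The names `check_uniform_extensions` and `MixedExtensionsError` are hypothetical, and treating extension-less files such as `_SUCCESS` as their own group is an assumption about what "mixed extensions" should mean.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


class MixedExtensionsError(ValueError):
    """Hypothetical error type for a directory whose files have mixed extensions."""


def check_uniform_extensions(root: str | Path) -> str:
    """Return the single file extension found anywhere under ``root``.

    Hypothetical helper: raises MixedExtensionsError when more than one
    extension is present, listing each extension with a sample path so the
    user can see which non-data files (e.g. ``_SUCCESS``) broke the scan.
    """
    root = Path(root)
    counts: Counter[str] = Counter()
    examples: dict[str, Path] = {}

    # Recursively visit every file (including inside partition dirs like year=2024/).
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix  # "" for extension-less files such as _SUCCESS
        counts[ext] += 1
        examples.setdefault(ext, path)  # remember the first file seen per extension

    if not counts:
        raise FileNotFoundError(f"no files found under {root}")

    if len(counts) > 1:
        details = ", ".join(
            f"{ext or '<no extension>'} ({n} file(s), e.g. {examples[ext].relative_to(root)})"
            for ext, n in counts.most_common()
        )
        raise MixedExtensionsError(
            f"directory {root} contains files with mixed extensions: {details}; "
            "consider passing a glob pattern that matches only the data files"
        )

    # Exactly one extension: unpack the single key of the Counter.
    (ext,) = counts
    return ext
```

Run against a directory holding `year=2024/part-0.parquet` plus a `_SUCCESS` marker, this raises an error naming both the `.parquet` files and the extension-less file instead of the `PAR1` message. Until a check like this exists, passing a glob that matches only the data files (for example `data/**/*.parquet`) may serve as a workaround, assuming the data files share a common suffix.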
