Include bloom filter statistics when reading parquet metadata with ClickHouse #490

Open
Selfeer opened this issue Oct 2, 2024 · 1 comment

Selfeer commented Oct 2, 2024

Describe the new feature

We need a way to determine whether a bloom filter has been written to a Parquet file when inspecting its metadata with ClickHouse via SELECT * FROM file('output.parquet', ParquetMetadata). Currently, the output contains no mention of bloom_filter_offset when reading a Parquet file with ClickHouse.
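For reference, a minimal sketch of how the current output can be inspected, assuming a local file named output.parquet as in the query above. The DESCRIBE statement only lists the fields that ClickHouse exposes today; it is an illustration of the gap, not a proposed implementation.

```sql
-- Query from this issue: dump all metadata ClickHouse currently exposes.
SELECT * FROM file('output.parquet', ParquetMetadata);

-- List the fields of the ParquetMetadata output to confirm that
-- no bloom-filter-related field (such as bloom_filter_offset) is present.
DESCRIBE file('output.parquet', ParquetMetadata);
```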

Use case

This would let us check whether a bloom filter is present in a Parquet file as one of the QA checks done directly in ClickHouse, without relying on third-party tools such as parquet-tools.

@arthurpassos
Collaborator

This might be useful: apache/iceberg#9898 (comment)
