feat: support read row group from packed reader #157
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: shaoting-huang The full list of commands accepted by this bot can be found here. The pull request process is described here
0485e2a to 136fe37
const std::string& path,
const std::shared_ptr<arrow::Schema> schema,
const size_t start_row_group,
const size_t end_row_group,
It is not clear whether end_row_group is inclusive or exclusive.
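One way to remove this ambiguity is to encode the convention in a small type rather than in two bare size_t parameters. The sketch below assumes a half-open range [begin, end); RowGroupRange and its members are hypothetical names for illustration and do not appear in this PR.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the PR: a half-open row group range
// [begin, end), so `end` is one past the last row group to read.
struct RowGroupRange {
  std::size_t begin;  // first row group to read (inclusive)
  std::size_t end;    // one past the last row group to read (exclusive)

  // Number of row groups covered; an inverted range counts as empty.
  std::size_t size() const { return end > begin ? end - begin : 0; }

  // True if row group index `i` falls inside [begin, end).
  bool contains(std::size_t i) const { return i >= begin && i < end; }
};
```

With this convention, a range of {2, 5} covers row groups 2, 3 and 4, and {3, 3} is empty. An inclusive convention would work equally well, as long as the parameter names or a doc comment state which one is used.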
@@ -57,9 +67,19 @@ class PackedRecordBatchReader : public arrow::RecordBatchReader {

arrow::Status ReadNext(std::shared_ptr<arrow::RecordBatch>* batch) override;

Result<std::shared_ptr<arrow::Table>> ReadRowGroup(int file_index, int row_group_index);
This function does not seem necessary.
@@ -70,6 +90,8 @@ class PackedRecordBatchReader : public arrow::RecordBatchReader {
size_t memory_limit_;
size_t buffer_available_;
std::vector<std::unique_ptr<parquet::arrow::FileReader>> file_readers_;
std::vector<size_t> start_row_groups_;
It would be much cleaner to move these attributes into a separate FileRecordBatchReader, because they make no sense in the packed scenario.
cpp/src/packed/reader.cpp
Outdated
std::vector<ColumnOffset>(),
std::set<int>(),
std::vector<size_t>(1, start_row_group),
std::vector<size_t>(1, end_row_group),
These two arguments need a validity check.
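A minimal sketch of such a check follows. It assumes the range is half-open, [start_row_group, end_row_group), and that the number of row groups in the file is already known from its metadata. ValidateRowGroupRange and num_row_groups are hypothetical names, not part of this PR; in the reader itself the failure would more likely be reported as an arrow::Status than as a string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical check, not from the PR. Assumes a half-open range
// [start_row_group, end_row_group) and a known row group count.
// Returns an empty string when the range is valid, otherwise an error message.
std::string ValidateRowGroupRange(std::size_t start_row_group,
                                  std::size_t end_row_group,
                                  std::size_t num_row_groups) {
  if (start_row_group > end_row_group) {
    return "start_row_group must not exceed end_row_group";
  }
  if (end_row_group > num_row_groups) {
    return "end_row_group exceeds the number of row groups in the file";
  }
  return std::string();  // valid, including the empty range start == end
}
```

Whether an empty range (start equal to end) should be accepted or rejected is a design choice; the sketch accepts it.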
Signed-off-by: shaoting-huang <[email protected]>
0ac6c8b to 5a106dc
Signed-off-by: shaoting-huang <[email protected]>
related: #158