Implement File Format Reader/Writer #72
…o8#53) - Added CsvOptions struct to support CSV read options like `header`, `delimiter`, and `nullValue`.
- Implemented ConfigOpts trait for CsvOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `csv` method that accepts CsvOptions.
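The conversion described in this commit can be pictured with a minimal sketch. It is not the code from the PR: the trait method name `to_options`, the builder methods, and the field types are assumptions made for illustration. Only the struct name, the trait name, and the three option names come from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the trait: turn a typed options struct into
/// string key-value pairs. The method name `to_options` is an assumption.
pub trait ConfigOpts {
    fn to_options(&self) -> HashMap<String, String>;
}

/// Illustrative CSV read options. Unset fields are left out of the map.
#[derive(Debug, Clone, Default)]
pub struct CsvOptions {
    pub header: Option<bool>,
    pub delimiter: Option<char>,
    pub null_value: Option<String>,
}

impl CsvOptions {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn header(mut self, value: bool) -> Self {
        self.header = Some(value);
        self
    }

    pub fn delimiter(mut self, value: char) -> Self {
        self.delimiter = Some(value);
        self
    }

    pub fn null_value(mut self, value: &str) -> Self {
        self.null_value = Some(value.to_string());
        self
    }
}

impl ConfigOpts for CsvOptions {
    fn to_options(&self) -> HashMap<String, String> {
        let mut options = HashMap::new();
        if let Some(header) = self.header {
            options.insert("header".to_string(), header.to_string());
        }
        if let Some(delimiter) = self.delimiter {
            options.insert("delimiter".to_string(), delimiter.to_string());
        }
        if let Some(null_value) = &self.null_value {
            // The key uses the option name quoted in the commit message.
            options.insert("nullValue".to_string(), null_value.clone());
        }
        options
    }
}
```

Under these assumptions, `CsvOptions::new().header(true).delimiter(';').to_options()` yields a two-entry map that a reader could forward as options.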
…o8#54) - Added documentation for the CsvOptions struct.
- Updated the csv method in DataFrameReader to support both single string slices and arrays of string slices as input paths.
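One way a single method can accept either one path or an array of paths is a small conversion trait. The sketch below is an assumption about how that could be done, not the approach taken in the PR: `ToPaths` and `collect_paths` are hypothetical names introduced only for this example.

```rust
/// Hypothetical helper trait: anything that can be turned into a list of paths.
pub trait ToPaths {
    fn to_paths(self) -> Vec<String>;
}

/// A single string slice becomes a one-element list.
impl ToPaths for &str {
    fn to_paths(self) -> Vec<String> {
        vec![self.to_string()]
    }
}

/// A fixed-size array of string slices becomes a list of the same length.
impl<const N: usize> ToPaths for [&str; N] {
    fn to_paths(self) -> Vec<String> {
        self.iter().map(|path| path.to_string()).collect()
    }
}

/// Stand-in for a reader method such as `csv`: it only normalizes the input.
pub fn collect_paths<P: ToPaths>(paths: P) -> Vec<String> {
    paths.to_paths()
}
```

With this shape, both `collect_paths("data/a.csv")` and `collect_paths(["data/a.csv", "data/b.csv"])` compile against the same signature.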
…so8#54) - Added JsonOptions struct to support JSON read options like `schema`, `multi_line`, `encoding`, and more.
- Implemented ConfigOpts trait for JsonOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `json` method that accepts JsonOptions.
- Documented all available JSON options, including example usage for setting options when reading JSON files.
[TO DO]
- Write tests to validate JSON options functionality.
…o8#54) - Example usage provided for setting ORC options when reading files.
- Write tests to validate ORC options functionality.
…russo8#54) - Added ParquetOptions struct to support Parquet read options like `mergeSchema`, `pathGlobFilter`, and `recursiveFileLookup`.
- Implemented ConfigOpts trait for ParquetOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `parquet` method that accepts ParquetOptions.
- Example usage provided for setting Parquet options when reading files.
- Write tests to validate Parquet options functionality.
…so8#54) - Added TextOptions struct to support text read options like `wholetext`, `lineSep`, and `pathGlobFilter`.
- Implemented ConfigOpts trait for TextOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `text` method that accepts TextOptions.
- Example usage provided for setting text options when reading files.
- Write tests to validate text options functionality.
…riter (sjrusso8#54) - Added TextOptions struct to support text write options such as `whole_text` and `line_sep`.
- Added ParquetOptions struct to support Parquet write options like `merge_schema`, `path_glob_filter`, and `datetime_rebase_mode`.
- Implemented `write` method in DataFrameWriter to handle configuration for text and Parquet file formats.
- Example usage provided for setting text and Parquet options when writing DataFrames.
- Write tests to validate text and Parquet file writing functionality.
…russo8#54) - Added support for reading and writing .csv, .json, .orc, .parquet, and .text file formats.
- Created `ConfigOpts` trait for each file type to manage options in a structured way.
- Added example method signatures for file reading using a configurable options object passed into methods.
Overall this looks good! Left some comments for your consideration. I think the keys used when creating the `HashMap` need to be the camelCase option names from the Spark docs.
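To illustrate the point about key names, here is a minimal sketch, assuming a `to_options` method and `Option`-typed fields (neither is confirmed by the PR). The Rust fields stay snake_case, while the map keys use the names Spark documents for the text source: `wholetext`, `lineSep`, and `pathGlobFilter`.

```rust
use std::collections::HashMap;

/// Illustrative text options with idiomatic snake_case field names.
#[derive(Debug, Clone, Default)]
pub struct TextOptions {
    pub whole_text: Option<bool>,
    pub line_sep: Option<String>,
    pub path_glob_filter: Option<String>,
}

impl TextOptions {
    /// Hypothetical conversion: keys follow Spark's option names,
    /// not the Rust field names.
    pub fn to_options(&self) -> HashMap<String, String> {
        let mut options = HashMap::new();
        if let Some(whole_text) = self.whole_text {
            options.insert("wholetext".to_string(), whole_text.to_string());
        }
        if let Some(line_sep) = &self.line_sep {
            options.insert("lineSep".to_string(), line_sep.clone());
        }
        if let Some(filter) = &self.path_glob_filter {
            options.insert("pathGlobFilter".to_string(), filter.clone());
        }
        options
    }
}
```

A key such as `line_sep` would be sent as-is and would not match the documented `lineSep` option, which is why the mapping is spelled out per field here.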
…jrusso8#54) - Implemented additional fields in ParquetOptions compression.
- Updated test_dataframe_read_parquet_with_options to ensure valid compression codec usage.
- Enhanced test_dataframe_read_text_with_options to properly read lines by setting line_sep and disabling whole_text.
- Implemented the #[derive(Debug, Clone)] traits for all Option structs.
- Updated expected path_glob_filter type to string.
- Added the compression field to ParquetOptions, OrcOptions, and JsonOptions.
- Updated documentation for all Options structs to include descriptions for new and existing fields.
…usso8#54) - Introduced CommonFileOptions to handle common configuration fields such as:
  - path_glob_filter
  - recursive_file_lookup
  - ignore_corrupt_files
  - ignore_missing_files
  - modified_before
  - modified_after
- Updated CsvOptions, JsonOptions, OrcOptions, ParquetOptions, and TextOptions to use CommonFileOptions for the shared fields.
- Updated the new() constructors for each file format options struct to initialize CommonFileOptions.
- Refactored tests for each file format (e.g., ORC, CSV) to utilize the new CommonFileOptions, ensuring that both format-specific and shared options are properly tested.
- Updated and verified tests for DataFrame reading and writing operations with updated options.
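The refactor to shared fields can be sketched as plain composition. The following is an assumption about the shape, not the PR's code: the `common` field name, the `put` helper, and the `to_options` methods are hypothetical, and `merge_schema` is used only as an example of a format-specific field. The six shared field names come from the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical helper: insert a key only when the value is set.
fn put<T: ToString>(options: &mut HashMap<String, String>, key: &str, value: &Option<T>) {
    if let Some(v) = value {
        options.insert(key.to_string(), v.to_string());
    }
}

/// Fields shared by every file format, as listed in the commit message.
#[derive(Debug, Clone, Default)]
pub struct CommonFileOptions {
    pub path_glob_filter: Option<String>,
    pub recursive_file_lookup: Option<bool>,
    pub ignore_corrupt_files: Option<bool>,
    pub ignore_missing_files: Option<bool>,
    pub modified_before: Option<String>,
    pub modified_after: Option<String>,
}

impl CommonFileOptions {
    pub fn to_options(&self) -> HashMap<String, String> {
        let mut options = HashMap::new();
        put(&mut options, "pathGlobFilter", &self.path_glob_filter);
        put(&mut options, "recursiveFileLookup", &self.recursive_file_lookup);
        put(&mut options, "ignoreCorruptFiles", &self.ignore_corrupt_files);
        put(&mut options, "ignoreMissingFiles", &self.ignore_missing_files);
        put(&mut options, "modifiedBefore", &self.modified_before);
        put(&mut options, "modifiedAfter", &self.modified_after);
        options
    }
}

/// A format-specific struct embeds the shared options (field name assumed).
#[derive(Debug, Clone, Default)]
pub struct OrcOptions {
    pub merge_schema: Option<bool>,
    pub common: CommonFileOptions,
}

impl OrcOptions {
    /// Start from the shared options, then add the format-specific ones.
    pub fn to_options(&self) -> HashMap<String, String> {
        let mut options = self.common.to_options();
        put(&mut options, "mergeSchema", &self.merge_schema);
        options
    }
}
```

Keeping the shared keys in one place means each format struct only has to list what is unique to it.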
Hi @sjrusso8 I just updated the PR.
@lexara-prime-ai LGTM! Just update the README.md and mark these as closed. Then I'll merge in the change.
Just updated the
Description
This pull request enables DataFrames to be read from and written to various file formats (CSV, JSON, ORC, Parquet) using a set of predefined options implemented via the `ConfigOpts` trait. Key changes include:
DataFrameWriter:
- … `csv`, `json`, `orc`, `parquet` and `text` formats.
- … (`csv`, `json`, `orc`, `parquet` and `text`).
- `ConfigOpts` trait implemented for `JsonOptions`, `OrcOptions`, and `ParquetOptions`.
- … `TableOptions` tests in DataFusion.
DataFrameReader:
- … `csv`, `json`, `orc`, `parquet` and `text` formats.
- … `ConfigOpts` trait for flexible configuration.
- … `DataFrameReader` configurations and file parsing.
Related Issue(s)
Documentation