How Can I read Json, Csv or Parquet with schema? #3785
Unanswered
Miyake-Diogo asked this question in Q&A
Replies: 1 comment
-
Yes. You could try this to read https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/example.csv with a schema:

```rust
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::util::pretty;
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    let schema = generate_schema();

    // Register the CSV file with the session context, supplying the schema
    // explicitly instead of letting DataFusion infer it from the file.
    ctx.register_csv(
        "example",
        "/home/remziy/learning/datafusion/datafusion/core/tests/example.csv",
        CsvReadOptions::new().has_header(true).schema(&schema),
    )
    .await?;

    // Execute the query.
    let df = ctx.sql("SELECT * FROM example").await?;
    let results = df.collect().await?;

    // Print the results with their data types.
    println!("{:?}", results);

    // Print the results as a table.
    pretty::print_batches(&results)?;
    Ok(())
}

fn generate_schema() -> Schema {
    let field_a = Field::new("a", DataType::Int32, false);
    let field_b = Field::new("b", DataType::UInt16, false);
    let field_c = Field::new("c", DataType::Float32, false);
    Schema::new(vec![field_a, field_b, field_c])
}
```

And this is the output I get:
-
Is it possible to read a file while passing a schema, like in Spark?