Incompatibility between pandas and rust #20366

Open
2 tasks done
moghadas76 opened this issue Dec 19, 2024 · 4 comments
Labels
bug Something isn't working needs repro Bug does not yet have a reproducible example needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@moghadas76

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

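import pandas as pd
import polars as pl
df.to_csv(...)    # df: the pandas DataFrame being exported (not included in the report)
pl.read_csv(...)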

Log output

File ~/miniconda3/envs/env/lib/python3.11/site-packages/polars/io/csv/functions.py:672, in _read_csv_impl(source, has_header, columns, separator, comment_prefix, quote_char, skip_rows, schema, schema_overrides, null_values, missing_utf8_is_empty_string, ignore_errors, try_parse_dates, n_threads, infer_schema_length, batch_size, n_rows, encoding, low_memory, rechunk, skip_rows_after_header, row_index_name, row_index_offset, sample_size, eol_char, raise_if_empty, truncate_ragged_lines, decimal_comma, glob)
    668         raise ValueError(msg)
    670 projection, columns = parse_columns_arg(columns)
--> 672 pydf = PyDataFrame.read_csv(
    673     source,
    674     infer_schema_length,
    675     batch_size,
    676     has_header,
    677     ignore_errors,
    678     n_rows,
    679     skip_rows,
    680     projection,
    681     separator,
    682     rechunk,
    683     columns,
    684     encoding,
    685     n_threads,
    686     path,
    687     dtype_list,
    688     dtype_slice,
    689     low_memory,
    690     comment_prefix,
    691     quote_char,
    692     processed_null_values,
    693     missing_utf8_is_empty_string,
    694     try_parse_dates,
    695     skip_rows_after_header,
    696     parse_row_index_args(row_index_name, row_index_offset),
    697     eol_char=eol_char,
    698     raise_if_empty=raise_if_empty,
    699     truncate_ragged_lines=truncate_ragged_lines,
    700     decimal_comma=decimal_comma,
    701     schema=schema,
    702 )
    703 return wrap_df(pydf)

ComputeError: could not parse `A` as dtype `i64` at column 'properties_local_ref' (column number 36)

The current offset in the file is 7327452 bytes.

You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying correct dtype with the `schema_overrides` argument
- setting `ignore_errors` to `True`,
- adding `A` to the `null_values` list.

Original error: remaining bytes non-empty

Issue description

Writing a DataFrame to CSV with pandas and reading the file back with pl.read_csv fails with the ComputeError shown in the log output above.

Expected behavior

The CSV written by pandas should load consistently with pl.read_csv.

Installed versions

--------Version info---------
Polars:              1.14.0
Index type:          UInt32
Platform:            Linux-6.8.0-49-generic-x86_64-with-glibc2.35
Python:              3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
LTS CPU:             False
moghadas76 added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Dec 19, 2024
orlp added the needs repro Bug does not yet have a reproducible example label Dec 19, 2024
@orlp
Collaborator

orlp commented Dec 19, 2024

Without a dataset that reproduces the issue, we can't inspect it or fix any potential bugs.

@moghadas76
Author

I want to contribute to the project. Could you tell me what the procedure is?

@orlp
Collaborator

orlp commented Dec 19, 2024

@moghadas76 As I said, the first step is to provide a minimal reproducible example of the problem. That includes any data necessary to reproduce it.

rodrigogiraoserrao changed the title Inompatibility between pandas and rust Incompatibility between pandas and rust Dec 19, 2024
@hutch3232

What's the dtype of properties_local_ref when it's in a pd.DataFrame? The error makes it clear: the parser inferred that column as an integer, then encountered an "A", which is why it failed.

It also suggests how to work around that.
