Unable to read csv with "\t" as the separator #20342
Comments
It loads for me if I change
pl.read_csv("Downloads/sms+spam+collection/SMSSpamCollection", separator="\t", has_header=False, quote_char=None)
# shape: (5_574, 2)
# ┌──────────┬─────────────────────────────────┐
# │ column_1 ┆ column_2 │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞══════════╪═════════════════════════════════╡
# │ ham ┆ Go until jurong point, crazy..… │
# │ ham ┆ Ok lar... Joking wif u oni... │
# │ spam ┆ Free entry in 2 a wkly comp to… │
# │ ham ┆ U dun say so early hor... U c … │
# │ ham ┆ Nah I don't think he goes to u… │
# │ … ┆ … │
# │ spam ┆ This is the 2nd time we have t… │
# │ ham ┆ Will ü b going to esplanade fr… │
# │ ham ┆ Pity, * was in mood for that. … │
# │ ham ┆ The guy did some bitching but … │
# │ ham ┆ Rofl. Its true to its name │
# └──────────┴─────────────────────────────────┘

A minimal example.
data = b'foo\t"bar"...'
pl.read_csv(data, separator="\t", has_header=False)
# ComputeError: could not parse `"bar".` as dtype `str` at column 'column_2' (column number 2)
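
For anyone wondering why the quotes matter here: the sketch below uses Python's standard-library `csv` module, not Polars, to show how the same tab-separated text parses differently depending on whether `"` is treated as a quote character or as ordinary text. The sample rows and the `parse_tsv` helper are made up for illustration, and Polars' own parser may differ in the details.

```python
import csv
import io

# Made-up TSV rows: the second field contains double quotes as ordinary text.
raw = 'ham\t"bar"...\nspam\tsay "hi" now\n'


def parse_tsv(text, literal_quotes):
    """Parse tab-separated text, optionally treating '"' as a plain character."""
    quoting = csv.QUOTE_NONE if literal_quotes else csv.QUOTE_MINIMAL
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return list(reader)


# Quotes kept verbatim, analogous in spirit to quote_char=None.
rows_literal = parse_tsv(raw, literal_quotes=True)

# Default quote handling: a leading '"' opens a quoted field.
rows_quoted = parse_tsv(raw, literal_quotes=False)

print(rows_literal)
print(rows_quoted)
```

With quote handling disabled, every field comes back exactly as written in the file, which is what you want for a dataset like this where quotes are just part of the message text.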
Thanks, that worked for me as well.
let parse_options = CsvParseOptions::default()
    .with_separator(b'\t')
    .with_quote_char(None);
let df = CsvReadOptions::default()
    .with_parse_options(parse_options)
    .with_has_header(false)
    .try_into_reader_with_file_path(Some("SMSSpamCollection.tsv".into()))
    .unwrap()
    .finish()
    .unwrap();
Should I have known to set this option to
I don't think so. Polars did previously parse this without error - it is a very recent change. (That's why #19432 was filed as a bug.) I think perhaps an additional "Hint: " message could be added to suggest
I'm also not sure why you're getting a panic, perhaps that's also a bug. (I just use py-polars.)
Ah, I see -- thanks! I agree a more helpful error message would be nice.
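
As a rough illustration of the "Hint" idea, here is a sketch using Python's standard-library `csv` module rather than Polars. The `read_tsv_with_hint` helper and the hint wording are hypothetical. It only shows the general pattern of catching a quoting error and re-raising it with a suggestion, not how Polars reports errors.

```python
import csv
import io


def read_tsv_with_hint(text):
    """Parse tab-separated text strictly; on a quoting error, add a hint."""
    # strict=True makes the stdlib parser raise csv.Error on malformed quoting
    # instead of silently accepting it.
    reader = csv.reader(io.StringIO(text), delimiter="\t", strict=True)
    try:
        return list(reader)
    except csv.Error as exc:
        # Hypothetical hint text, written for this sketch only.
        raise ValueError(
            f"{exc}. Hint: if quotes are ordinary text in this file, "
            "disable quote handling (csv.QUOTE_NONE in this module)."
        ) from exc


# Well-formed input parses normally.
print(read_tsv_with_hint("foo\tbar\n"))

# Input like the minimal example above triggers the error plus the hint.
try:
    read_tsv_with_hint('foo\t"bar"...\n')
except ValueError as err:
    print(err)
```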
Checks
Reproducible example
Log output
Issue description
The .tsv file that I'm trying to read is the SMS-Spam dataset, which can be downloaded from the UC Irvine data repository. After downloading the .zip file and adding the extension .tsv to the SMSSpamCollection file, I ran the provided code, which was unable to read in the dataset.
Perhaps I'm not doing this correctly?
Expected behavior
That the df loads the .tsv file correctly.
Installed versions
csv, lazy