gsv.to_lists parses incorrectly if there are double quotes on any row (except the first) in the first column #6

Open
israelss opened this issue Jun 8, 2024 · 1 comment


israelss commented Jun 8, 2024

I noticed that if any row (except the first) has content inside double quotes in the first column, gsv can't parse it correctly.

EDIT: According to RFC 4180, this shouldn't be an issue (see point 5).

Test code:

import gleam/io
import gleam/list
import gleeunit
import gleeunit/should
import gsv

pub fn main() {
  gleeunit.main()
}

pub fn double_quote_on_first_colum_test() {
  let csv = "first_column,second_column\n\"First row\",Second row\n"
  let csv_list = gsv.to_lists(csv)
  let _ = io.debug(csv_list) // output: Ok([["first_column", "second_column", "First row", "Second row"]])

  case csv_list {
    Ok(result) -> list.length(result) |> should.equal(2)
    Error(error) -> should.equal(error, Nil)
  }
}

pub fn double_quote_on_any_column_excpet_first_test() {
  let csv = "first_column,second_column\nFirst row,\"Second row\"\n"
  let csv_list = gsv.to_lists(csv)
  let _ = io.debug(csv_list) // output: Ok([["first_column", "second_column"], ["First row", "Second row"]])

  case csv_list {
    Ok(result) -> list.length(result) |> should.equal(2)
    Error(error) -> should.equal(error, Nil)
  }
}

pub fn double_quote_on_first_colum_header_test() {
  let csv = "\"first_column\",second_column\nFirst row,Second row\n"
  let csv_list = gsv.to_lists(csv)
  let _ = io.debug(csv_list) // output: Ok([["first_column", "second_column"], ["First row", "Second row"]])

  case csv_list {
    Ok(result) -> list.length(result) |> should.equal(2)
    Error(error) -> should.equal(error, Nil)
  }
}

pub fn double_quote_on_any_column_header_excpet_first_test() {
  let csv = "first_column,\"second_column\"\nFirst row,Second row\n"
  let csv_list = gsv.to_lists(csv)
  let _ = io.debug(csv_list) // output: Ok([["first_column", "second_column"], ["First row", "Second row"]])

  case csv_list {
    Ok(result) -> list.length(result) |> should.equal(2)
    Error(error) -> should.equal(error, Nil)
  }
}

pub fn double_quote_on_first_colum_row_and_header_test() {
  let csv = "\"first_column\",second_column\n\"First row\",Second row\n"
  let csv_list = gsv.to_lists(csv)
  let _ = io.debug(csv_list) // output: Ok([["first_column", "second_column", "First row", "Second row"]])

  case csv_list {
    Ok(result) -> list.length(result) |> should.equal(2)
    Error(error) -> should.equal(error, Nil)
  }
}
@bcpeinhardt
Owner

Hey there! I just want to clarify and make sure I understand the issue. It looks like whitespace between the comma ending a field and the opening quote of the next field causes an issue.
This is technically outside the spec, per point 4 in the spec:

Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma.

but this is not the first time something that seems reasonable has been outside the spec.

For example, according to the spec, the last field in a record must not be followed by a comma, but commas must be used to indicate empty fields, so it's a bit confusing to produce an error there from a parsing standpoint.
I wrote this library when I was just learning Gleam, and the language has changed a bit since then as well. I think I'll probably take a shot at a version 2 of gsv that focuses on flexibility and performance rather than sticking to the spec, as it seems lots of CSV files simply don't conform to it.
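
To make the trailing-comma point above concrete, here is a minimal sketch using only the Gleam standard library. `naive_fields` is a hypothetical helper written for illustration, not part of gsv, and it ignores quoting entirely. It only shows that a plain comma splitter has no way to tell a forbidden trailing comma apart from a legitimate empty last field.

```gleam
import gleam/io
import gleam/string

// Hypothetical helper (not gsv's API): split one record on commas.
// Quoting is deliberately ignored; this is not a CSV parser.
pub fn naive_fields(record: String) -> List(String) {
  string.split(record, on: ",")
}

pub fn main() {
  // An empty middle field: three fields, the second one empty.
  io.println(string.inspect(naive_fields("a,,c")))
  // A trailing comma: also three fields, the last one empty.
  io.println(string.inspect(naive_fields("a,b,")))
}
```

A parser that follows the spec strictly would have to reject the second record, while a more lenient one would treat it as a record with an empty third field.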
