Commit

BUG: error in read_excel with some ods files pandas-dev#45598 (pandas-dev#46050)

* BUG: error in read_excel with some ods files pandas-dev#45598

* BUG: use hasattr instead of dir

* DOC: add issue number in new test case

* DOC: remove comment

Co-authored-by: Dimitra Karadima <[email protected]>
dimitra-karadima and dimitra-karadima authored Mar 3, 2022
1 parent e4162cd commit 004b4c5
Showing 4 changed files with 19 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -406,6 +406,7 @@ I/O
 - Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`)
 - Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`)
 - Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`)
+- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements(:issue:`45598`)

 Period
 ^^^^^^
8 changes: 6 additions & 2 deletions pandas/io/excel/_odfreader.py
@@ -112,7 +112,11 @@ def get_sheet_data(
         table: list[list[Scalar | NaTType]] = []

         for sheet_row in sheet_rows:
-            sheet_cells = [x for x in sheet_row.childNodes if x.qname in cell_names]
+            sheet_cells = [
+                x
+                for x in sheet_row.childNodes
+                if hasattr(x, "qname") and x.qname in cell_names
+            ]
             empty_cells = 0
             table_row: list[Scalar | NaTType] = []

@@ -243,5 +247,5 @@ def _get_cell_string_value(self, cell) -> str:
                     # https://github.com/pandas-dev/pandas/pull/36175#discussion_r484639704
                     value.append(self._get_cell_string_value(fragment))
             else:
-                value.append(str(fragment))
+                value.append(str(fragment).strip("\n"))
         return "".join(value)
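
The two changes to `_odfreader.py` can be illustrated with a small standalone sketch. It is an analogy only: it uses the standard library's `xml.dom.minidom` rather than odfpy, checks `tagName` where the commit checks `qname`, and the row markup and the helper names `element_children` and `joined_text` are hypothetical. The idea it shows is the same as in the hunks above: newlines between XML elements surface as text nodes that lack the element-only attribute, so the filter guards the attribute access with `hasattr`, and string fragments have their leading and trailing newlines stripped before being joined.

```python
from xml.dom import minidom

# Hypothetical row markup: the newlines between elements become text nodes.
ROW_XML = "<row>\n<cell>1</cell>\n<cell>2</cell>\n</row>"


def element_children(row, names):
    """Keep only the child nodes whose tag name is listed in `names`."""
    return [
        node
        for node in row.childNodes
        # minidom text nodes have no `tagName`, so guard the attribute access.
        if hasattr(node, "tagName") and node.tagName in names
    ]


def joined_text(fragments):
    """Join fragments after trimming leading and trailing newlines from each."""
    return "".join(str(fragment).strip("\n") for fragment in fragments)


row = minidom.parseString(ROW_XML).documentElement
cells = element_children(row, {"cell"})
```

Note that `strip("\n")` only trims newlines at the ends of each fragment; spaces and newlines in the middle of a fragment are left alone.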
Binary file added pandas/tests/io/data/excel/test_newlines.ods
Binary file not shown.
12 changes: 12 additions & 0 deletions pandas/tests/io/excel/test_odf.py
@@ -36,3 +36,15 @@ def test_read_writer_table():
     result = pd.read_excel("writertable.odt", sheet_name="Table1", index_col=0)

     tm.assert_frame_equal(result, expected)
+
+
+def test_read_newlines_between_xml_elements_table():
+    # GH#45598
+    expected = pd.DataFrame(
+        [[1.0, 4.0, 7], [np.nan, np.nan, 8], [3.0, 6.0, 9]],
+        columns=["Column 1", "Column 2", "Column 3"],
+    )
+
+    result = pd.read_excel("test_newlines.ods")
+
+    tm.assert_frame_equal(result, expected)
