Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low-order non-printable ASCII characters in worksheet name should be escaped or removed #654

Open
khiav223577 opened this issue Aug 24, 2020 · 5 comments

Comments

@khiav223577
Copy link

khiav223577 commented Aug 24, 2020

The worksheet name was escaped as HTML since #112.
But It seems to be not working with some special characters, "\b" character for example.
I think it should be escaped as XML.

See: https://support.microsoft.com/en-us/help/315580/prb-error-message-when-an-xml-document-contains-low-order-ascii-charac

The following are the character ranges for low-order non-printable ASCII characters that are rejected by MSXML versions 3.0 and later:
#x0 - #x8 (ASCII 0 - 8)
#xB - #xC (ASCII 11 - 12)
#xE - #x1F (ASCII 14 - 31)

Steps to reproduce

  1. Export an excel file by aslsx.
require 'axlsx'

Axlsx::Package.new do |p|
  p.workbook.add_worksheet(:name => "My sheet\b") do |sheet|
    sheet.add_row ["Data1", "Data2"]
  end
  p.serialize('simple1.xlsx')
end
  1. Double click simple1.xlsx to open it.

It will say that the excel is corrupted.
2020-08-24_154115

Possible fixes

We can use Builder::XChar.encode to escape the worksheet name.

require 'axlsx'

Axlsx::Package.new do |p|
  p.workbook.add_worksheet(:name => Builder::XChar.encode("My sheet\b")) do |sheet|
    sheet.add_row ["Data1", "Data2"]
  end
  p.serialize('simple2.xlsx')
end

The produced excel file simple2.xlsx works fine.
2020-08-24_154436

@noniq
Copy link
Collaborator

noniq commented Aug 25, 2020

Does it make sense to retain such (escaped) characters in sheet names? In cells for example they are removed, see Cell#clean_value which calls Axlsx.sanitize which removes control characters.

@khiav223577
Copy link
Author

@noniq I'm not sure which is better, removing or escaping non-printable ASCII characters in sheet names. If it is undocumented or is undefined behavior, it's best to be removed. if it is a valid name, it make sense to retain it.

@noniq
Copy link
Collaborator

noniq commented Nov 9, 2020

My gut feeling is that non-printable characters should not be allowed to be part of a sheet name. Are there any references to what constitutes a valid sheet name in the Excel docs?

@khiav223577
Copy link
Author

I didn't find any references about what to do with non-printable characters 🤔.
Either removing or escaping is okay to me.

@khiav223577 khiav223577 changed the title Low-order non-printable ASCII characters in worksheet name should be escaped Low-order non-printable ASCII characters in worksheet name should be escaped or removed Nov 11, 2020
@noniq
Copy link
Collaborator

noniq commented Nov 11, 2020

Would you like to prepare a pull request? axlsx is currently unmaintained though, so please create the pull request in the https://github.com/caxlsx/caxlsx repository (that's the actively maintained community fork).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants