UTF damage #36
Comments
I'm not Rhaley any longer, BTW. |
as on irc: what do you mean by "my UTF isn't UTF anymore"? Can you provide a code snippet that demonstrates your issue? |
I've placed the following branch up: In the test, t/36_dump_utf8.t, I take the same string and utf8 encode one element of a hash while leaving the other in wide char mode before calling YAML::Tiny::Dump. |
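For readers following along, here is a minimal sketch of the two forms of the same string that the test compares. It is not the code from t/36_dump_utf8.t: the sample string and the hash keys are assumptions made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample data; the real test string is not shown in this thread.
my $chars = "\x{263A}";                 # one wide character (Perl text string)
my $bytes = encode('UTF-8', $chars);    # the same character as UTF-8 octets

# One hash element is encoded, the other is left as wide characters.
my %data = (
    text    => $chars,
    encoded => $bytes,
);

# The two values differ in length: characters versus bytes.
printf "%-8s length %d\n", $_, length $data{$_} for sort keys %data;
```

Passing a hash like `%data` to `YAML::Tiny::Dump` would then exercise both forms in one call.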
I also noticed that the test I moved (to allow escaping of newlines and quotes) was checking for a single quote and then escaping double quotes. I changed it to check for and escape all three. |
thanks, I'll take a closer look tonight. I tidied and simplified the test to this, to more clearly indicate what is happening:
|
If you have a string with utf8 pragma ( There's no way YAML or any other libraries can handle them, and since YAML requires all text as just text, you have to pass them in as Perl text strings. The current behavior of YAML::Tiny that renders some byte format might leave a bit to be desired (like, properly encoding them using YAML's |
@miyagawa I'm not sure of his intent, but my (perhaps naive) expectation is that even encoded strings should properly round-trip. But to test that, we should not be extracting a snippet of the output from |
the PR is in #40 |
I'll have a look at this tomorrow. |
I already submitted the code which deals with the actual problem. Instead of using utf8::valid, I decode the string and check the length. If the length changes, the string was encoded; if not, it wasn't. |
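The decode-and-compare idea described above might look roughly like this. This is only a sketch, not the code from the PR: the helper name `looks_utf8_encoded` is hypothetical, and it relies on `utf8::decode` working in place and returning false when the input is not well-formed UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, not YAML::Tiny's actual code: guess whether a scalar
# holds UTF-8 encoded bytes by decoding a copy and comparing lengths.
sub looks_utf8_encoded {
    my ($string) = @_;
    my $copy = $string;                     # utf8::decode modifies its argument
    return 0 unless utf8::decode($copy);    # not decodable: treat as not encoded
    return length($copy) != length($string) ? 1 : 0;
}

my $chars = "\x{263A}";                     # wide-character text string
my $bytes = encode('UTF-8', $chars);        # UTF-8 encoded octets

print "bytes: ", looks_utf8_encoded($bytes), "\n";
print "chars: ", looks_utf8_encoded($chars), "\n";
print "ascii: ", looks_utf8_encoded("plain"), "\n";
```

Note that this is a heuristic: a string of characters below 256 that happens to form valid UTF-8 when read as bytes would also be reported as encoded.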
in the subroutine '_dump_scalar', at line 675, there's a test which my UTF passed, as it has bytes that match the test. Once the code in that block is finished, my UTF isn't UTF anymore.
I fixed this on my own machine by adding ! utf8::valid($string) before the test, and my UTF stayed UTF.
However, the side effect is that some characters which were formerly escaped in that block are creeping through and causing the code to die on 'illegal characters'. I haven't gotten back to looking at which characters, but I suspect quotes.