Should Dump and Load encode/decode UTF-8? #11

Open
dagolden opened this issue Oct 25, 2013 · 6 comments

@dagolden

Historically, YAML::Tiny's read_string/write_string were essentially character-based and I'd like to leave them that way.

However, Dump/Load are inconsistent with other YAML modules (which are themselves inconsistent).

Ingy has said that the YAML::XS approach of Dump/Load is probably the path forward:

$perl = Load($utf8_octets);
$utf8_octets = Dump($perl);
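To make the distinction concrete, here is a minimal sketch using only the core Encode module (no YAML module is involved; the sample string is just an illustration). It shows what "UTF-8 octets" means on either side of such a Dump/Load pair:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: each element is one Unicode code point.
my $chars = "caf\x{e9} \x{263a}";

# The same text as UTF-8 octets: each element is one byte.
# Under the YAML::XS convention, this is the form Dump would return
# and the form Load would accept.
my $octets = encode('UTF-8', $chars);

# Non-ASCII characters occupy more than one octet, so the lengths differ.
printf "characters: %d, octets: %d\n", length($chars), length($octets);

# Decoding the octets recovers the original characters.
my $round_trip = decode('UTF-8', $octets);
print "round trip ok\n" if $round_trip eq $chars;
```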

Should we make this change for 2.00?

@Leont
Member

Leont commented Oct 25, 2013

Sounds sane to me

@dagolden
Author

My position, as I've stated on the #yaml channel on irc.perl.org, is as follows:

  • I now feel that providing functions to dump/load unencoded characters is essential for flexibility for those users/libraries who are managing encoding themselves. Providing only octets and forcing such users to decode output only to re-encode it later seems perverse and un-Perlish.
  • I would prefer for Dump/Load to provide/consume characters for consistency with the original YAML.pm implementation, as copied by YAML::Tiny and YAML::Syck
  • I would prefer to add DumpUTF8 and LoadUTF8 to provide/consume encoded octets
  • I would tolerate making Dump/Load produce/consume encoded octets if DumpString/LoadString were added to produce/consume unencoded characters
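As a rough sketch of the third option, DumpUTF8/LoadUTF8 could be thin wrappers over a character-based Dump/Load. Everything here is an assumption for illustration: `toy_dump` and `toy_load` are hypothetical stand-ins that handle a single "key: value" line and are not a YAML implementation, and the wrappers are not YAML::Tiny's actual code.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical character-based stand-ins (NOT real YAML): one "key: value" line.
sub toy_dump {
    my ($hash) = @_;
    my ($key) = keys %$hash;
    return "$key: $hash->{$key}\n";
}

sub toy_load {
    my ($string) = @_;
    my ($key, $value) = $string =~ /\A(\w+): (.*)\n\z/
        or die "unsupported input";
    return { $key => $value };
}

# Octet-producing wrapper: dump to characters, then encode once.
sub DumpUTF8 {
    my $chars = toy_dump(@_);
    return encode('UTF-8', $chars, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Octet-consuming wrapper: decode once (croaking on malformed input), then load.
sub LoadUTF8 {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    return toy_load($chars);
}

my $yaml_octets = DumpUTF8({ name => "Zo\x{eb}" });
my $data        = LoadUTF8($yaml_octets);
print "round trip ok\n" if $data->{name} eq "Zo\x{eb}";
```

The fourth option is the mirror image: Dump/Load would do the encoding and decoding, and DumpString/LoadString would expose the character-based layer directly.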

Whatever decision is taken, all existing YAML implementations should become consistent. For those whose Dump/Load API is changing, I would recommend major version bumps.

@dagolden
Author

And wrt timing, as there are other potentially incompatible changes for YAML::Tiny 2.00, I would like this API decision to be made "soon" (by January) so we don't have multiple major version bumps in short sequence.

I would like YAML::Tiny 2.00 to be a cleaned-up, consistent version of what we have now.

Whenever the new OO YAML.pm is considered final and stable, YAML::Tiny may then consider following suit as 3.00 (or it may not do so right away if the OO API is too great a departure from YAML::Tiny's existing OO API).

@rjbs
Member

rjbs commented Nov 28, 2013

I agree that both interfaces should be available: characters or octets. If there is a common standard for all YAML libraries everywhere to which other Perl libraries will be adapted, let's use it. Otherwise, let's stick with what we do now. I am only familiar with Perl and Python's YAML, and from Python's PyYAML, we can learn nothing, because there is a clearer distinction between types of strings.

@ingydotnet
Contributor

01:15 < ingy> xdg: I have a rough plan formulating in my head
01:15 < ingy> I'll start writing it down
01:16 < ingy> the guiding thoughts are:
01:16 < ingy> 1) there should be mtowtdi
01:17 < ingy> 2) we should leave the defaults alone for now
01:17 < ingy> 3) we should add a way to specify owtdi
01:18 < ingy> 4) OO is best but not necessary to accomplish choice (tmtowtd choice)
01:18 < ingy> 5) the longterm API is OO, but we should get simple choice over legacy default ASAP
01:19 < ingy> ...
01:28 < ether> how do Dump and Load work right now? characters, or octets?
01:29 < ether> er, rather - if I try to Load() a bytestring, will I get decoded chars back, or just bytes?
01:30 < ether> I'm of the school that (en|de)coding should be done at the file layer, and no higher
01:30 < ether> which is I think also what xdg is saying
01:45 < ingy> ether: it's an answerable question
01:46 < ingy> and I think xdg already answered it in the email that started this
01:46 < ingy> but for moving forward it's not currently that important
01:46 < ingy> the important things are:
01:47 < ingy> 1) don't break (too much) shit
01:47 < ingy> 2) provide choice to everyone
01:47 < ingy> ...
01:48 < ingy> then further down the road we can have an OO, multi-backend, choice-ridden API
01:49 < ingy> there are 4 different Perl YAML-s and they are all popular
01:50 < ingy> so I'm pretty certain changing to a default standard API is going to cause a lot of pain
01:51 < ingy> but there's no reason to
01:51 < ingy> we just let them work asis but with super simple ways to flip behaviour
01:52 < ingy> so if I want Tiny to do XS encoding, it's one line
01:52 < ingy> or arg or whatever
01:54 < ingy> s/(too much) //; # :)
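
ether's remark in the log above, that (en|de)coding belongs at the file layer, can be sketched with PerlIO encoding layers. This is an illustration only: it uses an in-memory filehandle in place of a real file, and the string stands in for the output of a character-based Dump.

```perl
use strict;
use warnings;

# Characters, as a character-based Dump would return them (sample text only).
my $text = "name: Zo\x{eb}\n";

# Write through an encoding layer: the layer turns characters into UTF-8 octets.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open for write: $!";
print {$out} $text;
close $out or die "close: $!";

# $buffer now holds octets, one more than the character count because of the e-diaeresis.
printf "characters: %d, octets on 'disk': %d\n", length($text), length($buffer);

# Read back through the same layer: the layer decodes octets into characters,
# ready to hand to a character-based Load.
open my $in, '<:encoding(UTF-8)', \$buffer or die "open for read: $!";
my $read_back = do { local $/; <$in> };
close $in;
print "round trip ok\n" if $read_back eq $text;
```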

@karenetheridge
Member

I poked at this last night after having a test fail that was using Test::Deep::YAML with a META.yml with a unicode canary in x_contributors...

  • YAML::XS and YAML::Syck's Load() expect octets, and decode to characters.
  • YAML and YAML::Tiny's Load() expect characters, and do no decoding.
  • YAML::Old won't install on newer perls and should probably be deprecated/replaced with a thin wrapper anyway.

For comparison, JSON::Any's decode_json expects octets (will decode to characters).
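
The practical risk of mixing the two conventions can be shown without any YAML module. The sketch below is an assumption-laden illustration using core Encode: it feeds UTF-8 octets into a step that expects characters, which is what happens when octets are passed to a character-based Load and the result is later encoded on output.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $characters = "na\x{ef}ve";                    # 5 characters
my $utf8       = encode('UTF-8', $characters);    # 6 octets

# Correct pairing: an octet-expecting loader decodes exactly once.
my $decoded = decode('UTF-8', $utf8);
print "decoded once: ok\n" if $decoded eq $characters;

# Mismatch: a character-expecting loader treats each octet as its own
# character, so encoding on output encodes the text a second time.
my $double_encoded = encode('UTF-8', $utf8);
printf "octets after one encode: %d, after two: %d\n",
    length($utf8), length($double_encoded);
```

The doubled length is the familiar mojibake symptom: the two octets of the single non-ASCII character are each re-encoded as two octets.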
