Clarify relationship to PEP-0383 / UTF-8b #16

Ericson2314 · 2022-06-24T15:23:49Z

I just learned about these older things. The idea seems similar, but I cannot readily tell how similar. It would be great if multiple implementations were converging / did converge on the same thing.

SimonSapin · 2022-06-24T16:06:40Z

I can’t easily find a definition of UTF-8b. Is it the same as PEP-0383?

PEP-0383 defines a superset of UTF-32 that can losslessly round-trip an arbitrary byte sequence [u8] by interpreting it as potentially-ill-formed UTF-8 and preserving the meaning of the well-formed parts.
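As a rough illustration of that round trip: Python exposes PEP-0383 as the `surrogateescape` error handler. The sample bytes below are made up for the example. Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN on decoding and mapped back on encoding.

```python
# Sketch of the PEP-0383 round trip using the standard "surrogateescape" handler.
# Hypothetical input: well-formed UTF-8 for "café", followed by a stray 0xFF byte.
raw = b"caf\xc3\xa9\xff"

# The well-formed part decodes normally; the stray byte 0xFF becomes U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns U+DCFF back into the byte 0xFF.
restored = text.encode("utf-8", errors="surrogateescape")

print(repr(text))       # 'café\udcff'
print(restored == raw)  # True
```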

WTF-8 defines a superset of UTF-8 that can losslessly round-trip an arbitrary code unit sequence [u16] (a.k.a. "wide string") by interpreting it as potentially-ill-formed UTF-16 and preserving the meaning of the well-formed parts.
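A minimal sketch of the same idea in the other direction, assuming Python's `surrogatepass` error handler is an acceptable stand-in for a real WTF-8 codec. The helper names `u16_to_wtf8` and `wtf8_to_u16` are hypothetical. This is only an approximation: Python's decoder with `surrogatepass` also accepts a surrogate pair encoded as two 3-byte sequences, which WTF-8 does not allow.

```python
import struct

def u16_to_wtf8(units):
    """Encode a list of 16-bit code units (potentially ill-formed UTF-16)."""
    raw = struct.pack("<%dH" % len(units), *units)
    # Properly paired surrogates become one supplementary code point;
    # lone surrogates are kept as surrogate code points.
    text = raw.decode("utf-16-le", errors="surrogatepass")
    # Lone surrogates are written as 3-byte sequences, everything else as UTF-8.
    return text.encode("utf-8", errors="surrogatepass")

def wtf8_to_u16(data):
    """Recover the original 16-bit code units from the bytes produced above."""
    text = data.decode("utf-8", errors="surrogatepass")
    raw = text.encode("utf-16-le", errors="surrogatepass")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

lone = [0x0041, 0xD800, 0x0042]  # "A", a lone lead surrogate, "B"
paired = [0xD83D, 0xDCA9]        # a well-formed surrogate pair (U+1F4A9)

print(u16_to_wtf8(lone))    # b'A\xed\xa0\x80B'
print(u16_to_wtf8(paired))  # b'\xf0\x9f\x92\xa9'
print(wtf8_to_u16(u16_to_wtf8(lone)) == lone)  # True
```

The well-formed pair comes out as ordinary 4-byte UTF-8, which is the "preserving the meaning of the well-formed parts" half of the description.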

PEP-0383 and WTF-8 take a similar approach to solving their problems, but they solve fundamentally different problems to begin with. What does "converging" even mean? I’m a bit confused about what you’re expecting here.

In any case, even if we could find a potentially beneficial change to either of them, PEP-0383 and WTF-8 are names for specific encodings and behaviors that already have implementations in use. Redefining either name to mean some other encoding would be harmful. If you come up with a different encoding that could be interesting, give it another name. https://github.com/kennytm/omgwtf8 is an example where this happened.
