Fix tests with non-GNU iconv #16840
base: master
This test ensures that iconv() will not return part of the string if it encounters an error, but the particular error it wants to cause is not portable. When the input sequence is not representable in the output charset, POSIX says that "iconv() shall perform an implementation-defined conversion on the character" rather than fail. As far as I know, the only two implementations that go against this recommendation (and return the error that this test is expecting) are GNU libiconv and glibc. This commit adds a SKIPIF check for the two GNU implementations to ensure that an error is actually encountered during this test.
This test performs a conversion from UCS-2 to UTF-8 using iconv and stream filters, with the suffix "//IGNORE" being added to the target charset. This magic //IGNORE string was recently standardized in POSIX 2024, but it is not yet portable: musl for example will simply fail when it encounters //IGNORE in what is supposed to be a charset name. Fortunately, we do not need to think too hard about the general problem here: the input is "abc", which has no un-translatable sequences. We can simply drop the "//IGNORE".
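To illustrate why dropping the suffix is safe for this input, here is a minimal sketch in Python. It uses Python's own codecs as a stand-in for iconv (an analogy only, not the PHP test itself): `utf-16-be` substitutes for UCS-2, the byte order is an assumption, and `errors="ignore"` plays the role of `//IGNORE`. The helper name `convert` is hypothetical.

```python
def convert(raw: bytes, errors: str) -> bytes:
    """Convert UCS-2-like input to UTF-8 with the given error policy."""
    # utf-16-be stands in for UCS-2 (the two agree for BMP characters);
    # big-endian byte order is an assumption made for this sketch.
    return raw.decode("utf-16-be").encode("utf-8", errors=errors)


ucs2_abc = "abc".encode("utf-16-be")

# "strict" corresponds to a plain target charset, "ignore" to //IGNORE.
strict_result = convert(ucs2_abc, "strict")
ignore_result = convert(ucs2_abc, "ignore")

# Nothing in "abc" is untranslatable, so the error policy never triggers.
print(strict_result == ignore_result)
```

Since every character of the input has a UTF-8 representation, the two policies cannot produce different output here.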
Hm, it looks like some versions of glibc iconv actually do the right thing with invalid input sequences and
- string(10) "aa%C3%B8aa"
+ Notice: iconv(): Detected an illegal character in input string in /home/runner/work/php-src/php-src/ext/iconv/tests/bug48147.php on line 4
+ string(0) ""
That SHOULD warn/fail because the input is invalid, and
Thanks for the patch. Yes, GNU libiconv is non-standard and is also an LGPL-based library, which makes it only so-so useful. I would certainly give the built-in iconv implementation priority over GNU libiconv. I'm not sure if any packages out there rely on it, as there isn't proper integration of POSIX iconv in PHP yet for some characters. I remember I've always had some annoyances with converting characters using it. :D Not to mention that with gnu-libiconv some tests would fail on PHP's CI setup. Perhaps those IGNORE hacks should be removed, but I should check in more detail.
Yuck.
This is annoying because, like, three mistakes add up to something that is sort of correct. POSIX and implementations have always agreed that processing would stop at an invalid input sequence, even in the presence of The PHP docs for iconv, however, do say that any failure will get you
This test meaningfully uses the //IGNORE charset suffix that was a GNU/Solaris extension but is now standardized in POSIX 2024. The way //IGNORE is used, however, is non-standard: POSIX says that //IGNORE will cause untranslatable sequences to be skipped, but this test is expecting it to skip input sequences that are invalid rather than unexpressible in the target charset. That behavior is specific to the two GNU implementations (and was always non-conforming...), so we add a SKIPIF block to ensure that one of the GNU implementations is used.
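The distinction between "invalid" and "untranslatable" input can be sketched with Python's codecs, which report the two cases as different exceptions. This is only an analogy for the iconv behavior discussed above, and the helper name `classify` is hypothetical.

```python
def classify(raw: bytes, source: str, target: str) -> str:
    """Return 'invalid', 'untranslatable', or 'ok' for a conversion attempt."""
    try:
        text = raw.decode(source)
    except UnicodeDecodeError:
        # The bytes are not well-formed in the source charset at all.
        return "invalid"
    try:
        text.encode(target)
    except UnicodeEncodeError:
        # Well-formed input with no representation in the target charset.
        return "untranslatable"
    return "ok"


# A valid UTF-8 "ø" cannot be expressed in ASCII: untranslatable.
print(classify(b"aa\xc3\xb8aa", "utf-8", "ascii"))
# A lone 0xC3 lead byte followed by "a" is malformed UTF-8: invalid.
print(classify(b"aa\xc3aa", "utf-8", "latin-1"))
```

Per the commit message, only the first case is what POSIX //IGNORE is meant to skip; the second should still be reported as an error.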
This test does a byte comparison of the expected ISO-2022-JP encoding with the result of iconv(). The ISO-2022-JP encoding, however, is stateful; there are many ways to arrive at a correct answer. Musl is known to have an inefficient encoding that causes it to fail this test, so we add a SKIPIF for the "unknown" iconv implementation that musl has. Nothing is wrong here per se, but to support the musl output (and that of other iconvs), an expert would need to verify it.
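As a rough illustration of why a byte comparison is fragile for a stateful charset, the sketch below decodes two different ISO-2022-JP byte strings with Python's `iso2022_jp` codec. Both byte strings are hand-built for this example: the "verbose" one switches back to ASCII between characters, which is redundant but well-formed. It is not claimed to be what musl actually emits.

```python
# Both byte strings encode two hiragana characters (U+3042, U+3044).
# ESC $ B selects JIS X 0208, ESC ( B returns to ASCII.
compact = b'\x1b$B$"$$\x1b(B'
verbose = b'\x1b$B$"\x1b(B\x1b$B$$\x1b(B'


def same_text(a: bytes, b: bytes, charset: str = "iso2022_jp") -> bool:
    """Compare two encoded strings by their decoded text, not their bytes."""
    return a.decode(charset) == b.decode(charset)


print(compact == verbose)           # the bytes differ
print(same_text(compact, verbose))  # the decoded text is the same
```

A test that compared decoded text instead of raw bytes would accept either form, though as the commit message notes, someone familiar with the encoding would still need to confirm that a given implementation's output is correct.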
This test checks the output of iconv_mime_encode() with a target charset of ISO-2022-JP. That charset is stateful, though, and there are many ways to arrive at a correct answer. Musl has an inefficient encoding of it that causes this to fail because the actual output differs from (and is much longer than) the expected output. We add a SKIPIF for the "unknown" iconv implementation that musl has.
699b51c to f25e0e5
I dropped the two commits that change the I'll open a new PR that makes the case for eliminating the
Get the iconv test suite passing with musl. This fixes some of #13696
Many of these issues relate to the `//IGNORE` suffix that can be appended to the target charset during conversion. This `//IGNORE` was recently standardized in POSIX 2024, but there are still some pitfalls:
- `//IGNORE` will ignore untranslatable sequences (valid input sequences that cannot be represented in the target charset), but sequences that are a priori invalid should instead cause an `EILSEQ` -- even in earlier versions of the standard.
- [...] `EILSEQ` for invalid input sequences. Their `//IGNORE` allows these errors to be ignored. In other words, the two GNU implementations have never conformed to POSIX.
- [...] `*` rather than fail. But it does not yet implement `//IGNORE`, and will simply crash if it encounters `//IGNORE` in the charset name.

So, what is done here:
- [...] `//IGNORE` behavior. These get SKIPIFs.
- [...] `//IGNORE`, but it has no effect on the output. Here the `//IGNORE` was simply dropped.

@petk @arnaud-lb