Check pofile string delimiters #1151
Open: rtobar wants to merge 6 commits into python-babel:master from rtobar:check-string-delimiters
While the re module caches some of the latest compilations, it's better form to not rely on it doing so. Signed-off-by: Rodrigo Tobar <[email protected]>
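To illustrate the point of this commit message, a pattern can be compiled once at import time so that repeated calls do not depend on the `re` module's internal cache. This is only a minimal sketch: the pattern and the `is_keyword_line` helper are hypothetical and are not taken from babel.

```python
import re

# Compiled once at import time. The pattern is made up for this example.
_KEYWORD_RE = re.compile(r'^(msgid|msgstr|msgctxt)\b')


def is_keyword_line(line: str) -> bool:
    """Return True if the line starts with one of the example keywords."""
    # Uses the precompiled object directly instead of calling re.match()
    # with a pattern string on every invocation.
    return _KEYWORD_RE.match(line) is not None
```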
The exact same check is performed a few lines above. Signed-off-by: Rodrigo Tobar <[email protected]>
Since Python 2.3 sorted() has been guaranteed to be stable. The comment was wrong, and thus it makes sense to do the full assertion as clearly intended. Signed-off-by: Rodrigo Tobar <[email protected]>
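The stability guarantee mentioned in this commit message can be seen in a small standalone example. The data below is invented for illustration and is unrelated to babel's test suite.

```python
# Records that share a sort key keep their original relative order.
records = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]

# Sort by the first element only; the second element is ignored by the key.
by_name = sorted(records, key=lambda record: record[0])

# Because the sort is stable, the full result can be asserted exactly.
assert by_name == [("a", 1), ("a", 2), ("b", 2), ("b", 1)]
```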
The _NormalizedString helper class mixes some responsibilities, not only acting as a container for potentially multiple lines of a single string message, but also doing and hiding some of the parsing of such strings. "Doing" because it performs a .strip() on all incoming strings in order to remove any whitespace before/after them, and "hiding" because when invoking the "denormalize" method, each line is sliced to remove the first and last characters, which are implicitly assumed to be the string delimiters (double quotes, in principle). These multiple roles have already led to confusion within the codebase as to how this class is supposed to be used. Its existing unit test doesn't provide strings with proper delimiters (and thus calling .denormalize() on these objects would return unexpected results -- empty strings in all cases). Similarly, missing msgstr instances also result in a call to _NormalizedString(""), which does work, but is conceptually incorrect, as the empty string is something that _NormalizedString should never see coming in. This commit changes all the places where confusing usage of the _NormalizedString class happens. In particular, the existing unit test's strings are now always delimited by double quotes (so calling .denormalize on them would yield the expected value). A number of new unit tests have also been added exercising the denormalize() method, which includes unescaping escaped characters. Finally, the construction of an empty string message has been simplified to _NormalizedString(). Signed-off-by: Rodrigo Tobar <[email protected]>
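The effect of the implicit slicing described in this commit message can be sketched in isolation. The `strip_delimiters` function below is hypothetical and only mimics the "drop the first and last character" step. It is not babel's code, and it leaves out the unescaping step.

```python
def strip_delimiters(line: str) -> str:
    """Drop the first and last character, assuming both are double quotes."""
    return line[1:-1]


# With proper delimiters the content survives intact.
assert strip_delimiters('"hello"') == "hello"

# Without delimiters, real content is silently lost.
assert strip_delimiters("hello") == "ell"

# Single-character and empty inputs collapse to the empty string.
assert strip_delimiters("a") == ""
assert strip_delimiters("") == ""
```

Since the slice never inspects what it removes, a malformed line produces wrong output rather than an error.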
Strings should be delimited on both ends by double quotes, but this is currently not being detected, and content is simply being incorrectly trimmed. This commit adds a check for each string to verify it starts and ends with a double quote character, issuing a warning/error if that's not the case (and fixing it as appropriate). A few new test cases have been added to check that the lack of double quotes to delimit strings issues errors as expected. Signed-off-by: Rodrigo Tobar <[email protected]>
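The kind of check described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration, not the code in this PR: `ensure_delimited` and `DelimiterError` are hypothetical names, the `abort_invalid` flag merely borrows the name of the existing `read_po` parameter, and escaped trailing quotes are not considered.

```python
import warnings


class DelimiterError(ValueError):
    """Hypothetical error type used only in this sketch."""


def ensure_delimited(line: str, lineno: int, abort_invalid: bool = False) -> str:
    """Return the line with both double-quote delimiters in place.

    Raises DelimiterError when abort_invalid is true; otherwise warns and
    adds whichever delimiter is missing.
    """
    starts = line.startswith('"')
    # A lone '"' counts as an opening quote only, not as both delimiters.
    ends = len(line) > 1 and line.endswith('"')
    if starts and ends:
        return line

    message = f"line {lineno}: string is not delimited by double quotes: {line!r}"
    if abort_invalid:
        raise DelimiterError(message)
    warnings.warn(message)

    if not starts:
        line = '"' + line
    if not ends:
        line = line + '"'
    return line
```

With the default `abort_invalid=False`, a line such as `"missing end` comes back as `"missing end"` after a warning, so parsing can continue.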
Now that all strings given as inputs to _NormalizedString have been verified (or corrected) to be correctly delimited with double quotes, there's no reason to keep doing an internal strip. Moreover, we can express this internal constraint with an assertion to avoid issues in the future. Signed-off-by: Rodrigo Tobar <[email protected]>
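Expressing such a constraint as an assertion could be sketched as follows. `QuotedLines` is a hypothetical stand-in written for this illustration, not the `_NormalizedString` class itself, and it skips unescaping entirely.

```python
class QuotedLines:
    """Hypothetical container for lines already validated by the caller."""

    def __init__(self, *lines: str) -> None:
        self._lines = []
        for line in lines:
            self.append(line)

    def append(self, line: str) -> None:
        # No strip() here: callers are expected to pass delimited strings,
        # and the assertion documents and enforces that expectation.
        assert len(line) >= 2 and line[0] == '"' and line[-1] == '"', line
        self._lines.append(line)

    def denormalize(self) -> str:
        # Safe to slice, since every stored line is known to be delimited.
        return "".join(line[1:-1] for line in self._lines)
```

For example, `QuotedLines('"foo"', '"bar"').denormalize()` gives `foobar`, while `QuotedLines('foo')` fails the assertion when assertions are enabled.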
This was referenced Nov 17, 2024
rtobar added a commit to python/python-docs-es that referenced this pull request on Nov 18, 2024
In `library/re.po` there was an entry that was not correctly delimited with double quotes (if you look at the whole diff it is the last entry in the diff, or you can simply look at the first commit of this PR). This caused `powrap --check` to skip the file and not validate it. This, in turn, happened because the `msgcat` utility from `gettext` detected the syntax error and failed when run. `powrap` did not take those errors into account when computing the process exit code, so the file not only remained invalid but was also never checked. Likewise, the file could not be wrapped correctly using `powrap library/re.po`. I already opened a PR against `powrap` to change this behaviour at https://git.afpy.org/AFPy/powrap/pulls/4 (update: the PR has been merged and a new version of powrap has been released, so in this PR I also updated our powrap dependency, as well as the powrap pre-commit hook). On the other hand, the rest of our tools did *not* consider this file invalid. This is because `polib` did not perform the corresponding validation and parsed the entry incorrectly. I also opened a PR against polib for this at izimobil/polib#161. Update: in the meantime I also realised that the `babel` package suffers from the same problem; I had incorrectly assumed that babel depended on polib. PR created against babel: python-babel/babel#1151. After fixing the syntax error, I ran powrap so that `library/re.po` is now properly formatted. --------- Signed-off-by: Rodrigo Tobar <[email protected]>
Gentle ping, at least to kick off CI and check if there are any obvious mistakes to be fixed.
This PR adds checks to the pofile parser code to validate that message strings are correctly delimited by double quotes. In keeping with the current design, an error is raised only if requested; otherwise a warning is printed, the faulty lines are corrected, and parsing continues.
I found this issue while processing a pofile used in the Spanish translation of the CPython documentation. One of our files was incorrectly written, and of all our tooling only the `msgcat` tool of GNU's `gettext` package complained, while `babel`, `polib` and others didn't. See python/python-docs-es#2873, izimobil/polib#161 and https://git.afpy.org/AFPy/powrap/pulls/4 for further reference.
While implementing this change I found that the `_NormalizedString` class was not only used to contain message lines, but also participated in the parsing process (and hid some parsing as well). I thus broke down my changes into three separate commits:
- [...] `_NormalizedString` class across the codebase (see details in commit).
- [...]
- [...] `_NormalizedString` behaves
Along the way I also implemented three small quality-of-life changes. They are included as the first three commits of this PR; happy to submit these separately if required.