Default behavior for wrapping in PO files, or confusion between wrapping in content and in PO #291

Open
PhuNH opened this issue Feb 7, 2021 · 5 comments

Comments

@PhuNH
Contributor

PhuNH commented Feb 7, 2021

I have only been reading the po4a code for a short while, so if I misunderstand anything or miss any important parts, please correct me. I would really appreciate it.

As far as I can see in the documentation of po4a and of the Po module, the default behavior for PO files is to wrap (after the 76th column). In gettext, at least msgmerge behaves similarly: when I use msgmerge to merge a POT file and a translated PO file, if an entry in the POT file doesn't have any flag (and thus doesn't have the no-wrap flag), then the resulting PO entry is wrapped (both msgid and msgstr). Conversely, the POT entry needs to have the no-wrap flag for both msgid and msgstr of the resulting PO entry not to be wrapped.
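To make the wrapped and unwrapped forms concrete, here is a minimal Python sketch of how one PO field could be rendered with and without the no-wrap flag. It is only an approximation for illustration: `format_po_field` is a hypothetical helper, it ignores escaping and embedded newlines, and the real line-breaking rules of gettext and po4a differ in detail.

```python
import textwrap


def format_po_field(keyword, text, no_wrap=False, width=76):
    """Render one PO field (msgid or msgstr) as it could appear in a file.

    Simplified on purpose: no escaping of quotes, no embedded newlines.
    """
    one_line = f'{keyword} "{text}"'
    if no_wrap or len(one_line) <= width:
        return one_line
    # Keep the whitespace at chunk boundaries so that concatenating the
    # quoted pieces gives back exactly the original text.
    chunks = textwrap.wrap(text, width=width - 2, drop_whitespace=False)
    return "\n".join([f'{keyword} ""'] + [f'"{chunk}"' for chunk in chunks])


if __name__ == "__main__":
    long_text = " ".join(["translation"] * 20)
    print(format_po_field("msgid", long_text))
    print(format_po_field("msgid", long_text, no_wrap=True))
```

Both outputs carry the same string; only the layout in the file differs.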

I'm now reading the code of the Text and Yaml modules, and I see that most of the calls to the translate function pass wrap => 0 (and some pass wrap => $defaultwrap, which I think confuses wrapping in the PO file with wrapping in the content, though I might be the one who is confused here). Some examples are Text, Text, Text, Yaml. Moreover, the default wrap value is also 0. The result is that in all of these cases, the output PO entries have the no-wrap flag and are never wrapped. This is not configurable, and it is not the default behavior expected for PO files, as I mentioned above.

Also, what the TransTractor documentation says about the wrap parameter sounds to me like it concerns wrapping of the content; however, the Po module shows that it controls wrapping in the PO file.

So I have some questions here, and I would be very thankful if someone can provide answers:

  1. Is it true that there are two types of wrapping: the po4a modules (e.g. Text, Yaml) only control the wrapping of content, while the wrapping of PO entries is controlled by the wrap parameter of the translate function?
  2. Why should those PO entries be unwrapped by default? Shouldn't we let them be wrapped, as the po4a command and (at least) msgmerge do by default? Or should we perhaps add a --wrap-po parameter to the other po4a commands, similar to that of the po4a command, and then pass the argument down to the parsing modules so that it decides how the output PO entries are wrapped?
@jnavila
Collaborator

jnavila commented Feb 7, 2021

These are excellent questions. I stumbled upon them while trying to unwrap the content for asciidoc. So, from what I understand:

  1. For me the $wrap parameter has no effect on the wrapping in the po file.

    po4a/lib/Locale/Po4a/Po.pm

    Lines 1096 to 1100 in b82481b

    =item B<wrap>
    boolean indicating whether we can consider that whitespaces in string are
    not important. If yes, the function canonizes the string before looking for
    a translation, and wraps the result.

    The content is "unwrapped" to look up the key, but it is rewrapped afterwards in the target file. For the asciidoc module, for instance, no paragraphs other than verbatim blocks are wrapped in the output, because unattended wrapping could introduce subtle errors in the output file (when a line break happens to put something that could be interpreted as a keyword at the beginning of a line). So I had to set the $wrap parameter but manually unwrap the result.
  2. For better interaction with version control systems and other tools, I think it's better to not wrap the po file, and stick to this policy in all tools. Otherwise, you can generate bogus changes by wrapping differently between tools. The po file is a structured file that does not really need presentation tweaks.
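The lookup behavior described in the quoted documentation (canonize the string, look it up, wrap the result) can be sketched roughly as follows. This is an illustrative Python approximation, not the actual Po.pm code: `canonize` and `lookup` are hypothetical names, and the catalog is a plain dictionary.

```python
import re
import textwrap


def canonize(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def lookup(catalog, text, wrap, width=76):
    """Return the translation of text, falling back to text itself."""
    if not wrap:
        # Whitespace is significant: look the string up exactly as given.
        return catalog.get(text, text)
    # Whitespace is not significant: normalize before the lookup,
    # then re-wrap whatever comes back for the target file.
    translation = catalog.get(canonize(text), text)
    return textwrap.fill(translation, width=width)


if __name__ == "__main__":
    catalog = {"Hello wide world": "Bonjour vaste monde"}
    print(lookup(catalog, "Hello\n  wide   world", wrap=True))
```

A format module that must not wrap its output, as described for asciidoc above, would then have to undo the final re-wrapping itself.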

@PhuNH
Contributor Author

PhuNH commented Feb 14, 2021

Thank you for your answer, but I'm afraid we are talking about two different functions. The $wrap parameter you referred to is in the gettext function which as I understand is in the direction of "from PO files out", while my concern is about the translate function which, again as I understand, is in the direction of "into PO files".

The issue I'm facing is with the translation of a KDE website. It's not really a problem, just an annoyance: translation commits contain not only translation updates but also changes in how messages are wrapped. So far this behavior has caused no problem for the content, but the annoyance led me to find out how po4a works and to open this issue. Here is what happens. Before I started using po4a for the translation of the website (timeline.kde.org), there was no no-wrap flag anywhere in the KDE l10n system, and everything was wrapped by default. Then, when I used po4a for the website:

  • In the POT file created by po4a-gettextize, let's pick an entry as an example: the entry starting from line 45 is wrapped, but has the no-wrap flag;
  • Let's use the Vietnamese translation as an example: the translation update was made using Lokalize, so the msgstr (line 54) was also wrapped (this is the default behavior of Lokalize), and the msgid (line 48) was still wrapped, not changed yet;
  • Early the next morning, the system ran its procedure to update PO files, which uses msgmerge to merge updated POTs into POs. Another commit was generated even though the POT file had not changed: because the entry has the no-wrap flag, both msgid and msgstr were rewritten unwrapped.

I have tried running msgmerge with entries that don't have the no-wrap flag; the result was as I described above: both msgid and msgstr stayed wrapped.

Where does this no-wrap flag come from? Take that particular entry, for example: it comes from this YAML file. po4a's Yaml module calls the translate function with 'wrap' => 0 to put the message into a POT file; translate handles it by calling the push function of the Po module with the same value passed as wrap, which finally puts no-wrap in the PO entry. I am describing the problem again here in case my previous description was not clear enough.
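The chain described above (format module, then translate, then push, then the no-wrap flag) can be modeled with a few lines of Python. This is a toy model of the behavior as described in this comment, not po4a's Perl code; `PoCatalog` and this `translate` are hypothetical stand-ins.

```python
class PoCatalog:
    """Toy stand-in for the Po module: maps each msgid to its flags."""

    def __init__(self):
        self.entries = {}

    def push(self, msgid, wrap):
        # As described above: a false wrap value ends up as a
        # no-wrap flag on the PO entry.
        self.entries[msgid] = set() if wrap else {"no-wrap"}


def translate(catalog, text, wrap=False):
    """Toy stand-in for translate: forwards wrap unchanged to push."""
    catalog.push(text, wrap=wrap)
    return text


if __name__ == "__main__":
    catalog = PoCatalog()
    translate(catalog, "A message taken from a YAML file", wrap=False)
    print(catalog.entries)
```

Because the module hard-codes the wrap value, the flag on the entry is decided by the format module rather than by whoever runs the tool.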

If I could choose whether to wrap the PO file, I would choose not to wrap as well. However, the system already works that way, and it would be nice if that could be supported.

@jnavila could you also tell me why you had to set the $wrap parameter in your gettext function? I suppose your PO file was wrapped, wasn't it? If it wasn't, whitespace should be significant and $wrap should not be set, I think.

@mquinson
Owner

Hello, thanks for your interest.

I have the feeling that you are right and that the source code mixes up wrapping the content and wrapping the PO file.

I think that we should sort things out. First the wrap parameter of translate and friends should be renamed wrapctn for clarity, and this should have no impact on the PO file wrapping, only on the content wrapping. This makes sense because format modules should not be concerned with whether the PO files should be wrapped or not.

Then, no-wrap would only be added to all entries if --wrap-po no is provided on the command line.

Patches going in that direction would be more than welcome.
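A rough Python sketch of the separation proposed here could look like the following. Everything in it is hypothetical and only illustrates the idea: `PoWriter` stands in for the PO-writing side and its `wrap_po` attribute for the `--wrap-po` choice, while `wrapctn` affects only how the content is treated.

```python
from dataclasses import dataclass, field


@dataclass
class PoWriter:
    """Hypothetical PO-writing side: wrapping is decided once, globally."""

    wrap_po: bool = True  # stands in for the --wrap-po command-line choice
    entries: dict = field(default_factory=dict)

    def push(self, msgid):
        # no-wrap is added only when PO wrapping was turned off globally.
        self.entries[msgid] = set() if self.wrap_po else {"no-wrap"}


def translate(writer, text, wrapctn=False):
    """Hypothetical translate: wrapctn only concerns the content."""
    # With wrapctn set, whitespace in the content is not significant,
    # so it is normalized before being stored as msgid.
    msgid = " ".join(text.split()) if wrapctn else text
    writer.push(msgid)
    return msgid


if __name__ == "__main__":
    writer = PoWriter(wrap_po=False)
    translate(writer, "some\n  reflowable text", wrapctn=True)
    print(writer.entries)
```

In this arrangement a format module never sees the PO wrapping setting at all, which matches the point that format modules should not be concerned with it.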

@jnavila
Collaborator

jnavila commented Feb 18, 2021

Replying to @PhuNH and elaborating on @mquinson, I would say there are 3 types of wrapping flags:

  • Is preserving the format of the input strings critical, or can the input simply be unwrapped? This matters in particular when the input is supposed to be kept verbatim and the translations must preserve the lines in a given segment.
  • If a segment does not require its formatting to be preserved, then the lines should be unwrapped in the segment, so that translators understand that this is just a long run of text to be translated with no particular formatting. But then, when the translated text is put back in the target files, do we want to apply line-wrapping? Not for every format: where the syntax relies on line-based parsing, wrapping could have devastating effects. For Asciidoc, for instance, blind line-wrapping can create lines that the markup tool would understand as "reserved" forms (list bullets, …) and introduce differences in asciidoc formatting between the source and the translated content. See Asciidoc: add an option to prevent line wrapping #242
  • Finally, there is the question of whether we want to wrap the PO file itself. To me, wrapping is never a good idea, simply because it breaks the line-by-line diffing algorithms of version control systems. A small change at the beginning of a paragraph can trigger a complete reflow of the lines in the PO file and defeat simple diff viewing, for instance with the very useful "--word-diff" option. Beyond this single point, more needs to be done to reach a "canonical" format in which PO files with equal translations are actually identical; in this respect, there is still the issue of empty strings used to "prettify" the files, e.g.:
msgid "a very long string"

versus

msgid ""
"a very long string"

The PO file format is quite plastic, but this is a drawback for management in version control systems, which are not aware of its semantics.
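That the two spellings above denote the same string can be checked with a small Python sketch. It is a simplified illustration, not a full PO parser: `po_string_value` is a hypothetical helper, and it relies on Python's string-literal rules being close enough to PO escapes for simple cases.

```python
import ast


def po_string_value(lines):
    """Concatenate the quoted pieces of one msgid or msgstr field."""
    pieces = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(("msgid ", "msgstr ")):
            # Drop the keyword and keep only the quoted part.
            line = line.split(" ", 1)[1]
        # Assumption: the quoted piece is also a valid Python literal.
        pieces.append(ast.literal_eval(line))
    return "".join(pieces)


if __name__ == "__main__":
    compact = ['msgid "a very long string"']
    prettified = ['msgid ""', '"a very long string"']
    print(po_string_value(compact) == po_string_value(prettified))
```

A tool that compared entries by this value rather than by their text layout would not report a change when only the wrapping differs.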

@PhuNH
Contributor Author

PhuNH commented Feb 22, 2021

Thank you both very much for the guidance. I'm not entirely sure that I can handle this because, as I understand it for now, it will involve every format module of po4a, which is quite a lot of work. But I will try to look into it.


3 participants