msgmerge removes all unknown flags (in particular markdown-text) #470

NightTsarina · 2024-02-13T22:37:20Z

Hi,

For a website that uses Weblate and po4a to manage translations, I am using po4a-updatepo to both create and update the PO files after changes to the source files. I have avoided the po4a tool itself because it is difficult to integrate with make and its config file needs updating each time a new document is added.

What I noticed today is that the resulting PO files are different when first created than when updated later. In particular, the markdown-text flag is lost when they are updated. This creates extra git conflicts, and I presume it also affects how Weblate interprets the strings.

As extra context, I see in the test fixtures the tag is only present in POT files, but I am not using POT files at all.
I might be making some mistake here, but po4a-updatepo does not seem to use POT files except as temporary files, and so this also makes the whole workflow much more complicated, since each change to the source document needs to update all PO files, which usually causes conflicts with translations not yet merged to the main branch.

Is this a bug, or am I using the tools incorrectly? Sorry this looks like a support question, but I have read and re-read the documentation many times, and I have not found an answer.

Thanks.


mquinson · 2024-02-13T23:14:14Z

Could you please expand on how po4a makes your life more complex? Shouldn't this paragraph solve these issues?

> Once setup, invoking po4a is enough to update both the translation PO files and translated documents. You may pass the "--no-translations" to po4a to not update the translations (thus only updating the PO files) or "--no-update" to not update the PO files (thus only updating the translations). This roughly corresponds to the individual po4a-updatepo and po4a-translate scripts which are now deprecated (see ``Why are the individual scripts deprecated'' in the FAQ below).

A link to your project would help me debug the situation, if possible.

NightTsarina · 2024-02-14T14:51:27Z

Hi,

> Could you please expand on how po4a makes your life more complex? Shouldn't this paragraph solve these issues?

The main 2 issues are:

1. I need to re-generate the config file each time a new document is added.
2. I cannot do incremental builds, as it does not allow me to process one file at a time.

For that reason I opted to use the deprecated scripts, which I was able to integrate into make. But I would be willing to change this if it fixes the problems I am having.
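To illustrate the first issue: until globbing is supported, the config file can be regenerated from the list of source documents. The following is only a sketch under stated assumptions, not how the linked project does it. The function name `render_po4a_config`, the `po/site.pot` path, and the `NAME.LANG.md` output layout are all hypothetical, and it puts every document into a single POT rather than one PO file per document.

```python
def render_po4a_config(documents, langs, pot='po/site.pot'):
    """Build po4a config text for a set of Markdown documents.

    Hypothetical layout: one shared POT, translations written next to
    each master file as NAME.LANG.md.
    """
    stem = pot[:-len('.pot')] if pot.endswith('.pot') else pot
    lines = [
        '[po4a_langs] ' + ' '.join(langs),
        f'[po4a_paths] {pot} $lang:{stem}.$lang.po',
        '',
    ]
    # Sort so the generated file is stable and diffs stay small.
    for doc in sorted(documents):
        base = doc[:-len('.md')] if doc.endswith('.md') else doc
        lines.append(
            f'[type: text] {doc} $lang:{base}.$lang.md opt:"-o markdown"')
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    from pathlib import Path
    # Pick up every Markdown file below the current directory.
    docs = [str(path) for path in Path('.').rglob('*.md')]
    print(render_po4a_config(docs, ['es', 'fr']), end='')
```

This only removes the manual config edits; it does not address incremental builds, since po4a would still process all listed documents in one run.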

> Once setup, invoking po4a is enough to update both the translation PO files and translated documents. You may pass the "--no-translations" to po4a to not update the translations (thus only updating the PO files) or "--no-update" to not update the PO files (thus only updating the translations). This roughly corresponds to the individual po4a-updatepo and po4a-translate scripts which are now deprecated (see ``Why are the individual scripts deprecated'' in the FAQ below).
>
> A link to your project would help me debug the situation, if possible.

Sure, luckily the project was made public a couple of months ago: https://gitlab.com/securityinabox/securityinabox.gitlab.io/
You can find all the po4a stuff in the Makefile, but it is not very easy to read as I had to use macros.

mquinson · 2024-02-14T15:23:35Z

The first issue seems related to #272, right? For the second one, I'll have to investigate your project a bit. No worry, I speak makefile :)

NightTsarina · 2024-02-14T19:36:15Z

For the first issue: yes, globbing would solve that problem
For the second issue: thanks a lot, I really appreciate the help!

Now, this issue in particular (PO files changing format when updated): is it a bug, or a result of me not using POT files? If it is not a bug, is there a way to do this properly with po4a-updatepo?

mquinson · 2024-02-14T23:24:42Z

I have no idea :)

Fat-Zer · 2024-02-15T12:38:50Z

I had some difficulty reproducing it, so here are some clearer steps:

$ cat >foo.md <<EOF
Hello World
===========

EOF
$ rm -f en.po && po4a-updatepo -f text -o markdown -m foo.md -p en.po
$ tail -n 5 en.po
#. type: Title =
#: foo.md:2
#, markdown-text, no-wrap
msgid "Hello World"
msgstr ""

Notice that the Hello World string has the markdown-text flag.

Now edit the file:

$ sed -i 's/World/world/' foo.md
$ po4a-updatepo -f text -o markdown -m foo.md -p en.po
$ tail -n 5 en.po
#. type: Title =
#: foo.md:2
#, no-wrap
msgid "Hello world"
msgstr ""

The markdown-text flag is gone.

po4a doesn't seem to add the flag at all.

The flag was introduced by #208, and this looks like a bug in both po4a and po4a-updatepo (when updating a file).

Fat-Zer · 2024-02-15T13:12:31Z

Digging a bit further: the flag is actually being removed by msgmerge -U... I don't know if anything can be done on po4a's part about that...
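A quick way to confirm which flags go missing is to compare a freshly generated template with the merged PO file. The sketch below uses only the standard library and handles single-line msgids only; the function names are hypothetical, and the `MERGED` text is an illustrative guess at the msgmerge result, not captured output.

```python
import re


def flags_by_msgid(po_text):
    """Map each single-line msgid to the set of flags on its '#,' line."""
    result = {}
    flags = set()
    for line in po_text.splitlines():
        if line.startswith('#,'):
            flags = {flag.strip() for flag in line[2:].split(',') if flag.strip()}
        elif line.startswith('msgid '):
            match = re.match(r'msgid "(.*)"$', line)
            if match:
                result[match.group(1)] = flags
            # Flags apply to one entry only, so reset for the next one.
            flags = set()
    return result


def dropped_flags(template, merged):
    """Return {msgid: flags present in the template but absent after merging}."""
    old, new = flags_by_msgid(template), flags_by_msgid(merged)
    return {
        msgid: old[msgid] - new[msgid]
        for msgid in old.keys() & new.keys()
        if old[msgid] - new[msgid]
    }


TEMPLATE = '''#. type: Title =
#: foo.md:2
#, markdown-text, no-wrap
msgid "Hello world"
msgstr ""
'''

# Illustrative only: what a merged entry might look like once the flag is lost.
MERGED = '''#. type: Title =
#: foo.md:2
#, fuzzy, no-wrap
msgid "Hello world"
msgstr "Hola Mundo"
'''

if __name__ == '__main__':
    print(dropped_flags(TEMPLATE, MERGED))
```

The `fuzzy` flag appears only in the merged file, so it is not reported; only flags that existed in the template and vanished are listed.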

mquinson · 2024-02-16T14:10:24Z

I'm starting to wonder whether we should reimplement msgmerging in Perl directly. Sounds like a nightmare to do, but working around msgmerge issues is also demanding...

NightTsarina · 2024-02-22T00:29:37Z

I wonder if anybody is actually maintaining GNU gettext... There has been a bug open about this for more than 2 years with no replies, as well as reports on the mailing list:

On the flip side, pot2po from Template Toolkit seems to do this right:

$ cat >foo.md <<EOF
Hello World
===========

EOF

$ rm -f foo.pot && po4a-updatepo -f text -o markdown -m foo.md -p foo.pot
$ cp foo.pot foo.es.po

$ sed -i 's/msgstr ""/msgstr "Hola Mundo"/' foo.es.po

$ tail -n 5 foo.es.po
#. type: Title =
#: foo.md:2
#, markdown-text, no-wrap
msgid "Hello World"
msgstr "Hola Mundo"

$ sed -i 's/World/world/' foo.md

$ rm -f foo.pot && po4a-updatepo -f text -o markdown -m foo.md -p foo.pot

$ pot2po -t foo.es.po foo.pot foo.es.po

$ tail -n 5 foo.es.po
#. type: Title =
#: foo.md:2
#, fuzzy, markdown-text, no-wrap
msgid "Hello world"
msgstr "Hola Mundo"

mquinson · 2024-02-22T09:46:14Z

> On the flip side, pot2po from Template Toolkit seems to do this right:

That's very interesting. I tried to dig a bit, searching for the piece of code doing the msgmerge, but couldn't find the right file in their source. Just for the record, it's the fuzzy matching that is somewhat difficult to write and thus interesting to read. Thanks for your investigation, Mt

mquinson · 2024-02-25T20:53:43Z

I simply cannot find my way around the Template Toolkit source code. Could someone direct me to the right location where the fuzzy matching is done? Thanks in advance,

NightTsarina · 2024-02-26T13:05:21Z

I am sorry, my memory betrayed me (I used to use Template::Toolkit a lot back in my Perl days 😂). I meant Translate Toolkit.

NightTsarina · 2024-02-26T13:19:22Z

The conversion is done in this file: https://github.com/translate/translate/blob/master/translate/convert/pot2po.py but I am having trouble following the many layers of abstraction they use..

mquinson · 2024-03-01T00:33:18Z

This is here: https://github.com/translate/translate/tree/master/translate/search
A Levenshtein distance is used, with some tricks to speed things up. I need to read that further to see if we too could do without msgmerge.
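For readers unfamiliar with the approach, here is a minimal sketch of Levenshtein-based fuzzy matching. It is not Translate Toolkit's code and omits their speed-ups; the `best_match` helper and the 0.75 threshold are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Edit distance by dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(msgid, candidates, threshold=0.75):
    """Return the most similar old msgid, or None if nothing is close enough."""
    scored = [(similarity(msgid, candidate), candidate) for candidate in candidates]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None


if __name__ == '__main__':
    # The edited title from the reproduction steps differs by one character.
    print(levenshtein('Hello World', 'Hello world'))
    print(best_match('Hello world', ['Hello World', 'Goodbye']))
```

A merge tool would reuse the old translation of the returned msgid and mark the entry fuzzy, as pot2po does in the transcript above.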

mquinson · 2024-03-01T11:38:07Z

The more I think about it, the less I think we should remove our dependency on gettext. Maybe we should fix gettext for others to enjoy it too.

NightTsarina · 2024-03-01T15:46:58Z

> This is here: https://github.com/translate/translate/tree/master/translate/search
> A Levenshtein distance is used, with some tricks to speed things up. I need to read that further to see if we too could do without msgmerge.

Yesterday I ran msgmerge and pot2po on a bunch of outdated PO files, and the merging was equivalent. But I found 2 other problems in pot2po:

1. It does not allow changing the wrapping setting (the code is there, but there is no CLI option).
2. It loses previous translation strings (#|).

> The more I think about it, the less I think we should remove our dependency on gettext. Maybe we should fix gettext for others to enjoy it too.

It looks to me like Translate Toolkit has the potential to become the po4a replacement at some point, but it does not seem to be there yet. Fixing gettext would be the ideal solution, but judging from those bug reports I am not holding out much hope. A workaround for now could be to re-add the flags in po4a after gettext runs?

mquinson · 2024-03-04T13:44:57Z

I looked at the gettext code, and its fuzzy-matching code is much more efficient and advanced than a simple Levenshtein distance. It uses something based on n-grams, which I don't quite understand, but which seems to be the state of the art for fuzzy text matching.
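As a rough illustration of the general idea behind n-gram matching (this is not gettext's implementation, which is considerably more elaborate), two strings can be compared by the character trigrams they share. The function names and the choice of the Dice coefficient are assumptions made for this sketch.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams, padding both ends with spaces."""
    padded = ' ' * (n - 1) + text + ' ' * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over the n-gram sets: 1.0 identical, 0.0 nothing shared."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == '__main__':
    print(ngram_similarity('Hello World', 'Hello world'))
    print(ngram_similarity('Hello World', 'Goodbye'))
```

Because n-gram sets can be indexed ahead of time, a matcher can discard most candidates cheaply before doing any detailed comparison, which is one reason this family of techniques scales better than computing an edit distance against every old msgid.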

Someone should fix gettext, that'd be so much better :(

NightTsarina · 2024-03-06T19:24:08Z

FYI, I wrote a hacky script to copy the missing flags after running msgmerge. Of course, each tool does things slightly differently, so polib wraps things a bit differently, but it is usable:

#!/usr/bin/env python3

# PO fix-up: copy missing flags from POT and re-wrap.

import argparse

import polib


# Copy flags from POT file, as GNU gettext drops any custom flags.
def copy_flags(pot, po):
  potentries = {
      entry.msgid_with_context: entry for entry in pot
  }
  for poentry in po:
    potentry = potentries.get(poentry.msgid_with_context)
    if not potentry:
      continue

    # 'fuzzy' belongs to the PO file, so it is neither copied nor reported.
    potflags = set(potentry.flags) - {'fuzzy'}
    poflags = set(poentry.flags) - {'fuzzy'}
    if newflags := poflags - potflags:
      print('Unexpected new flags in file %s: %s' % (po.fpath, newflags))

    # Append the flags that msgmerge dropped, keeping the POT's flag order.
    missing = potflags - poflags
    poentry.flags.extend(
        flag for flag in potentry.flags if flag in missing)
  return po


def main():
  parser = argparse.ArgumentParser(
      description='Copy flags from POT file and re-wrap')
  parser.add_argument(
      'pofile', metavar='DEST',
      help='PO file to copy flags to')
  parser.add_argument(
      'potfile', metavar='TEMPLATE',
      help='template to copy flags from')
  parser.add_argument(
      '--wrap', metavar='N', type=int, default=77,
      help='wrap lines after N columns')

  args = parser.parse_args()

  pot = polib.pofile(args.potfile)
  po = polib.pofile(args.pofile, wrapwidth=args.wrap)
  copy_flags(pot, po)
  po.save()


if __name__ == '__main__':
  main()

mquinson changed the title ~~Different output when creating or updating PO files~~ msgmerge removes all unknown flags (in particular markdown-text) May 9, 2024
