Corrupted Docx Result (Template: Ms Word 2010-365) #558

Open
rrrr98 opened this issue Aug 12, 2024 · 8 comments

rrrr98 commented Aug 12, 2024

Describe the bug

The bug happens when I use docxtpl (0.18 - latest) to render a .docx template file (Ms Word 2010-365 Document).
The file is generated but can't be opened in LibreOffice (24.2.5 - released), which reports it as corrupted (see below).

Last known docxtpl working version: 0.16.4

To Reproduce

  1. Create a new .docx file in LibreOffice and save it as Word 2010-365 Document (.docx). Add any Jinja template tag you like; mine is {{ test }}.

  2. Install docxtpl - pip install docxtpl==0.18.0

  3. Write this simple script (main.py):

from docxtpl import DocxTemplate

# saved with libreoffice as Ms Word 2010 / 365 format
template = DocxTemplate("./input.docx")

# any context
template.render({
  "test": "Sample Text"
})

template.save("./output.docx")
  4. Run python main.py
  5. This warning should appear (please ignore the macOS paths; it is the same on Windows):
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/zipfile/__init__.py:1598: UserWarning: Duplicate name: 'docProps/core.xml'
  return self._open_to_write(zinfo, force_zip64=force_zip64)

Expected behavior

With 0.16.4 (the last working version), there is no warning about a duplicated core.xml. From 0.16.5 onward, core.xml is duplicated. You can confirm this by renaming the .docx to .zip and opening it: docProps/core.xml appears twice. I'm guessing this is what causes the error.
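
The manual check above (rename to .zip and look for two docProps/core.xml entries) can also be scripted. This is a minimal sketch using only the standard library; the helper name `duplicate_entries` is made up for illustration and is not part of docxtpl or python-docx.

```python
import zipfile
from collections import Counter


def duplicate_entries(docx_file):
    """Return {entry name: count} for names stored more than once in the archive.

    `docx_file` may be a path or a binary file-like object. A .docx file is a
    ZIP archive, so no renaming is needed.
    """
    with zipfile.ZipFile(docx_file) as zf:
        # infolist() yields one ZipInfo per stored entry, duplicates included.
        counts = Counter(info.filename for info in zf.infolist())
    return {name: n for name, n in counts.items() if n > 1}
```

Calling `duplicate_entries("./output.docx")` on a file produced as described in this report would be expected to list docProps/core.xml; an empty dict means no name is stored twice.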

Screenshots

image

rrrr98 added the bug label Aug 12, 2024
@bhavin-qryptal

image
As seen here, the generated document contains two docProps\core.xml entries.

One more point: this issue happens ONLY when the input file was saved with LibreOffice. It does not happen when the input file was saved with Microsoft Word.

@elapouya
Owner

Could you send me the same .docx saved from both MS Word and your LibreOffice, so I can see the differences?

@rrrr98
Author

rrrr98 commented Aug 13, 2024

These are the input files: one saved from LibreOffice as Ms Word 2010-365 (input-libre.docx), and one saved from Ms Word on Windows (input-word.docx):
input-libre.docx
input-word.docx

Just in case, here's the output.docx (it contains the duplicated core.xml and is corrupted; generated with version 0.18.0):
output.docx

Output using input-word.docx:
output-word.docx

@elapouya
Owner

I am using LibreOffice 7.3.7 on Ubuntu 22.04: no problem opening both output files.
Starting to investigate...

@elapouya
Owner

The main difference between 0.16.4 and 0.16.5 is the code that manages core properties:

    def render_properties(self, context: Dict[str, Any], jinja_env: Optional[Environment] = None) -> None:
        # List of string attributes of docx.opc.coreprops.CoreProperties which are strings.
        # It seems that some attributes cannot be written as strings. Those are commented out.
        properties = [
            'author',
            # 'category',
            'comments',
            # 'content_status',
            'identifier',
            # 'keywords',
            'language',
            # 'last_modified_by',
            'subject',
            'title',
            # 'version',
        ]
        if jinja_env is None:
            jinja_env = Environment()

        for prop in properties:
            initial = getattr(self.docx.core_properties, prop)
            template = jinja_env.from_string(initial)
            rendered = template.render(context)
            setattr(self.docx.core_properties, prop, rendered)

The problem may come from that...

@elapouya
Owner

Simply reading the current properties of the docx template:

initial = getattr(self.docx.core_properties, prop)

creates a duplicate docProps/core.xml.

This looks like a python-docx bug, not a docxtpl bug...
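
One thing that might be worth comparing between input-libre.docx and input-word.docx is how each package declares its parts in _rels/.rels, including the entry that points at docProps/core.xml. The sketch below only lists what the file declares and uses the standard library alone; the helper name `package_relationships` is hypothetical, and nothing here confirms that the relationships are the actual cause.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship parts in Open Packaging Conventions files.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def package_relationships(docx_file):
    """Return (Id, Type, Target) tuples from the package-level _rels/.rels part."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.iter(f"{{{RELS_NS}}}Relationship")
    ]
```

Printing the result for both input files side by side would show any difference in the Type or Target recorded for the core properties part.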

@elapouya
Owner

This simple script, using python-docx directly, reproduces the problem too:

from docx import Document

# Load or create a new document
doc = Document("input-libre.docx")

# Access the core properties
core_properties = doc.core_properties

# Modify the author property
core_properties.author = 'New Author'

# Optionally, modify other properties
core_properties.title = 'New Document Title'
core_properties.subject = 'New Subject'
core_properties.keywords = 'Keyword1, Keyword2'

# Save the document
doc.save('modified_document.docx')

I am getting:

/usr/lib/python3.10/zipfile.py:1519: UserWarning: Duplicate name: 'docProps/core.xml'
  return self._open_to_write(zinfo, force_zip64=force_zip64)

Could you please open a bug with the python-docx project team?
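
Until this is resolved upstream, one possible stopgap is to rewrite the generated archive so that each name is stored only once. The following is a rough sketch, not a fix confirmed anywhere in this thread: the function name `dedupe_zip` is made up, and keeping the last-written copy of each entry is an assumption that has not been verified to produce a file every application accepts.

```python
import zipfile


def dedupe_zip(src, dst):
    """Copy archive `src` to `dst`, keeping one entry per name.

    Assumption: the entry written last is the one to keep. Entries keep the
    order in which their names first appear in the source archive.
    """
    with zipfile.ZipFile(src) as zin:
        # Later duplicates overwrite earlier ones; dict order follows first insertion.
        last = {info.filename: info for info in zin.infolist()}
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for name, info in last.items():
                # Passing the ZipInfo reads that specific entry, not just "the" name.
                zout.writestr(name, zin.read(info))
```

For example, `dedupe_zip("output.docx", "output-fixed.docx")` would write a copy with a single docProps/core.xml entry.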

@bhavin-qryptal

@elapouya Thank you very much for investigating and sharing your findings. I have posted them on an existing, similar open issue: python-openxml/python-docx#1037
