Add xml info to pdf metadata2 #15

MAKOMO · 2024-01-06T15:24:54Z

This is a conservative adjustment of the stalled PR #14 which, among other things, fixes Issue #14 and tries to follow all of the reviewers' valuable comments.

This PR also fixes the typo raised in Issue #13.

Hopefully this leads to a v2.3.1 release on PyPI that fixes the current version mismatch (Issue #12), which would make this lib far more useful without modifications.

While this PR lifts the restriction of the document type code to 380 (INVOICE), the DocumentType in the PDF XMP is still fixed to INVOICE. Proper handling would require a complete mapping of valid document type codes to the corresponding DocumentTypes.
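A minimal sketch of what such a mapping could look like. It assumes a small excerpt of UNTDID 1001 type codes and assumes that all invoice-like codes map to the XMP DocumentType `INVOICE`; both the table and the helper name `document_type_for` are hypothetical and not part of drafthorse:

```python
# Hypothetical excerpt of a type-code table; not part of drafthorse.
# Assumption: invoice-like UNTDID 1001 codes share the XMP DocumentType "INVOICE".
DOCUMENT_TYPES = {
    "380": "INVOICE",  # commercial invoice
    "381": "INVOICE",  # credit note
    "384": "INVOICE",  # corrected invoice
}


def document_type_for(type_code: str) -> str:
    """Return the XMP DocumentType for a type code, or raise if it is unmapped."""
    try:
        return DOCUMENT_TYPES[type_code.strip()]
    except KeyError:
        raise ValueError(
            f"No DocumentType mapping for type code {type_code!r}"
        ) from None
```

Raising on unknown codes, rather than silently falling back to `INVOICE`, would make a missing table entry visible before an invalid PDF is produced.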

codecov · 2024-01-06T15:25:34Z

Codecov Report

Attention: Patch coverage is 94.23077%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 90.95%. Comparing base (3171067) to head (6ad2822).

Files	Patch %	Lines
drafthorse/pdf.py	95.65%	2 Missing ⚠️
drafthorse/models/elements.py	75.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #15      +/-   ##
==========================================
+ Coverage   90.51%   90.95%   +0.43%     
==========================================
  Files          17       18       +1     
  Lines        1392     1360      -32     
==========================================
- Hits         1260     1237      -23     
+ Misses        132      123       -9


raphaelm · 2024-01-19T17:13:48Z

Sorry for the silence on this one, I will need to find some time to validate this against e.g. the Mustang test suite (if you did not already?) and I currently have limited time for this project. Will try to do it soon, though. Thanks for working on it!

MAKOMO · 2024-01-19T17:43:18Z

> Sorry for the silence on this one, I will need to find some time to validate this against e.g. the Mustang test suite (if you did not already?) and I currently have limited time for this project. Will try to do it soon, though. Thanks for working on it!

If I remember correctly, I initially became interested in this change because some generated PDFs did not validate. The problem was reported by veraPDF, but it could also have been Mustang. In any case, the current code is definitely wrong with regard to Issue #14. The code changes proposed by the original PR seem to be taken from the factur-x lib.

Give me a bit of time to get my validation chain fully functional so I can do more tests on this. I will report back.

MAKOMO · 2024-01-20T20:18:00Z

Not sure what exactly you mean by the Mustang test suite, but I ran some initial tests on both the current trunk and this PR version. The results were identical.

I took the sample file EN16931_Einfach.pdf from the current Factur-X Version 1.0.06 (ZUGFeRD v. 2.2) package, used Mustang to extract the XML and formulated it in drafthorse (attached) to generate the XML and ZUGFeRD PDF.

EN16931_Einfach.py.txt

There were zero (non-whitespace) differences between the extracted XML and the drafthorse-generated XML, which also validated using the KoSIT validator.
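One way to reproduce such a whitespace-insensitive comparison is sketched below. It is not necessarily how the check above was done; it assumes Python 3.8+ and uses the standard library's C14N canonicalization, and the helper name `same_xml_ignoring_whitespace` is hypothetical:

```python
from xml.etree.ElementTree import canonicalize


def same_xml_ignoring_whitespace(xml_a: str, xml_b: str) -> bool:
    """Compare two XML documents after canonicalization with text stripped."""
    # strip_text=True removes leading/trailing whitespace from text and tail
    # content, so differences that are pure indentation disappear.
    return canonicalize(xml_a, strip_text=True) == canonicalize(
        xml_b, strip_text=True
    )
```

Note that this treats namespace prefixes as significant, so two documents that use different prefixes for the same namespace would still compare as different.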

I validated the generated PDF using veraPDF.

I validated the PDF and XML with Mustang CLI v2.10.0. The first run failed because I had passed "EN16931" to attach_xml:

    <messages>
      <error type="12">XMP Metadata: ConformanceLevel contains invalid value</error>
    </messages>

I fixed this to the correct "EN 16931" and the validation worked. However, the validator still reported 9 failed rules w.r.t. the urn:cen.eu:en16931:2017 profile from the XRechnung-CII-validation.xslt. To my understanding these rules should not be applied here, as this XML is not tagged as conforming to the XRechnung profile:

<?xml version="1.0" encoding="UTF-8"?>

<validation filename="EN16931_Einfach.pdf" datetime="2024-01-20 19:52:30">
  <pdf> 
    <report> 
      <buildInformation> 
        <releaseDetails id="core" version="1.22.2" buildDate="2022-09-14T13:46:00+02:00"/>  
        <releaseDetails id="validation-model" version="1.22.2" buildDate="2022-09-14T13:47:00+02:00"/> 
      </buildInformation>  
      <jobs> 
        <job> 
          <item size="474918"> 
            <name>/Users/luther/Desktop/tmp/xRechnung/Apps/Mustang/EN16931_Einfach.pdf</name> 
          </item>  
          <validationReport profileName="PDF/A-3B validation profile" statement="PDF file is compliant with Validation Profile requirements." isCompliant="true"> 
            <details passedRules="124" failedRules="0" passedChecks="12482" failedChecks="0"/> 
          </validationReport>  
          <duration start="1705776751203" finish="1705776752485">00:00:01.282</duration> 
        </job> 
      </jobs>  
      <batchSummary totalJobs="1" failedToParse="0" encrypted="0" outOfMemory="0" veraExceptions="0"> 
        <validationReports compliant="1" nonCompliant="0" failedJobs="0">1</validationReports>  
        <featureReports failedJobs="0">0</featureReports>  
        <repairReports failedJobs="0">0</repairReports>  
        <duration start="1705776750990" finish="1705776752508">00:00:01.518</duration> 
      </batchSummary> 
    </report>  
    <info>
      <signature>unknown</signature>
      <duration unit="ms">2726</duration>
    </info>
    <summary status="valid"/>
  </pdf>  
  <xml>
    <info>
      <version>2</version>
      <profile>urn:cen.eu:en16931:2017</profile>
      <validator version="2.10.0"/>
      <rules>
        <fired>19</fired>
        <failed>9</failed>
      </rules>
      <duration unit="ms">4719</duration>
    </info>
    <messages>
      <notice type="27" location="/rsm:CrossIndustryInvoice/rsm:ExchangedDocumentContext[1]" criterion="ram:BusinessProcessSpecifiedDocumentContextParameter/ram:ID">Business process MUST be provided. [ID PEPPOL-EN16931-R001] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice/rsm:SupplyChainTradeTransaction[1]/ram:ApplicableHeaderTradeAgreement[1]/ram:SellerTradeParty[1]" criterion="ram:URIUniversalCommunication/ram:URIID">Seller electronic address MUST be provided [ID PEPPOL-EN16931-R020] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice/rsm:SupplyChainTradeTransaction[1]/ram:ApplicableHeaderTradeAgreement[1]/ram:BuyerTradeParty[1]" criterion="ram:URIUniversalCommunication/ram:URIID">Buyer electronic address MUST be provided [ID PEPPOL-EN16931-R010] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice/rsm:SupplyChainTradeTransaction[1]/ram:IncludedSupplyChainTradeLineItem[1]/ram:SpecifiedLineTradeAgreement[1]/ram:GrossPriceProductTradePrice[1]" criterion="not(ram:ChargeAmount) or xs:decimal(../ram:NetPriceProductTradePrice/ram:ChargeAmount) = xs:decimal(ram:ChargeAmount) - xs:decimal(ram:AppliedTradeAllowanceCharge/ram:ActualAmount)">Item net price MUST equal (Gross price - Allowance amount) when gross price is provided. [ID PEPPOL-EN16931-R046] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice/rsm:SupplyChainTradeTransaction[1]/ram:IncludedSupplyChainTradeLineItem[2]/ram:SpecifiedLineTradeAgreement[1]/ram:GrossPriceProductTradePrice[1]" criterion="not(ram:ChargeAmount) or xs:decimal(../ram:NetPriceProductTradePrice/ram:ChargeAmount) = xs:decimal(ram:ChargeAmount) - xs:decimal(ram:AppliedTradeAllowanceCharge/ram:ActualAmount)">Item net price MUST equal (Gross price - Allowance amount) when gross price is provided. [ID PEPPOL-EN16931-R046] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice" criterion="rsm:SupplyChainTradeTransaction/ram:ApplicableHeaderTradeSettlement/ram:SpecifiedTradeSettlementPaymentMeans">[BR-DE-1] Eine Rechnung (INVOICE) muss Angaben zu "PAYMENT INSTRUCTIONS" (BG-16) enthalten. [ID BR-DE-1] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice" criterion="rsm:SupplyChainTradeTransaction/ram:ApplicableHeaderTradeAgreement/ram:BuyerReference[boolean(normalize-space(.))]">[BR-DE-15] Das Element "Buyer reference" (BT-10) muss übermittelt werden. [ID BR-DE-15] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice/rsm:ExchangedDocumentContext[1]" criterion="ram:GuidelineSpecifiedDocumentContextParameter/ram:ID = $XR-CIUS-ID or ram:GuidelineSpecifiedDocumentContextParameter/ram:ID = $XR-EXTENSION-ID">[BR-DE-21] Das Element "Specification identifier" (BT-24) soll syntaktisch der Kennung des Standards XRechnung entsprechen. [ID BR-DE-21] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice>  
      <notice type="27" location="/rsm:CrossIndustryInvoice/rsm:SupplyChainTradeTransaction[1]/ram:ApplicableHeaderTradeAgreement[1]/ram:SellerTradeParty[1]" criterion="ram:DefinedTradeContact">[BR-DE-2] Die Gruppe "SELLER CONTACT" (BG-6) muss übermittelt werden. [ID BR-DE-2] from /xslt/XR_30/XRechnung-CII-validation.xslt)</notice> 
    </messages>
    <summary status="valid"/>
  </xml>
  <messages></messages>
  <summary status="valid"/>
</validation>
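The `ConformanceLevel contains invalid value` error reported by Mustang came from the label "EN16931" where "EN 16931" is expected. A minimal sketch of a guard that could catch this before the PDF is written follows. The set of labels is an assumption based on the Factur-X/ZUGFeRD profile names, and `check_conformance_level` is a hypothetical helper, not part of drafthorse:

```python
# Assumed set of ConformanceLevel labels (Factur-X / ZUGFeRD profile names).
CONFORMANCE_LEVELS = {
    "MINIMUM",
    "BASIC WL",
    "BASIC",
    "EN 16931",
    "EXTENDED",
    "XRECHNUNG",
}


def check_conformance_level(level: str) -> str:
    """Return the level unchanged if it is a known label, otherwise raise."""
    if level in CONFORMANCE_LEVELS:
        return level
    # Offer a hint when the value differs only by spacing or case, e.g. "EN16931".
    compact = {known.replace(" ", ""): known for known in CONFORMANCE_LEVELS}
    hint = compact.get(level.replace(" ", "").upper())
    suffix = f"; did you mean {hint!r}?" if hint else ""
    raise ValueError(f"Invalid ConformanceLevel {level!r}{suffix}")
```

Rejecting the value with a hint, instead of silently normalizing it, keeps the caller's input explicit while still pointing at the likely fix.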

The PDF metadata generated by this PR is somewhat simplified compared with the original PR (as viewed in Acrobat).

Take your time to review all of this; I know you are busy. (I just enjoyed reading your excellent post "My little trade show – enterprise sales is magic", since we experienced something very similar, though with a much less professional-looking result, as well as some of your other posts. Nice reads!) Still, I think the changes are an improvement over trunk and, most importantly, fix Issue #14.

Thanks for this lib!

raphaelm reviewed Mar 10, 2024

drafthorse/__init__.py (outdated, resolved)

drafthorse/pdf.py (outdated, resolved)

raphaelm and others added 21 commits March 10, 2024 16:39

Bump to 2.3.0

ca88132

PR pretix#7 Include XML information in PDF metadata (cmcproject)

a512671

update mustang validator

e7f4cb7

fixes fx:DocumentFileName / fx:DocumentType order

0e762bd

remove schemas/ZUGFeRD2p2_extension_schema.xmp (replaced by `xmp_schema.py` generator)

29a209c

removes date and seller from pdf metadata subject to simplify the code and remove hard coded English language

7a6486e

removes doc type date and seller from pdf metadata subject (incl. the restriction to documents of type 380) to simplify the code and remove hard coded English language

9eb05e0

remove unused (now) unused constant INVOICE_TYPE_CODE and avoid the use of use bare `except`

75b0588

removes failing "Invalid doc type! XML value for TypeCode shall be 380 for an invoice." test

30565e0

allows to supply explicit profile level and extends profile auto detection to cover XRECHNUNG

5b2cabf

minor code style improvements like lazy % formatting in logging functions (logging-fstring-interpolation)

e95c830

fixes style (black)

6e961ca

tests of auto detecting a XRechnung v2 and v3 profiles

f4d2a12

blacking again

cf8af74

tests for en16931 auto profile recognition and auto profile recognition failure

2510362

black again

0637c19

typo

84973c2

allow users to set custom pdf metadata and the PDF language identifier used by PDF readers for blind people

50d8fd0

black

41af560

spelling

aac0ad2

Update drafthorse/pdf.py

2fb3a1c

raphaelm force-pushed the Add_XML_info_to_PDF_metadata2 branch from c6f84d6 to 2fb3a1c March 10, 2024 15:45

raphaelm mentioned this pull request Mar 10, 2024

Include XML information in PDF metadata #7

Closed

Run black

6ad2822

raphaelm merged commit 7e0faa8 into pretix:master Mar 10, 2024
6 checks passed
