Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Template validator ? #362

Open
rycks opened this issue Nov 11, 2021 · 3 comments · May be fixed by #481
Open

Template validator ? #362

rycks opened this issue Nov 11, 2021 · 3 comments · May be fixed by #481

Comments

@rycks
Copy link
Contributor

rycks commented Nov 11, 2021

Hello all,
i would like to run i2d with something like --check-template mytemplate.yml just for

  1. yml syntax validation
  2. grammar validation (fields ok / nok, i2d version compatibility)
  3. regexp validity (for example if you put a "/" or something like regex parser can't understand)
  4. missing fields warning (for example "there is no amount regex in this yml"

That sort of tool can really help us making good yml files ... and "debug" yml files

@bosd
Copy link
Collaborator

bosd commented Sep 26, 2022

I would say that an template validator is a welcome addition.
In the current situation one template file with a wrong syntax can break parsing of all invoices.

I would also like to preventing to merging breaking templates.
Therefore I propose to implement a validator on github actions.
(with preferrably a mechanism someone could run locally).

There could be a basic implementation.

  • Checking on tabs

Or an advanced configuration.

  • Validating the keys, regexes
  • Check if the correct syntax is used for calling the line parser.

I've made setup some tests:

The simple version is very straight forward to implement.

The advanced check with Yamale is propably the way to go.
Or is it too complicated? Too error prone?

At first glance, yamale does not provide regex validation on the value field.
example:

fields:
  amount:
    parser: regex
    regex: 'Total[:]\s+(\d+\.\d+)EUR'
    type: float

For complete validation we should catch the key regex and run re.compile('Total[:]\s+(\d+\.\d+)EUR') Which should pass.
bad example which should fail: re.compile('Total:\s+(\d+\.\d+)EUR')

@rmilecki What is your opinion on this?

@rmilecki
Copy link
Collaborator

I'm all for it.

My YAML experience comes from Linux kernel which uses YAML for defining syntax for Device Tree bingings. It's absolutely great for validation. FWIW Linux kernel uses yamllint.

I think we should definitely work on some more advanced validation. Trivial validation like checking for tabs / indentions isn't woth much in my opinion.

That would be great to verify things like:

  1. Correct formats (e.g. verify that issuer is a string)
  2. Required fields (e.g. parser: static requires value field)
  3. RegEx syntax (make sure fields like regex, begin, end, line contain valid RegEx-es)

@bosd
Copy link
Collaborator

bosd commented Mar 18, 2023

Regards to my earlier comment I'm debating if we should do this in a sepearete github actions.

Currently all the checks mentioned in this issue are already implemented in the code of invoice2data.
Templates with rudemental syntax errors are not loaded.

The validation of the template's syntax is tested in the according submodules. (line parser, regex, tables).
When using as a lib, the errors are presented to the user, as example in #490 (comment):

When using it from the cli one can add the --debug flag to get the ouput in the logger.

Detailed feedback

yml syntax validation

on it's way in this code: https://github.com/invoice-x/invoice2data/pull/471/files#diff-0b2a5a5cdf14a3c7ca6389c5b2901bd020bd83ff9ba401c19d18155b436c07d0R71-R72

grammar validation (fields ok / nok, i2d version compatibility)

grammar validation is not implemented. Not usefull for the keys, as we have no standard field names we endorse currently. Mistakes in for example group fields group: happy will result in.

logger.warning("Unsupported grouping method: " + settings["group"])

regexp validity (for example if you put a "/" or something like regex parser can't understand)

Example with the invalid regex validation ( closing bracket missing):

  replace:
    - ['(?P<price_ltr>\d{3}', '\g<price_ltr>L)']

When this happens, the logger reads:
Error has occured missing ), unterminated subpattern at position 0
I agree, the error message is not perfect. But at least it gives an indication what to look for.

missing fields warning (for example "there is no amount regex in this yml"

The amount is by default a required field. This feedback will be in the logger:

Unable to match all required fields. The required fields are: ['date', 'amount', 'invoice_number', 'issuer']. Output contains the following fields: ['partner_zip', 'line_tax_percent', 'payment_method', 'lines', 'telephone', 'amount_tax', 'date', 'vat', 'partner_city', 'btwtype', 'country_code', 'invoice_number', 'partner_website', 'currency', 'issuer', 'partner_name', 'amount_untaxed'].

After PR #491
it will be a bit more user friendly and output:
Missing required field(s): {'amount'}

Required fields (e.g. parser: static requires value field)

Implemented in the static parser:

if "value" not in settings:
logger.warning("Field \"%s\" doesn't have static value specified", field)

an unknown parser is specified

Implemented in:

else:
logger.warning("Field %s has unknown parser %s set", k, v["parser"])
else:
logger.warning("Field %s doesn't have parser specified", k)

RegEx syntax (make sure fields like regex, begin, end, line contain valid RegEx-es)

  lines:
    parser: lines
    rules:
      - start: (BTW type

will result in the logger as:
error has occured missing ), unterminated subpattern at position 0

correct formats (e.g. verify that issuer is a string)

This is not implemented yet. An separate issue is in #485

Conclusion
The basic regex, validation checks are already implemented in the code of the submodules. However not all if provided as feedback to the user, because of a missing try except block in the main.py file.
PR #481 adds those and is required to get some of the feedback mentioned in this post.

https://github.com/invoice-x/invoice2data/pull/481/files#diff-f22425fc7344f0d0c87e3f48dc7edd1ff8b9c9f75634153d849be75c2bccd706R88-R93

Let's close this issue after that pr is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants