JSON schema validation proposal #5852

Draft: wants to merge 3 commits into main

Conversation

@mvds00 (Contributor) commented Jan 6, 2024

add test/cli/test-json.py: Schema validation for JSON files.

test-json.py uses jsonschema (https://pypi.org/project/jsonschema/) to validate JSON files against schemas, optionally with custom format checkers.

All JSON files in the project must be accounted for in test-json.schema_mapping(), either as a schema, or as a file to be validated against a schema. The JSON schemas are implicitly validated against the most recent metaschema available in jsonschema.

This commit adds schemas for JSON addon files and for namingng config files.
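As an illustration of the approach (a minimal sketch, not the actual test-json.py code; the function name and file paths below are made up), validating one file with jsonschema could look roughly like this:

    import json
    from jsonschema import FormatChecker
    from jsonschema.validators import validator_for

    def validate_json_file(json_path, schema_path):
        """Sketch: validate one JSON file against a schema with jsonschema."""
        with open(schema_path) as f:
            schema = json.load(f)
        # Pick the validator class for the schema's "$schema" (latest draft if absent)
        # and check the schema itself against its metaschema first.
        ValidatorClass = validator_for(schema)
        ValidatorClass.check_schema(schema)
        with open(json_path) as f:
            instance = json.load(f)
        # Custom format checkers can be registered on the FormatChecker instance.
        validator = ValidatorClass(schema, format_checker=FormatChecker())
        return list(validator.iter_errors(instance))

    # e.g. validate_json_file("addon.json", "addon-schema.json") -> list of errors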

TODO:

  • runtime schema validation (done)
  • remove current ad-hoc validation (done)
  • generate documentation from schemas

Feedback is more than welcome.

@mvds00 force-pushed the pullreq14 branch 2 times, most recently from ed34bb0 to 022a7a8 on January 7, 2024 00:32
For JSON schema validation in the C++ code, valijson
(https://github.com/tristanpenman/valijson) is added. It supports picojson and
is actively maintained. It is used in single-header-include style, as that
seems the simplest solution for now. The header is added, alongside the
LICENSE file, in externals/valijson/.

Various warnings emitted when compiling the default code generated by valijson
are fixed; these fixes have been pushed upstream.

As JSON is now validated against a schema, various ad-hoc checks could be
safely removed.

A simple way to test addon JSON schema validation is to invoke cppcheck using:

   cppcheck --addon='{"script":1}' test.c

A minor bugfix is that the file specified by "executable" was searched using
getFullPath() with fileName as the second argument; it looks like this should
have been exename.

The fileName argument of parseAddonInfo() is only used for log messages; it is
now set to the string "inline JSON" instead of the actual JSON string that was
parsed. The command given above will therefore output:

   Loading inline JSON failed. JSON schema validation failed: <root> [script]
   Value type not permitted by 'type' constraint., <root> Failed to validate
   against schema associated with property name 'script'.

The addon JSON unit tests are also updated and now operate on JSON specified on
the command line.
@danmar (Owner) left a comment:

I don't know. I am not against validating json in theory.. and you seem to have found a reasonable library. the schema is some standard format I assume?

the output messages are more convoluted as far as I see.

on the other hand we do need to improve our json validation..



def test_addon_json_invalid_args_1(tmpdir):
-    __test_addon_json_invalid(tmpdir, json.dumps({'args':0}), "'args' must be an array.")
+    __test_addon_json_invalid(tmpdir, json.dumps({'args':0}), "JSON schema validation failed: <root> [args] Value type not permitted by 'type' constraint., <root> Failed to validate against schema associated with property name 'args'., <root> Missing required property 'script'.")
danmar (Owner) replied:

hmm.. in my humble opinion the old output is more clear

mvds00 (Contributor, Author) replied:

I fully agree, and think we can improve a lot on error reporting here. At this time various (redundant) issues are reported for this case:

  • "script" key is missing
  • "args" has the wrong type
  • "args" failed to verify
  • schema verification failed

I'm pretty sure this can be simplified to something much shorter; I'll look into it.
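(As an aside, on the Python/jsonschema side the same condensing can be done with jsonschema.exceptions.best_match; the C++ side of this PR uses valijson, so this is only a sketch of the idea, and the addon schema below is inferred from the messages in this thread, not taken from the PR.)

    import json
    from jsonschema import Draft202012Validator
    from jsonschema.exceptions import best_match

    # Hypothetical addon schema, inferred from the error messages above:
    # "script" is a required string, "args" must be an array.
    addon_schema = {
        "type": "object",
        "properties": {
            "script": {"type": "string"},
            "args": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["script"],
    }

    errors = list(Draft202012Validator(addon_schema).iter_errors(json.loads('{"args":0}')))
    # Instead of reporting every (partly redundant) error, report only the most
    # relevant one; best_match() applies a relevance heuristic for that.
    print(best_match(errors).message)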

@mvds00 (Contributor, Author) commented Jan 8, 2024

I don't know. I am not against validating json in theory.. and you seem to have found a reasonable library. the schema is some standard format I assume?

Yes, this seems to be a (de facto) standard, see https://json-schema.org/specification

the output messages are more convoluted as far as I see.

They are as it stands now, but that is not a function of having schema validation. Rather it is a function of the validator.

on the other hand we do need to improve our json validation..

I think so too. The project will be more robust as all embedded JSON is validated. And after validating, the code can safely assume all is according to spec, reducing the amount of checking and error handling/reporting code.

@firewave (Collaborator) commented Jan 9, 2024

Just some comments without having looked at the implementation yet.

The validation of the static configs is definitely fine in any case.

In terms of the addon output, we need to consider a potential performance impact compared to manual validation (which does not yet fully exist). It will surely be a drop in the ocean compared to what valueflow is doing, but I would still like to have the fastest path possible for the actual core performance.

But maybe in that case it should (also) be implemented within the addon, so it makes sure that it only outputs valid data. Some addons also output debug messages we need to suppress, so maybe the whole output should be piped through a layer that disallows all print and logging usage (well - we could leverage that framework at some point).

I won't dig too deep into it yet as the executor and language stuff as well as all other local changes are ahead of it on my list ... still.
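A minimal sketch of what such a "layer" could look like inside an addon (the helper names and result keys are made up, not an agreed interface): stray print()/logging output is captured, and only explicitly emitted JSON reaches cppcheck.

    import contextlib
    import io
    import json
    import sys

    @contextlib.contextmanager
    def suppressed_output():
        """Capture anything written to stdout/stderr so it cannot corrupt the JSON stream."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
            yield buf  # captured text could be forwarded as debug info if wanted

    def emit_result(result):
        # One JSON object per line, written to the real stdout.
        sys.__stdout__.write(json.dumps(result) + "\n")

    if __name__ == "__main__":
        with suppressed_output():
            print("checker chatter that must not reach cppcheck")  # captured, not emitted
        emit_result({"file": "test.c", "linenr": 1, "errorId": "example"})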

@mvds00 (Contributor, Author) commented Jan 9, 2024

In terms of the addon output, we need to consider a potential performance impact compared to manual validation (which does not yet fully exist). It will surely be a drop in the ocean compared to what valueflow is doing, but I would still like to have the fastest path possible for the actual core performance.

For this reason I didn't address that yet. To me a first logical step would seem to make the addon output one big JSON, not a set of JSON objects on separate lines.

But maybe in that case it should (also) be implemented within the addon, so it makes sure that it only outputs valid data. Some addons also output debug messages we need to suppress, so maybe the whole output should be piped through a layer that disallows all print and logging usage (well - we could leverage that framework at some point).

That would be less efficient as Python is way slower than C++ in almost every respect (unless we find a Cythonized JSON validator).

Maybe have one JSON object on stdout, and warnings/errors on stderr? This will require a rewrite of the code that executes processes, as the popen() call used in CppCheckExecutor::executeCommand() only captures stdout. This would be an improvement in itself.
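Conceptually (sketched in Python here for brevity; the actual change would be in the C++ executor, and the command line below is purely illustrative), separating the two streams looks like this:

    import json
    import subprocess

    # Run an addon and keep stdout (the JSON result) separate from stderr
    # (warnings / debug output). Command line is illustrative only.
    proc = subprocess.run(
        ["python3", "some_addon.py", "test.c.dump"],
        capture_output=True, text=True, check=False,
    )
    # Assumes the proposed "one JSON document on stdout" format.
    results = json.loads(proc.stdout) if proc.stdout.strip() else None
    if proc.stderr:
        print("addon diagnostics:", proc.stderr.strip())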

I won't dig too deep into it yet as the executor and language stuff as well as all other local changes are ahead of it on my list ... still.

I would like to keep the interface between addon and cppcheck as-is for now, and first finish what's on the table now. A few updates to this PR are underway...

@firewave (Collaborator) commented Jan 9, 2024

For this reason I didn't address that yet. To me a first logical step would seem to make the addon output one big JSON, not a set of JSON objects on separate lines.

As source files might be several megabytes, that doesn't seem like a good idea, as it would balloon the memory usage. Especially since we already have several caches which store these (still working on getting rid of those).

That would be less efficient as Python is way slower than C++ in almost every respect (unless we find a Cythonized JSON validator).

We are not using a very fast JSON library on the C++ side either. Also vanilla Python is making major strides in reclaiming the performance lost since 2.7.

That actually brings up an interesting point. We could pre-compile the addons for better performance.
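(If "pre-compile" means byte-compiling the addons ahead of time, which is an assumption on my part, the standard library already covers it:)

    import compileall

    # Byte-compile everything under addons/ so the interpreter skips the
    # parse/compile step on each invocation; this only helps startup time.
    compileall.compile_dir("addons", quiet=1)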

I would like to keep the interface between addon and cppcheck as-is for now, and first finish what's on the table now. A few updates to this PR are underway...

Same. If we change it we need to provide backwards compatibility as people might be running their own addons. We should probably just introduce a new interface if we plan any changes.

@mvds00 (Contributor, Author) commented Jan 11, 2024

I would like to keep the interface between addon and cppcheck as-is for now, and first finish what's on the table now. A few updates to this PR are underway...

Same. If we change it we need to provide backwards compatibility as people might be running their own addons. We should probably just introduce a new interface if we plan any changes.

Newline-separated JSON objects are actually an (informal) format called JSON Lines (JSONL). Sounds like something that is easy to formalize/verify/unit-test in the interface between addons and cppcheck. Debugging information could still be emitted between objects without violating the format, e.g. as a quoted string or as an object with a defined structure.
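For illustration, a minimal sketch (in Python for brevity; the real consumer is the C++ side) of how such a stream could be split, assuming plain strings and objects with a "debug" key are diagnostics — both conventions are assumptions, not an agreed format:

    import json

    def parse_addon_output(lines):
        """Split JSONL addon output into results and debug messages (sketch only)."""
        results, debug = [], []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            obj = json.loads(line)          # every line must still be valid JSON
            if isinstance(obj, str) or (isinstance(obj, dict) and "debug" in obj):
                debug.append(obj)           # assumed convention for diagnostics
            else:
                results.append(obj)         # a finding; would be schema-validated next
        return results, debug

    # Example stream mixing findings and debug output (keys are illustrative):
    stream = [
        '"entering checker X"',
        '{"file": "test.c", "linenr": 1, "errorId": "example", "message": "demo"}',
        '{"debug": "timing", "seconds": 0.01}',
    ]
    print(parse_addon_output(stream))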

@firewave (Collaborator) commented:
heads up: #5870 will change the naming of the pytest-based test files

I still haven't looked at this. I am currently in the middle of finishing up something else and will take a look afterwards.

@mvds00 (Contributor, Author) commented Jan 16, 2024

heads up: #5870 will change the naming of the pytest-based test files

I still haven't looked at this. I am currently in the middle of finishing up something else and will take a look afterwards.

Thanks. There are some additional conflicts to solve, plus some additional improvements that I will push in the coming days.

@firewave marked this pull request as draft on January 16, 2024 11:47
@firewave (Collaborator) commented:
Sorry for the late reply.

I hope to take a proper look this week. What about the additional changes you wanted to push?
