
Case Study: from inference to verification

pkoppstein edited this page Aug 21, 2019 · 1 revision

Here is an example JSON document taken from the wild:

curl -so near_earth_asteroids.json  https://data.nasa.gov/resource/2vr3-k9wn.json

As of this writing, the file contains an array of 202 JSON objects.

The schema for the objects as inferred by the schema.jq module is:

   {
   "designation": "string",
   "discovery_date": "string",
   "h_mag": "string",
   "i_deg": "string",
   "moid_au": "string",
   "orbit_class": "string",
   "period_yr": "string",
   "pha": "string",
   "q_au_1": "string",
   "q_au_2": "string"
   }

Let's save this schema to a file:

PREFIX=near_earth_asteroids
jq 'include "schema"; schema' $PREFIX.json > $PREFIX.schema.json

Next we can run the JESS script to determine whether each object in the data array actually includes all the keys in the schema.

Since the data file ($PREFIX.json) contains an array, we run the JESS script with the --array option:

JESS --array --schema $PREFIX.schema.json $PREFIX.json 

The output begins with a mismatch message:

"Schema mismatch #1 at <stdin>:611: entity #51:"

This message indicates that the 51st item in the array does not match the schema: it is the first of many objects that lack some of the keys in the schema.
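To see which keys are absent, and from how many objects, the data can be compared against the schema directly with jq. The following is a minimal sketch, not part of JESS or schema.jq: the helper names report_missing and missing_tally are hypothetical, and the code assumes that the schema file holds a single flat JSON object and that the data file holds a top-level array of objects.

```shell
# Sketch only: report_missing and missing_tally are hypothetical helpers,
# not part of JESS or schema.jq. Both assume the schema file ($1) holds a
# single flat JSON object and the data file ($2) holds an array of objects.

# Print one compact JSON object per array item that lacks schema keys,
# giving the item's 1-based position and the keys it is missing.
report_missing() {
  jq -c --slurpfile schema "$1" '
    ($schema[0] | keys) as $expected
    | . as $items
    | range(0; length) as $i
    | ($expected - ($items[$i] | keys)) as $missing
    | select($missing | length > 0)
    | {entity: ($i + 1), missing: $missing}
  ' "$2"
}

# Print, for each schema key absent from at least one item, the number of
# items that lack it.
missing_tally() {
  jq -c --slurpfile schema "$1" '
    ($schema[0] | keys) as $expected
    | [ .[] | ($expected - keys)[] ]
    | group_by(.)
    | map({key: .[0], missing_from: length})
    | .[]
  ' "$2"
}
```

With the files from this page, the call would be report_missing $PREFIX.schema.json $PREFIX.json. If the entity number in the JESS message is a 1-based array position, the first line of output should refer to entity 51.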

If we want to use the inferred schema in a more relaxed fashion, that is, by not requiring that all the keys be present, we will either have to modify it, or use it slightly differently.

The JESS script includes a command-line option --relax for this purpose:

JESS --array --relax --schema $PREFIX.schema.json $PREFIX.json 

Under the hood, this relaxes the given schema ($schema) by changing it into a "::<=" constraint, so that the underlying invocation of jq is as follows:

 jq -n --argfile schema $PREFIX.schema.json '
   include "JESS"; check(inputs[]; ["&", {"::<=": $schema}])' $PREFIX.json

(You can tell the JESS script to reveal how it invokes jq by using the -v command-line option.)

An alternative would be to modify the file containing the schema, e.g. by wrapping the JSON object in a "::<=" constraint, as in the check(...) call in the jq invocation above.
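The wrapping itself can be done with jq. This is a minimal sketch: wrap_relaxed is a hypothetical helper, and the output file name used in the note below is likewise only illustrative. It simply reproduces the ["&", {"::<=": ...}] form that appears in the jq invocation above.

```shell
# Sketch only: wrap_relaxed is a hypothetical helper, not part of JESS.
# It reads a schema from the file named by $1 and prints the wrapped form
# ["&", {"::<=": SCHEMA}] used in the jq invocation shown on this page.
wrap_relaxed() {
  jq '["&", {"::<=": .}]' "$1"
}
```

For example, wrap_relaxed $PREFIX.schema.json > $PREFIX.relaxed.schema.json writes the wrapped schema to a new file. Assuming JESS accepts that file as its --schema argument, the check could then be run with --array but without --relax.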
