Add queries for validating data against CubiQL's expectations. #145

Open
wants to merge 4 commits into
base: master

lkitching
Contributor

Issue #127 - Add validation queries for use with rdf-validator which
check a data source against the requirements CubiQL places on data cubes.

The queries are templates which expect the relevant configuration values
to be provided when executed.

Add instructions to the README on running these validations.

@lkitching
Contributor Author

@zeginis - Please could you try running these validation queries against your data and let me know if you have any problems?

@zeginis
Contributor

zeginis commented Sep 25, 2018

@lkitching I checked the PR. The queries are ok. I have tested them on some data created by Table2qb and they pass the tests.

However, I think we need some more tests to cover other CubiQL requirements:

  • There is a code list that contains ONLY the concepts used in the cube
  • There is a code list for each qb:DimensionProperty, including qb:MeasureType
  • Maybe we also need a test for the language tags (e.g. @en, nil)

What do you think?

1. There is a code list that contains ONLY the concepts used in the cube
2. There is a code list for each qb:DimensionProperty
@zeginis
Contributor

zeginis commented Sep 26, 2018

@lkitching I added 2 new SPARQL queries to support the CubiQL requirements I mentioned.

When I run them directly against the SPARQL endpoint they return no results, i.e. they succeed.
But when I run them through the validator they fail. Any idea why this happens?

The config I use:

{:geo-dimension-uri nil
 :time-dimension-uri nil
 :codelist-source "http://purl.org/linked-data/cube#ComponentSpecification"
 :codelist-predicate "http://publishmydata.com/def/qb/codesUsed"
 :codelist-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :dataset-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :schema-label-language nil
 :max-observations-page-size 2000}

I run the validator against: http://195.251.218.39:8893/sparql

@lkitching
Contributor Author

@zeginis - Dimensions no longer need to specify a codelist. Any dimension which does not specify one, and which is not a ref area, ref period, string or decimal type, is mapped to a String type in the schema and is submitted as a typed literal within generated SPARQL queries.

Move the comments for the dimension validation queries to the end of
the file. Comments before the query cause sesame to infer the
wrong query type (i.e. a graph query instead of a tuple query), which
results in the wrong accept headers being sent to the remote SPARQL
endpoint.
@lkitching
Contributor Author

@zeginis - I've pushed a fix to the new queries to allow them to run as expected in rdf-validator. Comments currently need to go after the query so sesame infers the correct query type. This is effectively a bug in rdf-validator we need to fix.
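
Until that is fixed, the workaround above can be applied mechanically. The following is only a minimal sketch in Python, not code from rdf-validator or CubiQL: the function name `move_leading_comments_to_end` is hypothetical, and it assumes comments are whole lines starting with `#` placed before the first line of the query.

```python
def move_leading_comments_to_end(query: str) -> str:
    """Move '#' comment lines that precede the query body to the end of the text.

    Hypothetical helper for illustration only. Blank lines before the query
    body are dropped; comment lines are re-appended after the body.
    """
    lines = query.splitlines()
    leading_comments = []
    index = 0
    # Walk over comment and blank lines until the first line of real query text.
    while index < len(lines):
        stripped = lines[index].strip()
        if stripped and not stripped.startswith("#"):
            break
        if stripped:
            leading_comments.append(lines[index])
        index += 1
    body = lines[index:]
    if not leading_comments:
        return "\n".join(body)
    # Keep one blank line between the query and the relocated comments.
    return "\n".join(body + [""] + leading_comments)


if __name__ == "__main__":
    example = "# Checks something about dimensions\n\nSELECT ?s WHERE { ?s ?p ?o }"
    print(move_leading_comments_to_end(example))
```

With the query keyword on the first line, the query type no longer depends on how leading comments are handled.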

@zeginis
Contributor

zeginis commented Oct 1, 2018

@lkitching what do you mean by "they are mapped to a String"?

Can dimensions without a codelist still be used to lock dimensions in a query?

e.g.

{cubiql {
  dataset_earnings {
    title
    description
    observations(dimensions: {gender: ALL
                              population_group: WORKPLACE_BASED
                              measure_type: MEDIAN}) {
      total_matches
    }}}}

@zeginis
Contributor

zeginis commented Oct 2, 2018

@lkitching I tried using a dimension whose values are URIs but which has no codelist defined.
CubiQL does not work properly in this case:

  • when requesting the dimension values I get an empty list
  • I get an "Internal server error: exception" when requesting the observations.

@zeginis
Contributor

zeginis commented Oct 2, 2018

You can try it at the endpoint: http://195.251.218.39:8893/sparql

The dimension http://example.gr/hello/def/dimension/station_id has no codelist.

The configuration I use:

{:geo-dimension-uri nil
 :time-dimension-uri nil
 :codelist-source "http://purl.org/linked-data/cube#ComponentSpecification"
 :codelist-predicate "http://publishmydata.com/def/qb/codesUsed"
 :codelist-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :dataset-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :schema-label-language en
 :max-observations-page-size 2000}

@lkitching
Contributor Author

@zeginis - I've pushed a fix for the exception to master. Could you check that the latest version fixes the issue for you?

@zeginis
Contributor

zeginis commented Oct 8, 2018

@lkitching Yes, this works fine. Thank you.

@zeginis
Contributor

zeginis commented Oct 9, 2018

@lkitching I understand that it is not mandatory for the dimensions to have a codelist. However, if a codelist for the used codes is defined, then it should contain all and only the codes used in the cube.

This is a common error we need to catch. The error occurs when transforming data with Table2qb, because the URIs do not match between the cube-pipeline and the codelist-pipeline.
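
To make the "all and only" requirement concrete, here is a minimal sketch in Python of the comparison such a check performs. It is an illustration, not one of the validation queries in this PR: the function name `compare_codelist` and the `http://example.org/` URIs are hypothetical, and it assumes the two sets of URIs have already been fetched from the endpoint.

```python
def compare_codelist(codes_used_in_cube, codelist_members):
    """Report where a codelist and the codes actually used in a cube disagree.

    Hypothetical helper for illustration only. Both arguments are iterables
    of URI strings. The check passes when both returned lists are empty.
    """
    used = set(codes_used_in_cube)
    listed = set(codelist_members)
    return {
        # Codes on observations that the codelist does not contain ("all").
        "missing_from_codelist": sorted(used - listed),
        # Codelist members that no observation uses ("only").
        "unused_in_cube": sorted(listed - used),
    }


if __name__ == "__main__":
    # Hypothetical URIs showing a mismatch between the two pipelines.
    used = ["http://example.org/def/concept/gender/female",
            "http://example.org/def/concept/gender/male"]
    listed = ["http://example.org/def/concept/gender/Female",
              "http://example.org/def/concept/gender/male"]
    print(compare_codelist(used, listed))
```

A URI mismatch of the kind described above shows up in both lists at once: the cube's URI is missing from the codelist, and the codelist's variant is unused in the cube.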

@zeginis
Contributor

zeginis commented Oct 9, 2018

@lkitching I removed the query that checks whether each dimension has a codelist. I kept the other query, which checks, for the dimensions that have a codelist, that the codelist contains all and only the codes used in the cube.
