TG2-VALIDATION_ESTABLISHMENTMEANS_STANDARD #268

Open
ArthurChapman opened this issue Feb 7, 2024 · 21 comments
Labels
Conformance CORE TG2 CORE tests OTHER Parameterized Test requires a parameter Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 Validation VOCABULARY

Comments

@ArthurChapman
Collaborator

ArthurChapman commented Feb 7, 2024

| TestField | Value |
| --- | --- |
| GUID | 4eb48fdf-7299-4d63-9d08-246902e2857f |
| Label | VALIDATION_ESTABLISHMENTMEANS_STANDARD |
| Description | Does the value of dwc:establishmentMeans occur in the bdq:sourceAuthority? |
| TestType | Validation |
| Darwin Core Class | dwc:Occurrence |
| Information Elements ActedUpon | dwc:establishmentMeans |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:establishmentMeans is bdq:Empty; COMPLIANT if the value of dwc:establishmentMeans is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Conformance |
| Term-Actions | ESTABLISHMENTMEANS_STANDARD |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]} |
| Specification Last Updated | 2024-02-08 |
| Examples | [dwc:establishmentMeans="native": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:establishmentMeans found in the bdq:sourceAuthority"]<br>[dwc:establishmentMeans="cultivated": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:establishmentMeans not found in the bdq:sourceAuthority"] |
| Source | TG2 |
| References | Darwin Core Maintenance Group (2021) Establishment Means Controlled Vocabulary List of Terms. Biodiversity Information Standards (TDWG). http://rs.tdwg.org/dwc/doc/em/<br>Groom et al. (2019) Improving Darwin Core for research and management of alien species. Biodiversity Information Science and Services 3: e38084. https://doi.org/10.3897/biss.3.38084 |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. |
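
The Expected Response above could be sketched roughly as follows. This is a minimal illustration and not the reference implementation: the function name, the `Response` tuple, and the use of `None` to signal an unavailable source authority are assumptions made for the example, and bdq:Empty is approximated here as "null or whitespace only". The vocabulary is passed in as a set of strings, and the comparison is exact, so leading or trailing whitespace yields NOT_COMPLIANT as the Notes require.

```python
from typing import NamedTuple, Optional, Set


class Response(NamedTuple):
    """Hypothetical container mirroring Response.status/result/comment."""
    status: str
    result: Optional[str]
    comment: str


def validate_establishment_means(
    establishment_means: Optional[str],
    source_authority_values: Optional[Set[str]],
) -> Response:
    """Sketch of VALIDATION_ESTABLISHMENTMEANS_STANDARD.

    Assumption: source_authority_values is None when the
    bdq:sourceAuthority could not be obtained.
    """
    if source_authority_values is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    # Assumption: bdq:Empty is approximated as None or whitespace only.
    if establishment_means is None or establishment_means.strip() == "":
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:establishmentMeans is bdq:Empty")
    # Exact, case-sensitive match: no trimming before comparison.
    if establishment_means in source_authority_values:
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:establishmentMeans found in the bdq:sourceAuthority")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:establishmentMeans not found in the bdq:sourceAuthority")


# Illustrative subset of the vocabulary, not the full list of terms.
vocabulary = {"native", "introduced"}
print(validate_establishment_means("native", vocabulary).result)
print(validate_establishment_means(" native", vocabulary).result)
```

Because the value is not trimmed before the lookup, `" native"` falls through to NOT_COMPLIANT rather than being silently repaired, which keeps repair a matter for the corresponding amendment test.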
@ArthurChapman ArthurChapman added TG2 Validation OTHER Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT VOCABULARY Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. Conformance Parameterized Test requires a parameter labels Feb 7, 2024
@ArthurChapman
Collaborator Author

I am wondering if this should now be CORE. It had a high rating, but wasn't made CORE at the time as a suitable Vocabulary didn't exist. One now does exist, so this could be made CORE.

@tucotuco
Member

tucotuco commented Feb 7, 2024 via email

@chicoreus
Collaborator

I would also concur. More notes in #269 (comment)

@Tasilee
Collaborator

Tasilee commented Feb 8, 2024

Changed Source Authority from

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]} {dwc:establishmentMeans [https://dwc.tdwg.org/list/#dwc_establishmentMeans]}

to

bdq:sourceAuthority default = "Darwin Core establishmentMeans" {[https://dwc.tdwg.org/list/#dwc_establishmentMeans]} {dwc:establishmentMeans vocabulary [https://dwc.tdwg.org/em/]}

to align with agreed structure.

@Tasilee
Collaborator

Tasilee commented Feb 8, 2024

Arthur and I have had a discussion about the finer details of the phrasing and links for Source Authority. I am suggesting that we use the current rendition, or a variant, as a template that we apply to all tests. I based this form on #104 as we are dealing with a Darwin Core term. There are up to three potential links:

  1. The Darwin Core term and definition
  2. A list of values
  3. An API of values

In this example, we use (1) and (3).

The phrasing of #104 and here (1) spells out "Darwin Core" then the term "establishmentMeans" in camelCase. Is this appropriate?

The phrasing of (3) uses "dwc:establishmentMeans" which seems ok, but are we happy with something like "vocabulary of terms API" for the following text?

We need to be consistent across all tests.
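
One way to check that consistency mechanically is to parse the Source Authority text into its parts. The sketch below assumes the two-group form used in this issue, `"name" {[url]} {label [url]}`; the function and field names are hypothetical, and the pattern would need extending if a third link were added.

```python
import re
from typing import NamedTuple, Optional


class SourceAuthority(NamedTuple):
    """Hypothetical parsed form of a Source Authority string."""
    name: str
    primary_url: str
    secondary_label: str
    secondary_url: str


# Assumed layout:
#   bdq:sourceAuthority default = "name" {[url]} {label [url]}
_PATTERN = re.compile(
    r'^bdq:sourceAuthority default = "(?P<name>[^"]+)" '
    r'\{\[(?P<primary_url>[^\]]+)\]\} '
    r'\{(?P<secondary_label>[^\[]+?) \[(?P<secondary_url>[^\]]+)\]\}$'
)


def parse_source_authority(text: str) -> Optional[SourceAuthority]:
    """Return the parsed parts, or None if the text does not fit the layout."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    return SourceAuthority(**match.groupdict())


example = (
    'bdq:sourceAuthority default = "Darwin Core establishmentMeans" '
    '{[https://dwc.tdwg.org/list/#dwc_establishmentMeans]} '
    '{dwc:establishmentMeans vocabulary [https://dwc.tdwg.org/em/]}'
)
print(parse_source_authority(example))
```

A string with unbalanced braces or a missing label does not match and returns `None`, so running this over all test descriptors would flag entries that drift from the agreed structure.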

@Tasilee Tasilee added CORE TG2 CORE tests and removed Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. labels Feb 9, 2024
@ArthurChapman
Collaborator Author

The four tests (#277, #278, #268, #269) should be CORE (I have discussed this with Lee). Some reasons are

  • Invasive species are the subject of Article 8(h) of the Convention on Biological Diversity (https://www.cbd.int/idb/2009/about/cbd), with a strong use case in COP 6 Decision VI/23 (https://www.cbd.int/decision/cop/default.shtml?id=7197)
  • These terms are recent additions to Darwin Core and are aspirational (we want everyone adding data to follow the Standard)
  • There are good, well thought out vocabularies and APIs available
  • Apart from the CBD Use Case, there are good Use Case arguments given in Groom et al. (reference above)
  • There is no reason under our current criteria for excluding these tests from CORE

@chicoreus
Collaborator

chicoreus commented Feb 20, 2024

@ArthurChapman @Tasilee See the comment in #152. We need to sort out the conflated concepts within CORE. This set of issues does not fit into CORE, the UseCase identified by TG3, but it does fit another UseCase we consider central, and it fits into CORE in the sense of the suite of tests we want to include in the Standard. This test is not Supplementary, but it is not CORE as we use the term for the UseCase.

@ArthurChapman
Collaborator Author

@chicoreus - see my separate email. CORE, as in the CORE tests, has never been restricted to TG3, and trying to do so complicates the process. There should be no difference. If you look through the "Source" of the tests, most came from somewhere other than TG3.

@chicoreus
Collaborator

@ArthurChapman Yes, CORE has meant the outcome of TG3; that was one of our key guidelines for what to include in CORE or not. We've been able to get away with conflating the "what taxon, where, when" research-analysis sense of CORE with the sense of CORE as a broader set of tests we are putting forward as part of the standard, because until now the only scope we've been dealing with is that of the outcome of TG3. The source of the tests is not relevant; CORE has been our filter on those sources.

"There should be no difference" means that we are still conflating two very distinct concepts. See the comment on #152.

What is making the difference now is this set of tests that we think are very important, but don't fit into the data quality needs of CORE, they fit other use cases, but not that one. We need to clearly define the relevant UseCases sensu the framework, and clarify what we mean by CORE.

@ArthurChapman
Collaborator Author

@chicoreus. TG3 was never one of our key guidelines on determining CORE. Its results were not out until well after we defined what we meant by CORE and had started developing the tests, and TG3 did not cover all aspects. It was looking at a methodology, but was never the guiding principle for TG2. Aspirationally, the TG3 methodology is a good methodology for determining Use Cases, but it is not robust as yet and this was a Case Study - not a definitive study. It looked at how you would do it, and the User Stories were examples - they were never meant to be comprehensive. In fact, the lack of responses to many of the questionnaires excluded it from being comprehensive. It was looking at a process (User Story, Use Case, linking it then to the Framework, etc.) and running a proof of concept. If you needed to, you could write a user story for establishmentMeans if that satisfied you - not hard to do!

Incidentally, I just went through our tests - 105 tests, including 78 of our current CORE tests, were written prior to TG3 finalising its User Stories. Most were based on existing tests at ALA, iDigBio, CRIA, BISON etc. and were not related to the TG3 User Stories, although there was obviously some overlap.

@chicoreus
Collaborator

@ArthurChapman No, TG3 was exactly the thing that shaped CORE. All of our thinking about which tests to include in CORE and what the tests do is shaped by the CORE UseCase of research analysis of Darwin Core occurrence data of which taxa occur where and when. It is implicit in all of our analysis of both which tests to include and what the tests do. Only now that we are starting to describe tests that we think are important but fall outside the scope of CORE are we seeing that we need to clarify what we mean by CORE: either the use case, or the set of tests we are recommending. In the latter case we need to provide another name for the use case and specify what the other use case is.

@Tasilee
Collaborator

Tasilee commented Feb 20, 2024

Either way, we need to be happy with our definition of CORE, and I'd strongly suggest we include links to our not CORE tags to be clear on what is not CORE!

I can't say TG3's use cases were at the front of my mind when considering new tests. They formed a reference but can never be comprehensive in scope given unknown unknowns :)

@chicoreus
Collaborator

We should be phrasing the source authority as:

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/]}

As pointed out by @ManonGros in #283 (comment), the GBIF vocabulary API is documented (https://techdocs.gbif.org/en/openapi/v1/vocabulary#/). Developers can choose the best means to access the API, which for small vocabularies may be caching the JSON export of the vocabulary (https://api.gbif.org/v1/vocabularies/EstablishmentMeans/export). For VALIDATION_term_STANDARD tests, the GBIF API is only likely to provide alternate access to the TDWG controlled vocabulary, but for AMENDMENT_term_STANDARDIZED, it looks like the GBIF data will include a larger set of translations of labels than the actual standard document, which should be helpful in standardization implementations.
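
As a rough illustration of the caching approach, the sketch below reads a previously downloaded copy of the export from disk and collects the concept names into a set. The layout of the export file is an assumption here: the default key names `concepts` and `name` are placeholders for illustration and should be checked against the GBIF API documentation before use, which is why they are left configurable.

```python
import json
from pathlib import Path
from typing import Set, Union


def load_cached_vocabulary(
    path: Union[str, Path],
    concepts_key: str = "concepts",  # assumed key, verify against the real export
    name_key: str = "name",          # assumed key, verify against the real export
) -> Set[str]:
    """Load concept names from a locally cached vocabulary export.

    The file is expected to be a JSON object holding a list of concept
    objects under `concepts_key`, each with a string under `name_key`.
    """
    with Path(path).open(encoding="utf-8") as handle:
        document = json.load(handle)
    return {concept[name_key] for concept in document[concepts_key]}
```

Reading from a local file means a validation run does not depend on the API being reachable, at the cost of having to refresh the cached copy when the vocabulary changes.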

@chicoreus
Collaborator

Updated notes from "fail" to more specific "This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters."

@Tasilee
Collaborator

Tasilee commented Feb 24, 2024

Thanks @chicoreus - changing Source Authority from

bdq:sourceAuthority default = "Darwin Core establishmentMeans" {[https://dwc.tdwg.org/list/#dwc_establishmentMeans]} {dwc:establishmentMeans vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

to

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/]}

@tucotuco
Member

I think https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts is OK as a source authority because it actually has an API, as long as it is understood that the actual vocabulary is maintained at https://dwc.tdwg.org/em/ and the GBIF API is expected to remain up to date with that.

@Tasilee
Collaborator

Tasilee commented Apr 16, 2024

Changed Source Authority from

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

to

bdq:sourceAuthority default = "GBIF EstablishmentMeans Vocabulary" {[https://api.gbif.org/v1/vocabularies/EstablishmentMeans]} {"dwc:establishmentMeans vocabulary API" [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

@tucotuco
Member

Source Authority should be

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

@Tasilee
Collaborator

Tasilee commented Apr 16, 2024

Changed Source Authority from

bdq:sourceAuthority default = "GBIF EstablishmentMeans Vocabulary" {[https://api.gbif.org/v1/vocabularies/EstablishmentMeans]} {"dwc:establishmentMeans vocabulary API" [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

to

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

@chicoreus
Collaborator

See #275 (comment)

The GBIF API does not help here: it does not provide the actual Controlled Values from the TDWG vocabulary, because the values it has differ in case.
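
A case difference of this kind matters because the validation is an exact match. The sketch below shows one way to list values that agree with the standard only when case is ignored; the function name is hypothetical and the sample values are made up for illustration, not taken from either source.

```python
from typing import Dict, Iterable


def case_only_mismatches(
    standard_values: Iterable[str],
    candidate_values: Iterable[str],
) -> Dict[str, str]:
    """Map each candidate that differs from a standard value only in case
    to the standard value it should have matched."""
    by_folded = {value.casefold(): value for value in standard_values}
    mismatches = {}
    for candidate in candidate_values:
        standard = by_folded.get(candidate.casefold())
        if standard is not None and standard != candidate:
            mismatches[candidate] = standard
    return mismatches


# Illustrative values only, not the actual content of either source.
standard = ["native", "introduced"]
candidate = ["Native", "introduced"]
print(case_only_mismatches(standard, candidate))
```

An empty result means the candidate source can stand in for the standard list as far as case is concerned; any entry in the result is a value that an exact-match validation would wrongly accept or reject depending on which source was loaded.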

@Tasilee
Collaborator

Tasilee commented May 14, 2024

GBIF vocabulary has now been aligned with Darwin Core. Thanks @timrobertson100

chicoreus added a commit to FilteredPush/rec_occur_qc that referenced this issue Jul 27, 2024
…hmentMeans amendment and validation along with default methods and unit tests.
4 participants