Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New and unified version of schema file (.xsd) for XML standard names table #468

Merged
merged 2 commits into from
Apr 17, 2024

Conversation

larsbarring
Copy link
Contributor

@larsbarring larsbarring commented Mar 20, 2024

This version of the schema file implements issue #459, i.e. recent changes to the XML formal as specified in Appendix B.

This PR must be merged after PR #458.

Only the new file is relevant for this PR. All other changes belong to PR #458. (Force pushed change to remove those files from this PR)

@DocOtak
Copy link
Member

DocOtak commented Mar 27, 2024

Should the xsd name space be versioned as 1.0 or 1.1? Looks like the spec says you don't need the version if you don't want to and that all 1.0 files should be valid 1.1.

@larsbarring
Copy link
Contributor Author

I am not sure about which version to use because I focussed on making as few changes as possible. And from my testing is seems that version 1.0 worked as expected. I have no deeper insight into these matters, and if there are reasons to change I it would be simple to implement.

@ethanrd
Copy link
Member

ethanrd commented Apr 2, 2024

@DocOtak wrote:

Should the xsd name space be versioned as 1.0 or 1.1? Looks like the spec says you don't need the version if you don't want to and that all 1.0 files should be valid 1.1.

Are you asking about the version in the name of the .xsd file? There does not appear to be an XML Namespace defined (the XSD does not specify a targetNamespace and the XML files use a noNamespaceSchemaLocation attribute to reference the .xsd file).

(I haven't been following this closely. Sorry if I missed earlier discussions.)

@DocOtak
Copy link
Member

DocOtak commented Apr 4, 2024

@ethanrd I was asking if the xmlns in line 8 in this PR should be versioned more specifically, but I don't know the trade offs or potential repercussions, so was hoping there is expertise here. https://www.w3.org/TR/xmlschema11-1/#langids

@larsbarring
Copy link
Contributor Author

I am not (at all) an expert on these matters, but after having looked at https://www.w3.org/TR/xmlschema11-1/#ns-bindings, and it seems clear that all the namespace prefixes xs and xsi will have to be changed throughout the file. Of course this can be done, but I think that this is only warranted if we can establish a tangible advantage.

@ethanrd
Copy link
Member

ethanrd commented Apr 5, 2024

Hi @DocOtak and @larsbarring - I think the only reason to look at updating the XML Schema versions would be if newer versions had new features that we wanted to use in the standard names XSD files. I suspect we use pretty standard XML Schema capabilities so probably don't have any need to update.

Also, the referenced W3C document seems to be the latest version. And the "xs" and "xsi" namespace URLs in the CF standard name .xsd and .xml` files match what is listed in Section 1.3.3 Conventional Namespace Bindings:

[It has been a long time since I've done much XSD work but these namespace URLs look very familiar.]

@DocOtak
Copy link
Member

DocOtak commented Apr 5, 2024

@ethanrd Only thing I can think of is if we decide that having some ordering to elements be in the schema, e.g. aliases must be in some lex sorted order, then I'm pretty sure that's a 1.1 feature

@ethanrd
Copy link
Member

ethanrd commented Apr 6, 2024

The referenced document is XML Schema 1.1 which is the latest version. So I would think we would be good with the namespaces specified in section 1.3.3. Though the "2001" in the URL when 1.1 was approved in 2012 seems odd. I expect it is a backward compatibility thing.

@larsbarring
Copy link
Contributor Author

Hi @DocOtak and @ethanrd -- Thanks for these useful comments :-) However, as this conversation has moved on to deal more with the scope of the proposal as such (i.e. enforce/implement sorting or not) I suggest that we continue in the issue. Hence I have copied these comments over to this issue comment, and also added my further thoughts in the next comment (over there).

@larsbarring larsbarring force-pushed the issue-459 branch 2 times, most recently from 24db5ff to ccece29 Compare April 12, 2024 15:21
@larsbarring larsbarring marked this pull request as ready for review April 15, 2024 15:14
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a basic check that the schema definition is compatible with our standard names XML, I've used a validator tool. I used this one which allows files up to 10MB, whereas many other tools I found online (for speed, I am sure I can install some CLI validator library if that was the only option!) only support up to 2MB: https://conversiontools.io/convert/validate-xml-xsd.

It seems like the schema works for the XML, though it is strict and picks up on issues with the tables. Namely, I see the following pair of errors reported on the XML defining the last released version of the table, 84:

XML is not valid according to the XSD schema.
Validation errors:
Error: Line 4: cvc-complex-type.2.4.a: Invalid content was found starting with element 'last_modified'. One of '{conventions}' is expected.
Line 2: <standard_name_table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cf-standard-name-table-1.1.xsd">
Line 3:    <version_number>84</version_number>
Line 4:    <last_modified>2024-01-18T15:55:10Z</last_modified>
Line 5:    <institution>Centre for Environmental Data Analysis</institution>
Line 6:    <contact>[email protected]</contact>

Error: Line 3775: cvc-id.2: There are multiple occurrences of ID value 'drainage_amount_through_base_of_soil_model'.

Now, both of these are fine and (in context) expected:

So, it seems everything is fine for the current table and going forward. And, with the schema in place we can even use the validation tool to compare the latest XML and the XSD to pick up on any issues like duplicate names, missing header tags, etc., before publication, which is useful.

So, I am happy that this will work going forward, though for earlier versions of the XML, for there to not be any raised errors we might need to point to earlier schema files, as per the mapping at #433 (comment). I imagine this is the plan given those files are being kept in the repository under this new Data/schema-files location, @larsbarring?

Regardless, for the issue at hand, I am satisfied this is all good. I have made one minor suggestion about some new wording, but even if you disagree with that I am happy for this to be merged. Lars, please let me know what you think about that one minor suggestion, then I will merge this.

Data/schema-files/cf-standard-name-table-2.0.xsd Outdated Show resolved Hide resolved
@sadielbartholomew sadielbartholomew merged commit cb4abbd into cf-convention:main Apr 17, 2024
1 of 2 checks passed
@larsbarring larsbarring deleted the issue-459 branch April 17, 2024 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Harmonise content of the schema definition files
4 participants