-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New and unified version of schema file (.xsd) for XML standard names table #468
Conversation
Should the xsd name space be versioned as 1.0 or 1.1? Looks like the spec says you don't need the version if you don't want to and that all 1.0 files should be valid 1.1. |
I am not sure about which version to use because I focussed on making as few changes as possible. And from my testing is seems that version 1.0 worked as expected. I have no deeper insight into these matters, and if there are reasons to change I it would be simple to implement. |
@DocOtak wrote:
Are you asking about the version in the name of the .xsd file? There does not appear to be an XML Namespace defined (the XSD does not specify a (I haven't been following this closely. Sorry if I missed earlier discussions.) |
@ethanrd I was asking if the xmlns in line 8 in this PR should be versioned more specifically, but I don't know the trade offs or potential repercussions, so was hoping there is expertise here. https://www.w3.org/TR/xmlschema11-1/#langids |
I am not (at all) an expert on these matters, but after having looked at https://www.w3.org/TR/xmlschema11-1/#ns-bindings, and it seems clear that all the namespace prefixes |
Hi @DocOtak and @larsbarring - I think the only reason to look at updating the XML Schema versions would be if newer versions had new features that we wanted to use in the standard names XSD files. I suspect we use pretty standard XML Schema capabilities so probably don't have any need to update. Also, the referenced W3C document seems to be the latest version. And the "xs" and "xsi" namespace URLs in the CF standard name [It has been a long time since I've done much XSD work but these namespace URLs look very familiar.] |
@ethanrd Only thing I can think of is if we decide that having some ordering to elements be in the schema, e.g. aliases must be in some lex sorted order, then I'm pretty sure that's a 1.1 feature |
The referenced document is XML Schema 1.1 which is the latest version. So I would think we would be good with the namespaces specified in section 1.3.3. Though the "2001" in the URL when 1.1 was approved in 2012 seems odd. I expect it is a backward compatibility thing. |
Hi @DocOtak and @ethanrd -- Thanks for these useful comments :-) However, as this conversation has moved on to deal more with the scope of the proposal as such (i.e. enforce/implement sorting or not) I suggest that we continue in the issue. Hence I have copied these comments over to this issue comment, and also added my further thoughts in the next comment (over there). |
24db5ff
to
ccece29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a basic check that the schema definition is compatible with our standard names XML, I've used a validator tool. I used this one which allows files up to 10MB, whereas many other tools I found online (for speed, I am sure I can install some CLI validator library if that was the only option!) only support up to 2MB: https://conversiontools.io/convert/validate-xml-xsd.
It seems like the schema works for the XML, though it is strict and picks up on issues with the tables. Namely, I see the following pair of errors reported on the XML defining the last released version of the table, 84:
XML is not valid according to the XSD schema.
Validation errors:
Error: Line 4: cvc-complex-type.2.4.a: Invalid content was found starting with element 'last_modified'. One of '{conventions}' is expected.
Line 2: <standard_name_table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cf-standard-name-table-1.1.xsd">
Line 3: <version_number>84</version_number>
Line 4: <last_modified>2024-01-18T15:55:10Z</last_modified>
Line 5: <institution>Centre for Environmental Data Analysis</institution>
Line 6: <contact>[email protected]</contact>
Error: Line 3775: cvc-id.2: There are multiple occurrences of ID value 'drainage_amount_through_base_of_soil_model'.
Now, both of these are fine and (in context) expected:
- the first one points out that table 84, as with any earlier version, doesn't have a
conventions
header item (see Standard names: Add "Conventions" string to the standard name xml table header cf-conventions#500) - the second error is OK because this is an Issue already noticed with the table (see cf-convention/discuss#273).
So, it seems everything is fine for the current
table and going forward. And, with the schema in place we can even use the validation tool to compare the latest XML and the XSD to pick up on any issues like duplicate names, missing header tags, etc., before publication, which is useful.
So, I am happy that this will work going forward, though for earlier versions of the XML, for there to not be any raised errors we might need to point to earlier schema files, as per the mapping at #433 (comment). I imagine this is the plan given those files are being kept in the repository under this new Data/schema-files
location, @larsbarring?
Regardless, for the issue at hand, I am satisfied this is all good. I have made one minor suggestion about some new wording, but even if you disagree with that I am happy for this to be merged. Lars, please let me know what you think about that one minor suggestion, then I will merge this.
Co-authored-by: Sadie L. Bartholomew <[email protected]>
This version of the schema file implements issue #459, i.e. recent changes to the XML formal as specified in Appendix B.
This PR must be merged after PR #458.
Only the new file is relevant for this PR.
All other changes belong to PR #458.(Force pushed change to remove those files from this PR)