GH-2828: Remove validation for XML-only datatypes #2846

Merged (1 commit) on Nov 20, 2024

Ostrzyciel
Contributor

GitHub issue resolved #2828

Pull request Description:

Remove the code for validating datatypes that, according to the RDF 1.1 spec, SHOULD NOT be used: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS. Also remove the code for validating XSD lists and unions. This should not change the behavior of Jena for users, unless they were relying on it to validate XML-specific data, which I don't think is an officially supported feature of Jena.

This is the first step to resolving #2828 – first I wanted to focus on untangling the unneeded logic. In future PRs, I will remove all the dead code.

After applying these changes, the size of the jena-core JAR was reduced by 13 489 bytes. Not huge, but still something.

Details:

  • XSDDatatype
    • Removed commented-out datatype definitions for ENTITIES, NMTOKENS, IDREFS.
    • Removed some other commented-out temporary code that was supposed to be cleaned up some time ago (I guess).
    • I left in the registered datatypes for QName, ENTITY, ID, IDREF, NOTATION, because they are not doing any harm here.
  • XSSimpleType
    • Removed method isIdType() along with its implementations (was unused).
    • Removed method getPrimitiveKind() along with its implementations (was unused).
  • XSSimpleTypeDecl
    • Removed the registration of type validators for: QName, ENTITY, ID, IDREF, NOTATION.
    • Removed validation code for these datatypes.
    • Removed all code related to XSD unions and lists (not needed in RDF). Notably, this left atomic types as the only possible types, which greatly reduced the number of branches.
    • Note: there was some weirdness in the code where anySimpleType was treated as NOTATION for some reason. I have untangled this. I think any changes to this won't matter anyway, because neither NOTATION nor xsd:anySimpleType makes particular sense in RDF.
  • XSSimpleTypeDefinition
    • Removed constants VARIETY_UNION, VARIETY_LIST.
    • Removed methods related to handling lists and unions.
  • BaseSchemaDVFactory and FullDVFactory
    • Replaced the code for: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS with dummy registrations. We still need to register something for these datatypes here, because this is used by XSDDatatype in a public interface.
    • Removed the code for XSD lists and unions.
  • BaseDVFactory and SchemaDVFactory
    • Removed the code for XSD lists and unions.
  • Removed validator classes (now unused): IDDV, IDREFDV, EntityDV, ListDV, UnionDV
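
To illustrate the "dummy registrations" idea from the BaseSchemaDVFactory and FullDVFactory item above, here is a minimal, self-contained sketch. It is not Jena's code: the class DummyRegistrationSketch, its methods, and the simplified boolean check are all hypothetical, and it assumes that a dummy registration means "keep an entry under the datatype's name, but do no datatype-specific checking".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for a validator registry; not Jena's actual classes. */
final class DummyRegistrationSketch {
    /** Assumed meaning of a "dummy" entry: accept every lexical form. */
    static final Predicate<String> ACCEPT_ALL = lexical -> true;

    private final Map<String, Predicate<String>> validators = new HashMap<>();

    DummyRegistrationSketch() {
        // A datatype that is still validated (simplified check, illustration only).
        validators.put("boolean",
                s -> s.equals("true") || s.equals("false") || s.equals("1") || s.equals("0"));

        // XML-only datatypes keep an entry so that lookups by name still succeed,
        // but no real validation logic is attached to them any more.
        String[] xmlOnly = {"QName", "ENTITY", "ID", "IDREF", "NOTATION",
                            "ENTITIES", "NMTOKENS", "IDREFS"};
        for (String name : xmlOnly) {
            validators.put(name, ACCEPT_ALL);
        }
    }

    /** True if some validator (real or dummy) is registered under this name. */
    boolean isRegistered(String typeName) {
        return validators.containsKey(typeName);
    }

    /** Runs the registered validator; unknown names are reported as an error. */
    boolean isValid(String typeName, String lexical) {
        Predicate<String> validator = validators.get(typeName);
        if (validator == null) {
            throw new IllegalArgumentException("Unknown datatype: " + typeName);
        }
        return validator.test(lexical);
    }
}
```

The point of the sketch is only that callers who look up QName, ID, and the others by name keep working, while the validation code behind those names can be deleted.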

  • Tests are included.
  • Documentation change and updates are provided for the Apache Jena website
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

@@ -124,9 +121,6 @@ public class XSDDatatype extends BaseDatatype {
public static final XSDDatatype XSDName = new XSDBaseStringType("Name");

/** Datatype representing xsd:QName */
// If you see this, remove commented lines.
// Merely temporary during switch over and testing.
// public static final XSDDatatype XSDQName = new XSDDatatype("QName");
public static final XSDDatatype XSDQName = new XSDPlainType("QName");
Member

Let's add @Deprecated(forRemoval = true) and /** @deprecated Do not use */ to XSDQName and the other non-RDF XSD datatypes.

Member

Other than that, the rest looks great!

Contributor Author

Done. I've also added a note to remove these in Jena 6, together with the associated dummy code in BaseSchemaDVFactory.
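
For readers unfamiliar with the deprecation pattern discussed in this thread, the following is a small sketch of what marking a constant for removal looks like in Java 9 and later. The class DeprecationSketch, its String constants, and the helper isMarkedForRemoval are hypothetical and only illustrate the annotation plus Javadoc tag combination; the actual change in XSDDatatype may be worded differently.

```java
/** Hypothetical holder class; the names are illustrative, not Jena's source. */
final class DeprecationSketch {
    /**
     * Name of the xsd:QName datatype.
     * @deprecated Do not use. Marked for removal in a future major version.
     */
    @Deprecated(forRemoval = true)
    static final String XSD_QNAME = "QName";

    /** Name of the xsd:string datatype (not deprecated, for contrast). */
    static final String XSD_STRING = "string";

    /** Reports whether the named field carries @Deprecated(forRemoval = true). */
    static boolean isMarkedForRemoval(String fieldName) {
        try {
            // @Deprecated has runtime retention, so reflection can read it.
            Deprecated marker = DeprecationSketch.class
                    .getDeclaredField(fieldName)
                    .getAnnotation(Deprecated.class);
            return marker != null && marker.forRemoval();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }
}
```

With forRemoval = true, javac reports a removal warning at call sites outside the declaring class, which gives users notice before the constants are deleted.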

Issue: apache#2828

Remove the code for validating datatypes that according to the RDF 1.1 spec SHOULD NOT be used: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS. Also remove code for validating XSD lists and unions.
@afs afs merged commit dc89c67 into apache:main Nov 20, 2024
Development

Successfully merging this pull request may close these issues.

Clean up code in jena.ext.xerces