Glyphs should allow CONTENT with length above 1 for cases where no precombined character exists #85

Open
urieli opened this issue Feb 7, 2024 · 1 comment


urieli commented Feb 7, 2024

The GlyphType documentation states:

Accordingly the value for the glyph element will be defined as follows:
Pre-composed representation = base + combining character(s) (decomposed representation)
See http://www.fileformat.info/info/unicode/char/0101/index.htm
"U+0101" = (U+0061) + (U+0304)
"combining characters" ("base characters" in combination with non-spacing marks or characters which are combined to one) are represented as one "glyph", e.g. áàâ.

This is accompanied by the restriction length=1 for the CONTENT attribute:

<xsd:attribute name="CONTENT" use="required">
	<xsd:annotation>
		<xsd:documentation>
			CONTENT contains the precomposed representation (combining character) of the character from the parent String element.
			The sequence position of the Glyph element matches the position of the character in the String.
		</xsd:documentation>
	</xsd:annotation>
	<xsd:simpleType>
		<xsd:restriction base="xsd:string">
			<xsd:length fixed="true" value="1"/>
			<xsd:whiteSpace value="preserve"/>
		</xsd:restriction>
	</xsd:simpleType>
</xsd:attribute>

Unfortunately, in some alphabets, a precomposed representation does not exist.

For example, in the Hebrew alphabet, many letters can carry three diacritics at once.

Even if we ignore cantillation marks, which are limited to biblical text, only a very small portion of the possible combinations exist as precomposed characters.

Thus, there is no precomposed character for "בָּ" or even for the more common "בָ".
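
The claim above can be checked with Unicode NFC normalization, which composes a sequence into a precomposed character whenever the standard allows it. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `nfc_code_points` and the sample constants are illustrative and not part of ALTO.

```python
import unicodedata


def nfc_code_points(text):
    """Return the code points of `text` after NFC normalization (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# Samples written as escapes so the code point order is unambiguous.
LATIN_A_MACRON = "\u0061\u0304"       # a + combining macron (the example from the GlyphType docs)
BET_QAMATS = "\u05D1\u05B8"           # bet + qamats
BET_DAGESH_QAMATS = "\u05D1\u05BC\u05B8"  # bet + dagesh + qamats

for label, sample in [
    ("a + macron", LATIN_A_MACRON),
    ("bet + qamats", BET_QAMATS),
    ("bet + dagesh + qamats", BET_DAGESH_QAMATS),
]:
    points = nfc_code_points(sample)
    # The Latin sample collapses to one code point; the Hebrew samples do not.
    print(f"{label}: {len(points)} code point(s) after NFC: {' '.join(points)}")
```

Under NFC the Latin sequence becomes the single code point U+0101, while both Hebrew sequences keep every code point, so they cannot satisfy a length of 1.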

Therefore, to be able to represent Hebrew glyphs properly, we should change the specification to something like:

<xsd:attribute name="CONTENT" use="required">
	<xsd:annotation>
		<xsd:documentation>
			CONTENT contains the representation of the character from the parent String element.
			Precomposed characters are recommended, but it is acceptable to have one base character followed by zero or more combining diacritics.
			The sequence position of the Glyph element matches the position of the character in the String.
		</xsd:documentation>
	</xsd:annotation>
	<xsd:simpleType>
		<xsd:restriction base="xsd:string">
			<xsd:maxLength value="4"/>
			<xsd:whiteSpace value="preserve"/>
		</xsd:restriction>
	</xsd:simpleType>
</xsd:attribute>
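
Note that a `maxLength` facet on its own only bounds the number of characters: it would also accept an empty string or four unrelated letters. The rule stated in the proposed documentation (one base character plus zero or more combining marks) could be checked by a consumer along the lines below. This is a sketch under assumptions, not part of ALTO: the function name `is_valid_glyph_content` and the constant `MAX_CODE_POINTS` are hypothetical, and treating every character in a Unicode "M" (mark) category as a diacritic is a simplification.

```python
import unicodedata

# Assumption: mirrors the maxLength value of 4 from the proposal above.
MAX_CODE_POINTS = 4


def is_valid_glyph_content(content, max_code_points=MAX_CODE_POINTS):
    """Check for one base character followed by zero or more combining marks."""
    # Prefer the precomposed form where Unicode defines one.
    text = unicodedata.normalize("NFC", content)
    if not 1 <= len(text) <= max_code_points:
        return False
    base, marks = text[0], text[1:]
    # The first character must not itself be a combining mark.
    if unicodedata.category(base).startswith("M"):
        return False
    # Every following character must be a mark (categories Mn, Mc, Me).
    return all(unicodedata.category(ch).startswith("M") for ch in marks)


print(is_valid_glyph_content("\u05D1\u05BC\u05B8"))  # bet + dagesh + qamats
print(is_valid_glyph_content("ab"))                  # two base characters
```

If the schema should reject empty values itself, adding a minimum length of 1 next to the maximum would be one option to consider.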

We should also remove the passage quoted above from the GlyphType documentation.

I'm not sure whether other alphabets would require more than 4 characters. If they do, the maxLength facet could be removed entirely.
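
One way to gather evidence for the right limit would be to measure the longest base-plus-marks sequence in sample text for each script of interest. The sketch below is an assumption-laden approximation rather than full Unicode grapheme cluster segmentation (UAX #29): it starts a new cluster at every non-mark character, and the names `cluster_lengths` and `longest_cluster` are hypothetical.

```python
import unicodedata


def cluster_lengths(text):
    """Yield the code point length of each base+marks cluster in NFC-normalized text."""
    current = 0
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch).startswith("M") and current > 0:
            # A combining mark extends the cluster that is already open.
            current += 1
        else:
            # Any other character closes the open cluster and starts a new one.
            if current:
                yield current
            current = 1
    if current:
        yield current


def longest_cluster(text):
    """Return the length of the longest cluster, or 0 for empty text."""
    return max(cluster_lengths(text), default=0)


# shin + shin dot + dagesh + qamats, then lamed, vav + holam, final mem
sample = "\u05E9\u05C1\u05BC\u05B8\u05DC\u05D5\u05B9\u05DD"
print(longest_cluster(sample))
```

Running this over representative corpora would show whether any script in use exceeds four code points per glyph.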

@cipriandinu
Member

Thank you for raising this topic. This change could be a good candidate for 5.0 as well. Maybe we will find other use cases (other languages) that we can provide as usage samples.
