Glyphs should allow CONTENT with length above 1 for cases where no precombined character exists #85

urieli · 2024-02-07T13:24:34Z

The GlyphType documentation states:

Accordingly the value for the glyph element will be defined as follows:
Pre-composed representation = base + combining character(s) (decomposed representation)
See http://www.fileformat.info/info/unicode/char/0101/index.htm
"U+0101" = (U+0061) + (U+0304)
"combining characters" ("base characters" in combination with non-spacing marks or characters which are combined to one) are represented as one "glyph", e.g. áàâ.

This is accompanied by the restriction length=1 for the CONTENT attribute:

<xsd:attribute name="CONTENT" use="required">
	<xsd:annotation>
		<xsd:documentation>
			CONTENT contains the precomposed representation (combining character) of the character from the parent String element.
			The sequence position of the Glyph element matches the position of the character in the String.
		</xsd:documentation>
	</xsd:annotation>
	<xsd:simpleType>
		<xsd:restriction base="xsd:string">
			<xsd:length fixed="true" value="1"/>
			<xsd:whiteSpace value="preserve"/>
		</xsd:restriction>
	</xsd:simpleType>
</xsd:attribute>

Unfortunately, in some alphabets, a precomposed representation does not exist.

For example, in the Hebrew alphabet, many letters can carry three diacritics at once.

Even if we ignore cantillation marks, which are limited to biblical text, only a very small portion of the combined possibilities exist as precombined characters.

Thus, there is no precombined character for "בָּ" or even the more common "בָ".
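To make this concrete, here is a minimal sketch (not part of the schema) that uses Python's standard `unicodedata` module to apply NFC normalization, which composes a base character and its combining marks into a precomposed character wherever Unicode defines one. The helper name `nfc_length` is only illustrative. It assumes the XSD length facet counts code points, so the result is the smallest CONTENT length a producer could reach through normalization.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Return the number of code points left after NFC composition."""
    return len(unicodedata.normalize("NFC", text))


# Latin: a + combining macron composes to the single code point U+0101.
latin = "a\u0304"
# Hebrew: bet + qamats has no precomposed form that NFC will produce.
bet_qamats = "\u05D1\u05B8"
# Hebrew: bet + dagesh + qamats stays at three code points as well.
bet_dagesh_qamats = "\u05D1\u05BC\u05B8"

for label, text in [
    ("a + macron", latin),
    ("bet + qamats", bet_qamats),
    ("bet + dagesh + qamats", bet_dagesh_qamats),
]:
    print(label, nfc_length(text))
```

The Latin sequence collapses to one code point, while the two Hebrew sequences keep lengths 2 and 3, so neither can satisfy a fixed length of 1.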

Therefore, to be able to represent Hebrew glyphs properly, we should change the specification to something like:

<xsd:attribute name="CONTENT" use="required">
	<xsd:annotation>
		<xsd:documentation>
			CONTENT contains the representation of the character from the parent String element.
			Precombined characters are recommended, but it is acceptable to have one base character and zero-to-many combining diacritics.
			The sequence position of the Glyph element matches the position of the character in the String.
		</xsd:documentation>
	</xsd:annotation>
	<xsd:simpleType>
		<xsd:restriction base="xsd:string">
			<xsd:maxLength value="4"/>
			<xsd:whiteSpace value="preserve"/>
		</xsd:restriction>
	</xsd:simpleType>
</xsd:attribute>

We should also remove the text above from the GlyphType documentation.

I'm not sure whether other alphabets would require more than 4 characters - maybe the max length attribute could be removed entirely.
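As a rough way to explore that question, the sketch below checks a candidate CONTENT value against the proposed rule (one base character followed by zero or more combining marks), with the maximum length as an optional parameter. The function name `is_glyph_content` and the choice to treat Unicode general categories starting with "M" as combining marks are my assumptions, not part of the specification.

```python
import unicodedata
from typing import Optional


def is_mark(ch: str) -> bool:
    """True if ch is in a Unicode mark category (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def is_glyph_content(text: str, max_len: Optional[int] = 4) -> bool:
    """Check for one base character plus zero or more combining marks.

    max_len mirrors the proposed xsd:maxLength; pass None to skip the limit.
    """
    if not text:
        return False
    if max_len is not None and len(text) > max_len:
        return False
    # The first code point must be a base character, the rest must be marks.
    return not is_mark(text[0]) and all(is_mark(ch) for ch in text[1:])


# bet + dagesh + qamats: accepted under the proposed limit of 4.
print(is_glyph_content("\u05D1\u05BC\u05B8"))
# shin + shin dot + dagesh + qamats + a cantillation mark: 5 code points.
with_cantillation = "\u05E9\u05C1\u05BC\u05B8\u0591"
print(is_glyph_content(with_cantillation))
print(is_glyph_content(with_cantillation, max_len=None))
```

The last example suggests that Hebrew text with cantillation marks can already exceed 4 code points, which would be one argument for dropping the maximum length.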

cipriandinu · 2024-07-12T13:06:06Z

Thank you for raising this topic. This change could be a good candidate for 5.0 as well. We may find other use cases (other languages) to include, along with a usage sample.
