This project contains:
- a stemmer for the Latin language,
- a filter that converts Roman numerals into Arabic ones, and
- a value source that sorts strings containing numbers in numeric order.
Usage example in conf/schema.xml:
<fieldType name="text_la_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="de.uni_koeln.capitularia.lucene_tools.LatinStemFilterFactory"
            preserveOriginal="true" minNounSize="3" minVerbSize="3"/>
  </analyzer>
</fieldType>
The stemmer implements the algorithm by Schinke et al. (1996):
Schinke R, Greengrass M, Robertson AM, Willett P (1996). A stemming algorithm for Latin text databases. Journal of Documentation, 52: 172-187.
The filter converts Roman numerals to Arabic numerals, for example XLII to 42.
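The conversion itself can be illustrated with a short, standalone Java sketch. This is not the code of RomanNumeralsFilterFactory; the class name RomanNumerals and the convention of returning -1 for non-numerals are assumptions made for illustration. It accepts both cases and does not check that a numeral is well formed.

```java
import java.util.Map;

// Hypothetical helper, not part of this project's API.
final class RomanNumerals {
    private static final Map<Character, Integer> VALUES = Map.of(
        'I', 1, 'V', 5, 'X', 10, 'L', 50, 'C', 100, 'D', 500, 'M', 1000);

    /** Returns the Arabic value of the token, or -1 if it is not made of Roman digits. */
    static int toArabic(String token) {
        if (token.isEmpty()) {
            return -1;
        }
        int total = 0;
        int right = 0; // value of the symbol to the right of the current one
        for (int i = token.length() - 1; i >= 0; i--) {
            Integer value = VALUES.get(Character.toUpperCase(token.charAt(i)));
            if (value == null) {
                return -1; // not a Roman digit
            }
            // A smaller symbol in front of a larger one is subtracted (IV, XL, CM).
            total += (value < right) ? -value : value;
            right = value;
        }
        return total;
    }
}
```

For example, RomanNumerals.toArabic("XLII") yields 42, while an ordinary word such as "paris" yields -1.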
Usage example in conf/schema.xml:
<fieldType name="text_la_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="de.uni_koeln.capitularia.lucene_tools.RomanNumeralsFilterFactory"/>
  </analyzer>
</fieldType>
The value source generates strings that, when used as sort keys, order embedded numbers by numeric value, like this:
- paris-bn-lat-4638
- paris-bn-lat-10528
instead of in plain lexicographic order, like this:
- paris-bn-lat-10528
- paris-bn-lat-4638
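One common way to build such keys is to left-pad every run of digits with zeros to a fixed width, so that plain string comparison agrees with numeric order. The sketch below shows that technique only. Whether the value source in this project works this way is an assumption, and the class name NaturalSortKey and the width of 10 are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of this project's API.
final class NaturalSortKey {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final int WIDTH = 10; // assumed maximum number of digits

    /** Left-pads each run of digits with zeros up to WIDTH characters. */
    static String of(String id) {
        return DIGITS.matcher(id).replaceAll(match -> {
            String digits = match.group();
            int missing = Math.max(0, WIDTH - digits.length());
            return "0".repeat(missing) + digits;
        });
    }
}
```

With these keys, paris-bn-lat-4638 becomes paris-bn-lat-0000004638 and compares lower than the key for paris-bn-lat-10528, although the raw strings compare the other way round.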
Usage example in conf/solrconfig.xml
In the query, set the sort parameter to: strnumsort(my_alphanum_id) asc
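When the query is sent over HTTP, the sort expression has to be URL-encoded. The following sketch builds such a request URL; the class name SortQuery, the base URL, and the core name mycore are hypothetical, and only the sort expression comes from the text above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of this project's API.
final class SortQuery {
    /** Builds a select URL that matches all documents and sorts them with strnumsort. */
    static String selectUrl(String baseUrl, String field) {
        String sort = "strnumsort(" + field + ") asc";
        return baseUrl + "/select"
            + "?q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
            + "&sort=" + URLEncoder.encode(sort, StandardCharsets.UTF_8);
    }
}
```

Calling SortQuery.selectUrl("http://localhost:8983/solr/mycore", "my_alphanum_id") produces a URL whose sort parameter reads strnumsort%28my_alphanum_id%29+asc.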