I use it to generate random species names (Linnean binomial nomenclature).
Takes in input a text file containing one example name per line (e.g one species name per line).
Output a serie of randomly-generated (letter-wise) sequence of words. There is a transition matrix for each position of the word in the line (e.g a transition matrix for the genus name, and one for the species name).
The first time it is run on a training dataset, it will save the transition
matrix and word length distribution in .npy
files in the current directory.
You can delete those files to rerun the learning step.
# Generate random species names (2 words: Genus+species)
python3 species_markovi.py trainingsets/Opisthokonta_from_timetree.txt -o 3
# To generate random genus names (one word)
./species_markovi.py trainingsets/Opisthokonta-genera_from_timetree.txt -w 1 -o 3
An example of names generated with a chain of order 3, with a training set of 47850 Opisthokonta species from TimeTree:
Pseudosophilus furicoloroensis
Hydroas morachyuratus
Cetheria dicoloralis
Feylios placontademontalis
Melaconius thymetterina
Amphorus zymalendicus
Mokopis pyrans
Tarhombodina trix
Chus flexaspardelanevana
Panoplodon neva
Notamia zospinis
Aiti planthurenae
Napeonocohyllonotaena atrilis
Spiza virinensis
Eoscelis sphaemariatus
Chla cata
Phylamydoides elongchir
Eum tanum
Scylidura tatulater
Pana pardwickleylosus
And another example of order 3 trained on mammalian genera:
Pomys
Hericrotheirotis
Mineociurus
Braccotocomyscius
Asteaseutes
Phoecopomyomus
Equattinus
Martia
Peomyscophintodorhicomys
Stenobudellorex
Licensing: Do what you want with it.