opqua (opkua, upkua) [Chibcha/muysccubun]
I. noun. ailment, disease, illness
II. noun. cause, reason [for which something occurs]
Taken from D. F. Gómez Aldana's Muysca-Spanish dictionary.
Opqua stochastically simulates pathogens with distinct, evolving genotypes that spread through populations of hosts which can have specific immune profiles.
Opqua is a useful tool to test out scenarios, explore hypotheses, make predictions, and teach about the relationship between pathogen evolution and epidemiology.
Among other things, Opqua can model
- host-host, vector-borne, and vertical transmission
- pathogen evolution through mutation, recombination, and/or reassortment
- host recovery, death, and birth
- metapopulations with complex structure and demographic interactions
- interventions and events altering demographic, ecological, or evolutionary parameters
- treatment and immunization of hosts or vectors
- influence of pathogen genome sequences on transmission and evolution, as well as host demographic dynamics
- intra- and inter-host competition and evolution of pathogen strains across user-specified adaptive landscapes
Check out the changelog file for information on recent updates.
Opqua has been used in depth to study pathogen evolution across fitness valleys. Check out the preprint on bioRxiv, now peer-reviewed here.
Opqua is developed by Pablo Cárdenas.
The first publication using Opqua was created in collaboration with Vladimir Corredor and Mauricio Santos-Vega. Follow their science antics on BlueSky @pcr-guy or Twitter at @pcr_guy and @msantosvega.
Opqua is available on PyPI and is distributed under an MIT License.
These are some of the plots Opqua is able to produce, but you can output the
raw simulation data yourself to make your own analyses and plots. These are all
taken from the examples in the examples/tutorials folder. Try them out
yourself! See the Requirements and Installation and Usage sections for more
details.
An optimal pathogen genome arises through de novo mutation and outcompetes all
others through intra-host competition. See fitness_function_mutation_example.py
in the examples/tutorials/evolution folder.
An optimal pathogen genome arises through independent reassortment of
chromosomes or genome segments, outcompeting all others
through increases in transmissibility and intra-host competition.
See transmissibility_function_reassortment_example.py in the
examples/tutorials/evolution folder. Similar code can be used to achieve
genetic recombination by setting num_crossover_host to a value greater than 0.
A population with natural birth and death dynamics shows the effects of a
pathogen. "Dead" denotes deaths caused by pathogen infection. See
vector-borne_birth-death_example.py in the examples/tutorials/vital_dynamics
folder.
Pathogens spread through a network of interconnected populations of hosts. Lines
denote infected pathogens. See metapopulations_migration_example.py in the
examples/tutorials/metapopulations folder.
A population undergoes different interventions, including changes in
epidemiological parameters and vaccination. "Recovered" denotes immunized,
uninfected hosts. See intervention_examples.py in the
examples/tutorials/interventions folder.
Phylogenies can be computed for pathogen genomes that emerge throughout the
simulation. See fitness_function_mutation_example.py in the
examples/tutorials/evolution folder.
Mutant genotypes connect to each other across fitness landscapes according to
their rate of emerging and establishing in intrahost evolution.
See landscape_example.py in the examples/tutorials/landscapes folder.
For advanced examples (including multiple parameter sweeps), check out this separate repository (preprint on bioRxiv, now peer-reviewed here).
Opqua runs on Python. If you don't have Python yet, a good place to get the latest version is Anaconda.
Opqua is available on PyPI to install through pip, as explained below.
If you haven't yet, install pip:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
Install Opqua by running
pip install opqua
The pip installer should take care of installing the necessary packages.
However, for reference, the versions of the packages used for opqua's
development are saved in requirements.txt
To run any Opqua model (including the tutorials in the examples/tutorials
folder), save the model as a .py file and execute it from the console using
python my_model.py
You may also run the models from a notebook environment such as Jupyter or an integrated development environment (IDE) such as Spyder, both available through Anaconda.
The simplest model you can make using Opqua looks like this:
# This simulates a pathogen with genome "AAAAAAAAAA" spreading in a single
# population of 100 hosts, 20 of which are initially infected, under example
# preset conditions for host-host transmission.
from opqua.model import Model
my_model = Model()
my_model.newSetup('my_setup', preset='host-host')
my_model.newPopulation('my_population', 'my_setup', num_hosts=100)
my_model.addPathogensToHosts( 'my_population',{'AAAAAAAAAA':20} )
my_model.run(0,100)
data = my_model.saveToDataFrame('my_model.csv')
graph = my_model.compartmentPlot('my_model.png', data)
For more example usage, have a look at the examples folder. For an overview
of how Opqua models work, check out the Materials and Methods section of the
manuscript here. A summarized description is shown below in the
How Does Opqua Work? section.
For more information on the details of each function, head over to the
Model Documentation section.
Opqua models are composed of populations containing hosts and/or vectors, which themselves may be infected by a number of pathogens with different genomes.
A genome is represented as a string of characters. All genomes must be of the same length (a set number of loci), and each position within the genome can have one of a number of different characters specified by the user (corresponding to different alleles). Different loci in the genome may have different possible alleles available to them. Genomes may be composed of separate chromosomes, separated by the "/" character, which is reserved for this purpose.
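The genome representation described above can be sketched in a few lines of plain Python. The helper names below (split_chromosomes, is_valid_genome) are hypothetical and not part of the Opqua API, and treating the "/" separator as occupying one position in the string is an assumption of this sketch.

```python
# Hypothetical helpers for illustration; these are not part of the Opqua API.
CHROMOSOME_SEPARATOR = "/"  # reserved character that separates chromosomes


def split_chromosomes(genome):
    """Return the list of chromosome strings that make up a genome."""
    return genome.split(CHROMOSOME_SEPARATOR)


def is_valid_genome(genome, possible_alleles):
    """Check a genome against a list of allowed characters per position.

    possible_alleles holds one string of allowed characters per position,
    with a "/" entry at each chromosome boundary (an assumption here).
    """
    if len(genome) != len(possible_alleles):
        return False
    return all(
        allele in allowed for allele, allowed in zip(genome, possible_alleles)
    )


# Two chromosomes of four loci each; the last locus only allows "A" or "T".
alleles_per_locus = ["ATCG"] * 4 + ["/"] + ["ATCG"] * 3 + ["AT"]

print(split_chromosomes("AAAA/CCCT"))                   # ['AAAA', 'CCCT']
print(is_valid_genome("AAAA/CCCT", alleles_per_locus))  # True
print(is_valid_genome("AAAA/CCCG", alleles_per_locus))  # False
```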
Each population may have its own unique parameters dictating the events that happen inside of it, including how pathogens are spread between its hosts and vectors.
There are different kinds of events that may occur to hosts and vectors in a population:
- contact between an infectious host/vector and another host/vector in the same population (intra-population contact) or in a different population ("population contact")
- migration of a host/vector from one population to another
- recovery of an infected host/vector
- birth of a new host/vector from an existing host/vector
- death of a host/vector due to pathogen infection or by "natural" causes
- mutation of a pathogen in an infected host/vector
- recombination of two pathogens in an infected host/vector
The likelihood of each event occurring is determined by the population's parameters (explained in the newSetup function documentation) and the number of infected and healthy hosts and/or vectors in the population(s) involved. Crucially, it is also determined by the genome sequences of the pathogens infecting those hosts and vectors. The user may specify arbitrary functions to evaluate how a genome sequence affects any of the above kinds of rates. This is once again done through arguments of the newSetup function. As an example, a specific genome sequence may result in increased transmission within populations but decreased migration of infected hosts, or increased mutation rates. These custom functions may be different across populations, resulting in different adaptive landscapes within different populations.
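As an illustration of such a custom function, the sketch below scores a genome by its Hamming distance from an optimal sequence. The optimal genome, the linear decay, and the floor value are all assumptions chosen for the example; only the general shape (a function that takes a genome string and returns a number) comes from the description above.

```python
OPTIMAL_GENOME = "AAAAAAAAAA"  # hypothetical optimum, chosen for illustration


def hamming_distance(genome_a, genome_b):
    """Count the positions at which two equal-length genomes differ."""
    return sum(a != b for a, b in zip(genome_a, genome_b))


def peak_fitness(genome, min_value=1e-3):
    """Return 1 at the optimal genome, decreasing linearly with distance.

    The result never drops below min_value, so every genome keeps a small
    positive fitness (an assumption of this sketch).
    """
    distance = hamming_distance(genome, OPTIMAL_GENOME)
    return max(min_value, 1 - distance / len(OPTIMAL_GENOME))


for genome in ["AAAAAAAAAA", "AAAAAAAAAT", "TTTTTTTTTT"]:
    print(genome, peak_fitness(genome))
```

A function of this shape matches the form documented for arguments such as fitnessHost, so it could plausibly be supplied as a keyword argument to newSetup alongside a preset.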
Contacts within and between populations may happen by any combination of host-host, host-vector, and/or vector-host routes, depending on the populations' parameters. When a contact occurs, each pathogen genome present in the infecting host/vector may be transferred to the receiving host/vector as long as one "infectious unit" is inoculated. The number of infectious units inoculated is randomly distributed based on a Poisson probability distribution. The mean of this distribution is set by the receiving host/vector's population parameters, and is multiplied by the fraction of total intra-host fitness of each pathogen genome. For instance, consider the mean inoculum size for a host in a given population is 10 units and the infecting host/vector has two pathogens with fitnesses of 0.3 and 0.7, respectively. This would make the means of the Poisson distributions used to generate random infections for each pathogen equal to 3 and 7, respectively.
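The worked example above can be reproduced with a short, standard-library sketch. The function names are hypothetical, the Poisson sampler uses Knuth's algorithm rather than whatever Opqua uses internally, and the second genome is made up for the example.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's algorithm."""
    if mean <= 0:
        return 0
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def inoculum_means(mean_inoculum, fitnesses):
    """Scale the mean inoculum by each genome's share of total fitness."""
    total = sum(fitnesses.values())
    return {g: mean_inoculum * f / total for g, f in fitnesses.items()}


# Mean inoculum of 10 units; two pathogens with fitnesses 0.3 and 0.7.
means = inoculum_means(10, {"AAAAAAAAAA": 0.3, "TTTTTTTTTT": 0.7})
print(means)  # means of approximately 3 and 7

rng = random.Random(0)
inoculated = {g: poisson_sample(m, rng) for g, m in means.items()}
# A genome is transferred only if at least one infectious unit is inoculated.
transferred = [g for g, units in inoculated.items() if units >= 1]
print(inoculated, transferred)
```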
Inter-population contacts occur via the same mechanism as intra-population contacts, with the distinction that the two populations must be linked in a compatible way. As an example, if a vector-borne model with two separate populations is to allow vectors from Population A to contact hosts in Population B, then the contact rate of vectors in Population A and the contact rate of hosts in Population B must both be greater than zero. Migration of hosts/vectors from one population to another depends on a single rate defining the frequency of vector/host transport events from a given population to another. Therefore, Population A would have a specific migration rate dictating transport to Population B, and Population B would have a separate rate governing transport towards A.
Recovery of an infected host or vector results in all pathogens being removed from the host/vector. Additionally, the host/vector may optionally gain protection from pathogens that contain specific genome sequences present in the genomes of the pathogens it recovered from, representing immune memory. The user may specify a population parameter delimiting the contiguous loci in the genome that are saved on the recovered host/vector as "protection sequences". Pathogens containing any of the host/vector's protection sequences will not be able to infect the host/vector.
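A minimal sketch of this immune memory logic follows. The function names are hypothetical, and two details are assumptions: the saved loci are taken as a half-open slice [start, end), and a protection sequence blocks a pathogen if it appears anywhere in the genome string.

```python
def protection_sequences_on_recovery(pathogen_genomes, start, end):
    """Save loci [start, end) of each cleared genome as protection sequences."""
    return {genome[start:end] for genome in pathogen_genomes}


def can_infect(genome, protection_sequences):
    """A genome is blocked if it contains any of the protection sequences."""
    return not any(sequence in genome for sequence in protection_sequences)


# A host recovers from one pathogen; the first four loci are remembered.
protections = protection_sequences_on_recovery(["ATCGATCGAT"], 0, 4)
print(protections)                            # {'ATCG'}
print(can_infect("ATCGTTTTTT", protections))  # False
print(can_infect("TTTTTTTTTT", protections))  # True
```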
Births result in a new host/vector that may optionally inherit its parent's protection sequences. Additionally, a parent may optionally infect its offspring at birth following a Poisson sampling process equivalent to the one described for other contact events above. Deaths of existing hosts/vectors can occur both naturally or due to infection mortality. Only deaths due to infection are tracked and recorded in the model's history.
De novo mutation of a pathogen in a given host/vector results in a single locus within a pathogen's genome being randomly assigned a new allele from the possible alleles at that position. Recombination of two pathogens in a given host/vector creates two new genomes based on the independent segregation of chromosomes (or reassortment of genome segments, depending on the field) from the two parent genomes. In addition, there may be a Poisson-distributed random number of crossover events between homologous parent chromosomes. Recombination by crossover event will result in all the loci in the chromosome on one side of the crossover event location being inherited from one of the parents, while the remainder of the chromosome is inherited from the other parent. The locations of crossover events are distributed throughout the genome following a uniform random distribution.
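The three genetic operations described above can be sketched as follows. All function names are hypothetical. The sketch assumes that a mutation always picks an allele different from the current one, that every locus shares the same allele set, and it shows a single crossover rather than a Poisson-distributed number of them.

```python
import random


def mutate(genome, possible_alleles, rng):
    """Assign a different allele to one randomly chosen locus."""
    loci = [i for i, char in enumerate(genome) if char != "/"]
    position = rng.choice(loci)
    choices = [a for a in possible_alleles if a != genome[position]]
    return genome[:position] + rng.choice(choices) + genome[position + 1:]


def reassort(parent_a, parent_b, rng):
    """Segregate chromosomes independently into two offspring genomes."""
    child_1, child_2 = [], []
    for chrom_a, chrom_b in zip(parent_a.split("/"), parent_b.split("/")):
        if rng.random() < 0.5:  # swap which child receives which chromosome
            chrom_a, chrom_b = chrom_b, chrom_a
        child_1.append(chrom_a)
        child_2.append(chrom_b)
    return "/".join(child_1), "/".join(child_2)


def crossover(chrom_a, chrom_b, rng):
    """Exchange everything after one uniformly chosen crossover point."""
    point = rng.randint(1, len(chrom_a) - 1)  # needs at least two loci
    return chrom_a[:point] + chrom_b[point:], chrom_b[:point] + chrom_a[point:]


rng = random.Random(42)
print(mutate("AAAA/AAAA", "ATCG", rng))
print(reassort("AAAA/CCCC", "TTTT/GGGG", rng))
print(crossover("AAAA", "TTTT", rng))
```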
Furthermore, the user may specify changes in model behavior at specific timepoints during the simulation. These changes are known as "interventions". Interventions can include any kind of manipulation to populations in the model, including:
- adding new populations
- changing a population's event parameters and adaptive landscape functions
- linking and unlinking populations through migration or inter-population contact
- adding and removing hosts and vectors to a population
Interventions can also include actions that involve specific hosts or vectors in a given population, such as:
- adding pathogens with specific genomes to a host/vector
- removing all protection sequences from some hosts/vectors in a population
- applying a "treatment" in a population that cures some of its hosts/vectors of pathogens
- applying a "vaccine" in a population that protects some of its hosts/vectors from pathogens
For these kinds of interventions involving specific pathogens in a population, the user may choose to apply them to a randomly-sampled fraction of hosts/vectors in a population, or to a specific group of individuals. This is useful when simulating consecutive interventions on the same specific group within a population. A single model may contain multiple groups of individuals and the same individual may be a member of multiple different groups. Individuals remain in the same group even if they migrate away from the population they were chosen in.
When a host/vector is given a "treatment", it removes all pathogens within the host/vector that do not contain a collection of "resistance sequences". A treatment may have multiple resistance sequences. A pathogen must contain all of these within its genome in order to avoid being removed. On the other hand, applying a vaccine consists of adding a specific protection sequence to hosts/vectors, which behaves as explained above for recovered hosts/vectors when they acquire immune protection, if the model allows it.
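The treatment rule can be sketched in a couple of lines. The function name and the example genomes are hypothetical; the only behavior taken from the description above is that a pathogen survives treatment only if its genome contains every resistance sequence.

```python
def treat(pathogen_genomes, resistance_sequences):
    """Return the genomes that survive a treatment.

    A genome survives only if it contains all of the resistance sequences.
    """
    return [
        genome for genome in pathogen_genomes
        if all(sequence in genome for sequence in resistance_sequences)
    ]


infections = ["AAAAAAAAAA", "AAGGAACCAA", "AAGGAAAAAA"]
print(treat(infections, ["GG", "CC"]))  # ['AAGGAACCAA']
```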
Models are simulated using an implementation of the Gillespie algorithm (optionally with tau-leaping approximation) in which the rates of different kinds of events across different populations are computed with each population's parameters and current state, and are then stored in a matrix. In addition, each population has host and vector matrices containing coefficients that represent the contribution of each host and vector, respectively, to the rates in the master model rate matrix. Each coefficient is dependent on the genomes of the pathogens infecting its corresponding vector or host. Whenever an event occurs, the corresponding entries in the population matrix are updated, and the master rate matrix is recomputed based on this information.
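For readers unfamiliar with the Gillespie algorithm, here is a minimal sketch of its direct method for a single well-mixed population with only infection and recovery events. It is not Opqua's implementation: there are no genomes, vectors, or rate matrices, and the rate formulas and function name are assumptions made for the example.

```python
import random


def gillespie_sir(num_hosts, num_infected, contact_rate, recovery_rate,
                  t_final, rng):
    """Simulate infection and recovery events one at a time.

    Assumes recovery_rate > 0 so the total rate is positive while any
    host is infected. Returns a list of (time, S, I, R) tuples.
    """
    t = 0.0
    susceptible = num_hosts - num_infected
    infected, recovered = num_infected, 0
    history = [(t, susceptible, infected, recovered)]
    while infected > 0:
        # Rates of each event type given the current state.
        infection_rate = contact_rate * infected * susceptible / num_hosts
        recover_rate = recovery_rate * infected
        total_rate = infection_rate + recover_rate
        # Time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_final:
            break
        # Choose which event happens in proportion to its rate.
        if rng.random() * total_rate < infection_rate:
            susceptible -= 1
            infected += 1
        else:
            infected -= 1
            recovered += 1
        history.append((t, susceptible, infected, recovered))
    return history


history = gillespie_sir(100, 20, 2e-1, 1e-1, 100, random.Random(1))
print(len(history), history[-1])
```

The contact and recovery rates used here match the values listed for the "host-host" preset, but how Opqua combines them into event rates is more involved than this sketch.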
Tau-leaping is achieved by setting a time step threshold under which the step size defaults to a larger fixed step size. The number of events of each type occurring in every population of the model within this time span is then calculated as a random value from a Poisson distribution with mean equal to the corresponding rate. All events for this time span are then executed. The choice of the time step threshold and the minimum step size determines the accuracy of the approximation. By default, the minimum step size is set to be the minimum of all intrahost growth threshold times for hosts and vectors across all model populations, or the minimum mean time before transmission. This ensures that the likelihood of multiple events occurring within the same host in the same tau-leap is low. The threshold time is lower by a factor of the number of event types across all populations to account for the fact that this tau-leaping approximation only becomes less intensive if the number of events in the leap is larger than the number of types of events. This threshold is excessively stringent but provides a good bound.
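A single tau-leap can be sketched as below for the same kind of simple infection and recovery model. This is a hedged illustration rather than Opqua's code: the function names are hypothetical, the sampler is Knuth's algorithm (suitable for modest means only), and the mean number of events of each type is taken to be its rate multiplied by the leap length, which is the standard tau-leaping convention and an assumption here.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's algorithm."""
    if mean <= 0:
        return 0
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def tau_leap_step(susceptible, infected, recovered,
                  contact_rate, recovery_rate, tau, rng):
    """Advance the state by one leap of length tau."""
    num_hosts = susceptible + infected + recovered
    infection_rate = contact_rate * infected * susceptible / num_hosts
    recover_rate = recovery_rate * infected
    # Number of events of each type in the leap; capped so that counts
    # never go negative when the leap overshoots.
    new_infections = min(susceptible,
                         poisson_sample(infection_rate * tau, rng))
    new_recoveries = min(infected, poisson_sample(recover_rate * tau, rng))
    return (susceptible - new_infections,
            infected + new_infections - new_recoveries,
            recovered + new_recoveries)


rng = random.Random(0)
state = (80, 20, 0)
for _ in range(50):
    state = tau_leap_step(*state, 2e-1, 1e-1, 0.5, rng)
print(state)
```

Larger values of tau execute more events per step and run faster, at the cost of letting rates go stale within a leap.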
The model's state at any given time comprises all populations, their hosts and vectors, and the pathogen genomes infecting each of these. A copy of the model's state is saved at every time point, or at intermittent intervals throughout the course of the simulation. A random sample of hosts and/or vectors may be saved instead of the entire model as a means of reducing memory footprint.
The output of a model can be saved in multiple ways. The model state at each saved timepoint may be output in a single, raw pandas DataFrame, and saved as a tabular file. Other data output types include counts of pathogen genomes or protection sequences for the model, as well as time of first emergence for each pathogen genome and genome distance matrices for every timepoint sampled. The user can also create different kinds of plots to visualize the results. These include:
- plots of the number of hosts and/or vectors in different epidemiological compartments (naive, infected, recovered, and dead) across simulation time
- plots of the number of individuals in a compartment for different populations
- plots of the genomic composition of the pathogen population over time
- phylogenies of pathogen genomes
Users can also use the data output formats to make their own custom plots.
All usage is handled through the Opqua Model class.
The Model class contains populations, setups, and interventions to be used
in simulation. It also contains groups of hosts/vectors for manipulations and
stores model history as snapshots for specific time points.
To use it, import the class as
from opqua.model import Model
You can find a detailed account of everything Model does in the
Model attributes and Model class methods list sections.
- populations -- dictionary with keys=population IDs, values=Population objects
- setups -- dictionary with keys=setup IDs, values=Setup objects
- interventions -- contains model interventions in the order they will occur
- groups -- dictionary with keys=group IDs, values=lists of hosts/vectors
- history -- dictionary with keys=time values, values=Model objects that are snapshots of Model at that timepoint
- global_trackers -- dictionary keeping track of some global indicators over all the course of the simulation
- custom_condition_trackers -- dictionary with keys=ID of custom condition, values=functions that take a Model object as argument and return True or False; every time True is returned by a function in custom_condition_trackers, the simulation time will be stored under the corresponding ID inside global_trackers['custom_conditions']
- t_var -- variable that tracks time in simulations
The dictionary global_trackers contains the following keys:
- num_events: dictionary with the number of each kind of event in the simulation
- last_event_time: time point at which the last event in the simulation happened
- genomes_seen: list of all unique genomes that have appeared in the simulation
- custom_conditions: dictionary with keys=ID of custom condition, values=lists of times; every time True is returned by a function in custom_condition_trackers, the simulation time will be stored under the corresponding ID inside global_trackers['custom_conditions']
The dictionary num_events inside of global_trackers contains the following keys:
- MIGRATE_HOST
- MIGRATE_VECTOR
- POPULATION_CONTACT_HOST_HOST
- POPULATION_CONTACT_HOST_VECTOR
- POPULATION_CONTACT_VECTOR_HOST
- CONTACT_HOST_HOST
- CONTACT_HOST_VECTOR
- CONTACT_VECTOR_HOST
- RECOVER_HOST
- RECOVER_VECTOR
- MUTATE_HOST
- MUTATE_VECTOR
- RECOMBINE_HOST
- RECOMBINE_VECTOR
- KILL_HOST
- KILL_VECTOR
- DIE_HOST
- DIE_VECTOR
- BIRTH_HOST
- BIRTH_VECTOR
KILL_HOST and KILL_VECTOR denote death due to infection, whereas DIE_HOST and DIE_VECTOR denote death by natural means.
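As a small illustration of reading these counters, the sketch below separates infection deaths from natural deaths. The dictionary is a made-up stand-in for global_trackers['num_events'], and death_summary is a hypothetical helper, not an Opqua function.

```python
# Hypothetical counts standing in for global_trackers['num_events'].
num_events = {
    "KILL_HOST": 4,
    "KILL_VECTOR": 0,
    "DIE_HOST": 11,
    "DIE_VECTOR": 25,
    "BIRTH_HOST": 15,
}


def death_summary(num_events):
    """Total deaths by cause across hosts and vectors."""
    return {
        "infection": num_events.get("KILL_HOST", 0)
        + num_events.get("KILL_VECTOR", 0),
        "natural": num_events.get("DIE_HOST", 0)
        + num_events.get("DIE_VECTOR", 0),
    }


print(death_summary(num_events))  # {'infection': 4, 'natural': 36}
```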
- setRandomSeed -- set random seed for numpy random number generator
- newSetup -- creates a new Setup, saves it in setups dict under given name
- saveSetup -- saves Setup parameters to given file location as a CSV file
- loadSetup -- loads Setup parameters from CSV file at given location
- newIntervention -- creates a new intervention executed during simulation
- run -- simulates model for a specified length of time
- runReplicates -- simulate replicates of a model, save only end results
- runParamSweep -- simulate parameter sweep with a model, save only end results
- copyState -- copies a slimmed-down representation of model state
- deepCopy -- copies current model with inner references
- saveToDataFrame -- saves status of model to data frame, writes to file
- getCompositionData -- create dataframe with counts for pathogen genomes or resistance
- getPathogens -- creates data frame with counts for all pathogen genomes
- getProtections -- creates data frame with counts for all protection sequences
- populationsPlot -- plots aggregated totals per population across time
- compartmentPlot -- plots number of naive, infected, recovered, dead hosts/vectors vs time
- compositionPlot -- plots counts for pathogen genomes or resistance vs. time
- clustermap -- plots heatmap and dendrogram of all pathogens in given data
- pathogenDistanceHistory -- calculates pairwise distances for pathogen genomes at different times
- getGenomeTimes -- create DataFrame with times genomes first appeared during simulation
- visualizeMutationNetwork -- creates interactive visualization of mutation network
Make and connect populations:
- newPopulation -- create a new Population object with setup parameters
- linkPopulationsHostMigration -- set host migration rate from one population towards another
- linkPopulationsVectorMigration -- set vector migration rate from one population towards another
- linkPopulationsHostHostContact -- set host-host inter-population contact rate from one population towards another
- linkPopulationsHostVectorContact -- set host-vector inter-population contact rate from one population towards another
- linkPopulationsVectorHostContact -- set vector-host inter-population contact rate from one population towards another
- createInterconnectedPopulations -- create new populations, link all of them to each other by migration and/or inter-population contact
Manipulate hosts and vectors in population:
- newHostGroup -- returns a list of random (healthy or any) hosts
- newVectorGroup -- returns a list of random (healthy or any) vectors
- addHosts -- adds hosts to the population
- addVectors -- adds vectors to the population
- removeHosts -- removes hosts from the population
- removeVectors -- removes vectors from the population
- addPathogensToHosts -- adds pathogens with specified genomes to hosts
- addPathogensToVectors -- adds pathogens with specified genomes to vectors
- treatHosts -- removes infections susceptible to given treatment from hosts
- treatVectors -- removes infections susceptible to treatment from vectors
- protectHosts -- adds protection sequence to hosts
- protectVectors -- adds protection sequence to vectors
- wipeProtectionHosts -- removes all protection sequences from hosts
- wipeProtectionVectors -- removes all protection sequences from vectors
Modify population parameters:
- setSetup -- assigns a given set of parameters to this population
Compute a fitness landscape:
- newLandscape -- creates a new Landscape object with a setup
- mapLandscape -- maps and evaluates fitness of all relevant mutations
- saveLandscape -- saves mutation network and fitness values
- loadLandscape -- load mutation network and fitness values
Utility:
- customModelFunction -- returns output of given function run on model
- peakLandscape -- evaluates genome numeric phenotype by decreasing with distance from optimal sequence
- valleyLandscape -- evaluates genome numeric phenotype by increasing with distance from worst sequence
setRandomSeed(seed)
Set random seed for numpy random number generator.
Arguments:
- seed -- int for the random seed to be passed to numpy (int)
Model()
Class constructor; create a new Model object.
newSetup(name, preset=None, **kwargs)
Create a new Setup, save it in setups dict under given name.
Two preset setups exist: "vector-borne" and "host-host". You may select one of the preset setups with the preset keyword argument and then modify individual parameters with additional keyword arguments, without having to specify all of them. You may also not select a preset setup.
"host-host":
num_loci = 10
possible_alleles = 'ATCG'
allele_groups_host = #LIST:
allele_groups_vector = #LIST:
max_depth_host = 0
max_depth_vector = 0
peak_pathogen_population_host = 100
steady_pathogen_population_host = 100
peak_pathogen_population_vector = 100
steady_pathogen_population_vector = 100
population_threshold_host = 0
population_threshold_vector = 0
selection_threshold_host = 100
selection_threshold_vector = 100
generation_time_host = 1
max_generation_size_host = 10
generation_time_vector = 1
max_generation_size_vector = 10
max_generations_survival_host = 30
max_generations_survival_vector = 30
fitnessHost = (lambda g: 1)
contactHost = (lambda g: 1)
receiveContactHost = (lambda g: 1)
mortalityHost = (lambda g: 1)
natalityHost = (lambda g: 1)
recoveryHost = (lambda g: 1)
migrationHost = (lambda g: 1)
populationContactHost = (lambda g: 1)
receivePopulationContactHost = (lambda g: 1)
mutationHost = (lambda g: 1)
recombinationHost = (lambda g: 1)
fitnessVector = (lambda g: 1)
contactVector = (lambda g: 1)
receiveContactVector = (lambda g: 1)
mortalityVector = (lambda g: 1)
natalityVector = (lambda g: 1)
recoveryVector = (lambda g: 1)
migrationVector = (lambda g: 1)
populationContactVector = (lambda g: 1)
receivePopulationContactVector = (lambda g: 1)
mutationVector = (lambda g: 1)
recombinationVector = (lambda g: 1)
contact_rate_host_vector = 0
transmission_efficiency_host_vector = 0
transmission_efficiency_vector_host = 0
contact_rate_host_host = 2e-1
transmission_efficiency_host_host = 1
mean_inoculum_host = 1e1
mean_inoculum_vector = 0
variance_inoculum_host = 3
variance_inoculum_vector = 0
recovery_rate_host = 1e-1
recovery_rate_vector = 0
mortality_rate_host = 0
mortality_rate_vector = 0
recombine_in_host = 1e-4
recombine_in_vector = 0
num_crossover_host = 1
num_crossover_vector = 0
mutate_in_host = 1e-6
mutate_in_vector = 0
death_rate_host = 0
death_rate_vector = 0
birth_rate_host = 0
birth_rate_vector = 0
vertical_transmission_host = 0
vertical_transmission_vector = 0
inherit_protection_host = 0
inherit_protection_vector = 0
protection_upon_recovery_host = None
protection_upon_recovery_vector = None
"vector-borne":
num_loci = 10
possible_alleles = 'ATCG'
allele_groups_host = #LIST:
allele_groups_vector = #LIST:
max_depth_host = 0
max_depth_vector = 0
peak_pathogen_population_host = 100
steady_pathogen_population_host = 100
peak_pathogen_population_vector = 100
steady_pathogen_population_vector = 100
population_threshold_host = 0
population_threshold_vector = 0
selection_threshold_host = 100
selection_threshold_vector = 100
generation_time_host = 1
max_generation_size_host = 10
generation_time_vector = 1
max_generation_size_vector = 10
max_generations_survival_host = 30
max_generations_survival_vector = 30
fitnessHost = (lambda g: 1)
contactHost = (lambda g: 1)
receiveContactHost = (lambda g: 1)
mortalityHost = (lambda g: 1)
natalityHost = (lambda g: 1)
recoveryHost = (lambda g: 1)
migrationHost = (lambda g: 1)
populationContactHost = (lambda g: 1)
receivePopulationContactHost = (lambda g: 1)
mutationHost = (lambda g: 1)
recombinationHost = (lambda g: 1)
fitnessVector = (lambda g: 1)
contactVector = (lambda g: 1)
receiveContactVector = (lambda g: 1)
mortalityVector = (lambda g: 1)
natalityVector = (lambda g: 1)
recoveryVector = (lambda g: 1)
migrationVector = (lambda g: 1)
populationContactVector = (lambda g: 1)
receivePopulationContactVector = (lambda g: 1)
mutationVector = (lambda g: 1)
recombinationVector = (lambda g: 1)
contact_rate_host_vector = 2e-1
transmission_efficiency_host_vector = 1
transmission_efficiency_vector_host = 1
contact_rate_host_host = 0
transmission_efficiency_host_host = 0
mean_inoculum_host = 3
mean_inoculum_vector = 1
variance_inoculum_host = 3
variance_inoculum_vector = 1
recovery_rate_host = 1e-1
recovery_rate_vector = 1e-1
mortality_rate_host = 0
mortality_rate_vector = 0
recombine_in_host = 0
recombine_in_vector = 1e-4
num_crossover_host = 0
num_crossover_vector = 1
mutate_in_host = 1e-6
mutate_in_vector = 0
death_rate_host = 0
death_rate_vector = 0
birth_rate_host = 0
birth_rate_vector = 0
vertical_transmission_host = 0
vertical_transmission_vector = 0
inherit_protection_host = 0
inherit_protection_vector = 0
protection_upon_recovery_host = None
protection_upon_recovery_vector = None
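The way a preset combines with individual keyword arguments can be pictured as a dictionary merge, sketched below. This is an illustration of the semantics described above, not Opqua's implementation: build_setup and PRESETS are hypothetical names, and only a handful of parameters are copied from the preset listings.

```python
# Excerpt of the preset values listed above; most parameters are omitted.
PRESETS = {
    "host-host": {
        "contact_rate_host_host": 2e-1,
        "recovery_rate_host": 1e-1,
        "mutate_in_host": 1e-6,
        "recombine_in_host": 1e-4,
    },
    "vector-borne": {
        "contact_rate_host_host": 0,
        "recovery_rate_host": 1e-1,
        "mutate_in_host": 1e-6,
        "recombine_in_host": 0,
    },
}


def build_setup(preset=None, **kwargs):
    """Start from a preset (if any) and override individual parameters."""
    parameters = dict(PRESETS[preset]) if preset is not None else {}
    parameters.update(kwargs)  # explicit keyword arguments take precedence
    return parameters


setup = build_setup("host-host", mutate_in_host=1e-4)
print(setup["mutate_in_host"])      # 0.0001
print(setup["recovery_rate_host"])  # 0.1
```

With Opqua itself, the equivalent call would take the form my_model.newSetup('my_setup', preset='host-host', mutate_in_host=1e-4), following the signature shown above.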
Arguments:
- name -- name of setup to be used as a key in model setups dictionary
Keyword arguments:
- preset -- preset setup to be used: "vector-borne" or "host-host"; if None, all other keyword arguments must be defined (default None; None or String)
- **kwargs -- setup parameters and values, which may include the following:
- num_loci -- length of each pathogen genome string (int > 0)
- possible_alleles -- set of possible characters in all genome strings, or at each position in genome string; to specify lists in parameter files, use prefix #LIST: (String or list of Strings with num_loci elements)
- allele_groups -- relevant alleles affecting fitness, each element contains a list of strings, each string contains a group of alleles that all have equivalent fitness behavior; if single list provided, then those groups are used for all loci; to specify lists in parameter files, use prefix #LIST: (list of lists of Strings)
- max_depth_host -- maximum number of mutations considered when evaluating establishment rates in hosts (integer >0)
- max_depth_vector -- maximum number of mutations considered when evaluating establishment rates in vectors (integer >0)
- peak_pathogen_population_host -- peak intrahost pathogen population (integer >0)
- steady_pathogen_population_host -- semi steady-state intrahost pathogen population (integer >0)
- peak_pathogen_population_vector -- peak intrahost pathogen population (integer >0)
- steady_pathogen_population_vector -- semi steady-state intrahost pathogen population (integer >0)
- population_threshold_host -- any intrahost variant that still drifts over this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
- population_threshold_vector -- any intravector variant that still drifts over this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
- selection_threshold_host -- any intrahost variant with a selection coefficient under this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
- selection_threshold_vector -- any intravector variant with a selection coefficient under this threshold is assumed to always be under drift; =1/selection_threshold (number >0)
- generation_time_host -- pathogen replication cycle time in hosts (number >0)
- generation_time_vector -- pathogen replication cycle time in vectors (number >0)
- max_generation_size_host -- maximum growth within hosts, in units of pathogens per replication cycle (number>1)
- max_generation_size_vector -- maximum growth within vectors, in units of pathogens per replication cycle (number>1)
- max_generations_survival_host -- number of generations used to compute rates and probabilities of genotype emergence in hosts (integer)
- max_generations_survival_vector -- number of generations used to compute rates and probabilities of genotype emergence in vectors (integer)
- fitnessHost -- function that evaluates relative fitness in head-to-head competition for different genomes within the same host (function object, takes a String argument and returns a number >= 0)
- contactHost -- function that returns coefficient modifying probability of a given host being chosen to be the infector in a contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- receiveContactHost -- function that returns coefficient modifying probability of a given host being chosen to be the infected in a contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- mortalityHost -- function that returns coefficient modifying death rate for a given host, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- natalityHost -- function that returns coefficient modifying birth rate for a given host, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- recoveryHost -- function that returns coefficient modifying recovery rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- migrationHost -- function that returns coefficient modifying migration rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- populationContactHost -- function that returns coefficient modifying population contact rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- receivePopulationContactHost -- function that returns coefficient modifying probability of a given host being chosen to be the infected in a population contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- mutationHost -- function that returns coefficient modifying mutation rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- recombinationHost -- function that returns coefficient modifying recombination rate for a given host based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- fitnessVector -- function that evaluates relative fitness in head-to-head competition for different genomes within the same vector (function object, takes a String argument and returns a number >= 0)
- contactVector -- function that returns coefficient modifying probability of a given vector being chosen to be the infector in a contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- receiveContactVector -- function that returns coefficient modifying probability of a given vector being chosen to be the infected in a contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- mortalityVector -- function that returns coefficient modifying death rate for a given vector, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- natalityVector -- function that returns coefficient modifying birth rate for a given vector, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- recoveryVector -- function that returns coefficient modifying recovery rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- migrationVector -- function that returns coefficient modifying migration rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- populationContactVector -- function that returns coefficient modifying population contact rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- receivePopulationContactVector -- function that returns coefficient modifying probability of a given vector being chosen to be the infected in a population contact event, based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- mutationVector -- function that returns coefficient modifying mutation rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- recombinationVector -- function that returns coefficient modifying recombination rate for a given vector based on genome sequence of pathogen (function object, takes a String argument and returns a number 0-1)
- contact_rate_host_vector -- ("biting") rate of host-vector contact events, not necessarily transmission, assumes constant population density; events/(vector*time) (number >= 0)
- transmission_efficiency_host_vector -- fraction of host-vector contacts that result in successful transmission
- transmission_efficiency_vector_host -- fraction of vector-host contacts that result in successful transmission
- contact_rate_host_host -- rate of host-host contact events, not necessarily transmission, assumes constant population density; events/time (number >= 0)
- transmission_efficiency_host_host -- fraction of host-host contacts that result in successful transmission
- mean_inoculum_host -- mean number of pathogens that are transmitted from a vector or host into a new host during a contact event (int >= 0)
- mean_inoculum_vector -- mean number of pathogens that are transmitted from a host to a vector during a contact event (int >= 0)
- variance_inoculum_host -- variance in number of pathogens that are transmitted from a host/vector to a host during a contact event (num >=0)
- variance_inoculum_vector -- variance in number of pathogens that are transmitted from a host to a vector during a contact (num >=0)
- recovery_rate_host -- rate at which hosts clear all pathogens; 1/time (number >= 0)
- recovery_rate_vector -- rate at which vectors clear all pathogens; 1/time (number >= 0)
- mortality_rate_host -- rate at which infected hosts die from disease; 1/time (number >= 0)
- mortality_rate_vector -- rate at which infected vectors die from disease; 1/time (number >= 0)
- recombine_in_host -- rate at which recombination occurs in host; events/time (number >= 0)
- recombine_in_vector -- rate at which recombination occurs in vector; events/time (number >= 0)
- num_crossover_host -- mean of a Poisson distribution modeling the number of crossover events of host recombination events (number >= 0)
- num_crossover_vector -- mean of a Poisson distribution modeling the number of crossover events of vector recombination events (number >= 0)
- mutate_in_host -- rate at which mutation occurs in host; events/generation (number >= 0)
- mutate_in_vector -- rate at which mutation occurs in vector; events/generation (number >= 0)
- death_rate_host -- natural host death rate; 1/time (number >= 0)
- death_rate_vector -- natural vector death rate; 1/time (number >= 0)
- birth_rate_host -- infected host birth rate; 1/time (number >= 0)
- birth_rate_vector -- infected vector birth rate; 1/time (number >= 0)
- vertical_transmission_host -- probability that a host is infected by its parent at birth (number 0-1)
- vertical_transmission_vector -- probability that a vector is infected by its parent at birth (number 0-1)
- inherit_protection_host -- probability that a host inherits all protection sequences from its parent (number 0-1)
- inherit_protection_vector -- probability that a vector inherits all protection sequences from its parent (number 0-1)
- protection_upon_recovery_host -- defines indexes in genome string that define substring to be added to host protection sequences after recovery (None or array-like of length 2 with int 0-num_loci)
- protection_upon_recovery_vector -- defines indexes in genome string that define substring to be added to vector protection sequences after recovery (None or array-like of length 2 with int 0-num_loci)
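The function-valued parameters above (fitnessHost, mortalityHost, and so on) each take a genome String and return a number. Below is a minimal sketch of two such functions. The optimal genome, the penalty value, and all function names are hypothetical and chosen only for illustration; the code is plain Python and does not call Opqua.

```python
# Hypothetical reference genome used only for this illustration.
OPTIMAL_GENOME = "BEST"

def hamming_distance(genome_a, genome_b):
    """Count positions at which two equal-length genomes differ."""
    if len(genome_a) != len(genome_b):
        raise ValueError("genomes must have the same length")
    return sum(a != b for a, b in zip(genome_a, genome_b))

def my_fitness(genome):
    """Relative fitness (number >= 0) that decays with distance from the optimum."""
    penalty_per_mutation = 0.5  # illustrative value, not an Opqua default
    return penalty_per_mutation ** hamming_distance(genome, OPTIMAL_GENOME)

def my_mortality_coefficient(genome):
    """Coefficient in [0, 1]: only the optimal genome keeps the full death rate."""
    return 1.0 if genome == OPTIMAL_GENOME else 0.1

print(my_fitness("BEST"), my_fitness("BEAT"), my_mortality_coefficient("BEAT"))
```

Functions shaped like these are what the fitnessHost or mortalityHost parameters expect: one String argument in, one number out.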
saveSetup(setup_id, save_to_file)
Saves Setup parameters to given file location as a CSV file. Functions (e.g. fitness functions) cannot be saved in this format.
Arguments:
- setup_id -- name of setup used as a key in setups dictionary
- save_to_file -- file path and name to save parameters under (String)
loadSetup(setup_id, file, preset=None)
Loads Setup parameters from CSV file at given location.
Arguments:
- setup_id -- name of setup to be used as a key in setups dictionary
- file -- file path to CSV file with parameters (String)
Keyword arguments:
- preset -- if using preset parameters, 'host-host' or 'vector-borne' (String, default None)
- **kwargs -- setup parameters and values
newIntervention(time, method_name, args)
Create a new intervention to be carried out at a specific time.
Arguments:
- time -- time at which intervention will take place (number)
- method_name -- intervention to be carried out, must correspond to the name of a method of the Model object (String)
- args -- contains arguments for function in positional order (array-like)
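Since method_name is a String and args is given in positional order, an intervention amounts to looking up a method by name and calling it with the unpacked arguments once its time arrives. The sketch below illustrates that idea with a toy class. ToyModel, add_hosts, and the scheduling logic are hypothetical and are not Opqua's implementation.

```python
class ToyModel:
    """Stand-in for a model object, used only to illustrate dispatch by name."""

    def __init__(self):
        self.hosts = 0
        self.interventions = []  # list of (time, method_name, args)

    def add_hosts(self, population_id, count):
        """Hypothetical method that an intervention could call."""
        self.hosts += count

    def new_intervention(self, time, method_name, args):
        """Store an intervention to be carried out at a given time."""
        self.interventions.append((time, method_name, list(args)))

    def apply_due_interventions(self, current_time):
        """Run, in time order, every stored intervention due by current_time."""
        due = sorted(
            (i for i in self.interventions if i[0] <= current_time),
            key=lambda i: i[0],
        )
        for intervention in due:
            _, method_name, args = intervention
            getattr(self, method_name)(*args)  # look up by name, unpack args
            self.interventions.remove(intervention)

model = ToyModel()
model.new_intervention(20, "add_hosts", ["pop_A", 5])
model.new_intervention(10, "add_hosts", ["pop_A", 3])
model.apply_due_interventions(15)
print(model.hosts)  # 3
```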
addCustomConditionTracker(condition_id, trackerFunction)
Add a function to track occurrences of custom events in simulation.
Adds function trackerFunction to dictionary custom_condition_trackers under key condition_id. Function trackerFunction will be executed at every event in the simulation. Every time True is returned, the simulation time will be stored under the corresponding condition_id key inside global_trackers['custom_condition'].
Arguments:
- condition_id -- ID of this specific condition (String)
- trackerFunction -- function that takes a Model object as argument and returns True or False (Function)
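The tracking behavior described above can be sketched with a toy stand-in for the model. The ToyModel class, its infected attribute, and record_event are hypothetical; only the dictionary names custom_condition_trackers and global_trackers['custom_condition'] come from the description above.

```python
class ToyModel:
    """Minimal stand-in that evaluates tracker functions after each event."""

    def __init__(self):
        self.time = 0.0
        self.infected = 0  # hypothetical state variable
        self.custom_condition_trackers = {}
        self.global_trackers = {"custom_condition": {}}

    def add_tracker(self, condition_id, tracker_function):
        self.custom_condition_trackers[condition_id] = tracker_function
        self.global_trackers["custom_condition"][condition_id] = []

    def record_event(self, time, infected):
        """Update state, then evaluate every tracker once for this event."""
        self.time, self.infected = time, infected
        for condition_id, tracker in self.custom_condition_trackers.items():
            if tracker(self):
                self.global_trackers["custom_condition"][condition_id].append(time)

def outbreak_is_large(model):
    """Tracker function: takes the model, returns True or False."""
    return model.infected >= 10

model = ToyModel()
model.add_tracker("large_outbreak", outbreak_is_large)
for time, infected in [(1.0, 4), (2.5, 12), (3.0, 9), (4.2, 15)]:
    model.record_event(time, infected)
print(model.global_trackers["custom_condition"]["large_outbreak"])  # [2.5, 4.2]
```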
run(
t0,tf,method='approximated',time_sampling=0,host_sampling=0,vector_sampling=0,
skip_uninfected=False)
Simulate model for a specified time between two time points.
Simulates a time series using the Gillespie algorithm.
Saves a dictionary containing model state history, with keys=times and values=Model objects with model snapshot at that time point under this model's history attribute.
Arguments:
- t0 -- initial time point to start simulation at (number)
- tf -- final time point to end simulation at (number)
Keyword arguments:
- method -- algorithm to be used; default is approximated solver ('approximated' or 'exact'; default 'approximated')
- dt_leap -- time leap size used to simulate bursts; if None, set to minimum growth threshold time across all populations (number, default None)
- dt_thre -- time threshold below which bursts are used; if None, set to dt_leap (number, default None)
- time_sampling -- how many events to skip before saving a snapshot of the system state (saves all by default), if <0, saves only final state (int, default 0)
- host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int, default 0)
- vector_sampling -- how many vectors to skip before saving one in a snapshot of the system state (saves all by default) (int, default 0)
- skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)
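For readers unfamiliar with the Gillespie algorithm mentioned above, here is a minimal sketch of the exact method on a toy SIS model. It is not Opqua's solver: the two-event model, the function name, and the parameter values are assumptions made for illustration. The returned dictionary mirrors the idea of a history keyed by time.

```python
import random

def gillespie_sis(num_susceptible, num_infected, contact_rate, recovery_rate,
                  t0, tf, seed=0):
    """Exact Gillespie simulation of a toy SIS model; returns {time: (S, I)}."""
    rng = random.Random(seed)
    s, i, t = num_susceptible, num_infected, t0
    history = {t: (s, i)}
    while True:
        n = s + i
        infection_rate = contact_rate * s * i / n if n > 0 else 0.0
        clearance_rate = recovery_rate * i
        total_rate = infection_rate + clearance_rate
        if total_rate == 0:
            break  # no event can occur anymore
        t += rng.expovariate(total_rate)  # waiting time until the next event
        if t > tf:
            break
        # Pick which event occurs, with probability proportional to its rate.
        if rng.random() * total_rate < infection_rate:
            s, i = s - 1, i + 1
        else:
            s, i = s + 1, i - 1
        history[t] = (s, i)
    return history

history = gillespie_sis(95, 5, contact_rate=0.5, recovery_rate=0.1, t0=0, tf=20)
times = list(history)
print(len(times), history[times[-1]])
```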
runReplicates(
t0,tf,replicates,method='approximated',host_sampling=0,vector_sampling=0,
skip_uninfected=False,**kwargs)
Simulate replicates of a model, save only end results.
Simulates replicates of a time series using a variation of the exact Gillespie algorithm (can also use the tau-leaping method).
Saves a dictionary containing model end state, with keys=times and values=Model objects with model snapshot. The time is the final timepoint.
Arguments:
- t0 -- initial time point to start simulation at (number >= 0)
- tf -- final time point to end simulation at (number >= 0)
- replicates -- how many replicates to simulate (int >= 1)
Keyword arguments:
- method -- algorithm to be used; default is approximated solver ('approximated' or 'exact'; default 'approximated')
- dt_leap -- time leap size used to simulate bursts; if None, set to minimum growth threshold time across all populations (number, default None)
- dt_thre -- time threshold below which bursts are used; if None, set to dt_leap (number, default None)
- host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
- vector_sampling -- how many vectors to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
- skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)
- **kwargs -- additional arguments for joblib multiprocessing
Returns: List of Model objects with the final snapshots
runParamSweep(
t0,tf,setup_id,
param_sweep_dic={},pop_ids_param_sweep=[],
host_population_size_sweep={}, vector_population_size_sweep={},
host_migration_sweep_dic={}, vector_migration_sweep_dic={},
host_host_population_contact_sweep_dic={},
host_vector_population_contact_sweep_dic={},
vector_host_population_contact_sweep_dic={},
replicates=1,method='approximated',host_sampling=0,vector_sampling=0,
skip_uninfected=False,n_cores=0,
**kwargs)
Simulate a parameter sweep with a model, save only end results.
Simulates replicates of a time series using a variation of the exact Gillespie algorithm (can also use the tau-leaping method).
Saves a dictionary containing model end state, with keys=times and values=Model objects with model snapshot. The time is the final timepoint.
Arguments:
- t0 -- initial time point to start simulation at (number >= 0)
- tf -- final time point to end simulation at (number >= 0)
- setup_id -- ID of setup to be assigned (String)
Keyword Arguments:
- param_sweep_dic -- dictionary with keys=parameter names (attributes of Setup), values=list of values for parameter (list, class of elements depends on parameter)
- host_population_size_sweep -- dictionary with keys=population IDs (Strings), values=list of values with host population sizes (must be greater than original size set for each population, list of numbers)
- vector_population_size_sweep -- dictionary with keys=population IDs (Strings), values=list of values with vector population sizes (must be greater than original size set for each population, list of numbers)
- host_migration_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a semicolon ';' (Strings), values=list of values (list of numbers)
- vector_migration_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a semicolon ';' (Strings), values=list of values (list of numbers)
- host_host_population_contact_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a semicolon ';' (Strings), values=list of values (list of numbers)
- host_vector_population_contact_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a semicolon ';' (Strings), values=list of values (list of numbers)
- vector_host_population_contact_sweep_dic -- dictionary with keys=population IDs of origin and destination, separated by a semicolon ';' (Strings), values=list of values (list of numbers)
- replicates -- how many replicates to simulate (int >= 1)
- method -- algorithm to be used; default is approximated solver (can be either 'approximated' or 'exact')
- dt_leap -- time leap size used to simulate bursts; if None, set to minimum growth threshold time across all populations (number, default None)
- dt_thre -- time threshold below which bursts are used; if None, set to dt_leap (number, default None)
- host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
- vector_sampling -- how many vectors to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
- skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)
- n_cores -- number of cores to parallelize file export across, if 0, all cores available are used (default 0; int >= 0)
- **kwargs -- additional arguments for joblib multiprocessing
Returns:
- DataFrame with parameter combinations, list of Model objects with the final snapshots
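To show how sweep dictionaries of this shape could expand into individual runs, the sketch below builds every combination of the listed values and repeats each one per replicate. It assumes a full factorial design, which the description above does not confirm. The helper names and population IDs are hypothetical; the ';' separator follows the key format described above.

```python
import itertools

def expand_sweep(param_sweep_dic, replicates=1):
    """Return one dict per run: every value combination, repeated per replicate."""
    names = list(param_sweep_dic)
    runs = []
    for values in itertools.product(*(param_sweep_dic[name] for name in names)):
        for _ in range(replicates):
            runs.append(dict(zip(names, values)))
    return runs

def split_population_pair(key):
    """Split an 'origin;destination' key into its two population IDs."""
    origin, destination = key.split(";")
    return origin, destination

sweep = {
    "contact_rate_host_host": [0.1, 0.2],
    "recovery_rate_host": [0.05, 0.1, 0.2],
}
runs = expand_sweep(sweep, replicates=2)
print(len(runs))  # 12
print(split_population_pair("pop_A;pop_B"))  # ('pop_A', 'pop_B')
```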
copyState(host_sampling=0,vector_sampling=0,skip_uninfected=False)
Returns a slimmed-down representation of the current model state.
Keyword arguments:
- host_sampling -- how many hosts to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
- vector_sampling -- how many vectors to skip before saving one in a snapshot of the system state (saves all by default) (int >= 0, default 0)
- skip_uninfected -- whether to save only infected hosts/vectors and record the number of uninfected host/vectors instead (Boolean, default False)
Returns: Model object with current population host and vector lists.
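One plausible reading of host_sampling and skip_uninfected is sketched below: keep one individual, skip the next host_sampling individuals, and optionally store only infected individuals plus a count of the rest. Which individual is kept first, and the dictionary layout of each host, are assumptions made for illustration rather than Opqua's actual behavior.

```python
def sample_individuals(individuals, sampling=0):
    """Keep one individual, skip `sampling` of them, keep the next, and so on."""
    if sampling < 0:
        raise ValueError("sampling must be >= 0")
    return individuals[::sampling + 1]

def drop_uninfected(individuals):
    """Return the infected individuals and the number of uninfected ones."""
    infected = [ind for ind in individuals if ind["pathogens"]]
    return infected, len(individuals) - len(infected)

# Ten hypothetical hosts; every third one carries a pathogen.
hosts = [
    {"id": f"h{i}", "pathogens": ["AAAA"] if i % 3 == 0 else []}
    for i in range(10)
]
sampled = sample_individuals(hosts, sampling=2)
infected, num_uninfected = drop_uninfected(hosts)
print([h["id"] for h in sampled])  # ['h0', 'h3', 'h6', 'h9']
print(len(infected), num_uninfected)  # 4 6
```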
deepCopy()
Returns a full copy of the current model with inner references.
Returns: copied Model object
saveToDataFrame(save_to_file,n_cores=0,**kwargs)
Save status of model to dataframe, write to file location given.
Creates a pandas Dataframe in long format with the given model history, with one host or vector per simulation time in each row, and columns:
- Time - simulation time of entry
- Population - ID of this host/vector's population
- Organism - host/vector
- ID - ID of host/vector
- Pathogens - all genomes present in this host/vector separated by ;
- Protection - all protection sequences present in this host/vector separated by ;
- Alive - whether host/vector is alive at this time, True/False
Arguments:
- save_to_file -- file path and name to save model data under (String)
Keyword Arguments:
- n_cores -- number of cores to parallelize file export across, if 0, all cores available are used (default 0; int)
- **kwargs -- additional arguments for joblib multiprocessing
Returns:
- pandas dataframe with model history as described above
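The long format described above can be pictured with a few hand-written rows. The sketch below uses only the standard library csv module instead of pandas so that it runs anywhere. The row values, IDs, and the exact spelling of the Organism entries are made up for illustration.

```python
import csv
import io

COLUMNS = ["Time", "Population", "Organism", "ID",
           "Pathogens", "Protection", "Alive"]

# Hypothetical rows: one host or vector per simulation time in each row.
rows = [
    {"Time": 0.0, "Population": "pop_A", "Organism": "Host", "ID": "pop_A_0",
     "Pathogens": "AAAA;AAAT", "Protection": "", "Alive": True},
    {"Time": 0.0, "Population": "pop_A", "Organism": "Vector", "ID": "pop_A_1",
     "Pathogens": "", "Protection": "", "Alive": True},
    {"Time": 1.5, "Population": "pop_A", "Organism": "Host", "ID": "pop_A_0",
     "Pathogens": "", "Protection": "AAAA", "Alive": True},
]

# Write the rows as CSV text, then read them back.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
parsed = list(csv.DictReader(buffer))

# Split the ';'-separated Pathogens field into a list of genomes per row.
genomes_per_row = [
    row["Pathogens"].split(";") if row["Pathogens"] else [] for row in parsed
]
print(genomes_per_row)  # [['AAAA', 'AAAT'], [], []]
```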
getCompositionData(
data=None, populations=[], type_of_composition='Pathogens',
hosts=True, vectors=False, num_top_sequences=-1,
track_specific_sequences=[], genomic_positions=[],
count_individuals_based_on_model=None, save_data_to_file="", n_cores=0,
**kwargs)
Create dataframe with counts for pathogen genomes or resistance.
Creates a pandas Dataframe with dynamics of the pathogen strains or protection sequences across selected populations in the model, with one time point in each row and columns for pathogen genomes or protection sequences.
Of note: sum of totals for all sequences in one time point does not necessarily equal the number of infected hosts and/or vectors, given multiple infections in the same host/vector are counted separately.
Keyword Arguments:
- data -- dataframe with model history as produced by saveToDf function; if None, computes this dataframe and saves it under 'raw_data_'+save_data_to_file (DataFrame, default None)
- populations -- IDs of populations to include in analysis; if empty, uses all populations in model (default empty list; list of Strings)
- type_of_composition -- field of data to count totals of, can be either 'Pathogens' or 'Protection' (default 'Pathogens'; String)
- hosts -- whether to count hosts (default True, Boolean)
- vectors -- whether to count vectors (default False, Boolean)
- num_top_sequences -- how many sequences to count separately and include as columns, remainder will be counted under column "Other"; if <0, includes all genomes in model (default -1; int)
- track_specific_sequences -- contains specific sequences to have as a separate column if not part of the top num_top_sequences sequences (default empty list; list of Strings)
- genomic_positions -- list in which each element is a list with loci positions to extract (e.g. genomic_positions=[ [0,3], [5,6] ] extracts positions 0, 1, 2, and 5 from each genome); if empty, takes full genomes (default empty list; list of lists of int)
- count_individuals_based_on_model -- Model object with populations and fitness functions used to evaluate the most fit pathogen genome in each host/vector in order to count only a single pathogen per host/vector, as opposed to all pathogens within each host/vector; if None, counts all pathogens (default None; None or Model)
- save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
- n_cores -- number of cores to parallelize processing across, if 0, all cores available are used (default 0; int)
- **kwargs -- additional arguments for joblib multiprocessing
Returns:
- pandas dataframe with model sequence composition dynamics as described above
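The genomic_positions example above treats each inner list as a half-open [start, end) range. A minimal sketch of that extraction follows; the function name is hypothetical and the code does not call Opqua.

```python
def extract_positions(genome, genomic_positions):
    """Concatenate the [start, end) slices of a genome; empty list keeps it whole."""
    if not genomic_positions:
        return genome
    return "".join(genome[start:end] for start, end in genomic_positions)

# [[0, 3], [5, 6]] keeps positions 0, 1, 2, and 5.
print(extract_positions("ABCDEFG", [[0, 3], [5, 6]]))  # ABCF
print(extract_positions("ABCDEFG", []))  # ABCDEFG
```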
getPathogens(dat, save_to_file="")
Create Dataframe with counts for all pathogen genomes in data.
Returns sorted pandas Dataframe with counts for occurrences of all pathogen genomes in data passed.
Arguments:
- dat -- dataframe with model history as produced by saveToDf function
Keyword Arguments:
- save_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
Returns:
- pandas dataframe with Series as described above
getProtections(dat, save_to_file="")
Create Dataframe with counts for all protection sequences in data.
Returns sorted pandas Dataframe with counts for occurrences of all protection sequences in data passed.
Arguments:
- dat -- dataframe with model history as produced by saveToDf function
Keyword Arguments:
- save_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
Returns:
- pandas dataframe with Series as described above
populationsPlot(
file_name, data, compartment='Infected',
hosts=True, vectors=False, num_top_populations=7,
track_specific_populations=[], save_data_to_file="",
x_label='Time', y_label='Hosts', figsize=(8, 4), dpi=200,
palette=CB_PALETTE, stacked=False)
Create plot with aggregated totals per population across time.
Creates a line or stacked line plot with dynamics of a compartment across populations in the model, with one line for each population.
A host or vector is considered part of the recovered compartment if it has protection sequences of any kind and is not infected.
Arguments:
- file_name -- file path, name, and extension to save plot under (String)
- data -- dataframe with model history as produced by saveToDf function (DataFrame)
Keyword Arguments:
- compartment -- subset of hosts/vectors to count totals of, can be either 'Naive','Infected','Recovered', or 'Dead' (default 'Infected'; String)
- hosts -- whether to count hosts (default True, Boolean)
- vectors -- whether to count vectors (default False, Boolean)
- num_top_populations -- how many populations to count separately and include as columns, remainder will be counted under column "Other"; if <0, includes all populations in model (default 7; int)
- track_specific_populations -- contains IDs of specific populations to have as a separate column if not part of the top num_top_populations populations (list of Strings)
- save_data_to_file -- file path and name to save model plot data under, no saving occurs if empty string (default ''; String)
- x_label -- X axis title (default 'Time', String)
- y_label -- Y axis title (default 'Hosts', String)
- legend_title -- legend title (default 'Population', String)
- legend_values -- labels for each trace, if empty list, uses population IDs (default empty list, list of Strings)
- figsize -- dimensions of figure (default (8,4), array-like of two ints)
- dpi -- figure resolution (default 200, int)
- palette -- color palette to use for traces (default CB_PALETTE, list of color Strings)
- stacked -- whether to draw a regular line plot or a stacked one (default False, Boolean)
Returns:
- axis object for plot with model population dynamics as described above
compartmentPlot(
file_name, data, populations=[], hosts=True, vectors=False,
save_data_to_file="", x_label='Time', y_label='Hosts',
figsize=(8, 4), dpi=200, palette=CB_PALETTE, stacked=False)
Create plot with number of naive, infected, recovered, dead hosts/vectors vs. time.
Creates a line or stacked line plot with dynamics of all compartments (naive, infected, recovered, dead) across selected populations in the model, with one line for each compartment.
A host or vector is considered recovered if it has protection sequences of any kind and is not infected.
Arguments:
- file_name -- file path, name, and extension to save plot under (String)
- data -- dataframe with model history as produced by saveToDf function (DataFrame)
Keyword Arguments:
- populations -- IDs of populations to include in analysis; if empty, uses all populations in model (default empty list; list of Strings)
- hosts -- whether to count hosts (default True, Boolean)
- vectors -- whether to count vectors (default False, Boolean)
- save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
- x_label -- X axis title (default 'Time', String)
- y_label -- Y axis title (default 'Hosts', String)
- legend_title -- legend title (default 'Population', String)
- legend_values -- labels for each trace, if empty list, uses population IDs (default empty list, list of Strings)
- figsize -- dimensions of figure (default (8,4), array-like of two ints)
- dpi -- figure resolution (default 200, int)
- palette -- color palette to use for traces (default CB_PALETTE, list of color Strings)
- stacked -- whether to draw a regular line plot or a stacked one (default False, Boolean)
Returns:
- axis object for plot with model compartment dynamics as described above
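The compartment rule stated above (recovered means protected and not infected) can be sketched as a small classifier over the Pathogens, Protection, and Alive fields of one row. The function name and the choice to check death first are assumptions for illustration, not Opqua's code.

```python
def classify_compartment(pathogens, protection, alive):
    """Assign one of 'Naive', 'Infected', 'Recovered', 'Dead' to an individual.

    pathogens and protection are ';'-separated Strings (empty if none).
    """
    if not alive:
        return "Dead"  # assumption: death takes priority over other states
    if pathogens:
        return "Infected"
    if protection:
        return "Recovered"  # protected and not infected
    return "Naive"

print(classify_compartment("", "", True))          # Naive
print(classify_compartment("AAAA", "TTTT", True))  # Infected
print(classify_compartment("", "AAAA", True))      # Recovered
print(classify_compartment("AAAA", "", False))     # Dead
```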
compositionPlot(
file_name, data, composition_dataframe=None, populations=[],
type_of_composition='Pathogens', hosts=True, vectors=False,
num_top_sequences=7, track_specific_sequences=[],
genomic_positions=[], count_individuals_based_on_model=None,
remove_legend=False, population_fraction=False,
save_data_to_file="", x_label='Time', y_label='Infections',
legend_title='Genotype', legend_values=[],
figsize=(8, 4), dpi=200, palette=CB_PALETTE, stacked=True,
**kwargs)
Create plot with counts for pathogen genomes or resistance vs. time.
Creates a line or stacked line plot with dynamics of the pathogen strains or protection sequences across selected populations in the model, with one line for each pathogen genome or protection sequence being shown.
Of note: sum of totals for all sequences in one time point does not necessarily equal the number of infected hosts and/or vectors, given multiple infections in the same host/vector are counted separately.
Arguments:
- file_name -- file path, name, and extension to save plot under (String)
- data -- dataframe with model history as produced by saveToDf function
Keyword Arguments:
- composition_dataframe -- output of compositionDf() if already computed (Pandas DataFrame, None by default)
- populations -- IDs of populations to include in analysis; if empty, uses all populations in model (default empty list; list of Strings)
- type_of_composition -- field of data to count totals of, can be either 'Pathogens' or 'Protection' (default 'Pathogens'; String)
- hosts -- whether to count hosts (default True, Boolean)
- vectors -- whether to count vectors (default False, Boolean)
- num_top_sequences -- how many sequences to count separately and include as columns, remainder will be counted under column "Other"; if <0, includes all genomes in model (default 7; int)
- track_specific_sequences -- contains specific sequences to have as a separate column if not part of the top num_top_sequences sequences (list of Strings)
- genomic_positions -- list in which each element is a list with loci positions to extract (e.g. genomic_positions=[ [0,3], [5,6] ] extracts positions 0, 1, 2, and 5 from each genome); if empty, takes full genomes (default empty list; list of lists of int)
- count_individuals_based_on_model -- Model object with populations and fitness functions used to evaluate the most fit pathogen genome in each host/vector in order to count only a single pathogen per host/vector, as opposed to all pathogens within each host/vector; if None, counts all pathogens (default None; None or Model)
- save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
- x_label -- X axis title (default 'Time', String)
- y_label -- Y axis title (default 'Infections', String)
- legend_title -- legend title (default 'Genotype', String)
- legend_values -- labels for each trace, if empty list, uses population IDs (default empty list, list of Strings)
- figsize -- dimensions of figure (default (8,4), array-like of two ints)
- dpi -- figure resolution (default 200, int)
- palette -- color palette to use for traces (default CB_PALETTE, list of color Strings)
- stacked -- whether to draw a stacked line plot instead of a regular one (default True, Boolean)
- remove_legend -- whether to leave the sequences out of the figure legend and print them to a separate csv file instead (default False; Boolean)
- population_fraction -- whether to graph fractions of pathogen population instead of pathogen counts (default False, Boolean)
- **kwargs -- additional arguments for joblib multiprocessing
Returns:
- axis object for plot with model sequence composition dynamics as described above
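The roles of num_top_sequences, track_specific_sequences, and population_fraction can be sketched on a single time point's counts. The helper names and the sample counts are hypothetical. The grouping shown is one plausible reading of the parameter descriptions above, not Opqua's implementation.

```python
from collections import Counter

def group_top_sequences(counts, num_top_sequences=7, track_specific_sequences=()):
    """Keep the most common sequences plus tracked ones; pool the rest as 'Other'."""
    if num_top_sequences < 0:
        return dict(counts)  # negative value: keep every sequence
    keep = [seq for seq, _ in Counter(counts).most_common(num_top_sequences)]
    keep += [s for s in track_specific_sequences if s in counts and s not in keep]
    grouped = {seq: counts[seq] for seq in keep}
    other = sum(n for seq, n in counts.items() if seq not in keep)
    if other > 0:
        grouped["Other"] = other
    return grouped

def to_fractions(counts):
    """Convert counts to fractions of the total (population_fraction=True)."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()} if total else dict(counts)

counts = Counter(["AAAA"] * 5 + ["AAAT"] * 3 + ["TTTT"] + ["CCCC"])
grouped = group_top_sequences(counts, num_top_sequences=2,
                              track_specific_sequences=["CCCC"])
print(grouped)  # {'AAAA': 5, 'AAAT': 3, 'CCCC': 1, 'Other': 1}
print(to_fractions(grouped))
```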
clustermap(file_name, data, num_top_sequences=-1,
track_specific_sequences=[], seq_names=[], n_cores=0, method='weighted',
metric='euclidean',save_data_to_file="", legend_title='Distance',
legend_values=[], figsize=(10,10), dpi=200, color_map=DEF_CMAP)
Create a heatmap and dendrogram for pathogen genomes in data passed.
Arguments:
- file_name -- file path, name, and extension to save plot under (String)
- data -- dataframe with model history as produced by saveToDf function
Keyword arguments:
- num_top_sequences -- how many sequences to include in matrix; if <0, includes all genomes in data passed (default -1; int)
- track_specific_sequences -- contains specific sequences to include in matrix if not part of the top num_top_sequences sequences (default empty list; list of Strings)
- seq_names -- list with names to be used for sequence labels in matrix; must be of same length as number of sequences to be displayed; if empty, uses sequences themselves (default empty list; list of Strings)
- n_cores -- number of cores to parallelize distance compute across, if 0, all cores available are used (default 0; int)
- method -- clustering algorithm to use with seaborn clustermap (default 'weighted'; String)
- metric -- distance metric to use with seaborn clustermap (default 'euclidean'; String)
- save_data_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
- legend_title -- legend title (default 'Distance', String)
- figsize -- dimensions of figure (default (10,10), array-like of two ints)
- dpi -- figure resolution (default 200, int)
- color_map -- color map to use for traces (default DEF_CMAP, cmap object)
Returns:
- figure object for plot with heatmap and dendrogram as described
pathogenDistanceHistory(data, samples=-1, num_top_sequences=-1,
track_specific_sequences=[], seq_names=[], n_cores=0, save_to_file='')
Create a long-format dataframe with pairwise distances for pathogen genomes in data passed for different time points.
Arguments:
- data -- dataframe with model history as produced by saveToDf function
Keyword Arguments:
- samples -- how many timepoints to uniformly sample from the total timecourse; if <0, takes all timepoints (default -1; int)
- num_top_sequences -- how many sequences to include in matrix; if <0, includes all genomes in data passed (default -1; int)
- track_specific_sequences -- contains specific sequences to include in matrix if not part of the top num_top_sequences sequences (default empty list; list of Strings)
- seq_names -- list with names to be used for sequence labels in matrix; must be of same length as number of sequences to be displayed; if empty, uses sequences themselves (default empty list; list of Strings)
- n_cores -- number of cores to parallelize distance compute across, if 0, all cores available are used (default 0; int)
- method -- clustering algorithm to use with seaborn clustermap (default 'weighted'; String)
- metric -- distance metric to use with seaborn clustermap (default 'euclidean'; String)
- save_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
Returns:
- long-format Pandas dataframe with pairwise distances for pathogen genomes in data passed for different time points.
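A long-format table of pairwise distances has one row per time point and genome pair. The sketch below builds such rows as plain dictionaries. It assumes Hamming distance between equal-length genomes, and the column names are hypothetical; neither is confirmed by the description above.

```python
from itertools import combinations

def hamming_distance(genome_a, genome_b):
    """Count positions at which two equal-length genomes differ."""
    return sum(a != b for a, b in zip(genome_a, genome_b))

def pairwise_distances_long(genomes_by_time):
    """Return one row per (time, genome pair) with their distance."""
    rows = []
    for time in sorted(genomes_by_time):
        unique_genomes = sorted(set(genomes_by_time[time]))
        for genome_1, genome_2 in combinations(unique_genomes, 2):
            rows.append({
                "Time": time,
                "Genome_1": genome_1,
                "Genome_2": genome_2,
                "Distance": hamming_distance(genome_1, genome_2),
            })
    return rows

genomes_by_time = {
    0.0: ["AAAA", "AAAT"],
    1.0: ["AAAA", "AAAT", "TTTT", "AAAA"],
}
distance_rows = pairwise_distances_long(genomes_by_time)
for row in distance_rows:
    print(row)
```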
getGenomeTimes(
data, samples=-1, num_top_sequences=-1, track_specific_sequences=[],
seq_names=[], n_cores=0, save_to_file='')
Create DataFrame with times genomes first appeared during simulation.
Arguments:
- data -- dataframe with model history as produced by saveToDf function
Keyword arguments:
- samples -- how many timepoints to uniformly sample from the total timecourse; if <0, takes all timepoints (default -1; int)
- save_to_file -- file path and name to save model data under, no saving occurs if empty string (default ''; String)
- n_cores -- number of cores to parallelize across, if 0, all cores available are used (default 0; int)
Returns:
- pandas dataframe with genomes and times as described above
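The idea of recording when each genome first appeared can be sketched in a few lines. This is only an illustration of the concept: the helper `first_appearance_times` and its `(time, genome)` input format are hypothetical and do not reflect how Opqua stores or processes its history dataframe.

```python
def first_appearance_times(records):
    """Map each genome to the earliest time at which it was observed.

    records -- hypothetical input: iterable of (time, genome) tuples,
               in any order
    """
    first_seen = {}
    for time, genome in records:
        # Keep the smallest time seen so far for this genome
        if genome not in first_seen or time < first_seen[genome]:
            first_seen[genome] = time
    return first_seen

records = [(0.0, "AAAA"), (1.5, "AATA"), (2.0, "AAAA"), (0.5, "AATA")]
times = first_appearance_times(records)
print(times)
```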
visualizeMutationNetwork(setup_id,landscape_id,file_name,
toggle_physics=True,
node_color='rgba(215,140,10,1)',
peak_border_color='rgba(150,100,10,1)',
edge_color='rgba(215,190,150,1)', show_labels=True)
Create a network visualization for pathogen genomes in landscape
Arguments:
- setup_id -- ID of setup with associated parameters (String)
- landscape_id -- ID of landscape (String)
- file_name -- file path and name to save html graph under (String)
- toggle_physics -- whether graph moves (Boolean)
- node_color -- node color (String)
- peak_border_color -- color of borders on peak nodes (String)
- edge_color -- edge color (String)
- show_labels -- whether to show genomes on nodes (Boolean)
newPopulation(id, setup_id, num_hosts=100, num_vectors=100)
Create a new Population object with setup parameters.
If population ID is already in use, appends _2 to it
Arguments:
- id -- unique identifier for this population in the model (String)
- setup_id -- setup object with parameters for this population (Setup)
Keyword Arguments:
- num_hosts -- number of hosts to initialize population with (default 100; int)
- num_vectors -- number of vectors to initialize population with (default 100; int)
linkPopulationsHostMigration(pop1_id, pop2_id, rate)
Set host migration rate from one population towards another.
Arguments:
- pop1_id -- origin population for which migration rate will be specified (String)
- pop2_id -- destination population for which migration rate will be specified (String)
- rate -- migration rate from one population to the neighbor; events/time (number >= 0)
linkPopulationsVectorMigration(pop1_id, pop2_id, rate)
Set vector migration rate from one population towards another.
Arguments:
- pop1_id -- origin population for which migration rate will be specified (String)
- pop2_id -- destination population for which migration rate will be specified (String)
- rate -- migration rate from one population to the neighbor; events/time (number >= 0)
linkPopulationsHostHostContact(pop1_id, pop2_id, rate)
Set host-host inter-population contact rate from one population towards another.
Arguments:
- pop1_id -- origin population for which inter-population contact rate will be specified (String)
- pop2_id -- destination population for which inter-population contact rate will be specified (String)
- rate -- inter-population contact rate from one population to the neighbor; events/time (number >= 0)
linkPopulationsHostVectorContact(pop1_id, pop2_id, rate)
Set host-vector inter-population contact rate from one population to another.
Arguments:
- pop1_id -- origin population for which inter-population contact rate will be specified (String)
- pop2_id -- destination population for which inter-population contact rate will be specified (String)
- rate -- inter-population contact rate from one population to the neighbor; events/time (number >= 0)
linkPopulationsVectorHostContact(pop1_id, pop2_id, rate)
Set vector-host inter-population contact rate from one population to another.
Arguments:
- pop1_id -- origin population for which inter-population contact rate will be specified (String)
- pop2_id -- destination population for which inter-population contact rate will be specified (String)
- rate -- inter-population contact rate from one population to the neighbor; events/time (number >= 0)
createInterconnectedPopulations(
num_populations, id_prefix, setup_id,
host_migration_rate=0, vector_migration_rate=0,
host_host_contact_rate=0,
host_vector_contact_rate=0, vector_host_contact_rate=0,
num_hosts=100, num_vectors=100)
Create new populations, link all of them to each other.
All populations in this cluster are linked with the same migration rate, starting number of hosts and vectors, and setup parameters. Their IDs are numbered onto prefix given as 'id_prefix_0', 'id_prefix_1', 'id_prefix_2', etc.
Arguments:
- num_populations -- number of populations to be created (int)
- id_prefix -- prefix for IDs to be used for the populations in the model (String)
- setup_id -- setup object with parameters for all populations (Setup)
Keyword arguments:
- host_migration_rate -- host migration rate between populations; events/time (default 0; number >= 0)
- vector_migration_rate -- vector migration rate between populations; events/time (default 0; number >= 0)
- host_host_contact_rate -- host-host inter-population contact rate between populations; events/time (default 0; number >= 0)
- host_vector_contact_rate -- host-vector inter-population contact rate between populations; events/time (default 0; number >= 0)
- vector_host_contact_rate -- vector-host inter-population contact rate between populations; events/time (default 0; number >= 0)
- num_hosts -- number of hosts to initialize population with (default 100; int)
- num_vectors -- number of vectors to initialize population with (default 100; int)
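The naming and linking scheme described above can be sketched without Opqua itself. The helper below, `interconnected_ids_and_links`, is hypothetical: it builds IDs by joining the prefix and an index with an underscore (an assumption read off the 'id_prefix_0' pattern) and creates one directed link per ordered pair of populations, all with the same rate.

```python
from itertools import permutations

def interconnected_ids_and_links(num_populations, id_prefix, rate):
    """Return population IDs and a dict of directed links with equal rates.

    Assumption: IDs are formed as '<id_prefix>_<index>', starting at 0.
    """
    ids = [f"{id_prefix}_{i}" for i in range(num_populations)]
    # One directed link per ordered (origin, destination) pair
    links = {(origin, dest): rate for origin, dest in permutations(ids, 2)}
    return ids, links

ids, links = interconnected_ids_and_links(3, "city", 0.01)
print(ids)
for (origin, dest), rate in links.items():
    print(origin, "->", dest, rate)
```

Because links are directed, a fully interconnected cluster of n populations has n * (n - 1) links, which is worth keeping in mind when choosing rates for large clusters.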
newHostGroup(pop_id, group_id, num_hosts, healthy=False)
Return a list of random (healthy or any) hosts in population.
Arguments:
- pop_id -- ID of population to be modified (String)
- group_id -- ID to call this group by (String)
- num_hosts -- number of hosts to be sampled randomly (int)
Keyword Arguments:
- healthy -- whether to sample healthy hosts only (default False; Boolean)
Returns:
- list containing sampled hosts
newVectorGroup(pop_id, group_id, num_vectors, healthy=False)
Return a list of random (healthy or any) vectors in population.
Arguments:
- pop_id -- ID of population to be modified (String)
- group_id -- ID to call this group by (String)
- num_vectors -- number of vectors to be sampled randomly (int)
Keyword Arguments:
- healthy -- whether to sample healthy vectors only (default False; Boolean)
Returns:
- list containing sampled vectors
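The sampling behavior shared by newHostGroup and newVectorGroup (pick a random subset, optionally restricted to healthy individuals) can be illustrated with a small standalone sketch. The function `sample_group` and the dictionary representation of individuals are hypothetical stand-ins, not Opqua's Host or Vector objects.

```python
import random

def sample_group(individuals, num, healthy=False, rng=None):
    """Randomly sample `num` individuals, optionally only healthy ones.

    individuals -- hypothetical list of dicts with 'id' and 'infected' keys
    rng -- optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    # Restrict the pool to uninfected individuals only when healthy=True
    pool = [ind for ind in individuals if not healthy or not ind["infected"]]
    if num > len(pool):
        raise ValueError("not enough eligible individuals to sample from")
    return rng.sample(pool, num)

hosts = [{"id": i, "infected": i % 2 == 0} for i in range(10)]
group = sample_group(hosts, 3, healthy=True, rng=random.Random(0))
print([host["id"] for host in group])
```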
addHosts(pop_id, num_hosts)
Add a number of healthy hosts to population, return list with them.
Arguments:
- pop_id -- ID of population to be modified (String)
- num_hosts -- number of hosts to be added (int)
Returns:
- list containing new hosts
addVectors(pop_id, num_vectors)
Add a number of healthy vectors to population, return list with them.
Arguments:
- pop_id -- ID of population to be modified (String)
- num_vectors -- number of vectors to be added (int)
Returns:
- list containing new vectors
removeHosts(pop_id, num_hosts_or_list)
Remove a number of specified or random hosts from population.
Arguments:
- pop_id -- ID of population to be modified (String)
- num_hosts_or_list -- number of hosts to be sampled randomly for removal or list of hosts to be removed, must be hosts in this population (int or list of Hosts)
removeVectors(pop_id, num_vectors_or_list)
Remove a number of specified or random vectors from population.
Arguments:
- pop_id -- ID of population to be modified (String)
- num_vectors_or_list -- number of vectors to be sampled randomly for removal or list of vectors to be removed, must be vectors in this population (int or list of Vectors)
addPathogensToHosts(pop_id, genomes_numbers, group_id="")
Add specified pathogens to random hosts, optionally from a list.
Arguments:
- pop_id -- ID of population to be modified (String)
- genomes_numbers -- dictionary containing pathogen genomes to add as keys and number of hosts each one will be added to as values (dict with keys=Strings, values=int)
Keyword Arguments:
- group_id -- ID of group to sample hosts from; if empty, samples from whole population (default empty String; String)
addPathogensToVectors(pop_id, genomes_numbers, group_id="")
Add specified pathogens to random vectors, optionally from a list.
Arguments:
- pop_id -- ID of population to be modified (String)
- genomes_numbers -- dictionary containing pathogen genomes to add as keys and number of vectors each one will be added to as values (dict with keys=Strings, values=int)
Keyword Arguments:
- group_id -- ID of group to sample vectors from; if empty, samples from whole population (default empty String; String)
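The genomes_numbers argument is a dictionary mapping each genome to the number of individuals that should receive it. The sketch below shows one way such a dictionary could be applied to a list of individuals. The helper `assign_pathogens` is hypothetical, and sampling without replacement for each genome is an assumption of this example rather than documented Opqua behavior.

```python
import random

def assign_pathogens(individual_ids, genomes_numbers, rng=None):
    """Return dict mapping individual ID -> set of genomes assigned to it.

    individual_ids -- hypothetical list of host or vector identifiers
    genomes_numbers -- dict with genomes as keys and counts as values
    """
    rng = rng or random.Random()
    infections = {ind: set() for ind in individual_ids}
    for genome, number in genomes_numbers.items():
        # Each genome goes to `number` distinct, randomly chosen individuals
        for ind in rng.sample(individual_ids, number):
            infections[ind].add(genome)
    return infections

host_ids = [f"h{i}" for i in range(10)]
genomes_numbers = {"AAAA": 3, "TTTT": 2}
infections = assign_pathogens(host_ids, genomes_numbers, rng=random.Random(1))
print({ind: genomes for ind, genomes in infections.items() if genomes})
```

Note that in this sketch the same individual may end up carrying more than one genome, since each genome is sampled independently.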
treatHosts(pop_id, frac_hosts, resistance_seqs, group_id="")
Treat random fraction of infected hosts against some infection.
Removes all infections with genotypes susceptible to given treatment. Pathogens are removed if they are missing at least one of the sequences in resistance_seqs from their genome. Removes this organism from population infected list and adds to healthy list if appropriate.
Arguments:
- pop_id -- ID of population to be modified (String)
- frac_hosts -- fraction of hosts considered to be randomly selected (number between 0 and 1)
- resistance_seqs -- contains sequences required for treatment resistance (list of Strings)
Keyword Arguments:
- group_id -- ID of group to sample hosts from; if empty, samples from whole population (default empty String; String)
treatVectors(pop_id, frac_vectors, resistance_seqs, group_id="")
Treat random fraction of infected vectors against some infection.
Removes all infections with genotypes susceptible to given treatment. Pathogens are removed if they are missing at least one of the sequences in resistance_seqs from their genome. Removes this organism from population infected list and adds to healthy list if appropriate.
Arguments:
- pop_id -- ID of population to be modified (String)
- frac_vectors -- fraction of vectors considered to be randomly selected (number between 0 and 1)
- resistance_seqs -- contains sequences required for treatment resistance (list of Strings)
Keyword Arguments:
- group_id -- ID of group to sample vectors from; if empty, samples from whole population (default empty String; String)
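The treatment rule shared by treatHosts and treatVectors (a pathogen survives only if its genome contains every sequence in resistance_seqs) can be expressed compactly. The functions below are an illustrative sketch with hypothetical names; checking for a sequence by substring matching is an assumption about what "contained in the genome" means.

```python
def survives_treatment(genome, resistance_seqs):
    """True if the genome contains every required resistance sequence."""
    return all(seq in genome for seq in resistance_seqs)

def treat(pathogen_genomes, resistance_seqs):
    """Return the genomes left in a treated individual.

    Genomes missing at least one resistance sequence are removed.
    """
    return [g for g in pathogen_genomes if survives_treatment(g, resistance_seqs)]

remaining = treat(["AATC", "GGTC", "AAGG"], resistance_seqs=["AA", "TC"])
print(remaining)
```

With an empty resistance_seqs list, `all` returns True for every genome, so in this sketch no pathogen would be removed.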
protectHosts(pop_id, frac_hosts, protection_sequence, group_id="")
Protect a random fraction of hosts against some infection.
Adds protection sequence specified to a random fraction of the hosts specified. Does not cure them if they are already infected.
Arguments:
- pop_id -- ID of population to be modified (String)
- frac_hosts -- fraction of hosts considered to be randomly selected (number between 0 and 1)
- protection_sequence -- sequence against which to protect (String)
Keyword Arguments:
- group_id -- ID of group to sample hosts from; if empty, samples from whole population (default empty String; String)
protectVectors(pop_id, frac_vectors, protection_sequence, group_id="")
Protect a random fraction of vectors against some infection.
Adds protection sequence specified to a random fraction of the vectors specified. Does not cure them if they are already infected.
Arguments:
- pop_id -- ID of population to be modified (String)
- frac_vectors -- fraction of vectors considered to be randomly selected (number between 0 and 1)
- protection_sequence -- sequence against which to protect (String)
Keyword Arguments:
- group_id -- ID of group to sample vectors from; if empty, samples from whole population (default empty String; String)
wipeProtectionHosts(pop_id, group_id="")
Removes all protection sequences from hosts.
Arguments:
- pop_id -- ID of population to be modified (String)
Keyword Arguments:
- group_id -- ID of group to take hosts from; if empty, takes whole population (default empty String; String)
wipeProtectionVectors(pop_id, group_id="")
Removes all protection sequences from vectors.
Arguments:
- pop_id -- ID of population to be modified (String)
Keyword Arguments:
- group_id -- ID of group to take vectors from; if empty, takes whole population (default empty String; String)
setSetup(pop_id, setup_id)
Assign parameters stored in Setup object to this population.
Arguments:
- pop_id -- ID of population to be modified (String)
- setup_id -- ID of setup to be assigned (String)
newLandscape(setup_id, landscape_id, fitnessFunc=None,
mutate=None, generation_time=None,
population_threshold=None, selection_threshold=None,
max_depth=None, allele_groups=None)
Create a new Landscape
Arguments:
- setup_id -- ID of setup with associated parameters (String)
- landscape_id -- ID of landscape (String)
Keyword arguments:
- fitnessFunc -- fitness function used to evaluate genomes (function taking a genome for argument and returning a fitness value >0, default None)
- mutate -- mutation rate per generation (number>0, default None)
- generation_time -- time between pathogen generations (number>0, default None)
- population_threshold -- pathogen threshold under which drift is assumed to dominate (number >1, default None)
- selection_threshold -- selection coefficient threshold under which drift is assumed to dominate; related to population_threshold (number >1, default None)
- max_depth -- max number of mutations considered when evaluating establishment rates (integer >0, default None)
- allele_groups -- relevant alleles affecting fitness, each element contains a list of strings, each string contains a group of alleles that all have equivalent fitness behavior (list of lists of Strings)
mapLandscape(setup_id,landscape_id,seed_genomes)
Maps and evaluates relevant mutations given object parameters
Saves result in landscape's mutation_network property.
Arguments:
- setup_id -- ID of setup with associated parameters (String)
- landscape_id -- ID of landscape (String)
- seed_genomes -- genome or list of genomes used as background for mutations (String or list of Strings)
saveLandscape(setup_id,landscape_id,save_to_file)
Saves mutation network and fitness values stored in landscape
CSV format has the following columns:
- Genome: reduced genome
- Neighbors: list of neighboring reduced genomes, separated by semicolons
- Rates: list of corresponding establishment rates for neighbors, separated by semicolons
- Sum_rates: number with sum of all rates in previous list
Arguments:
- setup_id -- ID of setup with associated parameters (String)
- landscape_id -- ID of landscape (String)
- save_to_file -- file path and name to save model data under (String)
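The CSV layout described above can be illustrated with the standard `csv` module. This is a sketch of the file format only: the function `write_network_csv` and the nested-dictionary representation of the mutation network are assumptions for the example, not Opqua's internal data structures.

```python
import csv
import io

def write_network_csv(network, handle):
    """Write a mutation network using the four columns described above.

    network -- hypothetical dict: genome -> {neighbor genome: establishment rate}
    handle -- any writable text file object
    """
    writer = csv.writer(handle)
    writer.writerow(["Genome", "Neighbors", "Rates", "Sum_rates"])
    for genome, neighbors in network.items():
        writer.writerow([
            genome,
            ";".join(neighbors),                                 # neighbor genomes
            ";".join(str(rate) for rate in neighbors.values()),  # matching rates
            sum(neighbors.values()),                             # sum of the rates
        ])

network = {"AA": {"AB": 0.25, "BA": 0.5}, "AB": {"AA": 0.125}}
buffer = io.StringIO()
write_network_csv(network, buffer)
print(buffer.getvalue())
```

Neighbors and rates are kept in the same order within each row, so the i-th rate always corresponds to the i-th neighbor.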
loadLandscape(setup_id,landscape_id,file)
Loads mutation network and fitness from file path
CSV format has the following columns:
- Genome: reduced genome
- Neighbors: list of neighboring reduced genomes, separated by semicolons
- Rates: list of corresponding establishment rates for neighbors, separated by semicolons
- Sum_rates: number with sum of all rates in previous list
Arguments:
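- setup_id -- ID of setup with associated parameters (String)
- landscape_id -- ID of landscape (String)
- file -- file path and name to load mutation network and fitness values from (String)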
customModelFunction(function)
Returns output of given function, passing this model as a parameter.
Arguments:
- function -- function to be evaluated; must take a Model object as the only parameter (function)
Returns:
- Output of function passed as parameter
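The calling pattern is simple: the function you pass receives the model as its only argument, and whatever it returns is handed back to you. The sketch below uses a stand-in class, `ToyModel`, with a made-up `populations` attribute to show the pattern; it is not Opqua's Model class, and real model attributes will differ.

```python
class ToyModel:
    """Stand-in for a model object; NOT Opqua's Model class."""

    def __init__(self):
        # Hypothetical attribute: population ID -> number of hosts
        self.populations = {"pop_A": 100, "pop_B": 50}

    def customModelFunction(self, function):
        """Return output of given function, passing this model as a parameter."""
        return function(self)

def total_hosts(model):
    """User-defined function taking the model as its only parameter."""
    return sum(model.populations.values())

model = ToyModel()
total = model.customModelFunction(total_hosts)
print(total)
```

This pattern is useful for scheduling arbitrary user logic as part of a simulation without having to subclass the model.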
peakLandscape(genome, peak_genome, min_value)
Evaluate a genome's numerical phenotype by decreasing with distance from optimal seq.
Originally meant as a purifying selection fitness function based on exponential decay of fitness as genomes move away from the optimal sequence. Distance is measured as percent Hamming distance from an optimal genome sequence.
Can be used to evaluate mortality as well as transmissibility.
Arguments:
- genome -- the genome to be evaluated (String)
- peak_genome -- the genome sequence to measure distance against, has value of 1 (String)
- min_value -- minimum value at maximum distance from optimal genome (number > 0)
Returns:
- value of genome (number)
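One functional form consistent with this description is `min_value ** d`, where `d` is the fraction of positions that differ from the peak genome. The sketch below implements that form as a standalone function; the name `peak_value` is hypothetical and the exact expression is an assumption, so check it against Opqua's source before relying on specific values.

```python
import math

def peak_value(genome, peak_genome, min_value):
    """Exponentially decaying value: 1 at the peak, min_value at max distance.

    Assumed form: value = exp(log(min_value) * d) = min_value ** d,
    with d the Hamming distance divided by genome length.
    """
    if len(genome) != len(peak_genome):
        raise ValueError("genomes must have equal length")
    distance = sum(a != b for a, b in zip(genome, peak_genome)) / len(peak_genome)
    return math.exp(math.log(min_value) * distance)

for genome in ["AAAA", "AATT", "TTTT"]:
    print(genome, peak_value(genome, "AAAA", min_value=0.01))
```

Under this form the value drops by the same multiplicative factor with every additional mismatch, which is what makes the decay exponential.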
valleyLandscape(genome, worst_genome, min_fitness)
Evaluate a genome's numerical phenotype by increasing with distance from worst seq.
Originally meant as a disruptive selection fitness function based on exponential decay of fitness as genomes move closer to the worst possible sequence. Distance is measured as percent Hamming distance from the worst possible genome sequence.
Can be used to evaluate mortality as well as transmissibility.
Arguments:
- genome -- the genome to be evaluated (String)
- valley_genome -- the genome sequence to measure distance against, has value of min_value (String)
- min_value -- fitness value of worst possible genome (number > 0)
Returns:
- value of genome (number)
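A mirror image of the peak case is `min_value ** (1 - d)`, where `d` is the fraction of positions that differ from the worst genome. The sketch below uses that form; the function name `valley_value`, its parameter names, and the choice that genomes at maximum distance take a value of 1 are all assumptions of this illustration rather than confirmed details of Opqua's implementation.

```python
import math

def valley_value(genome, worst_genome, min_value):
    """Value rising with distance from the worst genome.

    Assumed form: value = exp(log(min_value) * (1 - d)) = min_value ** (1 - d),
    with d the Hamming distance divided by genome length.
    """
    if len(genome) != len(worst_genome):
        raise ValueError("genomes must have equal length")
    distance = sum(a != b for a, b in zip(genome, worst_genome)) / len(worst_genome)
    return math.exp(math.log(min_value) * (1 - distance))

for genome in ["AAAA", "AATT", "TTTT"]:
    print(genome, valley_value(genome, "AAAA", min_value=0.01))
```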