
Site repeats usage

BenoitMorel edited this page Sep 29, 2017 · 3 revisions

Site repeats basic usage

To enable site repeats, set the attribute PLL_ATTRIB_SITE_REPEATS when creating the partitions. See https://github.com/xflouris/libpll/wiki/Partition-Attributes for more information about setting attributes.

This attribute cannot be combined with PLL_ATTRIB_PATTERN_TIP. Site repeats are a generalization of the pattern tip optimization, so pattern tip is not needed when repeats are enabled.
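Partition attributes are combined as a bitmask (see the Partition-Attributes page linked above), so enabling repeats amounts to OR-ing the flag into the attributes passed when the partition is created while keeping the pattern tip flag out. The sketch below only illustrates that bit logic. `DEMO_ATTRIB_*` are hypothetical stand-ins so the code compiles without libpll (the real `PLL_ATTRIB_*` values are defined in `pll.h` and may differ), and `with_site_repeats` and `attributes_compatible` are illustrative helpers, not libpll functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in flags; real code uses PLL_ATTRIB_PATTERN_TIP and
   PLL_ATTRIB_SITE_REPEATS from pll.h, whose numeric values may differ. */
#define DEMO_ATTRIB_PATTERN_TIP  (1u << 4)
#define DEMO_ATTRIB_SITE_REPEATS (1u << 7)

/* Returns false if both mutually exclusive attributes are requested. */
static bool attributes_compatible(unsigned int attributes)
{
    bool repeats = (attributes & DEMO_ATTRIB_SITE_REPEATS) != 0;
    bool pattern_tip = (attributes & DEMO_ATTRIB_PATTERN_TIP) != 0;
    return !(repeats && pattern_tip);
}

/* Requests site repeats and drops pattern tip, since repeats subsume it. */
static unsigned int with_site_repeats(unsigned int attributes)
{
    unsigned int result = (attributes | DEMO_ATTRIB_SITE_REPEATS)
                          & ~DEMO_ATTRIB_PATTERN_TIP;
    assert(attributes_compatible(result));
    return result;
}
```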

In most use cases, setting this attribute is all you need. Read the advanced usage section only if you need to access the CLVs or the scalers directly.

Site repeats advanced usage

To speed up computations and reduce the memory footprint, site repeats store a single CLV entry for each group of sites that are identical in the subtree below a node, instead of one entry per site. Before reading further, make sure you understand how CLVs are stored when repeats are disabled (https://github.com/xflouris/libpll/wiki/Computing-the-likelihood-of-a-tree).
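The compression can be pictured with a toy model. At a tip, sites carrying the same state form one class; at an inner node, sites whose two children fall into the same pair of classes form one class, so the parent needs only one CLV block per distinct pair. The code below is an illustrative sketch of that idea, not libpll's implementation: `compress_tip` and `compress_parent` are hypothetical helpers, and libpll's own id numbering and bookkeeping may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: give each tip site the id of its class (one class per distinct
   character), numbered in order of first appearance. Returns the number of
   classes, i.e. how many CLV blocks a compressed vector would need. */
static unsigned int compress_tip(const char *seq, unsigned int sites,
                                 unsigned int *site_id)
{
    assert(seq != NULL && site_id != NULL);
    int class_of[256];                      /* -1 means "not seen yet" */
    for (int c = 0; c < 256; ++c)
        class_of[c] = -1;

    unsigned int classes = 0;
    for (unsigned int s = 0; s < sites; ++s) {
        unsigned char c = (unsigned char)seq[s];
        if (class_of[c] < 0)
            class_of[c] = (int)classes++;   /* first occurrence: new class */
        site_id[s] = (unsigned int)class_of[c];
    }
    return classes;
}

/* Toy model: a parent site's class is the pair (left class, right class).
   pair_class must hold left_classes * right_classes entries. */
static unsigned int compress_parent(const unsigned int *left_id,
                                    unsigned int left_classes,
                                    const unsigned int *right_id,
                                    unsigned int right_classes,
                                    unsigned int sites, int *pair_class,
                                    unsigned int *parent_id)
{
    for (unsigned int k = 0; k < left_classes * right_classes; ++k)
        pair_class[k] = -1;

    unsigned int classes = 0;
    for (unsigned int s = 0; s < sites; ++s) {
        unsigned int key = left_id[s] + right_id[s] * left_classes;
        if (pair_class[key] < 0)
            pair_class[key] = (int)classes++;
        parent_id[s] = (unsigned int)pair_class[key];
    }
    return classes;
}
```

For the tip sequences `ACAAC` and `AAGAA`, each tip has two classes and the five parent sites collapse into three classes, so the parent's CLV would need three blocks instead of five.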

Accessing CLVs now requires an additional lookup table. The following examples show how your code should change when site repeats are enabled.

Accessing site i at node n

Without site repeats:

unsigned int span = partition->states_padded * partition->rate_cats;
double *clv_n_i = &partition->clv[n][span * i]; 

With site repeats:

unsigned int span = partition->states_padded * partition->rate_cats;
unsigned int *site_id = partition->repeats->pernode_site_id[n];
unsigned int id = PLL_GET_ID(site_id, i);
double *clv_n_i = &partition->clv[n][span * id];
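The lookup in the example above can be packaged as a small helper. This sketch is self-contained so it runs without libpll: `DEMO_GET_ID` is a stand-in for `PLL_GET_ID`, assumed here to fall back to the plain site index when a node has no lookup table (an assumption for illustration), and `clv_at_site` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PLL_GET_ID (assumed behavior): without a lookup table,
   the site index is used directly. */
#define DEMO_GET_ID(site_id, site) ((site_id) ? (site_id)[(site)] : (site))

/* Returns the CLV block of site i in a possibly compressed CLV.
   span = states_padded * rate_cats, as in the examples above. */
static const double *clv_at_site(const double *clv,
                                 const unsigned int *site_id,
                                 unsigned int span, unsigned int i)
{
    assert(clv != NULL);
    unsigned int id = DEMO_GET_ID(site_id, i);
    return &clv[(size_t)span * id];
}
```

With `site_id = {0, 1, 0, 0, 1}`, sites 0, 2 and 3 all resolve to the first block of the compressed vector.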

Iterating over the raw CLV at node n (to print or copy for instance)

Without site repeats:

unsigned int span = partition->states_padded * partition->rate_cats;
unsigned int vector_size = span * partition->sites;
for (unsigned int i = 0; i < vector_size; ++i) {
    printf("%f ", partition->clv[n][i]);
} 

With site repeats:

unsigned int span = partition->states_padded * partition->rate_cats;
unsigned int vector_size = span * pll_get_sites_number(partition, n);
for (unsigned int i = 0; i < vector_size; ++i) {
    printf("%f ", partition->clv[n][i]);
} 
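Copying the raw CLV follows the same pattern: only the stored blocks are copied, so the size depends on how many sites the node actually keeps. The helper below is a hypothetical sketch; the caller supplies `stored_sites`, which would be `partition->sites` without repeats or the value of `pll_get_sites_number(partition, n)` with repeats, as in the loops above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the raw CLV of one node into dst and returns the number of doubles
   copied. dst must hold at least span * stored_sites doubles. */
static size_t copy_raw_clv(double *dst, const double *clv,
                           unsigned int span, unsigned int stored_sites)
{
    assert(dst != NULL && clv != NULL);
    size_t count = (size_t)span * stored_sites;
    memcpy(dst, clv, count * sizeof(double));
    return count;
}
```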

Iterating over the uncompressed CLV at node n

Without site repeats:

unsigned int span = partition->states_padded * partition->rate_cats;
for (unsigned int s = 0; s < partition->sites; ++s) { // iterate over sites
    double *clv = &partition->clv[n][s * span]; // get the CLV block of site s
    for (unsigned int i = 0; i < span; ++i) // iterate over rate categories and states
        printf("%f ", clv[i]);
}

With site repeats:

unsigned int span = partition->states_padded * partition->rate_cats;
unsigned int *site_id = partition->repeats->pernode_site_id[n];
for (unsigned int s = 0; s < partition->sites; ++s) { // iterate over sites
    unsigned int id = PLL_GET_ID(site_id, s); // position of site s in the compressed vector
    double *clv = &partition->clv[n][id * span]; // get the CLV block of site s
    for (unsigned int i = 0; i < span; ++i) // iterate over rate categories and states
        printf("%f ", clv[i]);
}
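When downstream code expects a full-length CLV (one block per site, as without repeats), the compressed vector can be expanded into a separate buffer using the same lookup as the loop above. A minimal sketch, self-contained so it compiles without libpll; `expand_clv` is a hypothetical helper and `DEMO_GET_ID` a stand-in for `PLL_GET_ID` with the same assumed fallback to the plain site index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PLL_GET_ID (assumed behavior). */
#define DEMO_GET_ID(site_id, site) ((site_id) ? (site_id)[(site)] : (site))

/* Expands a compressed CLV into dst, which must hold span * sites doubles:
   the block of each site is copied from the block of its repeat class. */
static void expand_clv(double *dst, const double *clv,
                       const unsigned int *site_id,
                       unsigned int span, unsigned int sites)
{
    assert(dst != NULL && clv != NULL);
    for (unsigned int s = 0; s < sites; ++s) {
        unsigned int id = DEMO_GET_ID(site_id, s);
        memcpy(&dst[(size_t)s * span], &clv[(size_t)id * span],
               span * sizeof(double));
    }
}
```

The expanded buffer costs span * sites doubles per node, which is exactly the memory that site repeats save, so it is best reserved for debugging or export rather than for the likelihood computation itself.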