This generator can be used to generate a specified number of phylogenetic trees (or clusters of trees in the cluster version) in Newick format with a variable number of leaves and with some level of overlap between trees in clusters.

tahiri-lab/GPTree

GPTree

Generator of Phylogenetic trees (for supertrees)


A version of the generator of clusters of phylogenetic trees with overlap and HGT is available here.

This generator (Generator of Phylogenetic trees) produces a specified number of phylogenetic trees in Newick format, with a variable number of leaves and a chosen level of overlap between trees. The resulting dataset of trees (in particular, gene trees with horizontal gene transfer implemented) is saved as a txt file and can then be used in scientific experiments (e.g., testing classification algorithms or inferring supertrees).

The generator is built on the AsymmeTree library. The user has to specify several initial parameters:

  • The minimum possible number of leaves for each tree
  • The maximum possible number of leaves for each tree
  • The average level of overlap (common leaves) between the trees in the set. The level of overlap between two trees is defined as the number of leaves they have in common divided by the sum of the leaf counts of the two trees minus the number of common leaves. Note: for the generator of clusters, we tested another approach to calculating the average level of overlap between two trees.
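
The overlap definition above can be sketched in a few lines of Python. This is only an illustration, assuming that the "length" of a tree means its number of leaves; the function name `overlap_level` is hypothetical and is not part of the generator. Under that reading, the measure coincides with the Jaccard index of the two leaf sets.

```python
def overlap_level(leaves_a, leaves_b):
    """Overlap = |common| / (|A| + |B| - |common|) for two leaf collections."""
    a, b = set(leaves_a), set(leaves_b)
    common = len(a & b)
    total = len(a) + len(b) - common
    # Two empty leaf sets have no defined overlap; return 0.0 by convention (assumption).
    return common / total if total else 0.0


# Hypothetical leaf labels, for illustration only.
t1 = ["A", "B", "C", "D", "E"]
t2 = ["C", "D", "E", "F", "G", "H"]
print(overlap_level(t1, t2))  # 3 / (5 + 6 - 3) = 0.375
```

Two trees with identical leaf sets give an overlap of 1.0, and two trees with no common leaves give 0.0.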

Initial values set by the user:

  • Lmin = the minimum possible number of leaves for each tree, integer (5<=Lmin<500)
  • Lmax = the maximum possible number of leaves for each tree, integer (Lmin<Lmax<=500)
  • Ngen = the number of trees to be generated, integer (Ngen<=500)
  • Plevel = the average level of overlap (common leaves) between the trees in the set, in decimal notation, from 0.2 to 0.7 with steps of 0.5 (which corresponds to the range from 20% to 70%)

Currently, the generator runs very slowly for overlap levels below 0.2 or above 0.7.
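
The constraints listed above can be checked before starting a run. The following is a minimal sketch, not the generator's own code: the function name `validate_parameters` and its argument names are hypothetical, and the lower bound of 1 for Ngen is an assumption, since only the upper bound is documented.

```python
def validate_parameters(l_min, l_max, n_gen, p_level):
    """Raise ValueError if a value falls outside the documented ranges."""
    if not (isinstance(l_min, int) and 5 <= l_min < 500):
        raise ValueError("Lmin must be an integer with 5 <= Lmin < 500")
    if not (isinstance(l_max, int) and l_min < l_max <= 500):
        raise ValueError("Lmax must be an integer with Lmin < Lmax <= 500")
    # Assumption: at least one tree is requested; the README only states Ngen <= 500.
    if not (isinstance(n_gen, int) and 1 <= n_gen <= 500):
        raise ValueError("Ngen must be an integer with 1 <= Ngen <= 500")
    if not 0.2 <= p_level <= 0.7:
        raise ValueError("Plevel must lie between 0.2 and 0.7")


# Example values chosen for illustration only.
validate_parameters(l_min=10, l_max=50, n_gen=100, p_level=0.5)
print("parameters are within the documented ranges")
```

The check covers only the ranges; it does not enforce the step size for Plevel.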

[Figure: The basic workflow]

Output: the generated dataset with the specified number of trees (gene trees and/or species trees) in Newick format is saved to the folder from which the code was launched (e.g., as genetrees_50.txt, where the number indicates the level of overlap), or to the "Files" section if launched in Colab. For examples of generated datasets, see here.

The Jupyter notebook also contains steps for checking the generated dataset (tree visualization, number of trees and leaves, level of overlap).
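
A check of this kind can also be sketched outside the notebook. The example below is an illustration rather than the notebook's actual code. It assumes one Newick string per tree with unquoted leaf labels, and it takes the average overlap to be the mean over all pairs of trees. The names `leaf_set` and `summarize` are hypothetical.

```python
import re
from itertools import combinations

# A leaf label directly follows "(" or "," in a Newick string;
# internal node labels follow ")" and are therefore not matched.
LEAF_RE = re.compile(r"[(,]\s*([^():,;\s]+)")


def leaf_set(newick):
    """Return the set of leaf labels of one Newick string (unquoted labels only)."""
    return set(LEAF_RE.findall(newick))


def summarize(newick_strings):
    """Count trees and leaves, and average the pairwise overlap of leaf sets."""
    leaf_sets = [leaf_set(s) for s in newick_strings if s.strip()]
    overlaps = [
        len(a & b) / len(a | b)  # |common| / (|A| + |B| - |common|)
        for a, b in combinations(leaf_sets, 2)
        if a | b
    ]
    return {
        "n_trees": len(leaf_sets),
        "leaf_counts": [len(s) for s in leaf_sets],
        "mean_overlap": sum(overlaps) / len(overlaps) if overlaps else 0.0,
    }


# Toy trees for illustration; a real run would instead read the lines
# of the generated txt file.
trees = [
    "((A:1,B:1):1,(C:1,D:1):1);",
    "((C:1,D:1):1,(E:1,F:1):1);",
]
print(summarize(trees))
```

For tree visualization or for Newick strings with quoted labels, a dedicated phylogenetics library is a better fit than the regular expression used here.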
