-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLV Management / Saving mode #19
base: master
Are you sure you want to change the base?
Conversation
…ict with autogen build
…eplacement data field, chagned terminology around re slottable and pinned
…s the PLL_PATTERN_TIP dependant addressable_begin (added *_end also). Fixed wrong addressable_clvs number, now correctly set to tips+buffers
…d; made changes to alloc / dealloc / init accordingly. Various renamings
… replace and pop_stack
…strategy, updated callsite accordingly
…iven clv index. Updated the update function accordingly. Fixed incorrect clv pointer calculation
…ng more systemic though
…, adding a subtree_size field to the unode_t, and adding functions for setting them, as well as a dedicated largest-subtree_first traversal function
…ithout tip-pattern
…tree size equality
… order and applies a callback function
…to compute all possible subtree sizes
…versal identifier const
Here is a quick feedback before leaving for vacations ;-)
I can't look at it in details now, but it looks great to me at a first glance ;-) |
Looks very nice! After reading the description and without looking at the code yet, I have a couple of comments regarding potential raxml-ng integration:
|
Thanks for taking the time, and for the kind words! :)
good point, will add a basic example. Whats already there is the tests, but they can be hard to read
I don't think it will be honestly. A lot of the operations have to be per-clv_buffer anyway, and the overhead regarding the subtree sizes, for example, can just be done once per tree.
I agree, I would really like to have it in. In epa-ng it currently induces quite some overhead, especially when having to recompute all possible branches, so anything that can speed that up would be great. |
agreed, its high on the list.
Thats a good quick way around it, but probably we would want to experiment if introducing a mutex for every slot would be worth it. Either way it should be optional to minimize unneeded overhead for differing par. schemes/no par.
This is a good point; I had it a bit too separate in my mind: its needed for the MRC strategy, but there it's just one possible form of "cost". But more importantly since you absolutely need it in the lsf-traversal, I guess theres no reason not to have it. This way the |
I have a bunch of bikeshedding comments that I will leave off. I think this is very well done, but I do have some technical questions:
|
True, and also safer.
I don't understand what you mean by recompacting functions, elaborate please? As for statistics: I haven't needed anything like that yet, any specific suggestions? Like, number of pinned slots, total recomputation cost?
Interesting point, perhaps some more advanced replacement strategy could prefer to put child and parent clvs in slots that are close to each other in terms of actual buffer address, since we are abstracting here already anyway. With normal allocation in the partition, probably the memory is somewhat fragmented, as we allocate one CLV at a time, right? As opposed to one large malloc.
In a sense, yes, though the goal isn't to deallocate the buffers of the slots, but rather have the program never exceed some amount of memory. By the way, in the epa-ng codebase I have a bunch of functions to gauge the memory footprint of a partition/etc. maybe these could be useful in the library at some point, implemented in a more "plugged in" way (hand in hand with a refactor of Again, thanks for the comments! |
I mean some function that just swaps the allocations such that we obtain a "better"[1] layout. To be honest, I was kinda just thinking in writing there, and that thought was the "proto"thought to the locality question. Basically, the motivation is that if you find that locality matters, then we might want to, occasionally, reallocate in a way that improves the locality.
This is half true? because the CLV is actually as long as the alignment (except in site repeats), so the length usually saves us from any locality problems. But I think EPA-NG often has really short alignments, so maybe this is a bigger factor? I did the experiment with 1000 sites, and didn't try different numbers. I think that if you have time this should be looked into. [1]: better is left as an exercise to the reader |
8992ddf
to
18261f8
Compare
This PR introduces a new optional feature that allows the number of concurrently allocated CLV buffers to be set as low as
log_2( #leaves ) + 2
, compared to the usual one per inner node of the tree (or three per inner node in the case of epa-ng).The main idea behind how this is implemented is that of actively managing this set of limited CLV slots, translating the usual
clv_index
into an internalslot_index
, and deciding which slot should be overwritten if there aren't enough free slots available.This behaviour can be customized through the use of callback functions, collectively called the replacement strategy.
If/when you have time, please give some thought to what would still need to change for you to be able to use this in your code. This PR is primarily about feedback, though if we feel we can merge it safely that would be great fo course.
New structures
Like the repeats, the clv_manager is intended to be a feature with a self-contained struct that holds everything it needs:
Notes:
addressable_begin
and..._end
is my way of dealing with the tipchar mode and its implication for what is a valid trueclv_index
addressable_end
equalstips + clv_buffers
as specified duringpll_partition_create
(more on that later)unused_slots
is kind a kind of uneccesary optimization, as usually you wouldn't explicitly mark slots as unused, so for those cases it only saves time in the very beginning, making getting an unused slotO(1)
instead ofO(#slots)
. It also complicates the codebase as I implemented a stack for it. However I think it should be left in as it may come in handy for non-standard use-cases and replacement strategies.The callbacks can be set to custom functions to implement different behaviour for choosing which unpinned slot should be overwritten next, if there are no unused slots available. By default these are set to the minimum recomputation cost strategy, that aims to overwrite those CLV first that are easier to recompute if they are needed again.
Lifetime management
The manager struct is deallocated with the partition (dduring pll_partition_destroy). The CLV manager dealloc function tries to call the replacement-strategy deallocationc allback to deallocate any custom data used there.
Changes to existing code
I tried to keep these as limited, and as "zero overhead" as possible.
clv access
The biggest change is that, with the memsaver enabled these kind of accesses are wrong:
Instead, access to the clv buffer should now be done through these functions:
clv_index
is not slotted. There is also pll_clv_is_slotted which can be used to check on whether a clv is slotted, such that the developer can possibly take action if not.pinning
In the existing code the access functions come into play, for example, in the call to the partials functions. There, we also encounter the second major (but small) change, as we have to explicitly pin the parent clv of the current to-be-updated partial, then do the normal update, followed by unpinning the two now no longer needed child-clvs:
first pin
then unpin after
changes to clv allocation
Like with the repeats mode, the allocation of the clv buffer is dependent on information we only get during the later initialization, namely how many slots we want. Consequently, the clv buffer is allocated after the partition creation. As this is shared between repeats and memory saver, I refactored this part into its own function to reduce redundancy.
other minor changes and additions
pll_bool_t
to use an actual bool. From what I saw it was only used once.pll_utree_every
to take a callback that can actually change the nodes it visits (I think there was no difference topll_utree_every_const
?). There should at least be some function allowing changes to the nodes to make it a useful apply-like function, and since there is already a const vs nonconst split here seems like an obvious change.make -j
) and made it explicitly use this libpll vs. possibly using some system wide version.Usage
Notes:
utree
instead of aunode
. This is one of the things I'm totally open to change; it was more of a question of old vs. new style (and having the information about the tree being binary or not).Completeness TODOs
Some things I haven't been able to dive into fleshing out/implementing yet, that I would consider necessary for the manager to be "complete":
Other open questions
subtree_sizes
array which is needed anyway to traverse the tree largest-subtree-first. Still, a smipler default may be desirable.