Speed optimizations #1

Open
6 tasks
lrq3000 opened this issue Aug 6, 2017 · 0 comments

lrq3000 commented Aug 6, 2017

Speed optimizations todo list:

  • An old version with only the bare minimum is available on SO, and a less minimal version on gist. In performance tests of setitem and getitem with indirect access, they are as fast as the original dict! We should use these versions as references and reduce the complexity of the newer fdict. /EDIT: No, in fact this is because the test uses di[str(j)] = {}, which in older releases created a real nested dict instead of doing nothing. If you add that check, fdict is then used for subdicts, and we get about the same performance as currently. The 10x slowdown is thus probably due to fdict instantiation plus string manipulation.
  • In practice, fdict setitem and getitem are 10x slower than dict when using indirect access (direct access is as fast as dict). After profiling, most of the time seems to be spent in string operations: join and [:-1] == delimiter. Maybe we could internally replace strings with a bitarray representation? But wouldn't that make things even slower?
  • Fastview mode is very slow on setitem because of metadata building, which is O(m*l), where m is the number of parents per leaf and l the number of leaves added, i.e., quadratic time. It should be possible to do it in linear time by walking each parent node only once: we should build the list of parent nodes top-down instead of bottom-up as is currently done.
  • getitem currently returns another fdict() instance with a reference to the same internal dict and exactly the same parameters EXCEPT one: rootpath, a simple string. Maybe there is another pythonic way to reuse the parent fdict and just define a different rootpath for the child? In other words, an exact copy except for one field.
  • Move fastview mode to its own fdict class. Fastview may become a bit slower, but the standard fdict will be faster!
  • Not a speed optimization, but the getitem tests in benchmarks.py also include the time for setitem. We should fix that so that only getitem is measured.
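The linear-time metadata idea from the fastview item could look roughly like the sketch below. It is not fdict's code: it assumes "/" as the delimiter and that the metadata maps each parent path to the set of its direct children, and `build_parents` is a hypothetical name. Rather than walking all m ancestors of every leaf, it climbs from a leaf only until it reaches a parent that is already registered, so each parent is expanded once overall. Note that this is a bottom-up walk with early termination, swapped in for the top-down pass the item proposes, but it targets the same linear bound.

```python
from collections import defaultdict


def build_parents(leaf_keys, delimiter="/"):
    """Map each parent path to the set of its direct children.

    Hypothetical helper: assumes flattened keys such as "a/b/c".
    Each parent is expanded at most once, so total work is linear in
    the number of distinct nodes instead of O(m * l).
    """
    children = defaultdict(set)
    for key in leaf_keys:
        node = key
        while delimiter in node:
            parent = node.rsplit(delimiter, 1)[0]
            already_known = parent in children  # "in" does not create entries
            children[parent].add(node)
            if already_known:
                # This parent's own ancestors were registered the first
                # time it was seen, so there is no need to climb further.
                break
            node = parent
    return dict(children)
```

For example, `build_parents(["a/b/c", "a/b/d", "a/e"])` walks "a/b" and "a" once for the first leaf, then stops after one step for each of the other two leaves.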
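For the getitem item, one pythonic option is to bypass `__init__` when deriving a child view: allocate a bare instance with `object.__new__`, copy the fields by reference, and change only `rootpath`. The sketch below uses a toy class `FlatDict` (hypothetical, not fdict's actual class or attribute layout) with `__slots__` to keep instances small; whether this is measurably faster than calling the constructor would have to be confirmed with the benchmarks.

```python
class FlatDict:
    """Toy flattened dict (hypothetical): nested keys stored as "a/b/c"."""

    __slots__ = ("d", "rootpath", "delimiter")

    def __init__(self, d=None, rootpath="", delimiter="/"):
        self.d = {} if d is None else d
        self.rootpath = rootpath
        self.delimiter = delimiter

    def _fullkey(self, key):
        # Prefix the key with this view's rootpath, if any.
        if self.rootpath:
            return self.rootpath + self.delimiter + key
        return key

    def _child(self, rootpath):
        # Bypass __init__: allocate a bare instance and copy the slots.
        # The child shares the same internal dict; only rootpath differs.
        child = object.__new__(type(self))
        child.d = self.d
        child.delimiter = self.delimiter
        child.rootpath = rootpath
        return child

    def __setitem__(self, key, value):
        self.d[self._fullkey(key)] = value

    def __getitem__(self, key):
        fullkey = self._fullkey(key)
        if fullkey in self.d:
            return self.d[fullkey]  # direct access to a leaf
        prefix = fullkey + self.delimiter
        if any(k.startswith(prefix) for k in self.d):
            return self._child(fullkey)  # indirect access: a subdict view
        raise KeyError(key)
```

With `f = FlatDict(); f["a/b"] = 1`, the expression `f["a"]` returns a view whose `d` is the very same dict as `f.d`, so writing `f["a"]["c"] = 2` stores `"a/c"` in the parent. The prefix scan in `__getitem__` is O(n) and only keeps the toy short.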
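For the benchmark item, `timeit` can keep setitem out of the getitem measurement if the container is filled once before timing and only the reads are passed to `timeit.timeit`. This sketch uses a plain `dict` and a hypothetical helper `time_getitem`; measuring fdict would mean passing its class as `factory`, which assumes it can be constructed with no arguments.

```python
import timeit


def time_getitem(factory=dict, n=1000, number=100):
    """Time only getitem: setitem happens once, outside the timed region."""
    keys = [str(j) for j in range(n)]
    container = factory()
    for k in keys:  # populate first; this loop is not timed
        container[k] = k

    def read_all():
        for k in keys:
            container[k]

    # Only read_all() runs inside the timer, `number` times.
    return timeit.timeit(read_all, number=number)


if __name__ == "__main__":
    print(f"getitem: {time_getitem():.4f} s")
```

The same pattern in reverse (timing a function that builds a fresh container and only writes) would give a setitem-only number, though that would also include the cost of instantiating the container.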