This is a new issue to discuss performance optimization for sampling of alternatives in the MergedChoiceTable utility.
Earlier discussion is in issues #4, #5, and #11, now closed. Core functionality is implemented in PR #37.
Performance of the underlying sampling algorithms
Sampling items from a list is fastest with uniform weights and "replacement" of items (meaning that an item can appear multiple times in the sample).
In my tests, NumPy sampling implementations are generally faster than core Python (comparing NumPy 1.14 and Python 3.6 on Mac), but the differences are small.
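As a point of reference (not from the original benchmarks), a minimal sketch of this comparison could look like the following; the population and sample sizes are arbitrary, and absolute timings will vary by machine and library version:

```python
import random
import timeit

import numpy as np

population = np.arange(1_000_000)
population_list = population.tolist()
k = 100_000  # sample size

# Core Python: uniform sampling with replacement
t_python = timeit.timeit(lambda: random.choices(population_list, k=k), number=10)

# NumPy: uniform sampling with replacement
t_numpy = timeit.timeit(lambda: np.random.choice(population, size=k), number=10)

print(f"random.choices:   {t_python:.3f} s")
print(f"np.random.choice: {t_numpy:.3f} s")
```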
Typical performance penalties
Sampling with non-uniform weights: ~10x performance penalty
Sampling without replacement: ~5x performance penalty
Combining the two: ~50x performance penalty
Sampling time scales linearly with the sample size and also with the population size -- except for uniform sampling without replacement, which is essentially constant with respect to the population size.
The penalties above are for samples and populations greater than a few hundred thousand; with smaller datasets the performance is more idiosyncratic.
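As a rough illustration of where the penalties above come from, the four combinations can be timed directly with np.random.choice. This is only a sketch with arbitrary sizes; the ratios, not the absolute times, are what matter:

```python
import timeit

import numpy as np

population = np.arange(100_000)
sample_size = 10_000

weights = np.random.random(len(population))
weights /= weights.sum()  # np.random.choice expects probabilities summing to 1

cases = {
    "uniform, with replacement":     dict(replace=True),
    "weighted, with replacement":    dict(replace=True, p=weights),
    "uniform, without replacement":  dict(replace=False),
    "weighted, without replacement": dict(replace=False, p=weights),
}

for label, kwargs in cases.items():
    t = timeit.timeit(
        lambda: np.random.choice(population, size=sample_size, **kwargs), number=5)
    print(f"{label:31} {t:.3f} s")
```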
Performance in the context of MergedChoiceTable
In real-world use, there's also overhead from building and merging large DataFrames.
At the extremes, this can be quite significant. For example, generating a merged table from 1,000 choosers and 1,000 alternatives, without sampling (i.e. 1 million rows) takes about 40x longer than uniform sampling of 10 alternatives per chooser with replacement.
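The difference is easy to see with a generic pandas sketch. This illustrates the two merge patterns only, not the actual MergedChoiceTable internals, and the column names are made up:

```python
import numpy as np
import pandas as pd

choosers = pd.DataFrame({"chooser_id": np.arange(1_000),
                         "income": np.random.random(1_000)})
alts = pd.DataFrame({"alt_id": np.arange(1_000),
                     "price": np.random.random(1_000)})

# Full cartesian merge: 1,000 x 1,000 = 1 million rows
full = (choosers.assign(_key=1)
        .merge(alts.assign(_key=1), on="_key")
        .drop(columns="_key"))

# Sampled merge: 10 alternatives per chooser, uniform with replacement
sample_size = 10
sampled_alts = np.random.choice(alts["alt_id"].values,
                                size=len(choosers) * sample_size)
sampled = (choosers.loc[choosers.index.repeat(sample_size)]
           .assign(alt_id=sampled_alts)
           .merge(alts, on="alt_id"))

print(len(full), len(sampled))  # 1,000,000 rows vs. 10,000 rows
```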
Here are some MergedChoiceTable performance benchmarks for sampling 10 alternatives from a set of 100k, for 100k choosers:
| Sampling method | Time |
|---|---|
| Uniform sampling with replacement | 0.3 seconds |
| Alternative-specific weights, with replacement | 0.4 seconds |
| Uniform sampling without replacement | 4.5 minutes |
| Alternative-specific weights, without replacement | 6.5 minutes |
At this scale there is only minimal penalty for non-uniform weights. But sampling without replacement is very inefficient, because it requires a separate random draw for each chooser.
Interaction weights, which depend on both the chooser and the alternative, have the same problem. At smaller scales, the performance penalty is about 180x, a little worse than the penalty for sampling without replacement. (For this table I wasn't even able to run the benchmark, because it would require 10 billion interaction weights.)
These benchmarks are from a fast i7 iMac; an old i5 MacBook is about 50% slower.
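The per-chooser draws are the core of the problem. Here is a rough sketch of the two access patterns (the variable names are illustrative, not from the codebase):

```python
import numpy as np

n_choosers, n_alts, sample_size = 1_000, 10_000, 10
alt_ids = np.arange(n_alts)

# With replacement and weights shared across choosers, one vectorized call
# produces samples for every chooser at once.
flat = np.random.choice(alt_ids, size=n_choosers * sample_size, replace=True)
fast_samples = flat.reshape(n_choosers, sample_size)

# Without replacement -- or with weights that depend on the chooser -- each
# chooser needs its own draw, so the work becomes a Python-level loop.
slow_samples = np.empty((n_choosers, sample_size), dtype=int)
for i in range(n_choosers):
    # With interaction weights, this call would also pass chooser-specific p=
    slow_samples[i] = np.random.choice(alt_ids, size=sample_size, replace=False)
```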
Comparison with the older UrbanSim MNL codebase
Our older codebase, urbansim.urbanchoice, only supported the basic case of uniform sampling with replacement. It performs a little bit better than the new codebase for this use case, probably through better optimization of the data merging operations.
Resources
Sampling-benchmarks.ipynb
Next steps
Here are some potential lines of investigation:
Why are the repeated random draws (required for sampling without replacement or for interaction weights) so slow? Can we optimize them better?
If the underlying sampling algorithms are part of the problem, are there alternatives to NumPy that would perform better?
For interaction weights, can we get better performance by using generator functions or a different lookup format? What about optimizing for an additional case with shared weights ("buckets") that don't require as many separate draws? (A rough sketch of the bucket idea follows below.)
There's also the question of how to best calculate the weights themselves, which is a separate problem.
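To make the "buckets" idea from the list above concrete, here is a hypothetical sketch: if choosers fall into a small number of groups that share a weight profile, one vectorized draw per bucket can replace one draw per chooser. The names and setup are assumptions for illustration, not existing code:

```python
import numpy as np

n_choosers, n_alts, sample_size = 100_000, 10_000, 10
alt_ids = np.arange(n_alts)

# Hypothetical setup: a handful of shared weight profiles ("buckets"),
# e.g. by market segment, instead of fully chooser-specific weights.
n_buckets = 5
bucket_weights = {b: np.random.random(n_alts) for b in range(n_buckets)}
chooser_bucket = np.random.randint(0, n_buckets, size=n_choosers)

samples = np.empty((n_choosers, sample_size), dtype=int)
for b, w in bucket_weights.items():
    p = w / w.sum()
    members = np.where(chooser_bucket == b)[0]
    # One weighted draw (with replacement) covers every chooser in the bucket.
    draws = np.random.choice(alt_ids, size=len(members) * sample_size, p=p)
    samples[members] = draws.reshape(len(members), sample_size)
```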