Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: suggested speedups #1

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

MAINT: suggested speedups #1

wants to merge 1 commit into from

Conversation

andyfaff
Copy link

I was browsing PyPI and came across the package. It's great to know that there's a community building up around refnx. I knew that @arm61 was working on stuff around this area, I guess this is to do with that.

The project infrastructure looks great. Automated testing, pypi, etc.

I've made this PR to suggest some possible speedups via removal of loops, vectorisation, etc. It's not currently passing tests, possibly because the PDFs are being calculated differently. That should only require minor tweaks though.

for d in self.data:
self.distros.append(norm(loc=d[0], scale=d[1]))
# truncnorm takes care of each gaussian contribution integrating
# to unity, etc.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncnorm a much better way of handling a clipped normal distribution, and it should be fairly fast.


return _pdf

arrs = [d.pdf(x) for d in self.distros]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing a double nested loop should be a hell of a lot faster. Each of the d.pdf calls is vectorised.

arrs = [d.cdf(x) for d in self.distros]
return np.sum(arrs, axis=0) / len(self.data)

def _ppf_single(self, q):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach to calculating ppf is taken from scipy.stats. If it works well there, it should work well here. Of course I may have ignored improved bracketing that you've already worked out for these distributions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q is normally used as the argument for ppf (at least in scipy.stats). Changing the argument name to q would benefit me (at least), and possibly others who are interested in contributing.

args=[v]).root
vfun = np.vectorize(self._ppf_single, otypes='d')
_ppf = vfun(np.atleast_1d(q))

if len(_ppf) == 1:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not a good idea to have this, you probably need if _ppf.size == 1

>>> x = np.random.random(size=(1, 10))
>>> len(x)
1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant