[question] can we make mass_spring.py repeatable (deterministic) between runs? #1565
Comments
This may have been a better question to post in the DiffTaiChi repo. I will post it there instead (taichi-dev/difftaichi#31 (comment)) and summarize any response I get here.
Hey @ehannigan! I just played around with this and found that there's randomness deriving from both Python's stdlib and numpy, so both need to be seeded before the weights are initialized.
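Presumably the two-line fix being discussed looks something like the sketch below; the seed value and exact placement are an assumption, not a quote from the actual patch.

```python
import random

import numpy as np

# Seed both sources of randomness before mass_spring.py initializes its
# controller weights; either one left unseeded can vary between runs.
random.seed(0)
np.random.seed(0)
```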
Hey! Thank you @samuela! I thought I had tried setting a seed (using numpy's RandomState object), but I must have messed up somewhere. I'll go back and try running it with your fix.
I tried adding in these two lines, and I am still not getting repeatable results. Maybe you are using a different setup? What measure are you using to see whether your results are the same each time? I'm looking at loss values.

Here is my current setup:

Here are the loss outputs after running the same command twice in a row:

First run: python mass_spring.py 2 train

Second run: python mass_spring.py 2 train

Is there anything else I could be missing to get the same results you are getting? I'm tearing my hair out on this one lol.
Sorry for my absence - recent days have been rather hectic for me. Do you get any improvements if you use f64 and i64 instead of the default 32-bit types?
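The least invasive way to try this is at init time rather than editing every field declaration; a sketch, assuming a Taichi version whose `ti.init` accepts `default_fp`/`default_ip`:

```python
import taichi as ti

# Request 64-bit floats and ints globally instead of the 32-bit defaults.
# Higher precision will not remove nondeterminism by itself, but it can
# shrink the run-to-run round-off drift enough to tell the two apart.
ti.init(arch=ti.cpu, default_fp=ti.f64, default_ip=ti.i64)
```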
I also tried setting f64 and i64, but I got an error. So I started a Jupyter notebook to keep track of my debugging so I could post it here.
I've created a Jupyter notebook to outline my debugging process. Since there were some updates to difftaichi due to updates in taichi, I went ahead and updated my version just to make sure we weren't debugging old code. Here is the notebook: https://github.com/ehannigan/difftaichi/blob/testing_determinism/examples/debug_determinism-current.ipynb

In it I tried running mass_spring.py without any modifications, switching to f64, also changing i32 -> i64 (which caused an error), and using np.random.RandomState() instead of np.random.seed(). At least on my system, the results are still not deterministic. Could someone try running my notebook on their machine to see if you get the same results?
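For anyone comparing the two numpy approaches from the notebook, the difference is scope, not quality of randomness; a sketch (the array shape and scale here are hypothetical, not mass_spring.py's actual initializer):

```python
import numpy as np

# np.random.seed fixes the *global* stream, so any other code that also
# draws from np.random shifts all subsequent values.
np.random.seed(42)
w_global = np.random.randn(8, 8) * 0.1

# A private RandomState is isolated from the rest of the process, so the
# same seed always reproduces the same draws.
rng = np.random.RandomState(42)
w_private = rng.randn(8, 8) * 0.1
```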
Is there CUDA in the backend? Is it possible that a function similar to this one needs to be added?
There is, although I think you need to select it explicitly for it to be enabled. The default for mass_spring is CPU-only, IIRC.
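One way to take both CUDA and thread-scheduling variation out of the picture is to pin the runtime explicitly; a sketch (the `cpu_max_num_threads` option exists in current Taichi releases, though whether the version used in this thread supports it is an assumption):

```python
import taichi as ti

# Force the CPU backend and a single thread: with one thread, parallel
# loops and atomic adds always execute in the same order, removing one
# common source of run-to-run floating-point differences.
ti.init(arch=ti.cpu, cpu_max_num_threads=1)
```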
Hmmm, then I don't know why I am still getting stochastic results. @samuela, you said you were able to get repeatable results? Were they just similar, or did the losses match exactly? If they matched exactly, what is your system setup?
I was debugging some modifications I made to mass_spring.py when I realized that the result of each run is non-deterministic. I went back to the original mass_spring.py and made sure the controller network weights were initialized to the same value each time. But even when I can guarantee that there are no random variables being assigned anywhere, the resulting loss differs in each run.
Here are two different runs of the exact same code. You can see that the controller weights are exactly the same, but the loss values begin to diverge.
Run 1: mass_spring.py 2 train

n_objects= 20 n_springs= 46
weights1[0,0] -0.23413006961345673
weights2[0,0] 0.46663400530815125
Iter= 0 Loss= -0.2193218171596527 0.19502715683487248
Iter= 1 Loss= -0.21754804253578186 0.07976935930575488
Iter= 2 Loss= -0.3397877812385559 0.055776006347379746
Iter= 3 Loss= -0.3514309227466583 0.03870257399629174

Run 2: mass_spring.py 2 train

n_objects= 20 n_springs= 46
weights1[0,0] -0.23413006961345673
weights2[0,0] 0.46663400530815125
Iter= 0 Loss= -0.21932175755500793 0.1950520028177551
Iter= 1 Loss= -0.21754644811153412 0.07983238023710348
Iter= 2 Loss= -0.3397367000579834 0.055822440269175766
Iter= 3 Loss= -0.3514898419380188
In my own modifications, this was causing inconsistent failures of the simulation (v_inc explodes and all values go to nan). I assume the blow-up itself is due to instabilities in the Euler integration, but it would be nice to get consistent results each run to make debugging easier.
Where could the non-deterministic behavior be coming from? Is it something we can fix, or are there stochastic processes that are a result of the compiler?
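Not an answer from the thread, but one plausible mechanism worth noting: floating-point addition is not associative, so any parallel reduction (e.g. atomic adds accumulating spring forces) whose execution order varies between runs can produce slightly different sums, and the simulation then amplifies that drift step by step. A minimal demonstration:

```python
import random

# Sum the same million floats in two different orders; the totals differ
# in the low-order digits even though the inputs are identical.
vals = [random.uniform(-1.0, 1.0) for _ in range(1_000_000)]
shuffled = vals[:]
random.shuffle(shuffled)
print(sum(vals))
print(sum(shuffled))  # typically differs in the last few digits
```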