NaNs in hyperopt #2015
It indeed assumes all layers have the same activation. I'm not sure I understand your last question? Setting the epoch range to min 100 and max 100 will indeed fail all checks, but the number of epochs of this trial is 39703, so setting that as the upper and lower bound should be fine. To reproduce it exactly you could indeed fix the ranges of the parameters in the hyperopt runcard such that they are limited to a specific value, setting them equal to the values of this trial. Alternatively, I suppose you could do a regular fit but with the datasets you would fit in the hyperopt case. This will of course skip the hyperopt-specific computations, but if you don't get the NaNs in that case you'd at least know it's caused by something specific to running hyperopt and not by the hyperparameters+dataset.
Oh I hadn't even noticed that, no idea what that's about.
My point was that I think setting the min and max of any parameter to be equal (epochs was just an example) won't work; I've tried this before with epochs, and it fails one of the n3fit checks. Can I just remove parameters from the `hyperopt_config` completely? I mean, will it then take them from `parameters` instead?
Another requirement to reproduce a trial is of course the hyperopt seed. I remember we discussed this at some point, but I guess we haven't gotten around to implementing it so that it is fully reproducible; I see the hyperopt seed is just set to 42. Or am I forgetting something @Cmurilochem?
That's a good point. At some point I do remember running the same configuration many times in a hyperopt setup because I wanted to get a feel for the statistical fluctuations in the hyperopt loss of "good" configurations. I thought that was done by fixing those things in the runcard, but you're right, that is not possible.
I don't remember if this is an option. If not, you could still overwrite the sampling done to get the layer width by hardcoding the settings you want instead. Since the purpose is to understand where the …
Hi @APJansen. You are right. We have fixed this to 42.
The approach I mentioned, of just reducing the search space to the epochs and having it pick up the rest from the non-hyperopt settings in the runcard, works. What happens in this example of trial 28 is that in folds 1 and 2 a warning is raised. These warnings correspond exactly (at least in their counts in the log of the 5-day hyperopt run) to the instances of NaN. So I would say that the information that the training blew up is not properly passed on. Ideally we'd stop the trial after one fold hits this and not continue with the next folds. Can someone look into this? (Our time is really limited now, unfortunately)
Hi @APJansen, thanks a lot for the investigation! I can have a look at this but unfortunately this won't be before the end of this week. Are you relying on this to produce more samples? |
Yes @Radonirinaunimi, we haven't been running anything for the past 2 weeks. I spent some hours on this and PR #2014 today. The issue here, I think, is around this line. I've added some monitoring and am rerunning this trial now from PR #2014.
Not sure whether we addressed this issue specifically, but we are now running hyperopt normally and not seeing these problems, so I'm going to close this issue for the time being. |
I created a small script that gets the parameters from a trial and creates a runcard from that, here (where "I" means GPT-4).
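The core of such a script is small; here is a sketch of the extraction step. The exact layout of `tries.json` may differ — this assumes the standard hyperopt `Trials` structure, where each trial stores its sampled values under `misc.vals`, and the function name is hypothetical.

```python
import json

def trial_params(path, trial_id):
    """Return the flat parameter dict of one trial from a tries.json file.

    Assumes the hyperopt Trials layout, where each entry looks like
    {"tid": ..., "misc": {"vals": {name: [value, ...]}}}.
    """
    with open(path) as f:
        trials = json.load(f)
    trial = next(t for t in trials if t.get("tid") == trial_id)
    # hyperopt wraps each value in a list (empty when the parameter
    # was not active for this trial), so unwrap and drop empties.
    return {k: v[0] for k, v in trial["misc"]["vals"].items() if v}
```

Turning the resulting dict into a runcard is then just templating it into the non-hyperopt `parameters` section.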
Side issue, bug?
Regardless of using the script or doing it manually, there is a small issue: in the parameters reported in `tries.json` I always see only one value for `activation_per_layer` (regardless of the number of layers). I'm not sure if that's a bug, or if it's intended to mean that all layers have this same activation (apart from the last, which is always linear, I think?). Even if it's the latter, it doesn't pass the checks if I leave it like this.
Example: trial 28
In this example there were only 2 layers, so `tanh` could only be `[tanh, linear]`. This is an example with a NaN loss; it seems in this case something goes wrong in some of the folds.
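If the single value is indeed shorthand for "same activation on all hidden layers, linear on the last", the expansion needed to satisfy the checks is a one-liner. This is a hypothetical helper sketching that interpretation, not an existing n3fit function.

```python
def expand_activations(activation, n_layers):
    """Expand a single activation name into a per-layer list,
    assuming the final layer is always linear (as the trials suggest)."""
    if n_layers < 1:
        raise ValueError("need at least one layer")
    return [activation] * (n_layers - 1) + ["linear"]

print(expand_activations("tanh", 2))  # ['tanh', 'linear']
```

Running such an expansion over the `tries.json` parameters before writing the runcard would make the reported trials pass the per-layer checks.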
I'm running this now, but without hyperopt and hence without folds, so I realise now I won't reproduce the issue.
The entry in the `tries.json` is: trial_28.json
Question
Is there an easy way to reproduce this trial exactly? I.e. using k-folding, either without hyperopt, or with hyperopt but setting the parameters for the first trial? I know that just setting, for instance, the epoch range to min 100 and max 100 will fail the checks.