NaNs in hyperopt #2015

Closed
APJansen opened this issue Mar 20, 2024 · 9 comments

@APJansen
Collaborator

I created a small script that reads the parameters of a trial and creates a runcard from them here (where "I" means GPT-4).

Side issue, bug?

Whether I use the script or do it manually, there is a small issue: the parameters reported in tries.json always contain only one value for activation_per_layer, regardless of the number of layers. I'm not sure whether that's a bug or whether it's intended to mean that all layers share this activation (apart from the last, which I think is always linear). Even in the latter case, the runcard doesn't pass the checks if I leave the value like this.
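To make the second interpretation concrete, here is a minimal sketch of how a single reported activation could be expanded into one entry per layer before writing the runcard. The helper name `expand_activations` is hypothetical, and the rule that the last layer is always linear is my guess from above, not confirmed behaviour.

```python
def expand_activations(activation, n_layers, last="linear"):
    """Expand one activation name into a list with one entry per layer.

    Assumption (not confirmed in this issue): every layer except the last
    uses the reported activation, and the last layer uses `last`.
    """
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    # n_layers - 1 copies of the reported activation, then the final layer.
    return [activation] * (n_layers - 1) + [last]


# Two layers with a reported value of "tanh".
print(expand_activations("tanh", 2))
```

If the single value is instead meant to be read differently, only the body of this helper would need to change.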

Example: trial 28

In this example there were only 2 layers, so the single value tanh could only mean [tanh, linear].

This is an example with a NaN loss; in this case something seems to go wrong in some of the folds.

I'm running this now without hyperopt, and therefore without folds, so I now realise I won't reproduce the issue.
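To see quickly which folds are affected, a sketch like the following counts the missing entries per fold. It assumes a trial entry shaped like the trial_28 one in this issue, with per-fold lists under `result.kfold_meta` and NaN values serialised as `null`. The function name `summarise_folds` and the toy trial are made up for illustration.

```python
import json
import math


def is_missing(value):
    """True for JSON null (None) or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarise_folds(trial):
    """Return {key: [(n_missing, n_total), ...]} with one tuple per fold."""
    meta = trial["result"]["kfold_meta"]
    summary = {}
    for key in ("validation_losses", "experimental_losses", "hyper_losses"):
        # Keys absent from the entry simply produce an empty list.
        summary[key] = [
            (sum(is_missing(v) for v in fold), len(fold))
            for fold in meta.get(key, [])
        ]
    return summary


# Toy trial with three folds; the first two contain only nulls.
trial = json.loads(
    '{"tid": 28, "result": {"kfold_meta": {"validation_losses": '
    "[[null, null], [null, null], [12.1, 622.5]]}}}"
)

for key, folds in summarise_folds(trial).items():
    for i, (n_missing, n_total) in enumerate(folds):
        print(f"{key} fold {i}: {n_missing}/{n_total} missing")
```

For a real run, `trial` would come from the relevant entry of tries.json instead of the inline string.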

The entry in tries.json is:

trial_28.json
{
  "_id": "65ec6c12cdc17306b5a546b5",
  "state": 2,
  "tid": 28,
  "spec": null,
  "result": {
    "status": "ok",
    "loss": null,
    "validation_loss": null,
    "experimental_loss": null,
    "kfold_meta": {
      "validation_losses": [
        [
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null
        ],
        [
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null
        ],
        [
          12.14753532409668,
          622.563720703125,
          15.752861022949219,
          22.73064613342285,
          13.940808296203613,
          12.795502662658691,
          485.9554138183594,
          14.621479034423828,
          25.957843780517578,
          11.59742546081543,
          21.798128128051758,
          13.857935905456543,
          16.326919555664062,
          20.212799072265625,
          8.943603515625,
          12.832194328308105,
          86.92933654785156,
          12.584427833557129,
          28.086872100830078,
          11.063165664672852,
          11.780959129333496,
          23.072505950927734,
          13.89393424987793,
          18.552104949951172,
          13.357744216918945,
          12.194093704223633,
          13.462635040283203,
          9.592662811279297,
          18.079463958740234,
          9.498067855834961,
          170.617431640625,
          1038.8602294921875,
          18.29841423034668,
          15.70957088470459,
          12.098176002502441,
          19.777820587158203,
          14.42442798614502,
          10.0394868850708,
          47.44027328491211,
          8.413172721862793,
          11.434549331665039,
          1968.7259521484375,
          15.88143253326416,
          2881.301025390625,
          13.021681785583496,
          8.430719375610352,
          200.9595947265625,
          12.42056941986084,
          41.56287384033203,
          19.01072883605957,
          12.09191608428955,
          17.90202522277832,
          14.396261215209961,
          11.48803424835205,
          24.40220832824707,
          11.868151664733887,
          10.247714042663574,
          17.367982864379883,
          13.672079086303711,
          10.491067886352539,
          13.204983711242676,
          270.6560974121094,
          1698.0059814453125,
          11.53370475769043,
          12.530192375183105,
          9.645076751708984,
          22.128299713134766,
          17.130889892578125,
          8.982980728149414,
          10.918508529663086,
          11.656984329223633,
          13.176027297973633,
          72.55720520019531,
          14.917561531066895,
          17.282772064208984,
          14.455068588256836,
          10.749722480773926,
          13.771191596984863,
          12.240583419799805,
          11.156545639038086,
          23.284114837646484,
          12.345758438110352,
          11.330618858337402,
          13.840761184692383,
          10.85977554321289,
          10.69460391998291,
          16.99068260192871,
          10.387388229370117,
          19.446277618408203,
          10.275043487548828,
          11.654125213623047,
          18.474218368530273,
          17.532350540161133,
          15.129594802856445,
          15.477754592895508,
          19.633609771728516,
          22.094167709350586,
          16.409204483032227,
          9.5444917678833,
          13.123198509216309
        ]
      ],
      "trvl_losses_phi": [
        null,
        null,
        38.75936293123688
      ],
      "experimental_losses": [
        [
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null
        ],
        [
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null
        ],
        [
          9.631409645080566,
          14.65926742553711,
          14.765120506286621,
          22.81585121154785,
          18.704925537109375,
          11.019681930541992,
          24401.271484375,
          16.35789680480957,
          30.54192352294922,
          9.4488525390625,
          16.245166778564453,
          12.085138320922852,
          15.265380859375,
          14.113433837890625,
          9.736549377441406,
          9.431405067443848,
          99.01387023925781,
          11.187447547912598,
          23.340747833251953,
          7.108949184417725,
          10.156389236450195,
          30.652923583984375,
          16.785707473754883,
          20.611572265625,
          13.525415420532227,
          9.401630401611328,
          10.916632652282715,
          6.493870258331299,
          14.7423095703125,
          9.452704429626465,
          8916.083984375,
          3220.8310546875,
          12.99640941619873,
          12.243772506713867,
          8.79442310333252,
          23.920921325683594,
          13.531204223632812,
          7.656867027282715,
          48.91118621826172,
          9.483509063720703,
          10.30264663696289,
          15796.875,
          11.896005630493164,
          38.02886962890625,
          10.020211219787598,
          6.9951629638671875,
          21.850326538085938,
          8.467232704162598,
          51.03904342651367,
          11.28868293762207,
          10.1692533493042,
          18.222007751464844,
          13.1695556640625,
          8.790900230407715,
          41.1088752746582,
          11.083891868591309,
          11.740309715270996,
          12.803691864013672,
          9.846954345703125,
          6.113747596740723,
          8.874459266662598,
          706.6709594726562,
          28218322.0,
          8.474566459655762,
          10.354934692382812,
          9.65860652923584,
          23.21355628967285,
          14.157861709594727,
          7.734203338623047,
          9.123676300048828,
          7.600849628448486,
          11.874972343444824,
          718.5275268554688,
          13.111132621765137,
          12.585481643676758,
          11.613744735717773,
          6.636402606964111,
          8.746490478515625,
          7.64871072769165,
          9.85733413696289,
          17.32428741455078,
          10.904479026794434,
          8.159088134765625,
          14.700692176818848,
          9.440145492553711,
          7.42826509475708,
          17.706758499145508,
          9.940926551818848,
          24.914688110351562,
          6.253153324127197,
          12.495806694030762,
          18.51062774658203,
          16.331514358520508,
          11.867968559265137,
          10.207791328430176,
          19.361270904541016,
          18.218006134033203,
          14.038684844970703,
          8.058456420898438,
          8.859779357910156
        ]
      ],
      "hyper_losses": [
        [
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null
        ],
        [
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null,
          null
        ],
        [
          9.631409645080566,
          14.65926742553711,
          14.765120506286621,
          22.81585121154785,
          18.704925537109375,
          11.019681930541992,
          24401.271484375,
          16.35789680480957,
          30.54192352294922,
          9.4488525390625,
          16.245166778564453,
          12.085138320922852,
          15.265380859375,
          14.113433837890625,
          9.736549377441406,
          9.431405067443848,
          99.01387023925781,
          11.187447547912598,
          23.340747833251953,
          7.108949184417725,
          10.156389236450195,
          30.652923583984375,
          16.785707473754883,
          20.611572265625,
          13.525415420532227,
          9.401630401611328,
          10.916632652282715,
          6.493870258331299,
          14.7423095703125,
          9.452704429626465,
          8916.083984375,
          3220.8310546875,
          12.99640941619873,
          12.243772506713867,
          8.79442310333252,
          23.920921325683594,
          13.531204223632812,
          7.656867027282715,
          48.91118621826172,
          9.483509063720703,
          10.30264663696289,
          15796.875,
          11.896005630493164,
          38.02886962890625,
          10.020211219787598,
          6.9951629638671875,
          21.850326538085938,
          8.467232704162598,
          51.03904342651367,
          11.28868293762207,
          10.1692533493042,
          18.222007751464844,
          13.1695556640625,
          8.790900230407715,
          41.1088752746582,
          11.083891868591309,
          11.740309715270996,
          12.803691864013672,
          9.846954345703125,
          6.113747596740723,
          8.874459266662598,
          706.6709594726562,
          28218322.0,
          8.474566459655762,
          10.354934692382812,
          9.65860652923584,
          23.21355628967285,
          14.157861709594727,
          7.734203338623047,
          9.123676300048828,
          7.600849628448486,
          11.874972343444824,
          718.5275268554688,
          13.111132621765137,
          12.585481643676758,
          11.613744735717773,
          6.636402606964111,
          8.746490478515625,
          7.64871072769165,
          9.85733413696289,
          17.32428741455078,
          10.904479026794434,
          8.159088134765625,
          14.700692176818848,
          9.440145492553711,
          7.42826509475708,
          17.706758499145508,
          9.940926551818848,
          24.914688110351562,
          6.253153324127197,
          12.495806694030762,
          18.51062774658203,
          16.331514358520508,
          11.867968559265137,
          10.207791328430176,
          19.361270904541016,
          18.218006134033203,
          14.038684844970703,
          8.058456420898438,
          8.859779357910156
        ]
      ],
      "hyper_losses_phi": [
        null,
        null,
        396.61186935111425
      ],
      "penalties": {
        "saturation": [
          [
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null
          ],
          [
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null
          ],
          [
            2.656452720743435,
            2.249868713454709,
            3.460148973652974,
            2.5260735502132126,
            8.470907919082244,
            1.6593138110891172,
            3.7240513480015442,
            8.038696890402889,
            4.5444754748274505,
            3.563270434807783,
            3.7467510736960374,
            3.6940533026299294,
            7.929168726902323,
            3.892730539625497,
            1.7669792133609266,
            7.179103624182386,
            4.131493868852873,
            2.441749608692859,
            1.076389435456914,
            1.42457110471527,
            2.3585938773041617,
            2.0813801407587738,
            3.2614185887920843,
            4.60568097490542,
            4.343469451914197,
            4.20434783097259,
            3.546487762593795,
            2.973756548448712,
            8.242689710085843,
            4.247226755602139,
            20.216802089208148,
            3.2183613123048094,
            8.518471686376078,
            1.702957497812683,
            3.3400091159831646,
            6.146248498185259,
            5.866374523786929,
            1.6764654631239282,
            5.112705463604855,
            1.999221971310929,
            3.7193412462183697,
            6.185707114643248,
            8.803578482470403,
            12.474585604276028,
            5.1580703384345155,
            3.495984115718236,
            5.074407949986764,
            4.313428859615183,
            1.2995852554394078,
            3.592075168097778,
            11.149528321343743,
            4.07579561114684,
            3.4075703338036583,
            3.56782547983067,
            2.909606493835124,
            2.0209598148033967,
            1.624241943059869,
            3.9396620448150257,
            3.26786771366886,
            3.864827440838045,
            7.300091545825602,
            4.785786568126305,
            2.5495437344121736,
            3.240279007403567,
            5.164385378324037,
            1.6742697373552722,
            2.0716361099129386,
            3.0229719030964954,
            3.278816181585102,
            3.3674583655389574,
            3.737867267618177,
            12.283305807394688,
            3.01337533331514,
            4.448669309358695,
            3.7443274451475235,
            2.7275068789200505,
            6.856379346358085,
            2.2148164849944143,
            2.8342377484971792,
            3.2935560123086725,
            1.429871937174859,
            3.00909963057026,
            7.050622830772754,
            2.2437751555556886,
            10.290600688618103,
            2.9732550756312732,
            2.5412259712469645,
            6.3067344764985656,
            1.376964008697831,
            2.2924383988336774,
            4.480360846259685,
            2.560406944097767,
            4.292233686515102,
            3.554847270642838,
            3.2680043348751213,
            6.125173712584803,
            1.7976846787349947,
            1.6517489570475106,
            2.4071141949347563,
            1.3218436138544791
          ]
        ],
        "patience": [
          [
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null
          ],
          [
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null
          ],
          [
            19.55881921053845,
            1002.3935667112833,
            25.363775661176202,
            36.59874916193362,
            22.446178738278096,
            20.602115294158654,
            782.4397155200273,
            23.54213076110071,
            41.79487937700625,
            18.673084032595916,
            35.097296349407515,
            22.31274540674855,
            26.288070734874335,
            32.544748563864005,
            14.400149461253237,
            20.661192764264612,
            139.96544420486507,
            20.26226245058606,
            45.222840597131786,
            17.81286914246232,
            18.968592688946227,
            37.14917969681439,
            22.37070655620246,
            29.87085503437409,
            21.507383780237017,
            19.633783136581307,
            21.676269121690314,
            15.445203711512503,
            29.10985295025199,
            15.263866892684733,
            274.7121461755074,
            1672.6750630105405,
            29.462386090859855,
            25.29407395091681,
            19.47934547190264,
            31.844387105864076,
            23.224857690827182,
            16.164637822633658,
            76.38386748606817,
            13.546099670731726,
            18.410836203816565,
            3169.8574192892793,
            25.570789418188088,
            4639.199997629558,
            20.966287651468786,
            13.574351642777252,
            323.5662442638088,
            19.998433039589912,
            66.9206315211043,
            30.60928809381414,
            19.469266340183154,
            28.824157781229413,
            23.17950620463964,
            18.496952587762934,
            39.29014143991154,
            19.108981911196256,
            16.499905613294246,
            27.964292988906912,
            22.013496231679675,
            16.891731091130925,
            21.26142317725727,
            435.7849996955645,
            2733.969576826188,
            18.570486947743813,
            20.17493761498472,
            11.940638759679912,
            35.6289075914251,
            27.58254817830644,
            27.683366815915562,
            17.57995582501281,
            18.768980122502278,
            21.214800283229426,
            116.82478964415694,
            24.018854958128493,
            27.827094570391356,
            23.274192307374314,
            17.308192399148503,
            22.17307787735473,
            19.708636505420664,
            17.96322079716147,
            37.48989242974758,
            19.87797943085513,
            18.243497127695615,
            22.28509228622201,
            17.48538949258193,
            17.21944567506876,
            27.3567995818507,
            16.72479585590611,
            31.310567790809838,
            16.543908915805723,
            18.764376643389692,
            29.74544938423234,
            28.228942365829052,
            24.360250996,
            24.920825154302307,
            31.612192410237345,
            35.57395144818311,
            26.420558188107233,
            19.79173071603307,
            21.12974033478321
          ]
        ],
        "integrability": [
          [
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null
          ],
          [
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null,
            null
          ],
          [
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.9194395160552102,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            89.36382512376943,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.9475459004730196,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0,
            0.0
          ]
        ]
      }
    }
  },
  "misc": {
    "tid": 28,
    "cmd": [
      "domain_attachment",
      "FMinIter_Domain"
    ],
    "workdir": null,
    "idxs": {
      "Adadelta_clipnorm": [],
      "Adadelta_learning_rate": [],
      "Adam_clipnorm": [],
      "Adam_learning_rate": [],
      "Amsgrad_clipnorm": [
        28
      ],
      "Amsgrad_learning_rate": [
        28
      ],
      "Nadam_clipnorm": [],
      "Nadam_learning_rate": [],
      "activation_per_layer": [
        28
      ],
      "dropout": [
        28
      ],
      "epochs": [
        28
      ],
      "initial": [
        28
      ],
      "initializer": [
        28
      ],
      "nl1:-0/1": [
        28
      ],
      "nl2:-0/2": [],
      "nl2:-1/2": [],
      "nl3:-0/3": [],
      "nl3:-1/3": [],
      "nl3:-2/3": [],
      "nl4:-0/4": [],
      "nl4:-1/4": [],
      "nl4:-2/4": [],
      "nl4:-3/4": [],
      "nodes_per_layer": [
        28
      ],
      "optimizer": [
        28
      ],
      "stopping_patience": [
        28
      ]
    },
    "vals": {
      "Adadelta_clipnorm": [],
      "Adadelta_learning_rate": [],
      "Adam_clipnorm": [],
      "Adam_learning_rate": [],
      "Amsgrad_clipnorm": [
        0.000009949736889655482
      ],
      "Amsgrad_learning_rate": [
        0.001125607109055478
      ],
      "Nadam_clipnorm": [],
      "Nadam_learning_rate": [],
      "activation_per_layer": [
        0
      ],
      "dropout": [
        0.4
      ],
      "epochs": [
        39703.0
      ],
      "initial": [
        52.261799207196255
      ],
      "initializer": [
        1
      ],
      "nl1:-0/1": [
        32.0
      ],
      "nl2:-0/2": [],
      "nl2:-1/2": [],
      "nl3:-0/3": [],
      "nl3:-1/3": [],
      "nl3:-2/3": [],
      "nl4:-0/4": [],
      "nl4:-1/4": [],
      "nl4:-2/4": [],
      "nl4:-3/4": [],
      "nodes_per_layer": [
        0
      ],
      "optimizer": [
        1
      ],
      "stopping_patience": [
        0.11999999999999998
      ]
    },
    "space_vals": {
      "activation_per_layer": "tanh",
      "dropout": 0.4,
      "epochs": 39703,
      "initializer": "glorot_uniform",
      "integrability": {
        "initial": 10,
        "multiplier": null
      },
      "layer_type": "dense",
      "nodes_per_layer": [
        32,
        8
      ],
      "optimizer": {
        "clipnorm": 0.000009949736889655482,
        "learning_rate": 0.001125607109055478,
        "optimizer_name": "Amsgrad"
      },
      "positivity": {
        "initial": 52.261799207196255
      },
      "stopping_patience": 0.11999999999999998
    }
  },
  "exp_key": null,
  "owner": [
    "gcn15.local.snellius.surf.nl:3834410"
  ],
  "version": 3,
  "book_time": "2024-03-09 14:56:57.782000",
  "refresh_time": "2024-03-09 17:02:20.961000"
}
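
To see at a glance which folds produced the nulls, the entry can be loaded and scanned list by list. This is a minimal sketch that assumes only the nesting visible in the entry above (`result`, then `kfold_meta`, then `validation_losses`) and that the outer list runs over folds. The helper name `null_counts_per_fold` is hypothetical and not part of n3fit or hyperopt.

```python
import json


def null_counts_per_fold(per_fold_values):
    """Return, for each fold, how many entries are null (None after JSON parsing)."""
    return [sum(value is None for value in fold) for fold in per_fold_values]


# A tiny stand-in with the same nesting as the trial entry above
# (the real lists are much longer).
entry = json.loads("""
{
  "tid": 28,
  "result": {
    "kfold_meta": {
      "validation_losses": [[null, null, null], [19.5, 1002.4, 25.4]]
    }
  }
}
""")

counts = null_counts_per_fold(entry["result"]["kfold_meta"]["validation_losses"])
for fold_index, n_null in enumerate(counts):
    print(f"fold {fold_index}: {n_null} null entries")
```

The same helper could be pointed at any other list-of-lists in the entry, since it does not depend on the key name.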

Question

Is there an easy way to reproduce this trial exactly, i.e. using k-folding, either without hyperopt or with hyperopt but fixing the parameters for the first trial? I know that simply setting, for instance, the epoch range to min 100 and max 100 will fail the checks.

@RoyStegeman
Member

It indeed assumes all layers have the same activation.

I'm not sure I understand your last question. Setting the epoch range to min 100 and max 100 will indeed fail the checks, but the number of epochs of this trial is 39703, so setting that as the upper and lower bound should be fine. I guess that where it says "epochs": [28] this is just the tid being printed for whatever reason, and it's not actually being run with that setting.

To reproduce it exactly you could indeed fix the ranges of parameters in the hyperopt runcard such that they are limited to a specific value, setting them equal to the values of this trial.

Alternatively, I suppose you could do a regular fit but with the datasets you would fit in the hyperopt case. This will of course skip the hyperopt-specific computations, but if you don't get the nulls in that case you'd at least know they are caused by something specific to running hyperopt and not by the hyperparameters and dataset.

@APJansen
Collaborator Author

I guess that where it says "epochs": [28] this is just the tid being printed for whatever reason, and it's not actually being run with that setting.

Oh I hadn't even noticed that, no idea what that's about.

To reproduce it exactly you could indeed fix the ranges of parameters in the hyperopt runcard such that they are limited to a specific value, setting them equal to the values of this trial.

My point was that I think setting the min and max of any parameter to be equal (epochs was just an example) won't work. I've tried this before with epochs, and it fails one of the n3fit checks.
Also for the layers for example, I want it to be 32, 8 but can only set a number of layers and an overall min/max.

Can I just remove parameters from the hyperopt_config completely? I mean will it take them from parameters instead then?
In that case I can leave only, say, the epochs, and set the min to whatever was used in the trial and the max to one above that.
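
The idea of moving everything except the epochs out of the search space could be sketched as follows. This is only an illustration: it starts from the `space_vals` block of the trial entry, and the output key names (`search_space`, `min`, `max`) and the helper `pin_trial` are hypothetical, not the actual n3fit runcard schema. It also assumes, as discussed above, that the single reported activation applies to every layer except a final linear one.

```python
def pin_trial(space_vals):
    """Split a trial's sampled values into fixed parameters and a minimal search space."""
    fixed = dict(space_vals)  # shallow copy so the input is left untouched

    # Expand the single reported activation to one entry per layer,
    # assuming the last layer is always linear.
    n_layers = len(fixed["nodes_per_layer"])
    activation = fixed["activation_per_layer"]
    fixed["activation_per_layer"] = [activation] * (n_layers - 1) + ["linear"]

    # Keep only the epochs in the search space, with the narrowest range
    # that avoids min == max.
    epochs = fixed.pop("epochs")
    search_space = {"epochs": {"min": epochs, "max": epochs + 1}}
    return fixed, search_space


# Values copied from the "space_vals" block of trial 28.
trial_28 = {
    "activation_per_layer": "tanh",
    "dropout": 0.4,
    "epochs": 39703,
    "initializer": "glorot_uniform",
    "nodes_per_layer": [32, 8],
}

parameters, search_space = pin_trial(trial_28)
print(parameters["activation_per_layer"])
print(search_space)
```

Whether the fit actually falls back to the fixed values for keys missing from the search space is exactly the open question here, so this only shows the bookkeeping, not the behaviour of n3fit.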

@APJansen
Collaborator Author

Another requirement to reproduce a trial is of course the hyperopt seed. I remember we discussed this at some point, but I guess we haven't gotten to implementing it so that it is fully reproducible, I see the hyperopt seed is just set to 42. Or am I forgetting something @Cmurilochem?

@RoyStegeman
Member

I want it to be 32, 8 but can only set a number of layers and an overall min/max.

That's a good point. At some point I do remember running the same configuration many times in a hyperopt setup because I wanted to get a feel for the statistical fluctuations in the hyperopt loss of "good" configurations. I thought that was done by fixing those things in the runcard, but you're right, that is not possible.

Can I just remove parameters from the hyperopt_config completely? I mean will it take them from parameters instead then?

I don't remember if this is an option. If not, you could still override the sampling of the layer widths by hardcoding the settings you want instead. Since the purpose is to understand where the NaNs are coming from, I don't think we should care too much about being able to reproduce this later on.

@Cmurilochem
Collaborator

Another requirement to reproduce a trial is of course the hyperopt seed. I remember we discussed this at some point, but I guess we haven't gotten to implementing it so that it is fully reproducible, I see the hyperopt seed is just set to 42. Or am I forgetting something @Cmurilochem?

Hi @APJansen. You are right. We have fixed this to 42.

@APJansen
Collaborator Author

The approach I mentioned of just reducing the search space to the epochs and having it pick up the rest from the non-hyperopt settings in the runcard works. The scripts runtrial.slurm along with create_trial_runcard.py here automate rerunning a single trial in hyperopt mode (with the only exception that the seeds are different, I think, so not fully reproducing it).

What happens in this example of trial 28 is that in folds 1 and 2, the warning "Nan found, stopping activated" is triggered, after 12k and 16k epochs. The loss reported just before is about 1e10, so it is plausible that it has diverged, although the loss was not any smaller further back either.

These warnings correspond exactly (at least, in their counts in the log of the 5 day hyperopt run) to the instances of "Fold <..> finished, loss=nan (...) pass=True".

So I would say that the information that the training blew up is not properly passed on. Ideally we'd stop the trial after one fold hits this and not continue with the next folds.

Can someone look into this? (Our time is really limited now unfortunately)
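
A possible shape for the "stop the trial after one fold hits this" behaviour suggested above is sketched below. It is not the n3fit implementation: `run_folds` and `train_fold` are hypothetical names, `train_fold` stands in for whatever trains one fold and returns its loss, and the status strings and the plain average are illustrative only.

```python
import math


def run_folds(train_fold, n_folds):
    """Train folds in order, aborting the trial at the first non-finite loss."""
    losses = []
    for fold in range(n_folds):
        loss = train_fold(fold)
        if loss is None or not math.isfinite(loss):
            # Report the failure explicitly instead of carrying a NaN forward,
            # and skip the remaining folds.
            return {"status": "fail", "failed_fold": fold, "fold_losses": losses}
        losses.append(loss)
    # Plain average over folds, used here only for illustration.
    return {"status": "ok", "loss": sum(losses) / len(losses), "fold_losses": losses}


# Toy stand-in: the fold with index 1 diverges.
def toy_train_fold(fold):
    return float("nan") if fold == 1 else 20.0 + fold


result = run_folds(toy_train_fold, n_folds=4)
print(result)
```

With this structure a diverged fold can no longer end up as a trial with status "ok" and a null loss, which is the combination visible in the trial 28 entry.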

@Radonirinaunimi
Member

Can someone look into this? (Our time is really limited now unfortunately)

Hi @APJansen, thanks a lot for the investigation! I can have a look at this but unfortunately this won't be before the end of this week. Are you relying on this to produce more samples?

@APJansen
Collaborator Author

APJansen commented Apr 2, 2024

Yes @Radonirinaunimi, we haven't been running anything for the past 2 weeks. I spent some hours on this and PR #2014 today. The issue here I think is around this line. I've added some monitoring and am rerunning this trial now from PR #2014.
If we can decide what criterion to use here and get #2012 and #2016 merged tomorrow, we can hopefully start running again.

@scarlehoff
Member

Not sure whether we addressed this issue specifically, but we are now running hyperopt normally and not seeing these problems, so I'm going to close this issue for the time being.

@scarlehoff closed this as not planned on Sep 24, 2024