Hi!
Thank you for creating TPOT!
I'm using TPOT==0.12.2.
My server has 96 cores, and I always notice that TPOT uses all cores only for the first 5-7 minutes of each "generation" (right?). After that, TPOT uses only 1 core for the next few hours.
I tried to limit that process's lifetime using different combinations of arguments (sketched below):
generations = None
max_time_mins = 30
max_eval_time_mins = 30
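For context, here is a minimal sketch of how these arguments fit into the constructor; the data here is a placeholder, not my real setup:

from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data just so the sketch is runnable.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tpot = TPOTClassifier(
    generations=None,       # no fixed generation count; rely on the time budget
    max_time_mins=30,       # total optimization budget
    max_eval_time_mins=30,  # budget for evaluating a single pipeline
    n_jobs=-1,              # use all 96 cores
    verbosity=2,
)
tpot.fit(X_train, y_train)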
At first I thought that "max_time_mins" works like "signal" in Python and raises an exception to stop the batch of processes that you call a "generation". But it doesn't, and I don't understand why.
Could you tell me what combination of arguments I should set to stop one "generation" after 30 minutes?
P.S.
I've created a custom metric to explore the problem and found some strange behavior. For example: TPOT found a good pipeline at the beginning of the process (within the first 5 minutes after the start, while all 96 cores were working hard). The metric of that pipeline was printed, and it was "perfect" for me. But after that, 95 of the processes finished and 1 process kept working for the next hour. After an hour, TPOT finished all jobs and returned in fitted_pipeline_ NOT the "perfect" pipeline, but a "random"(?) or last(?) pipeline with a very bad score. I wanted the "perfect" pipeline (with the printed score), but I received a bad one. Why? If it is a bug, it is probably connected with that one super-long process.
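For reference, the custom metric is wired in through the scoring argument; the metric body below is a simplified placeholder, not my real one:

from sklearn.metrics import balanced_accuracy_score, make_scorer

def my_metric(y_true, y_pred):
    # Placeholder metric; my real one also logs the score, which is how
    # I saw the "perfect" pipeline being evaluated.
    score = balanced_accuracy_score(y_true, y_pred)
    print(f"pipeline score: {score:.4f}")
    return score

tpot = TPOTClassifier(
    generations=None,
    max_time_mins=30,
    max_eval_time_mins=30,
    n_jobs=-1,
    scoring=make_scorer(my_metric),
)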
P.P.S.
It's very painful to watch a single working core for a few hours.
Each generation, TPOT evaluates population_size pipelines. What often happens is that all of the pipelines complete except for one (often an SVC). TPOT will not evaluate the next generation/batch until the current batch is completed, so you will see only one core utilized until that individual finishes.
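A toy illustration of that barrier (not TPOT's actual code): one straggler blocks the whole generation while the other workers sit idle.

import random
import time
from concurrent.futures import ProcessPoolExecutor

def evaluate(i):
    # One unlucky individual (think: SVC on a big dataset) runs far longer.
    time.sleep(600 if i == 0 else random.uniform(1, 5))
    return i

with ProcessPoolExecutor(max_workers=96) as pool:
    for generation in range(5):
        # map() blocks until *every* pipeline in the batch resolves, so a
        # single slow evaluation leaves the remaining 95 cores idle.
        scores = list(pool.map(evaluate, range(96)))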
Another bug is that in some cases TPOT is not able to actually terminate the long-running pipeline, which can then run for a very long time. I think that is likely what is happening here.
I am not sure about your second issue, where it is not selecting the best score. Was this an sklearn scorer?
We resolved these issues in the next version of the package, called TPOT2. This version should correctly time out all pipelines. TPOT2 also has support for custom objective functions, and it returns a pandas DataFrame with all evaluated pipelines and their scores to make them easier to access. You can find that here. Note that currently the search_space_api branch is the latest version.
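A rough sketch of what accessing that DataFrame looks like; the constructor arguments and the evaluated_individuals attribute name are assumptions here, so check the branch documentation for the exact names:

# Sketch only: argument and attribute names are assumptions based on
# TPOT2's estimator API; check the branch documentation for exact names.
import tpot2

est = tpot2.TPOTClassifier(n_jobs=96)
est.fit(X_train, y_train)

results = est.evaluated_individuals  # one row per evaluated pipeline
print(results.head())                # includes the score columns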
There is also a second version of the evolutionary algorithm included in TPOT2, called TPOTEstimatorSteadyState, which can be found here. This version does not wait for batches to be completed: as soon as a pipeline finishes evaluation, another one is submitted. This ensures that all cores are always utilized.
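The steady-state idea, again as a toy sketch rather than TPOT2's actual code: each freed worker slot is refilled immediately, so there is no generation barrier.

import random
import time
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

def evaluate(i):
    time.sleep(random.uniform(1, 600))  # evaluations of wildly varying cost
    return i

n_workers, budget = 96, 1000
with ProcessPoolExecutor(max_workers=n_workers) as pool:
    pending = {pool.submit(evaluate, i) for i in range(n_workers)}
    submitted = n_workers
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for _ in done:
            if submitted < budget:
                # Refill the freed slot right away: no waiting for the
                # rest of the batch, so all cores stay busy.
                pending.add(pool.submit(evaluate, submitted))
                submitted += 1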