
one task does not run only on one core #2

Open
QianqianHan96 opened this issue Jun 16, 2023 · 2 comments

QianqianHan96 commented Jun 16, 2023

When I train the RF model with one core, it takes 1 hour. If I set n_jobs=-1 it uses all cores and takes 9 minutes, so there is a big difference during training.
However, when I predict, there is no difference: the prediction time is the same for both trained models, and neither uses only one core (more than 10 cores are running).

[screenshot]
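For reference, the training-time difference between one core and all cores can be reproduced with a small sketch. The data shapes and forest size here are hypothetical placeholders, not the real model:

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real training data (hypothetical shapes).
rng = np.random.default_rng(0)
X = rng.random((2000, 10))
y = rng.random(2000)

def fit_timed(n_jobs):
    """Fit a small forest and return it with its wall-clock fit time."""
    model = RandomForestRegressor(n_estimators=50, n_jobs=n_jobs, random_state=0)
    start = time.perf_counter()
    model.fit(X, y)
    return model, time.perf_counter() - start

model_serial, t_serial = fit_timed(n_jobs=1)      # trees built on one core
model_parallel, t_parallel = fit_timed(n_jobs=-1)  # trees built on all cores

print(f"n_jobs=1:  {t_serial:.2f} s")
print(f"n_jobs=-1: {t_parallel:.2f} s")
```

With the same `random_state`, both settings produce the same forest; `n_jobs` only changes how many trees are built concurrently, which is why the fitted models predict identically.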


QianqianHan96 commented Jun 17, 2023

I realized why prediction uses more than one core. The problem is not the trained model (the predict() call itself) but the array data preparation that runs before predict() (see screenshot 1). I printed the running time of the data preparation and of predict() for both models: the predict() time differs between the two models, which means the models are different, but the data-preparation time is the same and takes up most of the total time (e.g., when I predict 4 timesteps, predict() takes 0.1 s per timestep, the data preparation takes 4.8 s, and the total is 19 s; see screenshot 2).
I do not know why the script in screenshot 1 uses more than 20 cores, and I do not know how to control it. In my opinion this part is the same for every spatial unit and timestep, so we could just ignore it, or do we still want to make everything run on one core?
[screenshot 1]
[screenshot 2]
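If the extra cores during data preparation come from numpy's native threadpools (BLAS/OpenMP), one way to cap them at runtime is the threadpoolctl package (a separate install, `pip install threadpoolctl`; it ships as a scikit-learn dependency). This is a sketch against a toy array, not the actual script:

```python
import numpy as np
from threadpoolctl import threadpool_limits

rng = np.random.default_rng(0)
a = rng.random((500, 500))

# Cap the native (BLAS/OpenMP) threadpools at one thread inside this
# context only; the default thread counts are restored on exit.
with threadpool_limits(limits=1):
    result = a @ a  # the matrix multiply now runs on a single core
```

Alternatively, setting environment variables such as `OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=1`, or `MKL_NUM_THREADS=1` before the process starts has the same effect globally.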

geek-yang (Member) commented:

About multi-core utilization in your data preparation: numpy is implemented in C and can run in parallel in the backend (that is to say, numpy operations are not constrained by Python's GIL, which is why its performance is so good!). In this case you convert your array to numpy and perform numpy array operations like reshape and concatenate, so it is not surprising that numpy uses multiple cores to accelerate your computation.
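A quick way to see which native libraries numpy (and scikit-learn) have loaded, and how many threads each of their threadpools is configured to use, is threadpoolctl's `threadpool_info()` (assuming threadpoolctl is installed):

```python
import numpy as np  # importing numpy loads its BLAS backend (e.g. OpenBLAS or MKL)
from threadpoolctl import threadpool_info

# Each entry describes one native threadpool present in the process,
# including its API (blas/openmp) and current thread count.
pools = threadpool_info()
for pool in pools:
    print(pool["user_api"], pool.get("internal_api"), pool["num_threads"])
```

If one of these pools reports a thread count near the machine's core count, that would explain the 20+ busy cores during the data-preparation step.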
