[Core][Parallelization] Shall we change our parallel utils to use dynamic scheduling instead of static? #12924

Open
pooyan-dadvand opened this issue Dec 11, 2024 · 7 comments · May be fixed by #12923

Comments

@pooyan-dadvand
Member

The default scheduling in OpenMP implementations is typically static, because it has lower overhead than the alternatives.

However, many modern CPUs have two types of cores: strong (performance) ones and efficient ones. This makes the current static loops inefficient, because the strong cores finish their share of the work much earlier than the others and then wait idle for a long time. In our test, the result was about 3x slower on a 24-core laptop.

My proposal is:

  1. Change the default schedule to dynamic
  2. Add a template argument for the schedule to all for_each functions, so that a different schedule can be selected in special situations
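A minimal sketch of what the two points above could look like. The names `Schedule`, `ForEachSketch` and `RunDemo` are hypothetical and only stand in for the real for_each utilities; the actual signatures in the code base may differ. The sketch assumes C++17 (for `if constexpr`), and the pragmas are simply ignored when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical schedule selector (not an existing type in the parallel utilities).
enum class Schedule { Static, Dynamic };

// Hypothetical stand-in for a for_each utility: the schedule is a template
// argument that defaults to dynamic, as proposed in point 1.
template <Schedule TSchedule = Schedule::Dynamic, class TContainer, class TFunction>
void ForEachSketch(TContainer& rContainer, TFunction&& rFunction)
{
    // Signed index type, since older OpenMP versions require a signed loop variable.
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(rContainer.size());
    if constexpr (TSchedule == Schedule::Dynamic) {
        #pragma omp parallel for schedule(dynamic)
        for (std::ptrdiff_t i = 0; i < size; ++i) {
            rFunction(rContainer[i]);
        }
    } else {
        #pragma omp parallel for schedule(static)
        for (std::ptrdiff_t i = 0; i < size; ++i) {
            rFunction(rContainer[i]);
        }
    }
}

// Usage sketch: default (dynamic) schedule first, explicit static schedule second.
void RunDemo()
{
    std::vector<int> values(1000, 1);
    ForEachSketch(values, [](int& rValue) { rValue += 1; });
    ForEachSketch<Schedule::Static>(values, [](int& rValue) { rValue *= 2; });
    long sum = 0;
    for (int value : values) sum += value;
    assert(sum == 4000); // every entry is (1 + 1) * 2 = 4
}
```

Existing call sites would keep compiling unchanged with this layout, because the schedule argument comes first and has a default; only the special cases would need to spell out `ForEachSketch<Schedule::Static>(...)`.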

@KratosMultiphysics/technical-committee

@loumalouomega
Member

Related #12923

@loumalouomega
Member

I would also suggest detecting the OpenMP version so we can add more modern stuff (OpenMP is updated on Linux/Mac, but not on Windows). This can be done with the _OPENMP macro, for example:

#include <iostream>

// Define macros for different OpenMP versions.
// _OPENMP expands to the yyyymm release date of the supported specification.
#if defined(_OPENMP)
    #if _OPENMP >= 201811
        #define OPENMP_VERSION "OpenMPv5.0+"
    #elif _OPENMP >= 201511
        #define OPENMP_VERSION "OpenMPv4.5+"
    #elif _OPENMP >= 200805
        #define OPENMP_VERSION "OpenMPv3.0+"
    #else
        #define OPENMP_VERSION "OpenMPv2.0"
    #endif
#else
    #define OPENMP_VERSION "OpenMP is not supported"
#endif

int main() {
    std::cout << OPENMP_VERSION << std::endl;
    return 0;
}

@matekelemen
Contributor

I'm not against dynamic scheduling, but

  1. I don't like the reasoning. Users must at least know the basics of their hardware; we just cannot rack up small performance hits in an effort to pamper every type of hardware at the same time. The performance/efficiency core tradeoff is very similar to hyperthreading, which completely murders our performance.
  2. It'd be best to stop relying on OpenMP in the long run and replace it with either raw C++11 threads (or jthreads), or some 3rd party lib.

@loumalouomega
Member

> I'm not against dynamic scheduling, but
>
> 1. I don't like the reasoning. Users must at least know the basics of their hardware; we just cannot rack up small performance hits in an effort to pamper every type of hardware at the same time. The performance/efficiency core tradeoff is very similar to hyperthreading, which completely murders our performance.

We can refactor the utilities to accept more arguments in order to choose the mechanism, but if we want to minimize changes this is the simplest way.

> 2. It'd be best to stop relying on OpenMP in the long run and replace it with either raw C++11 threads (or jthreads), or some 3rd party lib.

Wouldn't the C++17 parallel algorithms be better? As for alternatives, we have mostly tested TBB from Intel, but that can be problematic.

@RiccardoRossi
Member

> I would also suggest detecting the OpenMP version so we can add more modern stuff (OpenMP is updated on Linux/Mac, but not on Windows); this can be done with the _OPENMP macro.

About this: the point is that we are essentially restricted to the lowest version because of MSVC. I don't love the idea of having compile-time dependencies on the OpenMP version and different behaviours depending on it (but this is just my CURRENT personal opinion, and I am open to contributions about this).

Also, I agree with @matekelemen that we should transition away from OpenMP and move towards native parallelism.

The point I am raising here is, however, slightly different: as of now we are doing the scheduling by hand, based on partitioning the range into a few chunks (as many as there are cores). If we want to change to dynamic scheduling, we should change the chunking first.

On the positive side, I think that @loumalouomega's argument for transitioning to dynamic comes from the trend towards heterogeneous cores (E-cores and P-cores on Intel). In a context in which not all the cores are the same, it does make sense to use dynamic over static...
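The chunking point above can be illustrated with a rough sketch. With one chunk per core, a dynamic schedule has nothing left to hand out once every thread has taken its chunk, so the range would have to be split into more chunks than threads. `MakeChunkBoundaries` and `ForEachIndexChunked` are hypothetical names, not the existing partitioning utilities, and the choice of the number of chunks is left to the caller as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split [0, size) into numChunks nearly equal pieces and
// return the numChunks + 1 boundaries.
std::vector<std::ptrdiff_t> MakeChunkBoundaries(std::ptrdiff_t size, std::ptrdiff_t numChunks)
{
    assert(size >= 0 && numChunks > 0);
    std::vector<std::ptrdiff_t> boundaries(static_cast<std::size_t>(numChunks) + 1);
    for (std::ptrdiff_t i = 0; i <= numChunks; ++i) {
        boundaries[static_cast<std::size_t>(i)] = (size * i) / numChunks;
    }
    return boundaries;
}

// Hypothetical loop: chunks (not single items) are handed out dynamically, so a
// faster core can pick up extra chunks while a slower one is still busy.
template <class TFunction>
void ForEachIndexChunked(std::ptrdiff_t size, std::ptrdiff_t numChunks, TFunction&& rFunction)
{
    const std::vector<std::ptrdiff_t> boundaries = MakeChunkBoundaries(size, numChunks);
    #pragma omp parallel for schedule(dynamic)
    for (std::ptrdiff_t c = 0; c < numChunks; ++c) {
        const std::ptrdiff_t begin = boundaries[static_cast<std::size_t>(c)];
        const std::ptrdiff_t end = boundaries[static_cast<std::size_t>(c) + 1];
        for (std::ptrdiff_t i = begin; i < end; ++i) {
            rFunction(i);
        }
    }
}
```

Calling `ForEachIndexChunked(size, numChunks, ...)` with `numChunks` equal to the number of threads roughly reproduces the current one-chunk-per-core partitioning, while a larger value gives the dynamic schedule room to rebalance, at the price of more scheduling overhead.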

@loumalouomega
Member

> On the positive side, I think that @loumalouomega's argument for transitioning to dynamic comes from the trend towards heterogeneous cores (E-cores and P-cores on Intel). In a context in which not all the cores are the same, it does make sense to use dynamic over static...

We can also define it at run time, detecting the CPU type before assigning the schedule type.
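A rough sketch of the run-time variant, under stated assumptions: loops are declared with `schedule(runtime)` and the schedule is set once at start-up with `omp_set_schedule`, which requires OpenMP 3.0 or later (so it is compiled out for older versions such as the one MSVC provides). The CPU detection itself is not shown, since there is no portable API for it; `ChooseSchedule`, `ApplySchedule`, `ScaleAll` and the `hasHeterogeneousCores` flag are hypothetical placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical schedule selector.
enum class Schedule { Static, Dynamic };

// Placeholder policy: how hasHeterogeneousCores is detected is platform-specific
// and deliberately left out of this sketch.
Schedule ChooseSchedule(bool hasHeterogeneousCores)
{
    return hasHeterogeneousCores ? Schedule::Dynamic : Schedule::Static;
}

// Sets the schedule used by every loop declared with schedule(runtime).
// A chunk size of 0 asks the OpenMP runtime for its default chunk size.
void ApplySchedule(Schedule schedule)
{
#if defined(_OPENMP) && _OPENMP >= 200805
    omp_set_schedule(schedule == Schedule::Dynamic ? omp_sched_dynamic : omp_sched_static, 0);
#else
    (void)schedule; // no run-time control available: the compiled-in default is used
#endif
}

// Example loop whose schedule is decided at run time.
void ScaleAll(std::vector<double>& rValues, double factor)
{
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(rValues.size());
    #pragma omp parallel for schedule(runtime)
    for (std::ptrdiff_t i = 0; i < size; ++i) {
        rValues[static_cast<std::size_t>(i)] *= factor;
    }
}

// Usage sketch: pick the schedule once, then run loops as usual.
void RunDemo()
{
    ApplySchedule(ChooseSchedule(true));
    std::vector<double> values(100, 1.0);
    ScaleAll(values, 2.0);
    assert(values.front() == 2.0 && values.back() == 2.0);
}
```

Without `omp_set_schedule`, `schedule(runtime)` loops still take their schedule from the OMP_SCHEDULE environment variable, so users could select the schedule themselves without any detection code.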
