Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RAII issue by introducing wrapper classes for backend plans #208

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

yasahi-hpc
Copy link
Collaborator

@yasahi-hpc yasahi-hpc commented Dec 3, 2024

Resolves #197

In this PR, we try to resolve RAII issues by introducing a thin wrapper for backend fft plans.
Following modifications are made,

  • Introducing a KokkosFFT_FFTW_Types.hpp file including the FFTW plan wrapper class and common code bases related fftw
  • Introducing a wrapper class for automatic deallocation. Wrappers are added for FFTW, cufft, hipfft, and rocfft (Scoped***Plan)
    For CUDA, we have
struct ScopedCufftPlanType {
   cufftHandle m_plan;
   // Should we raise an error here, if it fails?
   ScopedCufftPlanType() { cufftCreate(&m_plan); }
   
   // Should we abort here, if it fails?
   ~ScopedCufftPlanType() { cufftDestroy(m_plan); }

   cufftHandle &plan() { return m_plan; }
};
  • Info and Buffer that are required only for rocfft are moved inside rocfft helper class
  • Add missing cleanup_threads() in the destructor of FFTW helper class
  • Remove destroy_plan_and_info since it is automatically done in destructors of helper classes

Questions

  1. We are creating plans inside a constructor for cufft and hipfft. Should we raise assertions in constructor if it fails?
  2. What should we do if destroy functions in destructor fails. Should we abort?

@yasahi-hpc yasahi-hpc self-assigned this Dec 3, 2024
@yasahi-hpc yasahi-hpc added bug Something isn't working enhancement New feature or request cleanup labels Dec 3, 2024
@yasahi-hpc yasahi-hpc changed the title Fix raii issue Fix RAII issue by introducing wrapper classes for backend plans Dec 3, 2024
@yasahi-hpc yasahi-hpc marked this pull request as draft December 3, 2024 15:04
@yasahi-hpc yasahi-hpc marked this pull request as ready for review December 4, 2024 19:30
@tpadioleau
Copy link
Member

I would go a step further and make these RAII classes unique-owners. That implies to disallow shallow copy either by simply deleting copy constructor and assignment or make them allocate and copy a new plan if backends offer such a copy function. We can keep the move semantics but we have to keep track if the current object must call cufftDestroy in the destructor. If we delete both the copy and move semantics that gives:

class ScopedCufftPlanType {
 private:
  cufftHandle m_plan;

 public:
  ScopedCufftPlanType() {
    cufftResult cufft_rt = cufftCreate(&m_plan);
    KOKKOSFFT_THROW_IF(cufft_rt != CUFFT_SUCCESS, "cufftCreate failed");
  }

  ScopedCufftPlanType(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType(ScopedCufftPlanType&& rhs) = delete;

  ~ScopedCufftPlanType() noexcept {
    cufftResult cufft_rt = cufftDestroy(m_plan);
    if (cufft_rt != CUFFT_SUCCESS) Kokkos::abort("cufftDestroy failed");
  }

  ScopedCufftPlanType& operator=(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType& operator=(ScopedCufftPlanType&& rhs) = delete;

  cufftHandle plan() const noexcept { return m_plan; }
};

The ScopedCufftPlanType::plan can return the backend handle by copy for CUDA, I don't know for the other backends.

I notice that we could also improve the error messages, I see that cufftCreate can return different error types https://docs.nvidia.com/cuda/cufft/index.html?highlight=cufftMakePlan1d#c.cufftCreate.

@yasahi-hpc
Copy link
Collaborator Author

I would go a step further and make these RAII classes unique-owners. That implies to disallow shallow copy either by simply deleting copy constructor and assignment or make them allocate and copy a new plan if backends offer such a copy function. We can keep the move semantics but we have to keep track if the current object must call cufftDestroy in the destructor.

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

If we delete both the copy and move semantics that gives:

class ScopedCufftPlanType {
 private:
  cufftHandle m_plan;

 public:
  ScopedCufftPlanType() {
    cufftResult cufft_rt = cufftCreate(&m_plan);
    KOKKOSFFT_THROW_IF(cufft_rt != CUFFT_SUCCESS, "cufftCreate failed");
  }

  ScopedCufftPlanType(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType(ScopedCufftPlanType&& rhs) = delete;

  ~ScopedCufftPlanType() noexcept {
    cufftResult cufft_rt = cufftDestroy(m_plan);
    if (cufft_rt != CUFFT_SUCCESS) Kokkos::abort("cufftDestroy failed");
  }

  ScopedCufftPlanType& operator=(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType& operator=(ScopedCufftPlanType&& rhs) = delete;

  cufftHandle plan() const noexcept { return m_plan; }
};

The ScopedCufftPlanType::plan can return the backend handle by copy for CUDA, I don't know for the other backends.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I notice that we could also improve the error messages, I see that cufftCreate can return different error types https://docs.nvidia.com/cuda/cufft/index.html?highlight=cufftMakePlan1d#c.cufftCreate.

That is true. We can improve the error messages

@tpadioleau
Copy link
Member

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

Ok for me. I don't think there should be any problem with implementing the move semantics later. It requires us to only pass these objects by reference.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I mean if we do

ScopedCufftPlanType plan1;
ScopedCufftPlanType plan2;
plan2 = plan1;

This compiles fine and plan2 will get the same cufftHandle than plan1. The plan2 will destroy correctly the handle but then plan1 will likely fail to do so.

@yasahi-hpc
Copy link
Collaborator Author

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

Ok for me. I don't think there should be any problem with implementing the move semantics later. It requires us to only pass these objects by reference.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I mean if we do

ScopedCufftPlanType plan1;
ScopedCufftPlanType plan2;
plan2 = plan1;

This compiles fine and plan2 will get the same cufftHandle than plan1. The plan2 will destroy correctly the handle but then plan1 will likely fail to do so.

That is right. This does not work in the current implementation. Yes, for safety, it is better to disallow copy

fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_transform.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_transform.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_FFTW_Types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Outdated Show resolved Hide resolved
void allocate_work_buffer(std::size_t workbuffersize) {
m_buffer = BufferViewType("workbuffer", workbuffersize);
}
rocfft_plan &plan() { return m_plan; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rocfft_plan &plan() { return m_plan; }
rocfft_plan plan() const noexcept { return m_plan; }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Comment on lines 133 to 135
[](rocfft_plan_description* desc) {
rocfft_plan_description_destroy(*desc);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[](rocfft_plan_description* desc) {
rocfft_plan_description_destroy(*desc);
});
[](rocfft_plan_description* desc) {
rocfft_plan_description_destroy(*desc);
delete desc;
});

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced a wrapper for description as well for RAII. This part has been removed

[](rocfft_plan_description* desc) {
rocfft_plan_description_destroy(*desc);
});
rocfft_status status = rocfft_plan_description_create(&(*description));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can improve the situation by also wrapping rocfft_plan_description.

In case rocfft_plan_description_create fails, the next KOKKOSFFT_THROW_IF will throw, calling the destructor of description thus calling rocfft_plan_description_destroy even though the rocfft_plan_description object has not been initialised.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have introduced a wrapper accordingly

Comment on lines 164 to 169
plan = std::make_unique<PlanType>();
status = rocfft_plan_create(&(*plan), place, fft_direction, precision,
status = rocfft_plan_create(&(plan->plan()), place, fft_direction, precision,
reversed_fft_extents.size(), // Dimension
reversed_fft_extents.data(), // Lengths
howmany, // Number of transforms
description // Description
howmany, // Number of transforms
*description // Description
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not making this part of the constructor of ScopedRocfftPlan ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic into the constructor

Copy link
Collaborator Author

@yasahi-hpc yasahi-hpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpadioleau
Thank you for reviews. I have fixed accordingly

Comment on lines +110 to +114
if constexpr (std::is_same_v<plan_type, fftwf_plan>) {
if (m_is_created) fftwf_destroy_plan(m_plan);
} else {
if (m_is_created) fftw_destroy_plan(m_plan);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may expect that fftw_plan_many_dft may fail if we request too much memory

fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_transform.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_transform.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_Cuda_types.hpp Outdated Show resolved Hide resolved
fft/src/KokkosFFT_ROCM_types.hpp Outdated Show resolved Hide resolved
void allocate_work_buffer(std::size_t workbuffersize) {
m_buffer = BufferViewType("workbuffer", workbuffersize);
}
rocfft_plan &plan() { return m_plan; }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Comment on lines 133 to 135
[](rocfft_plan_description* desc) {
rocfft_plan_description_destroy(*desc);
});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced a wrapper for description as well for RAII. This part has been removed

[](rocfft_plan_description* desc) {
rocfft_plan_description_destroy(*desc);
});
rocfft_status status = rocfft_plan_description_create(&(*description));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have introduced a wrapper accordingly

Comment on lines 164 to 169
plan = std::make_unique<PlanType>();
status = rocfft_plan_create(&(*plan), place, fft_direction, precision,
status = rocfft_plan_create(&(plan->plan()), place, fft_direction, precision,
reversed_fft_extents.size(), // Dimension
reversed_fft_extents.data(), // Lengths
howmany, // Number of transforms
description // Description
howmany, // Number of transforms
*description // Description
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic into the constructor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cleanup enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RAII issue
2 participants