-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove solver_.parameters_ #668
base: main
Are you sure you want to change the base?
Conversation
I do see some calls where solver.solver_.parameters_ is not being modifiedbut rather the value is getting used. Example: EXPECT_EQ(solver.solver_.parameters_.absolute_tolerance_[state.variable_map_["Ar"]], 1.0e-12); I am unsure if we need to change these as well? |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #668 +/- ##
==========================================
+ Coverage 93.19% 93.56% +0.37%
==========================================
Files 53 53
Lines 3586 3607 +21
==========================================
+ Hits 3342 3375 +33
+ Misses 244 232 -12 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right direction, but we still have some more work to do. Any mechanism-specific thing (like tolerances) need to be moved somewhere else. We should spend some time figuring out where they need to go. Also, the parameters need to be any set of paramters (rosenbrock, backward euler, etc.)
d059401
to
32a16e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @montythind for working on this PR. I observed the following failures on Derecho:
15 - cuda_rosenbrock (SEGFAULT)
31 - rosenbrock (SEGFAULT)
45 - regression_test_solve (NUMERICAL)
46 - chapman_integration (SEGFAULT)
47 - analytical_rosenbrock_integration (NUMERICAL)
48 - analytical_backward_euler (NUMERICAL)
49 - terminator (NUMERICAL)
50 - integrated_reaction_rates (NUMERICAL)
52 - README_example (NUMERICAL)
53 - solve_results (NUMERICAL)
54 - multiple_grid_cells (NUMERICAL)
55 - rate_constants_no_user_defined_example_by_hand (NUMERICAL)
56 - rate_constants_user_defined_example_by_hand (NUMERICAL)
57 - solver_configuration (NUMERICAL)
59 - rate_constants_no_user_defined_example_with_config (NUMERICAL)
60 - rate_constants_user_defined_example_with_config (NUMERICAL)
61 - carbon_bond_5_example (NUMERICAL)
62 - chapman_example (NUMERICAL)
63 - robertson_example (NUMERICAL)
64 - ts1_example (NUMERICAL)
I am not sure what happened but after my latest pull and merge- lot of tests failed. I am looking what caused the test failures |
@sjsprecious we've addressed some of your comments. Also, here is the timing of the cuda tests:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @K20shores for working on this giant PR. It looks good to me and I just have a few minor questions.
absolute_tolerance_param_.number_of_elements_ = this->absolute_tolerance_.size(); | ||
absolute_tolerance_param_.number_of_grid_cells_ = 1; | ||
|
||
CHECK_CUDA_ERROR(micm::cuda::MallocVector<double>(absolute_tolerance_param_, absolute_tolerance_param_.number_of_elements_), "cudaMalloc"); | ||
CHECK_CUDA_ERROR(micm::cuda::CopyToDevice<double>(absolute_tolerance_param_, this->absolute_tolerance_), "cudaMemcpyHostToDevice"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a second thought, shall we copy the absolute tolerance in the constructor or somewhere instead? The variables_
and rate_constants_
are changed every time step but the absolute_tolerance_param_
is not. Thus we should only need to copy it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyt ideas where else we can copy these in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do it in the constructor of CudaState
here: https://github.com/NCAR/micm/blob/main/include/micm/cuda/solver/cuda_state.hpp#L24-L25, assuming that the absolute_tolerance_param_
is already initialized properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @montythind , I have pushed the changes needed to address this comment. Let me know if you have any questions about it.
@K20shores The CUDA LU decomposition and process set tests seem to take lots of time than what I observed on Derecho. Are they running on Derecho or your local laptop? |
This was on derecho |
Interesting. I just ran the CUDA tests on my branch for the LU decomposition rename PR and it ran much faster.
Can you run it on a scratch directory in case it is an issue due to the filesystem for the home directory? |
Oh man. That's drastically longer. I will try scratch and see what happens |
@sjsprecious I guess it was a scratch issue?
|
@K20shores The new result is consistent with mine now. I noticed that you were running the CUDA tests on your home directory on Derecho and I think that was the problem. |
This PR includes moving Parameters
absolute tolerance and relative tolearnce to state from solver
Note:
I have left
std::vector absolute_tolerance_;
double relative_tolerance_{ 1e-6 };
in rosenbrock_solver_prarameter.hpp as the CUDA tests are failing because tolerance doesnt get copied over to the device. Issue occurs in CudaRosenbrockSolver constructor when doing this->parameters_.absolute_tolerance_.size(); it comes out as zero
We can potential get rid of this line in solver_builder.inl. This is also a temporary. solution.
options.absolute_tolerance_ = state_parameters.absolute_tolerance_;
Along with that i have commented out unit tests till we don't have a solution how to deal with tolerance.
closes #621