
Do not try to compute initial solution for inactive multi-segment wells split across processors #5751

Merged: 3 commits into OPM:master, Nov 21, 2024

Conversation

@vkip (Member) commented Nov 19, 2024:

No description provided.

@vkip (Member, Author) commented Nov 19, 2024:

jenkins build this please

@bska (Member) commented Nov 19, 2024:

For what it's worth, this PR allows me to run a field case as `mpirun -np 14` which, without this PR, crashes very early in the initialisation process.

I'll nevertheless defer to those more familiar with this part of the code to review the PR as there may be aspects of the structure that I don't fully grasp.

@GitPaean (Member) commented:

Is there any more information regarding the symptom? Where exactly does it crash?

@bska (Member) commented Nov 20, 2024:

> Where exactly does it crash?

In current master we crash in `WellState<>::initWellStateMSWell()` when indexing into an empty `perf_rates` object while `n_activeperf > 0` (`n_activeperf` = 22 for `w = 0` in one of my test runs).

That said, if we want to use the proposed guard, then we should at least amend it to `perf_rates.size() != n_activeperf * np`, since there are supposed to be `np` entries for each active connection/perforation.
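In code, the amended guard would look roughly like this (a sketch only, assuming `perf_rates` holds `np` phase-rate entries per active perforation; the member name is illustrative):

```cpp
// Sketch only: guard before any indexing into perf_rates.  The
// assumption is np phase-rate entries per active perforation, so a
// size mismatch means this process does not hold the complete well.
const auto& perf_rates = ws.perf_data.phase_rates; // member name assumed
if (static_cast<int>(perf_rates.size()) != n_activeperf * np) {
    continue; // skip the initial-solution computation for this well
}
```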

@vkip (Member, Author) commented Nov 20, 2024:

> That said, if we want to use the proposed guard, then we should at least amend it to `perf_rates.size() != n_activeperf * np`, since there are supposed to be `np` entries for each active connection/perforation.

Or just `perf_data.size() != n_activeperf`, as it currently is in the PR?

@GitPaean (Member) commented Nov 20, 2024:

My main concern is that an `if` condition based on inequality of these two variables is too broad for the targeted situation, and might cover up other scenarios/bugs in the future (we are not running distributed parallel MS wells yet; that situation should be addressed by the parallel MS-well development).

If we know it happens because the well is SHUT, why do we not use that kind of `if` condition, to make it clear that it is due to the well being SHUT (at least something like `ws.perf_data.size() == 0`)?

Also, let us output some DEBUG information, or throw, if `ws.perf_data.size() > 0` and `ws.perf_data.size()` and `n_activeperf` are not equal. If it crashes in the future because they are not equal, we can then check that specific scenario and do a proper investigation and fix.
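Something like the following (just a sketch, not the merged code; the logging call is illustrative):

```cpp
// Just a sketch: make the SHUT case explicit, and surface any other
// size mismatch instead of silently skipping it.
if (ws.perf_data.size() == 0) {
    continue; // well is SHUT on this process; nothing to initialise
}
if (static_cast<int>(ws.perf_data.size()) != n_activeperf) {
    // Unexpected partial perforation data: log (or throw) so that a
    // future crash here gets a proper investigation.
    Opm::OpmLog::debug("perf_data size does not match n_activeperf "
                       "for well " + well_ecl.name());
}
```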

@vkip (Member, Author) commented Nov 20, 2024:

> My main concern is that an `if` condition based on inequality of these two variables is too broad for the targeted situation, and might cover up other scenarios/bugs in the future (we are not running distributed parallel MS wells yet; that situation should be addressed by the parallel MS-well development).

I agree that the case with distributed active wells needs to be handled by that development, hence the `\todo` message.

> If we know it happens because the well is SHUT, why do we not use that kind of `if` condition, to make it clear that it is due to the well being SHUT (at least something like `ws.perf_data.size() == 0`)?

When allowing inactive wells (wells that are never open at any time during the simulation) to be split across processes, `ws.perf_data.size()` is not equal to zero here. There are perforations, since these wells may need to output RFT data, but each process may not have all of them.

Checking for SHUT sounds dangerous, since I guess wells may open during a time step...?

> Also, let us output some DEBUG information, or throw, if `ws.perf_data.size() > 0` and `ws.perf_data.size()` and `n_activeperf` are not equal. If it crashes in the future because they are not equal, we can then check that specific scenario and do a proper investigation and fix.

Since this is not an error situation, I think we should avoid DEBUG messages, and definitely avoid throws.

@vkip (Member, Author) commented Nov 20, 2024:

I can add a more explicit check for inactive wells, then (for now) throw for distributed wells. Does that sound ok?

@GitPaean (Member) commented Nov 20, 2024:

> I can add a more explicit check for inactive wells, then (for now) throw for distributed wells. Does that sound ok?

Yes, that is sensible.

And we discussed this a little. Since we have decided that some inactive wells can be distributed across processes, there should be a way/criterion to detect which wells can be split. For those wells, since we cannot do much with them (like opening them), let us do as little as possible with them; for example, if possible, do not initialize unneeded well-state information (you are the one who knows this issue best).

For the function `initWellStateMSWell()`, you can safely `continue` at the beginning of the `for` loop for those wells. For `init()` and `base_init()` we could possibly also do less, but I am not familiar with the RFT usage involved.

Please let us know what you think of it.

```cpp
// \todo{ Update the procedure below to work for actually distributed wells. }
if (static_cast<int>(ws.perf_data.size()) != n_activeperf)
    if (this->is_inactive_well(well_ecl.name()))
        continue;
```
Review comment (Member) on the hunk above:

Can we use the following code,

```cpp
if (this->is_inactive_well(well_ecl.name()))
    continue;
```

right after the beginning of the `for` loop?

For other scenarios, where `static_cast<int>(ws.perf_data.size()) != n_activeperf`, we can throw; the new parallel MS-well development will need to handle those.
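Combined, a sketch of what I have in mind (hypothetical; the exact code that was merged may differ):

```cpp
// Sketch: skip permanently inactive wells up front; any remaining
// mismatch then means an actually distributed MS well, which the
// current code does not support, so fail loudly.
if (this->is_inactive_well(well_ecl.name())) {
    continue; // never-open well, possibly split across processes
}
if (static_cast<int>(ws.perf_data.size()) != n_activeperf) {
    OPM_THROW(std::logic_error,
              "Multi-segment wells distributed across processes "
              "are not yet supported");
}
```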

```diff
@@ -273,6 +273,7 @@ void WellState<Scalar>::init(const std::vector<Scalar>& cellPressures,
                    report_step,
                    wells_ecl);
     well_rates.clear();
+    this->inactive_well_names_ = schedule.getInactiveWellNamesAtEnd();
```
Review comment (Member) on the added line:

If `schedule.getInactiveWellNamesAtEnd()` is what is used to determine whether a well can be split across processes, I would suggest a more specific name than `inactive_well_names_`, to show that these wells are shut all the time and can never be opened during the simulation; for example `permanently_inactive_well_names_`, with a corresponding function name `is_permanently_inactive_well`.
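For example (a small sketch; the container choice and surrounding class details are illustrative):

```cpp
#include <set>
#include <string>

class WellState {
public:
    bool is_permanently_inactive_well(const std::string& wname) const
    {
        // Wells in this set are shut for the entire simulation and
        // can never be opened.
        return permanently_inactive_well_names_.count(wname) > 0;
    }

private:
    std::set<std::string> permanently_inactive_well_names_;
};
```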

@vkip force-pushed the dont_initsolve_inactive_msw branch from b60eecb to 673d541 on Nov 20, 2024 at 15:25
@vkip (Member, Author) commented Nov 20, 2024:

jenkins build this please

@GitPaean (Member) commented:

@bska, can you test whether the current version fixes the running of your case? I am happy with the current approach, which has a more specific design to tackle the problem. You can review/merge as you see fit.

@bska (Member) left a review:

> can you test whether the current version fixes the running of your case?

I've just completed a test of the field case I mentioned before. I can confirm that the case continues to run in parallel (`mpirun -np 14`) with this edition of the PR. In the current master sources the case does not run in parallel, but it does run in sequential mode.

> I am happy with the current approach, which has a more specific design to tackle the problem.

It looks good to me too. At some point we may, however, consider moving the `Schedule::getInactiveWellNamesAtEnd()` call to the `WellState` constructor. We call `WellState<>::init()` at least once for each report step, and I don't really expect the result of `getInactiveWellNamesAtEnd()` to change, although I may be missing something.
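Roughly, that would look something like this (a sketch only; the actual `WellState` constructor signature differs):

```cpp
// Hypothetical: query the schedule once at construction instead of on
// every init() call, since the result is not expected to change
// between report steps.
template <class Scalar>
WellState<Scalar>::WellState(const Schedule& schedule /*, ... */)
    : inactive_well_names_(schedule.getInactiveWellNamesAtEnd())
{}
```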

In any case, this fixes a real problem on a real case so I'll merge into master.

@bska merged commit 20a13ee into OPM:master on Nov 21, 2024 (1 check passed).

@vkip deleted the dont_initsolve_inactive_msw branch on Nov 21, 2024 at 09:20.
@lisajulia (Contributor) commented Nov 22, 2024:

> > can you test whether the current version fixes the running of your case?
>
> I've just completed a test of the field case I mentioned before. I can confirm that the case continues to run in parallel (`mpirun -np 14`) with this edition of the PR. In the current master sources the case does not run in parallel, but it does run in sequential mode.
>
> > I am happy with the current approach, which has a more specific design to tackle the problem.
>
> It looks good to me too. At some point we may, however, consider moving the `Schedule::getInactiveWellNamesAtEnd()` call to the `WellState` constructor. We call `WellState<>::init()` at least once for each report step, and I don't really expect the result of `getInactiveWellNamesAtEnd()` to change, although I may be missing something.
>
> In any case, this fixes a real problem on a real case so I'll merge into master.

@bska, can you rerun the test field case you were running with `mpirun -np 14` once more with the current master, and/or send me the file so I can also check this on my side?
I've been working on running MSWells in parallel; I've split my work into two PRs (assembly #5680 and solving #5746) and I would like to test with that file as well.

Thanks!

@bska (Member) commented Nov 22, 2024:

> can you rerun the test field case you were running with `mpirun -np 14` once more with the current master

Sure. Is there anything in particular you'd like me to look out for?

@lisajulia (Contributor) commented:

> > can you rerun the test field case you were running with `mpirun -np 14` once more with the current master
>
> Sure. Is there anything in particular you'd like me to look out for?

Nothing in particular, just check if the case runs through as expected. Thanks!

@bska (Member) commented Nov 22, 2024:

> > > can you rerun the test field case you were running with `mpirun -np 14` once more with the current master
> >
> > Sure. Is there anything in particular you'd like me to look out for?
>
> Nothing in particular, just check if the case runs through as expected

Cool. I'll just rebuild everything first to make sure I have a consistent set of binaries given the CMake changes that were just merged.

@bska (Member) commented Nov 22, 2024:

> Nothing in particular, just check if the case runs through as expected

@lisajulia: The model does indeed still run as `mpirun -np 14`.

@GitPaean (Member) commented Nov 22, 2024:

I think the concern only applies when we actually distribute the MS wells across processes.

@lisajulia (Contributor) commented Nov 22, 2024:

Yes :) @bska: can you also try with this PR? #5746

@bska (Member) commented Nov 22, 2024:

> can you also try with PR #5746?

I got slightly different timestepping behaviour between master and that PR, but not different enough that it's possible to say that one run is "better" than the other. Final TCPU is currently slightly higher with #5746 than in master as of #5756.

On a side note, if `AllowDistributedWells` is supposed to work as of #5746, then there's still something missing, as I get the diagnostic below when setting the value to `true`.

```
Error: Option --allow-distributed-wells=true is only allowed if model
only has only standard wells. You need to provide option 
 with --enable-multisegement-wells=false to treat existing 
multisegment wells as standard wells.

Error: [${ROOT}/opm-simulators/opm/simulators/flow/FlowGenericVanguard.cpp:332] All wells need to be standard wells!
```
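For completeness, what the diagnostic is asking for would be an invocation along these lines (illustrative case name, and with the flag spelling corrected):

```
mpirun -np 14 flow CASE.DATA \
    --allow-distributed-wells=true \
    --enable-multisegment-wells=false
```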

@lisajulia (Contributor) commented:

OK, thanks, I will take this setting into account for my PR #5746!

@akva2 (Member) commented Nov 22, 2024:

Do address the typo in the message as well (`--enable-multisegment-wells=false`).

@lisajulia (Contributor) commented:

> Do address the typo in the message as well (`--enable-multisegment-wells=false`)

6bdb801#diff-cdbb36d3d28bb6896b6aa7d316bc42496e4feb0bca83f210919e4826dc7f275dR327
