Memory use in the combine function exceeds (by a factor of 2-4 or more) the memory limit passed in #638
Running the memory profiles

To run the memory profiles shown below:

Note: if you want the marks where a particular function begins/ends, you need to mark the function with a decorator.

Parameters of profiling runs
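As a rough illustration of the decorator marking described above (a sketch under assumptions, since the actual profiling script is not shown here): the memory_profiler package's mprof runner draws begin/end brackets for any function decorated with its profile decorator, which matches the cyan brackets in the plots discussed below. The script name, file list, and combine arguments are placeholders.

```python
# Hypothetical sketch (profile_combine.py); assumes memory_profiler's mprof tool.
# Run with:  mprof run profile_combine.py   then   mprof plot
from memory_profiler import profile  # decorator mprof uses to mark a function's span

from ccdproc import combine


@profile  # marks where this function begins/ends so mprof plot can draw brackets
def run_combine(file_list):
    # mem_limit is in bytes; 32000000 is roughly the 32 MB limit used in the profiles
    return combine(file_list, method='average', mem_limit=32000000)


if __name__ == '__main__':
    # Placeholder file names; substitute real FITS images to reproduce a profile.
    run_combine(['image_{:03d}.fits'.format(i) for i in range(10)])
```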
Interpreting the output

The graphs below show snippets from the memory profiles of two commits: one the tip of the file refactor, the other master (not the tip of master as of today; it is a few commits back). Prior to the interval shown, memory use in both cases has reached roughly 150MB. That is prior to the beginning of the actual image combination. The cyan "brackets" show when the decorated functions begin and end.

After refactor to limit number of open files (commit c20a7bf)

(memory profile plot)

Discussion

I believe the jump from 150MB to what looks like a new "baseline" of 180MB is due to the creation of the CCDData object to hold the results. My expectation from the documentation is that during the combination, memory use for performing the combination will not exceed the limit set (32000000 bytes, i.e. 32MB), i.e. that memory use will not exceed 180 + 32 = 212MB. In fact, peak memory usage regularly reaches roughly 270MB, roughly 2x what I expected. The sharp drops after

Recent master (commit c289d4e)

(memory profile plot)

Discussion

A few things jump out here. One is that the peak memory usage is much higher above the 150MB baseline. Peak is 290MB, and the floor on memory use never drops below 240MB, i.e. about 60MB above starting, and 30MB above the limit. Not sure why that is, exactly, but the new version is definitely an improvement.

Summary

The easiest fix here is to increase the number of chunks the file is broken into by adding a "fudge factor" of 2 or 4. I don't think that will slow the code appreciably, and I don't think we will be able to accurately predict what memory usage will be in all cases. As an example, I plan to open a PR to switch us to using astropy sigma clipping instead of our own. Currently astropy sigma clipping makes copies of its input (because sorting is done in-place, I think), so memory usage will depend on whether sigma clipping is done. More broadly, we can't predict what memory use might be in other functions the user passes in. To some extent, I think that is not our problem to solve. On the other hand, it seems safe to assume that memory use for combination will be at least twice the memory limit if the number of chunks is calculated based only on the size of the images to be combined, and it isn't hard to imagine it being 4 times. I don't think we have to be super strict about the amount of memory... but in the PR I'll try to come up with a test we can add to make sure there are not any bad regressions on this front.
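To make the proposed "fudge factor" concrete, here is a minimal sketch (illustrative only; the function name, signature, and default factor are made up, and this is not ccdproc's actual chunking code) of computing the number of chunks from the input size and the memory limit, leaving headroom for the 2-4x overhead seen in the profiles:

```python
import math


def num_chunks(n_images, image_shape, bytes_per_pixel, mem_limit, fudge_factor=2):
    """Illustrative only: choose enough chunks that the input data per chunk
    stays under mem_limit / fudge_factor, leaving room for temporary copies
    made by the combine and sigma-clipping functions."""
    total_bytes = n_images * math.prod(image_shape) * bytes_per_pixel
    return max(1, math.ceil(fudge_factor * total_bytes / mem_limit))


# Example: ten 4000 x 4000 float64 images with a 32 MB limit.
# Based on input size alone this would be 40 chunks; the fudge factor doubles it.
print(num_chunks(10, (4000, 4000), 8, 32_000_000))  # -> 80
```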
Thanks Matt for the careful analysis. I wonder if it wouldn't be enough (especially given your "I don't think we will be able to accurately predict what memory usage will be in all cases" statement) to fix the documentation to clearly state that "roughly that amount of input data will be used and that the effective memory usage during combination can be significantly higher (in general 2-4 times is to be expected)". We probably should investigate whether we can reduce the memory usage, but given that we allow arbitrary combine and uncertainty functions it will be completely impossible to know what fudge factor makes sense. Even in the default case we would rely on numpy implementation details...
Just out of interest: did you happen to do memory profiling on #632? It probably isn't something we should use as-is (numba is cool, but without anaconda it is a real pain to install and use), but an equivalent Cython/C approach wouldn't be too complicated.
Can do today; would be interesting to see what happens. |
I'll try to write a more extensive set of profile cases today and a proper memory test runner, instead of hacking the files-open runner. To some extent the examples above are really "toy" cases, since the memory limit is being set to the size of one image. In a more real-world case the memory overhead may be much smaller...
Updated discussion and more (and more realistic) profiles in #642 |
The combine function allows one to set a limit on the amount of memory used. Unfortunately, the function actually uses more memory than that limit when performing the image combination. Still working on tracking down the root cause, but I will post some example memory profiles in a moment.
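For reference, a minimal sketch of the kind of call being discussed (the file names are placeholders; in ccdproc the limit is set through the mem_limit parameter of combine, in bytes):

```python
from ccdproc import combine

# Placeholder file names. mem_limit is given in bytes (roughly 32 MB here),
# but the profiles in the comments above show peak usage 2-4x above that
# on top of the pre-combination baseline.
stacked = combine(['img1.fits', 'img2.fits', 'img3.fits'],
                  method='average', mem_limit=32_000_000)
```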