Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test: Use a numba-powered function for median sigma clipping #632

Closed
wants to merge 1 commit into from
Closed

Test: Use a numba-powered function for median sigma clipping #632

wants to merge 1 commit into from

Conversation

MSeifert04
Copy link
Contributor

Just a test (see also #624 (comment)), it's not meant to be merged 😄

Related to #624

@MSeifert04
Copy link
Contributor Author

MSeifert04 commented Jul 7, 2018

For the example in #624 (comment) the new version takes 1min 24s while the old version takes 4min 25s on my computer for the combine(...) line. Not sure if the difference justifies the special-casing and the numba dependency 🤷‍♂️

@MSeifert04
Copy link
Contributor Author

MSeifert04 commented Jul 7, 2018

I also did a line-profiling of combine to compare the difference. Based on astropy '3.1.dev22145'

The really important line here is the one that does the clipping:

# Old
   763         1 2536309643.0 2536309643.0     75.4                  getattr(tile_combiner, to_call)(**to_call_in_combiner[to_call])

# New
   812         1  303152937.0 303152937.0     20.7                  getattr(tile_combiner, to_call)(**to_call_in_combiner[to_call])

# Time:
2536309643.0  # old
 303152937.0  # new

# Percents
75.4  # old
20.7  # new

However I also copied the complete line-profiling, just in case

Old

Timer unit: 1e-07 s

Total time: 336.567 s
File: d:\git\python\ccdproc\ccdproc\combiner.py
Function: combine at line 513

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   513                                           def combine(img_list, output_file=None,
   514                                                       method='average', weights=None, scale=None, mem_limit=16e9,
   515                                                       clip_extrema=False, nlow=1, nhigh=1,
   516                                                       minmax_clip=False, minmax_clip_min=None, minmax_clip_max=None,
   517                                                       sigma_clip=False,
   518                                                       sigma_clip_low_thresh=3, sigma_clip_high_thresh=3,
   519                                                       sigma_clip_func=ma.mean, sigma_clip_dev_func=ma.std,
   520                                                       dtype=None, combine_uncertainty_function=None, **ccdkwargs):
   624         1        117.0    117.0      0.0      if not isinstance(img_list, list):
   627         1         53.0     53.0      0.0          if isinstance(img_list, np.ndarray):
   628         1        716.0    716.0      0.0              img_list = img_list.tolist()
   629                                                   elif isinstance(img_list, str) and (',' in img_list):
   630                                                       img_list = img_list.split(',')
   631                                                   else:
   632                                                       raise ValueError(
   633                                                           "unrecognised input for list of images to combine.")
   634                                           
   636         1         53.0     53.0      0.0      if method == 'average':
   637         1         31.0     31.0      0.0          combine_function = 'average_combine'
   638                                               elif method == 'median':
   639                                                   combine_function = 'median_combine'
   640                                               elif method == 'sum':
   641                                                   combine_function = 'sum_combine'
   642                                               else:
   643                                                   raise ValueError("unrecognised combine method : {0}.".format(method))
   644                                           
   646         1        435.0    435.0      0.0      if isinstance(img_list[0], CCDData):
   647                                                   ccd = img_list[0].copy()
   648                                               else:
   650         1    1144163.0 1144163.0      0.0          ccd = CCDData.read(img_list[0], **ccdkwargs)
   651                                           
   655         1         85.0     85.0      0.0      if ccd.uncertainty is None and combine_uncertainty_function is not None:
   656                                                   ccd.uncertainty = StdDevUncertainty(np.zeros(ccd.data.shape))
   657                                           
   658         1         29.0     29.0      0.0      if dtype is None:
   659         1         55.0     55.0      0.0          dtype = np.float64
   660                                           
   664         1         90.0     90.0      0.0      if ccd.data.dtype != dtype:
   665         1   18088302.0 18088302.0      0.5          ccd.data = ccd.data.astype(dtype)
   666                                           
   667         1        138.0    138.0      0.0      size_of_an_img = ccd.data.nbytes
   668         1         25.0     25.0      0.0      try:
   669         1        147.0    147.0      0.0          size_of_an_img += ccd.uncertainty.array.nbytes
   672         1         51.0     51.0      0.0      except AttributeError:
   673         1         31.0     31.0      0.0          pass
   675         1        108.0    108.0      0.0      if ccd.mask is not None:
   676                                                   size_of_an_img += ccd.mask.nbytes
   681         1         23.0     23.0      0.0      try:
   682         1         58.0     58.0      0.0          size_of_an_img += ccd.flags.nbytes
   683         1         28.0     28.0      0.0      except AttributeError:
   684         1         24.0     24.0      0.0          pass
   685                                           
   686         1         42.0     42.0      0.0      no_of_img = len(img_list)
   687                                           
   689         1        130.0    130.0      0.0      no_chunks = int((size_of_an_img * no_of_img) / mem_limit) + 1
   690         1         29.0     29.0      0.0      if no_chunks > 1:
   691                                                   log.info('splitting each image into {0} chunks to limit memory usage '
   692                                                            'to {1} bytes.'.format(no_chunks, mem_limit))
   693         1         52.0     52.0      0.0      xs, ys = ccd.data.shape
   695         1         67.0     67.0      0.0      xstep = max(1, int(xs/no_chunks))
   698         1         45.0     45.0      0.0      ystep = max(1, int(ys / (1 + no_chunks - int(xs / xstep))))
   699                                           
   702         1         29.0     29.0      0.0      to_set_in_combiner = {}
   703         1         26.0     26.0      0.0      to_call_in_combiner = {}
   704                                           
   707         1         25.0     25.0      0.0      if weights is not None:
   708                                                   to_set_in_combiner['weights'] = weights
   709                                           
   710         1         22.0     22.0      0.0      if scale is not None:
   713                                                   if callable(scale):
   714                                                       scalevalues = []
   715                                                       for image in img_list:
   716                                                           if isinstance(image, CCDData):
   717                                                               imgccd = image
   718                                                           else:
   719                                                               imgccd = CCDData.read(image, **ccdkwargs)
   720                                           
   721                                                           scalevalues.append(scale(imgccd.data))
   722                                           
   723                                                       to_set_in_combiner['scaling'] = np.array(scalevalues)
   724                                                   else:
   725                                                       to_set_in_combiner['scaling'] = scale
   726                                           
   727         1         24.0     24.0      0.0      if clip_extrema:
   728                                                   to_call_in_combiner['clip_extrema'] = {'nlow': nlow,
   729                                                                                          'nhigh': nhigh}
   730                                           
   731         1         23.0     23.0      0.0      if minmax_clip:
   732                                                   to_call_in_combiner['minmax_clipping'] = {'min_clip': minmax_clip_min,
   733                                                                                             'max_clip': minmax_clip_max}
   734                                           
   735         1         22.0     22.0      0.0      if sigma_clip:
   736                                                   to_call_in_combiner['sigma_clipping'] = {
   737         1         23.0     23.0      0.0              'low_thresh': sigma_clip_low_thresh,
   738         1         23.0     23.0      0.0              'high_thresh': sigma_clip_high_thresh,
   739         1         28.0     28.0      0.0              'func': sigma_clip_func,
   740         1         47.0     47.0      0.0              'dev_func': sigma_clip_dev_func}
   741                                           
   744         2         85.0     42.5      0.0      for x in range(0, xs, xstep):
   745         2         57.0     28.5      0.0          for y in range(0, ys, ystep):
   746         1         49.0     49.0      0.0              xend, yend = min(xs, x + xstep), min(ys, y + ystep)
   747         1         33.0     33.0      0.0              ccd_list = []
   748        29        431.0     14.9      0.0              for image in img_list:
   749        28       1868.0     66.7      0.0                  if isinstance(image, CCDData):
   750                                                               imgccd = image
   751                                                           else:
   752        28    1546396.0  55228.4      0.0                      imgccd = CCDData.read(image, **ccdkwargs)
   753                                           
   755        28      17415.0    622.0      0.0                  ccd_list.append(imgccd[x:xend, y:yend])
   756                                           
   758         1  432276392.0 432276392.0     12.8              tile_combiner = Combiner(ccd_list, dtype=dtype)
   760         1         36.0     36.0      0.0              for to_set in to_set_in_combiner:
   761                                                           setattr(tile_combiner, to_set, to_set_in_combiner[to_set])
   762         2        211.0    105.5      0.0              for to_call in to_call_in_combiner:
   763         1 2536309643.0 2536309643.0     75.4                  getattr(tile_combiner, to_call)(**to_call_in_combiner[to_call])
   766         1         19.0     19.0      0.0              combine_kwds = {}
   767         1        466.0    466.0      0.0              if combine_uncertainty_function is not None:
   768                                                           combine_kwds['uncertainty_func'] = combine_uncertainty_function
   769                                           
   770         1  368408377.0 368408377.0     10.9              comb_tile = getattr(tile_combiner, combine_function)(**combine_kwds)
   773         1    7877530.0 7877530.0      0.2              ccd.data[x:xend, y:yend] = comb_tile.data
   774         1        433.0    433.0      0.0              if ccd.mask is not None:
   775                                                           ccd.mask[x:xend, y:yend] = comb_tile.mask
   776         1         38.0     38.0      0.0              if ccd.uncertainty is not None:
   777                                                           ccd.uncertainty.array[x:xend, y:yend] = comb_tile.uncertainty.array
   780         1         12.0     12.0      0.0      if output_file is not None:
   781                                                   ccd.write(output_file)
   782                                           
   783         1         10.0     10.0      0.0      return ccd

New

   562                                           def combine(img_list, output_file=None,
   563                                                       method='average', weights=None, scale=None, mem_limit=16e9,
   564                                                       clip_extrema=False, nlow=1, nhigh=1,
   565                                                       minmax_clip=False, minmax_clip_min=None, minmax_clip_max=None,
   566                                                       sigma_clip=False,
   567                                                       sigma_clip_low_thresh=3, sigma_clip_high_thresh=3,
   568                                                       sigma_clip_func=ma.mean, sigma_clip_dev_func=ma.std,
   569                                                       dtype=None, combine_uncertainty_function=None, **ccdkwargs):
   673         1         75.0     75.0      0.0      if not isinstance(img_list, list):
   676         1         37.0     37.0      0.0          if isinstance(img_list, np.ndarray):
   677         1        251.0    251.0      0.0              img_list = img_list.tolist()
   678                                                   elif isinstance(img_list, str) and (',' in img_list):
   679                                                       img_list = img_list.split(',')
   680                                                   else:
   681                                                       raise ValueError(
   682                                                           "unrecognised input for list of images to combine.")
   685         1         25.0     25.0      0.0      if method == 'average':
   686         1         22.0     22.0      0.0          combine_function = 'average_combine'
   687                                               elif method == 'median':
   688                                                   combine_function = 'median_combine'
   689                                               elif method == 'sum':
   690                                                   combine_function = 'sum_combine'
   691                                               else:
   692                                                   raise ValueError("unrecognised combine method : {0}.".format(method))
   695         1        962.0    962.0      0.0      if isinstance(img_list[0], CCDData):
   696                                                   ccd = img_list[0].copy()
   697                                               else:
   699         1     471571.0 471571.0      0.0          ccd = CCDData.read(img_list[0], **ccdkwargs)
   704         1         71.0     71.0      0.0      if ccd.uncertainty is None and combine_uncertainty_function is not None:
   705                                                   ccd.uncertainty = StdDevUncertainty(np.zeros(ccd.data.shape))
   706                                           
   707         1         22.0     22.0      0.0      if dtype is None:
   708         1         41.0     41.0      0.0          dtype = np.float64
   713         1         75.0     75.0      0.0      if ccd.data.dtype != dtype:
   714         1   17600493.0 17600493.0      1.2          ccd.data = ccd.data.astype(dtype)
   715                                           
   716         1        101.0    101.0      0.0      size_of_an_img = ccd.data.nbytes
   717         1         15.0     15.0      0.0      try:
   718         1         96.0     96.0      0.0          size_of_an_img += ccd.uncertainty.array.nbytes
   721         1         31.0     31.0      0.0      except AttributeError:
   722         1         17.0     17.0      0.0          pass
   724         1         73.0     73.0      0.0      if ccd.mask is not None:
   725                                                   size_of_an_img += ccd.mask.nbytes
   730         1         13.0     13.0      0.0      try:
   731         1         46.0     46.0      0.0          size_of_an_img += ccd.flags.nbytes
   732         1         15.0     15.0      0.0      except AttributeError:
   733         1         14.0     14.0      0.0          pass
   734                                           
   735         1         25.0     25.0      0.0      no_of_img = len(img_list)
   738         1         84.0     84.0      0.0      no_chunks = int((size_of_an_img * no_of_img) / mem_limit) + 1
   739         1         20.0     20.0      0.0      if no_chunks > 1:
   740                                                   log.info('splitting each image into {0} chunks to limit memory usage '
   741                                                            'to {1} bytes.'.format(no_chunks, mem_limit))
   742         1         32.0     32.0      0.0      xs, ys = ccd.data.shape
   744         1         42.0     42.0      0.0      xstep = max(1, int(xs/no_chunks))
   747         1         26.0     26.0      0.0      ystep = max(1, int(ys / (1 + no_chunks - int(xs / xstep))))
   751         1         16.0     16.0      0.0      to_set_in_combiner = {}
   752         1         14.0     14.0      0.0      to_call_in_combiner = {}
   756         1         14.0     14.0      0.0      if weights is not None:
   757                                                   to_set_in_combiner['weights'] = weights
   758                                           
   759         1         13.0     13.0      0.0      if scale is not None:
   762                                                   if callable(scale):
   763                                                       scalevalues = []
   764                                                       for image in img_list:
   765                                                           if isinstance(image, CCDData):
   766                                                               imgccd = image
   767                                                           else:
   768                                                               imgccd = CCDData.read(image, **ccdkwargs)
   769                                           
   770                                                           scalevalues.append(scale(imgccd.data))
   771                                           
   772                                                       to_set_in_combiner['scaling'] = np.array(scalevalues)
   773                                                   else:
   774                                                       to_set_in_combiner['scaling'] = scale
   775                                           
   776         1         13.0     13.0      0.0      if clip_extrema:
   777                                                   to_call_in_combiner['clip_extrema'] = {'nlow': nlow,
   778                                                                                          'nhigh': nhigh}
   780         1         13.0     13.0      0.0      if minmax_clip:
   781                                                   to_call_in_combiner['minmax_clipping'] = {'min_clip': minmax_clip_min,
   782                                                                                             'max_clip': minmax_clip_max}
   783                                           
   784         1         12.0     12.0      0.0      if sigma_clip:
   785                                                   to_call_in_combiner['sigma_clipping'] = {
   786         1         12.0     12.0      0.0              'low_thresh': sigma_clip_low_thresh,
   787         1         13.0     13.0      0.0              'high_thresh': sigma_clip_high_thresh,
   788         1         15.0     15.0      0.0              'func': sigma_clip_func,
   789         1         29.0     29.0      0.0              'dev_func': sigma_clip_dev_func}
   793         2         71.0     35.5      0.0      for x in range(0, xs, xstep):
   794         2         37.0     18.5      0.0          for y in range(0, ys, ystep):
   795         1         26.0     26.0      0.0              xend, yend = min(xs, x + xstep), min(ys, y + ystep)
   796         1         19.0     19.0      0.0              ccd_list = []
   797        29        502.0     17.3      0.0              for image in img_list:
   798        28       3018.0    107.8      0.0                  if isinstance(image, CCDData):
   799                                                               imgccd = image
   800                                                           else:
   801        28    1720625.0  61450.9      0.1                      imgccd = CCDData.read(image, **ccdkwargs)
   804        28      21503.0    768.0      0.0                  ccd_list.append(imgccd[x:xend, y:yend])
   807         1  426691955.0 426691955.0     29.2              tile_combiner = Combiner(ccd_list, dtype=dtype)
   809         1         38.0     38.0      0.0              for to_set in to_set_in_combiner:
   810                                                           setattr(tile_combiner, to_set, to_set_in_combiner[to_set])
   811         2         54.0     27.0      0.0              for to_call in to_call_in_combiner:
   812         1  303152937.0 303152937.0     20.7                  getattr(tile_combiner, to_call)(**to_call_in_combiner[to_call])
   815         1         17.0     17.0      0.0              combine_kwds = {}
   816         1         10.0     10.0      0.0              if combine_uncertainty_function is not None:
   817                                                           combine_kwds['uncertainty_func'] = combine_uncertainty_function
   818                                           
   819         1  708999703.0 708999703.0     48.5              comb_tile = getattr(tile_combiner, combine_function)(**combine_kwds)
   822         1    4252261.0 4252261.0      0.3              ccd.data[x:xend, y:yend] = comb_tile.data
   823         1        268.0    268.0      0.0              if ccd.mask is not None:
   824                                                           ccd.mask[x:xend, y:yend] = comb_tile.mask
   825         1         56.0     56.0      0.0              if ccd.uncertainty is not None:
   826                                                           ccd.uncertainty.array[x:xend, y:yend] = comb_tile.uncertainty.array
   829         1         12.0     12.0      0.0      if output_file is not None:
   830                                                   ccd.write(output_file)
   831                                           
   832         1          9.0      9.0      0.0      return ccd

Especially interesting here: One could also use that function (slightly modified) to also speed up the combine step. Because it already calculates the median/mad/valid pixel... But I haven't tried that.

@MSeifert04
Copy link
Contributor Author

@mwcraig @crawfordsm Independent of this PR, what's your general opinion regarding numba? Should we avoid it and rather use Cython/C extensions or would numba be okay (as optional dependency)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants