Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A workaround for unfixable header keys in ImageFileCollection #743

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

gully
Copy link

@gully gully commented Aug 11, 2020

Unfixable header keys are currently populating the summary_dict with their genuine values and also missing value place holders, resulting in lists of values that are too long, breaking the ImageFileCollection for data with these malformed headers.

I tried to track down the root cause, and could pinpoint it to the _dict_from_fits_header() method called within _fits_summary. My hunch is there's something special about these unfixable keys that gets them double counted as missing_in_this_file. A key clue is that the resulting list for offending keys is always 2*N-1 long, where N is the number of files.

This workaround is really just that-- a stopgap measure to get this to work on the types of files I have. I suspect the right solution is to dig a bit deeper on _dict_from_fits_header, so I didn't go through with a full check of tests, assuming this workaround won't actually be in the final product.

Here are urls to two example files that reproduce the problem:

https://koa.ipac.caltech.edu/cgi-bin/getKOA/nph-getKOA?instrument=NS&filehand=/koadata11/NIRSPEC/20010117/lev0/spec/NS.20010117.25279.fits

https://koa.ipac.caltech.edu/cgi-bin/getKOA/nph-getKOA?instrument=NS&filehand=/koadata11/NIRSPEC/20010117/lev0/spec/NS.20010117.25430.fits

The malformed keys are GAIN.SPE and FREQ.SPE, though they do not appear in the error message, truncated below:

    627                 pass#summary_dict[key]= [None]*len(summary_dict['file'])
    628 
--> 629         summary_table = Table(summary_dict, masked=True)
    630 
    631         for column in summary_table.colnames:


ValueError: Inconsistent data column lengths: {4, 7}

Please have a look at the following list and replace the "[ ]" with a "[x]" if
the answer to this question is yes.

  • For new contributors: Did you add yourself to the "Authors.rst" file?

For documentation changes:

  • For documentation changes: Does your commit message include a "[skip ci]"?
    Note that it should not if you changed any examples!

For bugfixes:

  • Did you add an entry to the "Changes.rst" file?
  • Did you add a regression test?
  • Does the commit message include a "Fixes #issue_number" (replace "issue_number").
  • Does this PR add, rename, move or remove any existing functions or parameters?

For new functionality:

  • Did you add an entry to the "Changes.rst" file?
  • Did you include a meaningful docstring with Parameters, Returns and Examples?
  • Does the commit message include a "Fixes #issue_number" (replace "issue_number").
  • Did you include tests for the new functionality?
  • Does this PR add, rename, move or remove any existing functions or parameters?

Please note that the last point is not a requirement. It is meant as a check if
the pull request potentially breaks backwards-compatibility.


…g with missing content and their genuine values, breaking the ImageFileCollection for data with these malformed headers.
@gully
Copy link
Author

gully commented Nov 17, 2020

I am seeing that this PR breaks 1 test:
py.test -vsk test_filenames_are_set_properly
Currently investigating...

@mwcraig
Copy link
Member

mwcraig commented Nov 17, 2020

Thanks -- can you please ping when you have fixed that? Tests are currently not running on travis because they've dropped support for open source projects.

Also, don't be shy about pinging me multiple times 😬

@gully
Copy link
Author

gully commented Nov 17, 2020

I fixed the failed test which in hindsight was an edge case of a None header. Also added a warning logger.
I'm a bit muddled on this PR. On one hand it's a quick fix that makes it possible to move forward on any data that has malformed FITS keywords, which may be relatively common and hindering other folks from getting started. On the other hand it just treats the symptom not the sickness, and it loses available information (currently just fills the keyword with Nones rather than attempt to sort out the genuine values). In other words the image collection class would come with the proviso: "Currently does not support malformed header keywords".

Probably best practices would be to add a test that has such a malformed keyword and then make sure the correct key:value pairs are returned. Or if remote data can be tolerated in the test suite, then the two files I linked above would reveal the problem.

@mwcraig
Copy link
Member

mwcraig commented Nov 29, 2020

I'm going to close/reopen to see if that picks up the new GitHub Actions CI. More substantive review tomorrow, I hope...

@mwcraig mwcraig closed this Nov 29, 2020
@mwcraig mwcraig reopened this Nov 29, 2020
@mwcraig mwcraig added this to the 2.1.1 milestone Nov 29, 2020
ccdproc/log_meta.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 29, 2020

Codecov Report

Merging #743 (53d4877) into main (5af6ee5) will decrease coverage by 2.26%.
The diff coverage is 71.42%.

❗ Current head 53d4877 differs from pull request most recent head d9d7128. Consider uploading reports for the commit d9d7128 to get more accurate results

@@            Coverage Diff             @@
##             main     #743      +/-   ##
==========================================
- Coverage   97.55%   95.30%   -2.26%     
==========================================
  Files           9       30      +21     
  Lines        1392     3895    +2503     
==========================================
+ Hits         1358     3712    +2354     
- Misses         34      183     +149     
Files Changed Coverage Δ
ccdproc/image_collection.py 98.40% <60.00%> (-0.80%) ⬇️
ccdproc/log_meta.py 96.25% <100.00%> (ø)

... and 26 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mwcraig mwcraig modified the milestones: 2.1.1, 2.2.0 Mar 15, 2021
Base automatically changed from master to main March 16, 2021 13:40
@mwcraig mwcraig modified the milestones: 2.2.0, 2.2.1 May 25, 2021
@mwcraig mwcraig modified the milestones: 2.2.1, 2.3 Nov 21, 2021
@mwcraig mwcraig modified the milestones: 2.3, 3.0 Jan 19, 2022
@gully gully requested a review from mwcraig August 9, 2023 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants