Combining raw & proc: specify which sources should be proc? #292

Open
takluyver opened this issue Mar 16, 2022 · 0 comments

I got a request from FXE: the JUNGFRAU calibration pipeline had failed (because of issues with GPFS, outside our control), so when they accessed the data with open_run(..., data='all'), the JUNGFRAU data they were offered came from raw instead of from proc. They specifically wanted the corrected data, so it would have been clearer to report that no JUNGFRAU data was found.

The immediate issue is that EXtra-data doesn't know which sources are meant to be in proc, so anything that isn't in proc will be exposed from raw. One obvious way round this is to let users specify that sources are expected to be in proc - e.g. proc_only='*/DET/JNGFR*'. But this is a clumsy workaround: it mostly works without this, so people won't bother setting it before they hit the problem, and it's easy to exclude stuff you might want (e.g. a .../DET/JNGFRCTRL source which is not meant to go in proc).
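To make the trade-off concrete, here is a minimal sketch of how a proc_only pattern could behave. It is not EXtra-data code: combine_sources is a hypothetical helper, the EXAMPLE/... source names are made up for illustration, and glob matching via fnmatch is an assumption about how the pattern would be interpreted.

```python
from fnmatch import fnmatchcase


def combine_sources(raw_sources, proc_sources, proc_only=()):
    """Decide whether each source is taken from 'raw' or 'proc'.

    Hypothetical helper, not part of EXtra-data. Sources matching a
    proc_only glob pattern are never taken from raw: if they are
    missing from proc, they are left out of the result entirely.
    """
    chosen = {}
    for name in sorted(set(raw_sources) | set(proc_sources)):
        if name in proc_sources:
            chosen[name] = 'proc'
        elif any(fnmatchcase(name, pat) for pat in proc_only):
            # Expected in proc but not there: hide it rather than
            # silently falling back to the uncorrected data.
            continue
        else:
            chosen[name] = 'raw'
    return chosen


# Made-up source names; proc is empty because calibration failed.
raw = {'EXAMPLE/DET/JNGFR01', 'EXAMPLE/DET/JNGFRCTRL', 'EXAMPLE/MOTOR/X'}
proc = set()

# Without the option, every source silently comes from raw.
without_option = combine_sources(raw, proc)

# With the option, the detector data is hidden as intended, but the
# control source is hidden too, because the pattern also matches it.
with_option = combine_sources(raw, proc, proc_only=['*/DET/JNGFR*'])
print(with_option)
```

In this sketch the pattern removes EXAMPLE/DET/JNGFRCTRL along with the detector data, which is the over-matching problem described above.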

In the longer term, I want correction to write data with a new source name (e.g. .../CORR/JNGFR01), so you can clearly refer to raw/corrected data as separate things. But this is going to be a big change in offline correction. We might want to offer something in EXtra-data before that.

  • If we decide what the source names for corrected data will look like, we might try to 'rename' them in EXtra-data before the change in the files. This may get fiddly and confusing, though - so far, EXtra-data has always reflected what's in the files.
  • We could add run.raw and run.proc (or .corr?) attributes, which would point to separate DataCollection objects for the raw and proc data, so you could use run.proc['.../DET/JNGFR01'] to ensure you were getting the proc data. It would be pretty simple to do this in a crude way, but if you wanted e.g. run.select() to affect the separate raw & proc data collections, it would get much more involved.
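The "crude" version of the second option might look roughly like the sketch below. It uses toy stand-in classes rather than the real DataCollection, and the names SourceCollection, CombinedRun and the EXAMPLE/... source are all hypothetical. It only illustrates the lookup behaviour, not how select() would propagate.

```python
class SourceCollection:
    """Toy stand-in for a DataCollection: maps source names to data."""

    def __init__(self, label, sources):
        self.label = label
        self._sources = dict(sources)

    def __contains__(self, name):
        return name in self._sources

    def __getitem__(self, name):
        try:
            return self._sources[name]
        except KeyError:
            raise KeyError(f"{name!r} not found in {self.label} data") from None


class CombinedRun:
    """Toy combined run exposing separate .raw and .proc collections."""

    def __init__(self, raw, proc):
        self.raw = SourceCollection('raw', raw)
        self.proc = SourceCollection('proc', proc)

    def __getitem__(self, name):
        # Combined lookup: prefer proc, fall back to raw without warning.
        if name in self.proc:
            return self.proc[name]
        return self.raw[name]


# Calibration failed, so the (made-up) detector source exists only in raw.
run = CombinedRun(raw={'EXAMPLE/DET/JNGFR01': 'uncorrected'}, proc={})

fallback = run['EXAMPLE/DET/JNGFR01']  # silently returns the raw data

try:
    run.proc['EXAMPLE/DET/JNGFR01']
except KeyError as err:
    message = str(err)  # explicit failure instead of a silent fallback
```

Because run.raw and run.proc are independent objects here, narrowing the selection on run would not touch either of them, which is where the extra work mentioned above would come in.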

We might also add a new high-level function like open_run that defaults to opening raw & proc together - it's hard to change open_run without breaking existing code.

@takluyver takluyver mentioned this issue May 10, 2022