Combining raw & proc: specify which sources should be proc? #292

takluyver · 2022-03-16T15:08:28Z

I got a request from FXE: the JUNGFRAU calibration pipeline had failed (because of issues with GPFS, outside our control), and so when they went to access data with open_run(..., data='all'), the JUNGFRAU data it offered them was from raw instead of from proc. They specifically wanted the corrected data, so it would have been clearer to say that no JF data was found.

The immediate issue is that EXtra-data doesn't know which sources are meant to be in proc, so anything that isn't in proc will be exposed from raw. One obvious way round this is to let users specify that sources are expected to be in proc - e.g. proc_only='*/DET/JNGFR*'. But this is a clumsy workaround - opening a run mostly works without it, so people won't bother setting it before they hit the problem, and it's easy to exclude stuff you might want (e.g. a .../DET/JNGFRCTRL source which is not meant to go in proc).
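To make the trade-off concrete, here is a minimal sketch of the fallback behaviour and of a proc_only filter. It is not EXtra-data's implementation: the combine_sources function, its signature, and the INSTR/... source names are all made up for illustration, and glob matching is done with the standard library's fnmatch.

```python
from fnmatch import fnmatchcase


def combine_sources(raw, proc, proc_only=()):
    """Hypothetical helper: merge source names, preferring proc over raw.

    Any raw source matching a proc_only glob pattern is hidden rather than
    used as a fallback. Returns a dict of source name -> 'proc' or 'raw'.
    """
    combined = {src: 'proc' for src in proc}
    for src in raw:
        if src in combined:
            continue  # proc already provides this source
        if any(fnmatchcase(src, pat) for pat in proc_only):
            continue  # expected in proc, so don't fall back to raw
        combined[src] = 'raw'
    return combined


# Made-up source names; proc is empty because correction failed.
raw = ['INSTR/DET/JNGFR01:daqOutput', 'INSTR/DET/JNGFRCTRL', 'INSTR/MOTOR/SAMPLE_X']
proc = []

# Without proc_only, every source silently falls back to raw.
default = combine_sources(raw, proc)

# With the pattern from above, the detector data is hidden,
# but so is the control source, which was never meant to be in proc.
strict = combine_sources(raw, proc, proc_only=['*/DET/JNGFR*'])
```

In this sketch strict no longer contains INSTR/DET/JNGFRCTRL, which is the over-matching problem described above: the pattern has to be written carefully to avoid excluding sources that only ever exist in raw.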

In the longer term, I want correction to write data with a new source name (e.g. .../CORR/JNGFR01), so you can clearly refer to raw/corrected data as separate things. But this is going to be a big change in offline correction. We might want to offer something in EXtra-data before that.

If we decide what the source names for corrected data will look like, we might try to 'rename' them in EXtra-data before the change in the files. This may get fiddly and confusing, though - so far, EXtra-data has always reflected what's in the files.

We could add run.raw and run.proc (or .corr?) attributes, which would point to separate DataCollection objects for the raw and proc data, so you could use run.proc['.../DET/JNGFR01'] to ensure you were getting the proc data. It would be pretty simple to do this in a crude way, but if you wanted e.g. run.select() to affect the separate raw & proc data collections, it would get much more involved.

We might also add a new high-level function like open_run that defaults to opening raw & proc together - it's hard to change open_run without breaking existing code.
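As a rough illustration of the run.raw / run.proc idea, the sketch below uses plain stand-in classes rather than EXtra-data's DataCollection. SourceCollection, CombinedRun, and the INSTR/... source names are all hypothetical; the point is only to show why select() needs extra work to keep the separate collections in step.

```python
class SourceCollection:
    """Stand-in for a data collection: maps source name -> data."""

    def __init__(self, sources):
        self._sources = dict(sources)

    def __getitem__(self, name):
        return self._sources[name]

    def __contains__(self, name):
        return name in self._sources

    def select(self, names):
        # Keep only the requested sources that are actually present.
        kept = {n: self._sources[n] for n in names if n in self._sources}
        return SourceCollection(kept)


class CombinedRun(SourceCollection):
    """Combined view (proc preferred) that also exposes .raw and .proc."""

    def __init__(self, raw, proc):
        # proc entries override raw entries with the same name.
        super().__init__({**raw._sources, **proc._sources})
        self.raw = raw
        self.proc = proc

    def select(self, names):
        # The 'more involved' part: the selection has to be applied to both
        # underlying collections, otherwise .raw and .proc would be lost or stale.
        return CombinedRun(self.raw.select(names), self.proc.select(names))


# Made-up sources; proc is empty because correction failed.
raw = SourceCollection({'INSTR/DET/JNGFR01': 'raw frames',
                        'INSTR/DET/JNGFRCTRL': 'control data'})
proc = SourceCollection({})
run = CombinedRun(raw, proc)

fallback = run['INSTR/DET/JNGFR01']   # silently comes from raw
try:
    run.proc['INSTR/DET/JNGFR01']     # explicit proc access fails loudly
    proc_missing = False
except KeyError:
    proc_missing = True

sel = run.select(['INSTR/DET/JNGFRCTRL'])
```

Without the select() override, the inherited method would return a plain SourceCollection with no raw or proc attributes at all, and every other method that returns a new collection would need the same treatment.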

takluyver mentioned this issue May 10, 2022

WIP: Layers #310

Closed
