Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADCP refactorings #724

Open
7 tasks
ocehugo opened this issue Feb 16, 2021 · 5 comments
Open
7 tasks

ADCP refactorings #724

ocehugo opened this issue Feb 16, 2021 · 5 comments
Labels
Type:code_repo_consistency code-repo wise quirks Type:enhancement General enhancements Unit:Instrument Reader Instrument parsers and related Unit:Processing preprocessing and transformations

Comments

@ocehugo
Copy link
Contributor

ocehugo commented Feb 16, 2021

This issue is to track down future refactorings required at most ADCP parsers. It is likely the list below will change as I inspect other parsers in more detail.

  • Create a Voltage PreProcessing to fill the missing ping NaNs. The parser should not be the place to fill data in - only to read raw data and convert them properly.
  • BadOrientation QC - All parsers contain pretty much the same boilerplate code to assign NaN to bad orientations related to the adcp configuration. If there is a standard place to (meta struct) that defines a bad orientation, then we may export this task as PP.
  • As above, provide a more standardized adcp configuration meta structure.
  • As above, provide a generic NaN filling PP so we can arbitrary fill arrays with NaN based on multiple values with multiple combinations.
  • Refactor wave reading data in workhorseParse.
  • Refactor most dimensions/variable definitions with the new +IMOS package.
  • Export all fixedLeader attributes as global attributes, with proper name and convention.

...

@ocehugo ocehugo added Type:enhancement General enhancements Unit:Instrument Reader Instrument parsers and related Type:code_repo_consistency code-repo wise quirks Unit:Processing preprocessing and transformations labels Feb 16, 2021
@petejan
Copy link
Contributor

petejan commented Mar 9, 2021

@ocehugo This is the output from my python parser to extract all the data fields from a RAW RDI ADCP file,
`

netcdf \16413000.000 {
dimensions:
	TIME = UNLIMITED ; // (726163 currently)
	CELL = UNLIMITED ; // (28 currently)
variables:
	double TIME(TIME) ;
		TIME:long_name = "time" ;
		TIME:units = "days since 1950-01-01 00:00:00 UTC" ;
		TIME:calendar = "gregorian" ;
		TIME:axis = "T" ;
	float INST_TEMP(TIME) ;
		INST_TEMP:units = "degrees_Celsius" ;
		INST_TEMP:instrument_uncertainty = 0.5 ;
	float PRES(TIME) ;
		PRES:units = "dbar" ;
		PRES:applied_offset = -10.1353 ;
	float PRES_VAR(TIME) ;
		PRES_VAR:units = "dbar" ;
	float HEADING_MAG(TIME) ;
		HEADING_MAG:units = "degrees" ;
	float PITCH(TIME) ;
		PITCH:units = "degrees" ;
	float ROLL(TIME) ;
		ROLL:units = "degrees" ;
	float TX_VOLT(TIME) ;
		TX_VOLT:units = "V" ;
	float TX_CURRENT(TIME) ;
		TX_CURRENT:units = "A" ;
	float SOUND_SPEED(TIME) ;
		SOUND_SPEED:units = "m/s" ;
	float BEAM1_VEL(TIME, CELL) ;
		BEAM1_VEL:units = "m/s" ;
		BEAM1_VEL:valid_max = 20.f ;
		BEAM1_VEL:valid_min = -20.f ;
	float BEAM2_VEL(TIME, CELL) ;
		BEAM2_VEL:units = "m/s" ;
		BEAM2_VEL:valid_max = 20.f ;
		BEAM2_VEL:valid_min = -20.f ;
	float BEAM3_VEL(TIME, CELL) ;
		BEAM3_VEL:units = "m/s" ;
		BEAM3_VEL:valid_max = 20.f ;
		BEAM3_VEL:valid_min = -20.f ;
	float BEAM4_VEL(TIME, CELL) ;
		BEAM4_VEL:units = "m/s" ;
		BEAM4_VEL:valid_max = 20.f ;
		BEAM4_VEL:valid_min = -20.f ;
	ubyte CORR_MAG1(TIME, CELL) ;
	ubyte CORR_MAG2(TIME, CELL) ;
	ubyte CORR_MAG3(TIME, CELL) ;
	ubyte CORR_MAG4(TIME, CELL) ;
	float ECHO_INT1(TIME, CELL) ;
	float ECHO_INT2(TIME, CELL) ;
	float ECHO_INT3(TIME, CELL) ;
	float ECHO_INT4(TIME, CELL) ;
	ubyte PCT_GOOD1(TIME, CELL) ;
	ubyte PCT_GOOD2(TIME, CELL) ;
	ubyte PCT_GOOD3(TIME, CELL) ;
	ubyte PCT_GOOD4(TIME, CELL) ;
	ubyte STATUS1(TIME, CELL) ;
	ubyte STATUS2(TIME, CELL) ;
	ubyte STATUS3(TIME, CELL) ;
	ubyte STATUS4(TIME, CELL) ;

// global attributes:
		:instrument = "RDI ; WorkHorse 150kHz" ;
		:instrument_model = "WorkHorse 150kHz" ;
		:instrument_serial_number = "16413" ;
		:frequency = 153.6 ;
		:data_coordinates = "Beam" ;
		:number_beams = 4 ;
		:number_cells = 28 ;
		:cell_size_cm = 800 ;
		:blank_after_tx_cm = 352 ;
		:dist_centre_bin1_cm = 1224 ;
		:xmit_pulse_length_cm = 849 ;
		:time_between_pings_sec = 72. ;
		:instrument_setup_fixed_cpuVER = 50 ;
		:instrument_setup_fixed_cpuREV = 41 ;
		:instrument_setup_fixed_sysConfig = 16841 ;
		:instrument_setup_fixed_read = 0 ;
		:instrument_setup_fixed_lag_len = 13 ;
		:instrument_setup_fixed_num_beam = 4 ;
		:instrument_setup_fixed_num_cells = 28 ;
		:instrument_setup_fixed_pings_per_ensemble = 1 ;
		:instrument_setup_fixed_cell_length = 800 ;
		:instrument_setup_fixed_blank_after_tx = 352 ;
		:instrument_setup_fixed_profile_mode = 1 ;
		:instrument_setup_fixed_low_corr = 0 ;
		:instrument_setup_fixed_code_reps = 9 ;
		:instrument_setup_fixed_gd_min = 0 ;
		:instrument_setup_fixed_error_vel_max = 0 ;
		:instrument_setup_fixed_tpp_min = 1 ;
		:instrument_setup_fixed_tpp_sec = 12 ;
		:instrument_setup_fixed_tpp_hun_sec = 0 ;
		:instrument_setup_fixed_coord_trans = 7 ;
		:instrument_setup_fixed_head_alignment = 0 ;
		:instrument_setup_fixed_head_bias = 0 ;
		:instrument_setup_fixed_sensor_source = 125 ;
		:instrument_setup_fixed_sensor_available = 61 ;
		:instrument_setup_fixed_bin1_dist = 1224 ;
		:instrument_setup_fixed_xmit_pulse_len = 849 ;
		:instrument_setup_fixed_ref_layer = 1281 ;
		:instrument_setup_fixed_false_target_thresh = 255 ;
		:instrument_setup_fixed_spare = 0 ;
		:instrument_setup_fixed_tx_lag_dist = 96 ;
		:instrument_setup_fixed_system_bandwidth = 1 ;
		:instrument_setup_fixed_system_power = 255 ;
		:instrument_setup_fixed_spare2 = 0 ;
		:instrument_setup_fixed_inst_serial = 16413 ;
		:instrument_setup_fixed_beam_angle = 20 ;
		:time_coverage_start = "2018-10-15T00:00:00Z" ;
		:time_coverage_end = "2020-06-11T03:14:24Z" ;
		:date_created = "2020-06-14T09:08:10Z" ;
		:history = "2020-06-14 created from file 16413000.000" ;
}

How many of these fields does one need, maybe some, maybe not, but why not extract them all if the FV00 file is supposed to represent the RAW input file.

One thing that was missing from the current parse was the ADCP frequency, model, firmware version, and parameters around the cell binning into ensemble information.

The ugly python parsing code is here rdi2netcdf.py

@ocehugo
Copy link
Contributor Author

ocehugo commented Mar 10, 2021

@petejan,

FYI, I just did a lot of refactoring, improved the performance/memory usage and extended a bit the information we read from RDIs - mostly the information required for coord transformation: see #725 and subsequent PRs that are happening at the moment.

may you please send me a comprehensive list of attributes we are not importing? I think this is good timing to solve this issue since I'm quite fresh on the details of our binary parser.

I didn't code anything to export new information to files yet, so I would appreciate a list of the missing raw attributes so I can quickly iterate through them. I'm quite sure that we are almost there in terms of parsing all the RDI information. My idea is to name them as close as possible to the instrument manuals but that indeed requires auxiliary documentation explaining each - which I imagine you may be the right person to do it!

@petejan
Copy link
Contributor

petejan commented Mar 10, 2021

@ocehugo

What attributes does the parser extract and put into the netCDF file? the ones around ensemble information used to be missing, not sure if this has been fixed yet.

What is the imos-toolbox approach to recording firmware versions of instruments?

This is the manual explaining what each field is, page 127 has the fixed header explanation,

WorkHorse Commands and Output Data Format_Jun18.pdf

@ocehugo
Copy link
Contributor Author

ocehugo commented Mar 10, 2021

@petejan,

What attributes does the parser extract and put into the netCDF file? the ones around ensemble information used to be missing, not sure if this has been fixed yet.

If you could drill down and come with a list of what we don't import from the binary parser that your python code does, it would be great. Again, we load a lot of parameters, but we don't export all of them. What goes in the netcdf is easy to say - we export everything described in the imos/cf conventions. Anything beyond that is basically not exported.

hence, as of today, most of the global attributes equivalent that you got above are not exported, but a lot of them are imported, some of them in more detail. The reason why this is the case is that we don't have a convention for exporting them. Hence, one avenue is that you could propose a convention for say, FV00 RDI adcp metadata exports (attribute names, attribute values, their type, their description if any, etc).

For example, your instrument_setup_fixed_sensor_source, is actually a hashtable/dictionary of several configurations. Every bit in that number means different things. Hence, we could create a specific attribute name for each bit. If we don't go on to describe each bit, we at least need to provide a context for interpretation of that number (which could be a wiki page for example).

What is the imos-toolbox approach to recording firmware versions of instruments?

ATM, there is no storage of firmware versions in the netcdf. For RDIs we load it but it's only stored in the meta struct and used for some dated time logic.

May you please create an actionable/specific issue with your thoughts!?

@petejan
Copy link
Contributor

petejan commented Mar 11, 2021

@ocehugo so the action/specific issue is that all the data from the RAW file should in replicated in the FV00 file, so every value from the fixed header should be represented in a netCDF attribute, the above list is a good place to start.

One could extent this to if the raw file was in beam coordinates, then the FV00 file should be in beam coordinates, then converted to distance/depth when put into the FV01 file.

The imos/cf conventions don't really say anything about the specific attributes apart from use what is needed to describe the data.

As for binary fields, I have converted these into a string value and used a character attribute in the past, like the data_coordinates attribute above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type:code_repo_consistency code-repo wise quirks Type:enhancement General enhancements Unit:Instrument Reader Instrument parsers and related Unit:Processing preprocessing and transformations
Projects
None yet
Development

No branches or pull requests

2 participants