Migrating mpl reader from ACT to xradar #159

Open · zssherman opened this issue Mar 8, 2024 · 3 comments
Labels: enhancement (New feature or request)

zssherman commented Mar 8, 2024

After the ACT dev call, we discussed how moving the MPL reader to xradar would be a better fit:
ARM-DOE/ACT#806

I can give this a shot. I will just need to learn the backends of Xarray first.
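
For orientation, the Xarray side boils down to subclassing BackendEntrypoint. Below is a minimal sketch; all MPL-specific names (MplBackendEntrypoint, read_mpl_file) are placeholders, only BackendEntrypoint, open_dataset and guess_can_open are actual Xarray API:

import xarray as xr
from xarray.backends import BackendEntrypoint


def read_mpl_file(filename_or_obj):
    # placeholder: the real reader would decode the MPL binary structures
    # and build DataArrays from them
    return xr.Dataset()


class MplBackendEntrypoint(BackendEntrypoint):
    """Hypothetical entry point xarray would dispatch to via engine="mpl"."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        ds = read_mpl_file(filename_or_obj)
        if drop_variables is not None:
            ds = ds.drop_vars(drop_variables)
        return ds

    def guess_can_open(self, filename_or_obj):
        # very rough check; real logic would inspect the binary header
        return str(filename_or_obj).endswith(".mpl")

Such a class would then be registered in the package metadata under the xarray.backends entry-point group so that xr.open_dataset(path, engine="mpl") can find it.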

zssherman added the enhancement label Mar 8, 2024
zssherman self-assigned this Mar 8, 2024
kmuehlbauer (Collaborator) commented

@zssherman Great initiative! It looks like MPL is also a binary format with neat header structures.

The sigmet/iris reader heavily uses this kind of structured decoding. It is also what #158 is trying to achieve for NEXRAD Level 2.
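
Just to illustrate what that kind of structured decoding looks like (the field names and sizes below are invented, not the actual MPL or Sigmet layout), a fixed-size binary header can be unpacked with the standard struct module:

import struct

# invented example layout: 4-byte magic, uint32 version, uint32 record count,
# little-endian; the real MPL header will differ
HEADER_FMT = "<4sII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def decode_header(buf):
    magic, version, nrecords = struct.unpack(HEADER_FMT, buf[:HEADER_SIZE])
    return {"magic": magic, "version": version, "nrecords": nrecords}


# usage: decode_header(open("some_file.bin", "rb").read(HEADER_SIZE))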

Maybe we can discuss at the next Open Radar meeting which steps are necessary to get a prototype reader ready.

zssherman (Author) commented

@kmuehlbauer That sounds good to me!

kmuehlbauer (Collaborator) commented

@zssherman Since there wasn't much time yesterday, I'll follow up with some ideas/pointers here.

I'm not really sure how to handle the sidecar files, but we might just search for/recognize them and read/decode them directly as binary blobs (when they have no header).
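
If they really are headerless blobs, something as simple as this might do as a starting point (file name and dtype are placeholders):

import numpy as np

# read a headerless sidecar file as a flat byte array; dtype/reshape would
# depend on what the sidecar actually contains
blob = np.fromfile("example.sidecar", dtype="uint8")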

For the main file the idea would be to use np.memmap to make reading large data easy. See:

def __init__(self, filename, mode="r", loaddata=False):
    """Initialize the object."""
    self._fp = None
    self._filename = filename
    # read in the volume header and compression_record
    if hasattr(filename, "read"):
        self._fh = filename
    else:
        self._fp = open(filename, "rb")
        self._fh = np.memmap(self._fp, mode=mode)
    self._filepos = 0
    self._rawdata = False
    self._loaddata = loaddata
    self._bz2_indices = None
    self.volume_header = self.get_header(VOLUME_HEADER)
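
For reference, the memory map behaves like a flat uint8 array backed by the file, so header bytes can be sliced out without loading everything (file name is a placeholder):

import numpy as np

with open("example.mpl", "rb") as fp:
    fh = np.memmap(fp, mode="r")   # flat uint8 view of the whole file
    header_bytes = bytes(fh[:16])  # pull the first 16 bytes for the header
    print(fh.size, header_bytes[:4])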

Then the header could be directly extracted using the machinery from the iris/sigmet reader:

def get_header(self, header):
    len = struct.calcsize(_get_fmt_string(header))
    head = _unpack_dictionary(self.read_from_file(len), header, self._rawdata)
    return head

For this, the header structure needs a special layout where decoding information can be attached to the OrderedDict entries:

VOLUME_HEADER = OrderedDict(
    [
        ("tape", {"fmt": "9s"}),
        ("extension", {"fmt": "3s"}),
        ("date", UINT4),
        ("time", UINT4),
        ("icao", {"fmt": "4s"}),
    ]
)
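
A first MPL header definition could follow the same pattern. The field names below are purely illustrative (the real layout has to come from the MPL format documentation or the existing ACT reader), and the UINT2/UINT4 stand-ins only carry the struct format character, whereas xradar's helpers carry additional decoding metadata:

from collections import OrderedDict

UINT2 = {"fmt": "H"}  # stand-in for xradar's uint16 helper
UINT4 = {"fmt": "I"}  # stand-in for xradar's uint32 helper

MPL_HEADER = OrderedDict(
    [
        ("unit", UINT2),
        ("version", UINT2),
        ("year", UINT2),
        ("month", UINT2),
        ("day", UINT2),
        ("num_bins", UINT4),
    ]
)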

The actual data might be read with dedicated functions (e.g. named get_data or similar), which use header information about file offset, size and dtype. See the following (not so nice) example:

def get_data(self, sweep_number, moment=None):
    """Load sweep data from file."""
    sweep = self.data[sweep_number]
    start = sweep["record_number"]
    stop = sweep["record_end"]
    intermediate_records = [
        rec["record_number"] for rec in sweep["intermediate_records"]
    ]
    filepos = sweep["filepos"]
    moments = sweep["sweep_data"]
    if moment is None:
        moment = moments
    elif isinstance(moment, str):
        moment = [moment]
    for name in moment:
        if self.is_compressed:
            self.init_record(start)
        else:
            self.init_record_by_filepos(start, filepos)
        ngates = moments[name]["ngates"]
        word_size = moments[name]["word_size"]
        data_offset = moments[name]["data_offset"]
        ws = {8: 1, 16: 2}
        width = ws[word_size]
        data = []
        self.rh.pos += data_offset
        data.append(self._rh.read(ngates, width=width).view(f"uint{word_size}"))
        while self.init_next_record() and self.record_number <= stop:
            if self.record_number in intermediate_records:
                continue
            self.rh.pos += data_offset
            data.append(self._rh.read(ngates, width=width).view(f"uint{word_size}"))
        moments[name].update(data=data)

This get_data function is used in the ArrayWrapper to retrieve the data in a lazy manner, whereas header data is used to provide the information to create the DataArrays/Dataset.

class NexradLevel2ArrayWrapper(BackendArray):

This is then used in the XarrayStore to provide Variables/Coordinates:

def open_store_variable(self, name, var):

def open_store_coordinates(self):
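
For context, xarray's documented pattern for such a lazy wrapper looks roughly like the sketch below; the MPL-specific names (MplArrayWrapper, datastore.root.get_data) are hypothetical, only BackendArray and the indexing helpers are actual xarray API:

import numpy as np
from xarray.backends import BackendArray
from xarray.core import indexing


class MplArrayWrapper(BackendArray):
    """Hypothetical lazy wrapper around a reader's get_data-style method."""

    def __init__(self, datastore, name, shape, dtype):
        self.datastore = datastore
        self.name = name
        self.shape = shape
        self.dtype = np.dtype(dtype)

    def __getitem__(self, key):
        # translate xarray's indexer into plain numpy indexing
        return indexing.explicit_indexing_adapter(
            key, self.shape, indexing.IndexingSupport.BASIC, self._getitem
        )

    def _getitem(self, key):
        # only the requested slice is pulled from the underlying file
        return self.datastore.root.get_data(self.name)[key]

Such a wrapper is typically wrapped in indexing.LazilyIndexedArray when the Variable is built, so nothing is read until the data is actually accessed.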

I hope this at least makes some sense and that you can give it a try.
