OM format decoding problem #20

L1nzar · 2024-09-25T01:46:18Z

I'm trying to write my own decoder for OM files, but the number of chunks I calculate does not match what I find in the file.
For example, I am processing the temperature variable of the MSM model (file chunk_4209.om).
Header (56 bytes, as stated in the documentation):

OM(2): OM
Version(1): 2
Compression(1): 0
ScaleFactor(4): 20
Dim0(8): 242905
Dim1(8): 114
Chunk0(8): 26
Chunk1(8): 114
???(8): 790
???(8): 1469
From this I calculate that there are 9343 chunks in the file: (114 / 114) * (242905 / 26) = 9342.5, rounded up.
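
The count above can be reproduced with a short sketch. It assumes that each dimension is split independently and that a partial chunk at the end of a dimension still counts as a chunk (the rounding-up step). `ceil_div` and `chunk_count` are hypothetical helper names, not part of any OM library.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, without going through floats."""
    return (a + b - 1) // b


def chunk_count(dim0: int, dim1: int, chunk0: int, chunk1: int) -> int:
    # Assumption: chunks tile each dimension separately, and a partial
    # chunk at the end of a dimension is still stored as its own chunk.
    return ceil_div(dim0, chunk0) * ceil_div(dim1, chunk1)


# Values taken from the header dump in this post.
print(chunk_count(242905, 114, 26, 114))
```

With the header values from this post, the sketch prints 9343.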

I start reading the array of offsets:
Pos:56 ChunkN:0 Offset:2177
Pos:64 ChunkN:1 Offset:2858
Pos:72 ChunkN:2 Offset:3484
Pos:80 ChunkN:3 Offset:4139

It is already unclear why the first offset is so large. It looks as if it covers three chunks at once.

At the last offsets I get incorrect values, as if the data already begins where the offset values should be:
Pos:74776 ChunkN:9340 Offset:6206997
Pos:74784 ChunkN:9341 Offset:1407379585452674
Pos:74792 ChunkN:9342 Offset:-4611685990509789180

I have a guess that the offsets are actually written from the 40th byte.
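
One way to test this guess is to read the offset table from both candidate start positions and check which one yields a plausible table. The sketch below assumes the offsets are little-endian signed 64-bit integers that increase strictly from one chunk to the next; neither assumption is confirmed by the documentation. The buffer is a toy built from values quoted in this post, not the real file, and `read_offsets` and `is_plausible_table` are hypothetical helper names.

```python
import struct


def read_offsets(buf: bytes, table_start: int, count: int) -> list:
    """Read `count` little-endian signed 64-bit integers from `table_start`."""
    return list(struct.unpack_from(f"<{count}q", buf, table_start))


def is_plausible_table(offsets: list) -> bool:
    """Assumed property: offsets are non-negative and strictly increasing."""
    if not offsets or offsets[0] < 0:
        return False
    return all(a < b for a, b in zip(offsets, offsets[1:]))


# Toy buffer, NOT the real file: 40 placeholder header bytes, six values
# quoted in this post (including the two "???" fields), then the two
# implausible values standing in for the start of the compressed data.
toy_offsets = [790, 1469, 2177, 2858, 3484, 4139]
stand_in_data = [1407379585452674, -4611685990509789180]
buf = bytes(40) + struct.pack("<6q", *toy_offsets) + struct.pack("<2q", *stand_in_data)

for start in (40, 56):
    print(start, is_plausible_table(read_offsets(buf, start, 6)))

# Byte position where a table of 9343 entries would end if it started at byte 40.
print(40 + 9343 * 8)
```

On the toy buffer, only the table read from byte 40 passes the check. The last line prints 74784, which is the position where the implausible values begin in the listing above, and the two unknown header fields (790 and 1469) fit the spacing of the offsets that follow them. Both observations are consistent with a 40-byte header, though they do not prove it.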

Is there a problem with the documentation?

patrick-zippenfenig · 2024-09-25T07:11:14Z

Hi, which programming language are you trying to implement a reader in? We are actively working on a direct implementation for various programming languages. The file format will also be revised to support more than 2 dimensions, streaming writes, and cloud-native reads, and to further improve the compression ratio. Here is the branch for the new writer/reader implementation. It is not yet functional.

> I have a guess that the offsets are actually written from the 40th byte.

You might be correct. The size is calculated simply with sizeof(OpenMeteoHeader), so my note in the code may be outdated.
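
The field widths listed in the first post can be added up mechanically. The sketch below is a rough check, not the project's actual header definition: it assumes little-endian byte order, a 32-bit float for the scale factor (the dump only shows that it is 4 bytes wide), and signed 64-bit integers for the dimensions. `HEADER` and `parse_header` are hypothetical names.

```python
import struct

# Assumed layout, little-endian, no padding:
#   2s  magic "OM"
#   B   version
#   B   compression
#   f   scale factor (assumed float32)
#   4q  dim0, dim1, chunk0, chunk1 (assumed int64)
HEADER = struct.Struct("<2sBBfqqqq")


def parse_header(buf: bytes) -> dict:
    """Unpack the assumed header layout from the start of `buf`."""
    magic, version, compression, scale, dim0, dim1, chunk0, chunk1 = HEADER.unpack_from(buf, 0)
    if magic != b"OM":
        raise ValueError("magic bytes are not 'OM'")
    return {
        "version": version,
        "compression": compression,
        "scale_factor": scale,
        "dims": (dim0, dim1),
        "chunks": (chunk0, chunk1),
    }


print(HEADER.size)

# Round trip with the values from the header dump in the first post.
sample = HEADER.pack(b"OM", 2, 0, 20.0, 242905, 114, 26, 114)
print(parse_header(sample))
```

Under these assumptions the named fields occupy 40 bytes. Each field in this layout already sits at a naturally aligned position, so a compiler would not be expected to add padding, which would make a sizeof of 40 plausible.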

kikocorreoso · 2024-09-25T12:42:37Z

I tried this about a month ago using Python, with no success. I only had about an hour of 'free' time, and then I went on holiday for a month, so I can't remember the details or what the issue was, but the data after the header made no sense to me.

It would be very useful if someone could provide some hints.

L1nzar · 2024-09-26T00:02:51Z

@patrick-zippenfenig For C#. I thought it would be simple enough, but the lack of a C# wrapper for TurboPFor and my inexperience with integer compression have become a big problem so far.

patrick-zippenfenig · 2024-09-26T15:21:39Z

We aim to provide low-level C functions to interact with OM files. This will abstract chunking and compression. Integrations into other programming languages using asynchronous IO should then be "relatively" easy. Here are some additional notes: fsspec/kerchunk#464

L1nzar closed this as completed Sep 26, 2024
