OM format decoding problem #20

Closed
L1nzar opened this issue Sep 25, 2024 · 4 comments

Comments

@L1nzar

L1nzar commented Sep 25, 2024

I'm trying to write my own decoder for OM files, but the number of chunks I calculate doesn't match what's actually in the file.
For example, I'm processing the MSM model temperature data (file chunk_4209.om).
Header (56 bytes, as stated in the documentation):

OM(2): OM
Version(1): 2
Compression(1): 0
ScaleFactor(4): 20
Dim0(8): 242905
Dim1(8): 114
Chunk0(8): 26
Chunk1(8): 114
???(8): 790
???(8): 1469
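A minimal Python sketch of reading the header exactly as laid out in the list above, using `struct`. It assumes little-endian byte order, a float32 scale factor, and signed 64-bit integers for the remaining fields; the two unknown fields are read as int64 only to mirror the 56-byte view. `HEADER_56`, `parse_header_56`, and the field names are hypothetical.

```python
import struct

# Hypothetical layout mirroring the 56-byte header listed above:
# 2-byte magic, uint8 version, uint8 compression, float32 scale factor,
# then six little-endian int64 fields (the last two are the "???" fields).
HEADER_56 = struct.Struct("<2sBBf6q")


def parse_header_56(buf: bytes) -> dict:
    """Unpack the first 56 bytes of an OM file into named fields."""
    (magic, version, compression, scale,
     dim0, dim1, chunk0, chunk1, unk0, unk1) = HEADER_56.unpack_from(buf, 0)
    return {
        "magic": magic,
        "version": version,
        "compression": compression,
        "scale_factor": scale,
        "dim0": dim0,
        "dim1": dim1,
        "chunk0": chunk0,
        "chunk1": chunk1,
        "unknown0": unk0,
        "unknown1": unk1,
    }
```

For example, `parse_header_56(open("chunk_4209.om", "rb").read(56))` should reproduce the values listed above if the assumed types are right.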

From this I get 9343 chunks in the file: ceil(114/114) * ceil(242905/26) = 1 * ceil(9342.5) = 9343.
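The count is a ceiling in each dimension, because the last chunk along Dim0 is only partly filled (242905 = 9342 * 26 + 13). A minimal Python sketch, assuming a regular chunk grid in which partial edge chunks still count as one chunk (the function names are hypothetical):

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def chunk_count(dim0: int, dim1: int, chunk0: int, chunk1: int) -> int:
    """Number of chunks, counting a partially filled edge chunk as a full one."""
    return ceil_div(dim0, chunk0) * ceil_div(dim1, chunk1)


print(chunk_count(242905, 114, 26, 114))  # 9343
```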

I start reading the array of offsets:
Pos:56 ChunkN:0 Offset:2177
Pos:64 ChunkN:1 Offset:2858
Pos:72 ChunkN:2 Offset:3484
Pos:80 ChunkN:3 Offset:4139

It's already unclear why the first offset is so large, as if it covered three chunks at once.
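One possible reading: if the two "???" header values (790 and 1469) are really the first two entries of the offset table, and each entry is the cumulative end position of a compressed chunk, then consecutive differences give chunk sizes of similar magnitude, and 2177 is simply the end of the third chunk (790 + 679 + 708). Both points are assumptions, not confirmed by the documentation; `chunk_sizes` is a hypothetical helper.

```python
def chunk_sizes(offsets: list[int]) -> list[int]:
    """Compressed size of each chunk, assuming offsets[i] is the end of chunk i
    and chunk 0 starts at position 0 of the data section (an assumption)."""
    sizes = []
    previous_end = 0
    for end in offsets:
        sizes.append(end - previous_end)
        previous_end = end
    return sizes


# The two "???" header values followed by the first four values read from byte 56.
offsets = [790, 1469, 2177, 2858, 3484, 4139]
print(chunk_sizes(offsets))  # [790, 679, 708, 681, 626, 655]
```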

The last offsets have incorrect values, as if the data already begins where the offset values should be:
Pos:74776 ChunkN:9340 Offset:6206997
Pos:74784 ChunkN:9341 Offset:1407379585452674
Pos:74792 ChunkN:9342 Offset:-4611685990509789180

I have a guess that the offsets are actually written from the 40th byte.
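A minimal sketch of reading the offset table under that guess, assuming a 40-byte header followed directly by one little-endian int64 per chunk. `HEADER_SIZE`, `OFFSET_SIZE`, and the function names are hypothetical.

```python
import struct

HEADER_SIZE = 40  # assumption: magic through Chunk1 only
OFFSET_SIZE = 8   # assumption: each offset is a little-endian int64


def offset_entry_position(index: int, header_size: int = HEADER_SIZE) -> int:
    """Byte position of the offset entry for chunk `index`."""
    return header_size + index * OFFSET_SIZE


def offset_table_end(n_chunks: int, header_size: int = HEADER_SIZE) -> int:
    """Byte position just past the last offset entry."""
    return header_size + n_chunks * OFFSET_SIZE


def read_offsets(buf: bytes, n_chunks: int, header_size: int = HEADER_SIZE) -> list[int]:
    """Read n_chunks int64 offsets starting right after the header."""
    return list(struct.unpack_from(f"<{n_chunks}q", buf, header_size))
```

Under these assumptions the table for 9343 chunks ends at byte 40 + 9343 * 8 = 74784. The entry at 74776 is then the last offset (chunk 9342, value 6206997), and 74784 is where compressed data would begin, which is exactly where the values above stop making sense. Starting at byte 56 instead shifts every index by two and runs 16 bytes into the data.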

Is there a problem with the documentation?

@patrick-zippenfenig
Member

Hi, for which programming language are you trying to implement a reader? We are actively working on direct implementations for various programming languages. The file format will also be revised to support more than two dimensions, streaming writes, and cloud-native reads, and to further improve the compression ratio. Here is the branch implementing the new writer/reader. It is not yet functional.

> I have a guess that the offsets are actually written from the 40th byte.

You might be right. The size is calculated simply with sizeof(OpenMeteoHeader), so my note in the code may be outdated.
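To illustrate why sizeof(OpenMeteoHeader) would come out as 40 rather than 56: assuming the header struct holds only the fields from the magic bytes through Chunk1, with the types listed in the original post, its natural-alignment size is 40 bytes with no padding. The `ctypes` class below is a hypothetical mirror; its field names and types are assumptions, not the project's definition.

```python
import ctypes


class OpenMeteoHeader(ctypes.Structure):
    """Hypothetical mirror of the header struct (names and types assumed)."""
    _fields_ = [
        ("magic_number1", ctypes.c_uint8),
        ("magic_number2", ctypes.c_uint8),
        ("version", ctypes.c_uint8),
        ("compression_type", ctypes.c_uint8),
        ("scale_factor", ctypes.c_float),   # offset 4, already 4-byte aligned
        ("dim0", ctypes.c_int64),           # offset 8, no padding needed
        ("dim1", ctypes.c_int64),
        ("chunk0", ctypes.c_int64),
        ("chunk1", ctypes.c_int64),
    ]


print(ctypes.sizeof(OpenMeteoHeader))  # 40
```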

@kikocorreoso

I tried this about a month ago in Python, without success. I only had about an hour of free time, and then I went on holiday for a month, so I don't remember the details of the issue, but the data after the header made no sense to me.

It would be very useful if someone could provide some hints.

@L1nzar
Author

L1nzar commented Sep 26, 2024

@patrick-zippenfenig For C#. I thought it would be simple enough, but the lack of a C# wrapper for TurboPFor, combined with my inexperience with integer compression, has been a big problem so far.

@L1nzar L1nzar closed this as completed Sep 26, 2024
@patrick-zippenfenig
Member

We aim to provide low-level C functions for interacting with OM files that abstract away chunking and compression. Integrating them into other programming languages with asynchronous IO should then be "relatively" easy. Here are some additional notes: fsspec/kerchunk#464
