Radiosonde data - apparent (and mysterious) disparity in data shape #35

Open
geacomputing opened this issue May 16, 2024 · 8 comments

@geacomputing

Hello.

I have some Radiosonde profiles, in BUFR format. I have tried reading them using both pybufrkit and eccodes. Specifically for eccodes, I adapted this script from confluence.ecmwf.int

My aim is to read in the whole sequence of BUFR and concatenate/convert them to an xarray dataset.

Specifically, when running the following command,

pybufrkit decode -a bufr309052_all_20240125_1108_3.bfr > "../testout.json"

I have:

        # --- 1 of 3592 replications ---
        303054 (Temperature, dewpoint and wind data at a pressure level with radiosonde position)
            004086 LONG TIME PERIOD OR DISPLACEMENT None
            008042 EXTENDED VERTICAL SOUNDING SIGNIFICANCE 65536
            007004 PRESSURE 100000.0
            010009 GEOPOTENTIAL HEIGHT 126
            005015 LATITUDE DISPLACEMENT (HIGH ACCURACY) 0.0
            006015 LONGITUDE DISPLACEMENT (HIGH ACCURACY) 0.0
            012101 TEMPERATURE/AIR TEMPERATURE None
            012103 DEWPOINT TEMPERATURE None
            011001 WIND DIRECTION None
            011002 WIND SPEED None

However, in both cases (pybufrkit and eccodes), depending on the file I pass to my Python function inside my loop, I observe a mismatch in the size (shape) of my data:
len(airT) = len(dewT) = len(time) -1

More specifically:

[+] -------------------------------------------
[+] Filename:  03/bufr309052_all_20240331_1104_0.bfr
[+] -------------------------------------------
      variable: dtime   --> lenght: 2720
      variable: dlat    --> lenght: 2720
      variable: dlon    --> lenght: 2720
      variable: airt    --> lenght: 2720
      variable: geopot  --> lenght: 2720

[+] -------------------------------------------
[+] Filename:  01/bufr309052_all_20240131_1105_0.bfr
[+] -------------------------------------------
      variable: dtime   --> lenght: 2854
      variable: dlat    --> lenght: 2854
      variable: dlon    --> lenght: 2854
      variable: airt    --> lenght: 2853    |<--------
      variable: geopot  --> lenght: 2853    |<--------

Now my questions:
Why is that? Shouldn't lat, lon, time and parameters always be the same size?
How can I proceed? Should I filter? Skip? Ignore?

Thanks for any suggestion or constructive comment you might be willing to share with me.

@ywangd
Owner

ywangd commented May 16, 2024

Are you sure the fields are missing in the output of pybufrkit? I suspect there is some issue in your python script especially since both pybufrkit and eccodes give you the same result.

If you can share one of the BUFR files that is giving you this issue, I can take a closer look at it.

@domcyi24
Copy link

Hi Yang.

Thank you so much for your kind and prompt reply.
At this stage, I am not sure of anything. I am sailing through unknown waters, and your kind offer of help is greatly appreciated: thanks!

I have enclosed both BUFR files, as well as my Python script (with eccodes). Is there any way I could get clean, reproducible, understandable Python code to simply get lat, lon, time, geopot height, airT, dewT?

As mentioned, my aim is to combine all my (hundreds of) files into one whole xarray dataset. But for that, I need to have a clearer understanding, and be able to align the data.

Thank you so much for your kind help!
cheers

GitHub_BUFR_Radiosonde.zip

@david-i-berry

david-i-berry commented May 16, 2024

The BUFR sequence contains a block of data for the radiosonde profile (at 0 to n levels) followed by a block of data containing the wind shear (at a different 0 to n levels). The additional latitude, longitude and date/time are coming from the wind shear block.

If you run bufr_dump from eccodes on the second file you should see the wind shear data at the end of the output:

#2854#timePeriod=2726
#2854#extendedVerticalSoundingSignificance=18432
#2854#pressure=11240
#2854#latitudeDisplacement=0.19886
#2854#longitudeDisplacement=0.37582
absoluteWindShearIn1KmLayerBelow=6.4
absoluteWindShearIn1KmLayerAbove=MISSING

Hope this helps.
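
To illustrate why the counts differ, here is a small self-contained sketch. It does not read a BUFR file: the two key lists are simplified stand-ins for the sounding block (303054) and the wind shear block, using ecCodes-style key names, and the level counts (2853 sounding levels, 1 wind shear level) are assumptions chosen to mirror the second file in this thread.

```python
from collections import Counter

# Simplified, assumed key sets: both blocks carry time and position,
# but only the sounding block carries temperature.
SOUNDING_KEYS = ["timePeriod", "latitudeDisplacement", "longitudeDisplacement",
                 "airTemperature", "dewpointTemperature"]
SHEAR_KEYS = ["timePeriod", "latitudeDisplacement", "longitudeDisplacement",
              "absoluteWindShearIn1KmLayerBelow", "absoluteWindShearIn1KmLayerAbove"]


def flatten(n_sounding_levels, n_shear_levels):
    """Return (block, key) pairs in message order: sounding levels, then shear levels."""
    entries = []
    for _ in range(n_sounding_levels):
        entries.extend(("sounding", key) for key in SOUNDING_KEYS)
    for _ in range(n_shear_levels):
        entries.extend(("shear", key) for key in SHEAR_KEYS)
    return entries


def count_keys(entries, block=None):
    """Count occurrences per key, optionally restricted to one block."""
    return Counter(key for blk, key in entries if block is None or blk == block)


entries = flatten(2853, 1)
print(count_keys(entries)["timePeriod"])              # counted across both blocks
print(count_keys(entries)["airTemperature"])          # only present in the sounding block
print(count_keys(entries, "sounding")["timePeriod"])  # restricted to the sounding block
```

Reading a key by name across the whole message picks up the wind shear level as well, so time and position come out one longer than temperature. Restricting the read to the sounding block gives equal lengths.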

@geacomputing
Author

Hi David, Thanks for your kind answer. Yes, it helped. Things start appearing less fuzzy now.
So, I understand there are two blocks: one for the data I am interested in (temp, dew, geopot), and one for wind shear (uninteresting for me, at the moment).

When reading in the data (adapted from this link), should I mask the data to only get block #1?

And also (from the same link): what is the significance level? Is it some kind of quality check?
vsSignif = codes_get_array(bufr, "extendedVerticalSoundingSignificance")

Thanks!
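
On the significance question, a hedged sketch: descriptor 008042 (extendedVerticalSoundingSignificance) is a flag table rather than a quality check, and each set bit describes the kind of level (surface, standard level, and so on). The code below decodes the two values that appear in this thread, 65536 and 18432. The 18-bit width and the bit meanings are written from memory of WMO flag table 0 08 042 and should be treated as assumptions to verify against the table itself; `set_bits` and `describe` are hypothetical helper names, not part of pybufrkit or ecCodes.

```python
# BUFR flag tables number bits from 1, starting at the most significant bit.
FLAG_WIDTH = 18  # assumed data width of descriptor 008042

# Assumed meanings of the first bits of flag table 0 08 042 (verify before use).
MEANINGS = {
    1: "surface",
    2: "standard level",
    3: "tropopause level",
    4: "maximum wind level",
    5: "significant temperature level",
    6: "significant humidity level",
    7: "significant wind level",
}


def set_bits(value, width=FLAG_WIDTH):
    """Return the 1-based BUFR bit numbers that are set in value."""
    return [bit for bit in range(1, width + 1) if value & (1 << (width - bit))]


def describe(value):
    """Map each set bit to its assumed meaning, falling back to the bit number."""
    return [MEANINGS.get(bit, f"bit {bit}") for bit in set_bits(value)]


for value in (65536, 18432):
    print(value, set_bits(value), describe(value))
```

Under these assumptions, 65536 has only bit 2 set, which is consistent with the 1000 hPa level in the first post, and 18432 has bits 4 and 7 set, which would fit a level in the wind shear block.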

@david-i-berry

You want only the children of the 303054 block (https://library.wmo.int/idviewer/35625/448). Using pybufrkit the query would be something like

pybufrkit query 303054/005015 /local/app/bufr309052_all_20240131_1105_0.bfr

to get the latitude displacements etc.

@geacomputing
Author

Dear David!

THANK YOU for your kind hint, for the PDF link and for pointing me in the right direction.
I decided to abandon ecCodes and use pybufrkit, using this code:

import os

import matplotlib.pyplot as plt
from pybufrkit.decoder import Decoder
from pybufrkit.dataquery import NodePathParser, DataQuerent

# Define my input file
INPUT = '/home/olddog/Radiosonde/fromDB/03/bufr309052_all_20240331_1104_0.bfr'

# Decode the BUFR message
decoder = Decoder()
with open(INPUT, 'rb') as ins:
    bufr_message = decoder.process(ins.read())


# Extract the stuff I need, as parent/child (only children of the 303054 block).
# 005015 is the latitude displacement and 006015 the longitude displacement.
lat     = DataQuerent(NodePathParser()).query(bufr_message, '303054/005015').results[0][0]
lon     = DataQuerent(NodePathParser()).query(bufr_message, '303054/006015').results[0][0]
geopot  = DataQuerent(NodePathParser()).query(bufr_message, '303054/010009').results[0][0]
time    = DataQuerent(NodePathParser()).query(bufr_message, '303054/004086').results[0][0]
airT    = DataQuerent(NodePathParser()).query(bufr_message, '303054/012101').results[0][0]
dewT    = DataQuerent(NodePathParser()).query(bufr_message, '303054/012103').results[0][0]

# Show data: air temperature and dewpoint against geopotential height
plt.close('all')
plt.figure(figsize=(5, 10))
plt.plot(airT, geopot, dewT, geopot)
plt.xlabel('Temperature [K]')
plt.ylabel('Geopot h [m ASL]')
plt.title('Profile \n'+ os.path.basename(INPUT))
plt.tight_layout()
plt.show()

Then I compared this new reader with the very first message I posted in this thread.
NOW IT IS FINALLY CONSISTENT:

[+] -------------------------------------------
[+] Filename:  03/bufr309052_all_20240331_1104_0.bfr
[+] -------------------------------------------
	-variable: time    --> lenght: 2720
	-variable: lat     --> lenght: 2720
	-variable: lon     --> lenght: 2720
	-variable: airt    --> lenght: 2720
	-variable: geopot  --> lenght: 2720
	-variable: dewT    --> lenght: 2720





[+] -------------------------------------------
[+] Filename:  02/bufr309052_all_20240224_0511_0.bfr
[+] -------------------------------------------
	-variable: time    --> lenght: 2679
	-variable: lat     --> lenght: 2679
	-variable: lon     --> lenght: 2679
	-variable: airt    --> lenght: 2679
	-variable: geopot  --> lenght: 2679
	-variable: dewT    --> lenght: 2679 
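
Since the profiles have different lengths from file to file (2720 and 2679 levels above), combining many of them into one dataset needs some alignment step. The following is a minimal sketch of one option, padding every profile to the longest one with NaN so they can be stacked as rows. It uses plain lists only; `pad_profiles` is a hypothetical helper, and treating `None` as a missing value is an assumption based on the pybufrkit output shown in the first post.

```python
import math


def pad_profiles(profiles, fill=math.nan):
    """Pad each 1-D profile to the longest length so the profiles stack as rows.

    None entries (missing values) are replaced by the fill value as well.
    """
    n_levels = max((len(p) for p in profiles), default=0)
    padded = []
    for profile in profiles:
        row = [fill if value is None else value for value in profile]
        row.extend([fill] * (n_levels - len(row)))
        padded.append(row)
    return padded


# Two short, made-up temperature profiles of different lengths.
rows = pad_profiles([[280.1, None, 275.4], [281.0]])
print([len(row) for row in rows])
```

The padded rows form a rectangular (sounding, level) table, which could then be handed to an array library of your choice. Whether padding or interpolating onto common levels is the better fit depends on what the combined dataset is used for.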

[Image: Vertical_Profile]

@ywangd
Owner

ywangd commented May 17, 2024

Glad to know your problem is solved. Thanks for helping out @david-i-berry 👍

@geacomputing
Author

geacomputing commented May 17, 2024 via email
