
Add a boundary layer module to estimate boundary layer height #3572

Open · wants to merge 5 commits into main

Conversation

ThomasRieutord

Add a boundary layer module to estimate boundary layer height using various classical methods applied to vertical profiles

Description Of Changes

Boundary layer height estimation is a parameter commonly derived from vertical profiles (from radiosondings or model profiles) with key importance in application domains such as air quality. The issue #1515 was expressing the need for a way to estimate boundary layer height with MetPy. This PR suggests a module in calc to do so.

The newly added module boundarylayer.py includes four different ways to estimate the boundary layer height, each linked to references (a brief usage sketch follows the list):

  • from bulk Richardson number
  • from the parcel method
  • from humidity gradient
  • from temperature inversion
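
For illustration only, here is a minimal usage sketch; the function names and argument order are those exercised by the testing script included in this PR, while the idealized sample profile and the import path are made up:

    import numpy as np
    from metpy.calc import boundarylayer  # assumed import path, see "Help needed" below
    from metpy.units import units

    # Made-up idealized profile: well-mixed layer capped by an inversion near 1 km
    height = np.linspace(0., 3000., 31) * units.meter
    potential_temperature = np.where(height.m < 1000., 300., 312.) * units.kelvin
    u = np.full(31, 5.) * units('m/s')
    v = np.zeros(31) * units('m/s')
    relative_humidity = np.where(height.m < 1000., 70., 20.) * units.percent

    blh_ri = boundarylayer.blh_from_richardson_bulk(height, potential_temperature, u, v)
    blh_parcel = boundarylayer.blh_from_parcel(height, potential_temperature)
    blh_rh = boundarylayer.blh_from_humidity_gradient(height, relative_humidity)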

Help needed

The imports in the module do not follow the MetPy standards; they are written as if MetPy were an external module. Can someone help me correct them?

The decorators present on the functions of other modules (such as exporter) are missing. As I'm not familiar with them, help is welcome here too.

The testing program added together with the module only runs the functions in the new module on sample data; the unit tests are not defined yet.

Thanks for your help.

Checklist

@ThomasRieutord ThomasRieutord requested a review from a team as a code owner July 19, 2024 11:44
@ThomasRieutord ThomasRieutord requested review from dopplershift and removed request for a team July 19, 2024 11:44

Contributor

@DWesl DWesl left a comment

Overall a good start, though the documentation could use a description of the assumptions or limitations of each method (e.g., this method gives odd answers at night, that one gives odd answers during the day, and the concept of a boundary layer distinct from the free troposphere breaks down inside a thunderstorm).

This is a solid collection of functions to implement existing methods for soundings isolated in time.

from metpy.units import units


def smooth(val, span):
Contributor

@DWesl DWesl Jul 19, 2024

XArray calls this a rolling mean. So does pandas.
Bottleneck calls this a moving-window mean.
SciPy appears to call the same thing a uniform filter.

These would likely work better in the See Also section than as a change in the name.
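
For reference, a minimal sketch of two of those equivalents (exact edge handling and NaN treatment differ between libraries, so this is illustrative only):

    import numpy as np
    import pandas as pd
    from scipy.ndimage import uniform_filter1d

    val = np.array([1., 2., np.nan, 4., 5., 7., 8.])
    span = 3

    # pandas: centered rolling mean, NaN-aware, window shrinks at the edges
    pandas_mean = pd.Series(val).rolling(span, center=True, min_periods=1).mean().to_numpy()

    # SciPy: uniform (moving-window) filter; not NaN-aware, edge behaviour set by `mode`
    scipy_mean = uniform_filter1d(val, size=span, mode='nearest')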

Author

Thanks for the references! I knew some equivalent functions already existed, but they are not quite the same (xarray works on xarray.Dataset, SciPy has a slightly different strategy at the edges) and, since the function is simple enough, it was less work to write it than to look for an existing one.

Bottleneck's function seems to do exactly what I want, but it is not listed in MetPy's dependencies. Do you think it's worth adding it so I can use their moving-mean function?

Contributor

You'd have to ask one of the maintainers, but, in the meantime, would this be faster?

# Windowed sums via cumulative sums: O(N) regardless of span
cumulative_sums = np.nancumsum(val)
rolling_sums = cumulative_sums[span:] - cumulative_sums[:-span]
# Count the finite values in each window so the mean ignores NaNs
valid_index = np.isfinite(val)
cumulative_count = np.cumsum(valid_index)
rolling_count = cumulative_count[span:] - cumulative_count[:-span]
rolling_means = rolling_sums / rolling_count

You'd need to pre-allocate rolling_means and handle the edges still but it should work. (Alternately, use np.lib.stride_tricks.sliding_window_view with np.nanmean, but the note about it being slow is warranted)

Alternately, use SciPy for the bulk of the calculation, then re-do the edges the way you want.

It would probably be a good idea to check whether this takes enough time that it's worth optimizing before going too far, though (as you may have noticed, I am not good at that).

Author

With the testing data I have it's almost instantaneous and, as I have moved on to other topics, I don't really have anything bigger to try it on quickly. I suggest we leave it that way for now; other users can open another issue if and when they need to speed it up. Would that be alright?

Contributor

@DWesl DWesl Aug 1, 2024

You picked the same name as elsewhere in MetPy, though the edge handling is again different from what you do here (they do not smooth close to the edge) and from SciPy, and they do not use nanmean.

The bigger test data would likely be someone trying to find boundary layer height from model data somewhere, and it looks like your code is designed for a profile at a time, not arrays of profiles (either the N x Z that description implies or the Z x Y x X conventional in model output). It might be a simple matter to add an axis keyword argument to most functions and borrow the logic from the derivative functions to create the indexers (and possibly the vertical axis auto-detection for DataArrays), but that should probably be a follow-up PR in case it isn't.
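
For what it's worth, a rough sketch of applying a single-profile estimator over a (Z, Y, X) model field without touching the function itself; `blh_one_column` is a hypothetical stand-in for one of the functions in this PR:

    import numpy as np

    height = np.linspace(0., 4000., 40)                            # m
    theta = np.where(height < 1000., 300., 310.)                   # K, idealized capped mixed layer
    theta3d = np.broadcast_to(theta[:, None, None], (40, 8, 6))    # pretend (Z, Y, X) model field

    def blh_one_column(theta_column):
        """Stand-in for a per-profile estimator such as the parcel method."""
        above = np.nonzero(theta_column > theta_column[0])[0]
        return height[above[0]] if above.size else np.nan

    # Apply the 1-D estimator along the vertical axis of the gridded field
    blh_field = np.apply_along_axis(blh_one_column, 0, theta3d)    # shape (8, 6)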

Comment on lines 97 to 98
u[0] = 0 * units.meter_per_second
v[0] = 0 * units.meter_per_second
Contributor

Is there a particular reason you force the winds at the first layer to zero without documenting the fact, rather than asserting an additional layer at the ground (below the profiles passed) where the winds are zero to satisfy the boundary conditions?

Author

Not at all, I just didn't think of adding an extra point. Thanks for the suggestion!
However, it raises the question of what value to give the potential temperature at this extra point. I think duplicating the lowest point is the most convenient solution.

The two lines would be replaced by:

    # Add a ground point with null wind to satisfy theoretical boundary conditions
    u = np.insert(u, 0, 0 * units.meter_per_second)
    v = np.insert(v, 0, 0 * units.meter_per_second)
    height = np.insert(height, 0, 0 * units.metre)
    potential_temperature = np.insert(potential_temperature, 0, potential_temperature[0])

Additionally, as the insertion of the ground changes the length of the profile, the returned profile should exclude it to keep the same size as the input ones:

    return bRi[1:]

Does it look OK with these changes?

Contributor

Would you need to copy the point into the array? Could you just add a couple of variables for these? That would avoid the complication with the changing profile size, which would also impact idxfoot.
Something like:

if idxfoot == 0:
    u0 = 0
    v0 = 0
else:
    u0 = u[idxfoot]
    v0 = v[idxfoot]
theta0 = theta[idxfoot]
height0 = height[idxfoot]

or even:

if idxfoot == 0:
    Du = u
    Dv = v
else:
    Du = u - u[idxfoot]
    Dv = v - v[idxfoot]

I'm not sure if height should be treated the same as velocity. I think I remember a displacement height that might complicate things?

I think the main difference between this method and yours would be if the boundary layer was below the lowest passed level.

Author

Your second suggestion:

if idxfoot == 0:
    Du = u
    Dv = v
else:
    Du = u - u[idxfoot]
    Dv = v - v[idxfoot]

looks the best to me. I will go for that one. Thanks!

Regarding the problem that could arise with a BL below the first level, I think the bulk Richardson method is out of scope in that kind of situation anyway, so I don't think we need to address it (rather, refer users to a more appropriate method).
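
For context, the bulk Richardson number being computed in the snippets above is conventionally defined with respect to a reference level $z_0$ as

$$\mathrm{Ri}_b(z) = \frac{g\,(z - z_0)\,\left[\theta(z) - \theta(z_0)\right]}{\theta(z_0)\,\left[(u(z) - u_0)^2 + (v(z) - v_0)^2\right]}$$

(some formulations use virtual potential temperature), and the boundary layer top is then diagnosed as the lowest height where $\mathrm{Ri}_b$ exceeds a critical value, commonly taken around 0.25.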

theta0 = potential_temperature[0]

if any(potential_temperature > theta0):
    iblh = np.where(potential_temperature > theta0)[0][0]
Contributor

So this looks like a potential temperature threshold method. I would prefer "exceeds" over "reaches" in the documentation, given the usual description of the convective boundary layer.

Author

Fine by me. I will also add that it's only suited for unstable boundary layers, as suggested in the main comment. The name of the method varies between authors; "parcel method" is the one I have seen the most, but I can include other names (e.g. "potential temperature threshold method") in the doc.

Contributor

There might be a difference in expectations if the boundary layer is saturated (i.e. fog or fair-weather cumulus), but describing alternate names should avoid that.

"""Calculate atmospheric boundary layer height from the relative
humidity gradient

It is the height where the relative humidity or specific humidity gradient reaches a minimum.
Contributor

As a side note, it should be possible to (ab)use this function with TKE or aerosol backscatter. I take it you considered wavelet methods out-of-scope?

Author

Definitely possible to use it with TKE or aerosol backscatter. I don't mention it because the references I give are related to radiosounding data, while aerosol would come from lidar or ceilometer and TKE usually from model output, but they are very good ways of deriving the BLH (especially TKE, when turbulence is handled well in the model).

The wavelet method is out of scope for this first PR, yes. It would be great to have it in the future but I will not have the time to commit to that. In the meantime, this gradient method + smoothing should be equivalent to the Haar wavelet.

Contributor

> TKE usually from model output, but they are very good ways of deriving the BLH (especially TKE, when turbulence is handled well in the model).

I think I've seen TKE from Doppler lidar. It's probably possible to derive something similar from Doppler radar, but the radar-based boundary-layer methods I've seen have been based on scattering/reflectivity.

> The wavelet method is out of scope for this first PR, yes. It would be great to have it in the future but I will not have the time to commit to that.

Yeah, there's a lot of interesting things to do with boundary layers, and not all of them should be in a first attempt.

> In the meantime, this gradient method + smoothing should be equivalent to the Haar wavelet.

Fairly close, yes.

Author

Sure, Doppler lidars can provide TKE but, as far as I remember, only the most powerful (and expensive) Doppler lidars get any signal from above the boundary layer, which makes it difficult to run such an estimation because the top of the BL might be too close to the end of the profile.

Anyway, the function is there, no harm trying it on TKE profiles! Do you think I should change the name (and the doc) of the function to make it more general? Something like blh_from_concentration_gradient?

Contributor

> Sure, Doppler lidars can provide TKE but, as far as I remember, only the most powerful (and expensive) Doppler lidars get any signal from above the boundary layer, which makes it difficult to run such an estimation because the top of the BL might be too close to the end of the profile.

I think I've seen estimates based on SNR/signal quality in that situation, but that's likely a less frequent use case.

> Anyway, the function is there, no harm trying it on TKE profiles! Do you think I should change the name (and the doc) of the function to make it more general? Something like blh_from_concentration_gradient?

I like blh_from_concentration_gradient. Definitely mention humidity in the docs, since that's probably what people will be looking for.
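
A rough, hypothetical sketch of what such a generalized function might look like (illustrative only, not the PR's actual implementation):

    import numpy as np

    def blh_from_concentration_gradient(height, concentration):
        """Height where a scalar (relative/specific humidity, aerosol backscatter, TKE, ...)
        decreases fastest with height -- a simple gradient-minimum method."""
        gradient = np.gradient(concentration, height)
        return height[np.argmin(gradient)]

    # Idealized moist boundary layer capped by dry air aloft
    z = np.linspace(0., 3000., 61)
    rh = np.where(z < 1200., 80., 30.)
    print(blh_from_concentration_gradient(z, rh))  # approximately 1200 m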

Contains a collection of boundary layer height estimations.


References:
Contributor

I should probably ask the maintainers about one overall References section here with citations everywhere else vs. repeated References sections in each function that needs them.

"""Calculate atmospheric boundary layer height from the inversion of
absolute temperature gradient

It is the height where the temperature gradient (absolute or potential) changes of sign.
Contributor

Out of curiosity, how well does this work for the convective boundary layer with potential temperature? Or, for that matter, with the nocturnal stable boundary layer with either?

I was expecting to see a threshold method on $\frac{d\theta}{dz}$.

Author

From what I remember (my experience with this is now a bit old), this method is more suited for nocturnal stable boundary layers as it will track the end of the stable layer at the surface.

For the convective boundary layer, the parcel method should be preferred to this one, as this method would give underestimated heights with large variability.

The threshold of $\frac{d\theta}{dz}$ would be interesting to try too. I think I have seen it used in some works, just not in the ones I mention in this code.
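
For illustration, a minimal sketch of the sign-change criterion described in the docstring (a hypothetical stand-alone helper, not the PR's exact code):

    import numpy as np

    def blh_from_gradient_sign_change(height, temperature):
        """Lowest height where the vertical temperature gradient changes sign."""
        dtdz = np.gradient(temperature, height)
        change = np.nonzero(np.sign(dtdz[:-1]) != np.sign(dtdz[1:]))[0]
        return height[change[0] + 1] if change.size else np.nan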

Comment on lines 129 to 145
blh = boundarylayer.blh_from_richardson_bulk(SAMPLE_HEIGHT, potential_temperature, SAMPLE_U, SAMPLE_V)
print(f"Estimated boundary layer height with blh_from_richardson_bulk:\t {blh}")

blh = boundarylayer.blh_from_parcel(SAMPLE_HEIGHT, potential_temperature)
print(f"Estimated boundary layer height with blh_from_parcel:\t {blh}")

blh = boundarylayer.blh_from_humidity_gradient(SAMPLE_HEIGHT, SAMPLE_RELATIVE_HUMIDITY)
print(f"Estimated boundary layer height with blh_from_humidity_gradient (relative humidity):\t {blh}")

blh = boundarylayer.blh_from_humidity_gradient(SAMPLE_HEIGHT, specific_humidity)
print(f"Estimated boundary layer height with blh_from_humidity_gradient (specific humidity):\t {blh}")

blh = boundarylayer.blh_from_temperature_inversion(SAMPLE_HEIGHT, SAMPLE_TEMPERATURE)
print(f"Estimated boundary layer height with blh_from_temperature_inversion (absolute temperature):\t {blh}")

blh = boundarylayer.blh_from_temperature_inversion(SAMPLE_HEIGHT, potential_temperature)
print(f"Estimated boundary layer height with blh_from_temperature_inversion (potential temperature):\t {blh}")
Contributor

I think the usual method is to put the calculations in a function, with some asserts to indicate what you expect to be true. With this setup, checking that the different methods give answers within a few hundred meters of each other, or within a few hundred meters of the value you picked out from a Skew-T or tephigram, would likely get your intent across.

I'm pretty sure MetPy has an example sounding around somewhere, for the documentation if not for the tests, so you could create similar tests on that one as well.

Author

Sure, I can write the corresponding pytest unit tests based on these computations. However, I would rather use a standard MetPy profile than my own testing data. Do you know where I could find such sounding data?

Author

Thank you very much for the review, @DWesl!

Contributor

https://github.com/Unidata/MetPy/tree/main/staticdata has a few files matching *_sounding.txt that look promising. Those are used here:

df = pd.read_fwf(get_test_data('may4_sounding.txt', as_file_obj=False),
                 skiprows=5, usecols=[0, 1, 2, 3, 6, 7], names=col_names)

The SkewT tests look like they use p = np.geomspace(1000, 100, 10); T = np.linspace(20, -20, 10) and similar profiles. You could build something similar for a first test (constant potential temperature and water vapor mixing ratio up to some height, then a sharp increase in the former and a sharp decrease in the latter)
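
A minimal sketch of such a pytest test, assuming the new module ends up importable as metpy.calc.boundarylayer and keeps the signatures used in the testing script above (the import path, profile values, and tolerance are assumptions):

    import numpy as np
    import pytest
    from metpy.calc import boundarylayer  # assumed final import path
    from metpy.units import units


    def test_blh_from_parcel_idealized():
        """Well-mixed layer capped by a strong inversion near 1 km."""
        height = np.linspace(0., 3000., 61) * units.meter
        theta = np.where(height.m < 1000., 300., 312.) * units.kelvin
        blh = boundarylayer.blh_from_parcel(height, theta)
        assert blh.m_as('meter') == pytest.approx(1000., abs=200.)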

import numpy as np
import pandas as pd

import metpy.calc as mpcalc

Check notice — Code scanning / CodeQL (test): Module 'metpy.calc' is imported with both 'import' and 'import from'.
# -*-coding:utf-8 -*-
"""Testing program for the MetPy boundary layer module"""

import numpy as np

Check notice — Code scanning / CodeQL (test): Unused import — Import of 'np' is not used.
col_names = ["pressure", "height", "temperature", "dewpoint", "direction", "speed"]

df = pd.read_fwf(
get_test_data("may4_sounding.txt", as_file_obj=False),

Check warning — Code scanning / CodeQL (test): File is not always closed — File is opened but is not closed.
N = len(val)
smoothed_val = deepcopy(val)
for i in range(N):
    smoothed_val[i] = np.nanmean(val[i - min(span, i) : i + min(span, N - i)])
Contributor

Suggested change (original line, then suggested replacement):

    smoothed_val[i] = np.nanmean(val[i - min(span, i) : i + min(span, N - i)])
    smoothed_val[i] = np.nanmean(val[max(i - span, 0) : min(i + span, N)])

Would this be clearer as to the intent?

return smoothed_val


def bulk_richardson_number(
Contributor

Might close #628

Successfully merging this pull request may close these issues: calculate PBL height from soundings.

3 participants