Schaap dataset is uncompressed #150

Open
BSchilperoort opened this issue Mar 31, 2023 · 7 comments

@BSchilperoort
Contributor

I noticed that the Schaap soil data is not compressed. The netCDF-4 file format supports built-in compression, which could save a very large amount of disk space with little impact on performance (moderate compression may even speed up reading, because less data has to be read from disk).

The nccopy tool (included in the netCDF software, just like ncdump) makes it easy to copy and compress the data.

For example:

nccopy -d 5 PTF_SoilGrids_Schaap_sl1_alpha.nc PTF_SoilGrids_Schaap_sl1_alpha_COMPRESSED.nc

This copies the file while compressing it with deflate level 5 (levels range from 0 to 9, where 0 means no compression and 9 the strongest).
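To get a feel for what the deflate level does, here is a rough sketch using Python's built-in zlib module. netCDF-4 compresses variables through HDF5's deflate filter, which is based on zlib, so the levels are the same. The grid below is synthetic (an assumption: mostly fill values, as over oceans, plus a block of varying values), it does not read any netCDF file, and HDF5 compresses chunk by chunk, so real ratios for the Schaap files will differ.

```python
import struct
import zlib


def compression_ratio(raw: bytes, level: int) -> float:
    """Ratio of uncompressed size to DEFLATE-compressed size at `level`."""
    return len(raw) / len(zlib.compress(raw, level))


# Hypothetical float32 grid of 1,000,000 cells: 90% hold a fill value
# (e.g. ocean pixels), 10% hold slowly varying "soil" values.
FILL = -9999.0
values = [FILL] * 1_000_000
for i in range(100_000):
    values[i] = (i % 500) / 100.0
raw = struct.pack(f"<{len(values)}f", *values)  # 4 bytes per value

for level in (1, 5, 9):
    print(f"deflate level {level}: {compression_ratio(raw, level):.1f}x smaller")
```

Higher levels usually give somewhat smaller output at the cost of more CPU time; a middle level such as 4 or 5 is a common compromise.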

Compressing the Schaap data can save 100 GB of disk space.
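To check the actual savings after compressing, a small sketch like the following could sum the file sizes of both folders. The folder names are assumed to match the script below (`soil_property` and `soil_property_compressed`); `folder_size_gb` is a hypothetical helper, not part of any package.

```python
from pathlib import Path


def folder_size_gb(folder):
    """Total size of all .nc files below `folder`, in GB (10**9 bytes)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0.0  # missing folder: nothing to count
    return sum(f.stat().st_size for f in folder.rglob("*.nc")) / 1e9


if __name__ == "__main__":
    before = folder_size_gb("C:/STEMMUS_SCOPE_data/soil_property")
    after = folder_size_gb("C:/STEMMUS_SCOPE_data/soil_property_compressed")
    print(f"{before:.1f} GB -> {after:.1f} GB (saved {before - after:.1f} GB)")
```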

@BSchilperoort
Contributor Author

BSchilperoort commented Mar 31, 2023

The following script can be used if nccopy is installed on the system:

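# Compress all soil_property data (requires nccopy on the PATH):
from pathlib import Path
import subprocess

indir = Path("C:/STEMMUS_SCOPE_data/soil_property")
outdir = indir.with_name("soil_property_compressed")

for infile in indir.rglob("*.nc"):
    # Mirror the subfolder structure of the input folder in the output folder
    outfile = outdir / infile.relative_to(indir)
    outfile.parent.mkdir(parents=True, exist_ok=True)  # nccopy does not create folders
    # A list of arguments (instead of one string) works on both Windows and Unix
    subprocess.run(["nccopy", "-d", "4", str(infile), str(outfile)], check=True)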
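Before running the conversion on ~100 GB of data, it may help to do a dry run that only prints the nccopy commands. This is a sketch: `plan_jobs` is a hypothetical helper that builds the argument lists (mirroring the folder layout) without executing anything, and `shutil.which` is used only to warn if nccopy is not on the PATH.

```python
from pathlib import Path
import shutil


def plan_jobs(indir, outdir, level=4):
    """Return one nccopy argument list per .nc file below `indir`."""
    indir, outdir = Path(indir), Path(outdir)
    files = sorted(indir.rglob("*.nc")) if indir.is_dir() else []
    jobs = []
    for infile in files:
        # Keep the same relative path (including subfolders) in the output folder
        outfile = outdir / infile.relative_to(indir)
        jobs.append(["nccopy", "-d", str(level), str(infile), str(outfile)])
    return jobs


if __name__ == "__main__":
    src = Path("C:/STEMMUS_SCOPE_data/soil_property")
    dst = src.with_name("soil_property_compressed")
    if shutil.which("nccopy") is None:
        print("Warning: nccopy was not found on the PATH")
    for cmd in plan_jobs(src, dst):
        print(" ".join(cmd))  # dry run: print the commands instead of running them
```

When running for real, each output file's parent folder has to exist first, since nccopy does not create it.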

@SarahAlidoost
Member

Nice finding, thanks. At the beginning of the project, the data was copied from CRIB to Snellius. I am wondering what format the data has in its original source. @Yunfei-Wang1993 describes the data sources in his paper. Your solution could be suggested to the data provider.

@BSchilperoort
Contributor Author

@Yunfei-Wang1993, @yijianzeng, the PLUMBER2 paper states that the "SoilGrids" dataset is used. However, I am unable to find exactly where the netCDF files come from.

Also, there is the Shangguan 2014 dataset ("GSDE"), which links to https://globalchange.bnu.edu.cn/, but that website seems to be down.

Could you tell me where the files in the soil_property folder came from (including subfolders)?

[Screenshot of the soil_property folder]

@yijianzeng
Contributor


Hi Bart, these files come from the following paper:

Montzka, C., Herbst, M., Weihermüller, L., Verhoef, A., and Vereecken, H.: A global data set of soil hydraulic properties and sub-grid variability of soil water retention and hydraulic conductivity curves, Earth Syst. Sci. Data, 9, 529–543, https://doi.org/10.5194/essd-9-529-2017, 2017.

Although the paper states a 0.25° resolution, the original product was generated at 1 km resolution. It can be obtained by contacting the authors of the ESSD paper, and can also be found here: https://fz-juelich.sciebo.de/s/xILqOr9hxlEzM7c

I hope this helps.

Cheers, Yijian

@BSchilperoort
Contributor Author

Thanks for your reply, @yijianzeng. However, this is only part of the data. There are also files such as SAND1.nc or CLAY1.nc, as well as files like PTF_SoilGrids_Schaap_sl1_alpha.nc (etc.).

@Yunfei-Wang1993
Contributor


Hi Bart, the soil hydraulic parameters (the Schaap files) come from Montzka's dataset. The other soil properties (including CLAY, OC, POR, SAND, SILT, lambda, etc.) come from Shangguan's dataset. I have checked the link and it cannot be opened at the moment. I will try to find a new link from which these data can be downloaded.

Shangguan, W., Dai, Y., Duan, Q., Liu, B., and Yuan, H.: A global soil data set for earth system modeling, Journal of Advances in Modeling Earth Systems, 6, 249–263, https://doi.org/10.1002/2013ms000293, 2014.

@Yunfei-Wang1993
Contributor


@BSchilperoort Hi Bart, I checked the link again and it can be opened now. Please use the new link: http://globalchange.bnu.edu.cn/research/soilw
