Merge pull request #32 from developmentseed/feat/copc-netcdf4-hdf5
Feat/copc netcdf4 hdf5
Showing 3 changed files with 36 additions and 1 deletion.
@@ -0,0 +1,19 @@
---
title: Cloud-Optimized NetCDF4/HDF5
---

Cloud-optimized access to NetCDF4/HDF5 files is possible. However, there are no standards for the metadata, chunking, and compression required for cloud-optimized access to these file types.

::: {.callout-note}
NetCDF4 files are valid HDF5 files; see [Reading and Editing NetCDF-4 Files with HDF5](https://docs.unidata.ucar.edu/netcdf-c/current/interoperability_hdf5.html).
:::
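
As a quick, hypothetical illustration of that interoperability (the file name below is a placeholder, not a dataset referenced by this guide), a NetCDF4 file can be opened directly with the HDF5 Python bindings:

```python
# Minimal sketch: a NetCDF4 file is readable with plain HDF5 tooling.
# "example.nc" is a placeholder path.
import h5py

with h5py.File("example.nc", "r") as f:
    f.visit(print)        # list every group/dataset in the file
    print(dict(f.attrs))  # global (NetCDF) attributes
```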

NetCDF4/HDF5 were designed for local disk access, so simply moving them to the cloud bears little fruit. Matt Rocklin describes the issue in [HDF in the Cloud: Challenges and Solutions for Scientific Data](https://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud):

>The HDF format is complex and metadata is strewn throughout the file, so that a complex sequence of reads is required to reach a specific chunk of data. The only pragmatic way to read a chunk of data from an HDF file today is to use the existing HDF C library, which expects to receive a C FILE object, pointing to a normal file system (not a cloud object store) (this is not entirely true, as we’ll see below).
>
>So organizations like NASA are dumping large amounts of HDF onto Amazon’s S3 that no one can actually read, except by downloading the entire file to their local hard drive, and then pulling out the particular bits that they need with the HDF library. This is inefficient. It misses out on the potential that cloud-hosted public data can offer to our society.
To provide cloud-optimized access to these files without an intermediate service, like [Hyrax](https://hdfeos.org/software/hdf5_handler.php) or the [Highly Scalable Data Service (HSDS)](https://www.hdfgroup.org/solutions/highly-scalable-data-service-hsds/), it is recommended to determine whether the NetCDF4/HDF5 data you wish to provide can be used with kerchunk. Rich Signell shared promising results, instructions on how to create a kerchunk reference file (aka `fsspec.ReferenceFileSystem`) for NetCDF4/HDF5, and things to be aware of in [Cloud-Performant NetCDF4/HDF5 with Zarr, Fsspec, and Intake](https://medium.com/pangeo/cloud-performant-netcdf4-hdf5-with-zarr-fsspec-and-intake-3d3a3e7cb935). Note that the post is from 2020, so some details may have changed; however, the approach of using kerchunk for NetCDF4/HDF5 is still recommended.
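
For reference, a minimal sketch of that approach might look like the following. The bucket, object key, and output file name are placeholders, and kerchunk's API may have evolved since the post:

```python
# Minimal sketch of creating a kerchunk reference file for a NetCDF4/HDF5
# object on S3. The URL and output name are placeholders.
import json

import fsspec
from kerchunk.hdf import SingleHdf5ToZarr

url = "s3://example-bucket/example.nc"  # hypothetical object

# Scan the HDF5 metadata with byte-range reads and translate it into a
# Zarr-style reference dict.
with fsspec.open(url, "rb", anon=True) as f:
    refs = SingleHdf5ToZarr(f, url, inline_threshold=100).translate()

with open("example.json", "w") as out:
    json.dump(refs, out)

# The reference file can then be opened as a Zarr store, e.g. with xarray:
# ds = xr.open_dataset(
#     "reference://", engine="zarr",
#     backend_kwargs={
#         "consolidated": False,
#         "storage_options": {
#             "fo": "example.json",
#             "remote_protocol": "s3",
#             "remote_options": {"anon": True},
#         },
#     },
# )
```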

Stay tuned for more information on cloud-optimized NetCDF4/HDF5 in future releases of this guide.
@@ -0,0 +1,12 @@
---
title: Cloud-Optimized Point Clouds (COPC)
subtitle: An Introduction to Cloud-Optimized Point Clouds (COPC)
---

The LASER (LAS) file format is designed to store 3-dimensional (x, y, z) point cloud data typically collected from [LiDAR](https://en.wikipedia.org/wiki/Lidar). An LAZ file is a compressed LAS file, and a Cloud-Optimized Point Cloud (COPC) file is a valid LAZ file.

COPC files are similar to COGs for GeoTIFFs: both are valid versions of the original file format but with additional requirements to support cloud-optimized data access. In the case of COGs, there are additional requirements for tiling and overviews. For COPC, data must be organized into a clustered octree with a variable-length record (VLR) describing the octree structure.
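
As a rough illustration of what that octree layout enables (assuming the `laspy` library with COPC support; the URL and coordinates below are placeholders), a bounding-box query can be streamed from a remote COPC file without downloading the whole thing:

```python
# Minimal sketch of a spatial query against a remote COPC file using laspy.
# The URL and bounding-box coordinates are placeholders.
import numpy as np
from laspy import Bounds, CopcReader

url = "https://example.com/lidar/tile.copc.laz"  # hypothetical COPC file

with CopcReader.open(url) as reader:
    # Only the octree nodes that intersect these bounds are fetched,
    # so a small area can be read without downloading the entire file.
    query_bounds = Bounds(
        mins=np.array([635_000.0, 850_000.0]),
        maxs=np.array([636_000.0, 851_000.0]),
    )
    points = reader.query(bounds=query_bounds, resolution=2.0)
    print(f"{len(points)} points returned")
```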

Read more at [https://copc.io/](https://copc.io/).

Stay tuned for more information on COPC in future releases of this guide.