Input / Output

Reading and Writing of meteorological data

enstools.io.drop_unused(ds)

COSMO is one example of a model that stores all possible coordinates in its output files, even if no data variables are present that would use these coordinates. This function checks all coordinates and removes those that are not used by any data variable. Unused dimensions are also removed from the dataset.

Some special coordinates are kept:

- rotated_pole

Parameters

ds: xarray.Dataset: Dataset to check.

Returns

xarray.Dataset: a copy of the input dataset with the unused coordinates removed.
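
The idea behind drop_unused can be illustrated without xarray. The following is only a minimal sketch on plain dictionaries, not the enstools implementation: the helper name and the rule that a coordinate counts as "used" when it shares at least one dimension with a data variable are assumptions made for illustration.

```python
# Coordinates that are always kept, as stated in the text above.
SPECIAL_COORDS = {"rotated_pole"}


def drop_unused_coords(data_vars, coords):
    """Return a new dict of coordinates that some data variable uses.

    Hypothetical helper. Both arguments map a name to a tuple of
    dimension names. Assumption: a coordinate is "used" if it shares
    at least one dimension with any data variable.
    """
    used_dims = {dim for dims in data_vars.values() for dim in dims}
    return {
        name: dims
        for name, dims in coords.items()
        if name in SPECIAL_COORDS or used_dims.intersection(dims)
    }


data_vars = {"T": ("time", "rlat", "rlon")}
coords = {
    "time": ("time",),
    "rlat": ("rlat",),
    "rlon": ("rlon",),
    "height_2m": ("height_2m",),  # not used by any data variable
    "rotated_pole": (),           # special coordinate, always kept
}

kept = drop_unused_coords(data_vars, coords)
print(sorted(kept))
```

As in the documented function, the input is left untouched and a reduced copy is returned.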

enstools.io.get_file_type(filename: str, only_extension=False)

Use the first bytes of a file to guess its type.

Parameters

filename: string: name of the file to check
only_extension: bool: if True, only the file name is checked. This is useful for files that have not been created yet.

Returns

string: NC, HDF, GRIB
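
Guessing a file type from its first bytes usually means comparing them with well-known signatures. The sketch below shows the technique; it is not the enstools code. The function name, the extension table, and the fallback to the extension when no signature matches are assumptions. Note that netCDF-4 files carry the HDF5 signature, and how enstools tells the two apart is not covered here.

```python
from pathlib import Path

# Well-known signatures found at the very start of each file format.
_SIGNATURES = (
    (b"CDF", "NC"),                  # classic netCDF
    (b"\x89HDF\r\n\x1a\n", "HDF"),   # HDF5 (also used by netCDF-4 files)
    (b"GRIB", "GRIB"),               # GRIB edition 1 and 2
)

# Hypothetical extension table, used when the content is not inspected.
_EXTENSIONS = {".nc": "NC", ".h5": "HDF", ".grib": "GRIB", ".grb": "GRIB"}


def guess_file_type(filename, only_extension=False):
    """Return 'NC', 'HDF', 'GRIB', or None if the type is unknown."""
    path = Path(filename)
    if not only_extension:
        with open(path, "rb") as f:
            head = f.read(8)
        for magic, file_type in _SIGNATURES:
            if head.startswith(magic):
                return file_type
    # Assumption: fall back to the extension if no signature matched.
    return _EXTENSIONS.get(path.suffix.lower())


# A file that does not exist yet can only be judged by its name.
print(guess_file_type("not_created_yet.nc", only_extension=True))
```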

enstools.io.read(filenames: Union[List[str], Tuple[str], str, Path, List[Path]], constant=None, merge_same_size_dim=False, members_by_folder=False, member_by_filename=None, decode_times=True, **kwargs)

Read multiple input files

Parameters

filenames: list of str or tuple of str or str

names of individual files or filename pattern

merge_same_size_dim: bool

dimensions of the same size but with different names are merged together. This is sometimes useful to merge datasets from different file formats (e.g., grib and netcdf). Default: False.

members_by_folder: bool

interpret files in one subdirectory as data from the same ensemble member. This is useful if the ensemble member number is not stored within the input file. Default: False.

member_by_filename: str

string containing a regular expression used to read the ensemble member from the file names. The first group within this regex has to match an integer. Example: member_by_filename="_P(\d+).nc" would be able to read the 12 from the file name AROME-EPS_2012110500_P12.nc.

constant: str

name of a file containing constant variables.

decode_times: bool

decode the times

**kwargs

all arguments accepted by xarray.open_dataset() or xarray.open_mfdataset(), plus some additional ones:

drop_unused: bool: remove unused coordinates. This improves the performance of the merge process. The merge process compares the coordinates from all files with each other.
in_memory: bool: store the complete arrays in memory. Data is still handled as dask arrays, but not backed by the input files anymore. This works of course only for datasets which fit into memory.
leadtime_from_filename: bool: COSMO-GRIB1-Files do not contain exact times. If this argument is set, then the timestamp is calculated from the init time and the lead time from the file name.
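
The calculation described for leadtime_from_filename (init time plus the lead time encoded in the file name) can be sketched as follows. This is an illustration only. It assumes COSMO-style names of the form lfffDDHHMMSS (days, hours, minutes, seconds), and the helper name is hypothetical; the parsing enstools actually performs may differ.

```python
import re
from datetime import datetime, timedelta


def valid_time_from_filename(filename, init_time):
    """Add the lead time encoded in a file name to the init time.

    Assumed naming scheme: 'lfff' followed by DDHHMMSS.
    """
    match = re.search(r"lfff(\d{2})(\d{2})(\d{2})(\d{2})", filename)
    if match is None:
        raise ValueError(f"no lead time found in {filename!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return init_time + timedelta(
        days=days, hours=hours, minutes=minutes, seconds=seconds
    )


init = datetime(2012, 11, 5, 0, 0)
print(valid_time_from_filename("lfff00033000", init))  # 2012-11-05 03:30:00
```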

Returns

xarray.Dataset: in-memory representation of the content of the input file(s)
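
The member_by_filename example given in the parameter list can be checked with Python's re module. The snippet assumes that the pattern is applied with a search over the file name, which the text does not state explicitly.

```python
import re

pattern = r"_P(\d+).nc"
filename = "AROME-EPS_2012110500_P12.nc"

# The first group has to match an integer: the ensemble member number.
match = re.search(pattern, filename)
member = int(match.group(1)) if match else None
print(member)  # prints 12
```

Note that the unescaped `.` in the pattern matches any character; writing `\.nc` would restrict the match to a literal dot.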

enstools.io.write(ds: Union[Dataset, DataArray], file_path: Union[str, Path], file_format: Optional[str] = None, compression: Optional[Union[str, dict]] = None, compute: bool = True, engine: str = 'h5netcdf', format: str = 'NETCDF4')

Write an xarray dataset to a file

Parameters

ds: xarray.Dataset

the dataset to store

file_path: string or Path

the file to create

file_format: {'NC'}

string indicating the format to use. If not specified, the file extension is used.

compression: string

Used to specify the compression mode and optionally additional arguments. The parameter follows the rules defined in enstools-encoding. (https://gitlab.physik.uni-muenchen.de/w2w/enstools-encoding.git)

To apply lossless compression we can just use: "lossless"
Or we can select the backend and the compression level using the following syntax: "lossless,backend,compression_level"
The backend can be one of: 'blosclz', 'lz4' (default), 'lz4hc', 'snappy', 'zlib', 'zstd'

The compression level must be an integer from 1 to 9 (default is 9). A few examples:

"lossless,zstd,4", "lossless,lz4,9", "lossless,snappy,1"

Using "lossless" without additional arguments is equivalent to "lossless,lz4,9".

For lossy compression, we might be able to pass more arguments: "lossy"
The available lossy compressors are: 'zfp', 'sz'
For ZFP, different compression methods are available: 'rate', 'accuracy', 'precision'
For SZ, different compression methods are available as well: 'abs', 'rel', 'pw_rel'

Each of these methods requires an additional parameter: the rate, the precision or the accuracy. Examples would look like:

'lossy,zfp,accuracy,0.2', 'lossy,zfp,rate,4'

There are also a few features that target datasets with multiple variables. One can write a different specification for different variables by using a list of space-separated specifications:

'var1:lossy,zfp,rate,4.0 var2:lossy,sz,abs,0.1'

For more details see the corresponding documentation.

Another option would be to pass the path to a YAML configuration file as argument.

compute: bool

Dask delayed feature. Set to true to delay the file writing.
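
To make the string syntax of the compression parameter described above more concrete, here is a small parser for it. It is a sketch written from the rules in this section only, not the enstools-encoding implementation: the function names, the returned dictionary layout, and the "default" key are hypothetical. A bare "lossy" is rejected because the text does not state its defaults, and YAML file paths are not handled.

```python
LOSSLESS_BACKENDS = ("blosclz", "lz4", "lz4hc", "snappy", "zlib", "zstd")
LOSSY_METHODS = {
    "zfp": ("rate", "accuracy", "precision"),
    "sz": ("abs", "rel", "pw_rel"),
}


def parse_single(spec):
    """Parse one specification such as 'lossless,zstd,4'."""
    parts = spec.split(",")
    mode = parts[0]
    if mode == "lossless":
        # Defaults taken from the text: backend 'lz4', level 9.
        backend = parts[1] if len(parts) > 1 else "lz4"
        level = int(parts[2]) if len(parts) > 2 else 9
        if backend not in LOSSLESS_BACKENDS or not 1 <= level <= 9:
            raise ValueError(f"invalid lossless specification: {spec!r}")
        return {"mode": mode, "backend": backend, "compression_level": level}
    if mode == "lossy":
        if len(parts) != 4:
            raise ValueError(f"expected 'lossy,compressor,method,value': {spec!r}")
        _, compressor, method, value = parts
        if method not in LOSSY_METHODS.get(compressor, ()):
            raise ValueError(f"invalid lossy specification: {spec!r}")
        return {
            "mode": mode,
            "compressor": compressor,
            "method": method,
            "parameter": float(value),
        }
    raise ValueError(f"unknown compression mode: {spec!r}")


def parse(spec):
    """Parse a space-separated list of 'variable:specification' items.

    An item without a variable name is stored under the hypothetical
    key 'default'.
    """
    result = {}
    for item in spec.split():
        name, sep, single = item.partition(":")
        if sep:
            result[name] = parse_single(single)
        else:
            result["default"] = parse_single(item)
    return result


print(parse_single("lossless"))
print(parse("var1:lossy,zfp,rate,4.0 var2:lossy,sz,abs,0.1"))
```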