Input / Output
Reading and Writing of meteorological data
- enstools.io.drop_unused(ds)
COSMO is one example of a model that stores all possible coordinates in its output files, even if no data variable uses them. This function checks all coordinates and removes those that are not used by any data variable. Unused dimensions are also removed from the dataset.
Some special coordinates are kept:
- rotated_pole
- Parameters
- ds: xarray.Dataset
Dataset to check.
- Returns
- xarray.Dataset
a copy of the input dataset with the unused coordinates removed.
- enstools.io.get_file_type(filename: str, only_extension=False)
Guess the type of an input file from its first bytes
- Parameters
- filename: string
name of the file to check
- only_extension: bool
if True, only the file name is checked. This is useful for not yet created files.
- Returns
- string
NC, HDF, GRIB
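The idea of guessing a file type from its first bytes can be sketched with the leading byte signatures of the three formats. This is a minimal illustration, not the enstools implementation: the names `classify_header` and `guess_file_type` and the extension table are hypothetical, and the sketch reports netCDF-4 files as HDF because they are stored in HDF5 containers.

```python
from pathlib import Path

# Leading byte signatures ("magic numbers") of the three formats.
_SIGNATURES = (
    (b"\x89HDF\r\n\x1a\n", "HDF"),  # HDF5 container (netCDF-4 files start like this too)
    (b"CDF", "NC"),                 # classic netCDF
    (b"GRIB", "GRIB"),              # GRIB edition 1 and 2
)

# Hypothetical extension table, used only when the file content is not read.
_EXTENSIONS = {
    ".nc": "NC",
    ".h5": "HDF",
    ".hdf5": "HDF",
    ".grb": "GRIB",
    ".grib": "GRIB",
    ".grib2": "GRIB",
}


def classify_header(head):
    """Return 'NC', 'HDF' or 'GRIB' for the given leading bytes, else None."""
    for magic, kind in _SIGNATURES:
        if head.startswith(magic):
            return kind
    return None


def guess_file_type(filename, only_extension=False):
    """Guess the type from the content, or from the name if only_extension is set."""
    path = Path(filename)
    if only_extension:
        # Nothing is opened, so this also works for files that do not exist yet.
        return _EXTENSIONS.get(path.suffix.lower())
    with open(path, "rb") as handle:
        return classify_header(handle.read(8))
```

Checking only the extension never touches the file system, which is why that mode is suitable for output files that have not been created yet.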
- enstools.io.read(filenames: Union[List[str], Tuple[str], str, Path, List[Path]], constant=None, merge_same_size_dim=False, members_by_folder=False, member_by_filename=None, decode_times=True, **kwargs)
Read multiple input files
- Parameters
- filenames: list of str or tuple of str or str
names of individual files or filename pattern
- merge_same_size_dim: bool
dimensions of the same size but with different names are merged together. This is sometimes useful to merge datasets from different file formats (e.g., grib and netcdf). Default: False.
- members_by_folder: bool
interpret files in one subdirectory as data from the same ensemble member. This is useful if the ensemble member number is not stored within the input file. Default: False.
- member_by_filename: str
string containing a regular expression used to read the ensemble member from the file names. The first group within this regex has to match an integer. Example: member_by_filename=“_P(\d+).nc” would be able to read the 12 from the file name AROME-EPS_2012110500_P12.nc.
- constant: str
name of a file containing constant variables.
- decode_times: bool
decode the time coordinates into datetime values. Default: True.
- **kwargs
all arguments accepted by xarray.open_dataset() or xarray.open_mfdataset() plus some additional ones:
- drop_unused: bool
remove unused coordinates. This improves the performance of the merge process. The merge process compares the coordinates from all files with each other.
- in_memory: bool
store the complete arrays in memory. Data is still handled as dask arrays, but not backed by the input files anymore. This works of course only for datasets which fit into memory.
- leadtime_from_filename: bool
COSMO GRIB1 files do not contain exact times. If this argument is set, the timestamp is calculated from the init time and the lead time taken from the file name.
- Returns
- xarray.Dataset
in-memory representation of the content of the input file(s)
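The member_by_filename example above can be reproduced with Python's `re` module. This is a minimal sketch of the matching step only: the helper name `member_from_filename` is hypothetical, and the use of `re.search` (a match anywhere in the name) is an assumption about how the pattern is applied.

```python
import re


def member_from_filename(pattern, filename):
    """Return the integer captured by the first group of pattern, or None."""
    match = re.search(pattern, filename)  # assumption: match anywhere in the name
    if match is None:
        return None
    # The first group has to match an integer, as required by the parameter.
    return int(match.group(1))


member = member_from_filename(r"_P(\d+).nc", "AROME-EPS_2012110500_P12.nc")
print(member)
```

Writing the pattern as a raw string (`r"..."`) keeps Python from interpreting `\d` as a string escape.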
- enstools.io.write(ds: Union[Dataset, DataArray], file_path: Union[str, Path], file_format: Optional[str] = None, compression: Optional[Union[str, dict]] = None, compute: bool = True, engine: str = 'h5netcdf', format: str = 'NETCDF4')
Write an xarray dataset to a file
- Parameters
- ds: xarray.Dataset
the dataset to store
- file_path: string or Path
the file to create
- file_format: {‘NC’}
string indicating the format to use. If not specified, the file extension is used.
- compression: string
Used to specify the compression mode and optionally additional arguments. The parameter follows the rules defined in enstools-encoding. (https://gitlab.physik.uni-muenchen.de/w2w/enstools-encoding.git)
- To apply lossless compression we can just use:
“lossless”
- Or we can select the backend and the compression level using the following syntax:
“lossless,backend,compression_level”
- The backend can be one of:
‘blosclz’ ‘lz4’ (default) ‘lz4hc’ ‘snappy’ ‘zlib’ ‘zstd’
and the compression level must be an integer from 1 to 9 (default is 9). A few examples:
“lossless,zstd,4” “lossless,lz4,9” “lossless,snappy,1”
Using “lossless” without additional arguments is equivalent to “lossless,lz4,9”.
- For lossy compression, additional arguments can be passed:
“lossy”
- The available lossy compressors are:
‘zfp’ ‘sz’
- For ZFP we have different compression methods available:
‘rate’ ‘accuracy’ ‘precision’
- For SZ we have also different compression methods available:
‘abs’ ‘rel’ ‘pw_rel’
Each of these methods requires an additional parameter: the rate, the precision, or the accuracy. For example:
‘lossy,zfp,accuracy,0.2’ ‘lossy,zfp,rate,4’
There are also a few features that target datasets with multiple variables. Different variables can be given different specifications by using a list of space-separated specifications:
‘var1:lossy,zfp,rate,4.0 var2:lossy,sz,abs,0.1’
For more details see the corresponding documentation.
Another option would be to pass the path to a YAML configuration file as argument.
- compute: bool
Dask delayed feature. Set to False to delay the file writing; with the default (True) the file is written immediately.
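To make the structure of the compression strings described above concrete, the following sketch splits a specification into its parts. It is not the parser used by enstools-encoding: the function name `parse_compression`, the `"default"` key and the returned dictionary layout are hypothetical. It covers only the forms listed above and requires lossy specifications to be complete (compressor, method and parameter).

```python
LOSSLESS_BACKENDS = {"blosclz", "lz4", "lz4hc", "snappy", "zlib", "zstd"}
LOSSY_METHODS = {
    "zfp": {"rate", "accuracy", "precision"},
    "sz": {"abs", "rel", "pw_rel"},
}


def parse_compression(spec):
    """Split a specification string into one dictionary per variable."""
    result = {}
    for item in spec.split():  # specifications are separated by spaces
        if ":" in item:
            # An optional "name:" prefix assigns the specification to one variable.
            variable, item = item.split(":", 1)
        else:
            variable = "default"  # hypothetical key meaning "all variables"
        fields = item.split(",")
        mode = fields[0]
        if mode == "lossless":
            # Defaults as documented: "lossless" equals "lossless,lz4,9".
            backend = fields[1] if len(fields) > 1 else "lz4"
            level = int(fields[2]) if len(fields) > 2 else 9
            if backend not in LOSSLESS_BACKENDS or not 1 <= level <= 9:
                raise ValueError(f"invalid lossless specification: {item!r}")
            result[variable] = {"mode": mode, "backend": backend, "level": level}
        elif mode == "lossy":
            if len(fields) != 4:
                raise ValueError(f"incomplete lossy specification: {item!r}")
            compressor, method, parameter = fields[1:]
            if method not in LOSSY_METHODS.get(compressor, set()):
                raise ValueError(f"invalid lossy specification: {item!r}")
            result[variable] = {
                "mode": mode,
                "compressor": compressor,
                "method": method,
                "parameter": float(parameter),
            }
        else:
            raise ValueError(f"unknown compression mode: {mode!r}")
    return result


per_variable = parse_compression("var1:lossy,zfp,rate,4.0 var2:lossy,sz,abs,0.1")
```

Here `per_variable` holds one entry for `var1` and one for `var2`, while a plain `"lossless"` string yields a single `"default"` entry with the lz4 backend and level 9.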