Welcome to the Community Intercomparison Suite’s documentation!¶
Installing CIS¶
A pre-packaged version of CIS is available for installation using conda for 64-bit Linux, Mac OSX and Windows.
Once conda is installed, you can easily install CIS with the following command:
$ conda install -c cistools -c scitools cis
If you don’t already have conda, you must first download and install it. Anaconda is a free conda package that includes Python and many common scientific and data analysis libraries, and is available here. Further documentation on using Anaconda and the features it provides can be found at http://docs.continuum.io/anaconda/index.html.
To check that CIS is installed correctly, type cis version to display the version number, for example:
$ cis version
Using CIS version: V1R2M0 (Stable)
In order to upgrade CIS to the latest version use:
$ conda update cis
Dependencies¶
If you choose to install the dependencies yourself, use the following command to check the required dependencies are present:
$ python setup.py checkdep
What’s new in CIS¶
What’s new in CIS 1.3¶
This page documents the new features added, and bugs fixed in CIS since version 1.2. See all changes here: https://github.com/cedadev/cis/compare/1.2.1...1.3.0
CIS 1.3 features¶
- Significant optimisations have been made in reading Caliop, CCI and Aeronet datasets, and ungridded data subsetting is now faster.
- New Pandas interface allows the easy creation of DataFrames through the ‘as_data_frame’ method on Gridded or Ungridded data. Pandas is an extensive python library providing many powerful data analysis algorithms and routines.
- Compatibility updates for newer versions of Numpy and SciPy. The minimum required version of SciPy is now 0.16.0.
- Swapped out Basemap plotting routines for Cartopy. This removed a dependency (as Cartopy was already required by Iris), and has given us more flexibility for plotting different projections in the future.
- Plots now automatically try to use the most appropriate resolution for coastlines and for NASA Blue Marble background images.
- ‘scatter_overlay’ plots have been completely removed (they have been deprecated for the last two versions). The same functionality can be achieved through the more generic ‘overlay’ plots.
- Update to the UngriddedData.coord() and .coords() API to match the changes in IRIS >=1.8. This allows users to also search for coordinates by supplying a Coord instance to compare against. Currently this only compares standard names, but this may be extended in the future.
Bugs fixed¶
- JASCIS-279 - This release removes the basemap dependency and means we can use a much newer version of GEOS which doesn’t clash with the SciTools version
- JASCIS-267 - Fixed ASCII file reading to be compatible with Numpy 1.9
- JASCIS-259 - Fixed Stats unit tests to reflect updates in SciPy (>0.15.0) linear regression routines for masked arrays
- JASCIS-211 - Subsetting now accepts variable names (rather than axes shorthands) more consistently. The docs have been updated to make the dangers of relying on axes shorthands clear, and an error is now thrown if a specific subset coordinate is not found.
- JASCIS-275 - The ungridded subsetting is now done array-wise rather than element-wise, giving large performance improvements
CIS 1.3.1 fixes¶
- JASCIS-231 & JASCIS-209 - CIS now better determines the yaxis when the user specifies the xaxis as ‘time’ so that overlaying multiple time series is easy
- JASCIS-283 - Fixed an issue with setting xmin or xmax using datetimes
- A minor fix to the AerosolCCI product
What’s new in CIS 1.2¶
This page documents the new features added, and bugs fixed in CIS since version 1.1. See all changes here: https://github.com/cedadev/cis/compare/1.1.0...1.2.0
CIS 1.2 features¶
- All new cis info command provides much more detailed information about ungridded data variables and enables multiple variables to be output at a time.
- Updated a number of routines to take advantage of Iris 1.8 features. In particular gridded-gridded collocation using the nearest neighbour kernel should be significantly faster. Iris 1.8 is now the minimum version required for CIS.
- Gridded-ungridded collocation now supports collocation from cubes with hybrid height or hybrid pressure coordinates for both nearest neighbour and linear interpolation kernels.
- Built-in support for reading multiple HadGEM .pp files directly.
- All new API and plugin development documentation, including a number of tutorials
Bugs fixed¶
- JASCIS-253 - Any ungridded points which contain a NaN in any of their coordinate values will now be ignored by CIS
- JASCIS-250 - Multiple HadGEM files can now be read correctly through the new data plugins.
- JASCIS-197 - Gridded-gridded collocation now respects scalar coordinates
- JASCIS-199 - Aggregation now correctly uses the bounds supplied by the user, even when collapsing to length one coordinates.
- Speed improvement to the ungridded-gridded collocation using linear interpolation
- Several bug fixes for reading multiple GASSP ship files
- Renamed and restructured the collocation modules for consistency
- Many documentation spelling and formatting updates
- Many code formatting updates for PEP8 compliance
CIS 1.2.1 features¶
- Updated CCI plugin to support Aerosol CCI v3 files.
What’s new in CIS 1.1¶
This page documents the new features added, and bugs fixed in CIS since version 1.0. For more detail see all changes here: https://github.com/cedadev/cis/compare/1.0.0...1.1.0
CIS 1.1 features¶
- JASMIN-CIS is now called CIS, and the packages, modules and documentation have been renamed accordingly.
- Conda packages are now available to allow much easier installation of CIS, and across more platforms: Linux, OSX and Windows.
- PyHDF is now an optional dependency. This makes the installation of CIS on e.g. Windows much easier when HDF reading is not required.
Bugs fixed¶
- JASCIS-243 - Error when reading multiple GASSP aircraft files
- JASCIS-139 - Updated ungridded aggregation to rename any variables which clash with coordinate variables, as this breaks during the output otherwise.
- Compatibility fixes for Numpy versions >1.8 and Python-NetCDF versions >1.1.
- Fix Caliop pressure units which were stored as hPA, but need to be hPa to conform to CF.
- The integration test data has been moved completely out of the repository, making the download quicker and less bloated. Its location can be specified by setting the CIS_DATA_HOME environment variable.
- A test runner has been created to allow easy running of the unit and integration tests.
What’s new in CIS 1.1.1¶
This section documents changes in CIS since version 1.1. These were primarily bug fixes and documentation updates. See all changes here: https://github.com/cedadev/cis/compare/1.1.0...1.1.1
Bugs fixed¶
- JASCIS-181 - Updated eval documentation
- JASCIS-239 - Documented the requirement of PyHamCrest for running tests
- JASCIS-249 - CIS will now accept variables and filenames (such as Windows paths) which include a colon as long as they are escaped with a backslash. E.g. cis plot my_var:C\:\my_file.nc
- Occasionally HDF will exit when reading an invalid HDF file without throwing any exceptions. To protect against this, the HDF reader will now insist on an .hdf extension for any files it reads.
What kind of data can CIS deal with?¶
Writing¶
When creating files from a CIS command, CIS uses the NetCDF 4 classic format. Ungridded output files are always prefixed with cis-, and both ungridded and gridded output are always suffixed with .nc.
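The naming rule above can be sketched in a few lines of Python. This is only an illustration: output_filename is a hypothetical helper, not part of CIS, and it assumes that a name already ending in .nc is left without a second extension.

```python
import os


def output_filename(name, ungridded):
    """Hypothetical helper mirroring the naming rule described above."""
    directory, base = os.path.split(name)
    if not base.endswith(".nc"):
        base += ".nc"            # every output file is suffixed with .nc
    if ungridded:
        base = "cis-" + base     # ungridded output is prefixed with cis-
    return os.path.join(directory, base)


print(output_filename("Aeronet-out", ungridded=True))
print(output_filename("Xglnwa-out", ungridded=False))
```

Under these assumptions the two calls give cis-Aeronet-out.nc and Xglnwa-out.nc, which are the file names used by the plot commands in the subsetting examples.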
Reading¶
CIS has built-in support for NetCDF and HDF4 file formats. That said, most data requires some sort of pre-processing before being ready to be plotted or analysed (this could be scale factors or offsets needing to be applied, or even just knowing what the dependencies between variables are). For that reason, the way CIS deals with reading in data files is via the concept of “data products”. Each product has its own very specific way of reading and interpreting the data in order for it to be ready to be plotted, analysed, etc.
So far, CIS can read the following ungridded data files:
Dataset | Product name | Type | File Signature
--- | --- | --- | ---
AERONET | Aeronet | Ground-stations | *.lev20
Aerosol CCI | Aerosol_CCI | Satellite | *ESACCI*AEROSOL*
CALIOP L1 | Caliop_L1 | Satellite | CAL_LID_L1-ValStage1-V3*.hdf
CALIOP L2 | Caliop_L2 | Satellite | CAL_LID_L2_05kmAPro-Prov-V3*.hdf
CloudSat | CloudSat | Satellite | *_CS_*GRANULE*.hdf
Flight campaigns | NCAR_NetCDF_RAF | Aircraft | RF*.nc
MODIS L2 | MODIS_L2 | Satellite | *MYD06_L2*.hdf, *MOD06_L2*.hdf, *MYD04_L2*.hdf, *MOD04_L2*.hdf, *MYDATML2.*.hdf, *MODATML2*.hdf
Cloud CCI | Cloud_CCI | Satellite | *ESACCI*CLOUD*
CSV datapoints | ASCII_Hyperpoints | N/A | *.txt
CIS ungridded | cis | CIS output | cis-*.nc
NCAR-RAF | NCAR_NetCDF_RAF | Aircraft | *.nc containing the attribute Conventions with the value NCAR-RAF/nimbus
GASSP | NCAR_NetCDF_RAF | Aircraft | *.nc containing the attribute GASSP_Version
GASSP | NCAR_NetCDF_RAF | Ship | *.nc containing the attribute GASSP_Version, with no altitude
GASSP | NCAR_NetCDF_RAF | Ground-station | *.nc containing the attribute GASSP_Version, with attributes Station_Lat, Station_Lon and Station_Altitude
It can also read the following gridded data types:
Dataset | Product name | Type | File Signature
--- | --- | --- | ---
MODIS L3 daily | MODIS_L3 | Satellite | *MYD08_D3*.hdf, *MOD08_D3*.hdf, *MOD08_E3*.hdf
HadGEM pp data | HadGEM_PP | Gridded Model Data | *.pp
Net_CDF Gridded Data | NetCDF_Gridded | Gridded Model Data | *.nc (this is the default for NetCDF Files that do not match any other signature)
The file signature is used to automatically recognise which product definition to use. Note that the product can easily be overridden by specifying it at the command line.
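As a rough illustration of signature matching (not the product-selection code CIS itself uses), the sketch below matches file names against a handful of the wildcard signatures from the tables using the standard fnmatch module. The SIGNATURES list and the guess_product function are hypothetical, and the attribute-based signatures (for example GASSP_Version) are deliberately left out.

```python
from fnmatch import fnmatchcase

# A few signatures copied from the tables; the list itself is illustrative.
SIGNATURES = [
    ("Aeronet", ["*.lev20"]),
    ("CloudSat", ["*_CS_*GRANULE*.hdf"]),
    ("MODIS_L3", ["*MYD08_D3*.hdf", "*MOD08_D3*.hdf", "*MOD08_E3*.hdf"]),
    ("HadGEM_PP", ["*.pp"]),
    ("NetCDF_Gridded", ["*.nc"]),  # listed last: the fallback for NetCDF files
]


def guess_product(filename):
    """Return the first product whose signature matches, or None."""
    for product, patterns in SIGNATURES:
        if any(fnmatchcase(filename, pattern) for pattern in patterns):
            return product
    return None


print(guess_product("920801_121229_Abracos_Hill.lev20"))
print(guess_product("MOD08_E3.A2010009.005.2010026072315.hdf"))
print(guess_product("xglnwa.pm.k8dec-k9nov.col.tm.nc"))
```

Keeping the generic *.nc pattern last reflects the statement in the table that NetCDF_Gridded is the default only when no other signature matches.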
This is of course far from being an exhaustive list of what’s out there. To cope with this, a “plugin” architecture has been designed so that the user can readily use their own data product reading routines, without even having to change the code - see the plugin development page for more information. There are also mechanisms to allow you to overwrite default behaviour if the built-in products listed above do not achieve the desired results.
Datagroups¶
Most CIS commands operate on a ‘datagroup’, which is a unit of data containing one or more similar variables and one or more files from which those variables should be taken. A datagroup represents closely related data from a specific instrument or model and as such is associated with only one data product.
A datagroup is specified with the syntax:
<variable>...:<filename>[:product=<productname>]
where:
<variable>
is a mandatory argument specifying the variable or variable names to use. This should be the name of the variable as described in the file, e.g. the NetCDF variable name or HDF SDS/VDATA variable name. Multiple variables may be separated by commas, and variables may be wildcarded using any wildcards compatible with the python module glob, so that *, ? and [] can all be used.
Attention
When specifying multiple variables, it is essential that they be on the same grid (i.e. use the same coordinates).
<filenames>
is a mandatory argument used to specify the files to read the variable from. These can be specified as a comma separated list of the following possibilities:
1. a single filename - this should be the full path to the file
2. a single directory - all files in this directory will be read
3. a wildcarded filename - a filename with any wildcards compatible with the python module glob, so that *, ? and [] can all be used. E.g., /path/to/my/test*file_[0-9].
Attention
When multiple files are specified (whether through use of commas, pointing at a directory, or wildcarding), then all those files must contain all of the specified variables, and the files should be ‘compatible’ - it should be possible to aggregate them together using a shared dimension - typically time (in a NetCDF file this is usually the unlimited dimension). So selecting multiple monthly files for a model run would be OK, but selecting files from two different datatypes would not be OK.
<productname>
is an optional argument used to specify the type of files being read. If omitted, the program will attempt to figure out which product to use based on the filename. See Reading to see a list of available products and their file signatures.
For example:
illum:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc
Cloud_Fraction_*:MOD*,MODIS_dir/:product=MODIS_L2
Some file paths or variable names might contain colons (:). These need to be escaped so that CIS can tell the difference between them and the colons used to separate datagroup elements. Simply use a backslash (\) to escape these characters. For example:
"TOTAL RAINFALL RATE\: LS+CONV KG/M2/S:C\:\My files\MODIS_dir:product=MODIS_L2"
Notice that we have used outer quotes to allow for the spaces in the variable and file names, and used the backslashes to escape the colons.
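The escaping rule can be illustrated with a short Python sketch. The split_datagroup function below is a hypothetical helper written for this page, not the parser CIS uses internally; it simply splits on colons that are not preceded by a backslash and then removes the escaping.

```python
import re


def split_datagroup(datagroup):
    """Split a datagroup string on unescaped colons (illustrative only)."""
    # The negative lookbehind skips any colon preceded by a backslash.
    parts = re.split(r"(?<!\\):", datagroup)
    # Drop the backslash from each escaped colon that remains.
    return [part.replace("\\:", ":") for part in parts]


example = r"TOTAL RAINFALL RATE\: LS+CONV KG/M2/S:C\:\My files\MODIS_dir:product=MODIS_L2"
for element in split_datagroup(example):
    print(element)
```

For the example above this yields three elements: the variable name, the Windows path and the product option, each with its literal colons restored.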
Reading hybrid height data with separate orography data¶
CIS supports the reading of gridded data containing hybrid height and pressure fields, with an orography field supplied in a separate file. The file containing the orography field (which should be properly referenced from a formula term in the data file) can just be appended to the list of files to be read in and CIS will attempt to create an appropriate altitude dimension.
Reading NetCDF4 Hierarchical Groups¶
CIS supports the reading of NetCDF4 hierarchical groups.
These can be specified on the command line in the format <group>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber. Groups can be nested to any required depth like <group1>.<group2...>.<variable_name>.
CIS currently does not support writing out of NetCDF4 groups, so any groups read in will be output ‘flat’.
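To make the group syntax concrete, the following sketch separates a dotted name into its group path and variable name. split_group_variable is a hypothetical helper, and it assumes that neither group names nor variable names contain a dot themselves.

```python
def split_group_variable(name):
    """Split '<group1>.<group2>.<variable_name>' into (groups, variable)."""
    *groups, variable = name.split(".")
    return groups, variable


print(split_group_variable("AVHRR.Ch4CentralWavenumber"))
print(split_group_variable("outer.inner.temperature"))
print(split_group_variable("temperature"))
```

A name with no dots yields an empty group list, which corresponds to a variable stored at the top level of the file.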
Reading groups in user-developed product plugins¶
Most of the methods in the cis.data_io.netcdf module support netCDF4 groups using the syntax described above - users should use this module when designing their own plugins to ensure support for groups.
Using the command line¶
Run the following command to print help and check that it runs: cis --help
The following should be displayed:
usage: cis [-h] {plot,info,col,aggregate,subset,version} ...
positional arguments:
{plot,info,col,aggregate,subset,version}
plot Create plots
info Get information about a file
col Perform collocation
aggregate Perform aggregation
subset Perform subsetting
eval Evaluate a numeric expression
stats Perform statistical comparison of two datasets
version Display the CIS version number
optional arguments:
-h, --help show this help message and exit
There are 8 commands the program can execute:
- plot, which is used to plot the data
- info, which prints information about a given input file
- col, which is used to perform collocation on data
- aggregate, which is used to perform aggregation along coordinates in the data
- subset, which is used to perform subsetting of the data
- eval, which is used to evaluate a numeric expression on data
- stats, which is used to perform a statistical comparison of two datasets
- version, which is used to display the version number of CIS
If an error occurs while running any of these commands, you may wish to check the log file ‘cis.log’; the default location for this is the current user’s home directory.
LSF Batch Job Submission¶
CIS jobs may be submitted to an LSF type batch submission system (e.g. the JASMIN environment) by using the command cis.lsf instead of cis. In this case the job will be sent to the batch system and any output will be written to the log file.
Getting file information¶
Running $ cis info <filenames>
will print a list of the variables available in those files such as:
Trop
latitude
longitude_1
surface
unspecified_1
level6
ht
msl
latitude_1
To get more specific information about a given variable, simply run:
$ cis info <filenames> -v $var1 $var2 $var3
where $var1
, $var2
and $var3
are the names of the variables to get the information for.
Other options available include:
- --product, which allows the user to override the default product for the files
- --type, which allows the user to list only SD or VD variables from an HDF file; the default is All
Here is an example:
Ungridded data: SO4 / (ug m-3)
Shape = (6478,)
Total number of points = 6478
Number of non-masked points = 6478
Long name = Sulphate
Standard name = SO4
Units = ug m-3
Missing value = -9999
Range = (-0.57346399999999997, 7.0020300000000004)
History =
Coordinates:
time
Long name = Starting time
Standard name = time
Units = days since 1600-01-01 00:00:00
Calendar = gregorian
Missing value = -9999
Range = ('2008-07-10 02:04:35', '2008-07-20 09:50:33')
History =
latitude
Long name = Latitude
Standard name = latitude
Units = N degree
Missing value = -9999
Range = (4.0211802, 7.14886)
History =
longitude
Long name = Longitude
Standard name = longitude
Units = E degree
Missing value = -9999
Range = (114.439, 119.733)
History =
altitude
Long name = Altitude
Standard name = altitude
Units = m
Missing value = -9999
Range = (51.164299, 6532.6401)
History =
Subsetting¶
Subsetting allows the reduction of data by extracting variables and restricting them to ranges of one or more coordinates.
To perform subsetting, run a command of the format:
$ cis subset <datagroup> <limits> [-o <outputfile>]
where:
<datagroup>
is a CIS datagroup specifying the variables and files to read and is of the format
<variable>...:<filename>[:product=<productname>]
where:
- variable is a mandatory variable or list of variables to use.
- filenames is a mandatory file or list of files to read from.
- product is an optional CIS data product to use (see Data Products).
See Datagroups for a more detailed explanation of datagroups.
<limits>
is a comma separated sequence of one or more coordinate range assignments of the form variable=[start,end] or variable=[value] in which:
- variable is the name of the variable to be subsetted. This can be the variable name (as it is in the data file) or its CF standard name. It is also possible to use axes name shorthands such as x, y, t, z and p, which usually refer to longitude, latitude, time, altitude and pressure respectively. However this approach can lead to confusion as these shorthands can be overridden by the files themselves, or the data readers, and may not always behave as expected. For example when specifying ‘z’ for a gridded hybrid pressure file, this may well refer to sigma levels rather than altitude, and ‘p’ may not be found at all (it isn’t possible to subset over hybrid coordinates). For this reason it is often safer to use variable names explicitly.
- start is the value at the start of the coordinate range to be included
- end is the value at the end of the coordinate range to be included
- value is taken as the start and end value.
Note
Longitude coordinates are considered to be circular, so that -10 is equivalent to 350. The start and end must describe a monotonically increasing coordinate range, so x=[90,-90] is invalid, but could be specified using x=[90,270]. The range between the start and end must not be greater than 360 degrees. The output coordinates will be on the requested grid, not the grid of the source data.
Note
Date/times are specified in the format YYYY-MM-DDThh:mm:ss in which YYYY-MM-DD is a date and hh:mm:ss is a time. A colon or space can be used instead of the ‘T’ separator (but if a space is used, the argument must be quoted). Any trailing components of the date/time may be omitted. When a date/time is used as a range start, the earliest date/time compatible with the supplied components is used (e.g., 2010-04 is treated as 2010-04-01T00:00:00) and when used as a range end, the latest compatible date/time is used. Including optional and alternative components, the syntax is YYYY[-MM[-DD[{T|:| }hh[:mm[:ss]]]]]. When the t=[value] form is used, value is interpreted as both the start and end value, as described above, giving a range spanning the specified date/time, e.g., t=[2010] gives a range spanning the whole of the year 2010.
<outputfile>
is an optional argument to specify the name to use for the file output. This is automatically given a .nc extension and prepended with cis- if it contains ungridded data, to make it distinguishable as a collocated file. The default filename is cis-out.nc for ungridded data, and out.nc for gridded data.
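The longitude rule in the note above can be checked with a small Python sketch. normalise_longitude_range is a hypothetical helper: CIS itself treats a range such as x=[90,-90] as invalid, whereas this function shows how such a range could be rewritten by hand into the equivalent increasing form before it is passed to cis subset.

```python
def normalise_longitude_range(start, end):
    """Rewrite a longitude range so that start <= end (illustrative only)."""
    # Longitudes are circular, so adding 360 to the end leaves it unchanged.
    while end < start:
        end += 360
    if end - start > 360:
        raise ValueError("range must not be greater than 360 degrees")
    return start, end


print(normalise_longitude_range(90, -90))
print(normalise_longitude_range(-10, 20))
```

The first call turns the invalid x=[90,-90] into the x=[90,270] form suggested in the note; the second range is already increasing and is returned as given.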
A full example would be:
$ cis subset solar_3:xglnwa.pm.k8dec-k9nov.col.tm.nc longitude=[0,180],latitude=[0,90] -o Xglnwa-solar_3
Gridded netCDF data is output as gridded data, while ungridded and non-netCDF gridded data is output as ungridded data.
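The date/time syntax used in the limits, YYYY[-MM[-DD[{T|:| }hh[:mm[:ss]]]]], can be illustrated with the sketch below, which expands a partial date/time into the earliest and latest compatible values. expand_datetime is a hypothetical helper rather than the parser used by CIS, and it assumes one-second resolution and two-digit month, day and time components.

```python
import calendar
import re
from datetime import datetime

# YYYY[-MM[-DD[{T|:| }hh[:mm[:ss]]]]]
_PARTIAL = re.compile(
    r"(\d{4})(?:-(\d{2})(?:-(\d{2})(?:[T: ](\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?"
)


def expand_datetime(text):
    """Return (earliest, latest) datetimes compatible with a partial date/time."""
    match = _PARTIAL.fullmatch(text)
    if match is None:
        raise ValueError("not a recognised date/time: %r" % text)
    year, month, day, hour, minute, second = (
        None if part is None else int(part) for part in match.groups()
    )

    def pick(value, default):
        return default if value is None else value

    earliest = datetime(year, pick(month, 1), pick(day, 1),
                        pick(hour, 0), pick(minute, 0), pick(second, 0))
    last_month = pick(month, 12)
    # monthrange gives the number of days in the month, handling leap years.
    last_day = pick(day, calendar.monthrange(year, last_month)[1])
    latest = datetime(year, last_month, last_day,
                      pick(hour, 23), pick(minute, 59), pick(second, 59))
    return earliest, latest


print(expand_datetime("2010-04"))
print(expand_datetime("2010"))
```

Under these assumptions, t=[2010] corresponds to the range from the first element of expand_datetime("2010") to its second element, that is, the whole of the year 2010.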
Examples¶
Below are examples of subsetting using each of the supported products (together with a command to plot the output):
$ cis subset AO2CO2:RF04.20090114.192600_035100.PNI.nc time=[2009-01-14:19:26:00,2009-01-14:19:36:00] -o RF04-AO2CO2-out
$ cis plot AO2CO2:cis-RF04-AO2CO2-out.nc
$ cis subset IO_RVOD_ice_water_content:2007180125457_06221_CS_2B-CWC-RVOD_GRANULE_P_R04_E02.hdf t=[2007-06-29:13:00,2007-06-29:13:30] -o CloudSAT-out
$ cis plot IO_RVOD_ice_water_content:cis-CloudSAT-out.nc --xaxis=time --yaxis=altitude
$ cis subset Cloud_Top_Temperature:MYD06_L2.A2011100.1720.051.2011102130126.hdf x=[-50,-40],y=[0,10] -o MODIS_L2-out
$ cis plot Cloud_Top_Temperature:cis-MODIS_L2-out.nc
$ cis subset cwp:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc x=[85,90],y=[-3,3] -o Cloud_CCI-out
$ cis plot atmosphere_mass_content_of_cloud_liquid_water:cis-Cloud_CCI-out.nc
$ cis subset AOD870:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[-5,20],y=[15,25] -o Aerosol_CCI-out
$ cis plot atmosphere_optical_thickness_due_to_aerosol:cis-Aerosol_CCI-out.nc
$ cis subset 440675Angstrom:920801_121229_Abracos_Hill.lev20 t=[2002] -o Aeronet-out
$ cis plot 440675Angstrom:cis-Aeronet-out.nc --xaxis=time --yaxis=440675Angstrom
$ cis subset solar_3:xglnwa.pm.k8dec-k9nov.vprof.tm.nc y=[0,90] -o Xglnwa_vprof-out
$ cis plot solar_3:Xglnwa_vprof-out.nc
$ cis subset solar_3:xglnwa.pm.k8dec-k9nov.col.tm.nc x=[0,180],y=[0,90] -o Xglnwa-out
$ cis plot solar_3:Xglnwa-out.nc
$ cis subset Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf x=[0,179.9],y=[0,90] -o MODIS_L3-out
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MODIS_L3-out.nc
The files used above can be found at:
/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files/
2007180125457_06221_CS_2B-CWC-RVOD_GRANULE_P_R04_E02.hdf
20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc
MOD08_E3.A2010009.005.2010026072315.hdf
MYD06_L2.A2011100.1720.051.2011102130126.hdf
RF04.20090114.192600_035100.PNI.nc
xglnwa.pm.k8dec-k9nov.col.tm.nc
xglnwa.pm.k8dec-k9nov.vprof.tm.nc
/group_workspaces/jasmin/cis/data/aeoronet/AOT/LEV20/ALL_POINTS/
920801_121229_Abracos_Hill.lev20
Aggregation¶
The Community Intercomparison Suite (CIS) has the ability to aggregate both gridded and ungridded data along one or more coordinates. For example, you might aggregate a dataset over the longitude coordinate to produce an averaged measurement of variation over latitude.
CIS supports ‘complete collapse’ of a coordinate - where all values in that dimension are aggregated so that the coordinate no longer exists - and ‘partial collapse’ - where a coordinate is aggregated into bins of fixed size, so that the coordinate still exists but is on a coarser grid. Partial collapse is currently only supported for ungridded data. The output of an aggregation is always a CF compliant gridded NetCDF file.
The aggregation command has the following syntax:
$ cis aggregate <datagroup>[:options] <grid> [-o <outputfile>]
where:
<datagroup>
is a CIS datagroup specifying the variables and files to read and is of the format
<variable>...:<filename>[:product=<productname>]
where:
- <variable> is a mandatory variable or list of variables to use.
- <filenames> is a mandatory file or list of files to read from.
- <productname> is an optional CIS data product to use (see Data Products).
See Datagroups for a more detailed explanation of datagroups.
<options>
Optional arguments given as keyword=value in a comma separated list. Options are:
- kernel=<kernel> - the method by which the value in each aggregation cell is determined. <kernel> should be one of:
  - mean - use the mean value of all the data points in that aggregation cell. For gridded data, this mean is weighted to take into account differing cell areas due to the projection of lat/lon lines on the Earth.
  - min - use the lowest valid value of all the data points in that aggregate cell.
  - max - use the highest valid value of all the data points in that aggregate cell.
  - moments - in addition to returning the mean value of each cell (weighted where applicable), this kernel also outputs the number of points used to calculate that mean and the standard deviation of those values, each as a separate variable in the output file.
  If not specified the default is moments.
- product=<productname> is an optional argument used to specify the type of files being read. If omitted, CIS will attempt to figure out which product to use based on the filename. See Reading to see a list of available product names and their file signatures.
<grid>
This mandatory argument specifies the coordinates to aggregate over and whether they should be completely collapsed or aggregated into bins. Multiple coordinates can be aggregated over, in which case they should be separated by commas. Coordinates may be identified using their variable names (e.g. latitude), standard names, or using the axes shorthands: x, y, t, z and p which refer to longitude, latitude, time, altitude and pressure respectively.
Note
The axes shorthands are used throughout the examples here, but should be used with care, as the expected coordinate may not always be chosen. For example when specifying ‘z’ for a gridded hybrid height file, this may well refer to model level number rather than altitude. For this reason it is often safer to use variable names explicitly.
Complete collapse - To perform a complete collapse of a coordinate, simply provide the name of the coordinate(s) as a comma separated list - e.g. x,y will aggregate data completely over both latitude and longitude. For ungridded data this will result in length one coordinates with bounds reflecting the maximum and minimum values of the collapsed coordinate.
Partial collapse - To aggregate a coordinate into bins, specify the start, end and step size of those bins in the form coordinate=[start,end,step]. The step may be omitted, in which case the bin will span the whole range given. Partial collapse is currently only supported for ungridded data.
Longitude coordinates are considered to be circular, so that -10 is equivalent to 350. The start and end must describe a monotonically increasing coordinate range, so x=[90,-90,10] is invalid, but could be specified using x=[90,270,10]. The range between the start and end must not be greater than 360 degrees.
Complete and partial collapses may be mixed where applicable - for example, to completely collapse time and to aggregate latitude on a grid from -45 degrees to 45 degrees, using a step size of 10 degrees:
t,y=[-45,45,10]
Note
For ungridded data, if a coordinate is left unspecified it is collapsed completely. This is in contrast to gridded data where a coordinate left unspecified is not used in the aggregation at all.
Note
The range specified is the very start and end of the grid; the actual midpoints of the aggregation cells will start at start + delta/2.
Date/times:
Date/times are specified in the format YYYY-MM-DDThh:mm:ss in which YYYY-MM-DD is a date and hh:mm:ss is a time. A colon or space can be used instead of the ‘T’ separator (but if a space is used, the argument must be quoted). Any trailing components of the date/time may be omitted. When a date/time is used as a range start, the earliest date/time compatible with the supplied components is used (e.g., 2010-04 is treated as 2010-04-01T00:00:00) and when used as a range end, the latest compatible date/time is used. Including optional and alternative components, the syntax is YYYY[-MM[-DD[{T|:| }hh[:mm[:ss]]]]].
Date/time steps are specified in the ISO 8601 format PnYnMnDTnHnMnS, where any particular time period is optional, for example P1MT30M would specify a time interval of 1 month and 30 minutes. Years and months are treated as calendar years and months, meaning they are not necessarily fixed in length. For example a date interval of 1 year and 1 month would mean going from 12:00 15th April 2013 to 12:00 15th May 2014. There are two exceptions to this. In rare cases such as starting at 30th January and going forward 1 month, the month is instead treated as a period of 28 days. Also, for the purposes of finding midpoints for the start in a month, the month is always treated as 30 days. For example, to start on the 3rd November 2011 at 12:00 and aggregate over each month up to 3rd January 2013 at 12:00: t=[2011-11-03T12:00,2013-01,P1M]
<outputfile>
is an optional argument to specify the name to use for the file output. This is automatically given a .nc extension if not present. This must not be the same file path as any of the input files. If not supplied, the default filename is out.nc.
A full example would be:
$ cis aggregate rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_*.nc:product=NetCDF_Gridded,kernel=mean t,y=[-90,90,20],x -o rsutcs-mean
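To illustrate how a coordinate=[start,end,step] specification relates to the cell midpoints mentioned in the note above (the first midpoint is at start + delta/2), here is a minimal Python sketch. bin_midpoints is a hypothetical helper, not a CIS function, and it assumes that the range is a whole number of steps wide.

```python
def bin_midpoints(start, end, step=None):
    """Midpoints of the aggregation cells for coordinate=[start,end,step]."""
    if step is None:
        step = end - start       # no step given: one bin spans the whole range
    n_bins = int(round((end - start) / step))
    # Each cell is `step` wide, so its centre sits half a step past its left edge.
    return [start + step * (i + 0.5) for i in range(n_bins)]


print(bin_midpoints(-45, 45, 10))
print(bin_midpoints(0, 90))
```

For y=[-45,45,10] this gives nine cells centred on -40, -30, and so on up to 40, while a range without a step gives a single cell centred on the middle of the range.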
Conditional Aggregation¶
Sometimes you may want to perform an aggregation over all the points that meet certain criteria - for example, aggregating satellite data only where the cloud cover fraction is below a certain threshold. This is possible by performing a CIS evaluation on your data first - see Using Evaluation for Conditional Aggregation.
Aggregation Examples¶
Ungridded aggregation¶
Aircraft Track¶
Original data:
$ cis plot TT_A:RF04.20090114.192600_035100.PNI.nc --xmin -180 --xmax -120 --ymin 0 --ymax 90
Aggregating onto a coarse grid:
$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc x=[-180,-120,3],y=[0,90,3] -o NCAR_RAF-1
$ cis plot TT_A:NCAR_RAF-1.nc
Aggregating onto a fine grid:
$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc x=[180,240,0.3],y=[0,90,0.3] -o NCAR_RAF-2
$ cis plot TT_A:NCAR_RAF-2.nc
Aggregating with altitude and time:
$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc t=[2009-01-14T19:30,2009-01-15T03:45,30M],z=[0,15000,1000] -o NCAR_RAF-3
$ cis plot TT_A:NCAR_RAF-3.nc --xaxis time --yaxis altitude
Aggregating with altitude and pressure:
$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc p=[100,1100,20],z=[0,15000,500] -o NCAR_RAF-4
$ cis plot TT_A:NCAR_RAF-4.nc --xaxis altitude --yaxis air_pressure --logy
MODIS L3 Data¶
Original data:
$ cis plot Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf
Aggregating with a mean kernel:
$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf x=[-180,180,10],y=[-90,90,10] -o cloud-mean
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-mean.nc
Aggregating with the standard deviation kernel:
$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf:kernel=stddev x=[-180,180,10],y=[-90,90,10] -o cloud-stddev
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-stddev.nc
Aggregating with the maximum kernel:
$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf:kernel=max x=[-180,180,10],y=[-90,90,10] -o cloud-max
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-max.nc
Aggregating with the minimum kernel:
$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf:kernel=min x=[-180,180,10],y=[-90,90,10] -o cloud-min
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-min.nc
Gridded aggregation¶
Aggregating 3D model data over time and longitude to produce an averaged measure of variation with latitude:
$ cis aggregate rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc:kernel=mean t,x -o agg-out.nc
$ cis plot rsutcs:agg-out.nc --xaxis latitude --yaxis rsutcs -o gridded_collapse.png
This file can be found in:
/group_workspaces/jasmin/cis/data/CMIP5
Collocation¶
One of the key features of the Community Intercomparison Suite (CIS) is the ability to collocate one or more arbitrary data sets onto a common set of coordinates. This page briefly describes how to perform collocation in a number of scenarios.
To perform collocation, run a command of the format:
$ cis col <datagroup> <samplegroup> -o <outputfile>
where:
<datagroup>
is a CIS datagroup specifying the variables and files to read and is of the format
<variable>...:<filename>[:product=<productname>]
where:
<variable>
is a mandatory variable or list of variables to use.
<filename>
is a mandatory file or list of files to read from.
<productname>
is an optional CIS data product to use (see Data Products).
See Datagroups for a more detailed explanation of datagroups.
<samplegroup>
is of the format
<filename>[:<options>]
The available options are described in more detail below. They are entered in a comma separated list, such as variable=Temperature,collocator=bin,kernel=mean. Not all combinations of collocator and data are available; see Available Collocators.
<filename>
is a single filename with the points to collocate onto.
variable
is an optional argument used to specify which variable’s coordinates to use for collocation. If a variable is specified, a missing value will be set in the output file at every point for which the sample variable has a missing value. If a variable is not specified, non-missing values will be set at all sample points unless collocation at a point does not result in a valid value.
collocator
is an optional argument that specifies the collocation method. Parameters for the collocator, if any, are placed in square brackets after the collocator name, for example, collocator=box[fill_value=-999,h_sep=1km]. If not specified, a Default Collocator is identified for your data / sample combination. The collocators available are:
bin
For use only with ungridded data and gridded sample points. Data points are placed in bins corresponding to the cell bounds surrounding each grid point. The bounds are taken from the gridded data if they are defined, otherwise the mid-points between grid points are used. The binned points should then be processed by one of the kernels to give a numeric value for each bin.
box
For use with gridded and ungridded sample points and data. A search region is defined by the parameters, and points within the defined separation of each sample point are associated with that point. The points should then be processed by one of the kernels to give a numeric value for each bin. The parameters defining the search box are:
h_sep
- the horizontal separation. The units can be specified as km or m (for example h_sep=1.5km); if none are specified then the default is km.
a_sep
- the altitude separation. The units can be specified as km or m, as for h_sep; if none are specified then the default is m.
p_sep
- the pressure separation. This is not an absolute separation as for h_sep and a_sep, but a relative one, so is specified as a ratio. For example a constraint of p_sep = 2, for a point at 10 hPa, would cover the range 5 hPa < points < 20 hPa. Note that p_sep must be >= 1.
t_sep
- the time separation. This can be specified in years, months, days, hours, minutes or seconds using PnYnMnDTnHnMnS (the T separator can be replaced with a colon or a space, but if using a space quotes are required). For example, to specify a time separation of one and a half months and thirty minutes you could use t_sep=P1M15DT30M. It is worth noting that the units for time comparison are fractional days, so that years are converted to the number of days in a Gregorian year, and months are 1/12th of a Gregorian year.
If h_sep is specified, a k-d tree index based on longitudes and latitudes of data points is used to speed up the search for points. If h_sep is not specified, an exhaustive search is performed for points satisfying the other separation constraints.
lin
For use with gridded source data only. A value is calculated by linear interpolation for each sample point. The extrapolation mode can be controlled with the extrapolate keyword. The default mode is not to extrapolate values for sample points outside of the gridded data source (masking them in the output instead). Setting extrapolate=True will override this and instruct the kernel to extrapolate these values outside of the data source instead.
Sometimes it can be useful to use a different kernel in the vertical direction, for example when collocating ship data you may want to linearly interpolate the data points horizontally and in time, but just take the nearest vertical value. Set the nn_vertical keyword to True to set the vertical interpolation to nearest neighbour rather than linear interpolation. Note that this will only work when the vertical coordinates of the source data are hybrid height or hybrid pressure.
nn
For use with gridded source data only. The data point closest to each sample point is found, and the data value is set at the sample point.
dummy
For use with ungridded data only. Returns the source data as the collocated data irrespective of the sample points. This might be useful if variables from the original sample file are wanted in the output file but are already on the correct sample points.
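The unit handling described for the box collocator's h_sep, a_sep and t_sep parameters can be illustrated with a short Python sketch. This is not the CIS parser: the function names parse_distance and duration_to_days are hypothetical, the 365.2425-day Gregorian year is an assumption, and when the T separator is omitted an "M" is read here as months.

```python
import re

GREGORIAN_YEAR_DAYS = 365.2425  # assumed mean length of a Gregorian year


def parse_distance(text, default_unit):
    """Return a separation in km. `default_unit` ("km" or "m") applies when
    the value carries no unit (km for h_sep, m for a_sep in the text above)."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)(km|m)?", text.strip())
    if match is None:
        raise ValueError("unrecognised separation: %r" % text)
    value = float(match.group(1))
    unit = match.group(2) or default_unit
    return value if unit == "km" else value / 1000.0


# PnYnMnDTnHnMnS, with the leading P and the separator treated as optional
# so that short forms such as 6000S are accepted (an assumption).
_DURATION = re.compile(
    r"P?(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"[T: ]?(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?"
)


def duration_to_days(text):
    """Convert a duration string to fractional days."""
    match = _DURATION.fullmatch(text.strip())
    if match is None or not any(match.groupdict().values()):
        raise ValueError("unrecognised duration: %r" % text)
    parts = {key: int(val) if val else 0 for key, val in match.groupdict().items()}
    return (
        parts["years"] * GREGORIAN_YEAR_DAYS
        + parts["months"] * GREGORIAN_YEAR_DAYS / 12.0  # a month is 1/12 of a year
        + parts["days"]
        + parts["hours"] / 24.0
        + parts["minutes"] / 1440.0
        + parts["seconds"] / 86400.0
    )
```

For example, duration_to_days("P1M15DT30M") adds one twelfth of a Gregorian year, fifteen days and thirty minutes, all expressed in days.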
Collocators have the following general optional parameters, which can be used in addition to any specific ones listed above:
fill_value
- The numerical value to apply to the collocated point if there are no points which satisfy the constraint.
var_name
- Specifies the name of the variable in the resulting NetCDF file.
var_long_name
- Specifies the variable’s long name.
var_units
- Specifies the variable’s units.
kernel
is used to specify the kernel to use for collocation methods that create an intermediate set of points for further processing, that is box and bin. The default kernel for box and bin is moments. The built-in kernel methods currently available are:
moments
- Default. This is an averaging kernel that returns the mean, standard deviation and the number of points remaining after the specified constraint has been applied. This can be used for gridded or ungridded sample points where the collocator is one of ‘bin’ or ‘box’. The names of the variables in the output file are the name of the input variable with a suffix to identify which quantity they represent:
- Mean - no suffix - the mean value of all data points which were mapped to that sample grid point (data points with missing values are excluded)
- Standard Deviation - suffix: _std_dev - The corrected sample standard deviation (i.e. 1 degree of freedom) of all the data points mapped to that sample grid point (data points with missing values are excluded)
- Number of points - suffix: _num_points - The number of data points mapped to that sample grid point (data points with missing values are excluded)
mean
- an averaging kernel that returns the mean values of any points found by the collocation method
nn_t (or nn_time)
- nearest neighbour in time algorithm
nn_h (or nn_horizontal)
- nearest neighbour in horizontal distance
nn_a (or nn_altitude)
- nearest neighbour in altitude
nn_p (or nn_pressure)
- nearest neighbour in pressure (as in a vertical coordinate). Note that, similarly to the p_sep constraint, this works on the ratio of pressure, so the nearest neighbour to a point with a value of 10 hPa, out of a choice of 5 hPa and 19 hPa, would be 19 hPa, as 19/10 < 10/5.
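The ratio-based pressure comparison used by both p_sep and nn_p can be sketched in a few lines of Python. The function names below are hypothetical and the code only mirrors the arithmetic described above; it is not taken from the CIS source.

```python
def pressure_ratio(p_a, p_b):
    """Ratio of the larger pressure to the smaller one, so always >= 1."""
    if p_a <= 0 or p_b <= 0:
        raise ValueError("pressures must be positive")
    return max(p_a, p_b) / min(p_a, p_b)


def within_p_sep(sample_p, data_p, p_sep):
    """True when sample_p / p_sep < data_p < sample_p * p_sep."""
    return pressure_ratio(sample_p, data_p) < p_sep


def nearest_in_pressure(sample_p, candidates):
    """Pick the candidate with the smallest pressure ratio to the sample."""
    return min(candidates, key=lambda p: pressure_ratio(sample_p, p))


# The worked example from the text: 19/10 = 1.9 is smaller than 10/5 = 2.0.
print(nearest_in_pressure(10, [5, 19]))  # prints 19
```

Working with ratios rather than differences treats pressure as a roughly logarithmic vertical coordinate, which is why 19 hPa is judged closer to 10 hPa than 5 hPa is.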
product
is an optional argument used to specify the type of files being read. If omitted, the program will attempt to determine which product to use based on the filename, as listed at Reading.
<outputfile>
- is an optional argument specifying the file to output to. This will be automatically given a .nc extension if not present and, if the output is ungridded, will be prepended with cis- to identify it as a CIS output file. This must not be the same file path as any of the input files. If not provided, the default output filename is out.nc
A full example would be:
$ cis col rain:"my_data_??.*" my_sample_file:collocator=box[h_sep=50km,t_sep=6000S],kernel=nn_t -o my_col
Warning
When collocating two data sets with different spatio-temporal domains, the sampling points should be within the spatio-temporal domain of the source data. Otherwise, depending on the collocation options selected, strange artifacts can occur, particularly with linear interpolation. Spatio-temporal domains can be reduced in CIS with Aggregation or Subsetting.
Available Collocators and Kernels¶
Collocation type (data -> sample) | Available Collocators | Default Collocator | Default Kernel |
---|---|---|---|
Gridded -> gridded | lin, nn, box | lin | None |
Ungridded -> gridded | bin, box | bin | moments |
Gridded -> ungridded | nn, lin | nn | None |
Ungridded -> ungridded | box | box | moments |
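To make the default ungridded -> gridded combination (bin collocator with the moments kernel) more concrete, here is a minimal one-dimensional Python sketch. It is an illustration under stated assumptions rather than the CIS implementation: the function names are hypothetical, cell bounds are taken as mid-points between grid points (the fallback described for bin), NaN stands in for a missing value, and a standard deviation from a single point is reported as the fill value.

```python
import math
import statistics


def bin_edges(grid_points):
    """Cell bounds as mid-points between neighbouring grid points, extended
    by half a cell at each end. Needs at least two grid points."""
    mids = [(a + b) / 2.0 for a, b in zip(grid_points, grid_points[1:])]
    first = grid_points[0] - (mids[0] - grid_points[0])
    last = grid_points[-1] + (grid_points[-1] - mids[-1])
    return [first] + mids + [last]


def bin_moments(grid_points, coords, values, fill_value=math.nan):
    """Return (mean, std_dev, num_points) for each grid cell."""
    edges = bin_edges(grid_points)
    bins = [[] for _ in grid_points]
    for x, value in zip(coords, values):
        if math.isnan(value):
            continue  # data points with missing values are excluded
        for i in range(len(grid_points)):
            if edges[i] <= x < edges[i + 1]:
                bins[i].append(value)
                break
    results = []
    for members in bins:
        n = len(members)
        mean = statistics.mean(members) if n else fill_value
        # statistics.stdev is the corrected sample standard deviation (n - 1)
        std_dev = statistics.stdev(members) if n > 1 else fill_value
        results.append((mean, std_dev, n))
    return results
```

Calling bin_moments([0, 10, 20], [1, 2, 11], [1.0, 3.0, 5.0], fill_value=-999.0) puts the first two values in the cell around 0 and the third in the cell around 10, leaving the last cell filled with -999.0.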
Collocation output files¶
All ungridded collocation output files are prefixed with cis- and both ungridded and gridded data files are suffixed with .nc (so there is no need to specify the extension in the output parameter). This is to ensure the cis data product is always used to read collocated ungridded data.
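The naming rule can be summarised with a small Python sketch. The helper collocation_output_name is hypothetical, and applying the prefix to the file name rather than the whole path, and skipping it when it is already present, are assumptions made for illustration.

```python
import os


def collocation_output_name(requested="out", ungridded=True):
    """Apply the .nc suffix and, for ungridded output, the cis- prefix."""
    directory, name = os.path.split(requested)
    if not name.endswith(".nc"):
        name += ".nc"
    if ungridded and not name.startswith("cis-"):
        name = "cis-" + name
    return os.path.join(directory, name)


print(collocation_output_name("my_col"))                            # cis-my_col.nc
print(collocation_output_name("AOD550_on_tas_1", ungridded=False))  # AOD550_on_tas_1.nc
```

This matches the examples on this page, where an ungridded collocation written with -o my_col is later read back as cis-my_col.nc.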
It is worth noting that in the process of collocation all of the data and sample points are represented as 1-d lists, so any structural information about the input files is lost. This is done to ensure consistency in the collocation output. This means, however, that input files which may have been plottable as, for example, a heatmap may not be after collocation. In this situation plotting the data as a scatter plot will yield the required results.
Each collocated output variable has a history attribute created (or appended to), which contains all of the parameters and file names which went into creating it. An example might be:
double mass_fraction_of_cloud_liquid_water_in_air(pixel_number) ;
...
mass_fraction_of_cloud_liquid_water_in_air:history = "Collocated onto sampling from: [\'/test/test_files/RF04.20090114.192600_035100.PNI.nc\'] using CIS version V0R4M4\n",
"variable: mass_fraction_of_cloud_liquid_water_in_air\n",
"with files: [\'/test/test_files/xenida.pah9440.nc\']\n",
"using collocator: DifferenceCollocator\n",
"collocator parameters: {}\n",
"constraint method: None\n",
"constraint parameters: None\n",
"kernel: None\n",
"kernel parameters: None" ;
mass_fraction_of_cloud_liquid_water_in_air:shape = 30301 ;
double difference(pixel_number) ;
...
Writing your own plugins¶
The collocation framework was designed to make it easy to write your own plugins. Plugins can be written to create new kernels, new constraint methods and even whole collocation methods. See the analysis plugin development section for more details.
Collocation Examples¶
Ungridded to Ungridded Collocation Examples¶
Ungridded data with vertical component¶
First subset two Caliop data files:
$ cis subset Temperature:CAL_LID_L2_05kmAPro-Prov-V3-01.2009-12-31T23-36-08ZN.hdf x=[170,180],y=[60,80],z=[28000,29000],p=[13,15] -o 2009
$ cis subset Temperature:CAL_LID_L2_05kmAPro-Prov-V3-01.2010-01-01T00-22-28ZD.hdf x=[170,180],y=[60,80],z=[28000,29000],p=[12,13.62] -o 2010
Results of subset can be plotted with:
$ cis plot Temperature:cis-2009.nc --itemwidth 25 --xaxis time --yaxis air_pressure
$ cis plot Temperature:cis-2010.nc --itemwidth 25 --xaxis time --yaxis air_pressure
Then collocate data, and plot output:
$ cis col Temperature:cis-2010.nc cis-2009.nc:collocator=box[p_sep=1.1],kernel=nn_p
$ cis plot Temperature:cis-out.nc --itemwidth 25 --xaxis time --yaxis air_pressure
The output for the two subset data files, and the collocated data should look like:



File Locations¶
The files used above can be found at:
/group_workspaces/jasmin/cis/data/caliop/CAL-LID-L2-05km-APro
Ungridded data collocation using k-D tree indexing¶
These examples show the syntax for using the k-D tree optimisation of the separation constraint. The indexing is only by horizontal position.
Nearest-Neighbour Kernel¶
The first example is of Aerosol CCI data on to the points of a MODIS L3 file (which is an ungridded data file but with points lying on a grid).
Subset to a relevant region:
$ cis subset AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[-6,0],y=[20,30] -o AOD550n_3
$ cis subset Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf x=[-6,0],y=[20,30] -o MOD08n_3
The results of subsetting can be plotted with:
$ cis plot AOD550:cis-AOD550n_3.nc --itemwidth 10
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08n_3.nc --itemwidth 20
These should look like:


To collocate with the nearest-neighbour kernel use:
$ cis col Cloud_Top_Temperature_Mean_Mean:cis-MOD08n_3.nc cis-AOD550n_3.nc:collocator=box[h_sep=150],kernel=nn_h -o MOD08_on_AOD550_nn_kdt
This can be plotted with:
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_nn_kdt.nc --itemwidth 10
The sample points are more closely spaced than the data points, hence a patchwork effect is produced.

Collocating the full Aerosol CCI file on to the MODIS L3 with:
$ cis col AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc MOD08_E3.A2010009.005.2010026072315.hdf:variable=Cloud_Top_Temperature_Mean_Mean,collocator=box[h_sep=150],kernel=nn_h -o AOD550_on_MOD08_kdt_nn_full
gives the following result

Mean Kernel¶
This example is similar to the first nearest-neighbour collocation above:
$ cis col Cloud_Top_Temperature_Mean_Mean:cis-MOD08n_3.nc cis-AOD550n_3.nc:collocator=box[h_sep=75],kernel=mean -o MOD08_on_AOD550_hsep_75km
Plotting this again gives a granular result:
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_hsep_75km.nc --itemwidth 10

This example collocates the Aerosol CCI data on to the MODIS L3 grid:
$ cis col AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc MOD08_E3.A2010009.005.2010026072315.hdf:variable=Cloud_Top_Temperature_Mean_Mean,collocator=box[h_sep=50,fill_value=-999],kernel=mean -o AOD550_on_MOD08_kdt_hsep_50km_full
This can be plotted as follows, with the full image and a zoomed-in representative section shown below:
$ cis plot AOD550:cis-AOD550_on_MOD08_kdt_hsep_50km_full.nc --itemwidth 50


The reverse collocation can be performed with this command (taking about 7 minutes):
$ cis col Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf 20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc:variable=AOD550,collocator=box[h_sep=100,fill_value=-999],kernel=mean -o MOD08_on_AOD550_kdt_hsep_100km_var_full
Plotting it with this command gives the result below:
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_kdt_hsep_100km_var_full.nc

Omitting the variable option in the sample group gives collocated values over a full satellite track (taking about 30 minutes):
$ cis col Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf 20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc:collocator=box[h_sep=100,fill_value=-999],kernel=mean -o MOD08_on_AOD550_kdt_hsep_100km_full
Plotting it with this command gives the result below:
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_kdt_hsep_100km_full.nc

File Locations¶
The files used above can be found at:
/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files/
20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
MOD08_E3.A2010009.005.2010026072315.hdf
Examples of collocation of ungridded data on to gridded¶
Simple Example of Aerosol CCI Data on to a 4x4 Grid¶
This is a trivial example that collocates on to a 4x4 spatial grid at a single time:
$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc x=[0,2],y=[24,26],t=[2008-06-12T1,2008-06-12] -o tas_1
$ cis subset AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[0,2],y=[24,26] -o AOD550n_1
$ cis col AOD550:cis-AOD550n_1.nc tas_1.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o AOD550_on_tas_1
$ cis plot AOD550:AOD550_on_tas_1.nc
Note that for ungridded to gridded collocation, the collocator must be one of bin or box, and a kernel such as “mean” must be used.
The plotted image looks like:

Aerosol CCI with Three Time Steps¶
This example involves collocation on to a grid with three time steps. The ungridded data all has times within the middle step, so the output has missing values for all grid points with the time equal to the first or third value. This can be seen using ncdump:
$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc x=[-6,-.0001],y=[20,30],t=[2008-06-11T1,2008-06-13] -o tas_3day
$ cis subset AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[-6,0],y=[20,30] -o AOD550n_3
$ cis col AOD550:cis-AOD550n_3.nc tas_3day.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o AOD550_on_tas_3day
$ ncdump AOD550_on_tas_3day.nc |less
Aerosol CCI with One Time Step¶
This is as above but subsetting the grid to one time step so that the output can be plotted directly:
$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc t=[2008-06-12T1,2008-06-12] -o tas_2008-06-12
$ cis col AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc tas_2008-06-12.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o AOD550_on_tas_1day
$ cis plot AOD550:AOD550_on_tas_1day.nc
$ cis plot AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
$ cis plot tas:tas_2008-06-12.nc
These are the plots before and after collocation:


Example with NCAR RAF Data¶
This example uses the data in RF04.20090114.192600_035100.PNI.nc. However, this file does not have standard_name or units accepted as valid by Iris. These were modified using ncdump and ncgen, giving RF04_fixed_AO2CO2.nc:
$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc t=[2009-01-14T1,2009-01-14] -o tas_2009-01-14
$ cis col AO2CO2:RF04_fixed_AO2CO2.nc tas_2009-01-14.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o RF04_on_tas
$ cis plot AO2CO2:RF04_on_tas.nc:product=NetCDF_Gridded
These are the plots before and after collocation:


Cloud CCI with One Time Step¶
This is analogous to the Aerosol CCI example:
$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc t=[2008-06-20T1,2008-06-20] -o tas_2008-06-20
$ cis col cwp:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc tas_2008-06-20.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o Cloud_CCI_on_tas
$ cis plot cwp:Cloud_CCI_on_tas.nc
$ cis plot cwp:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc
These are the plots before and after collocation:


File Locations¶
The files used above can be found at:
/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files/
20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc
RF04.20090114.192600_035100.PNI.nc
/group_workspaces/jasmin/cis/example_data/
RF04_fixed_AO2CO2.nc
/group_workspaces/jasmin/cis/gridded-test-data/cmip5.output1.MOHC.HadGEM2-ES.rcp45.day.atmos.day.r1i1p1.v20111128/
tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc
Examples of Gridded to Gridded Collocation¶
Example of Gridded Data onto a Finer Grid¶
First, subset the original data to a single time slice:
$ cis subset rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc t=[1859-12-12] -o sub1
Plot for subset data:
$ cis plot rsutcs:sub1.nc
Collocate onto a finer grid using nearest neighbour:
$ cis col rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc dummy_high_res_cube_-180_180.nc:collocator=nn -o 2
$ cis subset rsutcs:2.nc t=[1859-12-12] -o sub2
$ cis plot rsutcs:sub2.nc
Collocate onto a finer grid using linear interpolation:
$ cis col rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc dummy_high_res_cube_-180_180.nc:collocator=lin -o 3
$ cis subset rsutcs:3.nc t=[1859-12-12] -o sub3
$ cis plot rsutcs:sub3.nc
Before, after nearest neighbour and after linear interpolation:



4D Gridded Data with latitude, longitude, air_pressure and time to a New Grid¶
$ cis col temp:aerocom.INCA.A2.RAD-CTRL.monthly.temp.2006-fixed.nc dummy_low_res_cube_4D.nc:collocator=lin -o 4D-col
Note the file aerocom.INCA.A2.RAD-CTRL.monthly.temp.2006-fixed.nc has the standard name of presnivs changed to air_pressure, in order to be read correctly.
Slices at Different Pressures¶
$ cis subset temp:4D-col.nc t=[2006-01],z=[100000] -o sub9
$ cis plot temp:sub9.nc
$ cis subset temp:4D-col.nc t=[2006-01],z=[0] -o sub10
$ cis plot temp:sub10.nc


Pressure against latitude¶
$ cis subset temp:4D-col.nc x=[0],t=[2006-01] -o sub11
$ cis plot temp:sub11.nc --xaxis latitude --yaxis air_pressure
$ cis subset temp:aerocom.INCA.A2.RAD-CTRL.monthly.temp.2006-fixed.nc x=[0],t=[2006-01] -o sub12
$ cis plot temp:sub12.nc --xaxis latitude --yaxis air_pressure


File Locations¶
The files used above can be found at:
/group_workspaces/jasmin/cis/sprint_reviews/SR4-IB/gridded_col2
Plotting¶
Plotting is straightforward:
$ cis plot variable:filenames
This will attempt to locate the given variable in all of the specified filenames, work out its metadata, such as units, labels, etc. and the appropriate chart type to plot, so that a line graph is used for two dimensional data, a scatter plot is used for three dimensional ungridded data and a heatmap for three dimensional gridded data. Other types of chart can be specified using the --type option. Available types are:
line
- a simple line plot, for three dimensional data the third variable is represented by the line colour
scatter
- a scatter plot, for three dimensional data the third variable is represented by the marker colour
heatmap
- a heatmap especially suitable for gridded data
contour
- a standard contour plot, see contour options
contourf
- a filled contour plot, see contour options
histogram3d
histogram2d
comparativescatter
- allows two variables to be plotted against each other, specified as
cis plot variable1:filename1 variable2:filename2 --type comparativescatter
overlay
- a collection of plots overlaid on one another, see overlay plots
Note that filenames is a non-optional argument used to specify the files to read the variable from. These can be specified as a comma separated list of the following possibilities:
- A single filename - this should be the full path to the file
- A single directory - all files in this directory will be read
- A wildcarded filename - A filename with any wildcards compatible with the python module glob, so that *, ? and [] can all be used. For example /path/to/my/test*file_[0-9].
Note that when using option 2, the filenames in the directory will be automatically sorted into alphabetical order. When using option 3, the filenames matching the wildcard will also be sorted into alphabetical order. The order of the comma separated list will however remain as the user specified, e.g.:
$ cis plot $var:filename1,filename2,wildc*rd,/my/dir/,filename3
would read filename1, then filename2, then all the files that match wildc*rd (in alphabetical order), then all the files in the directory /my/dir/ (in alphabetical order) and then finally filename3.
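The ordering rules above can be sketched in Python with the standard glob module. The function expand_filenames is a hypothetical helper written for illustration; it assumes directories are not searched recursively and is not the routine CIS itself uses.

```python
import glob
import os


def expand_filenames(spec):
    """Expand a comma separated list of files, directories and wildcards.

    The order of the comma separated items is kept as given; files found
    from a directory or a wildcard are sorted alphabetically.
    """
    files = []
    for item in spec.split(","):
        if os.path.isdir(item):
            for entry in sorted(os.listdir(item)):
                path = os.path.join(item, entry)
                if os.path.isfile(path):
                    files.append(path)
        elif any(char in item for char in "*?["):
            files.extend(sorted(glob.glob(item)))
        else:
            files.append(item)
    return files
```

With the list from the example, expand_filenames("filename1,filename2,wildc*rd,/my/dir/,filename3") would return filename1 and filename2 first, then the sorted wildcard matches, then the sorted contents of /my/dir/, and finally filename3.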
Plot Options¶
There are a number of optional arguments, which should be entered as a comma separated list after the mandatory arguments, for example variable:filename:product=Cis,edgecolor=black. The options are:
color
- colour of markers, e.g. for scatter plot points or contour lines, see Available Colours and Markers
cmap
- colour map to use, e.g. for contour lines or heatmap, see Available Colours and Markers
cmin
- the minimum value for the colourmap
cmax
- the maximum value for the colourmap
edgecolor
- colour of scatter marker edges (can be used to differentiate scatter markers with a colourmap from the background plot)
itemstyle
- shape of scatter marker, see Available Colours and Markers
label
- name of datagroup for the legend
product
- the data product to use for the plot
Additional datagroup options for contour plots only:
contnlevels
- the number of levels for the contour plot
contlevels
- a list of levels for the contour plot, e.g. contlevels=[0,1,3,10]
contlabel
- options are true or false; if true then contour labels are shown
contwidth
- width of the contour lines
contfontsize
- size for labels on contour plot
Note that label refers to the label the plot will have on the legend, for example if a multi-series line graph or scatter plot is plotted. To set the labels of the axes, use --xlabel and --ylabel. --cbarlabel can be used to set the label on the colour bar.
The axes can be specified with --xaxis and --yaxis. Gridded data supports any coordinate axes available in the file, while ungridded data supports the following coordinate options (if available in the data):
latitude
longitude
time
altitude
air_pressure
variable
- the variable being plotted
If the product is not specified, the program will attempt to figure out which product should be used based on the filename. See What kind of data can CIS deal with? to see a list of available products and their file signatures, or run cis plot -h.
Saving to a File¶
By default a plot will be displayed on screen. To save it to an image file instead, use the --output option. Available output types are png, pdf, ps, eps and svg, which can be selected using the appropriate filename extension, for example --output plot.svg.
Plot Formatting¶
There are a number of plot formatting options available:
--xlabel
- The label for the x axis
--ylabel
- The label for the y axis
--cbarlabel
- The label for the colorbar
--xtickangle
- The angle for the ticks on the x axis
--ytickangle
- The angle for the ticks on the y axis
--title
- The title of the plot
--itemwidth
- The width of an item. Units are points in the case of a line, and points squared in the case of a scatter point
--fontsize
- The size of the font in points
--cmap
- The colour map to be used when plotting a 3D plot, see Available Colours and Markers
--height
- The height of the plot, in inches
--width
- The width of the plot, in inches
--xbinwidth
- The width of the histogram bins on the x axis
--ybinwidth
- The width of the histogram bins on the y axis
--cbarorient
- The orientation of the colour bar, either horizontal or vertical
--nocolourbar
- Hides the colour bar on a 3D plot
--grid
- Shows grid lines
--plotwidth
- width of the plot in inches
--plotheight
- height of the plot in inches
--cbarscale
- this can be used to change the size of the colourbar when plotting and defaults to 0.55 for vertical colorbars, 1.0 for horizontal.
--coastlinescolour
- The colour of the coastlines on a map, see Available Colours and Markers
--nasabluemarble
- Use the NASA Blue Marble for the background, instead of coastlines, when doing lat-lon plots
Setting Plot Ranges¶
The arguments --xmin, --xmax, --xstep, --ymin, --ymax, --ystep, --vmin, --vmax and --vstep can be used to specify the range of values to plot, where x and y correspond to the axes and v corresponds to the colours.
When the arguments refer to dates or times, they should be in the format YYYY-MM-DDThh:mm:ss, where the time is optional. A colon or a space is also a valid date and time separator (if using a space quotes are necessary).
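As a rough illustration of the accepted date and time format, the following Python sketch parses the full YYYY-MM-DDThh:mm:ss form with T, a colon or a space as the separator. The function parse_plot_datetime is hypothetical, and shortened times that CIS may also accept are deliberately not handled here.

```python
import re
from datetime import datetime

_DATE_TIME = re.compile(r"(\d{4}-\d{2}-\d{2})(?:[T: ](\d{2}:\d{2}:\d{2}))?")


def parse_plot_datetime(text):
    """Parse YYYY-MM-DD with an optional hh:mm:ss part (midnight if omitted)."""
    match = _DATE_TIME.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognised date/time: %r" % text)
    date_part, time_part = match.groups()
    return datetime.strptime(
        date_part + " " + (time_part or "00:00:00"), "%Y-%m-%d %H:%M:%S"
    )
```

On the command line, a value using the space separator has to be quoted so that the shell passes it as a single argument, for example --xmin "2008-06-12 09:38:21".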
The step arguments are used to specify the tick spacing on the axes and vstep is used to specify the tick spacing on the colorbar.
When the step arguments refer to an amount of time, they should be in the ISO 8601 format PnYnMnDTnHnMnS, where any particular time group is optional, case does not matter, and T can be replaced by either a colon or a space (if using a space quotes are necessary).
For example, to specify a tick spacing of one month and six seconds on the x axis, the following argument should be given:
--xstep 1m6S
Note: If a value is negative, then an equals sign must be used, e.g. --xmin=-5.
To plot using a log scale:
--logx
- The x axis will be plotted using a log scale of base 10
--logy
- The y axis will be plotted using a log scale of base 10
--logv
- The values (colours) will be plotted using a log scale of base 10
Overlaying Multiple Plots¶
Overlaying multiple line graphs or scatter plots is straightforward, simply use the plot command as before but specify multiple files and variables, e.g.:
$ cis plot $var1:$filename1:edgecolor=black $var2:$filename2:edgecolor=red
To plot two variables from the same file, simply use the above command with $filename1 in place of $filename2.
However, using --type overlay allows multiple files to be specified on the command line, each plotted with its own type, which is specified as e.g. type=heatmap, along with the other datagroup options. Currently supported plot types are heatmap, contour, contourf and scatter. An additional datagroup option available is transparency, which allows the transparency for a layer to be set. transparency takes a value between 0 and 1, where 0 is completely opaque and 1 fully transparent.
For example, to plot a heatmap and a contour plot the following options can be used:
cis plot var1:file1:type=heatmap var2:file2:type=contour,color=white --type overlay --plotwidth 20 --plotheight 15 --cbarscale 0.5 -o overlay.png
Note that the default plot dimensions are deduced from the first datagroup specified.
Many more examples are available in the overlay examples page.
Available Colours and Markers¶
CIS recognises any valid html colour, specified using its name (e.g. red), for options such as item colour (line/scatter colour) and the colour of the coast lines.
A list of available colour maps for 3D plots, such as heatmaps, scatter and contour plots, can be found here: colour maps.
For a list of available scatter point styles, see here: scatter point styles.
Evaluation¶
The Community Intercomparison Suite allows you to perform general arithmetic operations between different variables using the ‘eval’ command. For example, you might want to calculate the (relative) difference between two variables.
Note
All variables used in an evaluation must be of the same shape in order to be compatible, i.e. the same number of points in each dimension, and of the same type (Ungridded or Gridded). This means that, for example, operations between different data products are unlikely to work correctly - performing a collocation or aggregation onto a common grid would be a good pre-processing step.
Warning
This CIS command performs a Python eval() on user input. This has the potential to be a security risk and before deploying CIS to any environment where your user input is untrusted (e.g. if you want to run CIS as a web service) you must satisfy yourself that any security risks have been mitigated. CIS implements the following security restrictions on the expression which is evaluated:
- The eval() operates in a restricted namespace that only has access to a select handful of builtins (see expr below) - so __import__, for example, is unavailable.
- The only module available in the namespace is numpy.
- Any expression containing two consecutive underscores (__) is assumed to be harmful and will not be evaluated.
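The three restrictions can be pictured with the following Python sketch. It is a simplified illustration, not the CIS implementation: restricted_eval and ALLOWED_BUILTINS are hypothetical names, the builtin list is a Python 3 subset chosen for the example, and numpy is left out so the sketch has no dependencies (it could be passed in through the modules argument). A namespace like this reduces the attack surface but should not be treated as a complete sandbox.

```python
import builtins

# A small whitelist of builtins (an illustrative subset).
ALLOWED_BUILTINS = {
    name: getattr(builtins, name)
    for name in ("abs", "all", "any", "bool", "divmod", "int", "len",
                 "max", "min", "pow", "range", "round", "sorted", "sum", "zip")
}


def restricted_eval(expr, variables, modules=None):
    """Evaluate `expr` with access only to the whitelisted builtins,
    the given modules and the named variables."""
    if "__" in expr:
        raise ValueError("expressions containing '__' are not evaluated")
    namespace = {"__builtins__": ALLOWED_BUILTINS}
    namespace.update(modules or {})
    namespace.update(variables)
    return eval(expr, namespace)


# Relative difference between two (scalar) variables.
print(restricted_eval("abs(a - b) / b", {"a": 3.0, "b": 4.0}))  # 0.25
```

Names outside the whitelist, such as open, raise a NameError when the expression is evaluated, while anything containing a double underscore is rejected before eval() is reached.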
The evaluate syntax looks like this:
$ cis eval <datagroup>... <expr> <units> [-o [<output_var>:]<outputfile>] [--attributes <attributes>]
where square brackets denote optional arguments and:
<datagroup>
is a modified CIS datagroup of the format <variable>[=<alias>]...:<filename>[:product=<productname>]. One or more datagroups should be given.
- <variable> is a mandatory variable or list of variables to use.
- <alias> is an optional alternative variable name to use in place of the name given in the file. As you will see in the expression section, the variable names given will need to be valid python variable names, which means:
1. They may use only the characters [A-Z], [a-z] and numbers [0-9], provided they do not start with a number.
2. The only special character which may be used is the underscore (_) - but don’t use two consecutively (see security note).
3. Don’t use any of the reserved python keywords such as class or and as variable names (they’re OK if they’re only part of a name though).
4. Avoid using names of python builtins like max or abs (again, it’s OK if they’re only part of a name).
So if the variable name in your file violates these rules (e.g. ‘550-870Angstrom’) use an alias: 550-870Angstrom=a550to870
- <filename> is a mandatory file or list of files to read from.
- <productname> is an optional CIS data product to use (see Data Products).
See Datagroups for a more detailed explanation of datagroups.
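The four naming rules can be checked mechanically before running a command. The following is a minimal sketch, assuming Python 3; the helper name needs_alias is hypothetical and is not part of CIS.

```python
import builtins
import keyword
import re

# Letters, digits and underscores only, not starting with a digit.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def needs_alias(name):
    """Return True if name breaks any of the four naming rules."""
    if not VALID_NAME.fullmatch(name):
        return True   # rule 1: illegal character or leading digit
    if "__" in name:
        return True   # rule 2: two consecutive underscores
    if keyword.iskeyword(name):
        return True   # rule 3: reserved keyword such as 'class'
    if hasattr(builtins, name):
        return True   # rule 4: shadows a builtin such as 'max'
    return False


for name in ["550-870Angstrom", "a550to870", "class", "max", "AOT_440"]:
    print(name, needs_alias(name))
```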
<expr>
is the arithmetic expression to evaluate; for example: variable1+variable2. Use the following basic rules to get started:
1. Use the variable names (or aliases) as given in the datagroups (they’re case-sensitive) - don’t enclose them in quotes.
2. If your expression contains whitespace, you’ll need to enclose the whole expression in single or double quotes.
3. Construct your expression using plus +, minus -, times *, divide /, power ** (note that you can’t use ^ for exponents, like you typically can in spreadsheets and some other computer languages). Parentheses () can be used to group elements so that your expression is evaluated in the order you intend.
If you need more functionality, you’re encountering errors or not getting the answer you expect then you should consider the following.
1. This expression will be evaluated in Python using the eval() method (see security note), so the expression must be a valid Python expression.
2. The only Python methods available to you are a trimmed down list of the python builtins: ‘abs’, ‘all’, ‘any’, ‘bool’, ‘cmp’, ‘divmod’, ‘enumerate’, ‘filter’, ‘int’, ‘len’, ‘map’, ‘max’, ‘min’, ‘pow’, ‘range’, ‘reduce’, ‘reversed’, ‘round’, ‘sorted’, ‘sum’, ‘xrange’, ‘zip’.
3. The numpy module is available, so you can use any of its methods e.g. numpy.mean(variable1).
4. For security reasons, double underscores (__) must not appear anywhere in the expression.
5. The expression must produce an output array of the same shape as the input variables.
6. The expression is evaluated at the array level, not at the element level - so the variables in an expression represent numpy arrays, not individual numeric values. This means that numpy.mean([var1,var2]) will give you a combined average over the whole of both arrays (i.e. a single number, not an array), which would be invalid (consider the previous rule). However, you could add the mean (over the whole array) of one variable to every point on a second variable by doing var1 + numpy.mean(var2).
Note
CIS eval command will flatten ungridded data so that structure present in the input files will be ignored. This allows you to compare ungridded data with different shapes, e.g. (3,5) and (15,)
<units>
is a mandatory argument describing the units of the resulting expression. This should be a CF compliant units string, e.g. "kg m^-3". Where this contains spaces, the whole string should be enclosed in quotes.
<outputfile>
is an optional argument specifying the file to output to. This will be automatically given a .nc extension if not present and, if the output is ungridded, will be prepended with cis- to identify it as a CIS output file. This must not be the same file path as any of the input files. If not provided, the default output filename is out.nc
<output_var>
is an optional prefix to the output file argument to specify the name of the output variable within the output file, e.g. -o my_new_var:output_filename.nc. If not provided, the default output variable name is calculated_variable
<attributes>
is an optional argument allowing users to provide additional metadata to be included in the evaluation output variable. This should be indicated by the attributes flag (--attributes or -a). The attributes should then follow in comma-separated, key=value pairs, for example --attributes standard_name=convective_rainfall_amount,echam_version=6.1.00. Whitespace is permitted in both the names and the values, but they must then be enclosed in quotes: -a "operating system"="AIX 6.1 Power6". Colons or equals signs may not be used in attribute names or values.
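To make the attribute format concrete, here is a minimal sketch of how a comma-separated key=value string could be split into a dictionary. The function parse_attributes is hypothetical; the parser inside CIS may behave differently, for example in how it treats quoting.

```python
def parse_attributes(text):
    """Split 'key=value,key=value' into a dict, rejecting ':' and extra '='."""
    attributes = {}
    for pair in text.split(","):
        # Colons and additional equals signs are not allowed in names or values.
        if ":" in pair or pair.count("=") != 1:
            raise ValueError("invalid attribute: " + repr(pair))
        key, value = pair.split("=")
        attributes[key.strip()] = value.strip()
    return attributes


print(parse_attributes("standard_name=convective_rainfall_amount,echam_version=6.1.00"))
```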
Evaluation Examples¶
Comparison of annual Aerosol Optical Thickness from models¶
In this example we compare annual Aerosol Optical Thickness from ECHAM and HadGEM model data. The data used in this example can be found at /group_workspaces/jasmin/cis/data.
First we produce annual averages of our data by aggregating:
$ cis aggregate od550aer:ECHAM_fixed/2007_2D_3hr/od550aer.nc t -o echam-od550aer
$ cis aggregate od550aer:HadGEM_fixed/test_fix/od550aer.nc t -o hadgem-od550aer
$ cis plot od550aer:echam-od550aer.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="ECHAM AOT550" --vmin=0 --vmax=0.5
$ cis plot od550aer:hadgem-od550aer.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="HadGEM AOT550" --vmin=0 --vmax=0.5


We then linearly interpolate the HadGEM data onto the ECHAM grid:
$ cis col od550aer:hadgem-od550aer.nc echam-od550aer.nc:collocator=lin -o hadgem-od550aer-collocated
$ cis plot od550aer:hadgem-od550aer-collocated.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="HadGEM AOT550" --vmin=0 --vmax=0.5

Next we subtract the two fields using:
$ cis eval od550aer=a:echam-od550aer.nc od550aer=b:hadgem-od550aer-collocated.nc "a-b" 1 -o modeldifference
Finally we plot the evaluated output:
$ cis plot od550aer:modeldifference.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="ECHAM-HadGEM difference AOT550" --vmin=-0.25 --vmax=0.2

Calculation of Angstrom exponent for AERONET data¶
AERONET data allows us to calculate Angstrom Exponent (AE) and then compare it against the AE already in the file. They should strongly correlate although it is not expected they will be identical due to averaging etc during production of AERONET datafiles.
The file agoufou.lev20 refers to /group_workspaces/jasmin/cis/data/aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Agoufou.lev20
The AE is calculated using an eval statement:
$ cis eval AOT_440,AOT_870:agoufou.lev20 "(-1)* (numpy.log(AOT_870/AOT_440)/numpy.log(870./440.))" 1 -o alfa
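The expression passed to eval applies the Angstrom exponent formula to whole arrays. For a single pair of values the same arithmetic can be sketched with the standard library; the function name and the sample values below are made up for illustration.

```python
import math


def angstrom_exponent(aot_440, aot_870):
    """Angstrom exponent from AOT at 440 nm and 870 nm for one measurement."""
    return -math.log(aot_870 / aot_440) / math.log(870.0 / 440.0)


# Made-up sample values, chosen so that the exponent is 1.3 by construction.
aot_440 = 0.5
aot_870 = aot_440 * (870.0 / 440.0) ** -1.3
print(round(angstrom_exponent(aot_440, aot_870), 6))
```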
Plotting it shows the expected correlation:
$ cis plot 440-870Angstrom:agoufou.lev20 calculated_variable:cis-alfa.nc --type comparativescatter --itemwidth=10 --xlabel="AERONET 440-870Angstrom" --ylabel="AERONET (-1)*(numpy.log(AOT_870/AOT_440)/numpy.log(870./440.))"

This correlation can be confirmed by using the CIS stats command:
$ cis stats 440-870Angstrom:agoufou.lev20 calculated_variable:cis-alfa.nc
==================================
RESULTS OF STATISTICAL COMPARISON:
==================================
Number of points: 63126
Mean value of dataset 1: 0.290989032142
Mean value of dataset 2: 0.295878214327
Standard deviation for dataset 1: 0.233995525021
Standard deviation for dataset 2: 0.235381075635
Mean of absolute difference: 0.00488918218519
Standard deviation of absolute difference: 0.00546343157047
Mean of relative difference: 0.0284040419499
Standard deviation of relative difference: 3.95137224542
Spearman's rank coefficient: 0.999750939223
Linear regression gradient: 1.00566622549
Linear regression intercept: 0.003240372714
Linear regression r-value: 0.999746457079
Linear regression standard error: 0.00530006646489
Using Evaluation for Conditional Aggregation¶
The eval command can be combined with other CIS commands to allow you to perform more complex tasks than would otherwise be possible.
For example, you might want to aggregate a satellite measurement of one variable only when the corresponding cloud cover fraction (stored in separate variable) is less than a certain value. The aggregate command doesn’t allow this kind of conditional aggregation on its own, but you can use an evaluation to achieve this in two stages.
In this example we use the MODIS file MOD04_L2.A2010001.2255.005.2010005215814.hdf in directory /group_workspaces/jasmin/cis/data/MODIS/MOD04_L2/. The optical depth and cloud cover variables can be seen in the following two plots:
$ cis plot Optical_Depth_Land_And_Ocean:MOD04_L2.A2010001.2255.005.2010005215814.hdf --xmin 132 --xmax 162 --ymin -70 --title "Aerosol optical depth" --cbarscale 0.5 --itemwidth 10 -o optical_depth.png
$ cis plot Cloud_Fraction_Ocean:MOD04_L2.A2010001.2255.005.2010005215814.hdf --xmin 132 --xmax 162 --ymin -70 --title "Cloud cover fraction" --cbarscale 0.5 --itemwidth 10 -o cloud_fraction.png


First we perform an evaluation using the numpy.masked_where method to produce an optical depth variable that is masked at all points where the cloud cover is more than 20%:
$ cis eval Cloud_Fraction_Ocean=cloud,Optical_Depth_Land_And_Ocean=od:MOD04_L2.A2010001.2255.005.2010005215814.hdf "numpy.ma.masked_where(cloud > 0.2, od)" 1 -o od:masked_optical_depth.nc
$ cis plot od:cis-masked_optical_depth.nc --xmin 132 --xmax 162 --ymin -70 --title "Aerosol optical depth" --cbarscale 0.5 --itemwidth 10 -o masked_optical_depth.png

Then we perform an aggregation on this masked output file to give the end result - aerosol optical depth aggregated only using points where the cloud cover is less than 20%:
$ cis aggregate od:cis-masked_optical_depth.nc x=[132,162,0.5],y=[-70,-57,0.5] -o aggregated_masked_optical_depth
$ cis plot od:aggregated_masked_optical_depth.nc --xmin 132 --xmax 162 --ymin -70 --title "Aerosol optical depth (cloud fraction < 0.2)" --cbarscale 0.5 -o aggregated_aod.png
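The two-stage idea (mask first, then aggregate only what remains) can be sketched in plain Python. This is a stand-in for numpy.ma.masked_where followed by an average, not what CIS runs: None plays the role of a masked value, the helper names are hypothetical and the numbers are made up.

```python
def mask_where(condition, values):
    """Replace values with None wherever the matching condition is True."""
    return [None if flag else value for flag, value in zip(condition, values)]


def mean_of_unmasked(values):
    """Average only the entries that have not been masked out."""
    kept = [value for value in values if value is not None]
    return sum(kept) / len(kept) if kept else None


cloud = [0.05, 0.60, 0.10, 0.90]   # made-up cloud fractions
od = [0.20, 0.80, 0.40, 0.70]      # made-up optical depths

# Stage 1: mask optical depth wherever the cloud fraction exceeds 0.2.
masked_od = mask_where([c > 0.2 for c in cloud], od)
# Stage 2: aggregate using only the unmasked points.
print(masked_od)
print(mean_of_unmasked(masked_od))
```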

Statistics¶
The Community Intercomparison Suite allows you to perform statistical analysis on two variables using the ‘stats’ command. For example, you might wish to examine the correlation between a model data variable and actual measurements. The ‘stats’ command will calculate:
- Number of data points used in the analysis.
- The mean and standard deviation of each dataset (separately).
- The mean and standard deviation of the absolute difference (var2 - var1).
- The mean and standard deviation of the relative difference ((var2 - var1) / var1).
- The Linear Pearson correlation coefficient.
- The Spearman Rank correlation coefficient.
- The coefficients of linear regression (i.e. var2 = a var1 + b ), r-value, and standard error of the estimate.
These values will be displayed on screen and can optionally be saved as NetCDF output.
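Most of the quantities listed above follow from a few sums over the paired values. The sketch below is not the CIS implementation: the function compare is hypothetical, None stands for a missing value, and the Spearman rank coefficient and the standard deviations are left out for brevity.

```python
import math


def compare(var1, var2):
    """Compare two equal-length sequences, skipping pairs with a missing value."""
    pairs = [(a, b) for a, b in zip(var1, var2) if a is not None and b is not None]
    n = len(pairs)
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    gradient = sxy / sxx
    return {
        "points": n,
        "mean_abs_diff": sum(b - a for a, b in pairs) / n,        # var2 - var1
        "mean_rel_diff": sum((b - a) / a for a, b in pairs) / n,  # (var2 - var1) / var1
        "pearson_r": sxy / math.sqrt(sxx * syy),
        "gradient": gradient,                                     # a in var2 = a var1 + b
        "intercept": mean_y - gradient * mean_x,                  # b in var2 = a var1 + b
    }


# Made-up data in which var2 = 2 * var1 + 1 and one point is missing.
result = compare([1.0, 2.0, None, 4.0], [3.0, 5.0, 7.0, 9.0])
print(result)
```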
Note
Both variables used in a statistical analysis must be of the same shape in order to be compatible, i.e. the same number of points in each dimension, and of the same type (ungridded or gridded). This means that, for example, operations between different data products are unlikely to work correctly - performing a collocation or aggregation onto a common grid would be a good pre-processing step.
Note
Only points which have non-missing values for both variables will be included in the analysis. The number of points this includes is part of the output of the stats command.
Warning
Unlike aggregation, stats does not currently use latitude weighting to account for the relative areas of different grid cells.
The statistics syntax looks like this:
$ cis stats <datagroup>... [-o <outputfile>]
where:
<datagroup>
is a CIS datagroup specifying the variables and files to read and is of the format <variable>...:<filename>[:product=<productname>] where:
- <variable> is a mandatory variable or list of variables to use.
- <filename> is a mandatory file or list of files to read from.
- <productname> is an optional CIS data product to use (see Data Products).
One or more datagroups should be given, but the total number of variables declared in all datagroups must be exactly two. See Datagroups for a more detailed explanation of datagroups.
<outputfile>
is an optional argument specifying a file to output to. This will be automatically given a .nc extension if not present. This must not be the same file path as any of the input files. If not provided, then the output will not be saved to a file and will only be displayed on screen.
Statistics Example¶
In this example, we perform a statistical comparison of Aeronet aerosol optical thickness at two wavelengths.
The data we are using is shown in the following CIS plot commands and can be found at /group_workspaces/jasmin/cis/data:
$ cis plot AOT_500:aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Yonsei_University.lev20 --title "Aerosol optical thickness 500nm"
$ cis plot AOT_440:aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Yonsei_University.lev20 --title "Aerosol optical thickness 440nm"


We then perform a statistical comparison of these variables using:
$ cis stats AOT_500,AOT_440:aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Yonsei_University.lev20
Which gives the following output:
===================================================================
RESULTS OF STATISTICAL COMPARISON:
-------------------------------------------------------------------
Compared all points which have non-missing values in both variables
===================================================================
Number of points: 10727
Mean value of dataset 1: 0.427751965508
Mean value of dataset 2: 0.501316673814
Standard deviation for dataset 1: 0.307680514916
Standard deviation for dataset 2: 0.346274598431
Mean of absolute difference: 0.0735647083061
Standard deviation of absolute difference: 0.0455684788406
Mean of relative difference: 0.188097066086
Standard deviation of relative difference: 0.0528621773819
Spearman's rank coefficient: 0.998289763952
Linear regression gradient: 1.12233533743
Linear regression intercept: 0.0212355272705
Linear regression r-value: 0.997245296339
Linear regression standard error: 0.0256834603945
Overlay Plot Examples¶
First subset some gridded data that will be used for the examples:
cis subset od550aer:aerocom.HadGEM3-A-GLOMAP.A2.CTRL.monthly.od550aer.2006.nc t=[2006-10-13] -o HadGEM_od550aer-subset
cis subset rsutcs:aerocom.HadGEM3-A-GLOMAP.A2.CTRL.monthly.rsutcs.2006.nc t=[2006-10-13] -o HadGEM_rsutcs-subset
Contour over heatmap¶
cis plot od550aer:HadGEM_od550aer-subset.nc:type=heatmap rsutcs:HadGEM_rsutcs-subset.nc:type=contour,color=white,contlevels=[1,10,25,50,175] --type overlay --plotwidth 20 --plotheight 15 --cbarscale 0.5 -o overlay1.png

cis plot od550aer:HadGEM_od550aer-subset.nc:type=heatmap,cmap=binary rsutcs:HadGEM_rsutcs-subset.nc:type=contour,cmap=jet,contlevels=[1,10,25,50,175] --type overlay --xmin -180 --xmax 180 --plotwidth 20 --plotheight 15 --cbarscale 0.5 -o overlay2.png

Filled contour with transparency on NASA Blue Marble¶
cis plot od550aer:HadGEM_od550aer-subset.nc:cmap=Reds,type=contourf,transparency=0.5,cmin=0.15 --type overlay --xmin -180 --xmax 180 --plotwidth 20 --plotheight 15 --cbarscale 0.5 --nasabluemarble

Scatter plus Filled Contour¶
cis subset rsutcs:HadGEM_rsutcs-subset.nc x=[-180,-90],y=[0,90] -o HadGEM_rsutcs-subset2
cis plot GGALT:RF04.20090114.192600_035100.PNI.nc:type=scatter rsutcs:HadGEM_rsutcs-subset2.nc:type=contourf,contlevels=[0,10,20,30,40,50,100],transparency=0.7,contlabel=true,contfontsize=18 --type overlay --plotwidth 20 --plotheight 15 --xaxis longitude --yaxis latitude --xmin -180 --xmax -90 --ymin 0 --ymax 90 --itemwidth 20 -o overlay4.png

cis plot GGALT:RF04.20090114.192600_035100.PNI.nc:type=scatter rsutcs:HadGEM_rsutcs-subset2.nc:type=contourf,contlevels=[40,50,100],transparency=0.3,contlabel=true,contfontsize=18,cmap=Reds --type overlay --plotwidth 20 --plotheight 15 --xaxis longitude --yaxis latitude --xmin -180 --xmax -90 --ymin 0 --ymax 90 --itemwidth 20 --nasabluemarble -o overlay5.png

File Locations¶
The gridded data files can be found at:
/group_workspaces/jasmin/cis/AeroCom/A2/HadGEM3-A-GLOMAP.A2.CTRL/renamed
and the ungridded:
/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files
How can I read my own data?¶
Introduction¶
One of the key strengths of CIS is the ability for users to create their own plugins to read data which CIS doesn’t currently support. These plugins can then be shared with the community to allow other users access to that data. Although the plugins are written in Python this tutorial assumes no experience in Python. Some programming experience is however assumed.
Note
Any technical details that may be useful to experienced Python programmers will be highlighted in this style - they aren’t necessary for completing the tutorial.
Here we describe the process of creating and sharing a plugin. A CIS plugin is simply a python (.py) file with a set of methods (or functions) to describe how the plugin should behave.
Note
The methods for each plugin are described within a Class, this gives the plugin a name and allows CIS to ensure that all of the necessary methods have been implemented.
There are a few methods that the plugin must contain, and some which are optional. A skeleton plugin would look like this:
class MyProd(AProduct):

    def get_file_signature(self):
        # Code goes here
        ...

    def create_coords(self, filenames):
        ...

    def create_data_object(self, filenames, variable):
        ...
Note that in python whitespace matters! When filling in the above methods the code for the method should be indented from the signature by four spaces like this:
class MyProd(AProduct):

    def get_file_signature(self):
        # Code goes here
        foo = bar
Note also that the name of the plugin (MyProd in this case) should be changed to describe the data which it will read. (Don’t change the AProduct part though – this is important for telling CIS that this is a plugin for reading data.)
Note
The plugin class subclasses AProduct, which is the abstract class that defines the methods that the plugin needs to override. It also includes a few helper functions for error catching.
When CIS looks for data plugins it searches for all classes which sub-class AProduct. There are also plugins available for collocation with their own abstract base classes, so that users can store multiple plugin types in the same plugin directory.
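The general mechanism (an abstract base class whose subclasses can be discovered and checked for completeness) can be sketched with the standard abc module. AProductSketch below is a hypothetical stand-in written for this example and is not the real AProduct; how CIS itself performs the search is not shown here.

```python
from abc import ABC, abstractmethod


class AProductSketch(ABC):
    """Hypothetical stand-in for AProduct; not the real CIS class."""

    @abstractmethod
    def get_file_signature(self): ...

    @abstractmethod
    def create_coords(self, filenames): ...

    @abstractmethod
    def create_data_object(self, filenames, variable): ...


class MyProd(AProductSketch):
    """A complete plugin: every abstract method is overridden."""

    def get_file_signature(self):
        return [r'.*\.myprod']

    def create_coords(self, filenames):
        return []

    def create_data_object(self, filenames, variable):
        return None


class Incomplete(AProductSketch):
    """An incomplete plugin: two required methods are missing."""

    def get_file_signature(self):
        return [r'.*\.txt']


def find_plugins(base):
    """List the names of every class that directly subclasses base."""
    return [cls.__name__ for cls in base.__subclasses__()]


print(find_plugins(AProductSketch))

try:
    Incomplete()
except TypeError as error:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    print("cannot instantiate:", error)
```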
In order to turn the above skeleton into a working plugin we need to fill in each of the methods with some code, which turns our data into something CIS will understand. Often it is easiest to start from an existing plugin that reads closely matching data. For example, when creating a plugin to read some other CCI data it would probably be easiest to start from the Cloud or Aerosol CCI plugins. We have created three different tutorials to walk you through the creation of some of the existing plugins to try and illustrate the process. The Easy tutorial walks through the creation of a basic plugin, the Medium tutorial builds on that by creating a plugin which has a bit more detail, and finally the Advanced tutorial talks through some of the main considerations when creating a large and complicated plugin.
A more general template plugin is included here in case no existing plugin matches your need. We have also created a short reference describing the purpose of each method the plugins implement here.
Note
Plugins aren’t the only way you can contribute though. CIS is an open source project hosted on GitHub, so please feel free to submit pull-requests for new features or bug-fixes – just check with the community first so that we’re not duplicating our effort.
Using and testing your plugin¶
It is important that CIS knows where to look to find your new plugin, and this is easily done by setting the environment variable CIS_PLUGIN_HOME to point to the directory within which your plugin is stored.
Once you have done this CIS will automatically use your plugin for reading any files which match the file signature you used.
If you have any issues with this (because for example the file signature clashes with a built-in plugin) you can tell CIS to use your plugin when reading data by simply specifying it after the variable and filename in most CIS commands, e.g.:
cis subset a_variable:filename.nc:product=MyProd ...
Sharing your plugin¶
This is the easy bit! Once you’re happy that your plugin can fairly reliably read a currently unsupported dataset you should share it with the community. Use the upload form here to submit your plugin to the community.
We moderate the plugins we receive to ensure that they are appropriate and meet a minimum level of quality. We’re not expecting the plugins to necessarily be production quality code but we do expect them to work for the subset of data they claim to. Having said that, if we feel a plugin provides a really valuable capability and is of high quality we may incorporate that plugin into the core CIS data readers – with credit to the author of course!
Tutorials¶
Easy¶
A simple plugin to start with is the plugin for reading native ungridded CIS data.
One of the first things to consider is which type of file our plugin is going to be aimed at reading. It is advisable not to make the definition too broad: it’s easy to have multiple plugins, so don’t over-complicate a plugin by having it read many different types of file. Roughly, one plugin should describe a set of data with the same metadata.
Since the CIS plugin is designed to read any data which CIS produces, the signature matches any file which starts with cis- and ends with .nc:
def get_file_signature(self):
    return [r'cis\-.*\.nc']
This uses a regular expression to tell CIS which files do and don’t match for our product.
Note
For an introduction to regular expressions see, for example, https://docs.python.org/2/howto/regex.html
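A signature can be tried out against a few file names before it goes into a plugin. The sketch below assumes that a pattern has to match the whole base name of the file; that is an assumption for illustration and the matching logic inside CIS may differ. The helper matches_signature is hypothetical.

```python
import os
import re

SIGNATURE = [r'cis\-.*\.nc']


def matches_signature(filename, patterns=SIGNATURE):
    """Return True if the base name of filename fully matches any pattern."""
    basename = os.path.basename(filename)
    return any(re.fullmatch(pattern, basename) for pattern in patterns)


for name in ["cis-out.nc", "data/cis-alfa.nc", "out.nc", "cis-out.nc.bak"]:
    print(name, matches_signature(name))
```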
The next step is to complete the AProduct.create_coords() method. CIS uses this method to create a set of coordinates from the data, so it needs to return any appropriate coordinates in the shape that CIS expects. There are a number of low-level data reading routines within CIS that can help you read in your data. For the CIS plugin (which is reading netCDF data) we use two methods from the cis.data_io.netcdf module: read_many_files_individually and get_metadata. We also import the Coord data type, which is where we store the coordinates that we’ve read, and UngriddedCoordinates - which is what we return to CIS.
Note
In python it’s very easy to import classes and methods from other modules within your package, and across packages using the from and import commands. The file-reading routines used here are used by many of the other data products. See the API section for further details about using CIS as a python library.
Don’t worry too much about what these methods do at this stage, just use the import lines below and you should be fine.
def create_coords(self, filenames, usr_variable=None):
    from cis.data_io.netcdf import read_many_files_individually, get_metadata
    from cis.data_io.Coord import Coord, CoordList
    from cis.data_io.ungridded_data import UngriddedCoordinates
Next, we create a list of netCDF variable names which we know are stored in our file and send that to the file reading routine:
var_data = read_many_files_individually(filenames, ["longitude","latitude", "time"])
Then we create a CoordList to store our coordinates in, a Coord for each of those coordinate variables, and then just give them a short label for plotting purposes (x,y,z etc) – it is strongly advisable that you use the standard definitions used below for your axis definitions (and use z for altitude and p for pressure).
coords = CoordList()
coords.append(Coord(var_data["longitude"], get_metadata(var_data["longitude"][0]), axis="x"))
coords.append(Coord(var_data["latitude"], get_metadata(var_data["latitude"][0]), axis="y"))
coords.append(Coord(var_data["time"], get_metadata(var_data["time"][0]), axis="t"))
That’s it, now we can return those coordinates in a way that CIS will understand:
return UngriddedCoordinates(coords)
The last method we have to write is the AProduct.create_data_object() method, which is used by CIS to pull together the coordinates and a particular data variable into an UngriddedData object. It’s even simpler than the previous method. We can use the same read_many_files_individually method as we did before, and this time pass it the variable the user has asked for:
def create_data_object(self, filenames, variable):
    from cis.data_io.ungridded_data import UngriddedData
    usr_var_data = read_many_files_individually(filenames, variable)[variable]
Then we create the coordinates using the create_coords() method we’ve just written:
coords = self.create_coords(filenames)
And finally we return the ungridded data, which combines the coordinates from the file and the variable requested by the user:
return UngriddedData(usr_var_data, get_metadata(usr_var_data[0]),coords)
Bringing it all together, tidying it up a bit and including some error catching gives us:
import logging
from cis.data_io.products.AProduct import AProduct
from cis.data_io.netcdf import read_many_files_individually, get_metadata


class cis(AProduct):

    def get_file_signature(self):
        return [r'cis\-.*\.nc']

    def create_coords(self, filenames, usr_variable=None):
        from cis.data_io.Coord import Coord, CoordList
        from cis.data_io.ungridded_data import UngriddedCoordinates
        from cis.exceptions import InvalidVariableError

        # Pairs of (netCDF variable name, axis label)
        variables = [("longitude", "x"), ("latitude", "y"), ("altitude", "z"), ("time", "t"), ("air_pressure", "p")]

        logging.info("Listing coordinates: " + str(variables))

        coords = CoordList()
        for variable in variables:
            try:
                var_data = read_many_files_individually(filenames, variable[0])[variable[0]]
                coords.append(Coord(var_data, get_metadata(var_data[0]), axis=variable[1]))
            except InvalidVariableError:
                # Skip any coordinate which isn't present in the file
                pass

        return UngriddedCoordinates(coords)

    def create_data_object(self, filenames, variable):
        from cis.data_io.ungridded_data import UngriddedData
        usr_var_data = read_many_files_individually(filenames, variable)[variable]
        coords = self.create_coords(filenames)
        return UngriddedData(usr_var_data, get_metadata(usr_var_data[0]), coords)
Medium¶
For this example we will look at the AERONET data reading plugin. AERONET is a ground based sun-photometer network that produces time-series data for each groundstation in a csv based text file. There is some information about the ground station in the header of the file, and then a table of data with a time column, and a column for each of the measured values.
The AProduct.get_file_signature() method is straightforward, so we first consider the AProduct.create_coords() method. Here we have actually shifted all of the work to a private method called _create_coord_list(), for reasons which we will explain shortly:
def create_coords(self, filenames, variable=None):
    return UngriddedCoordinates(self._create_coord_list(filenames))
Note
In python there is not really such a thing as a ‘private’ method as there is in Java and C#, but we can signify that a method shouldn’t be accessed externally by starting its name with one or two underscores.
In this method we import an AERONET data reading routine:
def _create_coord_list(self, filenames, data=None):
    from cis.data_io.ungridded_data import Metadata
    from cis.data_io.aeronet import load_multiple_aeronet
This data reading routine actually performs much of the hard work in reading the AERONET file:
if data is None:
    data = load_multiple_aeronet(filenames)
Note that we only read the files if data is None, that is if we haven’t been passed any data already.
Note
The load_multiple_aeronet routine uses the numpy genfromtxt method to read in the csv file. This is a very useful method for reading text based files as it allows you to define the data formats of each of the columns, tell it which lines to ignore as comments and, optionally, mask out any missing values. This method would provide a useful example for reading different kinds of text based file.
We just have to describe (add metadata to) each of the components in this method:
coords = CoordList()
coords.append(Coord(data['longitude'], Metadata(name="Longitude",shape=(len(data),),units="degrees_east", range=(-180, 180))))
coords.append(Coord(data['latitude'], Metadata(name="Latitude",shape=(len(data),),units="degrees_north", range=(-90, 90))))
coords.append(Coord(data['altitude'], Metadata(name="Altitude",shape=(len(data),), units="meters")))
time_coord = Coord(data["datetime"], Metadata(name="DateTime",standard_name='time', shape=(len(data),),units="DateTime Object"), "X")
Note that we’ve explicitly added things like units and a shape. These are sometimes already populated for us when reading e.g. NetCDF files, but in the case of AERONET data we have to fill it out ‘by hand’.
Internally CIS uses a ‘standard’ time defined as fractional days since the 1st January 1600, on a Gregorian calendar. This allows us to straightforwardly compare model and measurement times regardless of their reference point. There are many helper methods for converting different date-time formats to this standard time. Here we use Coord.convert_datetime_to_standard_time(), and then include the coordinate in the coordinate list:
time_coord.convert_datetime_to_standard_time()
coords.append(time_coord)
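The definition of standard time given above amounts to a simple subtraction. The sketch below shows only the arithmetic, using the standard datetime module; the function to_standard_time is hypothetical and is not the CIS helper, which operates on whole coordinate arrays.

```python
from datetime import datetime

# Reference point described in the text: 1st January 1600.
STANDARD_EPOCH = datetime(1600, 1, 1)


def to_standard_time(moment):
    """Convert a naive datetime to fractional days since 1600-01-01."""
    return (moment - STANDARD_EPOCH).total_seconds() / 86400.0


# Noon on 2nd January 1600 is one and a half days after the reference point.
print(to_standard_time(datetime(1600, 1, 2, 12, 0)))
print(to_standard_time(datetime(2010, 1, 1)))
```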
Finally we return the coordinates:
return coords
For the create_data_object() method we have the familiar signature and import statements:
def create_data_object(self, filenames, variable):
    from cis.data_io.aeronet import load_multiple_aeronet
    from cis.exceptions import InvalidVariableError
We can pass the job of reading the data to our AERONET reading routine – catching any errors which occur because the variable doesn’t exist.
try:
    data_obj = load_multiple_aeronet(filenames, [variable])
except ValueError:
    raise InvalidVariableError(variable + " does not exist in " + str(filenames))
Note
Notice here that we’re catching a ValueError – which Numpy throws when it can’t find the specified variable in the data, and rethrowing the same error as an InvalidVariableError, so that CIS knows how to deal with it. Any plugins should use this error when a user specifies a variable which isn’t within the specified file.
Now we have read the data, we load the coordinate list, but notice that we also pass in the data we’ve just read. This is why we created a separate coordinate reading routine earlier: the data containing the coordinates has already been read in the line above, so we don’t need to read it twice, we just need to pull out the coordinates. This avoids opening the file multiple times, and can be a useful pattern to remember for files which aren’t direct access (such as text files).
coords = self._create_coord_list(filenames, data_obj)
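The read-once pattern can be seen in isolation in the following sketch. The class ReadOnceSketch and its fake loader are hypothetical and use made-up values; the counter is there only to show that building a data object triggers a single read.

```python
class ReadOnceSketch:
    """Hypothetical reader that counts how often the file is loaded."""

    def __init__(self):
        self.read_count = 0

    def _load(self, filenames):
        # Stand-in for an expensive read such as load_multiple_aeronet.
        self.read_count += 1
        return {"longitude": [1.5], "latitude": [15.3], "AOT_440": [0.3]}

    def _create_coord_list(self, filenames, data=None):
        # Only read the files if the caller has not already done so.
        if data is None:
            data = self._load(filenames)
        return [data["longitude"], data["latitude"]]

    def create_coords(self, filenames):
        return self._create_coord_list(filenames)

    def create_data_object(self, filenames, variable):
        data = self._load(filenames)
        # Pass the data along so the coordinates are not read a second time.
        coords = self._create_coord_list(filenames, data)
        return data[variable], coords


reader = ReadOnceSketch()
values, coords = reader.create_data_object(["station.lev20"], "AOT_440")
print(values, coords, reader.read_count)
```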
Finally we return the complete data object, including some associated metadata and the coordinates.
return UngriddedData(data_obj[variable], Metadata(name=variable, long_name=variable, shape=(len(data_obj),), missing_value=-999.0), coords)
Here’s the plugin in full:
from cis.data_io.products.AProduct import AProduct
from cis.data_io.Coord import Coord, CoordList
from cis.data_io.ungridded_data import Metadata, UngriddedCoordinates, UngriddedData


class Aeronet(AProduct):

    def get_file_signature(self):
        return [r'.*\.lev20']

    def _create_coord_list(self, filenames, data=None):
        from cis.data_io.aeronet import load_multiple_aeronet

        # Only read the files if we haven't been passed the data already
        if data is None:
            data = load_multiple_aeronet(filenames)

        coords = CoordList()
        coords.append(Coord(data['longitude'], Metadata(name="Longitude", shape=(len(data),),
                                                        units="degrees_east", range=(-180, 180))))
        coords.append(Coord(data['latitude'], Metadata(name="Latitude", shape=(len(data),),
                                                       units="degrees_north", range=(-90, 90))))
        coords.append(Coord(data['altitude'], Metadata(name="Altitude", shape=(len(data),), units="meters")))
        time_coord = Coord(data["datetime"], Metadata(name="DateTime", standard_name='time', shape=(len(data),),
                                                      units="DateTime Object"), "X")
        time_coord.convert_datetime_to_standard_time()
        coords.append(time_coord)

        return coords

    def create_coords(self, filenames, variable=None):
        return UngriddedCoordinates(self._create_coord_list(filenames))

    def create_data_object(self, filenames, variable):
        from cis.data_io.aeronet import load_multiple_aeronet
        from cis.exceptions import InvalidVariableError

        try:
            data_obj = load_multiple_aeronet(filenames, [variable])
        except ValueError:
            raise InvalidVariableError(variable + " does not exist in " + str(filenames))

        # Reuse the data we've just read to build the coordinates
        coords = self._create_coord_list(filenames, data_obj)

        return UngriddedData(data_obj[variable],
                             Metadata(name=variable, long_name=variable, shape=(len(data_obj),), missing_value=-999.0),
                             coords)
Advanced¶
This more advanced tutorial will cover some of the difficulties when reading in data which differs significantly from the structure CIS expects, and/or has little metadata in the associated files. We take the MODIS L2 plugin as our example, and discuss each method in turn.
There are a number of specific MODIS L2 products which we have tested using this plugin, each with its own file signature, so in this plugin we take advantage of the fact that get_file_signature can return a list of regular expressions. This way we create a simple regular expression for each MODIS L2 product that we support, rather than trying to create one more complicated regular expression which matches just these products to the exclusion of all others:
def get_file_signature(self):
    product_names = ['MYD06_L2', 'MOD06_L2', 'MYD04_L2', 'MOD04_L2']
    # Raw strings throughout, so that '\.' is passed to the regex engine unchanged
    regex_list = [r'.*' + product + r'.*\.hdf' for product in product_names]
    return regex_list
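To see how a list of signatures behaves, the following minimal sketch builds the same list and tests a couple of filenames against it. It assumes each pattern is applied with re.match; how CIS applies the signatures internally is not shown here, and both the matches_any helper and the filenames are hypothetical.

```python
import re

# Same list-of-regexes construction as the plugin's get_file_signature
product_names = ['MYD06_L2', 'MOD06_L2', 'MYD04_L2', 'MOD04_L2']
signatures = [r'.*' + product + r'.*\.hdf' for product in product_names]


def matches_any(filename, patterns):
    """Return True if the filename matches at least one of the patterns."""
    return any(re.match(pattern, filename) for pattern in patterns)


# Hypothetical filenames, for illustration only
supported = matches_any('MYD04_L2.A2010001.0000.hdf', signatures)
unsupported = matches_any('MOD08_D3.A2010001.hdf', signatures)
print(supported, unsupported)
```

A file only needs to match one entry in the list, which is why four short patterns are enough here.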
We have implemented the optional get_variable_names method here because MODIS files sometimes contain variables which CIS is unable to handle due to their irregular shape. We only want to report the variables which CIS can read, so we check each variable before adding it to the list of variables we return. We know that MODIS only contains SD variables so we can ignore any other types.
Note
HDF files can contain both Vdatas (VD) and Scientific Datasets (SD) data collections (among others). These are stored and accessed quite differently, which makes dealing with these files quite fiddly - we often have to treat each case separately. In this case we know MODIS files only have SD datasets which makes things a bit simpler.
def get_variable_names(self, filenames, data_type=None):
    import pyhdf.SD

    # Determine the valid shape for variables
    sd = pyhdf.SD.SD(filenames[0])
    datasets = sd.datasets()
    valid_shape = datasets['Latitude'][1]  # Assumes that latitude shape == longitude shape (it should)

    variables = set([])
    for filename in filenames:
        sd = pyhdf.SD.SD(filename)
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
        for var_name, var_info in sd.datasets().items():
            if var_info[1] == valid_shape:
                variables.add(var_name)

    return variables
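The shape check can be tried without any HDF file. In the sketch below a plain dictionary stands in for the result of pyhdf's SD.datasets(), which maps each dataset name to a tuple of (dimension names, shape, type code, index). The dataset names, shapes and the variables_with_shape helper are made up for illustration.

```python
# Stand-in for the dictionary returned by pyhdf's SD.datasets():
# name -> (dimension names, shape, type code, index). All values are made up.
datasets = {
    'Latitude': (('along', 'across'), (203, 135), 5, 0),
    'Longitude': (('along', 'across'), (203, 135), 5, 1),
    'Optical_Depth': (('along', 'across'), (203, 135), 22, 2),
    'Band_Info': (('bands',), (7,), 22, 3),
}


def variables_with_shape(datasets, reference='Latitude'):
    """Return the names of all datasets whose shape matches the reference."""
    valid_shape = datasets[reference][1]
    return {name for name, info in datasets.items() if info[1] == valid_shape}


readable = variables_with_shape(datasets)
print(sorted(readable))
```

Only the datasets that share the Latitude shape are kept, so the one-dimensional Band_Info entry is dropped.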
MODIS data often has a scale factor built in, stored against each variable. This method reads that scale factor for a particular variable and checks it against our built-in list of scale factors.
def __get_data_scale(self, filename, variable):
    from cis.exceptions import InvalidVariableError
    from pyhdf import SD

    try:
        meta = SD.SD(filename).datasets()[variable][0][0]
    except KeyError:
        raise InvalidVariableError("Variable "+variable+" not found")

    for scaling in self.modis_scaling:
        if scaling in meta:
            return scaling
    return None
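The lookup itself is a substring search on the name of the variable's first dimension. A minimal sketch follows, assuming a list of resolution labels such as '1km', '5km' and '10km'. The contents of self.modis_scaling are not shown in this tutorial, so that list, the get_data_scale function and the dimension names are assumptions.

```python
# Assumed list of resolution labels; the plugin keeps its own in self.modis_scaling
MODIS_SCALING = ['1km', '5km', '10km']


def get_data_scale(dimension_name, scalings=MODIS_SCALING):
    """Return the first scaling label found in the dimension name, or None."""
    for scaling in scalings:
        if scaling in dimension_name:
            return scaling
    return None


# Hypothetical dimension names
print(get_data_scale('Cell_Along_Swath_1km:mod06'))
print(get_data_scale('Cell_Along_Swath_5km:mod06'))
print(get_data_scale('Band_Number'))
```

Returning None rather than raising lets the caller treat "no known scale" as "no interpolation needed".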
In order to use data which has been scaled, we re-scale it on reading. This creates some overhead in the reading of the data, but saves considerable time when performing other operations on it later in the process. Routines like this can often be adapted from available Fortran or IDL routines (assuming no python routines are available) for your data.
def __field_interpolate(self, data, factor=5):
    '''
    Interpolates the given 2D field by the factor,
    edge pixels are defined by the ones in the centre,
    odd factors only!
    '''
    import numpy as np

    logging.debug("Performing interpolation...")

    output = np.zeros((factor*data.shape[0], factor*data.shape[1]))*np.nan
    output[int(factor/2)::factor, int(factor/2)::factor] = data
    # Floor division (//) keeps the slice bounds integers on Python 3 as well as Python 2
    for i in range(1, factor+1):
        output[(int(factor/2)+i):(-1*factor//2+1):factor, :] = i*((output[int(factor/2)+factor::factor, :]-output[int(factor/2):(-1*factor):factor, :])
                                                                 /float(factor))+output[int(factor/2):(-1*factor):factor, :]
    for i in range(1, factor+1):
        output[:, (int(factor/2)+i):(-1*factor//2+1):factor] = i*((output[:, int(factor/2)+factor::factor]-output[:, int(factor/2):(-1*factor):factor])
                                                                 /float(factor))+output[:, int(factor/2):(-1*factor):factor]
    return output
Next we read the coordinates from the file (using the same method of factoring out as we used in the Aeronet case).
def _create_coord_list(self, filenames, variable=None):
    import datetime as dt

    variables = ['Latitude', 'Longitude', 'Scan_Start_Time']
    logging.info("Listing coordinates: " + str(variables))
As usual we rely on the lower level IO reading routines to provide the raw data, in this case using the hdf.read routine.
    sdata, vdata = hdf.read(filenames, variables)
Note
Notice we have to put the vdata data somewhere, even though we don’t use it in this case.
We have to check whether we need to scale the coordinates to match the variable being read:
    apply_interpolation = False
    if variable is not None:
        scale = self.__get_data_scale(filenames[0], variable)
        # Compare by value: 'is' tests identity and is not reliable for strings
        apply_interpolation = scale == "1km"
Then we can read the coordinates, one at a time. We know the latitude information is stored in an SD dataset called Latitude, so we read that and interpolate it if needed.
    lat = sdata['Latitude']
    sd_lat = hdf.read_data(lat, "SD")
    lat_data = self.__field_interpolate(sd_lat) if apply_interpolation else sd_lat
    lat_metadata = hdf.read_metadata(lat, "SD")
    lat_coord = Coord(lat_data, lat_metadata, 'Y')
The same for Longitude:
    lon = sdata['Longitude']
    lon_data = self.__field_interpolate(hdf.read_data(lon, "SD")) if apply_interpolation else hdf.read_data(lon, "SD")
    lon_metadata = hdf.read_metadata(lon, "SD")
    lon_coord = Coord(lon_data, lon_metadata, 'X')
Next we read the time variable, remembering to convert it to our internal standard time. (We know that the MODIS atomic clock (TAI) time is referenced to 1 January 1993.)
    time = sdata['Scan_Start_Time']
    time_metadata = hdf.read_metadata(time, "SD")
    # Ensure the standard name is set
    time_metadata.standard_name = 'time'
    time_coord = Coord(time, time_metadata, "T")
    time_coord.convert_TAI_time_to_std_time(dt.datetime(1993, 1, 1, 0, 0, 0))

    return CoordList([lat_coord, lon_coord, time_coord])

def create_coords(self, filenames, variable=None):
    return UngriddedCoordinates(self._create_coord_list(filenames))
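The reference date passed to the time conversion above is simply the epoch from which the stored seconds are counted. The sketch below shows only that arithmetic, using the standard library. It ignores leap seconds and does not reproduce the internal standard time representation used by CIS; the function name tai_seconds_to_datetime is hypothetical.

```python
import datetime as dt

# Reference epoch quoted in the text for the MODIS Scan_Start_Time variable
MODIS_EPOCH = dt.datetime(1993, 1, 1, 0, 0, 0)


def tai_seconds_to_datetime(seconds, epoch=MODIS_EPOCH):
    """Convert seconds elapsed since the epoch into a datetime (no leap seconds)."""
    return epoch + dt.timedelta(seconds=seconds)


# 90000 seconds is one day and one hour after the epoch
converted = tai_seconds_to_datetime(90000.0)
print(converted)
```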
For create_data_object we are really just pulling the above methods together to read the specific variable the user has requested and combine it with the coordinates.
def create_data_object(self, filenames, variable):
    logging.debug("Creating data object for variable " + variable)

    # reading coordinates
    # the variable here is needed to work out whether to apply interpolation to the lat/lon data or not
    coords = self._create_coord_list(filenames, variable)

    # reading of variables
    sdata, vdata = hdf.read(filenames, variable)

    # retrieve data + its metadata
    var = sdata[variable]
    metadata = hdf.read_metadata(var, "SD")

    return UngriddedData(var, metadata, coords)
We have also implemented the AProduct.get_file_format() method, which allows some associated tools (for example the CEDA_DI tool) to use CIS to index files which they wouldn’t otherwise be able to read. We just return a file format descriptor as a string.
def get_file_format(self, filenames):
    """
    Get the file format
    :param filenames: the filenames of the file
    :return: file format
    """
    return "HDF4/ModisL2"
The full MODIS L2 plugin is rather long to show but can be downloaded here.
Data plugin reference¶
This section provides a reference describing the expected behaviour of each of the functions a plugin can implement. The following methods are mandatory:
- AProduct.get_file_signature()
  This method should return a list of regular expressions, which CIS uses to decide which data product to use for a given file. If more than one regular expression is provided in the list then the file can match any of the expressions. The first product with a signature that matches the filename will be used. The order in which the products are searched is determined by the priority property, highest value first; internal products generally have a priority of 10.
  For example, this would match all files with a name containing the string ‘CODE’ and with the ‘nc’ extension:
  return [r'.*CODE.*\.nc']
  Note
  If the signature has matched, the framework will call AProduct.get_file_type_error(), which gives the product a chance to open the file and check the contents.
  Returns: A list of regex to match the product’s file naming convention.
  Return type: list
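The search order described for get_file_signature can be sketched as follows. This is an illustration of the documented rule (highest priority first, first matching signature wins), not the CIS implementation. ProductA, ProductB and find_product are hypothetical names, and matching with re.match is an assumption.

```python
import re


class ProductA:
    """Hypothetical product with the priority quoted for internal products."""
    priority = 10

    def get_file_signature(self):
        return [r'.*\.nc']


class ProductB:
    """Hypothetical user plugin with a higher priority and a narrower signature."""
    priority = 20

    def get_file_signature(self):
        return [r'.*CODE.*\.nc']


def find_product(filename, products):
    """Return the highest-priority product whose signature matches, or None."""
    for product in sorted(products, key=lambda p: p.priority, reverse=True):
        if any(re.match(sig, filename) for sig in product.get_file_signature()):
            return product
    return None


products = [ProductA(), ProductB()]
specific = find_product('my_CODE_file.nc', products)
generic = find_product('other_file.nc', products)
print(type(specific).__name__, type(generic).__name__)
```

Giving the narrower plugin the higher priority means it is tried before the catch-all signature gets a chance to claim the file.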
- AProduct.create_coords(filenames)
  Reads the coordinates from one or more files. Note that this method may have to make certain assumptions about the file in order to return a single coordinate set. The user should be warned through the logger if this is the case.
  Parameters: filenames (list) – List of filenames to read coordinates from
  Returns: CommonData object
- AProduct.create_data_object(filenames, variable)
  Create and return a CommonData object for a given variable from one or more files.
  Parameters:
  - filenames (list) – List of filenames of files to read
  - variable (str) – Variable to read from the files
  Returns: A CommonData object representing the specified variable
  Raises:
  - FileIOError – Unable to read a file
  - InvalidVariableError – Variable not present in file
While these may be implemented optionally:
- AProduct.get_variable_names(filenames, data_type=None)
  Get a list of available variable names from the filenames list passed in. This general implementation can be overridden in specific products to include/exclude variables which may or may not be relevant. The data_type parameter can be used to specify extra information.
  Parameters:
  - filenames (list) – List of string filenames of files to be read from
  - data_type (str) – ‘SD’ or ‘VD’ to specify only return SD or VD variables from HDF files. This may take on other values in specific product implementations.
  Returns: A set of variable names as strings
  Return type: str
- AProduct.get_file_type_error(filename)
  Check a single file to see if it is of the correct type, and if not return a list of errors. If the return is None then there are no errors and this is the correct data product to use for this file.
  This method gives a mechanism for a data product to identify itself as the correct product when a specific enough file signature cannot be provided. For example GASSP is a type of NetCDF file and so filenames end with .nc but so do other NetCDF files, so the data product opens the file and looks for the GASSP version attribute, and if it doesn’t find it returns an error.
  Parameters: filename (str) – The filename for the file
  Returns: List of errors, or None
  Return type: list or None
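A minimal sketch of the return convention for get_file_type_error is given below: None when the file is accepted, otherwise a list of error strings. The MarkerProduct class, its '# MYFORMAT' marker line and the write_temp helper are all hypothetical; a real product would inspect whatever identifies its own format, for example a NetCDF attribute.

```python
import os
import tempfile


class MarkerProduct:
    """Hypothetical product: valid files start with the line '# MYFORMAT'."""

    def get_file_type_error(self, filename):
        # Return None when the file looks right, otherwise a list of errors
        with open(filename) as f:
            first_line = f.readline().strip()
        if first_line != '# MYFORMAT':
            return ["File {} does not start with the MYFORMAT marker".format(filename)]
        return None


def write_temp(text):
    """Write text to a temporary file and return its path."""
    handle, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(handle, 'w') as f:
        f.write(text)
    return path


good = write_temp('# MYFORMAT\n1,2,3\n')
bad = write_temp('something else\n')
product = MarkerProduct()
good_result = product.get_file_type_error(good)
bad_result = product.get_file_type_error(bad)
print(good_result, bad_result)
os.remove(good)
os.remove(bad)
```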
- AProduct.get_file_format(filename)
  Returns a file format hierarchy separated by slashes, of the form TopLevelFormat/SubFormat/SubFormat/Version. E.g. NetCDF/GASSP/1.0, ASCII/ASCIIHyperpoint or HDF4/CloudSat. This is mainly used within the ceda_di indexing tool. If not set it will default to the product’s name.
  A filename of an example file can be provided to enable the determination of, for example, a dataset version number.
  Parameters: filename (str) – Filename of file to be inspected
  Returns: File format, of the form [parent/]format/specific instance/version, or the class name
  Return type: str
  Raises: FileFormatError if there is an error
Analysis plugin development¶
Users can write their own plugins for performing the collocation of two data sets. There are three different types of plugin available for collocation. First we will describe the overall design and how these different components interact, then each will be described in more detail.
Basic collocation design¶
The diagram below demonstrates the basic design of the collocation system, and the roles of each of the components. In the simple case of the default collocator (which returns only one value) the Collocator loops over each of the sample points, calls the relevant Constraint to reduce the number of data points, and then the Kernel which returns a single value, which the collocator stores.

Kernel¶
A kernel is used to convert the constrained points into values in the output. There are two sorts of kernel: those which act on the final point location and a set of data points (these derive from Kernel), and the more specific kernels which act upon just an array of data (these derive from AbstractDataOnlyKernel, which in turn derives from Kernel).
The data only kernels are less flexible but should execute faster. To create a new kernel inherit from Kernel and implement the abstract method Kernel.get_value(). To make a data only kernel inherit from AbstractDataOnlyKernel and implement AbstractDataOnlyKernel.get_value_for_data_only() and optionally overload AbstractDataOnlyKernel.get_value().
These methods are outlined below.
- Kernel.get_value(point, data)
  This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the data given a single point.
  The data is deliberately left unspecified in the interface as it may be any type of data, however it is expected that each implementation will only work with a specific type of data (gridded, ungridded etc.). Note that this method will be called for every sample point and so could become a bottleneck for calculations; it is advisable to make it as quick as is practical. If this method is unable to provide a value (for example if no data points were given) a ValueError should be thrown.
  Parameters:
  - point – A single HyperPoint
  - data – A set of data points to reduce to a single value
  Returns: For return_size=1 a single value (number) otherwise a list of return values, which represents some operation on the points provided
  Raises ValueError: When the method is unable to return a value
- AbstractDataOnlyKernel.get_value_for_data_only(values)
  This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the values (a numpy array).
  Note that this method will be called for every sample point in which data can be placed and so could become a bottleneck for calculations; it is advisable to make it as quick as is practical. If this method is unable to provide a value (for example if no data points were given) a ValueError should be thrown. This method will not be called if there are no values to be used for calculations.
  Parameters: values – A numpy array of values (can not be none or empty)
  Returns: A single data item if return_size is 1 or a list of items containing Kernel.return_size items
  Raises ValueError: If there are any problems creating a value
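As an illustration of the data only interface, the sketch below implements a kernel that returns the spread of the values it is given. So that it runs on its own, it defines a small stand-in for AbstractDataOnlyKernel instead of importing the CIS class. RangeKernel is a hypothetical name, and a plain list is used where CIS would pass a numpy array.

```python
class AbstractDataOnlyKernel:
    """Stand-in for the CIS base class, so that the sketch runs on its own."""
    return_size = 1

    def get_value_for_data_only(self, values):
        raise NotImplementedError


class RangeKernel(AbstractDataOnlyKernel):
    """Hypothetical kernel returning the spread (max - min) of the values."""

    def get_value_for_data_only(self, values):
        if len(values) == 0:
            # Signal that no value can be produced, as the interface requires
            raise ValueError("No values to calculate a range from")
        return max(values) - min(values)


kernel = RangeKernel()
spread = kernel.get_value_for_data_only([2.0, 5.5, 3.0])
print(spread)
```

Because return_size is 1, the method returns a single number; a kernel with a larger return_size would return a list of that length instead.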
Constraint¶
The constraint limits the data points for a given sample point.
The user can also add a new constraint mechanism by subclassing Constraint and providing an implementation for Constraint.constrain_points(). If more control is needed over the iteration sequence then the Constraint.get_iterator() method can also be overloaded. Note however that this may not be respected by all collocators, which may still iterate over all sample data points. It is possible to write your own collocator (or extend an existing one) to ensure the correct iterator is used - see the next section. Both these methods, and their signatures, are outlined below.
- Constraint.constrain_points(point, data)
  This method should return a subset of the data given a single reference point. It is expected that the data returned should be of the same type as that given - but this isn’t mandatory. It is possible that this function will return zero points (no data), the collocation class is responsible for providing a fill_value.
  Parameters:
  - point (HyperPoint) – A single HyperPoint
  - data – A set of data points to be reduced
  Returns: A reduced set of data points
- Constraint.get_iterator(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)
  Iterator to iterate through the points needed to be calculated. The default iterator iterates through all the sample points, calling Constraint.constrain_points() for each one.
  Parameters:
  - missing_data_for_missing_sample – If true anywhere there is missing data on the sample then final point is missing; otherwise just use the sample
  - coord_map – Coordinate map - list of tuples of indexes of hyperpoint coord, data coords and output coords
  - coords – The coordinates to map the data onto
  - data_points – The (non-masked) data points
  - shape – Shape of the final data values
  - points – The original points object, these are the points to collocate
  - output_data – Output data set
  Returns: Iterator which iterates through (sample indices, hyper point and constrained points) to be placed in these points
To enable a constraint to use an AbstractDataOnlyKernel, the method get_iterator_for_data_only() should be implemented (again though, this may be ignored by a collocator). An example of this is the BinnedCubeCellOnlyConstraint.get_iterator_for_data_only() implementation.
Collocator¶
Another plugin which is available is the collocation method itself. A new one can be created by subclassing Collocator and providing an implementation for Collocator.collocate(). This method takes a number of sample points and applies the given constraint and kernel methods on the data for each of those points. It is responsible for returning the new data object to be written to the output file. As such, the user could create a collocation routine capable of handling multiple return values from the kernel, and hence creating multiple data objects, by creating a new collocation method.
Note
The collocator is also responsible for dealing with any missing values in sample points. (Some sets of sample points may include values which may or may not be masked.) Sometimes the user may wish to mask the output for such points; the missing_data_for_missing_sample attribute is used to determine the expected behaviour.
The interface is detailed here:
- Collocator.collocate(points, data, constraint, kernel)
  The method is responsible for setting up and running the collocation. It should take a set of data and map that onto the given (sample) points using the constraint and kernel provided.
  Parameters:
  - points – A set of sample points onto which we will collocate some other ‘data’
  - data – Some other data to be collocated onto the ‘points’
  - constraint – A Constraint instance which provides a Constraint.constrain_points() method, and optionally a Constraint.get_iterator() method
  - kernel – A Kernel instance which provides a Kernel.get_value() method
  Returns: One or more CommonData (or subclasses of) objects whose coordinates lie on the points defined above.
Implementation¶
For all of these plugins any new variables, such as limits, constraint values or averaging parameters, are automatically set as attributes in the relevant object. For example, if the user wanted to write a new constraint method (AreaConstraint, say) which needed a variable called area, this can be accessed with self.area within the constraint object. This will be set to whatever the user specifies at the command line for that variable, e.g.:
$ ./cis.py col my_sample_file rain:"model_data_?.nc"::AreaConstraint,area=6000,fill_value=0.0:nn_gridded
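The way options end up as attributes can be pictured with the sketch below. It is not the CIS parser: build_from_option_string is a hypothetical helper that splits a 'Name,key=value,...' string and uses setattr, and it keeps every value as a string, whereas the type conversion CIS applies (if any) is not described here.

```python
class AreaConstraint:
    """Hypothetical constraint; the options arrive as attributes on the object."""

    def describe(self):
        return "area={}, fill_value={}".format(self.area, self.fill_value)


def build_from_option_string(classes, option_string):
    """Create an object from 'Name,key=value,...' and set each key as an attribute."""
    name, *pairs = option_string.split(',')
    obj = classes[name]()
    for pair in pairs:
        key, value = pair.split('=', 1)
        setattr(obj, key, value)  # values are kept as strings in this sketch
    return obj


constraint = build_from_option_string({'AreaConstraint': AreaConstraint},
                                      'AreaConstraint,area=6000,fill_value=0.0')
print(constraint.describe())
```

The plugin code never declares area or fill_value itself; it simply reads self.area and relies on the framework having set it.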
Example implementations of new collocation plugins are demonstrated below for each of the plugin types:
class MyCollocator(Collocator):

    def collocate(self, points, data, constraint, kernel):
        values = []
        for point in points:
            con_points = constraint.constrain_points(point, data)
            try:
                values.append(kernel.get_value(point, con_points))
            except ValueError:
                values.append(constraint.fill_value)
        new_data = LazyData(values, data.metadata)
        new_data.missing_value = constraint.fill_value
        return new_data


class MyConstraint(Constraint):

    def constrain_points(self, ref_point, data):
        con_points = []
        for point in data:
            if point.value > self.val_check:
                con_points.append(point)
        return con_points


class MyKernel(Kernel):

    def get_value(self, point, data):
        nearest_point = point.furthest_point_from()
        for data_point in data:
            if point.compdist(nearest_point, data_point):
                nearest_point = data_point
        return nearest_point.val
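The three examples above depend on CIS classes, so they cannot be run in isolation. The following self-contained sketch reproduces the same constrain, apply kernel, fill on failure loop with plain Python stand-ins. Point, WindowConstraint, NearestKernel and the one-dimensional coordinate are all simplifications invented for illustration.

```python
class Point:
    """Minimal stand-in for a CIS HyperPoint: one coordinate and a value."""

    def __init__(self, x, value=None):
        self.x = x
        self.value = value


class WindowConstraint:
    """Keep only the data points within max_distance of the sample point."""
    fill_value = float('nan')

    def __init__(self, max_distance):
        self.max_distance = max_distance

    def constrain_points(self, point, data):
        return [d for d in data if abs(d.x - point.x) <= self.max_distance]


class NearestKernel:
    """Return the value of the constrained point closest to the sample point."""

    def get_value(self, point, data):
        if not data:
            raise ValueError("No data points to choose from")
        return min(data, key=lambda d: abs(d.x - point.x)).value


def collocate(points, data, constraint, kernel):
    """Loop over sample points: constrain, apply the kernel, fill on failure."""
    values = []
    for point in points:
        con_points = constraint.constrain_points(point, data)
        try:
            values.append(kernel.get_value(point, con_points))
        except ValueError:
            values.append(constraint.fill_value)
    return values


samples = [Point(0.0), Point(5.0), Point(20.0)]
data = [Point(0.5, 1.0), Point(4.0, 2.0), Point(6.5, 3.0)]
result = collocate(samples, data, WindowConstraint(max_distance=2.0), NearestKernel())
print(result)
```

The last sample point has no data within range, so the kernel raises ValueError and the collocator substitutes the constraint's fill value.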
Maintenance and Developer Guide¶
Source files¶
The cis source code is hosted at https://github.com/cedadev/jasmin_cis.git, while the conda recipes and other files are hosted here: https://github.com/cistools.
Test suites¶
The unit test suite can be run readily using Nose. Just go to the root of the repository (i.e. cis) and type
nosetests cis/test/unit
and this will run the full suite of tests.
A comprehensive set of integration tests is also provided. There is a folder full of test data at /group_workspaces/jasmin/cis/cis_repo_test_files, which has been compressed and is available as a tar inside that folder.
To add files to the folder simply copy them in then delete the old tar file and create a new one with:
tar --dereference -zcvf cis_repo_test_files.tar.gz .
Ignore the warning about the file changing; it appears because the tar file is in the directory being archived. Having the tar file in the directory, however, means the archive can be easily unpacked, without creating an intermediate folder.
To make the integration tests run this needs to be copied to the local machine and decompressed. Then set the environment variable CIS_DATA_HOME to the location of the data sets, and run nosetests cis/test/integration.
There are also a number of plot tests available under the test/plot_tests directory in the test_plotting.py script. These integration tests use matplotlib to perform a byte-wise comparison of the output against reference plots, using a pre-defined tolerance. Any tests which fail can be evaluated using the idiff.py tool in the same directory. Running this will present a graphical interface showing the reference plot, the test output, and the difference between them. You can either choose to accept the difference, which will move the test output to the reference directory, or reject it.
Dependencies¶
A graph representing the dependency tree can be found at doc/cis_dependency.dot (use XDot to read it).

Creating a Release¶
To carry out intermediate releases follow this procedure:
- Check the version number and status are updated in the CIS source code (cis/__init__.py)
- Tag the new version on Github with new version number and release notes.
- Create a tarball - use python setup.py egg_info sdist in the cis root dir.
- Install this onto the release virtual environment: this is at /group_workspaces/jasmin/cis/cis_dev_venv. So activate the venv, upload the tarball somewhere on the GWS and then do pip install <LOCATION_OF_TARBALL>.
- Create an anaconda build on each platform (OS X, Linux and Windows) - see below.
- Request Phil Kershaw upload the tarball to PyPI. (Optional)
For a release onto JASMIN, complete the steps above and then ask Alan Iwi to produce an RPM, deploy it on a test VM, confirm functionality then rollout across full JAP and LOTUS nodes.
Anaconda Build¶
The Anaconda build recipes for CIS and the dependencies which can’t be found either in the core channel, or in SciTools are stored in their own github repository here. To build a new CIS package clone the conda-recipes repository and then run the following command:
$ conda build -c cistools -c scitools cis
By default this will run the full unit-test suite before successful completion. You can also optionally run the integration test suite by specifying the CIS_DATA_HOME environment variable.
To upload the package to the cistools channel on Anaconda.org use:
$ binstar upload <package_location> -u cistools
Alternatively, when creating release candidates you may wish to upload the package to the ‘beta’ channel. This gives an opportunity to test the packaging and installation process on a number of machines. To do so, use:
$ binstar upload <package_location> -u cistools --channel beta
To install cis from the beta channel use:
$ conda install -c https://conda.binstar.org/cistools/channel/beta -c cistools -c scitools cis
Documentation¶
The documentation and API reference are both generated using a mixture of markdown and autogenerated documentation using the Sphinx autodoc package. Build the documentation using:
python setup.py build_sphinx
This will output the documentation in html under the directory doc/_build/html.
Continuous Integration Server¶
JASMIN provide a Jenkins CI Server on which the CIS unit and integration tests are run whenever origin/master is updated.
The integration tests take approximately 7 hours to run whilst the unit tests take about 5s. The Jenkins server is hosted on jasmin-sci1-dev at /var/lib/jenkins and is accessed at http://jasmin-sci1-dev.ceda.ac.uk:8080/
We also have a Travis cloud instance (https://travis-ci.org/cedadev/cis) which in principle allows us to build and test on both Linux and OS X. There are unit test builds currently working but because of a hard time limit on builds (120 minutes) the integration tests don’t currently run.
Copying files to the CI server¶
The contents of the test folder will not be automatically copied across to the Jenkins directory, so if you add any files to the folder you’ll need to manually copy them to the Jenkins directory or the integration tests will fail. The directory is /var/lib/jenkins/workspace/CIS Integration Tests/cis/test/test_files/. This is not entirely simple because:
- We don’t have write permissions on the test folder
- Jenkins doesn’t have read permissions for the CIS group_workspace
In order to copy files across we have done the following:
- Copy the files we want to /tmp
- Open up the CIS Integration Tests webpage and click ‘Configure’
- Scroll down to ‘Build’ where the shell script to be executed is found and insert a line to copy the file to the directory (the destination is quoted because the path contains spaces), e.g.
cp /tmp/file.nc "/var/lib/jenkins/workspace/CIS Integration Tests/cis/test/test_files"
- Run the CIS Integration Tests
- Remove the line from the build script
- Remove the files from /tmp
Problems with Jenkins¶
Sometimes the Jenkins server experiences problems which make it unusable. One particular issue we’ve encountered more than once is that Jenkins occasionally loses all its stylesheets and then becomes impossible to use. Asking CEDA support (or Phil Kershaw) to restart Jenkins should solve this.
CIS as a Python library (API)¶
Main API¶
As a command line tool, CIS has not been designed with a Python API in mind. There are however some utility functions that may provide a useful start for those who wish to use CIS as a Python library. For example, the functions in the base cis module provide a straightforward way to load your data. They can be easily imported using, for example: from cis import read_data.
One of the advantages of using CIS as a Python library is that you are able to perform multiple operations in one go,
that is without writing to disk in between. In certain cases this may provide a significant speed-up.
Note
This section of the documentation expects a greater level of Python experience than the other sections. There are many helpful Python guides and tutorials available around the web if you wish to learn more.
The read_data() function is a simple way to read a single gridded or ungridded data object (e.g. a NetCDF variable) from one or more files. CIS will determine the best way to interpret the datafile by comparing the file signature with the built-in data reading plugins and any user defined plugins. Specifying a particular product allows the user to override this automatic detection.
- cis.read_data(filenames, variable, product=None)
  Read a specific variable from a list of files. Files can be either gridded or ungridded but not a mix of both. First tries to read data as gridded, if that fails, tries as ungridded.
  Parameters:
  - filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
  - variable (str) – The variable to read from the files
  - product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
  Returns: The specified data as either a GriddedData or UngriddedData object.
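The filenames argument accepts several forms, which the sketch below illustrates using only the standard library. It mimics the documented behaviour (comma separated strings, directories and the * and ? wildcards) and is not the CIS implementation; expand_filenames and the file names are hypothetical.

```python
import glob
import os
import tempfile


def expand_filenames(filenames):
    """Expand a comma separated string or a list into a sorted list of files."""
    if isinstance(filenames, str):
        filenames = filenames.split(',')
    expanded = set()
    for name in filenames:
        if os.path.isdir(name):
            # A directory stands for every regular file directly inside it
            for entry in os.listdir(name):
                path = os.path.join(name, entry)
                if os.path.isfile(path):
                    expanded.add(path)
        else:
            # glob handles the * and ? wildcards; a plain name matches itself if it exists
            expanded.update(glob.glob(name))
    return sorted(expanded)


# Build a throwaway directory with hypothetical file names
tmp_dir = tempfile.mkdtemp()
for fname in ('a_1.nc', 'a_2.nc', 'b_1.nc'):
    open(os.path.join(tmp_dir, fname), 'w').close()

wildcard_names = [os.path.basename(f) for f in expand_filenames(os.path.join(tmp_dir, 'a_?.nc'))]
directory_names = [os.path.basename(f) for f in expand_filenames([tmp_dir])]
print(wildcard_names)
print(directory_names)
```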
The read_data_list() function is very similar to read_data() except that it allows the user to specify more than one variable name. This function returns a list of data objects, either all of which will be gridded, or all ungridded, but not a mix. For ungridded data lists it is assumed that all objects share the same coordinates.
- cis.read_data_list(filenames, variables, product=None, aliases=None)
  Read multiple data objects from a list of files. Files can be either gridded or ungridded but not a mix of both.
  Parameters:
  - filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
  - variables (string or list) – One or more variables to read from the files
  - product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
  - aliases (string or list) – List of aliases to put on each variable’s data object as an alternative means of identifying them.
  Returns: A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)
Data Objects¶
Each of the above methods returns either GriddedData or UngriddedData objects. These objects are the main data handling objects used within CIS, and the methods on each of these types are documented in the data modules section. These classes do however share a common interface, defined by the CommonData class, which is detailed below. For technical reasons some methods which are common to both GriddedData and UngriddedData are not defined in the CommonData interface. The most useful of these methods are probably summary() and save_data().
- class cis.data_io.common_data.CommonData
  Interface of common methods implemented for gridded and ungridded data.
  - alias
    Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).
    Returns: The alias
    Return type: str
  - as_data_frame(copy)
    Convert a CommonData object to a Pandas DataFrame.
    Parameters: copy – Create a copy of the data for the new DataFrame? Default is True.
    Returns: A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.
  - get_all_points()
    Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.
    Returns: list-like object of data points
  - get_coordinates_points()
    Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.
    Returns: list-like object of data points
  - get_non_masked_points()
    Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.
    Returns: list-like object of data points
  - history
    Return the associated history of the object
    Returns: The history
    Return type: str
  - is_gridded()
    Returns value indicating whether the data/coordinates are gridded.
  - var_name
    Return the variable name associated with this data object
    Returns: The variable name
Unsupported API¶
Warning
While the above interfaces are designed as a ‘public’ API and unlikely to change over CIS versions, those documented below are not yet standardised and may change or be removed even between minor version revisions. It is expected however that these particular classes will be developed and stabilised over time to form part of the ‘public’ API.
Collocation¶
The main collocation class can be imported using from cis.collocation import Collocate. Its methods are outlined below:
-
class
cis.collocation.
Collocate
(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)¶ Perform a general collocation
-
__init__
(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)¶ Constructor
Parameters: - sample_points (CommonData) – Sample points to collocate on to
- output_filename – Filename to output to
- missing_data_for_missing_sample – Write missing values out when sample data is missing
- collocator_factory (CollocatorFactory) – An optional configuration object
-
__weakref__
¶ list of weak references to the object (if defined)
-
collocate
(data, col_name=None, col_params=None, kern=None, kern_params=None)¶ Perform the collocation.
Parameters: - data (CommonData) – Data to collocate
- col_name (str) – Name of the collocator
- col_params (dict) – Parameters dictionary for the collocation and constraint
- kern (str) – The kernel to use
- kern_params (dict) – The kernel parameters to use
Return CommonData: The collocated data
Raises CoordinateNotFoundError: If the collocator was unable to compare the sample and data points
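As a rough sketch of how the constructor and the collocate method fit together, the helper below wraps the two calls using only the signatures documented above. The helper name collocate_onto is hypothetical, the collocator and kernel names are left to the caller because their valid values are not listed in this section, and the import is deferred so the module loads without CIS installed.

```python
def collocate_onto(sample_points, data, col_name=None, col_params=None,
                   kern=None, kern_params=None):
    """Collocate `data` onto `sample_points` and return the collocated data.

    Hypothetical convenience wrapper; it only forwards its arguments to the
    documented Collocate constructor and collocate() method.
    """
    # Deferred import: CIS is only needed when the function is actually called.
    from cis.collocation import Collocate

    collocator = Collocate(sample_points, missing_data_for_missing_sample=False)
    return collocator.collocate(data, col_name=col_name, col_params=col_params,
                                kern=kern, kern_params=kern_params)
```

Because this part of the API is not yet standardised, a wrapper like this may need adjusting between CIS versions.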
-
Aggregation¶
The main aggregation class can be imported using from cis.aggregation import Aggregate
, and its methods are outlined below.
Note that currently this object saves the output directly to file, but it is expected that in the future it will return
the result for the user to output as needed.
-
class
cis.aggregation.
Aggregate
(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)¶ -
__init__
(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)¶ Constructor
Parameters: - grid (dict) – A dictionary of dimension_name:AggregationGrid key value pairs.
- output_file – The filename to output the result to
- data_reader – Optional
DataReader
configuration object - data_writer – Optional
DataWriter
configuration object
-
__weakref__
¶ list of weak references to the object (if defined)
-
aggregate
(variables, filenames, product=None, kernel=None)¶ Aggregate the given variables based on the initialised grid
Parameters: - variables (string or list) – One or more variables to read from the files
- filenames (string or list) – One or more filenames of the files to read
- product (str) – Name of data product to use (optional)
- kernel (str) – Name of kernel to use (the default is ‘moments’)
-
Subsetting¶
The main subsetting class can be imported using from cis.subsetting import Subset
, and its methods are outlined below.
Note that currently this object saves the output directly to file, but it is expected that in the future it will return
the result for the user to output as needed.
-
class
cis.subsetting.
Subset
(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)¶ Class for subsetting Ungridded or Gridded data either temporally, or spatially or both.
-
__init__
(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)¶ Constructor
Parameters: - limits (dict) – A dictionary of dimension_name:SubsetLimits key value pairs.
- output_file – The filename to output the result to
- data_reader – Optional
DataReader
configuration object - data_writer – Optional
DataWriter
configuration object
-
__weakref__
¶ list of weak references to the object (if defined)
-
subset
(variables, filenames, product=None)¶ Subset the given variables based on the initialised limits
Parameters: - variables (string or list) – One or more variables to read from the files
- filenames (string or list) – One or more filenames of the files to read
- product (str) – Name of data product to use (optional)
-
Stats¶
The main statistics class can be imported using from cis.stats import StatsAnalyzer
, and its methods are outlined below:
-
class
cis.stats.
StatsAnalyzer
(data1, data2) Analyse datasets to produce statistics.
-
__init__
(data1, data2) Create a statistics analyser for two data sets
Parameters: - data1 (CommonData) – First data object
- data2 (CommonData) – Second data object
-
analyze
() Perform a statistical analysis on two data sets.
Returns: List of StatisticsResult instances.
-
points_count
() Count all points which will be used for statistical comparison operations (i.e. are non-missing in both datasets).
Returns: List of StatisticsResults
-
means
() Means of two datasets
Returns: List of StatisticsResults
-
stddevs
() Corrected sample standard deviation of datasets
Returns: List of StatisticsResults
-
abs_mean
() Mean of absolute difference d2-d1
Returns: List of StatisticsResults
-
abs_stddev
() Standard deviation of absolute difference d2-d1
Returns: List of StatisticsResults
-
rel_mean
() Mean of relative difference (d2-d1)/d1
Returns: List of StatisticsResults
-
rel_stddev
() Standard deviation of relative difference (d2-d1)/d1
Returns: List of StatisticsResults
-
spearmans_rank
() Compute the Spearman’s rank correlation coefficient for the data
Returns: List of StatisticsResults
-
linear_regression
() Perform a linear regression on the data
Returns: List of StatisticsResults
-
__weakref__
list of weak references to the object (if defined)
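The difference statistics listed above can be illustrated without CIS. The following is a standalone sketch on plain Python lists, not the StatsAnalyzer implementation: the function name difference_stats is hypothetical, missing values are represented by None or NaN, and the corrected sample standard deviation is assumed for the difference statistics as well as for stddevs.

```python
import math
import statistics


def _is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def difference_stats(d1, d2):
    """Return simple comparison statistics for two equal-length sequences."""
    # Only points that are non-missing in both datasets are compared.
    pairs = [(a, b) for a, b in zip(d1, d2)
             if not (_is_missing(a) or _is_missing(b))]
    abs_diff = [b - a for a, b in pairs]          # d2 - d1
    rel_diff = [(b - a) / a for a, b in pairs]    # (d2 - d1) / d1, assumes d1 != 0
    return {
        "points_count": len(pairs),
        "abs_mean": statistics.mean(abs_diff),
        "abs_stddev": statistics.stdev(abs_diff),  # corrected (n - 1) estimator
        "rel_mean": statistics.mean(rel_diff),
        "rel_stddev": statistics.stdev(rel_diff),
    }
```

Note that the relative statistics divide by the first dataset, so zeros in d1 would need handling in real use.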
-
Full Python reference documentation¶
The rest of the documentation below documents internal CIS functions and modules which are not intended to be used as an API at all. They are documented here as a reference for developers and other interested parties.
cis.data_io package¶
cis.data_io.products package¶
-
class
cis.data_io.products.AProduct.
AProduct
¶ Bases:
object
Abstract class for the various possible data products. This just defines the interface which the subclasses must implement.
-
create_coords
(filenames)¶ Reads the coordinates from one or more files. Note that this method may have to make certain assumptions about the file in order to return a single coordinate set. The user should be warned through the logger if this is the case.
Parameters: filenames (list) – List of filenames to read coordinates from Returns: CommonData
object
-
create_data_object
(filenames, variable)¶ Create and return a
CommonData
object for a given variable from one or more files.
Parameters: - filenames (list) – List of filenames of files to read
- variable (str) – Variable to read from the files
Returns: A
CommonData
object representing the specified variable.
Raises: - FileIOError – Unable to read a file
- InvalidVariableError – Variable not present in file
-
get_file_format
(filename)¶ Returns a file format hierarchy separated by slashes, of the form TopLevelFormat/SubFormat/SubFormat/Version, e.g. NetCDF/GASSP/1.0, ASCII/ASCIIHyperpoint or HDF4/CloudSat. This is mainly used within the ceda_di indexing tool. If not set it will default to the product’s name.
A filename of an example file can be provided to enable the determination of, for example, a dataset version number.
Parameters: filename (str) – Filename of file to be inspected
Returns: File format, of the form [parent/]format/specific instance/version, or the class name
Return type: str
Raises: FileFormatError if there is an error
-
get_file_signature
()¶ This method should return a list of regular expressions, which CIS uses to decide which data product to use for a given file. If more than one regular expression is provided in the list then the file can match any of the expressions. The first product with a signature that matches the filename will be used. The order in which the products are searched is determined by the priority property, highest value first; internal products generally have a priority of 10.
For example, this would match all files with a name containing the string ‘CODE’ and with the ‘nc’ extension:
return [r'.*CODE.*\.nc']
Note
If the signature has matched the framework will call
AProduct.get_file_type_error()
, which gives the product a chance to open the file and check the contents.
Returns: A list of regex to match the product’s file naming convention. Return type: list
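The selection rule described above (highest priority first, first matching signature wins) can be sketched in isolation. This is not the CIS plugin loader: the product names and the PRODUCTS table are made up for illustration, and matching the whole filename with re.fullmatch is an assumption about how the expressions are applied.

```python
import re

# Hypothetical stand-ins for product plugins: (name, priority, signatures).
PRODUCTS = [
    ("GenericNetCDF", 10, [r".*\.nc"]),
    ("CodeProduct", 20, [r".*CODE.*\.nc"]),
]


def pick_product(filename, products=PRODUCTS):
    """Return the name of the first matching product, highest priority first."""
    ordered = sorted(products, key=lambda product: product[1], reverse=True)
    for name, _priority, signatures in ordered:
        # A file may match any one of the product's expressions.
        if any(re.fullmatch(signature, filename) for signature in signatures):
            return name
    return None
```

With this table, a file called my_CODE_file.nc is claimed by the higher-priority product even though the generic signature also matches it.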
-
get_file_type_error
(filename)¶ Check a single file to see if it is of the correct type, and if not return a list of errors. If the return is None then there are no errors and this is the correct data product to use for this file.
This method gives a mechanism for a data product to identify itself as the correct product when a specific enough file signature cannot be provided. For example GASSP is a type of NetCDF file and so filenames end with .nc but so do other NetCDF files, so the data product opens the file and looks for the GASSP version attribute, and if it doesn’t find it returns an error.
Parameters: filename (str) – The filename for the file Returns: List of errors, or None Return type: list or None
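The return contract of this method (a list of errors, or None when the file is acceptable) might look like the sketch below. To stay self-contained it inspects a plain dictionary of global attributes instead of opening a NetCDF file, and the attribute name GASSP_Version is purely illustrative rather than taken from the CIS source.

```python
def gassp_like_type_error(global_attributes):
    """Return a list of error strings, or None if the file looks correct.

    `global_attributes` is a dict standing in for the attributes a real
    product would read after opening the file.
    """
    errors = []
    # Illustrative attribute name; the real product may check something else.
    if "GASSP_Version" not in global_attributes:
        errors.append("GASSP version attribute not found in file")
    # An empty list is converted to None to signal "no errors".
    return errors or None
```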
-
get_variable_names
(filenames, data_type=None)¶ Get a list of available variable names from the filenames list passed in. This general implementation can be overridden in specific products to include/exclude variables which may or may not be relevant. The data_type parameter can be used to specify extra information.
Parameters: - filenames (list) – List of string filenames of files to be read from
- data_type (str) – ‘SD’ or ‘VD’ to specify only return SD or VD variables from HDF files. This may take on other values in specific product implementations.
Returns: A set of variable names as strings
Return type: set of str
-
priority
= 10¶
-
valid_dimensions
= None¶
-
-
exception
cis.data_io.products.AProduct.
ProductPluginException
(message, original_exception)¶ Bases:
exceptions.Exception
Represents an error which has occurred inside of a Product plugin
-
original_exception
= None¶
-
-
cis.data_io.products.AProduct.
get_coordinates
(filenames, product=None)¶ Top level routine for calling the correct product’s
create_coords()
routine.
Parameters: - filenames (list) – A list of filenames to read data from
- product (str) – The product to read data with - this should be a string which matches the name of one of the subclasses of AProduct
Returns: A
CoordList
object
-
cis.data_io.products.AProduct.
get_data
(filenames, variable, product=None)¶ Top level routine for calling the correct product’s
create_data_object()
routine.
Parameters: - filenames (list) – A list of filenames to read data from
- variable (str) – The variable to create the
CommonData
object from
- product (str) – The product to read data with - this should be a string which matches the name of one of the subclasses of
AProduct
. If none is supplied it is guessed from the filename signature.
Returns: A
CommonData
variable
-
cis.data_io.products.AProduct.
get_file_format
(filenames, product=None)¶ Returns the file’s format, or throws FileFormatError if there is an error in the format
Parameters: - filenames (list) – the filenames to read
- product (str) – the product to use; if not specified, one is searched for
Returns: File format
Raises ClassNotFoundError: if there is no reader for this class
-
cis.data_io.products.AProduct.
get_product_full_name
(filenames, product=None)¶ Get the full name of the product which would read this file
Parameters: - filenames (list) – list of filenames to read
- product (str) – specified product to use
-
cis.data_io.products.AProduct.
get_variables
(filenames, product=None, data_type=None)¶ Top level routine for calling the correct product’s
get_variable_names()
routine.
Parameters: - filenames (list) – A list of filenames to read the variables from
- product (str) – The product to read data with - this should be a string which matches the name of one of the subclasses of AProduct
Returns: A set of variable names as strings
Data modules¶
Module for the UngriddedData class
-
class
cis.data_io.ungridded_data.
LazyData
(data, metadata, data_retrieval_callback=None)¶ Bases:
object
Wrapper (adaptor) class for the different types of possible ungridded data.
-
add_attributes
(attributes)¶ Add a variable attribute to this data
Parameters: attributes – Dictionary of attribute names (keys) and values. Returns:
-
add_history
(new_history)¶ Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.
Parameters: new_history – history string
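The append-or-create behaviour with a timestamp prefix can be sketched as follows. The timestamp format, the newline separator and the function name append_history are assumptions made for illustration; the actual format written by CIS may differ.

```python
from datetime import datetime, timezone


def append_history(existing_history, new_history, now=None):
    """Return the history string with a timestamped entry added."""
    now = now or datetime.now(timezone.utc)
    # Assumed entry layout: "<UTC timestamp>: <history text>".
    entry = "{}: {}".format(now.strftime("%Y-%m-%dT%H:%M:%SZ"), new_history)
    if not existing_history:
        # No history yet, so the new entry becomes the whole attribute.
        return entry
    return existing_history + "\n" + entry
```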
-
copy_metadata_from
(other_data)¶ Method to copy the metadata from one UngriddedData/Cube object to another
-
data
¶ This is a getter for the data property. It caches the raw data if it has not already been read. Throws a MemoryError when reading for the first time if the data is too large.
-
data_flattened
¶ Returns a 1D flattened view (or copy, if necessary) of the data.
-
long_name
¶
-
name
()¶ This routine returns the first name property which is not empty out of: _name, standard_name and long_name. If they are all empty it returns an empty string.
Returns: The name of the data object as a string
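The fallback order can be expressed in a few lines. This is a standalone sketch with a hypothetical function name, taking the three name properties as plain strings.

```python
def first_non_empty_name(var_name="", standard_name="", long_name=""):
    """Return the first non-empty name, or an empty string if none is set."""
    for candidate in (var_name, standard_name, long_name):
        if candidate:
            return candidate
    return ""
```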
-
remove_attribute
(key)¶ Remove a variable attribute from this data
Parameters: key – Attribute key to remove Returns:
-
save_data
(output_file)¶
-
shape
¶
-
standard_name
¶
-
units
¶
-
update_range
(range=None)¶
-
update_shape
(shape=None)¶
-
var_name
¶
-
-
class
cis.data_io.ungridded_data.
Metadata
(name='', standard_name='', long_name='', shape='', units='', range='', factor='', offset='', missing_value='', calendar='', history='', misc=None)¶ Bases:
object
-
alter_standard_name
(new_standard_name)¶ Alter the standard name and log an info line to say this is happening if the standard name is not empty. Also changes internal name for metadata or the same.
Parameters: new_standard_name –
-
classmethod
from_CubeMetadata
(cube_meta)¶
-
static
guess_standard_name
(name)¶
-
summary
(offset=5)¶ Creates a unicode summary of the metadata object
Parameters: offset – The left hand padding to apply to the text Returns: The summary
-
-
class
cis.data_io.ungridded_data.
UngriddedCoordinates
(coords)¶ Bases:
cis.data_io.common_data.CommonData
Wrapper (adaptor) class for the different types of possible ungridded data.
-
alias
¶ Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).
Returns: The alias Return type: str
-
as_data_frame
(copy=True)¶ Convert an UngriddedCoordinates object to a Pandas DataFrame.
Parameters: copy – Create a copy of the data for the new DataFrame? Default is True. Returns: A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.
-
coord
(name_or_coord=None, standard_name=None, long_name=None, attributes=None, axis=None)¶ Raise: CoordinateNotFoundError Returns: A single coord given the same arguments as coords()
.
-
coords
(name_or_coord=None, standard_name=None, long_name=None, attributes=None, axis=None, dim_coords=True)¶ Returns: A list of coordinates in this UngriddedData object fitting the given criteria
-
filenames
= []¶
-
get_all_points
()¶ Returns a HyperPointView of the points.
Returns: HyperPointView of all the data points
-
get_coordinates_points
()¶
-
get_non_masked_points
()¶ Returns a HyperPointView for which the default iterator omits masked points.
Returns: HyperPointView of the data points
-
history
¶
-
hyper_point
(index)¶ Parameters: index – The index in the array to find the point for Returns: A hyperpoint representing the data at that point
-
is_gridded
¶ Returns value indicating whether the data/coordinates are gridded.
-
lat
¶
-
lon
¶
-
time
¶
-
var_name
¶ Return the variable name associated with this data object
Returns: The variable name
-
x
¶
-
y
¶
-
-
class
cis.data_io.ungridded_data.
UngriddedData
(data, metadata, coords, data_retrieval_callback=None)¶ Bases:
cis.data_io.ungridded_data.LazyData
,cis.data_io.common_data.CommonData
Wrapper (adaptor) class for the different types of possible ungridded data.
-
add_attributes
(attributes)¶ Add a variable attribute to this data
Parameters: attributes – Dictionary of attribute names (keys) and values. Returns:
-
add_history
(new_history)¶ Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.
Parameters: new_history – history string
-
alias
¶ Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).
Returns: The alias Return type: str
-
as_data_frame
(copy=True)¶ Convert an UngriddedData object to a Pandas DataFrame.
Parameters: copy – Create a copy of the data for the new DataFrame? Default is True. Returns: A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.
-
coord
(name_or_coord=None, standard_name=None, long_name=None, attributes=None, axis=None)¶ Raise: CoordinateNotFoundError Returns: A single coord given the same arguments as coords()
.
-
coords
(name_or_coord=None, standard_name=None, long_name=None, attributes=None, axis=None, dim_coords=True)¶ Returns: A list of coordinates in this UngriddedData object fitting the given criteria
-
coords_flattened
¶
-
copy
()¶ Create a copy of this UngriddedData object with new data and coordinates so that they can be modified without held references being affected. Will call any lazy loading methods in the data and coordinates
Returns: Copied UngriddedData object
-
copy_metadata_from
(other_data)¶ Method to copy the metadata from one UngriddedData/Cube object to another
-
data
¶ This is a getter for the data property. It caches the raw data if it has not already been read. Throws a MemoryError when reading for the first time if the data is too large.
-
data_flattened
¶ Returns a 1D flattened view (or copy, if necessary) of the data.
-
filenames
= []¶
-
find_standard_coords
()¶ Constructs a list of the standard coordinates. The standard coordinates are latitude, longitude, altitude, air_pressure and time; they occur in the return list in this order.
Returns: list of coordinates or None if coordinate not present
-
classmethod
from_points_array
(hyperpoints)¶ Constructor for building an UngriddedData object from a list of hyper points
Parameters: hyperpoints – list of HyperPoints
-
get_all_points
()¶ Returns a HyperPointView of the points.
Returns: HyperPointView of all the data points
-
get_coordinates_points
()¶ Returns a HyperPointView of the coordinates of points.
Returns: HyperPointView of the coordinates of points
-
get_non_masked_points
()¶ Returns a HyperPointView for which the default iterator omits masked points.
Returns: HyperPointView of the data points
-
history
¶
-
hyper_point
(index)¶ Parameters: index – The index in the array to find the point for Returns: A hyperpoint representing the data at that point
-
is_gridded
¶ Returns value indicating whether the data/coordinates are gridded.
-
lat
¶
-
lon
¶
-
long_name
¶
-
make_new_with_same_coordinates
(data=None, var_name=None, standard_name=None, long_name=None, history=None, units=None, flatten=False)¶ Create a new, empty UngriddedData object with the same coordinates as this one.
Parameters: - data – Data to use (if None then defaults to all zeros)
- var_name – Variable name
- standard_name – Variable CF standard name
- long_name – Variable long name
- history – Data history string
- units – Variable units
- flatten – Whether to flatten the data and coordinates (for ungridded data only)
Returns: UngriddedData instance
-
name
()¶ This routine returns the first name property which is not empty out of: _name, standard_name and long_name. If they are all empty it returns an empty string.
Returns: The name of the data object as a string
-
remove_attribute
(key)¶ Remove a variable attribute from this data
Parameters: key – Attribute key to remove Returns:
-
save_data
(output_file)¶
-
shape
¶
-
standard_name
¶
-
summary
()¶ Unicode summary of the UngriddedData with metadata of itself and its coordinates
-
time
¶
-
units
¶
-
update_range
(range=None)¶
-
update_shape
(shape=None)¶
-
var_name
¶
-
x
¶
-
y
¶
-
-
class
cis.data_io.ungridded_data.
UngriddedDataList
(iterable=())¶ Bases:
cis.data_io.common_data.CommonDataList
Class which represents multiple UngriddedData objects (e.g. from reading multiple variables)
-
add_history
(new_history)¶ Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.
Parameters: new_history – history string
-
append
(p_object)¶
-
append_or_extend
(item_to_add)¶ Append or extend an item to an existing list, depending on whether the item to add is itself a list or not.
Parameters: item_to_add – Item to add (may be list or not).
-
as_data_frame
(copy=True)¶ Convert an UngriddedDataList object to a Pandas DataFrame. Note that UngriddedDataList objects are expected to share coordinates, so only the coordinates from the first object in the list are used.
Parameters: copy – Create a copy of the data for the new DataFrame? Default is True. Returns: A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata. Note
This function will copy your data by default. If you have a large array that cannot be copied, make sure it is not masked and use copy=False.
-
coord
(*args, **kwargs)¶ Call
UngriddedData.coord(*args, **kwargs)
for the first item of data (assumes all data in the list have the same coordinates).
Parameters: - args –
- kwargs –
Returns:
-
coords
(*args, **kwargs)¶ Returns all coordinates used in all the data objects.
Returns: A list of coordinates in this data list object fitting the given criteria
-
copy
()¶ Create a copy of this UngriddedDataList with new data and coordinates so that they can be modified without held references being affected. Will call any lazy loading methods in the data and coordinates
Returns: Copied UngriddedDataList object
-
count
(value) → integer -- return number of occurrences of value¶
-
extend
(iterable)¶
-
filenames
¶ Get the filenames in this data list
-
get_non_masked_points
()¶ Returns a list containing, for each item in this UngriddedDataList, a HyperPointView for which the default iterator omits masked points.
Returns: List of HyperPointViews of the data points
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
insert
()¶ L.insert(index, object) – insert object before index
-
is_gridded
¶ Returns value indicating whether the data/coordinates are gridded.
-
pop
([index]) → item -- remove and return item at index (default last).¶ Raises IndexError if list is empty or index is out of range.
-
remove
()¶ L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.
-
reverse
()¶ L.reverse() – reverse IN PLACE
-
save_data
(output_file)¶ Save the UngriddedDataList to a file
Parameters: output_file – output filename Returns:
-
set_longitude_range
(range_start)¶ Rotates the longitude coordinate array and changes its values by 360 as necessary to force the values to be within a 360 range starting at the specified value.
Parameters: range_start – starting value of required longitude range
-
sort
()¶ L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1
-
var_name
¶ Get the variable names in this list
-
-
class
cis.data_io.Coord.
Coord
(data, metadata, axis='')¶ Bases:
cis.data_io.ungridded_data.LazyData
-
add_attributes
(attributes)¶ Add a variable attribute to this data
Parameters: attributes – Dictionary of attribute names (keys) and values. Returns:
-
add_history
(new_history)¶ Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.
Parameters: new_history – history string
-
convert_TAI_time_to_std_time
(ref)¶
-
convert_datetime_to_standard_time
()¶
-
convert_julian_to_std_time
()¶
-
convert_to_std_time
(time_stamp_info=None)¶ Convert this coordinate to standard time. It will use either: the units of the coordinate if it is in the standard ‘x since y’ format; or the first word of the units, combined with the time stamp (if the timestamp is not given an error is thrown).
Parameters: time_stamp_info – the time stamp info from the file, None if it does not exist
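The standard ‘x since y’ units format mentioned above can be split into its two parts as in the sketch below. It only parses the units string and does not perform the conversion itself; the function name parse_since_units and the two accepted reference date layouts are assumptions for illustration.

```python
from datetime import datetime

# Assumed layouts for the reference time that follows "since".
_REFERENCE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_since_units(units):
    """Split 'x since y' units into (x, reference datetime), or return None."""
    unit, separator, reference = units.partition(" since ")
    if not separator:
        # Not in the 'x since y' form, so a separate time stamp would be needed.
        return None
    for fmt in _REFERENCE_FORMATS:
        try:
            return unit.strip(), datetime.strptime(reference.strip(), fmt)
        except ValueError:
            continue
    return None
```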
-
copy
()¶ Create a copy of this Coord object with new data so that it can be modified without held references being affected. This will call any lazy loading methods in the coordinate data
Returns: Copied Coord
-
copy_metadata_from
(other_data)¶ Method to copy the metadata from one UngriddedData/Cube object to another
-
data
¶ This is a getter for the data property. It caches the raw data if it has not already been read. Throws a MemoryError when reading for the first time if the data is too large.
-
data_flattened
¶ Returns a 1D flattened view (or copy, if necessary) of the data.
-
classmethod
from_many_coordinates
(coords)¶ Create a single coordinate object from the concatenation of all of the coordinate objects in the input list, updating the shape as appropriate
Parameters: coords – A list of coordinate objects to be combined Returns: A single Coord
object
-
long_name
¶
-
name
()¶ This routine returns the first name property which is not empty out of: _name, standard_name and long_name. If they are all empty it returns an empty string.
Returns: The name of the data object as a string
-
points
¶ Alias for
self.data()
, to match the iris.coords.Coord.points()
interface. Returns: Coordinate data values
-
remove_attribute
(key)¶ Remove a variable attribute from this data
Parameters: key – Attribute key to remove Returns:
-
save_data
(output_file)¶
-
set_longitude_range
(range_start)¶ Confine the coordinate longitude range to 360 degrees from the
range_start
value.
Parameters: range_start (float) – Start of the longitude range
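Confining longitudes to a 360 degree window amounts to modular arithmetic on each value. The sketch below shows that arithmetic on a plain list; it is not the CIS implementation, the function name wrap_longitudes is hypothetical, and it leaves the order of the values untouched.

```python
def wrap_longitudes(longitudes, range_start):
    """Map each longitude into the interval [range_start, range_start + 360)."""
    # Shift so the window starts at zero, wrap with modulo, then shift back.
    return [(lon - range_start) % 360 + range_start for lon in longitudes]
```

For example, a range_start of 0 maps -170 to 190, while a range_start of -180 maps 190 to -170.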
-
shape
¶
-
standard_name
¶
-
units
¶
-
update_range
(range=None)¶
-
update_shape
(shape=None)¶
-
var_name
¶
-
-
class
cis.data_io.Coord.
CoordList
(*args)¶ Bases:
list
All the functionality of a standard
list
with added Coord
context.
-
append
(other)¶ Safely add a new coordinate object to the list; this checks for a unique axis and standard_name.
Parameters: other (Coord) – Other coord to add
Raises DuplicateCoordinateError: If the coordinate is not unique in the list
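A uniqueness check of this kind could be sketched as below. The Coord tuple and the exception are simplified stand-ins, and treating a coordinate as a duplicate only when both its axis and its standard_name match an existing entry is one interpretation of the description above, not a statement of what CIS does.

```python
from collections import namedtuple

# Minimal stand-in for a coordinate; the real Coord class also carries data.
Coord = namedtuple("Coord", ["standard_name", "axis"])


class DuplicateCoordinateError(Exception):
    """Stand-in for the CIS exception of the same name."""


class UniqueCoordList(list):
    """A list that refuses coordinates it already holds."""

    def append(self, other):
        for existing in self:
            # Assumed rule: both axis and standard_name must match to clash.
            if (existing.axis == other.axis
                    and existing.standard_name == other.standard_name):
                raise DuplicateCoordinateError(
                    "Coordinate {!r} is already in the list".format(other))
        super().append(other)
```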
-
copy
()¶ Create a copy of this CoordList object with new data so that it can be modified without held references being affected. This will call any lazy loading methods in the coordinate data
Returns: Copied CoordList
-
count
(value) → integer -- return number of occurrences of value¶
-
extend
()¶ L.extend(iterable) – extend list by appending elements from the iterable
-
find_standard_coords
()¶ Constructs a list of the standard coordinates. The standard coordinates are latitude, longitude, altitude, air_pressure and time; they occur in the return list in this order.
Returns: list
of coordinates or None if coordinate not present
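The fixed ordering with None placeholders can be shown with a dictionary lookup. This sketch assumes the coordinates are available as a mapping keyed by standard name, which is a simplification of the real CoordList.

```python
# Order taken from the description above.
STANDARD_NAMES = ("latitude", "longitude", "altitude", "air_pressure", "time")


def find_standard_coords(coords_by_standard_name):
    """Return the standard coordinates in fixed order, None where absent."""
    return [coords_by_standard_name.get(name) for name in STANDARD_NAMES]
```

Callers can then unpack the result positionally and test each entry against None.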
-
get_coord
(name_or_coord=None, standard_name=None, long_name=None, attributes=None, axis=None)¶ Return a single coord fitting the given criteria. This is deliberately very similar to
Cube.coord()
method to maintain a similar interface and because the functionality is similar. There is no distinction between dimension coordinates and auxiliary coordinates here though.
Parameters: - name_or_coord – This should be either: the standard name, long name or default name of the desired coordinate; or a
Coord
instance whose metadata should be used for the search criteria (note that currently only the standard name is compared). If None, does not check for name. Also see
Cube.name
.
- standard_name (string or None) – The CF standard name of the desired coordinate. If None, does not check for standard name.
- long_name (string or None) – An unconstrained description of the coordinate. If None, does not check for long_name.
- attributes (dict or None) – A dictionary of attributes desired on the coordinates. If None, does not check for attributes
- axis (string or None) – The desired coordinate axis, see
iris.util.guess_coord_axis()
. If None, does not check for axis. Accepts the values ‘X’, ‘Y’, ‘Z’ and ‘T’ (case-insensitive).
Raises CoordinateNotFoundError: If the arguments given do not result in precisely 1 coordinate being matched.
Returns: A single
Coord
.
-
get_coordinates_points
()¶
-
get_coords
(name_or_coord=None, standard_name=None, long_name=None, attributes=None, axis=None)¶ Return a list of coordinates in this
CoordList
fitting the given criteria. This is deliberately very similar to
Cube.coords()
to maintain a similar interface and because the functionality is similar. There is no distinction between dimension coordinates and auxiliary coordinates here though.
Parameters: - name_or_coord – This should be either: the standard name, long name or default name of the desired coordinate; or a
Coord
instance whose metadata should be used for the search criteria (note that currently only the standard name is compared). If None, does not check for name. Also see
Cube.name
.
- standard_name (string or None) – The CF standard name of the desired coordinate. If None, does not check for standard name.
- long_name (string or None) – An unconstrained description of the coordinate. If None, does not check for long_name.
- attributes (dict or None) – A dictionary of attributes desired on the coordinates. If None, does not check for attributes
- axis (string or None) – The desired coordinate axis, see
iris.util.guess_coord_axis()
. If None, does not check for axis. Accepts the values ‘X’, ‘Y’, ‘Z’ and ‘T’ (case-insensitive).
Returns: A
CoordList
of coordinates fitting the given criteria
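The relationship between the two lookups (one returns every match, the other insists on exactly one) can be sketched with simplified stand-ins. Only the standard_name and axis criteria are modelled here; the Coord tuple, the exception class and the free functions are illustrative and are not the CoordList methods themselves.

```python
from collections import namedtuple

# Minimal stand-in for a coordinate; the real Coord class also carries data.
Coord = namedtuple("Coord", ["standard_name", "axis"])


class CoordinateNotFoundError(Exception):
    """Stand-in for the CIS exception of the same name."""


def get_coords(coords, standard_name=None, axis=None):
    """Return every coordinate matching all criteria that are not None."""
    matches = []
    for coord in coords:
        if standard_name is not None and coord.standard_name != standard_name:
            continue
        # Axis comparison is case-insensitive ('x' matches 'X').
        if axis is not None and coord.axis.upper() != axis.upper():
            continue
        matches.append(coord)
    return matches


def get_coord(coords, standard_name=None, axis=None):
    """Return the single matching coordinate, or raise if not exactly one."""
    matches = get_coords(coords, standard_name=standard_name, axis=axis)
    if len(matches) != 1:
        raise CoordinateNotFoundError(
            "Expected exactly 1 coordinate, found {}".format(len(matches)))
    return matches[0]
```

Passing no criteria matches everything, so the single-coordinate lookup raises unless the list holds exactly one coordinate.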
-
get_standard_coords
(data_len)¶ Constructs a list of the standard coordinate values. The standard coordinates are latitude, longitude, altitude, time and air_pressure; they occur in the return list in this order. If a standard coordinate has not been found its values are returned as a list of length
data_len
.
Parameters: data_len (int) – Expected length of coordinate data
Returns: list
of indexed sequences of coordinate values
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
insert
()¶ L.insert(index, object) – insert object before index
-
pop
([index]) → item -- remove and return item at index (default last).¶ Raises IndexError if list is empty or index is out of range.
-
remove
()¶ L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.
-
reverse
()¶ L.reverse() – reverse IN PLACE
-
sort
()¶ L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1
-
-
cis.data_io.gridded_data.
load_cube
(*args, **kwargs)¶
-
cis.data_io.gridded_data.
make_from_cube
(cube)¶
-
class
cis.data_io.common_data.
CommonData
¶ Bases:
object
Interface of common methods implemented for gridded and ungridded data.
-
alias
¶ Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).
Returns: The alias Return type: str
-
as_data_frame
(copy)¶ Convert a CommonData object to a Pandas DataFrame.
Parameters: copy – Create a copy of the data for the new DataFrame? Default is True. Returns: A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.
-
filenames
= []¶
-
get_all_points
()¶ Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.
Returns: list-like object of data points
-
get_coordinates_points
()¶ Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.
Returns: list-like object of data points
-
get_non_masked_points
()¶ Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.
Returns: list-like object of data points
-
history
¶ Return the associated history of the object
Returns: The history Return type: str
-
is_gridded
()¶ Returns value indicating whether the data/coordinates are gridded.
-
var_name
¶ Return the variable name associated with this data object
Returns: The variable name
-
-
class
cis.data_io.common_data.
CommonDataList
(iterable=())¶ Bases:
list
Interface for common list methods implemented for both gridded and ungridded data
-
add_history
(new_history)¶ Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.
Parameters: new_history – history string
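As a rough illustration of the timestamp prefixing described for add_history, the sketch below uses a plain dict in place of the real metadata object. The function name, the timestamp format and the newline separator are assumptions made for the example, not details taken from CIS.

```python
from datetime import datetime


def add_history_sketch(metadata, new_history, now=None):
    """Append a timestamped entry to metadata['history'], creating it if absent."""
    # 'metadata' is a plain dict standing in for the real metadata object.
    # The timestamp format and the newline separator are assumptions.
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    entry = "{} - {}".format(timestamp, new_history)
    existing = metadata.get("history")
    metadata["history"] = entry if not existing else existing + "\n" + entry
    return metadata


meta = {}
add_history_sketch(meta, "Subset to latitude 0-60", now=datetime(2016, 1, 1, 12, 0, 0))
add_history_sketch(meta, "Aggregated monthly", now=datetime(2016, 1, 2, 9, 30, 0))
print(meta["history"])
```

The first call creates the attribute and the second appends to it, which mirrors the "appends to, or creates" wording above.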
-
append
(p_object)¶
-
append_or_extend
(item_to_add)¶ Append or extend an item to an existing list, depending on whether the item to add is itself a list or not.
Parameters: item_to_add – Item to add (may be list or not).
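The append-or-extend behaviour can be pictured with a small stand-alone function operating on an ordinary list. This is only a sketch of the idea; the function name and the use of isinstance are assumptions, and the real method lives on the data list itself.

```python
def append_or_extend_sketch(target, item_to_add):
    """Extend target if item_to_add is a list, otherwise append it."""
    if isinstance(item_to_add, list):
        # A list contributes each of its elements.
        target.extend(item_to_add)
    else:
        # Anything else is added as a single element.
        target.append(item_to_add)
    return target


items = ["a"]
append_or_extend_sketch(items, "b")
append_or_extend_sketch(items, ["c", "d"])
print(items)
```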
-
coords
(*args, **kwargs)¶ Returns all coordinates used in all the data objects.
Returns: A list of coordinates in this data list object fitting the given criteria
-
extend
(iterable)¶
-
filenames
¶ Get the filenames in this data list
-
is_gridded
¶ Returns value indicating whether the data/coordinates are gridded.
-
set_longitude_range
(range_start)¶ Rotates the longitude coordinate array and changes its values by 360 as necessary to force the values to be within a 360 range starting at the specified value.
Parameters: range_start – starting value of required longitude range
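The value adjustment can be sketched with modular arithmetic on a plain list of longitudes. This shows only the shifting of values by multiples of 360 into the half-open range [range_start, range_start + 360); it does not reorder ("rotate") the array, and the function name is hypothetical.

```python
def wrap_longitudes(longitudes, range_start):
    """Shift each longitude by multiples of 360 into [range_start, range_start + 360)."""
    return [((lon - range_start) % 360) + range_start for lon in longitudes]


lons = [-170.0, 0.0, 190.0, 359.0]
print(wrap_longitudes(lons, range_start=-180))  # values forced into [-180, 180)
print(wrap_longitudes(lons, range_start=0))     # values forced into [0, 360)
```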
-
var_name
¶ Get the variable names in this list
-
Low-level IO modules¶
Module containing NetCDF file reading functions
-
cis.data_io.netcdf.
find_missing_value
(var)¶ Get the missing / fill value of the variable
Parameters: var – NetCDF Variable instance Returns: missing / fill value
-
cis.data_io.netcdf.
get_data
(var)¶ Reads raw data from a NetCDF.Variable instance.
Parameters: var – The specific Variable instance to read Returns: A numpy maskedarray. Missing values are False in the mask.
-
cis.data_io.netcdf.
get_metadata
(var)¶ Retrieves all metadata
Parameters: var – the Variable to read metadata from Returns: A metadata object
-
cis.data_io.netcdf.
get_netcdf_file_attributes
(filename)¶ Get all the global attributes from a NetCDF file
Parameters: filename – The filename of the file to get the variables from Returns: a dictionary of attributes and their values
-
cis.data_io.netcdf.
get_netcdf_file_variables
(filename, exclude_coords=False)¶ Get all the variables contained in a NetCDF file. Variables in NetCDF4 Hierarchical groups are returned with their fully qualified variable name in the form
<group1>.<group2....>.<variable_name>
, e.g. AVHRR.Ch4CentralWavenumber.
Parameters: - filename – The filename of the file to get the variables from
- exclude_coords – Exclude coordinate variables if True
Returns: An OrderedDict containing {variable_name: NetCDF Variable instance}
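To illustrate the fully qualified naming scheme, the sketch below flattens a nested dictionary that stands in for a NetCDF4 group hierarchy. The nested-dict representation, the function name and the placeholder variable values are assumptions made for the example; the real function works on NetCDF files.

```python
from collections import OrderedDict


def flatten_groups(group, prefix=""):
    """Return an OrderedDict mapping fully qualified names to variables."""
    # 'group' is a nested dict standing in for a NetCDF4 group hierarchy:
    # dict values are treated as sub-groups, anything else as a variable.
    flat = OrderedDict()
    for name, value in group.items():
        qualified = prefix + name
        if isinstance(value, dict):
            flat.update(flatten_groups(value, qualified + "."))
        else:
            flat[qualified] = value
    return flat


file_contents = OrderedDict([
    ("time", "time-variable"),
    ("AVHRR", OrderedDict([("Ch4CentralWavenumber", "wavenumber-variable")])),
])
names = list(flatten_groups(file_contents))
print(names)
```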
-
cis.data_io.netcdf.
read
(filename, usr_variables)¶ Reads a Variable from a NetCDF file
Parameters: - filename – The name (with path) of the NetCDF file to read.
- usr_variables – A variable (dataset) name to read from the files. The name must appear exactly as in the NetCDF file. Variable names may be fully qualified NetCDF4 Hierarchical group variables in the form <group1>.<group2....>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber.
Returns: A Variable instance constructed from the input file
-
cis.data_io.netcdf.
read_many_files
(filenames, usr_variables, dim=None)¶ Reads a single Variable from many NetCDF files. This method uses the netCDF4 MFDataset class and so is NOT suitable for NetCDF4 datasets (only ‘CLASSIC’ netcdf).
Parameters: - filenames – A list of NetCDF filenames to read, or a string with wildcards.
- usr_variables – A list of variable (dataset) names to read from the files. The names must appear exactly as in the NetCDF file.
- dim – The name of the dimension on which to aggregate the data. None is the default which tries to aggregate over the unlimited dimension
Returns: A list of variable instances constructed from all of the input files
-
cis.data_io.netcdf.
read_many_files_individually
(filenames, usr_variables)¶ Read multiple Variables from many NetCDF files manually - i.e. not with MFDataset as this doesn’t always work, in particular for NetCDF4 files.
Parameters: - filenames – A list of NetCDF filenames to read, or a string with wildcards.
- usr_variables – A list of variable (dataset) names to read from the files. The names must appear exactly as in the NetCDF file. Variable names may be fully qualified NetCDF4 Hierarchical group variables in the form <group1>.<group2....>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber.
Returns: A dictionary of lists of variable instances constructed from all of the input files with the fully qualified variable name as the key
-
cis.data_io.netcdf.
remove_variables_with_non_spatiotemporal_dimensions
(variables, spatiotemporal_var_names)¶ Remove from a list of netCDF variables any which have dimensionality which is not in an approved list of valid spatial or temporal dimensions (e.g. sensor number, pseudo dimensions). CIS currently does not support variables with this dimensionality and will fail if they are used.
Parameters: - variables – Dictionary of netCDF variable names : Variable objects. Variable names may be fully qualified NetCDF4 Hierarchical group variables in the form <group1>.<group2....>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber.
- spatiotemporal_var_names – List of valid spatiotemporal dimensions.
Returns: None
Module for writing data to NetCDF files
-
cis.data_io.write_netcdf.
add_data_to_file
(data_object, filename)¶ Parameters: - data_object –
- filename –
Returns:
-
cis.data_io.write_netcdf.
write
(data_object, filename)¶ Parameters: - data_object –
- filename –
Returns:
-
cis.data_io.write_netcdf.
write_coordinate_list
(coord_list, filename)¶ Writes coordinates to a netCDF file.
Parameters: - coord_list – list of Coord objects
- filename – file to which to write
-
cis.data_io.write_netcdf.
write_coordinates
(coords, filename)¶ Writes coordinates to a netCDF file.
Parameters: - coords – UngriddedData or UngriddedCoordinates object for which the coordinates are to be written
- filename – file to which to write
-
cis.data_io.hdf.
get_hdf4_file_metadata
(filename)¶ This returns a dictionary of file attributes, which often contains metadata information about the whole file. The value of each attribute can simply be a big string which will often need to be parsed manually thereafter.
Parameters: filename – Name of the HDF4 file to read the attributes from Returns: dictionary of string attributes
-
cis.data_io.hdf.
get_hdf4_file_variables
(filename, data_type=None)¶ Get all variables from a file containing ungridded data. Concatenates the variables from both VD and SD data.
Parameters: - filename – The filename of the file to get the variables from
- data_type – String representing the HDF data type, i.e. ‘VD’ or ‘SD’. if None, both are computed.
-
cis.data_io.hdf.
read
(filenames, variables)¶
-
cis.data_io.hdf.
read_data
(data_dict, data_type, missing_values=None)¶
-
cis.data_io.hdf.
read_metadata
(data_dict, data_type)¶
Module containing hdf file utility functions for the SD object
-
class
cis.data_io.hdf_sd.
HDF_SDS
(filename, variable)¶ Bases:
object
This class is used in place of the pyhdf.SD.SDS class to allow the file contents to be loaded at a later time rather than in this module read method (so that we can close the SD instances and free up file handles)
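The deferred-loading pattern described here (remember the filename and variable, and open the file only for the duration of each call) can be sketched without pyhdf. The class below is a hypothetical stand-in that reads a small text file of name=value lines; it is meant to show the open-read-close lifecycle only, not how HDF data is actually read.

```python
import os
import tempfile


class LazyFileVariable(object):
    """Remember where the data lives; open the file only while reading."""

    def __init__(self, filename, variable):
        # No file handle is created here, only the location is stored.
        self.filename = filename
        self.variable = variable

    def get(self):
        # The handle exists only inside this 'with' block and is closed on exit.
        with open(self.filename) as handle:
            for line in handle:
                name, _, value = line.strip().partition("=")
                if name == self.variable:
                    return value
        raise KeyError(self.variable)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("aod=0.25\nangstrom=1.3\n")

lazy = LazyFileVariable(tmp.name, "aod")  # nothing is opened yet
value = lazy.get()                        # file opened and closed in this call
os.remove(tmp.name)
print(value)
```

Because no handle outlives a call, many such objects can be created without exhausting the available file handles.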
-
attributes
()¶ Call pyhdf.SD.SDS.attributes(), opening and closing the file
-
get
()¶ Call pyhdf.SD.SDS.get(), opening and closing the file
-
info
()¶ Call pyhdf.SD.SDS.info(), opening and closing the file
-
-
cis.data_io.hdf_sd.
get_calipso_data
(sds)¶ Reads raw data from an SD instance. Automatically applies the scaling factors and offsets to the data arrays found in Calipso data.
Parameters: sds – The specific sds instance to read Returns: A numpy array containing the raw data with missing data replaced by NaN.
-
cis.data_io.hdf_sd.
get_data
(sds, missing_values=None)¶ Reads raw data from an SD instance. Automatically applies the scaling factors and offsets to the data arrays often found in NASA HDF-EOS data (e.g. MODIS)
Parameters: sds – The specific sds instance to read Returns: A numpy array containing the raw data with missing data replaced by NaN.
-
cis.data_io.hdf_sd.
get_hdf_SD_file_variables
(filename)¶ Get all the variables from an HDF SD file
Parameters: filename (str) – The filename of the file to get the variables from Returns: An OrderedDict containing the variables from the file
-
cis.data_io.hdf_sd.
get_metadata
(sds)¶
-
cis.data_io.hdf_sd.
read
(filename, variables=None, datadict=None)¶ Reads SD from a HDF4 file into a dictionary.
Parameters: - filename (str) – The name (with path) of the HDF file to read.
- variables (iterable) – A sequence of variable (dataset) names to read from the file (default None, causing all variables to be read). The names must appear exactly as in the HDF file.
- datadict (dict) – Optional dictionary to add data to, otherwise a new, empty dictionary is created
Returns: A dictionary containing data for requested variables. Missing data is replaced by NaN.
Module containing hdf file utility functions for the VD object
-
class
cis.data_io.hdf_vd.
VDS
¶ Bases:
cis.data_io.hdf_vd.VDS
-
cis.data_io.hdf_vd.
get_data
(vds, first_record=False, missing_values=None)¶ Actually read the data from the VDS handle. We shouldn’t need to check for HDF being installed here because the VDS object which is being passed to us can only have come from pyhdf.
Parameters: - vds –
- first_record –
- missing_values –
Returns:
-
cis.data_io.hdf_vd.
get_hdf_VD_file_variables
(filename)¶ Get all the variables from an HDF VD file
Parameters: filename – The filename of the file to get the variables from Returns: An OrderedDict containing the variables from the file
-
cis.data_io.hdf_vd.
get_metadata
(vds)¶
-
cis.data_io.hdf_vd.
read
(filename, variables=None, datadict=None)¶ Given a filename and a list of variable names, return a dictionary of VD data handles
Parameters: - filename – full path to a single HDF4 file
- variables – A list of variables to read, if no variables are given, no variables are read
- datadict – A dictionary of variable name, data handle pairs to be appended to
Returns: An updated datadict with any new variables appended.
-
cis.data_io.aeronet.
get_aeronet_file_variables
(filename)¶ Return a list of valid Aeronet file variables with invalid characters removed. We need to remove invalid characters primarily for writing back out to CF-compliant NetCDF.
Parameters: filename – Full path to the file to read Returns: A list of Aeronet variable names in the order they appear in the file
-
cis.data_io.aeronet.
get_file_metadata
(filename, variable='', shape=None)¶
-
cis.data_io.aeronet.
load_aeronet
(fname, variables=None)¶ Loads an Aeronet level 2.0 CSV file.
Originally from http://code.google.com/p/metamet/ License: GNU GPL v3
Parameters: - fname – data file name
- variables – A list of variables to return
Returns: A dictionary of variables names and numpy arrays containing the data for that variable
-
cis.data_io.aeronet.
load_multiple_aeronet
(fnames, variables=None)¶
Data reading and writing modules¶
-
class
cis.data_io.data_reader.
DataReader
(get_data_func=<function get_data>, get_coords_func=<function get_coordinates>, get_variables_func=<function get_variables>)¶ Bases:
object
High level class to manage reading data from a file. Principally, manages operations between one or multiple variables, and gridded or un-gridded data.
-
read_coordinates
(filenames, product=None)¶ Read the coordinates from a file.
Parameters: filenames – The filename of the files to read Returns: A CoordList object
-
read_data_list
(filenames, variables, product=None, aliases=None)¶ Read multiple data objects. Files can be either gridded or ungridded but not a mix of both.
Parameters: - filenames (string or list) – One or more filenames of the files to read
- variables (string or list) – One or more variables to read from the files
- product (str) – Name of data product to use (optional)
- aliases – List of variable aliases to put on each variables data object as an alternative means of identifying them. (Optional)
Returns: A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)
-
read_datagroups
(datagroups)¶ Read data from a set of datagroups
Parameters: datagroups – A list of datagroups. Each datagroup represents a grouping of files and variables, where the set of files may be logically considered to represent the same data (an example would be 2D model data split into monthly output files where the grid is the same). The following should be true of a datagroup:
- All variables in a datagroup are present in all the files in that datagroup
- The shape of the data returned from each variable must be the same in each file, so that they may be concatenated
- They should all be openable by the same CIS data product
- They should be dictionaries of the following format:
{'filenames': ['filename1.nc', 'filename2.nc'], 'variables': ['variable1', 'variable2'], 'product' : 'Aerosol_CCI'}
Returns: A list of CommonData objects (either GriddedData or UngriddedData, or a combination)
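A datagroup in the dictionary format above can be checked before it is handed to a reader. The helper below is a hypothetical sketch, not part of CIS: it assumes that 'filenames' and 'variables' must be present and non-empty and that 'product' is optional, as it is for read_data_list.

```python
def check_datagroup(datagroup):
    """Raise ValueError if a datagroup dict lacks the expected entries."""
    # Assumption: 'filenames' and 'variables' are required, 'product' is optional.
    for key in ("filenames", "variables"):
        if not datagroup.get(key):
            raise ValueError("datagroup needs a non-empty '{}' entry".format(key))
    return datagroup


datagroups = [
    {"filenames": ["filename1.nc", "filename2.nc"],
     "variables": ["variable1", "variable2"],
     "product": "Aerosol_CCI"},
]
checked = [check_datagroup(group) for group in datagroups]
print(len(checked), "datagroup(s) look well formed")
```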
-
-
cis.data_io.data_reader.
expand_filelist
(filelist)¶ Parameters: filelist – A single element, or list, or comma separated string of filenames, wildcarded filenames or directories Returns: A flat list of files which exist - with no duplicates Raises ValueError: if any of the files in the list do not exist.
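The expansion rules can be approximated with the standard glob and os modules. The function below is a simplified sketch written for this example, not the CIS implementation: it assumes that a directory expands to the regular files directly inside it and that results are sorted within each entry.

```python
import glob
import os
import tempfile


def expand_filelist_sketch(filelist):
    """Expand names, wildcards and directories into a flat, de-duplicated list."""
    if isinstance(filelist, str):
        filelist = filelist.split(",")
    found = []
    for entry in filelist:
        if os.path.isdir(entry):
            # Assumption: a directory contributes the files directly inside it.
            matches = [os.path.join(entry, name) for name in os.listdir(entry)]
            matches = [path for path in matches if os.path.isfile(path)]
        else:
            matches = glob.glob(entry)
            if not matches:
                raise ValueError("No files found matching: {}".format(entry))
        for path in sorted(matches):
            if path not in found:
                found.append(path)
    return found


workdir = tempfile.mkdtemp()
for name in ("a.nc", "b.nc", "notes.txt"):
    open(os.path.join(workdir, name), "w").close()

# A wildcard plus an explicit name that the wildcard already matched.
files = expand_filelist_sketch(
    os.path.join(workdir, "*.nc") + "," + os.path.join(workdir, "a.nc"))
print([os.path.basename(path) for path in files])
```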
-
class
cis.data_io.data_writer.
DataWriter
¶ Bases:
object
High level class for writing data to a file
-
write_data
(data, output_file)¶ Write data to a file.
Parameters: - data (CommonData) – Data to write
- output_file (str) – Output file name
-
cis.aggregation package¶
cis.aggregation.aggregate module¶
-
class
cis.aggregation.aggregate.
Aggregate
(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)¶ Bases:
object
-
aggregate
(variables, filenames, product=None, kernel=None)¶ Aggregate the given variables based on the initialised grid
Parameters: - variables (string or list) – One or more variables to read from the files
- filenames (string or list) – One or more filenames of the files to read
- product (str) – Name of data product to use (optional)
- kernel (str) – Name of kernel to use (the default is ‘moments’)
-
cis.aggregation.aggregation_grid module¶
-
class
cis.aggregation.aggregation_grid.
AggregationGrid
¶ Bases:
cis.aggregation.aggregation_grid.AggregationGrid
Holds the start and delta values for the aggregation grid.
Variables: - start (str) – aggregation start point
- delta (str) – aggregation step to take through grid
- is_time (bool) – indicates whether the limits apply to a time dimension (date/times); None if not known
cis.aggregation.aggregation_kernels module¶
Kernels used for the aggregation of GRIDDED data only. (Ungridded aggregation uses the standard collocation kernels.)
-
class
cis.aggregation.aggregation_kernels.
MultiKernel
(cell_method, sub_kernels)¶ Bases:
object
Represents a set of kernels to be applied each in turn
cis.aggregation.aggregator module¶
-
class
cis.aggregation.aggregator.
Aggregator
(data, grid)¶ Bases:
object
-
aggregate_gridded
(kernel)¶
-
aggregate_ungridded
(kernel)¶ Performs aggregation for ungridded data by first generating a new grid, converting it into a cube, then collocating using the appropriate kernel and a cube cell constraint
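The idea of placing ungridded points onto a newly generated grid and reducing each cell with a kernel can be shown in one dimension. The sketch below is a deliberately simplified assumption: it bins (coordinate, value) pairs into cells of width delta and takes the mean of each cell, whereas the real method builds a cube and uses the collocation machinery.

```python
import math


def aggregate_points(points, start, end, delta):
    """Average (coordinate, value) pairs within cells of width delta on [start, end)."""
    n_cells = int(math.ceil((end - start) / float(delta)))
    cells = [[] for _ in range(n_cells)]
    for coord, value in points:
        if start <= coord < end:
            cells[int((coord - start) // delta)].append(value)
    # Empty cells are reported as None, a stand-in for a masked or fill value.
    return [sum(cell) / len(cell) if cell else None for cell in cells]


points = [(1.0, 2.0), (3.0, 4.0), (12.0, 10.0), (25.0, 6.0)]
means = aggregate_points(points, start=0, end=40, delta=10)
print(means)
```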
-
get_grid
(coord)¶
-
-
cis.aggregation.aggregator.
add_month_midpoint
(dt_object, months)¶
-
cis.aggregation.aggregator.
add_year_midpoint
(dt_object, years)¶
-
cis.aggregation.aggregator.
aggregation_grid_array
(start, end, delta, is_time, coordinate)¶
-
cis.aggregation.aggregator.
categorise_coord_function
(start, end, delta, is_time)¶
-
cis.aggregation.aggregator.
find_nearest
(array, value)¶ Find the element of the array nearest to the given value.
Parameters: - array – A numpy array
- value – A single value
Returns: A single value from the array
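A nearest-value lookup of this kind amounts to minimising the absolute difference. The sketch below uses a plain Python sequence rather than a numpy array so that it runs with the standard library alone; the function name is hypothetical and ties resolve to the first candidate.

```python
def find_nearest_sketch(array, value):
    """Return the element of array closest to value (first one wins on ties)."""
    return min(array, key=lambda element: abs(element - value))


nearest = find_nearest_sketch([0.0, 2.5, 5.0, 7.5], 5.9)
print(nearest)
```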
-
cis.aggregation.aggregator.
month_past_end_of_year
(month, year)¶
cis.collocation package¶
cis.collocation.col module¶
Top level collocation objects
-
class
cis.collocation.col.
Collocate
(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)¶ Bases:
object
Perform a general collocation
-
collocate
(data, col_name=None, col_params=None, kern=None, kern_params=None)¶ Perform the collocation.
Parameters: - data (CommonData) – Data to collocate
- col_name (str) – Name of the collocator
- col_params (dict) – Parameters dictionary for the collocation and constraint
- kern (str) – The kernel to use
- kern_params (dict) – The kernel parameters to use
Return CommonData: The collocated data
Raises CoordinateNotFoundError: If the collocator was unable to compare the sample and data points
-
-
class
cis.collocation.col.
CollocatorFactory
¶ Bases:
object
Class for creating Collocator, Constraint and Kernel instances
-
get_collocator_instances_for_method
(method_name, kernel_name, collocator_params, kernel_params, sample_gridded, data_gridded)¶ Get instances of the correct classes for collocation.
Parameters: - method_name – Collocation method name (e.g. ‘lin’, ‘nn’)
- kernel_name – Kernel class name
- collocator_params – Collocation parameters
- kernel_params – Kernel parameters
- sample_gridded – Is the sample data gridded?
- data_gridded – Is the collocation data gridded?
Returns: Collocator, Constraint and Kernel instances
-
get_default_collocator_name
(method_name, sample_gridded, data_gridded)¶
-
cis.collocation.col_framework module¶
-
class
cis.collocation.col_framework.
AbstractDataOnlyKernel
¶ Bases:
cis.collocation.col_framework.Kernel
A Kernel that can work on data only, e.g. mean only requires the data values to calculate the mean, not the sampling point.
-
get_value
(point, data)¶ This method is redundant in the AbstractDataOnlyKernel and only serves as an interface to
AbstractDataOnlyKernel()
, removing the unnecessary point and checking for one or more data points.
Parameters: - point – A single HyperPoint
- data – A set of data points to reduce to a single value
Returns: For return_size=1 a single value (number) otherwise a list of return values, which represents some operation on the points provided
-
get_value_for_data_only
(values)¶ This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the values (a numpy array).
Note that this method will be called for every sample point in which data can be placed and so could become a bottleneck for calculations; it is advisable to make it as quick as is practical. If this method is unable to provide a value (for example if no data points were given) a ValueError should be raised. This method will not be called if there are no values to be used for calculations.
Parameters: values – A numpy array of values (cannot be None or empty) Returns: A single data item if return_size is 1, or a list containing Kernel.return_size items
Raises ValueError: If there are any problems creating a value
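The relationship between get_value and get_value_for_data_only can be sketched with stand-in classes. Everything below is hypothetical and independent of CIS: the base classes only mimic the interface described above, and plain lists are used in place of numpy arrays.

```python
class KernelSketch(object):
    """Stand-in for the Kernel interface: reduce data at a point to a value."""
    return_size = 1

    def get_value(self, point, data):
        raise NotImplementedError


class DataOnlyKernelSketch(KernelSketch):
    """Drops the sample point and delegates to get_value_for_data_only."""

    def get_value(self, point, data):
        values = list(data)
        if not values:
            # The data-only method is never called without values.
            raise ValueError("Kernel requires at least one data point")
        return self.get_value_for_data_only(values)

    def get_value_for_data_only(self, values):
        raise NotImplementedError


class MeanSketch(DataOnlyKernelSketch):
    def get_value_for_data_only(self, values):
        # Only the data values matter for a mean, not the sample point.
        return sum(values) / float(len(values))


mean_value = MeanSketch().get_value(point=None, data=[1.0, 2.0, 6.0])
print(mean_value)
```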
-
-
class
cis.collocation.col_framework.
CellConstraint
¶ Bases:
cis.collocation.col_framework.Constraint
Superclass of constraints acting on cells surrounding sample points.
The point argument in constrain_points is a HyperPoint in which the coordinate values are of type iris.coords.Cell.
-
get_iterator
(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)¶
-
-
class
cis.collocation.col_framework.
Collocator
(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)¶ Bases:
object
Class which provides a method for performing collocation. This just defines the interface which the subclasses must implement.
-
collocate
(points, data, constraint, kernel)¶ The method is responsible for setting up and running the collocation. It should take a set of data and map that onto the given (sample) points using the constraint and kernel provided.
Parameters: - points – A set of sample points onto which we will collocate some other ‘data’
- data – Some other data to be collocated onto the ‘points’
- constraint – A Constraint instance which provides a Constraint.constrain_points() method, and optionally a Constraint.get_iterator() method
- kernel – A Kernel instance which provides a Kernel.get_value() method
Returns: One or more CommonData (or subclasses of) objects whose coordinates lie on the points defined above.
-
-
class
cis.collocation.col_framework.
Constraint
¶ Bases:
object
Class which provides a method for constraining a set of points. A single HyperPoint is given as a reference but the data points to be reduced ultimately may be of any type. This just defines the interface which the subclasses must implement.
-
constrain_points
(point, data)¶ This method should return a subset of the data given a single reference point. It is expected that the data returned should be of the same type as that given - but this isn’t mandatory. It is possible that this function will return zero points (no data), the collocation class is responsible for providing a fill_value.
Parameters: - point (HyperPoint) – A single HyperPoint
- data – A set of data points to be reduced
Returns: A reduced set of data points
-
get_iterator
(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)¶ Iterator to iterate through the points needed to be calculated. The default iterator iterates through all the sample points, calling Constraint.constrain_points() for each one.
Parameters: - missing_data_for_missing_sample – If true anywhere there is missing data on the sample then final point is missing; otherwise just use the sample
- coord_map – Coordinate map - list of tuples of indexes of hyperpoint coord, data coords and output coords
- coords – The coordinates to map the data onto
- data_points – The (non-masked) data points
- shape – Shape of the final data values
- points – The original points object, these are the points to collocate
- output_data – Output data set
Returns: Iterator which iterates through (sample indices, hyper point and constrained points) to be placed in these points
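The default iteration pattern (visit every sample point and constrain the data for each one) can be sketched in one dimension. The constraint class, the generator and the (position, value) tuples below are all assumptions made for illustration; the real iterator takes the longer argument list documented above and works with HyperPoints.

```python
class WithinDistance(object):
    """Hypothetical 1-D constraint: keep data within max_sep of the sample."""

    def __init__(self, max_sep):
        self.max_sep = max_sep

    def constrain_points(self, point, data):
        # Returns a subset of the data, possibly empty.
        return [(x, v) for x, v in data if abs(x - point) <= self.max_sep]


def iterate_samples(constraint, sample_points, data):
    """Yield (sample index, sample point, constrained data) for each sample."""
    for index, point in enumerate(sample_points):
        yield index, point, constraint.constrain_points(point, data)


data = [(0.0, 10.0), (1.0, 20.0), (5.0, 30.0)]
results = list(iterate_samples(WithinDistance(1.5), [0.5, 5.0, 9.0], data))
for index, point, subset in results:
    print(index, point, subset)
```

The last sample point has no data within range, so its subset is empty; in that situation the collocator is responsible for supplying a fill value.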
-
-
class
cis.collocation.col_framework.
IndexedConstraint
¶ Bases:
cis.collocation.col_framework.Constraint
Superclass of constraints that expect points to be referenced by index.
-
get_iterator
(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)¶
-
-
class
cis.collocation.col_framework.
Kernel
¶ Bases:
object
Class which provides a method for taking a number of points and returning one value. For example a nearest neighbour algorithm or sort algorithm or mean. This just defines the interface which the subclasses must implement.
-
get_value
(point, data)¶ This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the data given a single point.
The data is deliberately left unspecified in the interface as it may be any type of data; however, it is expected that each implementation will only work with a specific type of data (gridded, ungridded etc.). Note that this method will be called for every sample point and so could become a bottleneck for calculations; it is advisable to make it as quick as is practical. If this method is unable to provide a value (for example if no data points were given) a ValueError should be raised.
Parameters: - point – A single HyperPoint
- data – A set of data points to reduce to a single value
Returns: For return_size=1 a single value (number) otherwise a list of return values, which represents some operation on the points provided
Raises ValueError: When the method is unable to return a value
-
get_variable_details
(var_name, var_long_name, var_standard_name, var_units)¶ Returns the details of all variables to be created from the outputs of a kernel.
Parameters: - var_name (str) – Base variable name
- var_long_name (str) – Base variable long name
- var_standard_name (str) – Base variable standard_name
- var_units (str) – Base variable units
Returns: Tuple of tuples, each containing (variable name, variable long name, variable units)
-
return_size
= 1¶ The number of values the
Kernel.get_value()
should be expected to return (i.e. the length of the return list).
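A kernel that returns several values needs return_size and get_variable_details to agree with what get_value produces. The sketch below is a hypothetical three-output kernel (mean, sample standard deviation, count) written for illustration; the class name and the output variable names are assumptions, not CIS definitions.

```python
import math


class MomentsSketch(object):
    """Hypothetical kernel returning mean, standard deviation and count."""
    return_size = 3

    def get_value(self, point, data):
        values = list(data)
        if not values:
            raise ValueError("No data points to reduce")
        n = len(values)
        mean = sum(values) / float(n)
        # Sample standard deviation; undefined for a single point, so use NaN.
        if n > 1:
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        else:
            std = float("nan")
        return [mean, std, n]

    def get_variable_details(self, var_name, var_long_name, var_standard_name, var_units):
        # One (name, long name, units) tuple per returned value.
        return (
            (var_name, var_long_name, var_units),
            (var_name + "_std_dev", "Standard deviation of " + var_long_name, var_units),
            (var_name + "_num_points", "Number of points used for " + var_long_name, ""),
        )


kernel = MomentsSketch()
result = kernel.get_value(point=None, data=[2.0, 4.0, 6.0])
details = kernel.get_variable_details("aod", "Aerosol optical depth", None, "1")
print(result)
print([name for name, _, _ in details])
```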
-
-
class
cis.collocation.col_framework.
PointConstraint
¶ Bases:
cis.collocation.col_framework.Constraint
Superclass of constraints acting on sample points.
The point argument in constrain_points is a HyperPoint.
-
cis.collocation.col_framework.
get_collocator
(method=None)¶ Top level routine for finding the correct Collocator object.
Parameters: method – The collocate method to find - this should be a string which matches the name of one of the subclasses of Collocator Returns: One of Collocator’s subclasses
-
cis.collocation.col_framework.
get_constraint
(method=None)¶ Top level routine for finding the correct Constraint object. This doesn’t instantiate the constraint class as it may need variables passed to the constructor.
Parameters: method – The constraint method to find - this should be a string which matches the name of one of the subclasses of Constraint Returns: One of Constraint’s subclasses
-
cis.collocation.col_framework.
get_kernel
(method=None)¶ Top level routine for finding the correct Kernel object.
Parameters: method – The kernel method to find - this should be a string which matches the name of one of the subclasses of Kernel Returns: One of Kernel’s subclasses
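Looking up a class by matching a string against subclass names can be done with Python's built-in __subclasses__() method. The sketch below shows one way to do it with hypothetical classes; how CIS performs the lookup, and what it does when no match is found, is not stated in this section, so the ValueError here is an assumption.

```python
class KernelBase(object):
    """Hypothetical base class standing in for Kernel."""


class mean_sketch(KernelBase):
    pass


class nearest_sketch(KernelBase):
    pass


class weighted_mean_sketch(mean_sketch):
    pass


def find_subclass(parent, name):
    """Return the subclass of parent (at any depth) whose class name is name."""
    pending = list(parent.__subclasses__())
    while pending:
        candidate = pending.pop()
        if candidate.__name__ == name:
            return candidate
        pending.extend(candidate.__subclasses__())
    raise ValueError("No subclass of {} named '{}'".format(parent.__name__, name))


kernel_class = find_subclass(KernelBase, "weighted_mean_sketch")
print(kernel_class.__name__)
```

The function returns the class rather than an instance, so the caller can pass any constructor arguments it needs.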
cis.collocation.col_implementations module¶
-
class
cis.collocation.col_implementations.
BinnedCubeCellOnlyConstraint
¶ Bases:
cis.collocation.col_framework.Constraint
Constraint for constraining HyperPoints to be within an iris.coords.Cell, with an iterator which only visits those cells that contain a value.
Uses the index_data method to bin all the points.
-
constrain_points
(sample_point, data)¶
-
get_iterator
(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)¶
-
get_iterator_for_data_only
(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, values)¶ The method returns an iterator over the output indices and a numpy array slice of the data values. This may not be called by all collocators who may choose to iterate over all sample points instead.
Parameters: - missing_data_for_missing_sample – If true anywhere there is missing data on the sample then final point is missing; otherwise just use the sample
- coord_map – Not needed for the data only kernel
- coords – Not needed for the data only kernel
- data_points – The (non-masked) data points
- shape – Not needed
- points – The original points object, these are the points to collocate
- values – Not needed
Returns: Iterator which iterates through (sample indices and data slice) to be placed in these points
-
-
class
cis.collocation.col_implementations.
BinningCubeCellConstraint
¶ Bases:
cis.collocation.col_framework.IndexedConstraint
Constraint for constraining HyperPoints to be within an iris.coords.Cell.
Uses the index_data method to bin all the points
-
constrain_points
(sample_point, data)¶ Returns HyperPoints lying within a cell.
This implementation returns the points that have been stored in the appropriate bin by the index_data method.
Parameters: - sample_point – HyperPoint of indices of cells defining sample region
- data – list of HyperPoints to check
Returns: HyperPointList of points found within cell
-
-
class
cis.collocation.col_implementations.
CubeCellConstraint
¶ Bases:
cis.collocation.col_framework.CellConstraint
Constraint for constraining HyperPoints to be within an iris.coords.Cell.
-
constrain_points
(sample_point, data)¶ Returns HyperPoints lying within a cell.
Parameters: - sample_point – HyperPoint of cells defining sample region
- data – list of HyperPoints to check
Returns: HyperPointList of points found within cell
-
-
class
cis.collocation.col_implementations.
DummyCollocator
(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)¶ Bases:
cis.collocation.col_framework.Collocator
-
collocate
(points, data, constraint, kernel)¶ This collocator does no collocation at all - it just returns the original data values. This might be useful if the input data for one variable is already known to be on the same grid as points. This routine could check the coordinates are the same but currently does no such check.
Parameters: - points – A list of HyperPoints
- data – An UngriddedData object or Cube
- constraint – Unused
- kernel – Unused
Returns: A single LazyData object
-
-
class
cis.collocation.col_implementations.
DummyConstraint
¶ Bases:
cis.collocation.col_framework.Constraint
-
constrain_points
(point, data)¶
-
-
class
cis.collocation.col_implementations.
GeneralGriddedCollocator
(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)¶ Bases:
cis.collocation.col_framework.Collocator
Performs collocation of data on to the points of a cube (ie onto a gridded dataset).
-
collocate
(points, data, constraint, kernel)¶ Parameters: - points – cube defining the sample points
- data – CommonData object providing data to be collocated (or list of Data)
- constraint – instance of a Constraint subclass, which takes a data object and returns a subset of that data based on its internal parameters
- kernel – instance of a Kernel subclass which takes a number of points and returns a single value
Returns: GriddedDataList of collocated data
-
-
class
cis.collocation.col_implementations.
GeneralUngriddedCollocator
(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)¶ Bases:
cis.collocation.col_framework.Collocator
Collocator for locating onto ungridded sample points
-
collocate
(points, data, constraint, kernel)¶ This collocator takes a list of HyperPoints and a data object (currently either Ungridded data or a Cube) and returns one new LazyData object with the values as determined by the constraint and kernel objects. The metadata for the output LazyData object is copied from the input data object.
Parameters: - points – UngriddedData or UngriddedCoordinates defining the sample points
- data – An UngriddedData object or Cube, or any other object containing metadata that the constraint object can read. May also be a list of objects, in which case a list will be returned
- constraint – An instance of a Constraint subclass which takes a data object and returns a subset of that data based on its internal parameters
- kernel – An instance of a Kernel subclass which takes a number of points and returns a single value
Returns: A single LazyData object
-
-
class
cis.collocation.col_implementations.
GriddedCollocator
(var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)¶ Bases:
cis.collocation.col_framework.Collocator
-
collocate
(points, data, constraint, kernel)¶ This collocator takes two Iris cubes, and collocates from the data cube onto the grid of the ‘points’ cube. The collocator then returns another Iris cube.
Parameters: - points – An Iris cube with the sampling grid to collocate onto.
- data – The Iris cube with the data to be collocated.
- constraint – None allowed yet, as this is unlikely to be required for gridded-gridded.
- kernel – The kernel to use, current options are gridded_gridded_nn and gridded_gridded_li.
Returns: An Iris cube with the collocated data.
-
-
class
cis.collocation.col_implementations.
SepConstraint
(h_sep=None, a_sep=None, p_sep=None, t_sep=None)¶ Bases:
cis.collocation.col_framework.PointConstraint
-
alt_constraint
(point, ref_point)¶
-
constrain_points
(ref_point, data)¶
-
horizontal_constraint
(point, ref_point)¶
-
pressure_constraint
(point, ref_point)¶
-
time_constraint
(point, ref_point)¶
-
-
class
cis.collocation.col_implementations.
SepConstraintKdtree
(h_sep=None, a_sep=None, p_sep=None, t_sep=None)¶ Bases:
cis.collocation.col_framework.PointConstraint
A separation constraint that uses a k-D tree to optimise spatial constraining. If no horizontal separation parameter is supplied, this reduces to an exhaustive search using the other parameter(s).
-
alt_constraint
(point, ref_point)¶
-
constrain_points
(ref_point, data)¶
-
horizontal_constraint
(point, ref_point)¶
-
pressure_constraint
(point, ref_point)¶
-
time_constraint
(point, ref_point)¶
-
-
class
cis.collocation.col_implementations.
gridded_gridded_li
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Not needed for gridded/gridded collocation.
-
-
class
cis.collocation.col_implementations.
gridded_gridded_nn
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Not needed for gridded/gridded collocation.
-
-
class
cis.collocation.col_implementations.
li
(extrapolate=False, nn_vertical=False)¶ Bases:
cis.collocation.col_framework.Kernel
Linear Interpolation Kernel
-
get_value
(point, data)¶ Co-location routine using iris’ linear interpolation algorithm. This only makes sense for gridded data.
-
-
cis.collocation.col_implementations.
make_coord_map
(points, data)¶ Create a map for how coordinates from the sample points map to the standard hyperpoint coordinates, ignoring coordinates which are not present in the data.
Parameters: - points – sample points
- data – data to map
Returns: list of tuples, one per coordinate to use; each tuple is (hyper point coord index, sample point coord index, output coord index)
-
class
cis.collocation.col_implementations.
max
¶ Bases:
cis.collocation.col_framework.AbstractDataOnlyKernel
Calculate the maximum value
-
get_value_for_data_only
(values)¶ Return the maximum value
-
-
class
cis.collocation.col_implementations.
mean
¶ Bases:
cis.collocation.col_framework.AbstractDataOnlyKernel
Calculate mean of data points
-
get_value_for_data_only
(values)¶ Return the mean
-
-
class
cis.collocation.col_implementations.
min
¶ Bases:
cis.collocation.col_framework.AbstractDataOnlyKernel
Calculate the minimum value
-
get_value_for_data_only
(values)¶ Return the minimum value
-
-
class
cis.collocation.col_implementations.
moments
(mean_name='', stddev_name='', nopoints_name='')¶ Bases:
cis.collocation.col_framework.AbstractDataOnlyKernel
-
get_value_for_data_only
(values)¶ Returns the mean, standard deviation and number of values
-
get_variable_details
(var_name, var_long_name, var_standard_name, var_units)¶ Sets name and units for mean, standard deviation and number of points variables, based on those of the base variable or overridden by those specified as kernel parameters.
Parameters: - var_name – base variable name
- var_long_name – base variable long name
- var_standard_name – base variable standard name
- var_units – base variable units
Returns: tuple of tuples each containing (variable name, variable long name, variable units)
-
return_size
= 3¶
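The three values returned by a moments-style kernel can be illustrated with the standard library alone. This is a minimal sketch, not the CIS implementation: the function name moments_sketch is hypothetical, and the use of the corrected (n - 1) standard deviation is an assumption.

```python
import statistics


def moments_sketch(values):
    """Return (mean, standard deviation, number of values) for a list of floats."""
    n = len(values)
    mean = statistics.mean(values)
    # Assumption: corrected sample standard deviation; undefined for one value.
    stddev = statistics.stdev(values) if n > 1 else float("nan")
    return mean, stddev, n


result = moments_sketch([1.0, 2.0, 3.0, 4.0])
print(result)
```

Returning a fixed-length tuple is what makes a return_size of 3 meaningful: the caller can allocate three output variables before any values are computed.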
-
-
class
cis.collocation.col_implementations.
nn_a
¶ Bases:
cis.collocation.col_implementations.nn_altitude
Nearest neighbour altitude kernel - alias for nn_altitude.
-
class
cis.collocation.col_implementations.
nn_altitude
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Collocation using nearest neighbours in altitude, where both points and data are a list of HyperPoints. The default point is the first point.
-
-
class
cis.collocation.col_implementations.
nn_gridded
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Co-location routine using nearest neighbour algorithm optimized for gridded data. This calls out to iris to do the work.
-
-
class
cis.collocation.col_implementations.
nn_h
¶ Bases:
cis.collocation.col_implementations.nn_horizontal
Nearest neighbour horizontal kernel - alias for nn_horizontal.
-
class
cis.collocation.col_implementations.
nn_horizontal
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Collocation using nearest neighbours along the face of the earth, where both points and data are a list of HyperPoints. The default point is the first point.
-
-
class
cis.collocation.col_implementations.
nn_horizontal_kdtree
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Collocation using nearest neighbours along the face of the earth using a k-D tree index.
-
-
class
cis.collocation.col_implementations.
nn_p
¶ Bases:
cis.collocation.col_implementations.nn_pressure
Nearest neighbour pressure kernel - alias for nn_pressure.
-
class
cis.collocation.col_implementations.
nn_pressure
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Collocation using nearest neighbours in pressure, where both points and data are a list of HyperPoints. The default point is the first point.
-
-
class
cis.collocation.col_implementations.
nn_t
¶ Bases:
cis.collocation.col_implementations.nn_time
Nearest neighbour time kernel - alias for nn_time.
-
class
cis.collocation.col_implementations.
nn_time
¶ Bases:
cis.collocation.col_framework.Kernel
-
get_value
(point, data)¶ Collocation using nearest neighbours in time, where both points and data are a list of HyperPoints. The default point is the first point.
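The one-dimensional nearest-neighbour kernels (altitude, pressure, time) all follow the same pattern: start from the first data point and replace it whenever a closer one is found. A minimal sketch for the time case follows; the Point tuple is a hypothetical stand-in for a HyperPoint, and nearest_in_time is not a CIS function.

```python
from collections import namedtuple

# Hypothetical stand-in for a HyperPoint: only the fields this sketch needs.
Point = namedtuple("Point", ["time", "value"])


def nearest_in_time(sample, data_points):
    """Return the value of the data point closest in time to the sample point."""
    nearest = data_points[0]  # the default point is the first point
    for candidate in data_points[1:]:
        if abs(candidate.time - sample.time) < abs(nearest.time - sample.time):
            nearest = candidate
    return nearest.value


sample = Point(time=10.0, value=None)
data = [Point(2.0, "a"), Point(9.5, "b"), Point(11.0, "c")]
print(nearest_in_time(sample, data))
```

The strict comparison means that, in a tie, the earlier point in the list is kept.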
-
-
class
cis.collocation.col_implementations.
stddev
¶ Bases:
cis.collocation.col_framework.AbstractDataOnlyKernel
Calculate the standard deviation
-
get_value_for_data_only
(values)¶ Return the standard deviation of the points
-
cis.plotting package¶
cis.plotting.plot module¶
Class for plotting graphs. Also contains a dictionary for the valid plot types. All plot types need to be imported and added to the plot_types dictionary in order to be used.
-
class
cis.plotting.plot.
Plotter
(packed_data_items, plot_type=None, out_filename=None, *mplargs, **mplkwargs)¶ Bases:
object
-
output_to_file_or_screen
(out_filename=None)¶ Outputs to screen unless a filename is given
Parameters: out_filename – The filename of the file to save the plot to. Various file extensions can be used, with png being the default
-
plot_types
= {'contourf': <class 'cis.plotting.contourf_plot.Contourf_Plot'>, 'heatmap': <class 'cis.plotting.heatmap.Heatmap'>, 'comparativescatter': <class 'cis.plotting.comparative_scatter.Comparative_Scatter'>, 'overlay': <class 'cis.plotting.overlay.Overlay'>, 'line': <class 'cis.plotting.line_plot.Line_Plot'>, 'histogram2d': <class 'cis.plotting.histogram2d.Histogram_2D'>, 'histogram3d': <class 'cis.plotting.histogram3d.Histogram_3D'>, 'contour': <class 'cis.plotting.contour_plot.Contour_Plot'>, 'scatter': <class 'cis.plotting.scatter_plot.Scatter_Plot'>}¶
-
remove_unassigned_arguments
()¶ Removes arguments from the mplkwargs if they are equal to None
-
set_default_plot_type
(data)¶ Sets the default plot type based on the number of dimensions of the data
Parameters: data – A list of packed data items Returns: The default plot type as a string
-
cis.subsetting package¶
cis.subsetting.subset module¶
-
class
cis.subsetting.subset.
Subset
(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)¶ Bases:
object
Class for subsetting Ungridded or Gridded data temporally, spatially, or both.
-
subset
(variables, filenames, product=None)¶ Subset the given variables based on the initialised limits
Parameters: - variables (string or list) – One or more variables to read from the files
- filenames (string or list) – One or more filenames of the files to read
- product (str) – Name of data product to use (optional)
-
cis.subsetting.subset_constraint module¶
-
class
cis.subsetting.subset_constraint.
CoordLimits
¶ Bases:
cis.subsetting.subset_constraint.CoordLimits
Holds the start and end values for subsetting limits.
Variables: - coord – the coordinate the limit applies to
- start – subsetting limit start
- end – subsetting limit end
- constraint_function – function determining whether the constraint is satisfied
-
class
cis.subsetting.subset_constraint.
GriddedSubsetConstraint
¶ Bases:
cis.subsetting.subset_constraint.SubsetConstraint
Implementation of SubsetConstraint for subsetting gridded data.
-
constrain
(data)¶ Subsets the supplied data using a combination of iris.cube.Cube.extract and iris.cube.Cube.intersection, depending on whether intersection is supported (whether the coordinate has a defined modulus).
Parameters: data – data to be subsetted Returns: subsetted data or None if all data excluded Return type: cis.data_io.gridded_data.GriddedData
-
-
class
cis.subsetting.subset_constraint.
SubsetConstraint
¶ Bases:
cis.subsetting.subset_framework.SubsetConstraintInterface
Abstract Constraint for subsetting.
Holds the limits for subsetting in each dimension.
-
set_limit
(coord, dim_min, dim_max)¶ Sets boundary values for a dimension to be used in subsetting.
Parameters: - coord – coordinate to which limit applies
- dim_min – lower bound on dimension or None to indicate no lower bound
- dim_max – upper bound on dimension or None to indicate no upper bound
-
-
class
cis.subsetting.subset_constraint.
UngriddedSubsetConstraint
¶ Bases:
cis.subsetting.subset_constraint.SubsetConstraint
Implementation of SubsetConstraint for subsetting ungridded data.
-
constrain
(data)¶ Subsets the supplied data.
Parameters: data – data to be subsetted Returns: subsetted data
-
cis.subsetting.subset_framework module¶
cis.subsetting.subset_limits module¶
-
class
cis.subsetting.subset_limits.
SubsetLimits
¶ Bases:
cis.subsetting.subset_limits.SubsetLimits
Holds the start and end values for subsetting limits.
Variables: - start (str) – subsetting limit start
- end (str) – subsetting limit end
- is_time (bool) – indicates whether the limits apply to a time dimension (i.e. are date/times); None if not known
cis.stats module¶
-
class
cis.stats.
StatsAnalyzer
(data1, data2)¶ Analyse datasets to produce statistics.
-
analyze
()¶ Perform a statistical analysis on two data sets.
Returns: List of StatisticsResult instances.
-
points_count
()¶ Count all points which will be used for statistical comparison operations (i.e. are non-missing in both datasets).
Returns: List of StatisticsResults
-
means
()¶ Means of two datasets
Returns: List of StatisticsResults
-
stddevs
()¶ Corrected sample standard deviation of datasets
Returns: List of StatisticsResults
-
abs_mean
()¶ Mean of absolute difference d2-d1
Returns: List of StatisticsResults
-
abs_stddev
()¶ Standard deviation of absolute difference d2-d1
Returns: List of StatisticsResults
-
rel_mean
()¶ Mean of relative difference (d2-d1)/d1
Returns: List of StatisticsResults
-
rel_stddev
()¶ Standard deviation of relative difference (d2-d1)/d1
Returns: List of StatisticsResults
-
spearmans_rank
()¶ Calculate the Spearman’s rank correlation coefficient of the data
Returns: List of StatisticsResults
-
linear_regression
()¶ Perform a linear regression on the data
Returns: List of StatisticsResults
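The difference statistics can be sketched in plain Python to show how the quantities relate. This is an illustration under stated assumptions, not the StatsAnalyzer code: comparison_stats is a hypothetical helper, missing values are represented by NaN, and the corrected (n - 1) estimator is assumed for every standard deviation.

```python
import math
import statistics


def comparison_stats(d1, d2):
    """Summarise paired differences, skipping pairs with a missing (NaN) value."""
    pairs = [(a, b) for a, b in zip(d1, d2)
             if not (math.isnan(a) or math.isnan(b))]
    abs_diff = [b - a for a, b in pairs]        # d2 - d1
    rel_diff = [(b - a) / a for a, b in pairs]  # (d2 - d1) / d1
    return {
        "points_count": len(pairs),
        "abs_mean": statistics.mean(abs_diff),
        "abs_stddev": statistics.stdev(abs_diff),  # assumed n - 1 estimator
        "rel_mean": statistics.mean(rel_diff),
        "rel_stddev": statistics.stdev(rel_diff),
    }


stats = comparison_stats([1.0, 2.0, float("nan"), 4.0], [1.5, 2.0, 3.0, 5.0])
print(stats)
```

Note that this sketch needs at least two valid pairs and raises ZeroDivisionError if any retained value in the first dataset is zero.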
-
CIS utility functions¶
cis.time_util module¶
Utilities for converting time units
-
cis.time_util.
calculate_mid_time
(t1, t2)¶ Find the mid time between two times expressed as floats
Parameters: - t1 – a time represented as a float
- t2 – a time in the same representation as t1
Returns: a float representing the time midway between t1 and t2
-
cis.time_util.
convert_cube_time_coord_to_standard_time
(cube)¶ Converts the time coordinate from the one in the cube to one based on a standard time unit.
Parameters: cube – cube to modify Returns: the cube
-
cis.time_util.
convert_datetime_to_std_time
(dt)¶
-
cis.time_util.
convert_julian_date_to_std_time
(days_since)¶ Convert an array of Julian days to CIS standard time
Note: Array should have units like: Julian Date, days elapsed since 12:00 January 1, 4713 BC
Parameters: days_since – numpy array of fractional days since 12:00 January 1, 4713 BC Returns: fractional days since CIS standard time
-
cis.time_util.
convert_sec_since_to_std_time
(seconds, ref)¶ Convert a number of seconds since a given reference datetime to a number of days since our standard time. The given reference DateTime must be on the Gregorian calendar.
Parameters: - seconds (ndarray) – Array of seconds (since the reference time provided)
- ref (DateTime) – The reference datetime which the seconds are counted from
Returns: A numpy array containing all of the time values (in fractional days since the CIS standard time)
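The conversion amounts to adding a fixed day offset (reference time minus standard epoch) to the seconds expressed in days. The sketch below uses plain lists rather than numpy arrays. Its function name is hypothetical, and the epoch of 1600-01-01 00:00:00 is an assumption made for illustration, passed as a parameter so it can be changed.

```python
from datetime import datetime

# Assumption: the standard time epoch is taken to be 1600-01-01 00:00:00.
ASSUMED_STANDARD_EPOCH = datetime(1600, 1, 1)
SECONDS_PER_DAY = 86400.0


def seconds_since_to_standard_days(seconds, ref, epoch=ASSUMED_STANDARD_EPOCH):
    """Convert seconds counted from `ref` into fractional days since `epoch`."""
    # Python's datetime uses the proleptic Gregorian calendar throughout.
    offset_days = (ref - epoch).total_seconds() / SECONDS_PER_DAY
    return [offset_days + s / SECONDS_PER_DAY for s in seconds]


days = seconds_since_to_standard_days([0.0, 43200.0], datetime(1600, 1, 2))
print(days)
```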
-
cis.time_util.
convert_std_time_to_datetime
(std_time)¶
-
cis.time_util.
convert_time_since_to_std_time
(time_array, units)¶
-
cis.time_util.
convert_time_using_time_stamp_info_to_std_time
(time_array, units, time_stamp_info=None)¶ Convert the time using time stamp info and the first word of the units
Parameters: - time_array – the time array to convert
- units – the units of the array (e.g. day or Days from the file time reference 2012-12-12)
- time_stamp_info – the time stamp to use for the conversion
Returns: converted data
cis.utils module¶
-
class
cis.utils.
OrderedSet
(iterable=None)¶ Bases:
_abcoll.MutableSet
From http://code.activestate.com/recipes/576694/
-
add
(key)¶
-
discard
(key)¶
-
pop
(last=True)¶
-
-
cis.utils.
add_element_to_list_in_dict
(my_dict, key, value)¶
-
cis.utils.
add_file_prefix
(prefix, filepath)¶ Add a prefix to a filename taking into account any path that might be present before that actual filename
Parameters: - prefix – A string to prefix the filename with
- filepath – Filename, optionally including path
Returns: A string with the full path to the prefixed file
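Prefixing only the final path component can be done by splitting the path first. The following is a minimal sketch of that idea using os.path; add_prefix_to_filename is a hypothetical name and this is not necessarily how the CIS function is written.

```python
import os.path


def add_prefix_to_filename(prefix, filepath):
    """Prefix the filename part of `filepath`, leaving any directory part alone."""
    directory, filename = os.path.split(filepath)
    return os.path.join(directory, prefix + filename)


print(add_prefix_to_filename("cis-", os.path.join("results", "out.nc")))
print(add_prefix_to_filename("cis-", "out.nc"))
```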
-
cis.utils.
add_to_list_if_not_none
(item, list)¶ Add a value to a list if it is not None
Parameters: - item – the item to add
- list – the list to append it to
Returns: nothing
-
cis.utils.
apply_intersection_mask_to_two_arrays
(array1, array2)¶ Ensure two (optionally) masked arrays have the same mask. If both arrays are masked the intersection of the masks is used. If one array is masked and the other is not, the mask from the masked array is applied to the unmasked array. If neither array is masked then both arrays are returned as masked arrays with an empty mask.
Parameters: - array1 – An (optionally masked) array
- array2 – Another (optionally masked) array
Returns: Two masked arrays with a common mask
-
cis.utils.
apply_mask_to_numpy_array
(in_array, mask)¶ Element-wise ORs the mask with the mask of the array. If the mask masks no elements, no change is made. If the array is not masked, it is converted to a masked array.
Parameters: - in_array (numpy array or masked array) – input array
- mask (numpy array of boolean) – mask
-
cis.utils.
array_equal_including_nan
(array1, array2)¶ Parameters: - array1 – A numpy array
- array2 – Another numpy array (can be of a different shape)
Returns: True if the arrays are equal, with NaNs in matching positions treated as equal; otherwise False.
-
cis.utils.
calculate_histogram_bin_edges
(data, axis, user_range, step, log_scale=False)¶ Parameters: - data – A numpy array
- axis – The axis on which the data will be plotted. Set to “x” for histogram2d
- user_range – A dictionary containing the min and max values for the edges specified by the user. The data min and max is used if the user did not specify
- step – The distance between each bin edge/the width of each bin
Returns: An array containing a list of bin edges (i.e. when each bin starts and ends)
-
cis.utils.
concatenate
(arrays, axis=0)¶ Concatenate a list of numpy arrays into one larger array along the axis specified (the default axis is zero). If any of the arrays are masked arrays then the returned array will be a masked array with the correct mask, otherwise a numpy array is returned.
Parameters: - arrays – A list of numpy arrays (masked or not)
- axis – The axis along which to concatenate (the default is 0)
Returns: The concatenated array
-
cis.utils.
copy_attributes
(source, dest)¶ Copy all attributes from one object to another
Parameters: - source – Object to copy attributes from
- dest – Object to copy attributes to
Returns: None
-
cis.utils.
create_masked_array_for_missing_data
(data, missing_val)¶
-
cis.utils.
create_masked_array_for_missing_values
(data, missing_values)¶
-
cis.utils.
deprecated
(func)¶ This is a decorator which can be used to mark functions as deprecated. It will result in a warning being emitted when the function is used.
Taken from http://code.activestate.com/recipes/391367-deprecated/
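A decorator of this kind typically wraps the function and emits a DeprecationWarning on each call. The version below is a generic sketch using functools and warnings, not a copy of the CIS source or of the recipe; old_add is a made-up function used only to show the effect.

```python
import functools
import warnings


def deprecated(func):
    """Mark `func` as deprecated: every call emits a DeprecationWarning."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        warnings.warn(
            "Call to deprecated function {}".format(func.__name__),
            category=DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return func(*args, **kwargs)
    return wrapper


@deprecated
def old_add(a, b):
    return a + b


# Record warnings so the effect is visible regardless of the active filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_add(1, 2)

print(result, [str(w.message) for w in caught])
```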
-
cis.utils.
dimensions_equal
(dimensions, other_dimensions)¶ Check to see if two dimensions are the same (contain the same variables in the same order)
Parameters: - dimensions – dimension list
- other_dimensions – other dimension list
-
cis.utils.
expand_1d_to_2d_array
(array_1d, length, axis=0)¶ General utility routine to extend a 1D array into a 2D array by duplicating the data along a given ‘axis’ (default is 0) of size ‘length’.
Examples:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> expand_1d_to_2d_array(a, 4, axis=0)
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]
>>> a = np.array([1, 2, 3, 4])
>>> expand_1d_to_2d_array(a, 4, axis=1)
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]
 [4 4 4 4]]
Parameters: - array_1d –
- length –
- axis –
Returns:
-
cis.utils.
find_longitude_wrap_start
(x_variable, packed_data_items)¶ Only works out the wrap start of the data.
Parameters: - x_variable –
- packed_data_items –
Returns:
-
cis.utils.
fix_longitude_range
(lons, range_start)¶ Shifts longitude values by +/- 360 to fit within a 360 degree range starting at a specified value. It is assumed that no shifts larger than 360 are needed.
Parameters: - lons – numpy array of longitude values
- range_start – longitude at start of 360 degree range into which values are required to fit
Returns: array of fixed longitudes
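The single-shift behaviour described here can be sketched with plain lists. The function name is hypothetical, and treating the target range as half-open, [range_start, range_start + 360), is an assumption of this sketch rather than something the documentation states.

```python
def fix_longitude_range_sketch(lons, range_start):
    """Shift each longitude by at most one turn into [range_start, range_start + 360)."""
    range_end = range_start + 360.0
    fixed = []
    for lon in lons:
        if lon < range_start:
            lon += 360.0
        elif lon >= range_end:  # assumption: the upper bound is exclusive
            lon -= 360.0
        fixed.append(lon)
    return fixed


print(fix_longitude_range_sketch([-170.0, 10.0, 190.0, 360.0], 0.0))
print(fix_longitude_range_sketch([190.0, 350.0, -10.0], -180.0))
```

Because only one shift is applied, a value more than 360 degrees outside the range would remain outside it, which matches the stated assumption.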
-
cis.utils.
get_class_name
(cls)¶ Returns the qualified class name of a class.
Parameters: cls – class Returns: class name
-
cis.utils.
get_coord
(data_object, variable, data)¶ Find a specified coord
Parameters: - data_object –
- variable –
- data –
Returns:
-
cis.utils.
guess_coord_axis
(coord)¶ Returns X, Y, Z or T corresponding to longitude, latitude, altitude or time respectively if the coordinate can be determined to be one of these (based on the standard name only, in this implementation).
This is intended to be similar to iris.util.guess_coord_axis.
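Since the guess is based on the standard name only, the logic reduces to a lookup table. The sketch below takes the standard name as a string rather than a coordinate object; returning None for unrecognised names is an assumption, and the function name is hypothetical.

```python
# Mapping from CF standard name to axis letter, as listed in the description.
AXIS_BY_STANDARD_NAME = {
    "longitude": "X",
    "latitude": "Y",
    "altitude": "Z",
    "time": "T",
}


def guess_axis_from_standard_name(standard_name):
    """Return 'X', 'Y', 'Z' or 'T', or None if the name is not recognised."""
    return AXIS_BY_STANDARD_NAME.get(standard_name)


print(guess_axis_from_standard_name("latitude"))
print(guess_axis_from_standard_name("air_pressure"))
```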
-
cis.utils.
haversine
(lat, lon, lat2, lon2)¶ Computes the Haversine distance between two points
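For reference, the haversine formula gives the great-circle distance between two latitude/longitude pairs on a sphere. The sketch below assumes inputs in degrees and a mean Earth radius of 6371 km; the documentation does not state the radius or the output units, so both are assumptions, as is the function name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, result in km


def haversine_km(lat, lon, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat, lon, lat2, lon2 = map(math.radians, (lat, lon, lat2, lon2))
    a = (math.sin((lat2 - lat) / 2.0) ** 2
         + math.cos(lat) * math.cos(lat2) * math.sin((lon2 - lon) / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


print(haversine_km(0.0, 0.0, 0.0, 90.0))
```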
-
cis.utils.
index_iterator
(shape)¶ Iterates over the indexes of a multi-dimensional array of a specified shape. The last index changes most rapidly.
Parameters: shape – sequence of array dimensions Returns: yields tuples of array indexes
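The iteration order described, with the last index changing most rapidly, is the order produced by itertools.product over one range per dimension. A minimal sketch, with a hypothetical function name:

```python
import itertools


def index_iterator_sketch(shape):
    """Yield index tuples for an array of `shape`, last index varying fastest."""
    yield from itertools.product(*(range(n) for n in shape))


print(list(index_iterator_sketch((2, 3))))
```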
-
cis.utils.
index_iterator_for_non_masked_data
(shape, points)¶ Iterates over the indexes of a multi-dimensional array of a specified shape. The last index changes most rapidly.
Parameters: shape – sequence of array dimensions Returns: yields tuples of array indexes
-
cis.utils.
index_iterator_nditer
(shape, points)¶ Iterates over the indexes of a multi-dimensional array of a specified shape. The last index changes most rapidly.
Parameters: shape – sequence of array dimensions Returns: yields tuples of array indexes
-
cis.utils.
isnan
(number)¶
-
cis.utils.
listify
(item)¶ If item is not a list, return it as a list
Parameters: item – Item which may or may not be a list Returns: List
-
cis.utils.
log_memory_profile
(location)¶ Write the total memory to the log as debug message
Parameters: location – location in the program where the memory measurement was taken Returns: nothing
-
cis.utils.
parse_distance_with_units_to_float_km
(distance)¶ Parse a string such as ‘10km’ or ‘1.0e3m’ to a distance in km
Parameters: distance – string to parse Returns: A distance in km
-
cis.utils.
parse_distance_with_units_to_float_m
(distance)¶ Parse a string such as ‘10km’ or ‘1.0e3m’ to a distance in m
Parameters: distance – string to parse Returns: A distance in m
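Parsing such strings involves separating a numeric part (possibly in exponent notation) from a unit suffix and then scaling. The sketch below handles only ‘m’ and ‘km’ and treats a missing unit as km; both choices, and all names in the block, are assumptions for illustration.

```python
import re

# A float (optionally with exponent) followed by optional alphabetic units.
DISTANCE_PATTERN = re.compile(
    r"^\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*([a-zA-Z]*)\s*$"
)
METRES_PER_UNIT = {"km": 1000.0, "m": 1.0}


def parse_distance_to_km(distance):
    """Parse a string such as '10km' or '1.0e3m' into a distance in km."""
    match = DISTANCE_PATTERN.match(distance)
    if match is None:
        raise ValueError("Cannot parse distance: {!r}".format(distance))
    value = float(match.group(1))
    units = match.group(2).lower()
    if units == "":
        return value  # assumption: a bare number is already in km
    if units not in METRES_PER_UNIT:
        raise ValueError("Unknown distance units: {!r}".format(units))
    return value * METRES_PER_UNIT[units] / 1000.0


print(parse_distance_to_km("10km"), parse_distance_to_km("1.0e3m"))
```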
-
cis.utils.
parse_key_val_list
(input_list)¶ Takes a list of keyword value strings (separated by =) and returns a dictionary with those keys and values. Note: if a key has no value, the key is stored and given the value True.
Parameters: input_list – A list of strings which are keyword value pairs separated by = Returns: A dictionary of the keywords and values
-
cis.utils.
parse_key_val_string
(arguments, separator)¶ Takes a (comma) separated list of keyword value pairs (separated by =) and returns a dictionary with those keys and values
Parameters: - arguments – A string which is a separated list of keyword value pairs
- separator – String which is used to split the string into a list
Returns: A dictionary of the keywords and values
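The two parsing helpers compose naturally: the string version splits on the separator and hands the pieces to the list version. The sketch below illustrates this with hypothetical function names; stripping whitespace around keys and values is an assumption, and the example argument string is invented.

```python
def parse_key_val_list_sketch(input_list):
    """Turn ['a=1', 'flag'] into {'a': '1', 'flag': True}."""
    result = {}
    for item in input_list:
        key, equals, value = item.partition("=")
        # A key without '=' is stored with the value True.
        result[key.strip()] = value.strip() if equals else True
    return result


def parse_key_val_string_sketch(arguments, separator):
    """Split `arguments` on `separator`, then parse each keyword=value piece."""
    return parse_key_val_list_sketch(arguments.split(separator))


print(parse_key_val_string_sketch("itemstyle=o,color=red,grid", ","))
```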
-
cis.utils.
remove_file_prefix
(prefix, filepath)¶ Remove a prefix from a filename, taking into account any path that might be present before that actual filename
Parameters: - prefix – The prefix to remove
- filepath – Filename, optionally including path
Returns: A string with the full path to the un-prefixed file
-
cis.utils.
set_cube_standard_name_if_valid
(cube, standard_name)¶ Set a cube’s standard name if it is a valid CF compliant name, otherwise set it to None
Parameters: - cube – Cube to set standard name on
- standard_name – Standard name to set
Returns:
-
cis.utils.
split_into_float_and_units
(measurement)¶ Split a string such as ‘1000m’ or ‘1.0e3’ to a value and, optionally, units
Parameters: measurement – string to parse Returns: The value and, if present, the units
-
cis.utils.
unpack_data_object
(data_object, x_variable, y_variable, x_wrap_start)¶ Parameters: data_object – A cube or an UngriddedData object Returns: A dictionary containing x, y and data as numpy arrays
-
cis.utils.
wrap_longitude_coordinate_values
(x_min, x_max)¶
cis.exceptions module¶
Custom CIS exceptions
-
exception
cis.exceptions.
CISError
¶ Bases:
exceptions.Exception
-
exception
cis.exceptions.
ClassNotFoundError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
CoordinateNotFoundError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
DuplicateCoordinateError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
FileFormatError
(error_list, *args, **kwargs)¶ Bases:
cis.exceptions.CISError
Throw when there is an error determining the type of a file
-
error_list
= ['Unknown error']¶
-
-
exception
cis.exceptions.
InconsistentDimensionsError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidCommandLineOptionError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidDataTypeError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidDimensionError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidFileExtensionError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidHistogramStyleError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidLineStyleError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidNumberOfDatagroupsSpecifiedError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidOperationError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidPlotFormatError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidPlotTypeError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
InvalidVariableError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
NoDataInSubsetError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
NotEnoughAxesSpecifiedError
¶ Bases:
cis.exceptions.CISError
-
exception
cis.exceptions.
UserPrintableException
(message)¶ Bases:
cis.exceptions.CISError
This exception is thrown if the program has failed for a known reason. This message is printed without a stack trace