Welcome to the Community Intercomparison Suite’s documentation!

Contents:

Installing CIS

A pre-packaged version of CIS is available for installation using conda for 64-bit Linux, Mac OSX and Windows.

Once conda is installed, you can easily install CIS with the following command:

$ conda install -c cistools -c scitools cis

If you don’t already have conda, you must first download and install it. Anaconda is a free distribution that includes conda, Python and many common scientific and data analysis libraries. Further documentation on using Anaconda and the features it provides can be found at http://docs.continuum.io/anaconda/index.html.

To check that CIS is installed correctly, simply type cis version to display the version number, for example:

$ cis version
Using CIS version: V1R2M0 (Stable)

To upgrade CIS to the latest version, use:

$ conda update cis

Dependencies

If you choose to install the dependencies yourself, use the following command to check the required dependencies are present:

$ python setup.py checkdep

What’s new in CIS

What’s new in CIS 1.2

This page documents the new features added and bugs fixed in CIS since version 1.1. See all changes here: https://github.com/cedadev/cis/compare/1.1.0...1.2.0

CIS 1.2 features

  • All new cis info command provides much more detailed information about ungridded data variables and enables multiple variables to be output at a time.
  • Updated a number of routines to take advantage of Iris 1.8 features. In particular gridded-gridded collocation using the nearest neighbour kernel should be significantly faster. Iris 1.8 is now the minimum version required for CIS.
  • Gridded-ungridded collocation now supports collocation from cubes with hybrid height or hybrid pressure coordinates for both nearest neighbour and linear interpolation kernels.
  • Built-in support for reading multiple HadGEM .pp files directly.
  • All new API and plugin development documentation, including a number of tutorials.

Bugs fixed

  • JASCIS-253 - Any ungridded points which contain a NaN in any of their coordinate values will now be ignored by CIS
  • JASCIS-250 - Multiple HadGEM files can now be read correctly through the new data plugins.
  • JASCIS-197 - Gridded-gridded collocation now respects scalar coordinates
  • JASCIS-199 - Aggregation now correctly uses the bounds supplied by the user, even when collapsing to length one coordinates.
  • Speed improvement to the ungridded-gridded collocation using linear interpolation
  • Several bug fixes for reading multiple GASSP ship files
  • Renamed and restructured the collocation modules for consistency
  • Many documentation spelling and formatting updates
  • Many code formatting updates for PEP8 compliance

What’s new in CIS 1.1

This page documents the new features added and bugs fixed in CIS since version 1.0. For more detail see all changes here: https://github.com/cedadev/cis/compare/1.0.0...1.1.0

CIS 1.1 features

  • JASMIN-CIS is now called CIS, and the packages, modules and documentation have been renamed accordingly.
  • Conda packages are now available to allow much easier installation of CIS, and across more platforms: Linux, OSX and Windows.
  • PyHDF is now an optional dependency. This makes the installation of CIS on e.g. Windows much easier when HDF reading is not required.

Bugs fixed

  • JASCIS-243 - Error when reading multiple GASSP aircraft files
  • JASCIS-139 - Updated ungridded aggregation to rename any variables which clash with coordinate variables, as such clashes would otherwise break the output.
  • Compatibility fixes for Numpy versions >1.8 and Python-NetCDF versions >1.1.
  • Fix Caliop pressure units which were stored as hPA, but need to be hPa to conform to CF.
  • The integration test data has been moved completely out of the repository - making the download quicker and less bloated. Its location can be specified by setting the CIS_DATA_HOME environment variable.
  • A test runner has been created to allow easy running of the unit and integration tests.

What’s new in CIS 1.1.1

This section documents changes in CIS since version 1.1; these were primarily bug fixes and documentation updates. See all changes here: https://github.com/cedadev/cis/compare/1.1.0...1.1.1

Bugs fixed

  • JASCIS-181 - Updated eval documentation
  • JASCIS-239 - Documented the requirement of PyHamCrest for running tests
  • JASCIS-249 - CIS will now accept variables and filenames (such as Windows paths) which include a colon as long as they are escaped with a backslash. E.g. cis plot my_var:C\:\my_file.nc.
  • Occasionally HDF will exit when reading an invalid HDF file without throwing any exceptions. To protect against this the HDF reader will now insist on an .hdf extension for any files it reads.

What kind of data can CIS deal with?

Writing

When creating files from a CIS command, CIS uses the NetCDF 4 classic format. Ungridded output files are always prefixed with cis-, and both ungridded and gridded output are always suffixed with .nc.

Reading

CIS has built-in support for NetCDF and HDF4 file formats. That said, most data requires some sort of pre-processing before being ready to be plotted or analysed (this could be scale factors or offsets needing to be applied, or even just knowing what the dependencies between variables are). For that reason, the way CIS deals with reading in data files is via the concept of “data products”. Each product has its own very specific way of reading and interpreting the data in order for it to be ready to be plotted, analysed, etc.

So far, CIS can read the following ungridded data files:

Dataset           Product name       Type             File Signature
AERONET           Aeronet            Ground-stations  *.lev20
Aerosol CCI       Aerosol_CCI        Satellite        *ESACCI*AEROSOL*
CALIOP L1         Caliop_L1          Satellite        CAL_LID_L1-ValStage1-V3*.hdf
CALIOP L2         Caliop_L2          Satellite        CAL_LID_L2_05kmAPro-Prov-V3*.hdf
CloudSat          CloudSat           Satellite        *_CS_*GRANULE*.hdf
Flight campaigns  NCAR_NetCDF_RAF    Aircraft         RF*.nc
MODIS L2          MODIS_L2           Satellite        *MYD06_L2*.hdf, *MOD06_L2*.hdf, *MYD04_L2*.hdf, *MOD04_L2*.hdf, *MYDATML2.*.hdf, *MODATML2*.hdf
Cloud CCI         Cloud_CCI          Satellite        *ESACCI*CLOUD*
CSV datapoints    ASCII_Hyperpoints  N/A              *.txt
CIS ungridded     cis                CIS output       cis-*.nc
NCAR-RAF          NCAR_NetCDF_RAF    Aircraft         *.nc containing the attribute Conventions with the value NCAR-RAF/nimbus
GASSP             NCAR_NetCDF_RAF    Aircraft         *.nc containing the attribute GASSP_Version
GASSP             NCAR_NetCDF_RAF    Ship             *.nc containing the attribute GASSP_Version, with no altitude
GASSP             NCAR_NetCDF_RAF    Ground-station   *.nc containing the attribute GASSP_Version, with attributes Station_Lat, Station_Lon and Station_Altitude

It can also read the following gridded data types:

Dataset              Product name    Type                File Signature
MODIS L3 daily       MODIS_L3        Satellite           *MYD08_D3*.hdf, *MOD08_D3*.hdf, *MOD08_E3*.hdf
HadGEM pp data       HadGEM_PP       Gridded Model Data  *.pp
NetCDF Gridded Data  NetCDF_Gridded  Gridded Model Data  *.nc (this is the default for NetCDF files that do not match any other signature)

The file signature is used to automatically recognise which product definition to use. Note that the product can easily be overridden by specifying it at the command line.

This is of course far from being an exhaustive list of what’s out there. To cope with this, a “plugin” architecture has been designed so that users can readily add their own data product reading routines, without having to change the core code - see the plugin development page for more information. There are also mechanisms to allow you to override the default behaviour if the built-in products listed above do not achieve the desired results.

Datagroups

Most CIS commands operate on a ‘datagroup’, which is a unit of data containing one or more similar variables and one or more files from which those variables should be taken. A datagroup represents closely related data from a specific instrument or model and as such is associated with only one data product.

A datagroup is specified with the syntax:

<variable>...:<filename>[:product=<productname>] where:

  • <variable> is a mandatory argument specifying the variable or variable names to use. This should be the name of the variable as described in the file, e.g. the NetCDF variable name or HDF SDS/VDATA variable name. Multiple variables may be specified, separated by commas, and variables may be wildcarded using any wildcards compatible with the python module glob, so that *, ? and [] can all be used.

Attention

When specifying multiple variables, it is essential that they be on the same grid (i.e. use the same coordinates).

  • <filenames> is a mandatory argument used to specify the files to read the variable from. These can be specified as a comma separated list of the following possibilities:

    1.   a single filename - this should be the full path to the file

    2.   a single directory - all files in this directory will be read

    3.   a wildcarded filename - A filename with any wildcards compatible with the python module glob, so that *, ? and [] can all be used. E.g., /path/to/my/test*file_[0-9].

Attention

When multiple files are specified (whether through use of commas, pointing at a directory, or wildcarding), then all those files must contain all of the specified variables, and the files should be ‘compatible’ - it should be possible to aggregate them together using a shared dimension - typically time (in a NetCDF file this is usually the unlimited dimension). So selecting multiple monthly files for a model run would be OK, but selecting files from two different datatypes would not be OK.

  • <productname> is an optional argument used to specify the type of files being read. If omitted, the program will attempt to figure out which product to use based on the filename. See Reading to see a list of available products and their file signatures.

For example:

illum:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc
Cloud_Fraction_*:MOD*,MODIS_dir/:product=MODIS_L2

Some file paths or variable names might contain colons (:); these need to be escaped so that CIS can tell the difference between them and the colons used to separate datagroup elements. Simply use a backslash (\) to escape these characters. For example:

"TOTAL RAINFALL RATE\: LS+CONV KG/M2/S:C\:\My files\MODIS_dir:product=MODIS_L2"

Notice that we have used outer quotes to allow for the spaces in the variable and file names, and used the backslashes to escape the colons.

Reading hybrid height data with separate orography data

CIS supports the reading of gridded data containing hybrid height and pressure fields, with an orography field supplied in a separate file. The file containing the orography field (which should be properly referenced from a formula term in the data file) can just be appended to the list of files to be read in and CIS will attempt to create an appropriate altitude dimension.
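
As a sketch, a model file with hybrid height coordinates and its accompanying orography file (both filenames here are illustrative) might be read together like this:

$ cis subset air_temperature:my_model_data.nc,my_orography.nc t=[2010-01] -o temp-with-altitude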

Reading NetCDF4 Hierarchical Groups

CIS supports the reading of NetCDF4 hierarchical groups. These can be specified on the command line in the format <group>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber. Groups can be nested to any required depth like <group1>.<group2...>.<variable_name>.
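
For example, to get information about the grouped variable mentioned above (the filename is illustrative):

$ cis info AVHRR.Ch4CentralWavenumber:my_satellite_file.nc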

CIS currently does not support writing out of NetCDF4 groups, so any groups read in will be output ‘flat’.

Reading groups in user-developed product plugins

Most of the methods in the cis.data_io.netcdf module support netCDF4 groups using the syntax described above - users should use this module when designing their own plugins to ensure support for groups.

Example plots

_images/model.png _images/line.png _images/MODIS_L2.png _images/MODIS_L3.png _images/seviri-ctt.png _images/aerosol_cci.png _images/comparative_scatter_Aeronet.jpg _images/comparativehistogram2d.png _images/agoufou_18022013_all_three.gif _images/cloudcci.png _images/cloudsat_RVOD.png _images/caliop_l1b.png _images/aircraft.png

Using the command line

Run the following command to print help and check that it runs: cis --help

The following should be displayed:

usage: cis [-h] {plot,info,col,aggregate,subset,eval,stats,version} ...

positional arguments:
  {plot,info,col,aggregate,subset,eval,stats,version}
    plot           Create plots
    info           Get information about a file
    col            Perform collocation
    aggregate      Perform aggregation
    subset         Perform subsetting
    eval           Evaluate a numeric expression
    stats          Perform statistical comparison of two datasets
    version        Display the CIS version number

optional arguments:
  -h, --help   show this help message and exit

There are 8 commands the program can execute:

  • plot which is used to plot the data
  • info which prints information about a given input file
  • col which is used to perform collocation on data
  • aggregate which is used to perform aggregation along coordinates in the data
  • subset which is used to perform subsetting of the data
  • eval which is used to evaluate a numeric expression on data
  • stats which is used to perform a statistical comparison of two datasets
  • version which is used to display the version number of CIS

If an error occurs while running any of these commands, you may wish to check the log file ‘cis.log’; the default location for this is the current user’s home directory.

LSF Batch Job Submission

CIS jobs may be submitted to an LSF type batch submission system (e.g. the JASMIN environment) by using the command cis.lsf instead of cis. In this case the job will be sent to the batch system and any output will be written to the log file.
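
For example, the collocation command shown later in this documentation could be submitted as a batch job simply by changing the command name (the datagroup here is illustrative):

$ cis.lsf col rain:"my_data_??.*" my_sample_file:collocator=box[h_sep=50km],kernel=mean -o my_col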

Getting file information

Running cis info $filenames will print a list of the variables available in those files, such as:

Trop
latitude
longitude_1
surface
unspecified_1
level6
ht
msl
latitude_1

To get more specific information about a given variable, simply run:

$ cis info $filenames --variable $var1 $var2 $var3

where $var1, $var2 and $var3 are the names of the variables to get the information for.

Other options available include:
  • --product which allows the user to override the default product for the files, and
  • --type which allows the user to list only SD or VD variables from an HDF file, the default is All
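
For example, to list only the SD variables from an HDF file:

$ cis info $filenames --type SD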

Here is an example of the output:

Ungridded data: SO4 / (ug m-3)
     Shape = (6478,)
     Total number of points = 6478
     Number of non-masked points = 6478
     Long name = Sulphate
     Standard name = SO4
     Units = ug m-3
     Missing value = -9999
     Range = (-0.57346399999999997, 7.0020300000000004)
     History =
     Coordinates:
       time
          Long name = Starting time
          Standard name = time
          Units = days since 1600-01-01 00:00:00
          Calendar = gregorian
          Missing value = -9999
          Range = ('2008-07-10 02:04:35', '2008-07-20 09:50:33')
          History =
       latitude
          Long name = Latitude
          Standard name = latitude
          Units = N degree
          Missing value = -9999
          Range = (4.0211802, 7.14886)
          History =
       longitude
          Long name = Longitude
          Standard name = longitude
          Units = E degree
          Missing value = -9999
          Range = (114.439, 119.733)
          History =
       altitude
          Long name = Altitude
          Standard name = altitude
          Units = m
          Missing value = -9999
          Range = (51.164299, 6532.6401)
          History =

Subsetting

Subsetting allows the reduction of data by extracting variables and restricting them to ranges of one or more coordinates.

To perform subsetting, run a command of the format:

$ cis subset <datagroup> <limits> [-o <outputfile>]

where:

<datagroup>

is a CIS datagroup specifying the variables and files to read and is of the format <variable>...:<filename>[:product=<productname>] where:

  • variable is a mandatory variable or list of variables to use.
  • filenames is a mandatory file or list of files to read from.
  • product is an optional CIS data product to use (see Data Products).

See Datagroups for a more detailed explanation of datagroups.

<limits>

is a comma separated sequence of one or more coordinate range assignments of the form variable=[start,end] or variable=[value] in which

  • variable is the name of the variable to be subsetted, or one of x, y, z or t, which refer to longitude, latitude, altitude or time, respectively.
  • start is the value at the start of the coordinate range to be included
  • end is the value at the end of the coordinate range to be included
  • value is taken as the start and end value.

Note

Longitude coordinates are considered to be circular, so that -10 is equivalent to 350. The start and end must describe a monotonically increasing coordinate range, so x=[90,-90] is invalid, but could be specified using x=[90,270]. The range between the start and end must not be greater than 360 degrees. The output coordinates will be on the requested grid, not the grid of the source data.

Note

Date/times are specified in the format: YYYY-MM-DDThh:mm:ss in which YYYY-MM-DD is a date and hh:mm:ss is a time. A colon or space can be used instead of the ‘T’ separator (but if a space is used, the argument must be quoted). Any trailing components of the date/time may be omitted. When a date/time is used as a range start, the earliest date/time compatible with the supplied components is used (e.g., 2010-04 is treated as 2010-04-01T00:00:00) and when used as a range end, the latest compatible date/time is used. Including optional and alternative components, the syntax is YYYY[-MM[-DD[{T|:| }hh[:mm[:ss]]]]]. When the t=[value] form is used, value is interpreted as both the start and end value, as described above, giving a range spanning the specified date/time, e.g., t=[2010] gives a range spanning the whole of the year 2010.
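
For example, using the t=[value] form with an illustrative variable and filename, the following would extract all points falling within June 2010:

$ cis subset my_var:my_file.nc t=[2010-06] -o june-subset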

<outputfile>
is an optional argument to specify the name to use for the file output. This is automatically given a .nc extension and, if it contains ungridded data, prepended with cis- to make it distinguishable as ungridded CIS output. The default filename is cis-out.nc for ungridded data, and out.nc for gridded data.

A full example would be:

$ cis subset solar_3:xglnwa.pm.k8dec-k9nov.col.tm.nc x=[0,180],y=[0,90] -o Xglnwa-solar_3

Gridded netCDF data is output as gridded data, while ungridded and non-netCDF gridded data is output as ungridded data.

Examples

Below are examples of subsetting using each of the supported products (together with a command to plot the output):

$ cis subset AO2CO2:RF04.20090114.192600_035100.PNI.nc t=[2009-01-14:19:26:00,2009-01-14:19:36:00] -o RF04-AO2CO2-out
$ cis plot AO2CO2:cis-RF04-AO2CO2-out.nc

$ cis subset IO_RVOD_ice_water_content:2007180125457_06221_CS_2B-CWC-RVOD_GRANULE_P_R04_E02.hdf t=[2007-06-29:13:00,2007-06-29:13:30] -o CloudSAT-out
$ cis plot IO_RVOD_ice_water_content:cis-CloudSAT-out.nc --xaxis=time --yaxis=altitude

$ cis subset Cloud_Top_Temperature:MYD06_L2.A2011100.1720.051.2011102130126.hdf x=[-50,-40],y=[0,10] -o MODIS_L2-out
$ cis plot Cloud_Top_Temperature:cis-MODIS_L2-out.nc

$ cis subset cwp:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc x=[85,90],y=[-3,3] -o Cloud_CCI-out
$ cis plot atmosphere_mass_content_of_cloud_liquid_water:cis-Cloud_CCI-out.nc

$ cis subset AOD870:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[-5,20],y=[15,25] -o Aerosol_CCI-out
$ cis plot atmosphere_optical_thickness_due_to_aerosol:cis-Aerosol_CCI-out.nc

$ cis subset 440675Angstrom:920801_121229_Abracos_Hill.lev20 t=[2002] -o Aeronet-out
$ cis plot 440675Angstrom:cis-Aeronet-out.nc --xaxis=time --yaxis=440675Angstrom

$ cis subset solar_3:xglnwa.pm.k8dec-k9nov.vprof.tm.nc y=[0,90] -o Xglnwa_vprof-out
$ cis plot solar_3:Xglnwa_vprof-out.nc

$ cis subset solar_3:xglnwa.pm.k8dec-k9nov.col.tm.nc x=[0,180],y=[0,90] -o Xglnwa-out
$ cis plot solar_3:Xglnwa-out.nc

$ cis subset Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf x=[0,179.9],y=[0,90] -o MODIS_L3-out
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MODIS_L3-out.nc

The files used above can be found at:

/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files/
  2007180125457_06221_CS_2B-CWC-RVOD_GRANULE_P_R04_E02.hdf
  20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
  20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc
  MOD08_E3.A2010009.005.2010026072315.hdf
  MYD06_L2.A2011100.1720.051.2011102130126.hdf
  RF04.20090114.192600_035100.PNI.nc
  xglnwa.pm.k8dec-k9nov.col.tm.nc
  xglnwa.pm.k8dec-k9nov.vprof.tm.nc
/group_workspaces/jasmin/cis/data/aeoronet/AOT/LEV20/ALL_POINTS/
  920801_121229_Abracos_Hill.lev20

Aggregation

The Community Intercomparison Suite (CIS) has the ability to aggregate both gridded and ungridded data along one or more coordinates. For example, you might aggregate a dataset over the longitude coordinate to produce an averaged measurement of variation over latitude.

CIS supports ‘complete collapse’ of a coordinate - where all values in that dimension are aggregated so that the coordinate no longer exists - and ‘partial collapse’ - where a coordinate is aggregated into bins of fixed size, so that the coordinate still exists but is on a coarser grid. Partial collapse is currently only supported for ungridded data. The output of an aggregation is always a CF compliant gridded NetCDF file.

The aggregation command has the following syntax:

$ cis aggregate <datagroup>[:options] <grid> [-o <outputfile>]

where:

<datagroup>

is a CIS datagroup specifying the variables and files to read and is of the format <variable>...:<filename>[:product=<productname>] where:

  • <variable> is a mandatory variable or list of variables to use.
  • <filenames> is a mandatory file or list of files to read from.
  • <productname> is an optional CIS data product to use (see Data Products).

See Datagroups for a more detailed explanation of datagroups.

<options>

Optional arguments given as keyword=value in a comma separated list. Options are:

  • kernel=<kernel> - the method by which the value in each aggregation cell is determined. <kernel> should be one of:

    • mean - use the mean value of all the data points in that aggregation cell. For gridded data, this mean is weighted to take into account differing cell areas due to the projection of lat/lon lines on the Earth.
    • min - use the lowest valid value of all the data points in that aggregate cell.
    • max - use the highest valid value of all the data points in that aggregate cell.
    • moments - In addition to returning the mean value of each cell (weighted where applicable), this kernel also outputs the number of points used to calculate that mean and the standard deviation of those values, each as a separate variable in the output file.

    If not specified the default is moments.

  • product=<productname> is an optional argument used to specify the type of files being read. If omitted, CIS will attempt to figure out which product to use based on the filename. See Reading to see a list of available product names and their file signatures.

<grid>

This mandatory argument specifies the coordinates to aggregate over and whether they should be completely collapsed or aggregated into bins. Multiple coordinates can be aggregated over, in which case they should be separated by commas. Coordinates may be identified using their variable names (e.g. latitude) or by choosing from x, y, t, z, p which refer to longitude, latitude, time, altitude and pressure respectively.

  • Complete collapse - To perform a complete collapse of a coordinate, simply provide the name of the coordinate(s) as a comma separated list - e.g. x,y will aggregate data completely over both latitude and longitude. For ungridded data this will result in length one coordinates with bounds reflecting the maximum and minimum values of the collapsed coordinate.
  • Partial collapse - To aggregate a coordinate into bins, specify the start, end and step size of those bins in the form coordinate=[start,end,step]. The step may be omitted, in which case a single bin will span the whole range given. Partial collapse is currently only supported for ungridded data.

Note

Longitude coordinates are considered to be circular, so that -10 is equivalent to 350. The start and end must describe a monotonically increasing coordinate range, so x=[90,-90,10] is invalid, but could be specified using x=[90,270,10]. The range between the start and end must not be greater than 360 degrees.

Complete and partial collapses may be mixed where applicable - for example, to completely collapse time and to aggregate latitude on a grid from -45 degrees to 45 degrees, using a step size of 10 degrees:

t,y=[-45,45,10]

Note

For ungridded data, if a coordinate is left unspecified it is collapsed completely. This is in contrast to gridded data where a coordinate left unspecified is not used in the aggregation at all.

Note

The range specified is the very start and end of the grid; the actual midpoints of the aggregation cells will start at start + step/2.
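For example, with y=[-45,45,10] the midpoints of the aggregation cells would be -40, -30, ..., 40.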

Date/times:

Date/times are specified in the format: YYYY-MM-DDThh:mm:ss in which YYYY-MM-DD is a date and hh:mm:ss is a time. A colon or space can be used instead of the ‘T’ separator (but if a space is used, the argument must be quoted). Any trailing components of the date/time may be omitted. When a date/time is used as a range start, the earliest date/time compatible with the supplied components is used (e.g., 2010-04 is treated as 2010-04-01T00:00:00) and when used as a range end, the latest compatible date/time is used. Including optional and alternative components, the syntax is YYYY[-MM[-DD[{T|:| }hh[:mm[:ss]]]]].

Date/time steps are specified in the ISO 8601 format PnYnMnDTnHnMnS, where any particular time period is optional, for example P1MT30M would specify a time interval of 1 month and 30 minutes. Years and months are treated as calendar years and months, meaning they are not necessarily fixed in length. For example, a date interval of 1 year and 1 month would mean going from 12:00 15th April 2013 to 12:00 15th May 2014. There are two exceptions to this: in rare cases, such as starting at 30th January and going forward 1 month, the month is instead treated as a period of 28 days. Also, for the purposes of finding the midpoints of cells within a month, the month is always treated as 30 days. For example, to start on the 3rd November 2011 at 12:00 and aggregate over each month up to 3rd January 2013 at 12:00:

  • t=[2011-11-03T12:00,2013-01,P1M]

<outputfile>
is an optional argument to specify the name to use for the file output. This is automatically given a .nc extension if not present. This must not be the same file path as any of the input files. If not supplied, the default filename is out.nc.

A full example would be:

$ cis aggregate rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_*.nc:product=NetCDF_Gridded,kernel=mean t,y=[-90,90,20],x -o rsutcs-mean

Conditional Aggregation

Sometimes you may want to perform an aggregation over all the points that meet a certain criteria - for example, aggregating satellite data only where the cloud cover fraction is below a certain threshold. This is possible by performing a CIS evaluation on your data first - see Using Evaluation for Conditional Aggregation

Aggregation Examples

Ungridded aggregation

Aircraft Track

Original data:

$ cis plot TT_A:RF04.20090114.192600_035100.PNI.nc --xmin -180 --xmax -120 --ymin 0 --ymax 90
_images/NCAR-RAF-1.png

Aggregating onto a coarse grid:

$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc x=[-180,-120,3],y=[0,90,3] -o NCAR_RAF-1
$ cis plot TT_A:NCAR_RAF-1.nc
_images/NCAR-RAF-2.png

Aggregating onto a fine grid:

$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc x=[180,240,0.3],y=[0,90,0.3] -o NCAR_RAF-2
$ cis plot TT_A:NCAR_RAF-2.nc
_images/NCAR-RAF-3.png

Aggregating with altitude and time:

$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc t=[2009-01-14T19:30,2009-01-15T03:45,PT30M],z=[0,15000,1000] -o NCAR_RAF-3
$ cis plot TT_A:NCAR_RAF-3.nc --xaxis time --yaxis altitude
_images/NCAR-RAF-4.png

Aggregating with altitude and pressure:

$ cis aggregate TT_A:RF04.20090114.192600_035100.PNI.nc p=[100,1100,20],z=[0,15000,500] -o NCAR_RAF-4
$ cis plot TT_A:NCAR_RAF-4.nc --xaxis altitude --yaxis air_pressure --logy
_images/NCAR-RAF-5.png

MODIS L3 Data

Original data:

$ cis plot Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf
_images/MODIS-6.png

Aggregating with a mean kernel:

$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf x=[-180,180,10],y=[-90,90,10] -o cloud-mean
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-mean.nc
_images/MODIS-7.png

Aggregating with the standard deviation kernel:

$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf:kernel=stddev x=[-180,180,10],y=[-90,90,10] -o cloud-stddev
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-stddev.nc
_images/MODIS-7.png

Aggregating with the maximum kernel:

$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf:kernel=max x=[-180,180,10],y=[-90,90,10] -o cloud-max
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-max.nc
_images/MODIS-9.png

Aggregating with the minimum kernel:

$ cis aggregate Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf:kernel=min x=[-180,180,10],y=[-90,90,10] -o cloud-min
$ cis plot Cloud_Top_Temperature_Mean_Mean:cloud-min.nc
_images/MODIS-10.png

Gridded aggregation

Aggregating 3D model data over time and longitude to produce an averaged measure of variation with latitude:

$ cis aggregate rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc:kernel=mean t,x -o agg-out.nc
$ cis plot rsutcs:agg-out.nc --xaxis latitude --yaxis rsutcs -o gridded_collapse.png
_images/gridded_collapse.png

This file can be found in:

/group_workspaces/jasmin/cis/data/CMIP5

Collocation

One of the key features of the Community Intercomparison Suite (CIS) is the ability to collocate one or more arbitrary data sets onto a common set of coordinates. This page briefly describes how to perform collocation in a number of scenarios.

To perform collocation, run a command of the format:

$ cis col <datagroup> <samplegroup> -o <outputfile>

where:

<datagroup>

is a CIS datagroup specifying the variables and files to read and is of the format <variable>...:<filename>[:product=<productname>] where:

  • <variable> is a mandatory variable or list of variables to use.
  • <filenames> is a mandatory file or list of files to read from.
  • <productname> is an optional CIS data product to use (see Data Products).

See Datagroups for a more detailed explanation of datagroups.

<samplegroup>

is of the format <filename>[:<options>] The available options are described in more detail below. They are entered in a comma separated list, such as variable=Temperature,collocator=bin,kernel=mean. Not all combinations of collocator and data are available; see Available Collocators.

  • <filename> is a single filename with the points to collocate onto.

  • variable is an optional argument used to specify which variable’s coordinates to use for collocation. If a variable is specified, a missing value will be set in the output file at every point for which the sample variable has a missing value. If a variable is not specified, non-missing values will be set at all sample points unless collocation at a point does not result in a valid value.

  • collocator is an optional argument that specifies the collocation method. Parameters for the collocator, if any, are placed in square brackets after the collocator name, for example, collocator=box[fill_value=-999,h_sep=1km]. If not specified, a Default Collocator is identified for your data / sample combination. The collocators available are:

    • bin For use only with ungridded data and gridded sample points. Data points are placed in bins corresponding to the cell bounds surrounding each grid point. The bounds are taken from the gridded data if they are defined, otherwise the mid-points between grid points are used. The binned points should then be processed by one of the kernels to give a numeric value for each bin.

    • box For use with gridded and ungridded sample points and data. A search region is defined by the parameters and points within the defined separation of each sample point are associated with the point. The points should then be processed by one of the kernels to give a numeric value for each bin. The parameters defining the search box are:

      • h_sep - the horizontal separation. The units can be specified as km or m (for example h_sep=1.5km); if none are specified then the default is km.
      • a_sep - the altitude separation. The units can be specified as km or m, as for h_sep; if none are specified then the default is m.
      • p_sep - the pressure separation. This is not an absolute separation as for h_sep and a_sep, but a relative one, so is specified as a ratio. For example a constraint of p_sep = 2, for a point at 10 hPa, would cover the range 5 hPa < points < 20 hPa. Note that p_sep >= 1.
      • t_sep - the time separation. This can be specified in years, months, days, hours, minutes or seconds using PnYnMnDTnHnMnS (the T separator can be replaced with a colon or a space, but if using a space quotes are required). For example to specify a time separation of one and a half months and thirty minutes you could use t_sep=P1M15DT30M. It is worth noting that the units for time comparison are fractional days, so that years are converted to the number of days in a Gregorian year, and months are 1/12th of a Gregorian year.

      If h_sep is specified, a k-d tree index based on longitudes and latitudes of data points is used to speed up the search for points. If h_sep is not specified, an exhaustive search is performed for points satisfying the other separation constraints.

    • lin For use with gridded source data only. A value is calculated by linear interpolation for each sample point. The extrapolation mode can be controlled with the extrapolate keyword. The default mode is not to extrapolate values for sample points outside of the gridded data source (masking them in the output instead). Setting extrapolate=True will override this and instruct the kernel to extrapolate these values outside of the data source instead.

      Sometimes it can be useful to use a different kernel in the vertical direction; for example, when collocating ship data you may want to linearly interpolate the data points horizontally and in time, but just take the nearest vertical value. Set the nn_vertical keyword to True to set the vertical interpolation to nearest neighbour rather than linear interpolation. Note, this will only work when the vertical coordinates of the source data are hybrid height or hybrid pressure.

    • nn For use with gridded source data only. The data point closest to each sample point is found, and the data value is set at the sample point.

    • dummy For use with ungridded data only. Returns the source data as the collocated data irrespective of the sample points. This might be useful if variables from the original sample file are wanted in the output file but are already on the correct sample points.

    Collocators have the following general optional parameters, which can be used in addition to any specific ones listed above:

    • fill_value - The numerical value to apply to the collocated point if there are no points which satisfy the constraint.
    • var_name - Specifies the name of the variable in the resulting NetCDF file.
    • var_long_name - Specifies the variable’s long name.
    • var_units - Specifies the variable’s units.
  • kernel is used to specify the kernel to use for collocation methods that create an intermediate set of points for further processing, that is box and bin. The default kernel for box and bin is moments. The built-in kernel methods currently available are:

    • moments - Default. This is an averaging kernel that returns the mean, standard deviation and the number of points remaining after the specified constraint has been applied. This can be used for gridded or ungridded sample points where the collocator is one of ‘bin’ or ‘box’. The names of the variables in the output file are the name of the input variable with a suffix to identify which quantity they represent:
      • Mean - no suffix - the mean value of all data points which were mapped to that sample grid point (data points with missing values are excluded)
      • Standard Deviation - suffix: _std_dev - The corrected sample standard deviation (i.e. 1 degree of freedom) of all the data points mapped to that sample grid point (data points with missing values are excluded)
      • Number of points - suffix: _num_points - The number of data points mapped to that sample grid point (data points with missing values are excluded)
    • mean - an averaging kernel that returns the mean values of any points found by the collocation method
    • nn_t (or nn_time) - nearest neighbour in time algorithm
    • nn_h (or nn_horizontal) - nearest neighbour in horizontal distance
    • nn_a (or nn_altitude) - nearest neighbour in altitude
    • nn_p (or nn_pressure) - nearest neighbour in pressure (as in a vertical coordinate). Note that similarly to the p_sep constraint that this works on the ratio of pressure, so the nearest neighbour to a point with a value of 10 hPa, out of a choice of 5 hPa and 19 hPa, would be 19 hPa, as 19/10 < 10/5.
  • product is an optional argument used to specify the type of files being read. If omitted, the program will attempt to determine which product to use based on the filename, as listed at Reading.

<outputfile>
is an optional argument specifying the file to output to. This will be automatically given a .nc extension if not present and if the output is ungridded, will be prepended with cis- to identify it as a CIS output file. This must not be the same file path as any of the input files. If not provided, the default output filename is out.nc

A full example would be:

$ cis col rain:"my_data_??.*" my_sample_file:collocator=box[h_sep=50km,t_sep=6000S],kernel=nn_t -o my_col
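
A gridded-to-ungridded example using linear interpolation with extrapolation enabled might look like this (both filenames are illustrative):

$ cis col my_var:my_model_file.nc my_obs_file.nc:collocator=lin[extrapolate=True] -o my_col_lin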

Warning

When collocating two data sets with different spatio-temporal domains, the sampling points should be within the spatio-temporal domain of the source data. Otherwise, depending on the collocation options selected, strange artifacts can occur, particularly with linear interpolation. Spatio-temporal domains can be reduced in CIS with Aggregation or Subsetting.

Available Collocators and Kernels

Collocation type (data -> sample)  Available Collocators  Default Collocator  Default Kernel
Gridded -> gridded                 lin, nn, box           lin                 None
Ungridded -> gridded               bin, box               bin                 moments
Gridded -> ungridded               nn, lin                nn                  None
Ungridded -> ungridded             box                    box                 moments

Collocation output files

All ungridded collocation output files are prefixed with cis- and both ungridded and gridded data files are suffixed with .nc (so there is no need to specify the extension in the output parameter). This is to ensure the cis data product is always used to read collocated ungridded data.

It is worth noting that in the process of collocation all of the data and sample points are represented as 1-d lists, so any structural information about the input files is lost. This is done to ensure consistency in the collocation output. This means, however, that input files which may have been plottable as, for example, a heatmap may not be after collocation. In this situation plotting the data as a scatter plot will yield the required results.
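
For example, if a collocation output no longer plots as a heatmap, a scatter plot of the same file (the filename here is illustrative) should yield a sensible result:

$ cis plot my_var:cis-my_col.nc --type scatter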

Each collocated output variable has a history attribute created (or appended to) which contains all of the parameters and file names which went into creating it. An example might be:

double mass_fraction_of_cloud_liquid_water_in_air(pixel_number) ;
    ...
    mass_fraction_of_cloud_liquid_water_in_air:history = "Collocated onto sampling from:   [\'/test/test_files/RF04.20090114.192600_035100.PNI.nc\'] using CIS version V0R4M4\n",
        "variable: mass_fraction_of_cloud_liquid_water_in_air\n",
        "with files: [\'/test/test_files/xenida.pah9440.nc\']\n",
        "using collocator: DifferenceCollocator\n",
        "collocator parameters: {}\n",
        "constraint method: None\n",
        "constraint parameters: None\n",
        "kernel: None\n",
        "kernel parameters: None" ;
    mass_fraction_of_cloud_liquid_water_in_air:shape = 30301 ;
double difference(pixel_number) ;
    ...

Writing your own plugins

The collocation framework was designed to make it easy to write your own plugins. Plugins can be written to create new kernels, new constraint methods and even whole collocation methods. See the analysis plugin development section for more details.

Collocation Examples

Ungridded to Ungridded Collocation Examples

Ungridded data with vertical component

First subset two Caliop data files:

$ cis subset Temperature:CAL_LID_L2_05kmAPro-Prov-V3-01.2009-12-31T23-36-08ZN.hdf x=[170,180],y=[60,80],z=[28000,29000],p=[13,15] -o 2009
$ cis subset Temperature:CAL_LID_L2_05kmAPro-Prov-V3-01.2010-01-01T00-22-28ZD.hdf x=[170,180],y=[60,80],z=[28000,29000],p=[12,13.62] -o 2010

Results of subset can be plotted with:

$ cis plot Temperature:cis-2009.nc --itemwidth 25 --xaxis time --yaxis air_pressure
$ cis plot Temperature:cis-2010.nc --itemwidth 25 --xaxis time --yaxis air_pressure

Then collocate data, and plot output:

$ cis col Temperature:cis-2010.nc cis-2009.nc:collocator=box[p_sep=1.1],kernel=nn_p
$ cis plot Temperature:cis-out.nc --itemwidth 25 --xaxis time --yaxis air_pressure

The output for the two subset data files, and the collocated data should look like:

_images/2009-subset.png _images/2010-subset.png _images/PressureCollocation.png

File Locations

The files used above can be found at:

/group_workspaces/jasmin/cis/data/caliop/CAL-LID-L2-05km-APro

Ungridded data collocation using k-D tree indexing

These examples show the syntax for using the k-D tree optimisation of the separation constraint. The indexing is only by horizontal position.

Nearest-Neighbour Kernel

The first example is of Aerosol CCI data on to the points of a MODIS L3 file (which is an ungridded data file but with points lying on a grid).

Subset to a relevant region:

$ cis subset AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[-6,0],y=[20,30] -o AOD550n_3
$ cis subset Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf x=[-6,0],y=[20,30] -o MOD08n_3

The results of subsetting can be plotted with:

$ cis plot AOD550:cis-AOD550n_3.nc --itemwidth 10
$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08n_3.nc --itemwidth 20

These should look like:

_images/AOD550n_3.png _images/MOD08n_3.png

To collocate with the nearest-neighbour kernel use:

$ cis col Cloud_Top_Temperature_Mean_Mean:cis-MOD08n_3.nc cis-AOD550n_3.nc:collocator=box[h_sep=150],kernel=nn_h -o MOD08_on_AOD550_nn_kdt

This can be plotted with:

$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_nn_kdt.nc --itemwidth 10

The sample points are more closely spaced than the data points, hence a patchwork effect is produced.

_images/MOD08_on_AOD550_nn_kdt.png

Collocating the full Aerosol CCI file on to the MODIS L3 with:

$ cis col AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc MOD08_E3.A2010009.005.2010026072315.hdf:variable=Cloud_Top_Temperature_Mean_Mean,collocator=box[h_sep=150],kernel=nn_h -o AOD550_on_MOD08_kdt_nn_full

gives the following result

_images/AOD550_on_MOD08_kdt_nn_full.png

Mean Kernel

This example is similar to the first nearest-neighbour collocation above:

$ cis col Cloud_Top_Temperature_Mean_Mean:cis-MOD08n_3.nc cis-AOD550n_3.nc:collocator=box[h_sep=75],kernel=mean -o MOD08_on_AOD550_hsep_75km

Plotting this again gives a granular result:

$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_hsep_75km.nc --itemwidth 10
_images/MOD08_on_AOD550_hsep_75km.png

This example collocates the Aerosol CCI data on to the MODIS L3 grid:

$ cis col AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc MOD08_E3.A2010009.005.2010026072315.hdf:variable=Cloud_Top_Temperature_Mean_Mean,collocator=box[h_sep=50,fill_value=-999],kernel=mean -o AOD550_on_MOD08_kdt_hsep_50km_full

This can be plotted as follows, with the full image and a zoomed view of a representative section shown below:

$ cis plot AOD550:cis-AOD550_on_MOD08_kdt_hsep_50km_full.nc --itemwidth 50
_images/AOD550_on_MOD08_kdt_hsep_50km_full.png _images/AOD550_on_MOD08_kdt_hsep_50km_full_zoom.png

The reverse collocation can be performed with this command (taking about 7 minutes):

$ cis col Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf 20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc:variable=AOD550,collocator=box[h_sep=100,fill_value=-999],kernel=mean -o MOD08_on_AOD550_kdt_hsep_100km_var_full

Plotting it with this command gives the result below:

$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_kdt_hsep_100km_var_full.nc
_images/MOD08_on_AOD550_kdt_hsep_100km_var_full.png

Omitting the variable option in the sample group gives collocated values over a full satellite track (taking about 30 minutes):

$ cis col Cloud_Top_Temperature_Mean_Mean:MOD08_E3.A2010009.005.2010026072315.hdf 20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc:collocator=box[h_sep=100,fill_value=-999],kernel=mean -o MOD08_on_AOD550_kdt_hsep_100km_full

Plotting it with this command gives the result below:

$ cis plot Cloud_Top_Temperature_Mean_Mean:cis-MOD08_on_AOD550_kdt_hsep_100km_full.nc
_images/MOD08_on_AOD550_kdt_hsep_100km_full.png

File Locations

The files used above can be found at:

/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files/
  20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
  MOD08_E3.A2010009.005.2010026072315.hdf

Examples of collocation of ungridded data on to gridded

Simple Example of Aerosol CCI Data on to a 4x4 Grid

This is a trivial example that collocates on to a 4x4 spatial grid at a single time:

$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc x=[0,2],y=[24,26],t=[2008-06-12T1,2008-06-12] -o tas_1

$ cis subset AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[0,2],y=[24,26] -o AOD550n_1

$ cis col AOD550:cis-AOD550n_1.nc tas_1.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o AOD550_on_tas_1

$ cis plot AOD550:AOD550_on_tas_1.nc

Note that for ungridded to gridded collocation, the collocator must be one of bin or box, and a kernel such as mean must be used.

The plotted image looks like:

_images/Aerosol_CCI_4x4.png

Aerosol CCI with Three Time Steps

This example involves collocation on to a grid with three time steps. The ungridded data all has times within the middle step, so the output has missing values for all grid points with the time equal to the first or third value. This can be seen using ncdump:

$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc x=[-6,-.0001],y=[20,30],t=[2008-06-11T1,2008-06-13] -o tas_3day

$ cis subset AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc x=[-6,0],y=[20,30] -o AOD550n_3

$ cis col AOD550:cis-AOD550n_3.nc tas_3day.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o AOD550_on_tas_3day

$ ncdump AOD550_on_tas_3day.nc |less

Aerosol CCI with One Time Step

This is as above but subsetting the grid to one time step so that the output can be plotted directly:

$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc t=[2008-06-12T1,2008-06-12] -o tas_2008-06-12

$ cis col AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc tas_2008-06-12.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o AOD550_on_tas_1day

$ cis plot AOD550:AOD550_on_tas_1day.nc
$ cis plot AOD550:20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
$ cis plot tas:tas_2008-06-12.nc

These are the plots before and after collocation:

_images/Aerosol_CCI.png _images/Aerosol_CCI_col.png

Example with NCAR RAF Data

This example uses the data in RF04.20090114.192600_035100.PNI.nc. However, this file does not have standard_name or units accepted as valid by Iris. These were modified using ncdump and ncgen, giving RF04_fixed_AO2CO2.nc:

$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc t=[2009-01-14T1,2009-01-14] -o tas_2009-01-14

$ cis col AO2CO2:RF04_fixed_AO2CO2.nc tas_2009-01-14.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o RF04_on_tas

$ cis plot AO2CO2:RF04_on_tas.nc:product=NetCDF_Gridded

These are the plots before and after collocation:

_images/RF04.png _images/RF04_col.png

Cloud CCI with One Time Step

This is analogous to the Aerosol CCI example:

$ cis subset tas:tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc t=[2008-06-20T1,2008-06-20] -o tas_2008-06-20

$ cis col cwp:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc tas_2008-06-20.nc:collocator=bin[fill_value=-9999.0],kernel=mean -o Cloud_CCI_on_tas

$ cis plot cwp:Cloud_CCI_on_tas.nc
$ cis plot cwp:20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc

These are the plots before and after collocation:

_images/Cloud_CCI.png _images/Cloud_CCI_col.png

File Locations

The files used above can be found at:

/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files/
  20080612093821-ESACCI-L2P_AEROSOL-ALL-AATSR_ENVISAT-ORAC_32855-fv02.02.nc
  20080620072500-ESACCI-L2_CLOUD-CLD_PRODUCTS-MODIS-AQUA-fv1.0.nc
  RF04.20090114.192600_035100.PNI.nc
/group_workspaces/jasmin/cis/example_data/
  RF04_fixed_AO2CO2.nc
/group_workspaces/jasmin/cis/gridded-test-data/cmip5.output1.MOHC.HadGEM2-ES.rcp45.day.atmos.day.r1i1p1.v20111128/
  tas_day_HadGEM2-ES_rcp45_r1i1p1_20051201-20151130.nc

Examples of Gridded to Gridded Collocation

Example of Gridded Data onto a Finer Grid

First, to show the original data, subset it to a single time slice:

$ cis subset rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc t=[1859-12-12] -o sub1

Plot for subset data:

$ cis plot rsutcs:sub1.nc

Collocate onto a finer grid using nearest neighbour:

$ cis col rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc dummy_high_res_cube_-180_180.nc:collocator=nn -o 2
$ cis subset rsutcs:2.nc t=[1859-12-12] -o sub2
$ cis plot rsutcs:sub2.nc

Collocate onto a finer grid using linear interpolation:

$ cis col rsutcs:rsutcs_Amon_HadGEM2-A_sstClim_r1i1p1_185912-188911.nc dummy_high_res_cube_-180_180.nc:collocator=lin -o 3
$ cis subset rsutcs:3.nc t=[1859-12-12] -o sub3
$ cis plot rsutcs:sub3.nc

Before, after nearest neighbour and after linear interpolation:

_images/OriginalData.png _images/HorizontalNN.png _images/HorizontalLI.png

4D Gridded Data with latitude, longitude, air_pressure and time to a New Grid

$ cis col temp:aerocom.INCA.A2.RAD-CTRL.monthly.temp.2006-fixed.nc dummy_low_res_cube_4D.nc:collocator=lin -o 4D-col

Note the file aerocom.INCA.A2.RAD-CTRL.monthly.temp.2006-fixed.nc has the standard name of presnivs changed to air_pressure, in order to be read correctly.

Slices at Different Pressures

$ cis subset temp:4D-col.nc t=[2006-01],z=[100000] -o sub9
$ cis plot temp:sub9.nc
$ cis subset temp:4D-col.nc t=[2006-01],z=[0] -o sub10
$ cis plot temp:sub10.nc
_images/PressureSlice1.png _images/PressureSlice2.png

Pressure against time

$ cis subset temp:4D-col.nc x=[0],t=[2006-01] -o sub11
$ cis plot temp:sub11.nc --xaxis latitude --yaxis air_pressure
$ cis subset temp:aerocom.INCA.A2.RAD-CTRL.monthly.temp.2006-fixed.nc x=[0],t=[2006-01] -o sub12
$ cis plot temp:sub12.nc --xaxis latitude --yaxis air_pressure
_images/PressureCollocated.png _images/PressureOriginal.png

File Locations

The files used above can be found at:

/group_workspaces/jasmin/cis/sprint_reviews/SR4-IB/gridded_col2

Plotting

Plotting is straightforward:

$ cis plot variable:filenames

This will attempt to locate the variable variable in all of the specified filenames, work out its metadata, such as units, labels, etc. and the appropriate chart type to plot, so that a line graph is used for two dimensional data, a scatter plot is used for three dimensional ungridded data and a heatmap for three dimensional gridded data. Other types of chart can be specified using the --type option. Available types are:

line
a simple line plot, for three dimensional data the third variable is represented by the line colour
scatter
a scatter plot, for three dimensional data the third variable is represented by the marker colour
heatmap
a heatmap especially suitable for gridded data

Warning

Basemap versions <= 1.0.7 have known issues when plotting heatmaps, particularly when using --xmin or --xmax options. Use a newer version if available; otherwise check your output for validity, especially around the meridians.

contour
a standard contour plot, see contour options
contourf
a filled contour plot, see contour options

histogram3d

histogram2d

comparativescatter
allows two variables to be plotted against each other, specified as cis plot variable1:filename1 variable2:filename2 --type comparativescatter
overlay
a collection of plots overlaid on one another, see overlay plots
scatteroverlay
a heatmap overlayed with a scatter plot, see scatter-overlay plots

Note that filenames is a mandatory argument used to specify the files to read the variable from. These can be specified as a comma separated list of the following possibilities:

  1. A single filename - this should be the full path to the file
  2. A single directory - all files in this directory will be read
  3. A wildcarded filename - A filename with any wildcards compatible with the python module glob, so that *, ? and [] can all be used. For example /path/to/my/test*file_[0-9].

Note that when using option 2, the filenames in the directory will be automatically sorted into alphabetical order. When using option 3, the filenames matching the wildcard will also be sorted into alphabetical order. The order of the comma separated list will however remain as the user specified, e.g.:

$ cis plot $var:filename1,filename2,wildc*rd,/my/dir/,filename3

would read filename1, then filename2, then all the files that match wildc*rd (in alphabetical order), then all the files in the directory /my/dir/ (in alphabetical order) and then finally filename3.

Plot Options

There are a number of optional arguments, which should be entered as a comma separated list after the mandatory arguments, for example variable:filename:product=Cis,edgecolor=black. The options are:

color
colour of markers, e.g. for scatter plot points or contour lines, see Available Colours and Markers
cmap
colour map to use, e.g. for contour lines or heatmap, see Available Colours and Markers
cmin
the minimum value for the colourmap
cmax
the maximum value for the colourmap
edgecolor
colour of scatter marker edges (can be used to differentiate scatter markers with a colourmap from the background plot)
itemstyle
shape of scatter marker, see Available Colours and Markers
label
name of datagroup for the legend
product
the data product to use for the plot

Additional datagroup options for contour plots only:

contnlevels
the number of levels for the contour plot
contlevels
a list of levels for the contour plot, e.g. contlevels=[0,1,3,10]
contlabel
options are true or false, if true then contour labels are shown
contwidth
width of the contour lines
contfontsize
size for labels on contour plot

Note that label refers to the label the plot will have on the legend, for example if a multi-series line graph or scatter plot is plotted. To set the labels of the axes, use --xlabel and --ylabel. --cbarlabel can be used to set the label on the colour bar.

The axes can be specified with --xaxis and --yaxis. Gridded data supports any coordinate axes available in the file, while ungridded data supports the following coordinate options (if available in the data):

  • latitude
  • longitude
  • time
  • altitude
  • air_pressure
  • variable - the variable being plotted

If the product is not specified, the program will attempt to figure out which product should be used based on the filename. See What kind of data can CIS deal with? to see a list of available products and their file signatures, or run cis plot -h.

Saving to a File

By default a plot will be displayed on screen. To save it to an image file instead, use the --output option. Available output types are png, pdf, ps, eps and svg, which can be selected using the appropriate filename extension, for example --output plot.svg.

Plot Formatting

There are a number of plot formatting options available:

--xlabel
The label for the x axis
--ylabel
The label for the y axis
--cbarlabel
The label for the colorbar
--xtickangle
The angle for the ticks on the x axis
--ytickangle
The angle for the ticks on the y axis
--title
The title of the plot
--itemwidth
The width of an item. Units are points in the case of a line, and points squared in the case of a scatter point
--fontsize
The size of the font in points
--cmap
The colour map to be used when plotting a 3D plot, see Available Colours and Markers
--height
The height of the plot, in inches
--width
The width of the plot, in inches
--xbinwidth
The width of the histogram bins on the x axis
--ybinwidth
The width of the histogram bins on the y axis
--cbarorient
The orientation of the colour bar, either horizontal or vertical
--nocolourbar
Hides the colour bar on a 3D plot
--grid
Shows grid lines
--plotwidth
width of the plot in inches
--plotheight
height of the plot in inches
--cbarscale
This can be used to change the size of the colour bar when plotting; it defaults to 0.55 for vertical colour bars and 1.0 for horizontal.
--coastlinescolour
The colour of the coastlines on a map, see Available Colours and Markers
--nasabluemarble
Use the NASA Blue Marble for the background, instead of coastlines, when doing lat-lon plots

Setting Plot Ranges

The arguments --xmin, --xmax, --xstep, --ymin, --ymax, --ystep, --vmin, --vmax, --vstep can be used to specify the range of values to plot, where x and y correspond to the axes and v corresponds to the colours.

When the arguments refer to dates or times, they should be in the format YYYY-MM-DDThh:mm:ss, where the time is optional. A colon or a space is also a valid date and time separator (if using a space quotes are necessary).

The step arguments are used to specify the tick spacing on the axes and vstep is used to specify the tick spacing on the colorbar.

When the step arguments refer to an amount of time, they should be in the ISO 8601 format PnYnMnDTnHnMnS, where any particular time group is optional, case does not matter, and T can be substituted for either a colon or a space (if using a space quotes are necessary).

For example, to specify a tick spacing of one month and six seconds on the x axis, the following argument should be given: --xstep 1m6S

Note: If a value is negative, then an equals sign must be used, e.g. --xmin=-5.
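
For example, a hypothetical command restricting the x axis to the year 2007 with monthly tick spacing, and using an equals sign for a negative y minimum, might look like:

$ cis plot my_var:my_file.nc --xaxis time --xmin 2007-01-01 --xmax 2008-01-01 --xstep 1m --ymin=-5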

To plot using a log scale:

--logx
The x axis will be plotted using a log scale of base 10
--logy
The y axis will be plotted using a log scale of base 10
--logv
The values (colours) will be plotted using a log scale of base 10

Overlaying Multiple Plots

Using --type overlay allows multiple files to be specified on the command line for plotting, each with its own type, specified as e.g. type=heatmap along with the other datagroup options. Currently supported plot types are heatmap, contour, contourf and scatter. An additional datagroup option available is transparency, which allows the transparency of a layer to be set. transparency takes a value between 0 and 1, where 0 is completely opaque and 1 fully transparent.

For example, to plot a heatmap and a contour plot the following options can be used:

cis plot var1:file1:type=heatmap var2:file2:type=contour,color=white --type overlay --plotwidth 20 --plotheight 15 --cbarscale 0.5 -o overlay.png

Note that the first file specified is treated in a special way: the default plot dimensions are deduced from it, and the colorbar displayed will be for its datagroup only.

Many more examples are available in the overlay examples page.

Scatter Overlay Plots

Note

Note that scatteroverlay is deprecated, as the overlay option provides a more general method for overlaying multiple datasets

Three types of plot overlay are currently available:

  • Overlaying several line graphs
  • Overlaying several scatter plots
  • Overlaying a heatmap with several scatter graphs

To overlay several line graphs or scatter plots, use the plot command as before but specify multiple files and variables, e.g.:

$ cis plot $var1:$filename1:edgecolor=black $var2:$filename2:edgecolor=red

To plot two variables from the same file, simply use the above command with $filename1 in place of $filename2.

To overlay a heatmap with several scatter graphs, use the following command:

$ cis plot $var1:$filename1:label=label1 $var2:$filename2:color=colour2,itemstyle=style2,label=label2 $var3:$filename3:color=colour3,itemstyle=style3,label=label3 --type scatteroverlay

where $filename1 refers to the file containing the heatmap data and the other two filenames refer to the files containing the scatter data.

If the scatter data is 3 dimensional, then the colour argument can be omitted and the data will be plotted using the same colour map as the heatmap. This can be overridden by explicitly including the colour argument.

Available Colours and Markers

CIS recognises any valid HTML colour, specified by name, e.g. red, for options such as item colour (line/scatter colour) and the colour of the coastlines.

A list of available colour maps for 3D plots, such as heatmaps, scatter and contour plots, can be found here: colour maps.

For a list of available scatter point styles, see here: scatter point styles.

Evaluation

The Community Intercomparison Suite allows you to perform general arithmetic operations between different variables using the ‘eval’ command. For example, you might want to calculate the (relative) difference between two variables.

Note

All variables used in an evaluation must be of the same shape in order to be compatible, i.e. the same number of points in each dimension, and of the same type (Ungridded or Gridded). This means that, for example, operations between different data products are unlikely to work correctly - performing a collocation or aggregation onto a common grid would be a good pre-processing step.

Warning

This CIS command performs a Python eval() on user input. This has the potential to be a security risk and before deploying CIS to any environment where your user input is untrusted (e.g. if you want to run CIS as a web service) you must satisfy yourself that any security risks have been mitigated. CIS implements the following security restrictions on the expression which is evaluated:

  • The eval() operates in a restricted namespace that only has access to a select handful of builtins (see expr below) - so __import__, for example, is unavailable.
  • The only module available in the namespace is numpy.
  • Any expression containing two consecutive underscores (__) is assumed to be harmful and will not be evaluated.

The evaluate syntax looks like this:

$ cis eval <datagroup>... <expr> <units> [-o [<output_var>:]<outputfile>] [--attributes <attributes>]

where square brackets denote optional commands and:

<datagroup>

is a modified CIS datagroup of the format <variable>[=<alias>]...:<filename>[:product=<productname>]. One or more datagroups should be given.

  • <variable> is a mandatory variable or list of variables to use.

  • <alias> is an optional alternative variable name to use in place of the name given in the file. As you will see in the expression section, the variable names given will need to be valid python variable names, which means:

    1.   They may use only the characters [A-Z], [a-z] and numbers [0-9] provided they do not start with a number

    2.   The only special character which may be used is the underscore (_) - but don’t use two consecutively (see security note)

    3.   Don’t use any of the reserved python keywords such as class or and as variable names (they’re OK if they’re only part of a name though).

    4.   Avoid using names of python builtins like max or abs (again, it’s OK if they’re only part of a name).

    So if the variable name in your file violates these rules (e.g. ‘550-870Angstrom’) use an alias:

    550-870Angstrom=a550to870

  • <filename> is a mandatory file or list of files to read from.

  • <productname> is an optional CIS data product to use (see Data Products):

See Datagroups for a more detailed explanation of datagroups.

<expr>

is the arithmetic expression to evaluate; for example: variable1+variable2. Use the following basic rules to get started:

1.   Use the variable names (or aliases) as given in the datagroups (they’re case-sensitive) - don’t enclose them in quotes.

2.   If your expression contains whitespace, you’ll need to enclose the whole expression in single or double quotes.

3.   Construct your expression using plus +, minus -, times *, divide / , power ** (note that you can’t use ^ for exponents, like you typically can in spreadsheets and some other computer languages). Parentheses () can be used to group elements so that your expression is evaluated in the order you intend.

If you need more functionality, are encountering errors, or are not getting the answer you expect, then you should consider the following.

1.   This expression will be evaluated in Python using the eval() method (see security note), so the expression must be a valid Python expression.

2.   The only Python methods available to you are a trimmed down list of the python builtins: ‘abs’, ‘all’, ‘any’, ‘bool’, ‘cmp’, ‘divmod’, ‘enumerate’, ‘filter’, ‘int’, ‘len’, ‘map’, ‘max’, ‘min’, ‘pow’, ‘range’, ‘reduce’, ‘reversed’, ‘round’, ‘sorted’, ‘sum’, ‘xrange’, ‘zip’.

3.   The numpy module is available, so you can use any of its methods e.g. numpy.mean(variable1).

4.   For security reasons, double underscores (__) must not appear anywhere in the expression.

5.   The expression must produce an output array of the same shape as the input variables.

6.   The expression is evaluated at the array level, not at the element level - so the variables in an expression represent numpy arrays, not individual numeric values. This means that numpy.mean([var1,var2]) will give you a combined average over the whole of both arrays (i.e. a single number, not an array), which would be invalid (consider the previous rule). However, you could add the mean (over the whole array) of one variable to every point on a second variable by doing var1 + numpy.mean(var2).
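
As a simple illustration of these rules (the variables, aliases and filename are hypothetical), the following averages two aliased variables and writes the dimensionless result to an output file:

$ cis eval var1=a,var2=b:my_file.nc "(a + b) / 2" 1 -o mean_of_two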

Note

The CIS eval command will flatten ungridded data, so any structure present in the input files is ignored. This allows you to compare ungridded data with different shapes, e.g. (3,5) and (15,)

<units>
is a mandatory argument describing the units of the resulting expression. This should be a CF compliant units string, e.g. "kg m^-3". Where this contains spaces, the whole string should be enclosed in quotes.
<outputfile>

is an optional argument specifying the file to output to. This will be automatically given a .nc extension if not present and if the output is ungridded, will be prepended with cis- to identify it as a CIS output file. This must not be the same file path as any of the input files. If not provided, the default output filename is out.nc

  • <output_var> is an optional prefix to the output file argument to specify the name of the output variable within the output file, e.g. -o my_new_var:output_filename.nc. If not provided, the default output variable name is calculated_variable
<attributes>
is an optional argument allowing users to provide additional metadata to be included in the evaluation output variable. This should be indicated by the attributes flag (--attributes or -a). The attributes should then follow as comma-separated key=value pairs, for example --attributes standard_name=convective_rainfall_amount,echam_version=6.1.00. Whitespace is permitted in both the names and the values, but they must then be enclosed in quotes: -a "operating system"="AIX 6.1 Power6". Colons or equals signs may not be used in attribute names or values.

Evaluation Examples

Comparison of annual Aerosol Optical Thickness from models

In this example we compare annual Aerosol Optical Thickness from ECHAM and HadGEM model data. The data used in this example can be found at /group_workspaces/jasmin/cis/data.

First we produce annual averages of our data by aggregating:

$ cis aggregate od550aer:ECHAM_fixed/2007_2D_3hr/od550aer.nc t -o echam-od550aer
$ cis aggregate od550aer:HadGEM_fixed/test_fix/od550aer.nc t -o hadgem-od550aer

$ cis plot od550aer:echam-od550aer.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="ECHAM AOT550" --vmin=0 --vmax=0.5
$ cis plot od550aer:hadgem-od550aer.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="HadGEM AOT550" --vmin=0 --vmax=0.5
_images/echam_aggregated.png _images/hadgem_aggregated.png

We then linearly interpolate the HadGEM data onto the ECHAM grid:

$ cis col od550aer:hadgem-od550aer.nc echam-od550aer.nc:collocator=lin -o hadgem-od550aer-collocated

$ cis plot od550aer:hadgem-od550aer-collocated.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="HadGEM AOT550" --vmin=0 --vmax=0.5
_images/hadgem_collocated.png

Next we subtract the two fields using:

$ cis eval od550aer=a:echam-od550aer.nc od550aer=b:hadgem-od550aer-collocated.nc "a-b" 1 -o modeldifference

Finally we plot the evaluated output:

$ cis plot od550aer:modeldifference.nc --xmin -180 --xmax 180 --cbarorient=horizontal --title="ECHAM-HadGEM difference AOT550" --vmin=-0.25 --vmax=0.2
_images/echam_hadgem_difference.png

Calculation of Angstrom exponent for AERONET data

AERONET data allows us to calculate the Angstrom Exponent (AE) and then compare it against the AE already in the file. They should correlate strongly, although they are not expected to be identical due to averaging etc. during production of the AERONET data files.

The file agoufou.lev20 refers to /group_workspaces/jasmin/cis/data/aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Agoufou.lev20

The AE is calculated using an eval statement:

$ cis eval AOT_440,AOT_870:agoufou.lev20 "(-1)* (numpy.log(AOT_870/AOT_440)/numpy.log(870./440.))" 1 -o alfa

Plotting it shows the expected correlation:

$ cis plot 440-870Angstrom:agoufou.lev20 calculated_variable:cis-alfa.nc --type comparativescatter --itemwidth=10 --xlabel="AERONET 440-870Angstrom" --ylabel="AERONET (-1)*(numpy.log(AOT_870/AOT_440)/numpy.log(870./440.))"
_images/angstrom_exponent.png

This correlation can be confirmed by using the CIS stats command:

$ cis stats 440-870Angstrom:agoufou.lev20 calculated_variable:cis-alfa.nc

==================================
RESULTS OF STATISTICAL COMPARISON:
==================================
Number of points: 63126
Mean value of dataset 1: 0.290989032142
Mean value of dataset 2: 0.295878214327
Standard deviation for dataset 1: 0.233995525021
Standard deviation for dataset 2: 0.235381075635
Mean of absolute difference: 0.00488918218519
Standard deviation of absolute difference: 0.00546343157047
Mean of relative difference: 0.0284040419499
Standard deviation of relative difference: 3.95137224542
Spearman's rank coefficient: 0.999750939223
Linear regression gradient: 1.00566622549
Linear regression intercept: 0.003240372714
Linear regression r-value: 0.999746457079
Linear regression standard error: 0.00530006646489

Using Evaluation for Conditional Aggregation

The eval command can be combined with other CIS commands to allow you to perform more complex tasks than would otherwise be possible.

For example, you might want to aggregate a satellite measurement of one variable only when the corresponding cloud cover fraction (stored in separate variable) is less than a certain value. The aggregate command doesn’t allow this kind of conditional aggregation on its own, but you can use an evaluation to achieve this in two stages.

In this example we use the MODIS file MOD04_L2.A2010001.2255.005.2010005215814.hdf in directory /group_workspaces/jasmin/cis/data/MODIS/MOD04_L2/. The optical depth and cloud cover variables can be seen in the following two plots:

$ cis plot Optical_Depth_Land_And_Ocean:MOD04_L2.A2010001.2255.005.2010005215814.hdf --xmin 132 --xmax 162 --ymin -70 --title "Aerosol optical depth" --cbarscale 0.5 --itemwidth 10 -o optical_depth.png
$ cis plot Cloud_Fraction_Ocean:MOD04_L2.A2010001.2255.005.2010005215814.hdf --xmin 132 --xmax 162 --ymin -70 --title "Cloud cover fraction" --cbarscale 0.5 --itemwidth 10 -o cloud_fraction.png
_images/modis_optical_depth.png _images/modis_cloud_fraction.png

First we perform an evaluation using the numpy.ma.masked_where method to produce an optical depth variable that is masked at all points where the cloud cover is more than 20%:

$ cis eval Cloud_Fraction_Ocean=cloud,Optical_Depth_Land_And_Ocean=od:MOD04_L2.A2010001.2255.005.2010005215814.hdf "numpy.ma.masked_where(cloud > 0.2, od)" 1 -o od:masked_optical_depth.nc
$ cis plot od:cis-masked_optical_depth.nc --xmin 132 --xmax 162 --ymin -70 --title "Aerosol optical depth" --cbarscale 0.5 --itemwidth 10 -o masked_optical_depth.png
_images/modis_masked_optical_depth.png

Then we perform an aggregation on this masked output file to give the end result - aerosol optical depth aggregated only using points where the cloud cover is less than 20%:

$ cis aggregate od:cis-masked_optical_depth.nc x=[132,162,0.5],y=[-70,-57,0.5] -o aggregated_masked_optical_depth
$ cis plot od:aggregated_masked_optical_depth.nc --xmin 132 --xmax 162 --ymin -70 --title "Aerosol optical depth (cloud fraction < 0.2)" --cbarscale 0.5 -o aggregated_aod.png
_images/modis_aggregated_aod.png

Statistics

The Community Intercomparison Suite allows you to perform statistical analysis on two variables using the ‘stats’ command. For example, you might wish to examine the correlation between a model data variable and actual measurements. The ‘stats’ command will calculate:

  1. Number of data points used in the analysis.
  2. The mean and standard deviation of each dataset (separately).
  3. The mean and standard deviation of the absolute difference (var2 - var1).
  4. The mean and standard deviation of the relative difference ((var2 - var1) / var1).
  5. The Linear Pearson correlation coefficient.
  6. The Spearman Rank correlation coefficient.
  7. The coefficients of linear regression (i.e. var2 = a var1 + b ), r-value, and standard error of the estimate.

These values will be displayed on screen and can optionally be saved as NetCDF output.

Note

Both variables used in a statistical analysis must be of the same shape in order to be compatible, i.e. the same number of points in each dimension, and of the same type (ungridded or gridded). This means that, for example, operations between different data products are unlikely to work correctly - performing a collocation or aggregation onto a common grid would be a good pre-processing step.

Note

Only points which have non-missing values for both variables will be included in the analysis. The number of points this includes is part of the output of the stats command.

Warning

Unlike aggregation, stats does not currently use latitude weighting to account for the relative areas of different grid cells.

The statistics syntax looks like this:

$ cis stats <datagroup>... [-o <outputfile>]

where:

<datagroup>

is a CIS datagroup specifying the variables and files to read and is of the format <variable>...:<filename>[:product=<productname>] where:

  • <variable> is a mandatory variable or list of variables to use.
• <filename> is a mandatory file or list of files to read from.
  • <productname> is an optional CIS data product to use (see Data Products):

One or more datagroups should be given, but the total number of variables declared in all datagroups must be exactly two. See Datagroups for a more detailed explanation of datagroups.

<outputfile>
is an optional argument specifying a file to output to. This will be automatically given a .nc extension if not present. This must not be the same file path as any of the input files. If not provided, then the output will not be saved to a file and will only be displayed on screen.

Statistics Example

In this example, we perform a statistical comparison of Aeronet aerosol optical thickness at two wavelengths. The data we are using is shown in the following CIS plot commands and can be found at /group_workspaces/jasmin/cis/data:

$ cis plot AOT_500:aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Yonsei_University.lev20 --title "Aerosol optical thickness 500nm"
$ cis plot AOT_440:aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Yonsei_University.lev20 --title "Aerosol optical thickness 440nm"
_images/stats-aero500.png _images/stats-aero440.png

We then perform a statistical comparison of these variables using:

$ cis stats AOT_500,AOT_440:aeronet/AOT/LEV20/ALL_POINTS/920801_121229_Yonsei_University.lev20

Which gives the following output:

===================================================================
RESULTS OF STATISTICAL COMPARISON:
-------------------------------------------------------------------
Compared all points which have non-missing values in both variables
===================================================================
Number of points: 10727
Mean value of dataset 1: 0.427751965508
Mean value of dataset 2: 0.501316673814
Standard deviation for dataset 1: 0.307680514916
Standard deviation for dataset 2: 0.346274598431
Mean of absolute difference: 0.0735647083061
Standard deviation of absolute difference: 0.0455684788406
Mean of relative difference: 0.188097066086
Standard deviation of relative difference: 0.0528621773819
Spearman's rank coefficient: 0.998289763952
Linear regression gradient: 1.12233533743
Linear regression intercept: 0.0212355272705
Linear regression r-value: 0.997245296339
Linear regression standard error: 0.0256834603945

Overlay Plot Examples

First subset some gridded data that will be used for the examples:

cis subset od550aer:aerocom.HadGEM3-A-GLOMAP.A2.CTRL.monthly.od550aer.2006.nc t=[2006-10-13] -o HadGEM_od550aer-subset

cis subset rsutcs:aerocom.HadGEM3-A-GLOMAP.A2.CTRL.monthly.rsutcs.2006.nc t=[2006-10-13] -o HadGEM_rsutcs-subset

Contour over heatmap

cis plot od550aer:HadGEM_od550aer-subset.nc:type=heatmap rsutcs:HadGEM_rsutcs-subset.nc:type=contour,color=white,contlevels=[1,10,25,50,175] --type overlay --plotwidth 20 --plotheight 15 --cbarscale 0.5 -o overlay1.png
_images/overlay1.png
cis plot od550aer:HadGEM_od550aer-subset.nc:type=heatmap,cmap=binary rsutcs:HadGEM_rsutcs-subset.nc:type=contour,cmap=jet,contlevels=[1,10,25,50,175] --type overlay --xmin -180 --xmax 180 --plotwidth 20 --plotheight 15 --cbarscale 0.5 -o overlay2.png
_images/overlay2.png

Filled contour with transparency on NASA Blue Marble

cis plot od550aer:HadGEM_od550aer-subset.nc:cmap=Reds,type=contourf,transparency=0.5,cmin=0.15 --type overlay --xmin -180 --xmax 180 --plotwidth 20 --plotheight 15 --cbarscale 0.5 --nasabluemarble
_images/overlay3.png

Scatter plus Filled Contour

cis subset rsutcs:HadGEM_rsutcs-subset.nc x=[-180,-90],y=[0,90] -o HadGEM_rsutcs-subset2

cis plot GGALT:RF04.20090114.192600_035100.PNI.nc:type=scatter rsutcs:HadGEM_rsutcs-subset2.nc:type=contourf,contlevels=[0,10,20,30,40,50,100],transparency=0.7,contlabel=true,contfontsize=18 --type overlay --plotwidth 20 --plotheight 15 --xaxis longitude --yaxis latitude --xmin -180 --xmax -90 --ymin 0 --ymax 90 --itemwidth 20 -o overlay4.png
_images/overlay4.png
cis plot GGALT:RF04.20090114.192600_035100.PNI.nc:type=scatter rsutcs:HadGEM_rsutcs-subset2.nc:type=contourf,contlevels=[40,50,100],transparency=0.3,contlabel=true,contfontsize=18,cmap=Reds --type overlay --plotwidth 20 --plotheight 15 --xaxis longitude --yaxis latitude --xmin -180 --xmax -90 --ymin 0 --ymax 90 --itemwidth 20 --nasabluemarble -o overlay5.png
_images/overlay5.png

File Locations

The gridded data files can be found at:

/group_workspaces/jasmin/cis/AeroCom/A2/HadGEM3-A-GLOMAP.A2.CTRL/renamed

and the ungridded:

/group_workspaces/jasmin/cis/jasmin_cis_repo_test_files

How can I read my own data?

Introduction

One of the key strengths of CIS is the ability for users to create their own plugins to read data which CIS doesn't currently support. These plugins can then be shared with the community to allow other users access to that data. Although the plugins are written in Python, this tutorial assumes no experience in Python; some general programming experience is, however, assumed.

Note

Any technical details that may be useful to experienced Python programmers will be highlighted in this style - they aren’t necessary for completing the tutorial.

Here we describe the process of creating and sharing a plugin. A CIS plugin is simply a python (.py) file with a set of methods (or functions) to describe how the plugin should behave.

Note

The methods for each plugin are described within a Class, this gives the plugin a name and allows CIS to ensure that all of the necessary methods have been implemented.

There are a few methods that the plugin must contain, and some which are optional. A skeleton plugin would look like this:

class MyProd(AProduct):
    def get_file_signature(self):
        # Code goes here

    def create_coords(self, filenames):
        ...

    def create_data_object(self, filenames, variable):
        ...

Note that in python whitespace matters! When filling in the above methods the code for the method should be indented from the signature by four spaces like this:

class MyProd(AProduct):

    def get_file_signature(self):
        # Code goes here
        foo = bar

Note also that the name of the plugin (MyProd) in this case should be changed to describe the data which it will read. (Don’t change the AProduct part though – this is important for telling CIS that this is a plugin for reading data.)

Note

The plugin class subclasses AProduct which is the abstract class which defines the methods that the plugin needs to override. It also includes a few helper functions for error catching.

When CIS looks for data plugins it searches for all classes which sub-class AProduct. There are also plugins available for collocation with their own abstract base classes, so that users can store multiple plugin types in the same plugin directory.

In order to turn the above skeleton into a working plugin we need to fill in each of the methods with some code which turns our data into something CIS will understand. Often it is easiest to start from an existing plugin that reads closely matching data; for example, a plugin to read some other CCI data would probably be easiest to base on the Cloud or Aerosol CCI plugins. We have created three different tutorials to walk you through the creation of some of the existing plugins to illustrate the process. The Easy tutorial walks through the creation of a basic plugin, the Medium tutorial builds on that by creating a plugin with a bit more detail, and finally the Advanced tutorial talks through some of the main considerations when creating a large and complicated plugin.

A more general template plugin is included here in case no existing plugin matches your need. We have also created a short reference describing the purpose of each method the plugins implement here.

Note

Plugins aren’t the only way you can contribute though. CIS is an open source project hosted on GitHub, so please feel free to submit pull-requests for new features or bug-fixes – just check with the community first so that we’re not duplicating our effort.

Using and testing your plugin

It is important that CIS knows where to look to find your new plugin, and this is easily done by setting the environment variable CIS_PLUGIN_HOME to point to the directory within which your plugin is stored.
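
For example, in a bash shell (the path is illustrative):

$ export CIS_PLUGIN_HOME=/home/me/cis_plugins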

Once you have done this CIS will automatically use your plugin for reading any files which match the file signature you used.

If you have any issues with this (because for example the file signature clashes with a built-in plugin) you can tell CIS to use your plugin when reading data by simply specifying it after the variable and filename in most CIS commands, e.g.:

cis subset a_variable:filename.nc:product=MyProd ...

Sharing your plugin

This is the easy bit! Once you’re happy that your plugin can fairly reliably read a currently unsupported dataset you should share it with the community. Use the upload form here to submit your plugin to the community.

We moderate the plugins we receive to ensure they are appropriate and meet a minimum level of quality. We're not expecting the plugins to necessarily be production-quality code, but we do expect them to work for the subset of data they claim to. Having said that, if we feel a plugin provides a really valuable capability and is of high quality, we may incorporate it into the core CIS data readers – with credit to the author of course!

Tutorials

Easy

A simple plugin to start with is the plugin for reading native ungridded CIS data.

One of the first things to consider is which type of file our plugin is going to read. It is advisable not to make the definition too broad: it's easy to have multiple plugins, so don't over-complicate a plugin by having it read many different types of file. Roughly, one plugin should describe a set of data with the same metadata.

Since the CIS plugin is designed to read any data which CIS produces, the signature matches any file which starts with cis- and ends with .nc:

def get_file_signature(self):
    return [r'cis\-.*\.nc']

This uses a regular expression to tell CIS which files do and don't match for our product.

Note

For an introduction to regular expressions see, for example, https://docs.python.org/2/howto/regex.html

The next step is to complete the AProduct.create_coords() method. CIS uses this method to create a set of coordinates from the data, so it needs to return the appropriate coordinates in the form that CIS expects.

There are a number of low-level data reading routines within CIS that can help you read in your data. For the CIS plugin (which is reading netCDF data) we use two methods from the cis.data_io.netcdf module: read_many_files_individually and get_metadata. We also import the Coord data type, which is where we store the coordinates that we’ve read, and UngriddedCoordinates - which is what we return to CIS.

Note

In python it’s very easy to import classes and methods from other modules within your package, and across packages using the from and import commands. The file-reading routines used here are used by many of the other data products. See the API section for further details about using CIS as a python library.

Don’t worry too much about what these methods do at this stage, just use the import lines below and you should be fine.

def create_coords(self, filenames, usr_variable=None):
    from cis.data_io.netcdf import read_many_files_individually, get_metadata
    from cis.data_io.Coord import Coord, CoordList
    from cis.data_io.ungridded_data import UngriddedCoordinates

Next, we create a list of netCDF variable names which we know are stored in our file and send that to the file reading routine:

var_data = read_many_files_individually(filenames, ["longitude","latitude", "time"])

Then we create a CoordList to store our coordinates in, a Coord for each of those coordinate variables, and then just give them a short label for plotting purposes (x,y,z etc) – it is strongly advisable that you use the standard definitions used below for your axis definitions (and use z for altitude and p for pressure).

coords = CoordList()
coords.append(Coord(var_data["longitude"], get_metadata(var_data["longitude"][0]), axis="x"))
coords.append(Coord(var_data["latitude"], get_metadata(var_data["latitude"][0]), axis="y"))
coords.append(Coord(var_data["time"], get_metadata(var_data["time"][0]), axis="t"))

That’s it, now we can return those coordinates in a way that CIS will understand:

return UngriddedCoordinates(coords)

The last method we have to write is the AProduct.create_data_object() method, which is used by CIS to pull together the coordinates and a particular data variable into an UngriddedData object. It’s even simpler than the previous method. We can use the same read_many_files_individually method as we did before, and this time pass it the variable the user has asked for:

def create_data_object(self, filenames, variable):
    from cis.data_io.ungridded_data import UngriddedData
    usr_var_data = read_many_files_individually(filenames,variable)[variable]

Then we create the coordinates using the create_coords() method we’ve just written:

coords = self.create_coords(filenames)

And finally we return the ungridded data; this combines the coordinates from the file with the variable requested by the user:

return UngriddedData(usr_var_data, get_metadata(usr_var_data[0]), coords)

Bringing it all together, tidying it up a bit and including some error catching gives us:

import logging
from cis.data_io.products.AProduct import AProduct
from cis.data_io.netcdf import read_many_files_individually, get_metadata

class cis(AProduct):

    def get_file_signature(self):
        return [r'cis\-.*\.nc']

    def create_coords(self, filenames, usr_variable=None):
        from cis.data_io.Coord import Coord, CoordList
        from cis.data_io.ungridded_data import UngriddedCoordinates
        from cis.exceptions import InvalidVariableError

        variables = [("longitude", "x"), ("latitude", "y"), ("altitude", "z"), ("time", "t"), ("air_pressure", "p")]

        logging.info("Listing coordinates: " + str(variables))

        coords = CoordList()
        for variable in variables:
            try:
                var_data = read_many_files_individually(filenames, variable[0])[variable[0]]
                coords.append(Coord(var_data, get_metadata(var_data[0]), axis=variable[1]))
            except InvalidVariableError:
                pass

        return UngriddedCoordinates(coords)

    def create_data_object(self, filenames, variable):
        from cis.data_io.ungridded_data import UngriddedData
        usr_var_data = read_many_files_individually(filenames, variable)[variable]
        coords = self.create_coords(filenames)
        return UngriddedData(usr_var_data, get_metadata(usr_var_data[0]), coords)

Medium

For this example we will look at the AERONET data reading plugin. AERONET is a ground-based sun-photometer network that produces time-series data for each ground station in a csv-based text file. There is some information about the ground station in the header of the file, followed by a table of data with a time column and a column for each of the measured values.

The AProduct.get_file_signature() method is straightforward, so we first consider the AProduct.create_coords() method. Here we have actually shifted all of the work to a private method called _create_coord_list(), for reasons which we will explain shortly:

def create_coords(self, filenames, variable=None):
    return UngriddedCoordinates(self._create_coord_list(filenames))

Note

In python there is not really such a thing as a ‘private’ method as there is in Java and C#, but we can signify that a method shouldn’t be accessed externally by starting its name with one or two underscores.

In this method we import an AERONET data reading routine:

def _create_coord_list(self, filenames, data=None):
    from cis.data_io.ungridded_data import Metadata
    from cis.data_io.aeronet import load_multiple_aeronet

This data reading routine actually performs much of the hard work in reading the AERONET file:

if data is None:
    data = load_multiple_aeronet(filenames)

Note that we only read the files if data is None, that is, if we haven't already been passed any data.

Note

The load_multiple_aeronet routine uses the numpy genfromtxt method to read in the csv file. This is a very useful method for reading text-based files, as it allows you to define the data formats of each of the columns, tell it which lines to ignore as comments and, optionally, mask out any missing values. This method provides a useful example for reading different kinds of text-based file.
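
As a minimal, hypothetical sketch of this approach (the filename, column layout and header size below are illustrative only, not the actual AERONET format):

import numpy as np

data = np.genfromtxt("station_data.csv",
                     delimiter=",",
                     skip_header=4,          # skip the station metadata lines
                     names=True,             # read the column names from the file
                     dtype=None,             # infer the type of each column
                     usemask=True,           # return a masked array
                     missing_values="-999")  # treat -999 as missing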

We just have to describe (add metadata to) each of the components in this method:

coords = CoordList()
coords.append(Coord(data['longitude'], Metadata(name="Longitude", shape=(len(data),), units="degrees_east", range=(-180, 180))))
coords.append(Coord(data['latitude'], Metadata(name="Latitude", shape=(len(data),), units="degrees_north", range=(-90, 90))))
coords.append(Coord(data['altitude'], Metadata(name="Altitude", shape=(len(data),), units="meters")))
time_coord = Coord(data["datetime"], Metadata(name="DateTime", standard_name='time', shape=(len(data),), units="DateTime Object"), "X")

Note that we’ve explicitly added things like units and a shape. These are sometimes already populated for us when reading e.g. NetCDF files, but in the case of AERONET data we have to fill it out ‘by hand’.

Internally CIS uses a ‘standard’ time defined as fractional days since the 1st January 1600, on a Gregorian calendar. This allows us to straightforwardly compare model and measurement times regardless of their reference point. There are many helper methods for converting different date-time formats to this standard time, here we use Coord.convert_datetime_to_standard_time(), and then include the coordinate in the coordinate list:

time_coord.convert_datetime_to_standard_time()
coords.append(time_coord)

Finally we return the coordinates:

return coords

For the create_data_object() method we have the familiar signature and import statements:

def create_data_object(self, filenames, variable):
    from cis.data_io.aeronet import load_multiple_aeronet
    from cis.exceptions import InvalidVariableError

We can pass the job of reading the data to our AERONET reading routine – catching any errors which occur because the variable doesn’t exist.

try:
    data_obj = load_multiple_aeronet(filenames, [variable])
except ValueError:
    raise InvalidVariableError(variable + " does not exist in " + str(filenames))

Note

Notice here that we’re catching a ValueError – which Numpy throws when it can’t find the specified variable in the data, and rethrowing the same error as an InvalidVariableError, so that CIS knows how to deal with it. Any plugins should use this error when a user specifies a variable which isn’t within the specified file.

Now we have read the data, we load the coordinate list, but notice that we also pass in the data we've just read. This is why we created a separate coordinate reading routine earlier: the data containing the coordinates has already been read in the line above, so we don't need to read it twice, we just need to pull out the coordinates. This saves opening the file multiple times, and can be a useful pattern for files which aren't direct access (such as text files).

coords = self._create_coord_list(filenames, data_obj)

Finally we return the complete data object, including some associated metadata and the coordinates.

return UngriddedData(data_obj[variable], Metadata(name=variable, long_name=variable, shape=(len(data_obj),), missing_value=-999.0), coords)

Here’s the plugin in full:

class Aeronet(AProduct):

    def get_file_signature(self):
        return [r'.*\.lev20']

    def _create_coord_list(self, filenames, data=None):
        from cis.data_io.ungridded_data import Metadata
        from cis.data_io.aeronet import load_multiple_aeronet

        if data is None:
            data = load_multiple_aeronet(filenames)

        coords = CoordList()
        coords.append(Coord(data['longitude'], Metadata(name="Longitude", shape=(len(data),),
                                                        units="degrees_east", range=(-180, 180))))
        coords.append(Coord(data['latitude'], Metadata(name="Latitude", shape=(len(data),),
                                                       units="degrees_north", range=(-90, 90))))
        coords.append(Coord(data['altitude'], Metadata(name="Altitude", shape=(len(data),), units="meters")))
        time_coord = Coord(data["datetime"], Metadata(name="DateTime", standard_name='time', shape=(len(data),),
                                                      units="DateTime Object"), "X")
        time_coord.convert_datetime_to_standard_time()
        coords.append(time_coord)

        return coords

    def create_coords(self, filenames, variable=None):
        return UngriddedCoordinates(self._create_coord_list(filenames))

    def create_data_object(self, filenames, variable):
        from cis.data_io.aeronet import load_multiple_aeronet
        from cis.exceptions import InvalidVariableError

        try:
            data_obj = load_multiple_aeronet(filenames, [variable])
        except ValueError:
            raise InvalidVariableError(variable + " does not exist in " + str(filenames))

        coords = self._create_coord_list(filenames, data_obj)

        return UngriddedData(data_obj[variable],
                             Metadata(name=variable, long_name=variable, shape=(len(data_obj),), missing_value=-999.0),
                             coords)

Advanced

This more advanced tutorial will cover some of the difficulties when reading in data which differs significantly from the structure CIS expects, and/or has little metadata in the associated files. We take the MODIS L2 plugin as our example, and discuss each method in turn.

There are a number of specific MODIS L2 products which we have tested using this plugin, each with their own file signature, and so in this plugin we take advantage of the fact that the regular expression returned by get_file_signature can be a list. This way we create a simple regular expression for each of the MODIS L2 products that we're supporting - rather than trying to create one, more complicated, regular expression which matches just these products to the exclusion of all others:

def get_file_signature(self):
    product_names = ['MYD06_L2', 'MOD06_L2', 'MYD04_L2', 'MOD04_L2']
    regex_list = [r'.*' + product + r'.*\.hdf' for product in product_names]
    return regex_list

We have implemented the optional get_variable_names method here because MODIS files sometimes contain variables which CIS is unable to handle due to their irregular shape. We only want to report the variables which CIS can read, so we check each variable before adding it to the list of variables we return. We know that MODIS only contains SD variables, so we can ignore any other types.

Note

HDF files can contain both Vdatas (VD) and Scientific Datasets (SD) data collections (among others). These are stored and accessed quite differently, which makes dealing with these files quite fiddly - we often have to treat each case separately. In this case we know MODIS files only have SD datasets which makes things a bit simpler.

def get_variable_names(self, filenames, data_type=None):
    import pyhdf.SD

    # Determine the valid shape for variables
    sd = pyhdf.SD.SD(filenames[0])
    datasets = sd.datasets()
    valid_shape = datasets['Latitude'][1]  # Assumes that latitude shape == longitude shape (it should)

    variables = set([])
    for filename in filenames:
        sd = pyhdf.SD.SD(filename)
        for var_name, var_info in sd.datasets().iteritems():
            if var_info[1] == valid_shape:
                variables.add(var_name)

    return variables

MODIS data often has a scale factor built in and stored against each variable; this method reads the scale factor for a particular variable and checks it against our built-in list of scale factors.

def __get_data_scale(self, filename, variable):
    from cis.exceptions import InvalidVariableError
    from pyhdf import SD

    try:
        meta = SD.SD(filename).datasets()[variable][0][0]
    except KeyError:
        raise InvalidVariableError("Variable "+variable+" not found")

    for scaling in self.modis_scaling:
        if scaling in meta:
            return scaling
    return None

In order to use data which has been scaled, we re-scale it on reading. This creates some overhead in the reading of the data, but saves considerable time when performing other operations on it later in the process. Routines like this can often be adapted from available Fortran or IDL routines (assuming no python routines are available) for your data.

def __field_interpolate(self,data,factor=5):
    '''
    Interpolates the given 2D field by the factor,
    edge pixels are defined by the ones in the centre,
    odd factors only!
    '''
    import numpy as np

    logging.debug("Performing interpolation...")

    output = np.zeros((factor*data.shape[0],factor*data.shape[1]))*np.nan
    output[int(factor/2)::factor,int(factor/2)::factor] = data
    for i in range(1,factor+1):
        output[(int(factor/2)+i):(-1*factor/2+1):factor,:] = i*((output[int(factor/2)+factor::factor,:]-output[int(factor/2):(-1*factor):factor,:])
                                                                /float(factor))+output[int(factor/2):(-1*factor):factor,:]
    for i in range(1,factor+1):
        output[:,(int(factor/2)+i):(-1*factor/2+1):factor] = i*((output[:,int(factor/2)+factor::factor]-output[:,int(factor/2):(-1*factor):factor])
                                                                /float(factor))+output[:,int(factor/2):(-1*factor):factor]
    return output

Next we read the coordinates from the file (using the same method of factoring out as we used in the Aeronet case).

def _create_coord_list(self, filenames, variable=None):
    import datetime as dt

    variables = ['Latitude', 'Longitude', 'Scan_Start_Time']
    logging.info("Listing coordinates: " + str(variables))

As usual we rely on the lower level IO reading routines to provide the raw data, in this case using the hdf.read routine.

sdata, vdata = hdf.read(filenames, variables)

Note

Notice we have to put the vdata data somewhere, even though we don’t use it in this case.

We have to check whether we need to scale the coordinates to match the variable being read:

apply_interpolation = False
if variable is not None:
    scale = self.__get_data_scale(filenames[0], variable)
    apply_interpolation = True if scale == "1km" else False

Then we can read the coordinates, one at a time. We know the latitude information is stored in an SD dataset called Latitude, so we read that and interpolate it if needed.

lat = sdata['Latitude']
sd_lat = hdf.read_data(lat, "SD")
lat_data = self.__field_interpolate(sd_lat) if apply_interpolation else sd_lat
lat_metadata = hdf.read_metadata(lat, "SD")
lat_coord = Coord(lat_data, lat_metadata,'Y')

The same for Longitude:

lon = sdata['Longitude']
lon_data = self.__field_interpolate(hdf.read_data(lon,"SD")) if apply_interpolation else hdf.read_data(lon,"SD")
lon_metadata = hdf.read_metadata(lon,"SD")
lon_coord = Coord(lon_data, lon_metadata,'X')

Next we read the time variable, remembering to convert it to our internal standard time. (We know that the MODIS’ atomic clock time is referenced to the 1st January 1993.)

time = sdata['Scan_Start_Time']
time_metadata = hdf.read_metadata(time,"SD")
# Ensure the standard name is set
time_metadata.standard_name = 'time'
time_coord = Coord(time,time_metadata,"T")
time_coord.convert_TAI_time_to_std_time(dt.datetime(1993,1,1,0,0,0))

return CoordList([lat_coord, lon_coord, time_coord])

def create_coords(self, filenames, variable=None):
    return UngriddedCoordinates(self._create_coord_list(filenames))

For the create_data_object we are really just pulling the above methods together to read the specific variable the user has requested and combine it with the coordinates.

def create_data_object(self, filenames, variable):
    logging.debug("Creating data object for variable " + variable)

    # reading coordinates
    # the variable here is needed to work out whether to apply interpolation to the lat/lon data or not
    coords = self._create_coord_list(filenames, variable)

    # reading of variables
    sdata, vdata = hdf.read(filenames, variable)

    # retrieve data + its metadata
    var = sdata[variable]
    metadata = hdf.read_metadata(var, "SD")

    return UngriddedData(var, metadata, coords)

We have also implemented the AProduct.get_file_format() method which allows some associated tools (for example the CEDA_DI tool) to use CIS to index files which they wouldn’t otherwise be able to read. We just return a file format descriptor as a string.

def get_file_format(self, filenames):
    """
    Get the file format
    :param filenames: the filenames of the file
    :return: file format
    """

    return "HDF4/ModisL2"

The full MODIS L2 plugin is too long to reproduce in full, but it can be downloaded here.

Data plugin reference

This section provides a reference describing the expected behaviour of each of the functions a plugin can implement. The following methods are mandatory:

AProduct.get_file_signature()

This method should return a list of regular expressions, which CIS uses to decide which data product to use for a given file. If more than one regular expression is provided in the list then the file can match any of the expressions. The first product with a signature that matches the filename will be used. The order in which the products are searched is determined by the priority property, highest value first; internal products generally have a priority of 10.

For example, the following would match all files with a name containing the string ‘CODE’ and with the ‘nc’ extension:

return [r'.*CODE.*\.nc']
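
A quick way to sanity-check a signature regex is with Python's re module (the filename below is made up):

import re

print(bool(re.match(r'.*CODE.*\.nc', '/data/run1_CODE_output.nc')))  # prints True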

Note

If the signature has matched the framework will call AProduct.get_file_type_error(), this gives the product a chance to open the file and check the contents.

Returns: A list of regexes to match the product's file naming convention.
Return type: list
AProduct.create_coords(filenames)

Reads the coordinates from one or more files. Note that this method may have to make certain assumptions about the file in order to return a single coordinate set. The user should be warned through the logger if this is the case.

Parameters: filenames (list) – List of filenames to read coordinates from
Returns: CommonData object
AProduct.create_data_object(filenames, variable)

Create and return a CommonData object for a given variable from one or more files.

Parameters:
  • filenames (list) – List of filenames of files to read
  • variable (str) – Variable to read from the files
Returns: A CommonData object representing the specified variable
Raises:
  • FileIOError – Unable to read a file
  • InvalidVariableError – Variable not present in file

While these may be implemented optionally:

AProduct.get_variable_names(filenames, data_type=None)

Get a list of available variable names from the filenames list passed in. This general implementation can be overridden in specific products to include/exclude variables which may or may not be relevant. The data_type parameter can be used to specify extra information.

Parameters:
  • filenames (list) – List of string filenames of files to be read from
  • data_type (str) – ‘SD’ or ‘VD’ to specify returning only SD or VD variables from HDF files. This may take on other values in specific product implementations.
Returns: A set of variable names as strings
Return type: set

AProduct.get_file_type_error(filename)

Check a single file to see if it is of the correct type, and if not return a list of errors. If the return is None then there are no errors and this is the correct data product to use for this file.

This method gives a mechanism for a data product to identify itself as the correct product when a specific enough file signature cannot be provided. For example GASSP is a type of NetCDF file and so filenames end with .nc but so do other NetCDF files, so the data product opens the file and looks for the GASSP version attribute, and if it doesn’t find it returns an error.

Parameters: filename (str) – The filename of the file
Returns: A list of errors, or None
Return type: list or None
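
As an illustrative sketch of such a check (the GASSP_Version attribute name and the use of the netCDF4 library are assumptions made for this example, not the actual implementation):

def get_file_type_error(self, filename):
    from netCDF4 import Dataset
    try:
        f = Dataset(filename)
    except (IOError, RuntimeError) as e:
        return ["Unable to open " + filename + ": " + str(e)]
    try:
        # Look for a global attribute which identifies the format
        if not hasattr(f, 'GASSP_Version'):
            return ["No GASSP_Version global attribute in " + filename]
    finally:
        f.close()
    return None
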
AProduct.get_file_format(filename)

Returns a file format hierarchy separated by slashes, of the form TopLevelFormat/SubFormat/SubFormat/Version. E.g. NetCDF/GASSP/1.0, ASCII/ASCIIHyperpoint or HDF4/CloudSat. This is mainly used within the ceda_di indexing tool. If not set it will default to the products name.

A filename of an example file can be provided to enable the determination of, for example, a dataset version number.

Parameters: filename (str) – Filename of a file to be inspected
Returns: File format, of the form [parent/]format/specific instance/version, or the class name
Return type: str
Raises: FileFormatError if there is an error

Analysis plugin development

Users can write their own plugins for performing the collocation of two data sets. There are three different types of plugin available for collocation; first we will describe the overall design and how these different components interact, then each will be described in more detail.

Basic collocation design

The diagram below demonstrates the basic design of the collocation system, and the roles of each of the components. In the simple case of the default collocator (which returns only one value) the Collocator loops over each of the sample points, calls the relevant Constraint to reduce the number of data points, and then the Kernel which returns a single value, which the collocator stores.

_images/CollocationDiagram.png

Kernel

A kernel is used to convert the constrained points into values in the output. There are two sorts of kernel: those which act on the final point location and a set of data points (these derive from Kernel), and more specific kernels which act upon just an array of data (these derive from AbstractDataOnlyKernel, which in turn derives from Kernel). The data-only kernels are less flexible but should execute faster. To create a new kernel, inherit from Kernel and implement the abstract method Kernel.get_value(). To make a data-only kernel, inherit from AbstractDataOnlyKernel and implement AbstractDataOnlyKernel.get_value_for_data_only() and optionally overload AbstractDataOnlyKernel.get_value(). These methods are outlined below.

Kernel.get_value(point, data)

This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the data given a single point.

The data is deliberately left unspecified in the interface as it may be any type of data; however, it is expected that each implementation will only work with a specific type of data (gridded, ungridded etc.). Note that this method will be called for every sample point and so could become a bottleneck for calculations; it is therefore advisable to make it as quick as is practical. If this method is unable to provide a value (for example if no data points were given), a ValueError should be thrown.

Parameters:
  • point – A single HyperPoint
  • data – A set of data points to reduce to a single value
Returns: For return_size=1, a single value (number); otherwise a list of return values, which represents some operation on the points provided
Raises ValueError: When the method is unable to return a value

AbstractDataOnlyKernel.get_value_for_data_only(values)

This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the values (a numpy array).

Note that this method will be called for every sample point in which data can be placed and so could become a bottleneck for calculations; it is therefore advisable to make it as quick as is practical. If this method is unable to provide a value (for example if no data points were given), a ValueError should be thrown. This method will not be called if there are no values to be used for calculations.

Parameters: values – A numpy array of values (cannot be None or empty)
Returns: A single data item if return_size is 1, or a list containing Kernel.return_size items
Raises ValueError: If there are any problems creating a value
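
For example, a minimal sketch of a data-only kernel (the import path is an assumption and may differ between CIS versions):

import numpy as np
# NOTE: this import path is an assumption for illustration purposes
from cis.collocation.col_framework import AbstractDataOnlyKernel

class MaxKernel(AbstractDataOnlyKernel):
    """Reduce the constrained values for each sample point to their maximum."""

    def get_value_for_data_only(self, values):
        # values is a non-empty numpy array of the constrained data values
        return np.max(values)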

Constraint

The constraint limits the data points for a given sample point. The user can also add a new constraint mechanism by subclassing Constraint and providing an implementation for Constraint.constrain_points(). If more control is needed over the iteration sequence, the Constraint.get_iterator() method can also be overloaded. Note however that this may not be respected by all collocators, which may still iterate over all sample data points. It is possible to write your own collocator (or extend an existing one) to ensure the correct iterator is used - see the next section. Both of these methods, and their signatures, are outlined below.

Constraint.constrain_points(point, data)

This method should return a subset of the data given a single reference point. It is expected that the data returned will be of the same type as that given - but this isn't mandatory. It is possible that this function will return zero points (no data); the collocation class is responsible for providing a fill_value.

Parameters:
  • point (HyperPoint) – A single HyperPoint
  • data – A set of data points to be reduced
Returns:

A reduced set of data points

Constraint.get_iterator(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)

Returns an iterator over the points which need to be calculated. The default iterator iterates through all the sample points, calling Constraint.constrain_points() for each one.

Parameters:
  • missing_data_for_missing_sample – If true, the final point is set to missing wherever the sample data is missing; otherwise the sample is used as-is
  • coord_map – Coordinate map - list of tuples of indexes of hyperpoint coord, data coords and output coords
  • coords – The coordinates to map the data onto
  • data_points – The (non-masked) data points
  • shape – Shape of the final data values
  • points – The original points object, these are the points to collocate
  • output_data – Output data set
Returns:

An iterator over tuples of (sample indices, hyper point, constrained points) to be placed in these points

To enable a constraint to use an AbstractDataOnlyKernel, the method get_iterator_for_data_only() should be implemented (though, again, this may be ignored by a collocator). An example of this is the BinnedCubeCellOnlyConstraint.get_iterator_for_data_only() implementation.

Collocator

The collocation method itself is also available as a plugin. A new one can be created by subclassing Collocator and providing an implementation for Collocator.collocate(). This method takes a number of sample points and applies the given constraint and kernel to the data for each of those points. It is responsible for returning the new data object to be written to the output file. As such, by creating a new collocation method the user could, for example, create a collocation routine capable of handling multiple return values from the kernel, and hence producing multiple data objects.

Note

The collocator is also responsible for dealing with any missing values in sample points. (Some sets of sample points may include values which may or may not be masked.) Sometimes the user may wish to mask the output for such points; the missing_data_for_missing_sample attribute is used to determine the expected behaviour.

The interface is detailed here:

Collocator.collocate(points, data, constraint, kernel)

The method is responsible for setting up and running the collocation. It should take a set of data and map that onto the given (sample) points using the constraint and kernel provided.

Parameters:
  • points – A set of sample points onto which the data will be mapped
  • data – The data to map onto the sample points
  • constraint – The Constraint instance to apply at each sample point
  • kernel – The Kernel instance used to produce each output value
Returns:

One or more CommonData objects (or subclasses thereof) whose coordinates lie on the points defined above.

Implementation

For all of these plugins, any new variables (such as limits, constraint values or averaging parameters) are automatically set as attributes of the relevant object. For example, if the user wanted to write a new constraint method (AreaConstraint, say) which needed a variable called area, this can be accessed as self.area within the constraint object. It will be set to whatever the user specifies at the command line for that variable, e.g.:

$ ./cis.py col my_sample_file rain:"model_data_?.nc"::AreaConstraint,area=6000,fill_value=0.0:nn_gridded

Example implementations of new collocation plugins are demonstrated below for each of the plugin types:

# Note: the imports below assume the plugin framework classes live in
# cis.collocation.col_framework and that LazyData is importable from
# cis.data_io.ungridded_data (as documented in the reference below).
from cis.collocation.col_framework import Collocator, Constraint, Kernel
from cis.data_io.ungridded_data import LazyData


class MyCollocator(Collocator):

    def collocate(self, points, data, constraint, kernel):
        values = []
        for point in points:
            # Reduce the data to the points allowed by the constraint...
            con_points = constraint.constrain_points(point, data)
            try:
                # ...then let the kernel turn them into a single value.
                values.append(kernel.get_value(point, con_points))
            except ValueError:
                # The kernel could not produce a value, so use the fill value.
                values.append(constraint.fill_value)
        new_data = LazyData(values, data.metadata)
        new_data.missing_value = constraint.fill_value
        return new_data


class MyConstraint(Constraint):

    def constrain_points(self, ref_point, data):
        # self.val_check is set automatically from the user's command line
        # arguments, as described above.
        con_points = []
        for point in data:
            if point.value > self.val_check:
                con_points.append(point)
        return con_points


class MyKernel(Kernel):

    def get_value(self, point, data):
        # A simple nearest-neighbour kernel: start from the furthest
        # possible point and keep whichever data point is closer.
        nearest_point = point.furthest_point_from()
        for data_point in data:
            if point.compdist(nearest_point, data_point):
                nearest_point = data_point
        return nearest_point.val

Maintenance and Developer Guide

Source files

The cis source code is hosted at https://github.com/cedadev/jasmin_cis.git, while the conda recipes and other files are hosted here: https://github.com/cistools.

Test suites

The unit test suite can be run readily using Nose. Just go to the root of the repository (i.e. cis) and type nosetests cis/test/unit to run the full suite of tests.

A comprehensive set of integration tests is also provided. There is a folder of test data at /group_workspaces/jasmin/cis/cis_repo_test_files, which has been compressed and is available as a tar file inside that folder.

To add files to the folder, simply copy them in, then delete the old tar file and create a new one with:

tar --dereference -zcvf cis_repo_test_files.tar.gz .

Ignore the warning about a file changing - it occurs because the tar file itself is in the directory being archived. Having the tar file in the directory, however, means the archive can be easily unpacked without creating an intermediate folder. To run the integration tests, copy the archive to the local machine and decompress it, set the environment variable CIS_DATA_HOME to the location of the data sets, and run nosetests cis/test/integration.
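
For example, assuming the archive has been unpacked to /path/to/cis_repo_test_files:

$ export CIS_DATA_HOME=/path/to/cis_repo_test_files
$ nosetests cis/test/integration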

There are also a number of plot tests available under the test/plot_tests directory, which can be run using the run_all.sh script. These perform a diff of some standard plots against reference plots; however, small changes in the platform libraries and fonts can break these tests, so they shouldn't be relied on.

Dependencies

A graph representing the dependency tree can be found at doc/cis_dependency.dot (use XDot to read it).

Creating a Release

To carry out intermediate releases follow this procedure:

  1. Check that the version number and status are updated in the CIS source code (cis/__init__.py).
  2. Tag the new version on GitHub with the new version number and release notes.
  3. Create a tarball - use python setup.py egg_info sdist in the cis root dir.
  4. Install this onto the release virtual environment, which is at /group_workspaces/jasmin/cis/cis_dev_venv: activate the venv, upload the tarball somewhere on the GWS and then do pip install <LOCATION_OF_TARBALL>.
  5. Create an Anaconda build on each platform (OS X, Linux and Windows) - see below.
  6. (Optional) Request that Phil Kershaw upload the tarball to PyPI.

For a release onto JASMIN, complete the steps above and then ask Alan Iwi to produce an RPM, deploy it on a test VM and confirm functionality, then roll it out across the full JAP and LOTUS nodes.

Anaconda Build

The Anaconda build recipes for CIS, and for those dependencies which can't be found either in the core channel or in SciTools, are stored in their own GitHub repository under https://github.com/cistools. To build a new CIS package, clone the conda-recipes repository and then run the following command:

$ conda build -c cistools -c scitools cis --numpy 1.8

By default this will run the full unit test suite before successful completion. You can also optionally run the integration test suite by setting the CIS_DATA_HOME environment variable.

To upload the package to the cistools channel on Anaconda.org use:

$ binstar upload <package_location> -u cistools

Alternatively, when creating release candidates you may wish to upload the package to the ‘beta’ channel. This gives an opportunity to test the packaging and installation process on a number of machines. To do so, use:

$ binstar upload <package_location> -u cistools --channel beta

To install cis from the beta channel use:

$ conda install -c https://conda.binstar.org/cistools/channel/beta -c cistools -c scitools cis

Documentation

The documentation and API reference are both generated from a mixture of markdown and documentation autogenerated with the Sphinx autodoc package. Build the documentation using:

python setup.py build_sphinx

This will output the documentation in html under the directory doc/_build/html.

Continuous Integration Server

JASMIN provides a Jenkins CI server on which the CIS unit and integration tests are run whenever origin/master is updated. The integration tests take approximately 7 hours to run, whilst the unit tests take about 5 seconds. The Jenkins server is hosted on jasmin-sci1-dev at /var/lib/jenkins and is accessed at http://jasmin-sci1-dev.ceda.ac.uk:8080/

We also have a Travis CI instance (https://travis-ci.org/cedadev/cis) which in principle allows us to build and test on both Linux and OS X. The unit test builds currently work, but because of a hard time limit on builds (120 minutes) the integration tests don't currently run.

Copying files to the CI server

The contents of the test folder will not be automatically copied across to the Jenkins directory, so if you add any files to the folder you’ll need to manually copy them to the Jenkins directory or the integration tests will fail. The directory is /var/lib/jenkins/workspace/CIS Integration Tests/cis/test/test_files/. This is not entirely simple because:

  • We don’t have write permissions on the test folder
  • Jenkins doesn’t have read permissions for the CIS group_workspace

In order to copy files across we have done the following:

  1. Copy the files we want to /tmp
  2. Open up the CIS Integration Tests webpage and click ‘Configure’
  3. Scroll down to ‘Build’ where the shell script to be executed is found and insert a line to copy the file to the directory, e.g. cp /tmp/file.nc "/var/lib/jenkins/workspace/CIS Integration Tests/cis/test/test_files" (quoting the destination, since the path contains spaces)
  4. Run the CIS Integration Tests
  5. Remove the line from the build script
  6. Remove the files from /tmp

Problems with Jenkins

Sometimes the Jenkins server experiences problems which make it unusable. One particular issue we’ve encountered more than once is that Jenkins occasionally loses all its stylesheets and then becomes impossible to use. Asking CEDA support (or Phil Kershaw) to restart Jenkins should solve this.

CIS as a Python library (API)

Main API

As a command line tool, CIS was not designed with a Python API in mind. There are, however, some utility functions that may provide a useful start for those who wish to use CIS as a Python library. For example, the functions in the base cis module provide a straightforward way to load your data; they can be easily imported using, for example: from cis import read_data. One of the advantages of using CIS as a Python library is that you are able to perform multiple operations in one go, that is, without writing to disk in between. In certain cases this may provide a significant speed-up.

Note

This section of the documentation expects a greater level of Python experience than the other sections. There are many helpful Python guides and tutorials available around the web if you wish to learn more.

The read_data() function is a simple way to read a single gridded or ungridded data object (e.g. a NetCDF variable) from one or more files. CIS will determine the best way to interpret the datafile by comparing the file signature with the built-in data reading plugins and any user defined plugins. Specifying a particular product allows the user to override this automatic detection.

cis.read_data(filenames, variable, product=None)

Read a specific variable from a list of files. Files can be either gridded or ungridded but not a mix of both. First tries to read the data as gridded; if that fails, tries as ungridded.

Parameters:
  • filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
  • variable (str) – The variable to read from the files
  • product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
Returns:

The specified data as either a GriddedData or UngriddedData object.
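
A short sketch of typical usage (the filenames and variable name here are hypothetical):

from cis import read_data

# Read one variable from a set of (hypothetical) files; CIS picks the
# data reading plugin by matching the file signature.
aod = read_data('satellite_*.nc', 'AOD550')
print(aod.summary())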

The read_data_list() function is very similar to read_data() except that it allows the user to specify more than one variable name. This function returns a list of data objects, either all of which will be gridded, or all ungridded, but not a mix. For ungridded data lists it is assumed that all objects share the same coordinates.

cis.read_data_list(filenames, variables, product=None, aliases=None)

Read multiple data objects from a list of files. Files can be either gridded or ungridded but not a mix of both.

Parameters:
  • filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
  • variables (string or list) – One or more variables to read from the files
  • product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
  • aliases (string or list) – List of aliases to put on each variable’s data object as an alternative means of identifying them.
Returns:

A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)
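
And similarly for multiple variables (again with hypothetical names):

from cis import read_data_list

# Read two variables in one go; aliases give each returned object an
# alternative identifier.
data_list = read_data_list('model_*.nc', ['temperature', 'pressure'],
                           aliases=['temp', 'pres'])
for data in data_list:
    print(data.alias, data.name())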

Data Objects

Each of the above methods returns either GriddedData or UngriddedData objects. These objects are the main data handling objects used within CIS, and the methods on each of these types are documented in the data modules section. These classes do, however, share a common interface, defined by the CommonData class, which is detailed below. For technical reasons some methods which are common to both GriddedData and UngriddedData are not defined in the CommonData interface. The most useful of these methods are probably summary() and save_data().

class cis.data_io.common_data.CommonData

Interface of common methods implemented for gridded and ungridded data.

alias

Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).

Returns:The alias
Return type:str
get_all_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:list-like object of data points
get_coordinates_points()

Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:list-like object of data points
get_non_masked_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.

Returns:list-like object of data points
history

Return the associated history of the object

Returns:The history
Return type:str
is_gridded()

Returns value indicating whether the data/coordinates are gridded.

var_name

Return the variable name associated with this data object

Returns:The variable name
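
For example, having read a (hypothetical) variable, these common methods could be used as follows:

from cis import read_data

data = read_data('my_file.nc', 'my_variable')  # hypothetical file and variable
print(data.var_name)       # the variable name as read from the file
print(data.summary())      # summary() is defined on both data types (see above)
data.save_data('out.nc')   # as is save_data()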

Unsupported API

Warning

While the above interfaces are designed as a ‘public’ API and unlikely to change over CIS versions, those documented below are not yet standardised and may change or be removed even between minor version revisions. It is expected however that these particular classes will be developed and stabilised over time to form part of the ‘public’ API.

Collocation

The main collocation class can be imported using from cis.collocation import Collocate; its methods are outlined below:

class cis.collocation.Collocate(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)

Perform a general collocation

__init__(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)

Constructor

Parameters:
  • sample_points (CommonData) – Sample points to collocate on to
  • missing_data_for_missing_sample – Write missing values out when sample data is missing
  • collocator_factory (CollocatorFactory) – An optional configuration object
__weakref__

list of weak references to the object (if defined)

collocate(data, col_name=None, col_params=None, kern=None, kern_params=None)

Perform the collocation.

Parameters:
  • data (CommonData) – Data to collocate
  • col_name (str) – Name of the collocator
  • col_params (dict) – Parameters dictionary for the collocation and constraint
  • kern (str) – The kernel to use
  • kern_params (dict) – The kernel parameters to use
Return CommonData:
 The collocated data
Raises CoordinateNotFoundError:
 If the collocator was unable to compare the sample and data points
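
A sketch of how a collocation might be driven through this class (the collocator and kernel names are illustrative only - see the collocation documentation for the valid names):

from cis import read_data
from cis.collocation import Collocate

sample = read_data('sample_file.nc', 'sample_var')  # hypothetical inputs
data = read_data('data_file.nc', 'data_var')

col = Collocate(sample, missing_data_for_missing_sample=True)
output = col.collocate(data, col_name='box', col_params={'h_sep': '500km'},
                       kern='mean')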

Aggregation

The main aggregation class can be imported using from cis.aggregation import Aggregate; its methods are outlined below. Note that currently this object saves the output directly to file, but it is expected that in the future it will return the result for the user to output as needed.

class cis.aggregation.Aggregate(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)
__init__(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Constructor

Parameters:
  • grid (dict) – A dictionary of dimension_name:AggregationGrid key value pairs.
  • output_file – The filename to output the result to
  • data_reader – Optional DataReader configuration object
  • data_writer – Optional DataWriter configuration object
__weakref__

list of weak references to the object (if defined)

aggregate(variables, filenames, product=None, kernel=None)

Aggregate the given variables based on the initialised grid

Parameters:
  • variables (string or list) – One or more variables to read from the files
  • filenames (string or list) – One or more filenames of the files to read
  • product (str) – Name of data product to use (optional)
  • kernel (str) – Name of kernel to use (the default is ‘moments’)
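
A sketch of typical usage (the grid specification below is illustrative; AggregationGrid is documented in the reference section at the end of this page, and its exact fields are assumed here):

from cis.aggregation import Aggregate
from cis.aggregation.aggregation_grid import AggregationGrid  # assumed import path

# Collapse a hypothetical variable onto a 1-degree longitude grid.
grid = {'x': AggregationGrid(start=-180, end=180, delta=1, is_time=False)}
agg = Aggregate(grid, 'aggregated_output.nc')
agg.aggregate('my_variable', 'model_data_*.nc', kernel='moments')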

Subsetting

The main subsetting class can be imported using from cis.subsetting import Subset; its methods are outlined below. Note that currently this object saves the output directly to file, but it is expected that in the future it will return the result for the user to output as needed.

class cis.subsetting.Subset(limits, output_file, subsetter=<cis.subsetting.subsetter.Subsetter object>, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Class for subsetting Ungridded or Gridded data either temporally, or spatially or both.

__init__(limits, output_file, subsetter=<cis.subsetting.subsetter.Subsetter object>, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Constructor

Parameters:
  • limits (dict) – A dictionary of dimension_name:SubsetLimits key value pairs.
  • output_file – The filename to output the result to
  • subsetter – Optional Subsetter configuration object
  • data_reader – Optional DataReader configuration object
  • data_writer – Optional DataWriter configuration object
__weakref__

list of weak references to the object (if defined)

subset(variables, filenames, product=None)

Subset the given variables based on the initialised limits

Parameters:
  • variables (string or list) – One or more variables to read from the files
  • filenames (string or list) – One or more filenames of the files to read
  • product (str) – Name of data product to use (optional)
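
A sketch of typical usage (the limits specification is illustrative; SubsetLimits is assumed to be a simple start/end/is_time container importable from cis.subsetting.subset_limits):

from cis.subsetting import Subset
from cis.subsetting.subset_limits import SubsetLimits  # assumed import path

# Subset a hypothetical variable to longitudes 0..90 degrees.
limits = {'x': SubsetLimits(start=0, end=90, is_time=False)}
sub = Subset(limits, 'subset_output.nc')
sub.subset('my_variable', 'model_data_*.nc')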

Stats

The main statistics class can be imported using from cis.stats import StatsAnalyzer; its methods are outlined below:

class cis.stats.StatsAnalyzer(data1, data2)

Analyse datasets to produce statistics.

__init__(data1, data2)

Create a statistics analyser for two data sets

Parameters:
  • data1 – The first of the two datasets to analyse
  • data2 – The second of the two datasets to analyse
analyze()

Perform a statistical analysis on two data sets.

Returns:List of StatisticsResult instances.
points_count()

Count all points which will be used for statistical comparison operations (i.e. are non-missing in both datasets).

Returns:List of StatisticsResults
means()

Means of two datasets

Returns:List of StatisticsResults
stddevs()

Corrected sample standard deviation of datasets

Returns:List of StatisticsResults
abs_mean()

Mean of absolute difference d2-d1

Returns:List of StatisticsResults
abs_stddev()

Standard deviation of absolute difference d2-d1

Returns:List of StatisticsResults
rel_mean()

Mean of relative difference (d2-d1)/d1

Returns:List of StatisticsResults
rel_stddev()

Standard deviation of relative difference (d2-d1)/d1

Returns:List of StatisticsResults
spearmans_rank()

Perform a Spearman's rank correlation on the data

Returns:List of StatisticsResults
linear_regression()

Perform a linear regression on the data

Returns:List of StatisticsResults
__weakref__

list of weak references to the object (if defined)
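
For example (with hypothetical input files and variables):

from cis import read_data
from cis.stats import StatsAnalyzer

data1 = read_data('file_a.nc', 'var_a')
data2 = read_data('file_b.nc', 'var_b')

stats = StatsAnalyzer(data1, data2)
for result in stats.analyze():
    print(result)  # each result is a StatisticsResult instance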

Full Python reference documentation

The rest of the documentation below documents internal CIS functions and modules which are not intended to be used as an API at all. They are documented here as a reference for developers and other interested parties.

cis.data_io package
cis.data_io.products package
The abstract AProduct module
class cis.data_io.products.AProduct.AProduct

Bases: object

Abstract class for the various possible data products. This just defines the interface which the subclasses must implement.

create_coords(filenames)

Reads the coordinates from one or more files. Note that this method may have to make certain assumptions about the file in order to return a single coordinate set. The user should be warned through the logger if this is the case.

Parameters:filenames (list) – List of filenames to read coordinates from
Returns:CommonData object
create_data_object(filenames, variable)

Create and return a CommonData object for a given variable from one or more files.

Parameters:
  • filenames (list) – List of filenames of files to read
  • variable (str) – Variable to read from the files
Returns:

A CommonData object representing the specified variable

Raises:
  • FileIOError – Unable to read a file
  • InvalidVariableError – Variable not present in file
get_file_format(filename)

Returns a file format hierarchy separated by slashes, of the form TopLevelFormat/SubFormat/SubFormat/Version, e.g. NetCDF/GASSP/1.0, ASCII/ASCIIHyperpoint or HDF4/CloudSat. This is mainly used within the ceda_di indexing tool. If not set it will default to the product's name.

A filename of an example file can be provided to enable the determination of, for example, a dataset version number.

Parameters:filename (str) – Filename of file to be inspected
Returns:File format, of the form [parent/]format/specific instance/version, or the class name
Return type:str
Raises:FileFormatError if there is an error
get_file_signature()

This method should return a list of regular expressions, which CIS uses to decide which data product to use for a given file. If more than one regular expression is provided in the list then the file can match any of the expressions. The first product with a signature that matches the filename will be used. The order in which the products are searched is determined by the priority property, highest value first; internal products generally have a priority of 10.

For example, this would match all files with a name containing the string ‘CODE’ and with the ‘nc’ extension:

return [r'.*CODE.*\.nc']

Note

If the signature has matched, the framework will call AProduct.get_file_type_error(); this gives the product a chance to open the file and check the contents.

Returns:A list of regex to match the product’s file naming convention.
Return type:list
get_file_type_error(filename)

Check a single file to see if it is of the correct type, and if not return a list of errors. If the return is None then there are no errors and this is the correct data product to use for this file.

This method gives a mechanism for a data product to identify itself as the correct product when a specific enough file signature cannot be provided. For example, GASSP files are NetCDF files whose names end with .nc, but so do many other files; the data product therefore opens the file and looks for the GASSP version attribute, returning an error if it doesn't find it.

Parameters:filename (str) – The filename for the file
Returns:List of errors, or None
Return type:list or None
get_variable_names(filenames, data_type=None)

Get a list of available variable names from the filenames list passed in. This general implementation can be overridden in specific products to include/exclude variables which may or may not be relevant. The data_type parameter can be used to specify extra information.

Parameters:
  • filenames (list) – List of string filenames of files to be read from
  • data_type (str) – ‘SD’ or ‘VD’ to specify only return SD or VD variables from HDF files. This may take on other values in specific product implementations.
Returns:

A set of variable names as strings

Return type:

str

priority = 10
valid_dimensions = None
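
As an illustration, a minimal (and hypothetical) product plugin implementing this interface might look like the following sketch; a real plugin would build the coordinates and data from the file contents rather than via the placeholder helpers used here:

from cis.data_io.products.AProduct import AProduct
from cis.data_io.ungridded_data import UngriddedCoordinates, UngriddedData


class MyProduct(AProduct):

    priority = 20  # searched before the internal products (priority 10)

    def get_file_signature(self):
        return [r'mydata_.*\.nc']

    def create_coords(self, filenames):
        # _read_coords is a hypothetical helper returning a CoordList
        return UngriddedCoordinates(self._read_coords(filenames))

    def create_data_object(self, filenames, variable):
        # _read_variable is a hypothetical helper returning data and Metadata
        data, metadata = self._read_variable(filenames, variable)
        return UngriddedData(data, metadata, self._read_coords(filenames))
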
exception cis.data_io.products.AProduct.ProductPluginException(message, original_exception)

Bases: exceptions.Exception

Represents an error which has occurred inside of a Product plugin

original_exception = None
cis.data_io.products.AProduct.get_coordinates(filenames, product=None)

Top level routine for calling the correct product’s create_coords() routine.

Parameters:
  • filenames (list) – A list of filenames to read data from
  • product (str) – The product to read data with - this should be a string which matches the name of one of the subclasses of AProduct
Returns:

A CoordList object

cis.data_io.products.AProduct.get_data(filenames, variable, product=None)

Top level routine for calling the correct product’s create_data_object() routine.

Parameters:
  • filenames (list) – A list of filenames to read data from
  • variable (str) – The variable to create the CommonData object from
  • product (str) – The product to read data with - this should be a string which matches the name of one of the subclasses of AProduct. If none is supplied it is guessed from the filename signature.
Returns:

A CommonData variable

cis.data_io.products.AProduct.get_file_format(filenames, product=None)

Returns the file format, or throws FileFormatError if there is an error in the format

Parameters:
  • filenames (list) – the filenames to read
  • product (str) – the product to use; if not specified, the matching product is searched for automatically
Returns:

File format

Raises ClassNotFoundError:
 If there is no reader for this class

cis.data_io.products.AProduct.get_product_full_name(filenames, product=None)

Get the full name of the product which would read this file

Parameters:
  • filenames (list) – list of filenames to read
  • product (str) – specified product to use
cis.data_io.products.AProduct.get_variables(filenames, product=None, data_type=None)

Top level routine for calling the correct product’s get_variable_names() routine.

Parameters:
  • filenames (list) – A list of filenames to read the variables from
  • product (str) – The product to read data with - this should be a string which matches the name of one of the subclasses of AProduct
Returns:

A set of variable names as strings

Data modules

Module for the UngriddedData class

class cis.data_io.ungridded_data.LazyData(data, metadata, data_retrieval_callback=None)

Bases: object

Wrapper (adaptor) class for the different types of possible ungridded data.

add_attributes(attributes)

Add a variable attribute to this data

Parameters:attributes – Dictionary of attribute names (keys) and values.
Returns:
add_history(new_history)

Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.

Parameters:new_history – history string
copy_metadata_from(other_data)

Method to copy the metadata from one UngriddedData/Cube object to another

data

This is a getter for the data property. It caches the raw data if it has not already been read. Throws a MemoryError when reading for the first time if the data is too large.

data_flattened

Returns a 1D flattened view (or copy, if necessary) of the data.

long_name
name()

This routine returns the first name property which is not empty out of: _name, standard_name and long_name. If they are all empty it returns an empty string.

Returns:The name of the data object as a string

remove_attribute(key)

Remove a variable attribute from this data

Parameters:key – Attribute key to remove
Returns:
save_data(output_file)
shape
standard_name
units
update_range(range=None)
update_shape(shape=None)
var_name
class cis.data_io.ungridded_data.Metadata(name='', standard_name='', long_name='', shape='', units='', range='', factor='', offset='', missing_value='', calendar='', history='', misc=None)

Bases: object

alter_standard_name(new_standard_name)

Alter the standard name, logging an info line to say this is happening if the standard name is not empty. Also changes the internal metadata name to the same value.

Parameters:new_standard_name
classmethod from_CubeMetadata(cube_meta)
static guess_standard_name(name)
summary(offset=5)

Creates a unicode summary of the metadata object

Parameters:offset – The left hand padding to apply to the text
Returns:The summary
class cis.data_io.ungridded_data.UngriddedCoordinates(coords)

Bases: cis.data_io.common_data.CommonData

Wrapper (adaptor) class for the different types of possible ungridded data.

alias

Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).

Returns:The alias
Return type:str
coord(name=None, standard_name=None, long_name=None, attributes=None, axis=None)
Raise:CoordinateNotFoundError
Returns:A single coord given the same arguments as coords().
coords(name=None, standard_name=None, long_name=None, attributes=None, axis=None, dim_coords=True)
Returns:A list of coordinates in this UngriddedData object fitting the given criteria
filenames = []
get_all_points()

Returns a HyperPointView of the points.

Returns:HyperPointView of all the data points
get_coordinates_points()
get_non_masked_points()

Returns a HyperPointView for which the default iterator omits masked points.

Returns:HyperPointView of the data points
history
hyper_point(index)
Parameters:index – The index in the array to find the point for
Returns:A hyperpoint representing the data at that point
is_gridded

Returns value indicating whether the data/coordinates are gridded.

lat
lon
var_name

Return the variable name associated with this data object

Returns:The variable name
x
y
class cis.data_io.ungridded_data.UngriddedData(data, metadata, coords, data_retrieval_callback=None)

Bases: cis.data_io.ungridded_data.LazyData, cis.data_io.common_data.CommonData

Wrapper (adaptor) class for the different types of possible ungridded data.

add_attributes(attributes)

Add a variable attribute to this data

Parameters:attributes – Dictionary of attribute names (keys) and values.
Returns:
add_history(new_history)

Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.

Parameters:new_history – history string
alias

Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).

Returns:The alias
Return type:str
coord(name=None, standard_name=None, long_name=None, attributes=None, axis=None)
Raise:CoordinateNotFoundError
Returns:A single coord given the same arguments as coords().
coords(name=None, standard_name=None, long_name=None, attributes=None, axis=None, dim_coords=True)
Returns:A list of coordinates in this UngriddedData object fitting the given criteria
coords_flattened
copy()

Create a copy of this UngriddedData object with new data and coordinates so that they can be modified without held references being affected. Will call any lazy loading methods in the data and coordinates

Returns:Copied UngriddedData object
copy_metadata_from(other_data)

Method to copy the metadata from one UngriddedData/Cube object to another

data

This is a getter for the data property. It caches the raw data if it has not already been read. Throws a MemoryError when reading for the first time if the data is too large.

data_flattened

Returns a 1D flattened view (or copy, if necessary) of the data.

filenames = []
find_standard_coords()

Constructs a list of the standard coordinates. The standard coordinates are latitude, longitude, altitude, air_pressure and time; they occur in the return list in this order.

Returns:list of coordinates or None if coordinate not present
classmethod from_points_array(hyperpoints)

Constructor for building an UngriddedData object from a list of hyper points

Parameters:hyperpoints – list of HyperPoints
get_all_points()

Returns a HyperPointView of the points.

Returns:HyperPointView of all the data points
get_coordinates_points()

Returns a HyperPointView of the coordinates of points.

Returns:HyperPointView of the coordinates of points
get_non_masked_points()

Returns a HyperPointView for which the default iterator omits masked points.

Returns:HyperPointView of the data points
history
hyper_point(index)
Parameters:index – The index in the array to find the point for
Returns:A hyperpoint representing the data at that point
is_gridded

Returns value indicating whether the data/coordinates are gridded.

lat
lon
long_name
make_new_with_same_coordinates(data=None, var_name=None, standard_name=None, long_name=None, history=None, units=None, flatten=False)

Create a new, empty UngriddedData object with the same coordinates as this one.

Parameters:
  • data – Data to use (if None then defaults to all zeros)
  • var_name – Variable name
  • standard_name – Variable CF standard name
  • long_name – Variable long name
  • history – Data history string
  • units – Variable units
  • flatten – Whether to flatten the data and coordinates (for ungridded data only)
Returns:

UngriddedData instance

name()

This routine returns the first name property which is not empty out of: _name, standard_name and long_name. If they are all empty it returns an empty string :return: The name of the data object as a string

remove_attribute(key)

Remove a variable attribute from this data

Parameters:key – Attribute key to remove
Returns:
save_data(output_file)
shape
standard_name
summary()

Unicode summary of the UngriddedData with metadata of itself and its coordinates

units
update_range(range=None)
update_shape(shape=None)
var_name
x
y
class cis.data_io.ungridded_data.UngriddedDataList(iterable=())

Bases: cis.data_io.common_data.CommonDataList

Class which represents multiple UngriddedData objects (e.g. from reading multiple variables)

add_history(new_history)

Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.

Parameters:new_history – history string

append(p_object)
append_or_extend(item_to_add)

Append or extend an item to an existing list, depending on whether the item to add is itself a list or not.

Parameters:item_to_add – Item to add (may be a list or not).

coord(*args, **kwargs)

Call UngriddedData.coord(*args, **kwargs) for the first item of data (assumes all data in the list have the same coordinates)

Parameters:
  • args
  • kwargs
Returns:

coords(*args, **kwargs)

Returns all coordinates used in all the data objects.

Returns:A list of coordinates in this data list object fitting the given criteria

copy()

Create a copy of this UngriddedDataList with new data and coordinates so that they can be modified without held references being affected. Will call any lazy loading methods in the data and coordinates

Returns:Copied UngriddedData object
count(value) → integer -- return number of occurrences of value
extend(iterable)
filenames

Get the filenames in this data list

get_non_masked_points()

Returns a list containing a HyperPointViews for which the default iterator omits masked points, for each item in this UngriddedDataList.

Returns:List of HyperPointViews of the data points
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

insert()

L.insert(index, object) – insert object before index

is_gridded

Returns value indicating whether the data/coordinates are gridded.

pop([index]) → item -- remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove()

L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.

reverse()

L.reverse() – reverse IN PLACE

save_data(output_file)

Save the UngriddedDataList to a file

Parameters:output_file – output filename
Returns:
set_longitude_range(range_start)

Rotates the longitude coordinate array and changes its values by 360 as necessary to force the values to be within a 360 range starting at the specified value.

Parameters:range_start – starting value of required longitude range

sort()

L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1

var_name

Get the variable names in this list

class cis.data_io.Coord.Coord(data, metadata, axis='')

Bases: cis.data_io.ungridded_data.LazyData

add_attributes(attributes)

Add a variable attribute to this data

Parameters:attributes – Dictionary of attribute names (keys) and values.
Returns:
add_history(new_history)

Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.

Parameters:new_history – history string
convert_TAI_time_to_std_time(ref)
convert_datetime_to_standard_time()
convert_julian_to_std_time(calender='standard')
convert_to_std_time(time_stamp_info=None)

Convert this coordinate to standard time. It will use either: the units of the coordinate if it is in the standard ‘x since y’ format; or the first word of the units, combined with the time stamp (if the timestamp is not given an error is thrown).

Parameters:time_stamp_info – the time stamp info from the file, None if it does not exist
copy()

Create a copy of this Coord object with new data so that it can be modified without held references being affected. This will call any lazy loading methods in the coordinate data

Returns:Copied Coord
copy_metadata_from(other_data)

Method to copy the metadata from one UngriddedData/Cube object to another

data

This is a getter for the data property. It caches the raw data if it has not already been read. Throws a MemoryError when reading for the first time if the data is too large.

data_flattened

Returns a 1D flattened view (or copy, if necessary) of the data.

classmethod from_many_coordinates(coords)

Create a single coordinate object from the concatenation of all of the coordinate objects in the input list, updating the shape as appropriate

Parameters:coords – A list of coordinate objects to be combined
Returns:A single Coord object
long_name
name()

This routine returns the first name property which is not empty out of: _name, standard_name and long_name. If they are all empty it returns an empty string.

Returns:The name of the data object as a string

points

Alias for self.data(), to match iris.coords.Coord.points() interface

Returns:Coordinate data values
remove_attribute(key)

Remove a variable attribute from this data

Parameters:key – Attribute key to remove
Returns:
save_data(output_file)
set_longitude_range(range_start)

Confine the coordinate longitude range to 360 degrees from the range_start value.

Parameters:range_start (float) – Start of the longitude range
shape
standard_name
units
update_range(range=None)
update_shape(shape=None)
var_name
class cis.data_io.Coord.CoordList(*args)

Bases: list

All the functionality of a standard list with added Coord context.

append(other)

Safely add a new coordinate object to the list, this checks for a unique axis and standard_name.

Parameters:other (Coord) – Other coord to add
Raises DuplicateCoordinateError:
 If the coordinate is not unique in the list

copy()

Create a copy of this CoordList object with new data so that it can be modified without held references being affected. This will call any lazy loading methods in the coordinate data

Returns:Copied CoordList
count(value) → integer -- return number of occurrences of value
extend()

L.extend(iterable) – extend list by appending elements from the iterable

find_standard_coords()

Constructs a list of the standard coordinates. The standard coordinates are latitude, longitude, altitude, air_pressure and time; they occur in the return list in this order.

Returns:list of coordinates or None if coordinate not present
get_coord(name=None, standard_name=None, long_name=None, attributes=None, axis=None)

Return a single coord fitting the given criteria. This is deliberately very similar to the Cube.coord() method to maintain a similar interface and because the functionality is similar. There is no distinction between dimension coordinates and auxiliary coordinates here though.

Parameters:
  • name (string or None) – The standard name or long name or default name of the desired coordinate. If None, does not check for name. Also see, Cube.name.
  • standard_name (string or None) – The CF standard name of the desired coordinate. If None, does not check for standard name.
  • long_name (string or None) – An unconstrained description of the coordinate. If None, does not check for long_name.
  • attributes (dict or None) – A dictionary of attributes desired on the coordinates. If None, does not check for attributes
  • axis (string or None) – The desired coordinate axis, see iris.util.guess_coord_axis(). If None, does not check for axis. Accepts the values ‘X’, ‘Y’, ‘Z’ and ‘T’ (case-insensitive).
Raises CoordinateNotFoundError:
 

If the arguments given do not result in precisely 1 coordinate being matched.

Returns:

A single Coord.

get_coordinates_points()
get_coords(name=None, standard_name=None, long_name=None, attributes=None, axis=None)

Return a list of coordinates in this CoordList fitting the given criteria. This is deliberately very similar to Cube.coords() to maintain a similar interface and because the functionality is similar. There is no distinction between dimension coordinates and auxiliary coordinates here though.

Parameters:
  • name (string or None) – The standard name or long name or default name of the desired coordinate. If None, does not check for name. Also see, Cube.name.
  • standard_name (string or None) – The CF standard name of the desired coordinate. If None, does not check for standard name.
  • long_name (string or None) – An unconstrained description of the coordinate. If None, does not check for long_name.
  • attributes (dict or None) – A dictionary of attributes desired on the coordinates. If None, does not check for attributes
  • axis (string or None) – The desired coordinate axis, see iris.util.guess_coord_axis(). If None, does not check for axis. Accepts the values ‘X’, ‘Y’, ‘Z’ and ‘T’ (case-insensitive).
Returns:

A CoordList of coordinates fitting the given criteria

get_standard_coords(data_len)

Constructs a list of the standard coordinate values. The standard coordinates are latitude, longitude, altitude, time and air_pressure; they occur in the return list in this order. If a standard coordinate has not been found, its values are returned as a list of length data_len.

Parameters:data_len (int) – Expected length of coordinate data
Returns:list of indexed sequences of coordinate values
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

insert()

L.insert(index, object) – insert object before index

pop([index]) → item -- remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove()

L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.

reverse()

L.reverse() – reverse IN PLACE

sort()

L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1

cis.data_io.gridded_data.load_cube(*args, **kwargs)
cis.data_io.gridded_data.make_from_cube(cube)
class cis.data_io.common_data.CommonData

Bases: object

Interface of common methods implemented for gridded and ungridded data.

alias

Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).

Returns:The alias
Return type:str
filenames = []
get_all_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:list-like object of data points
get_coordinates_points()

Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:list-like object of data points
get_non_masked_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.

Returns:list-like object of data points
history

Return the associated history of the object

Returns:The history
Return type:str
is_gridded()

Returns value indicating whether the data/coordinates are gridded.

var_name

Return the variable name associated with this data object

Returns:The variable name
class cis.data_io.common_data.CommonDataList(iterable=())

Bases: list

Interface for common list methods implemented for both gridded and ungridded data

add_history(new_history)

Appends to, or creates, the metadata history attribute using the supplied history string. The new entry is prefixed with a timestamp.

Parameters:new_history – history string

append(p_object)
append_or_extend(item_to_add)

Append or extend an item to an existing list, depending on whether the item to add is itself a list or not.

Parameters:item_to_add – Item to add (may be a list or not).

coords(*args, **kwargs)

Returns all coordinates used in all the data objects.

Returns:A list of coordinates in this data list object fitting the given criteria

extend(iterable)
filenames

Get the filenames in this data list

is_gridded

Returns value indicating whether the data/coordinates are gridded.

set_longitude_range(range_start)

Rotates the longitude coordinate array and changes its values by 360 as necessary to force the values to be within a 360 range starting at the specified value.

Parameters:range_start – starting value of required longitude range

var_name

Get the variable names in this list

Low-level IO modules

Module containing NetCDF file reading functions

cis.data_io.netcdf.find_missing_value(var)

Get the missing / fill value of the variable

Parameters:var – NetCDF Variable instance
Returns:missing / fill value
cis.data_io.netcdf.get_data(var)

Reads raw data from a NetCDF.Variable instance.

Parameters:var – The specific Variable instance to read
Returns:A numpy maskedarray. Missing values are False in the mask.
cis.data_io.netcdf.get_metadata(var)

Retrieves all metadata

Parameters:var – the Variable to read metadata from
Returns:A metadata object
cis.data_io.netcdf.get_netcdf_file_attributes(filename)

Get all the global attributes from a NetCDF file

Parameters:filename – The filename of the file to get the variables from
Returns:a dictionary of attributes and their values
cis.data_io.netcdf.get_netcdf_file_variables(filename, exclude_coords=False)

Get all the variables contained in a NetCDF file. Variables in NetCDF4 hierarchical groups are returned with their fully qualified variable name in the form <group1>.<group2....>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber.

Parameters:
  • filename – The filename of the file to get the variables from
  • exclude_coords – Exclude coordinate variables if True
Returns:

An OrderedDict containing {variable_name: NetCDF Variable instance}

cis.data_io.netcdf.read(filename, usr_variables)

Reads a Variable from a NetCDF file

Parameters:
  • filename – The name (with path) of the NetCDF file to read.
  • usr_variables – A variable (dataset) name to read from the files. The name must appear exactly as in the NetCDF file. Variable names may be fully qualified NetCDF4 hierarchical group variables in the form <group1>.<group2....>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber.
Returns:

A Variable instance constructed from the input file

cis.data_io.netcdf.read_many_files(filenames, usr_variables, dim=None)

Reads a single Variable from many NetCDF files. This method uses the netCDF4 MFDataset class and so is NOT suitable for NetCDF4 datasets (only ‘CLASSIC’ netcdf).

Parameters:
  • filenames – A list of NetCDF filenames to read, or a string with wildcards.
  • usr_variables – A list of variable (dataset) names to read from the files. The names must appear exactly as in the NetCDF file.
  • dim – The name of the dimension on which to aggregate the data. None is the default which tries to aggregate over the unlimited dimension
Returns:

A list of variable instances constructed from all of the input files

cis.data_io.netcdf.read_many_files_individually(filenames, usr_variables)

Read multiple Variables from many NetCDF files manually - i.e. not with MFDataset, as this doesn't always work, in particular for NetCDF4 files.

Parameters:
  • filenames – A list of NetCDF filenames to read, or a string with wildcards.
  • usr_variables – A list of variable (dataset) names to read from the files. The names must appear exactly as in the NetCDF file. Variable names may be fully qualified NetCDF4 hierarchical group variables in the form <group1>.<group2....>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber.
Returns:

A dictionary of lists of variable instances constructed from all of the input files with the fully qualified variable name as the key

cis.data_io.netcdf.remove_variables_with_non_spatiotemporal_dimensions(variables, spatiotemporal_var_names)

Remove from a list of netCDF variables any which have dimensionality which is not in an approved list of valid spatial or temporal dimensions (e.g. sensor number, pseudo dimensions). CIS currently does not support variables with this dimensionality and will fail if they are used.

Parameters:
  • variables – Dictionary of netCDF variable names : Variable objects. Variable names may be fully qualified NetCDF4 Hierarchical group variables in the form <group1>.<group2....>.<variable_name>, e.g. AVHRR.Ch4CentralWavenumber.
  • spatiotemporal_var_names – List of valid spatiotemporal dimensions.
Returns:

None

Module for writing data to NetCDF files

cis.data_io.write_netcdf.add_data_to_file(data_object, filename)
Parameters:
  • data_object
  • filename
Returns:

cis.data_io.write_netcdf.write(data_object, filename)
Parameters:
  • data_object
  • filename
Returns:

cis.data_io.write_netcdf.write_coordinate_list(coord_list, filename)

Writes coordinates to a netCDF file.

Parameters:
  • coord_list – list of Coord objects
  • filename – file to which to write
cis.data_io.write_netcdf.write_coordinates(coords, filename)

Writes coordinates to a netCDF file.

Parameters:
  • coords – UngriddedData or UngriddedCoordinates object for which the coordinates are to be written
  • filename – file to which to write
cis.data_io.hdf.get_hdf4_file_metadata(filename)

This returns a dictionary of file attributes, which often contains metadata information about the whole file. The value of each attribute can simply be a big string which will often need to be parsed manually thereafter.

Parameters:filename
Returns:dictionary of string attributes

cis.data_io.hdf.get_hdf4_file_variables(filename, data_type=None)

Get all variables from a file containing ungridded data. Concatenate variable from both VD and SD data

Parameters:
  • filename – The filename of the file to get the variables from
  • data_type – String representing the HDF data type, i.e. ‘VD’ or ‘SD’. if None, both are computed.
cis.data_io.hdf.read(filenames, variables)
cis.data_io.hdf.read_data(data_dict, data_type, missing_values=None)
cis.data_io.hdf.read_metadata(data_dict, data_type)

Module containing hdf file utility functions for the SD object

class cis.data_io.hdf_sd.HDF_SDS(filename, variable)

Bases: object

This class is used in place of the pyhdf.SD.SDS class to allow the file contents to be loaded at a later time rather than in this module's read method (so that we can close the SD instances and free up file handles)

attributes()

Call pyhdf.SD.SDS.attributes(), opening and closing the file

get()

Call pyhdf.SD.SDS.get(), opening and closing the file

info()

Call pyhdf.SD.SDS.info(), opening and closing the file

cis.data_io.hdf_sd.get_calipso_data(sds)

Reads raw data from an SD instance. Automatically applies the scaling factors and offsets to the data arrays found in Calipso data.

Parameters:sds – The specific sds instance to read
Returns:A numpy array containing the raw data, with missing data replaced by NaN.
cis.data_io.hdf_sd.get_data(sds, missing_values=None)

Reads raw data from an SD instance. Automatically applies the scaling factors and offsets to the data arrays often found in NASA HDF-EOS data (e.g. MODIS)

Parameters:sds – The specific sds instance to read
Returns:A numpy array containing the raw data, with missing data replaced by NaN.
cis.data_io.hdf_sd.get_hdf_SD_file_variables(filename)

Get all the variables from an HDF SD file

Parameters:filename (str) – The filename of the file to get the variables from
Returns:An OrderedDict containing the variables from the file
cis.data_io.hdf_sd.get_metadata(sds)
cis.data_io.hdf_sd.read(filename, variables=None, datadict=None)

Reads SD from a HDF4 file into a dictionary.

Parameters:
  • filename (str) – The name (with path) of the HDF file to read.
  • variables (iterable) – A sequence of variable (dataset) names to read from the file (default None, causing all variables to be read). The names must appear exactly as in the HDF file.
  • datadict (dict) – Optional dictionary to add data to, otherwise a new, empty dictionary is created
Returns:

A dictionary containing data for requested variables. Missing data is replaced by NaN.

Module containing hdf file utility functions for the VD object

class cis.data_io.hdf_vd.VDS

Bases: cis.data_io.hdf_vd.VDS

cis.data_io.hdf_vd.get_data(vds, first_record=False, missing_values=None)

Actually read the data from the VDS handle. We shouldn’t need to check for HDF being installed here because the VDS object which is being passed to us can only have come from pyhdf.

Parameters:
  • vds
  • first_record
  • missing_values
Returns:

cis.data_io.hdf_vd.get_hdf_VD_file_variables(filename)

Get all the variables from an HDF VD file

Parameters:filename – The filename of the file to get the variables from
Returns:An OrderedDict containing the variables from the file
cis.data_io.hdf_vd.get_metadata(vds)
cis.data_io.hdf_vd.read(filename, variables=None, datadict=None)

Given a filename and a list of variable names, return a dictionary of VD data handles

Parameters:
  • filename – full path to a single HDF4 file
  • variables – A list of variables to read; if no variables are given, no variables are read
  • datadict – A dictionary of variable name, data handle pairs to be appended to
Returns:

An updated datadict with any new variables appended.

cis.data_io.aeronet.get_aeronet_file_variables(filename)
cis.data_io.aeronet.get_file_metadata(filename, variable='', shape=None)
cis.data_io.aeronet.load_aeronet(fname, variables=None)

Loads an AERONET lev 2.0 csv file.

Originally from http://code.google.com/p/metamet/ License: GNU GPL v3
Parameters:
  • fname – data file name
  • variables – A list of variables to return
Returns:

A

cis.data_io.aeronet.load_multiple_aeronet(fnames, variables=None)
Data reading and writing modules
class cis.data_io.data_reader.DataReader(get_data_func=<function get_data>, get_coords_func=<function get_coordinates>, get_variables_func=<function get_variables>)

Bases: object

High level class to manage reading data from a file. Principally, manages operations between one or multiple variables, and gridded or un-gridded data.

read_coordinates(filenames, product=None)

Read the coordinates from a file.

Parameters:filenames – The filename of the files to read
Returns:A CoordList object

read_data_list(filenames, variables, product=None, aliases=None)

Read multiple data objects. Files can be either gridded or ungridded but not a mix of both.

Parameters:
  • filenames (string or list) – One or more filenames of the files to read
  • variables (string or list) – One or more variables to read from the files
  • product (str) – Name of data product to use (optional)
  • aliases – List of variable aliases to put on each variable’s data object as an alternative means of identifying them. (Optional)
Returns:

A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)
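
For example, a sketch of reading one variable from a set of files (the filenames are hypothetical; Aerosol_CCI is the data product used as an example elsewhere in this documentation):

>>> from cis.data_io.data_reader import DataReader
>>> reader = DataReader()
>>> data_list = reader.read_data_list('aerosol_cci_*.nc', 'AOD550', product='Aerosol_CCI')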

read_datagroups(datagroups)

Read data from a set of datagroups

Parameters:datagroups

A list of datagroups. Each datagroup represents a grouping of files and variables, where the set of files may be logically considered to represent the same data (an example would be 2D model data split into monthly output files where the grid is the same). The following should be true of a datagroup:

  1. All variables in a datagroup are present in all the files in that datagroup
  2. The shape of the data returned from each variable must be the same in each file, so that they may be concatenated
  3. They should all be openable by the same CIS data product
  4. They should be dictionaries of the following format:
    {'filenames': ['filename1.nc', 'filename2.nc'],
      'variables': ['variable1', 'variable2'],
      'product' : 'Aerosol_CCI'}
    
Returns:A list of data (either a GriddedDataList or an UngriddedDataList, depending on the data format)
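
Putting the format above together, a sketch of a read_datagroups call (the filenames and variable are hypothetical):

>>> from cis.data_io.data_reader import DataReader
>>> datagroups = [{'filenames': ['jan.nc', 'feb.nc'],
...                'variables': ['AOD550'],
...                'product': 'Aerosol_CCI'}]
>>> data = DataReader().read_datagroups(datagroups)
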
cis.data_io.data_reader.expand_filelist(filelist)
Parameters:filelist – A single element, or list, or comma-separated string of filenames, wildcarded filenames or directories
Returns:A flat list of files which exist - with no duplicates
Raises ValueError:
 if any of the files in the list do not exist.
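
A short sketch of how the different input forms can be mixed (the paths are hypothetical and must exist):

>>> from cis.data_io.data_reader import expand_filelist
>>> files = expand_filelist(['model_runs/', 'obs/aerosol_*.nc', 'extra_file.nc'])
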
class cis.data_io.data_writer.DataWriter

Bases: object

High level class for writing data to a file

write_data(data, output_file)

Write data to a file.

Parameters:
  • data (CommonData) – Data to write
  • output_file (str) – Output file name
cis.aggregation package
cis.aggregation.aggregate module
class cis.aggregation.aggregate.Aggregate(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Bases: object

aggregate(variables, filenames, product=None, kernel=None)

Aggregate the given variables based on the initialised grid

Parameters:
  • variables (string or list) – One or more variables to read from the files
  • filenames (string or list) – One or more filenames of the files to read
  • product (str) – Name of data product to use (optional)
  • kernel (str) – Name of kernel to use (the default is ‘moments’)
cis.aggregation.aggregation_grid module
class cis.aggregation.aggregation_grid.AggregationGrid

Bases: cis.aggregation.aggregation_grid.AggregationGrid

Holds the start and delta values for the aggregation grid.

Variables:
  • start (str) – aggregation start point
  • delta (str) – aggregation step to take through the grid
  • is_time (bool) – indicates whether the limits apply to a time dimension; None if not known

cis.aggregation.aggregation_kernels module
class cis.aggregation.aggregation_kernels.MultiKernel(cell_method, sub_kernels)

Bases: object

Represents a set of kernels to be applied each in turn

cis.aggregation.aggregator module
class cis.aggregation.aggregator.Aggregator(data, grid)

Bases: object

aggregate_gridded(kernel)
aggregate_ungridded(kernel)

Performs aggregation for ungridded data by first generating a new grid, converting it into a cube, then collocating using the appropriate kernel and a cube cell constraint

get_grid(coord)
cis.aggregation.aggregator.add_month_midpoint(dt_object, months)
cis.aggregation.aggregator.add_year_midpoint(dt_object, years)
cis.aggregation.aggregator.aggregation_grid_array(start, end, delta, is_time, coordinate)
cis.aggregation.aggregator.categorise_coord_function(start, end, delta, is_time)
cis.aggregation.aggregator.find_nearest(array, value)

Find the nearest value to the parameter value in the array.

Parameters:
  • array – A numpy array
  • value – A single value
Returns:A single value from the array

cis.aggregation.aggregator.month_past_end_of_year(month, year)
cis.collocation package
cis.collocation.col module

Top level collocation objects

class cis.collocation.col.Collocate(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)

Bases: object

Perform a general collocation

collocate(data, col_name=None, col_params=None, kern=None, kern_params=None)

Perform the collocation.

Parameters:
  • data (CommonData) – Data to collocate
  • col_name (str) – Name of the collocator
  • col_params (dict) – Parameters dictionary for the collocation and constraint
  • kern (str) – The kernel to use
  • kern_params (dict) – The kernel parameters to use
Return CommonData:
 

The collocated data

Raises CoordinateNotFoundError:
 

If the collocator was unable to compare the sample and data points
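
As a usage sketch, assuming sample and data objects read beforehand (‘lin’ is one of the collocation method names referenced below):

>>> from cis.collocation.col import Collocate
>>> col = Collocate(sample)
>>> result = col.collocate(data, col_name='lin')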

class cis.collocation.col.CollocatorFactory

Bases: object

Class for creating Collocator, Constraint and Kernel instances

get_collocator_instances_for_method(method_name, kernel_name, collocator_params, kernel_params, sample_gridded, data_gridded)

Get instances of the correct classes for collocation.

Parameters:
  • method_name – Collocation method name (e.g. ‘lin’, ‘nn’)
  • kernel_name – Kernel class name
  • collocator_params – Collocation parameters
  • kernel_params – Kernel parameters
  • sample_gridded – Is the sample data gridded?
  • data_gridded – Is the collocation data gridded?
Returns:Collocator, Constraint and Kernel instances

get_default_collocator_name(method_name, sample_gridded, data_gridded)
cis.collocation.col_framework module
class cis.collocation.col_framework.AbstractDataOnlyKernel

Bases: cis.collocation.col_framework.Kernel

A Kernel that can work on data only, e.g. mean only requires the data values to calculate the mean, not the sampling point.

get_value(point, data)

This method is redundant in the AbstractDataOnlyKernel and only serves as an interface to get_value_for_data_only(), removing the unnecessary point argument and checking for one or more data points.

Parameters:
  • point – A single HyperPoint
  • data – A set of data points to reduce to a single value
Returns:

For return_size=1 a single value (number), otherwise a list of return values, which represents some operation on the points provided

get_value_for_data_only(values)

This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the values (a numpy array).

Note that this method will be called for every sample point in which data can be placed, and so could become a bottleneck for calculations; it is advisable to make it as quick as is practical. If this method is unable to provide a value (for example, if no data points were given) a ValueError should be thrown. This method will not be called if there are no values to be used for calculations.

Parameters:values – A numpy array of values (cannot be None or empty)
Returns:A single data item if return_size is 1 or a list of items containing Kernel.return_size items
Raises ValueError:
 If there are any problems creating a value
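
To illustrate this interface, a minimal sketch of a hypothetical data-only kernel returning the median of the constrained values:

>>> import numpy as np
>>> from cis.collocation.col_framework import AbstractDataOnlyKernel
>>> class median(AbstractDataOnlyKernel):
...     def get_value_for_data_only(self, values):
...         # values is a non-empty numpy array of the constrained data values
...         return np.median(values)
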
class cis.collocation.col_framework.CellConstraint

Bases: cis.collocation.col_framework.Constraint

Superclass of constraints acting on cells surrounding sample points.

The point argument in constrain_points is a HyperPoint in which the coordinate values are of type iris.coords.Cell.

get_iterator(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)
class cis.collocation.col_framework.Collocator(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)

Bases: object

Class which provides a method for performing collocation. This just defines the interface which the subclasses must implement.

collocate(points, data, constraint, kernel)

The method is responsible for setting up and running the collocation. It should take a set of data and map that onto the given (sample) points using the constraint and kernel provided.

Parameters:
  • points – The (sample) points to map the data onto
  • data – The data to collocate
  • constraint – The constraint instance to apply
  • kernel – The kernel instance to apply
Returns:

One or more CommonData (or subclasses of) objects whose coordinates lie on the points defined above.

class cis.collocation.col_framework.Constraint

Bases: object

Class which provides a method for constraining a set of points. A single HyperPoint is given as a reference but the data points to be reduced ultimately may be of any type. This just defines the interface which the subclasses must implement.

constrain_points(point, data)

This method should return a subset of the data given a single reference point. It is expected that the data returned should be of the same type as that given - but this isn’t mandatory. It is possible that this function will return zero points (no data); the collocation class is responsible for providing a fill_value.

Parameters:
  • point (HyperPoint) – A single HyperPoint
  • data – A set of data points to be reduced
Returns:

A reduced set of data points
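
A minimal sketch of the interface: a hypothetical pass-through constraint which performs no reduction at all:

>>> from cis.collocation.col_framework import Constraint
>>> class no_constraint(Constraint):
...     def constrain_points(self, point, data):
...         return data  # keep every data point for every sample point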

get_iterator(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)

Returns an iterator over the points which need to be calculated. The default iterator iterates through all the sample points, calling Constraint.constrain_points() for each one.

Parameters:
  • missing_data_for_missing_sample – If True, the final point is missing wherever the sample data is missing; otherwise the sample is used as-is
  • coord_map – Coordinate map - list of tuples of indexes of hyperpoint coord, data coords and output coords
  • coords – The coordinates to map the data onto
  • data_points – The (non-masked) data points
  • shape – Shape of the final data values
  • points – The original points object, these are the points to collocate
  • output_data – Output data set
Returns:

Iterator which iterates through (sample indices, hyper point and constrained points) to be placed in these points

class cis.collocation.col_framework.IndexedConstraint

Bases: cis.collocation.col_framework.Constraint

Superclass of constraints that expect points to be referenced by index.

get_iterator(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)
class cis.collocation.col_framework.Kernel

Bases: object

Class which provides a method for taking a number of points and returning one value: for example, a nearest neighbour algorithm, a sort algorithm or a mean. This just defines the interface which the subclasses must implement.

get_value(point, data)

This method should return a single value (if Kernel.return_size is 1) or a list of n values (if Kernel.return_size is n) based on some calculation on the data given a single point.

The data is deliberately left unspecified in the interface as it may be any type of data; however, it is expected that each implementation will only work with a specific type of data (gridded, ungridded etc.). Note that this method will be called for every sample point and so could become a bottleneck for calculations; it is advisable to make it as quick as is practical. If this method is unable to provide a value (for example, if no data points were given) a ValueError should be thrown.

Parameters:
  • point – A single HyperPoint
  • data – A set of data points to reduce to a single value
Returns:

For return_size=1 a single value (number) otherwise a list of return values, which represents some operation on the points provided

Raises ValueError:
 

When the method is unable to return a value

get_variable_details(var_name, var_long_name, var_standard_name, var_units)

Returns the details of all variables to be created from the outputs of a kernel.

Parameters:
  • var_name (str) – Base variable name
  • var_long_name (str) – Base variable long name
  • var_standard_name (str) – Base variable standard_name
  • var_units (str) – Base variable units
Returns:

Tuple of tuples, each containing (variable name, variable long name, variable units)

return_size = 1

The number of values the Kernel.get_value() should be expected to return (i.e. the length of the return list).

class cis.collocation.col_framework.PointConstraint

Bases: cis.collocation.col_framework.Constraint

Superclass of constraints acting on sample points.

The point argument in constrain_points is a HyperPoint.

cis.collocation.col_framework.get_collocator(method=None)

Top level routine for finding the correct Collocator object.

Parameters:method – The collocation method to find - this should be a string which matches the name of one of the subclasses of Collocator
Returns:One of Collocator’s subclasses

cis.collocation.col_framework.get_constraint(method=None)

Top level routine for finding the correct Constraint object. This doesn’t instantiate the constraint class as it may need variables passed to the constructor.

Parameters:method – The constraint method to find - this should be a string which matches the name of one of the subclasses of Constraint
Returns:One of Constraint’s subclasses

cis.collocation.col_framework.get_kernel(method=None)

Top level routine for finding the correct Kernel object.

Parameters:method – The kernel method to find - this should be a string which matches the name of one of the subclasses of Kernel
Returns:One of Kernel’s subclasses
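
These three lookup routines follow the same pattern; for example, using the built-in mean kernel documented below:

>>> from cis.collocation.col_framework import get_kernel
>>> kernel_class = get_kernel('mean')  # returns the subclass itself, not an instance
>>> kernel = kernel_class()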

cis.collocation.col_implementations module
class cis.collocation.col_implementations.BinnedCubeCellOnlyConstraint

Bases: cis.collocation.col_framework.Constraint

Constraint for constraining HyperPoints to be within an iris.coords.Cell, with an iterator which only visits those cells that contain values.

Uses the index_data method to bin all the points.

constrain_points(sample_point, data)
get_iterator(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, output_data)
get_iterator_for_data_only(missing_data_for_missing_sample, coord_map, coords, data_points, shape, points, values)

The method returns an iterator over the output indices and a numpy array slice of the data values. It may not be called by all collocators, which may choose to iterate over all sample points instead.

Parameters:
  • missing_data_for_missing_sample – If True, the final point is missing wherever the sample data is missing; otherwise the sample is used as-is
  • coord_map – Not needed for the data only kernel
  • coords – Not needed for the data only kernel
  • data_points – The (non-masked) data points
  • shape – Not needed
  • points – The original points object, these are the points to collocate
  • values – Not needed
Returns:

Iterator which iterates through (sample indices and data slice) to be placed in these points

class cis.collocation.col_implementations.BinningCubeCellConstraint

Bases: cis.collocation.col_framework.IndexedConstraint

Constraint for constraining HyperPoints to be within an iris.coords.Cell.

Uses the index_data method to bin all the points

constrain_points(sample_point, data)

Returns HyperPoints lying within a cell.

This implementation returns the points that have been stored in the appropriate bin by the index_data method.

Parameters:
  • sample_point – HyperPoint of indices of cells defining the sample region
  • data – list of HyperPoints to check
Returns:HyperPointList of points found within the cell

class cis.collocation.col_implementations.CubeCellConstraint

Bases: cis.collocation.col_framework.CellConstraint

Constraint for constraining HyperPoints to be within an iris.coords.Cell.

constrain_points(sample_point, data)

Returns HyperPoints lying within a cell.

Parameters:
  • sample_point – HyperPoint of cells defining the sample region
  • data – list of HyperPoints to check
Returns:HyperPointList of points found within the cell

class cis.collocation.col_implementations.DummyCollocator(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)

Bases: cis.collocation.col_framework.Collocator

collocate(points, data, constraint, kernel)
This collocator does no collocation at all - it just returns the original data values. This might be useful if the input data for one variable is already known to be on the same grid as points. This routine could check the coordinates are the same but currently does no such check.
Parameters:
  • points – A list of HyperPoints
  • data – An UngriddedData object or Cube
  • constraint – Unused
  • kernel – Unused
Returns:

A single LazyData object

class cis.collocation.col_implementations.DummyConstraint

Bases: cis.collocation.col_framework.Constraint

constrain_points(point, data)
class cis.collocation.col_implementations.GeneralGriddedCollocator(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)

Bases: cis.collocation.col_framework.Collocator

Performs collocation of data onto the points of a cube (i.e. onto a gridded dataset).

collocate(points, data, constraint, kernel)
Parameters:
  • points – cube defining the sample points
  • data – CommonData object providing data to be collocated (or list of Data)
  • constraint – instance of a Constraint subclass, which takes a data object and returns a subset of that data based on its internal parameters
  • kernel – instance of a Kernel subclass which takes a number of points and returns a single value
Returns:

GriddedDataList of collocated data

class cis.collocation.col_implementations.GeneralUngriddedCollocator(fill_value=None, var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)

Bases: cis.collocation.col_framework.Collocator

Collocator for collocating onto ungridded sample points

collocate(points, data, constraint, kernel)

This collocator takes a list of HyperPoints and a data object (currently either Ungridded data or a Cube) and returns one new LazyData object with the values as determined by the constraint and kernel objects. The metadata for the output LazyData object is copied from the input data object.

Parameters:
  • points – UngriddedData or UngriddedCoordinates defining the sample points
  • data – An UngriddedData object or Cube, or any other object containing metadata that the constraint object can read. May also be a list of objects, in which case a list will be returned
  • constraint – An instance of a Constraint subclass which takes a data object and returns a subset of that data based on its internal parameters
  • kernel – An instance of a Kernel subclass which takes a number of points and returns a single value
Returns:

A single LazyData object

class cis.collocation.col_implementations.GriddedCollocator(var_name='', var_long_name='', var_units='', missing_data_for_missing_sample=False)

Bases: cis.collocation.col_framework.Collocator

collocate(points, data, constraint, kernel)

This collocator takes two Iris cubes, and collocates from the data cube onto the grid of the ‘points’ cube. The collocator then returns another Iris cube.

Parameters:
  • points – An Iris cube with the sampling grid to collocate onto
  • data – The Iris cube with the data to be collocated
  • constraint – None allowed yet, as this is unlikely to be required for gridded-gridded collocation
  • kernel – The kernel to use; current options are gridded_gridded_nn and gridded_gridded_li
Returns:An Iris cube with the collocated data

class cis.collocation.col_implementations.SepConstraint(h_sep=None, a_sep=None, p_sep=None, t_sep=None)

Bases: cis.collocation.col_framework.PointConstraint

alt_constraint(point, ref_point)
constrain_points(ref_point, data)
horizontal_constraint(point, ref_point)
pressure_constraint(point, ref_point)
time_constraint(point, ref_point)
class cis.collocation.col_implementations.SepConstraintKdtree(h_sep=None, a_sep=None, p_sep=None, t_sep=None)

Bases: cis.collocation.col_framework.PointConstraint

A separation constraint that uses a k-D tree to optimise spatial constraining. If no horizontal separation parameter is supplied, this reduces to an exhaustive search using the other parameter(s).
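
A usage sketch, assuming the horizontal separation is given as a distance string of the form CIS accepts elsewhere (e.g. ‘10km’):

>>> from cis.collocation.col_implementations import SepConstraintKdtree
>>> constraint = SepConstraintKdtree(h_sep='10km')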

alt_constraint(point, ref_point)
constrain_points(ref_point, data)
horizontal_constraint(point, ref_point)
pressure_constraint(point, ref_point)
time_constraint(point, ref_point)
class cis.collocation.col_implementations.gridded_gridded_li

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Not needed for gridded/gridded collocation.

class cis.collocation.col_implementations.gridded_gridded_nn

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Not needed for gridded/gridded collocation.

class cis.collocation.col_implementations.li(extrapolate=False, nn_vertical=False)

Bases: cis.collocation.col_framework.Kernel

Linear Interpolation Kernel

get_value(point, data)

Co-location routine using iris’ linear interpolation algorithm. This only makes sense for gridded data.

cis.collocation.col_implementations.make_coord_map(points, data)

Create a map of how coordinates from the sample points map to the standard hyperpoint coordinates, ignoring coordinates which are not present in the data.

Parameters:
  • points – sample points
  • data – data to map
Returns:list of tuples, one per coordinate to use, of the form (hyper point coord index, sample point coord index, output coord index)

class cis.collocation.col_implementations.max

Bases: cis.collocation.col_framework.AbstractDataOnlyKernel

Calculate the maximum value

get_value_for_data_only(values)

Return the maximum value

class cis.collocation.col_implementations.mean

Bases: cis.collocation.col_framework.AbstractDataOnlyKernel

Calculate mean of data points

get_value_for_data_only(values)

Return the mean

class cis.collocation.col_implementations.min

Bases: cis.collocation.col_framework.AbstractDataOnlyKernel

Calculate the minimum value

get_value_for_data_only(values)

Return the minimum value

class cis.collocation.col_implementations.moments(mean_name='', stddev_name='', nopoints_name='')

Bases: cis.collocation.col_framework.AbstractDataOnlyKernel

get_value_for_data_only(values)

Returns the mean, standard deviation and number of values

get_variable_details(var_name, var_long_name, var_standard_name, var_units)

Sets the name and units for the mean, standard deviation and number of points variables, based on those of the base variable or overridden by those specified as kernel parameters.

Parameters:
  • var_name – base variable name
  • var_long_name – base variable long name
  • var_standard_name – base variable standard name
  • var_units – base variable units
Returns:tuple of tuples, each containing (variable name, variable long name, variable units)

return_size = 3
class cis.collocation.col_implementations.nn_a

Bases: cis.collocation.col_implementations.nn_altitude

Nearest neighbour altitude kernel - alias for nn_altitude.

class cis.collocation.col_implementations.nn_altitude

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Collocation using nearest neighbours in altitude, where both points and data are lists of HyperPoints. The default point is the first point.

class cis.collocation.col_implementations.nn_gridded

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Co-location routine using nearest neighbour algorithm optimized for gridded data. This calls out to iris to do the work.

class cis.collocation.col_implementations.nn_h

Bases: cis.collocation.col_implementations.nn_horizontal

Nearest neighbour horizontal kernel - alias for nn_horizontal.

class cis.collocation.col_implementations.nn_horizontal

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Collocation using nearest neighbours along the face of the earth, where both points and data are lists of HyperPoints. The default point is the first point.

class cis.collocation.col_implementations.nn_horizontal_kdtree

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Collocation using nearest neighbours along the face of the earth using a k-D tree index.

class cis.collocation.col_implementations.nn_p

Bases: cis.collocation.col_implementations.nn_pressure

Nearest neighbour pressure kernel - alias for nn_pressure.

class cis.collocation.col_implementations.nn_pressure

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Collocation using nearest neighbours in pressure, where both points and data are lists of HyperPoints. The default point is the first point.

class cis.collocation.col_implementations.nn_t

Bases: cis.collocation.col_implementations.nn_time

Nearest neighbour time kernel - alias for nn_time.

class cis.collocation.col_implementations.nn_time

Bases: cis.collocation.col_framework.Kernel

get_value(point, data)

Collocation using nearest neighbours in time, where both points and data are lists of HyperPoints. The default point is the first point.

class cis.collocation.col_implementations.stddev

Bases: cis.collocation.col_framework.AbstractDataOnlyKernel

Calculate the standard deviation

get_value_for_data_only(values)

Return the standard deviation of the points

cis.plotting package
cis.plotting.plot module
cis.subsetting package
cis.subsetting.subset module
class cis.subsetting.subset.Subset(limits, output_file, subsetter=<cis.subsetting.subsetter.Subsetter object>, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Bases: object

Class for subsetting Ungridded or Gridded data either temporally, spatially, or both.

subset(variables, filenames, product=None)

Subset the given variables based on the initialised limits

Parameters:
  • variables (string or list) – One or more variables to read from the files
  • filenames (string or list) – One or more filenames of the files to read
  • product (str) – Name of data product to use (optional)
cis.subsetting.subset_constraint module
class cis.subsetting.subset_constraint.CoordLimits

Bases: cis.subsetting.subset_constraint.CoordLimits

Holds the start and end values for subsetting limits.

Variables:
  • coord – the coordinate the limit applies to
  • start – subsetting limit start
  • end – subsetting limit end
  • constraint_function – function determining whether the constraint is satisfied

class cis.subsetting.subset_constraint.GriddedSubsetConstraint

Bases: cis.subsetting.subset_constraint.SubsetConstraint

Implementation of SubsetConstraint for subsetting gridded data.

constrain(data)

Subsets the supplied data using a combination of iris.cube.Cube.extract and iris.cube.Cube.intersection, depending on whether intersection is supported (whether the coordinate has a defined modulus).

Parameters:data – data to be subsetted
Returns:subsetted data (a cis.data_io.gridded_data.GriddedData), or None if all data is excluded

class cis.subsetting.subset_constraint.SubsetConstraint

Bases: cis.subsetting.subset_framework.SubsetConstraintInterface

Abstract Constraint for subsetting.

Holds the limits for subsetting in each dimension.

set_limit(coord, dim_min, dim_max)

Sets boundary values for a dimension to be used in subsetting.

Parameters:
  • coord – coordinate to which the limit applies
  • dim_min – lower bound on the dimension, or None to indicate no lower bound
  • dim_max – upper bound on the dimension, or None to indicate no upper bound
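
For illustration, a sketch of subsetting gridded data to a longitude band, assuming ‘data’ is a previously read GriddedData object:

>>> from cis.subsetting.subset_constraint import GriddedSubsetConstraint
>>> constraint = GriddedSubsetConstraint()
>>> constraint.set_limit(data.coord('longitude'), -10.0, 10.0)
>>> subset = constraint.constrain(data)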

class cis.subsetting.subset_constraint.UngriddedSubsetConstraint

Bases: cis.subsetting.subset_constraint.SubsetConstraint

Implementation of SubsetConstraint for subsetting ungridded data.

constrain(data)

Subsets the supplied data.

Parameters:data – data to be subsetted
Returns:subsetted data
cis.subsetting.subset_framework module
class cis.subsetting.subset_framework.SubsetConstraintInterface

Bases: object

Interface for subset constraint classes.

constrain(data)

Subsets the supplied data.

Parameters:data – data to be subsetted
Returns:subsetted data
class cis.subsetting.subset_framework.SubsetterInterface

Bases: object

Interface for subsetter classes.

subset(data, constraint)

Subsets the supplied data using the supplied constraint.

Parameters:
  • data – data to be subsetted
  • constraint – SubsetConstraint object to be used to subset data
Returns:

subsetted data

cis.subsetting.subset_limits module
class cis.subsetting.subset_limits.SubsetLimits

Bases: cis.subsetting.subset_limits.SubsetLimits

Holds the start and end values for subsetting limits.

Variables:
  • start (str) – subsetting limit start
  • end (str) – subsetting limit end
  • is_time (bool) – indicates whether the limits apply to a time dimension; None if not known

cis.subsetting.subsetter module
class cis.subsetting.subsetter.Subsetter

Bases: cis.subsetting.subset_framework.SubsetterInterface

subset(data, constraint)

Subsets the supplied data using the supplied constraint.

Parameters:
  • data – data to be subsetted
  • constraint – SubsetConstraint object to be used to subset data
Returns:

subsetted data

cis.stats module
class cis.stats.StatsAnalyzer(data1, data2)

Analyse datasets to produce statistics.

analyze()

Perform a statistical analysis on two data sets.

Returns:List of StatisticsResult instances.
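
A usage sketch, assuming data1 and data2 are two comparable datasets read beforehand:

>>> from cis.stats import StatsAnalyzer
>>> results = StatsAnalyzer(data1, data2).analyze()
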
points_count()

Count all points which will be used for statistical comparison operations (i.e. are non-missing in both datasets).

Returns:List of StatisticsResults
means()

Means of two datasets

Returns:List of StatisticsResults
stddevs()

Corrected sample standard deviation of datasets

Returns:List of StatisticsResults
abs_mean()

Mean of absolute difference d2-d1

Returns:List of StatisticsResults
abs_stddev()

Standard deviation of absolute difference d2-d1

Returns:List of StatisticsResults
rel_mean()

Mean of relative difference (d2-d1)/d1

Returns:List of StatisticsResults
rel_stddev()

Standard deviation of relative difference (d2-d1)/d1

Returns:List of StatisticsResults
spearmans_rank()

Perform a Spearman’s rank correlation on the data

Returns:List of StatisticsResults
linear_regression()

Perform a linear regression on the data

Returns:List of StatisticsResults
CIS utility functions
cis.time_util module

Utilities for converting time units

cis.time_util.calculate_mid_time(t1, t2)
Find the mid time between two times expressed as floats
Parameters:
  • t1 – a time represented as a float
  • t2 – a time in the same representation as t1
Returns:

a float representing the mid time between t1 and t2

cis.time_util.convert_cube_time_coord_to_standard_time(cube)

Converts the time coordinate from the one in the cube to one based on a standard time unit.

Parameters:cube – cube to modify
Returns:the cube

cis.time_util.convert_cube_time_coord_to_standard_time_assuming_gregorian_calendar(cube)

Converts the time coordinate from the one in the cube to one based on a standard time unit.

This approach assumes that the source date is valid as a date in the calendar set for the standard time unit (Gregorian), which will not always be true.

Parameters:cube – cube to modify
Returns:the cube

cis.time_util.convert_datetime_to_std_time(dt)
cis.time_util.convert_days_since_to_std_time(days, ref)
cis.time_util.convert_julian_date_to_std_time(julian_date, calender='standard')
cis.time_util.convert_julian_date_to_std_time_array(julian_time_array, calender='standard')
cis.time_util.convert_obj_to_standard_date_array(time_array)
cis.time_util.convert_sec_since_to_std_time(seconds, ref)

Convert a number of seconds since a given reference datetime to a number of days since our standard time. In principle this could avoid the intermediate step of converting to a datetime object, except that we don’t know which calendar the reference is on, e.g. it could be a 360 day calendar

Parameters:
  • seconds
  • ref
Returns:

cis.time_util.convert_sec_since_to_std_time_array(tai_time_array, ref)
cis.time_util.convert_std_time_to_datetime(std_time)
cis.time_util.convert_time_since_to_std_time(time_array, units)
cis.time_util.convert_time_using_time_stamp_info_to_std_time(time_array, units, time_stamp_info=None)

Convert the time using time stamp info and the first word of the units.

Parameters:
  • time_array – the time array to convert
  • units – the units of the array (e.g. day or Days from the file time reference 2012-12-12)
  • time_stamp_info – the time stamp to use for the conversion
Returns:converted data

cis.utils module
class cis.utils.OrderedSet(iterable=None)

Bases: _abcoll.MutableSet

From http://code.activestate.com/recipes/576694/

add(key)
discard(key)
pop(last=True)
cis.utils.add_element_to_list_in_dict(my_dict, key, value)
cis.utils.add_file_prefix(prefix, filepath)

Add a prefix to a filename, taking into account any path that might be present before the actual filename

Parameters:
  • prefix – A string to prefix the filename with
  • filepath – Filename, optionally including path
Returns:

A string with the full path to the prefixed file

cis.utils.add_to_list_if_not_none(item, list)

Add a value to a list if it is not None

Parameters:
  • item – the item to add
  • list – the list to append it to
Returns:

nothing

cis.utils.apply_intersection_mask_to_two_arrays(array1, array2)

Ensure two (optionally) masked arrays have the same mask. If both arrays are masked the intersection of the masks is used. If one array is masked and the other is not, the mask from the masked array is applied to the unmasked array. If neither array is masked then both arrays are returned as masked arrays with an empty mask.

Parameters:
  • array1 – An (optionally masked) array
  • array2 – Another (optionally masked) array
Returns:

Two masked arrays with a common mask
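
A small sketch showing the one-masked/one-unmasked case:

>>> import numpy as np
>>> from cis.utils import apply_intersection_mask_to_two_arrays
>>> a = np.ma.masked_invalid(np.array([1.0, np.nan, 3.0]))
>>> b = np.array([4.0, 5.0, 6.0])
>>> a2, b2 = apply_intersection_mask_to_two_arrays(a, b)  # b2 now carries a's mask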

cis.utils.apply_mask_to_numpy_array(in_array, mask)

Element-wise ORs the mask with the mask of the array. If the mask masks no elements, no change is made. If the array is not masked, it is converted to a masked array.

Parameters:
  • in_array (numpy array or masked array) – input array
  • mask (numpy array of boolean) – mask
cis.utils.array_equal_including_nan(array1, array2)
Parameters:
  • array1 – A numpy array
  • array2 – Another numpy array (can be of a different shape)
Returns:

True if the arrays are equal (with NaN values compared as equal), otherwise False.

cis.utils.calculate_histogram_bin_edges(data, axis, user_range, step, log_scale=False)
Parameters:
  • data – A numpy array
  • axis – The axis on which the data will be plotted. Set to “x” for histogram2d
  • user_range – A dictionary containing the min and max values for the edges specified by the user. The data min and max is used if the user did not specify
  • step – The distance between each bin edge/the width of each bin
Returns:

An array containing a list of bin edges (i.e. when each bin starts and ends)

cis.utils.concatenate(arrays, axis=0)

Concatenate a list of numpy arrays into one larger array along the axis specified (the default axis is zero). If any of the arrays are masked arrays then the returned array will be a masked array with the correct mask, otherwise a numpy array is returned.

Parameters:
  • arrays – A list of numpy arrays (masked or not)
  • axis – The axis along which to concatenate (the default is 0)
Returns:

The concatenated array
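
For example:

>>> import numpy as np
>>> from cis.utils import concatenate
>>> combined = concatenate([np.array([1, 2]), np.array([3, 4])])  # array([1, 2, 3, 4])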

cis.utils.convert_array_type(array, new_type, operation, *args, **kwargs)
cis.utils.convert_masked_array_type(masked_array, new_type, operation, *args, **kwargs)
cis.utils.convert_numpy_array(array, new_type, operation, *args, **kwargs)
cis.utils.copy_attributes(source, dest)

Copy all attributes from one object to another

Parameters:
  • source – Object to copy attributes from
  • dest – Object to copy attributes to
Returns:

None

cis.utils.create_masked_array_for_missing_data(data, missing_val)
cis.utils.create_masked_array_for_missing_values(data, missing_values)
cis.utils.deprecated(func)

This is a decorator which can be used to mark functions as deprecated. It will result in a warning being emitted when the function is used.

Taken from http://code.activestate.com/recipes/391367-deprecated/

cis.utils.dimensions_equal(dimensions, other_dimensions)

Check to see if two dimensions are the same (contain the same variables in the same order)

Parameters:
  • dimensions – dimension list
  • other_dimensions – other dimension list
cis.utils.expand_1d_to_2d_array(array_1d, length, axis=0)

General utility routine to extend a 1D array into a 2D array by duplicating the data along a given ‘axis’ (default is 0) of size ‘length’.

Examples:

>>> a = np.array([1, 2, 3, 4])
>>> expand_1d_to_2d_array(a, 4, axis=0)
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]

>>> a = np.array([1, 2, 3, 4])
>>> expand_1d_to_2d_array(a, 4, axis=1)
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]
 [4 4 4 4]]
Parameters:
  • array_1d
  • length
  • axis
Returns:

cis.utils.find_longitude_wrap_start(x_variable, x_range, packed_data_items)
cis.utils.fix_longitude_range(lons, range_start)

Shifts longitude values by +/- 360 to fit within a 360 degree range starting at a specified value. It is assumed that no shifts larger than 360 are needed.

Parameters:
  • lons – numpy array of longitude values
  • range_start – longitude at start of 360 degree range into which values are required to fit
Returns:

array of fixed longitudes
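
For example, shifting values into the 360 degree range starting at -180:

>>> import numpy as np
>>> from cis.utils import fix_longitude_range
>>> lons = fix_longitude_range(np.array([0.0, 90.0, 270.0]), -180.0)  # 270.0 becomes -90.0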

cis.utils.get_class_name(cls)

Returns the qualified class name of a class.

Parameters:cls – class
Returns:class name
cis.utils.get_coord(data_object, variable, data)

Find a specified coord

Parameters:
  • data_object
  • variable
  • data
Returns:

cis.utils.guess_coord_axis(coord)

Returns X, Y, Z or T corresponding to longitude, latitude, altitude or time respectively if the coordinate can be determined to be one of these (based on the standard name only, in this implementation).

This is intended to be similar to iris.util.guess_coord_axis.

cis.utils.haversine(lat, lon, lat2, lon2)

Computes the Haversine distance between two points

cis.utils.index_iterator(shape)

Iterates over the indexes of a multi-dimensional array of a specified shape. The last index changes most rapidly.

Parameters:shape – sequence of array dimensions
Returns:yields tuples of array indexes
cis.utils.index_iterator_for_non_masked_data(shape, points)

Iterates over the indexes of a multi-dimensional array of a specified shape. The last index changes most rapidly.

Parameters:shape – sequence of array dimensions
Returns:yields tuples of array indexes
cis.utils.index_iterator_nditer(shape, points)

Iterates over the indexes of a multi-dimensional array of a specified shape. The last index changes most rapidly.

Parameters:shape – sequence of array dimensions
Returns:yields tuples of array indexes
cis.utils.isnan(number)
cis.utils.listify(item)

If item is not a list, return it as a list

Parameters:item – Item which may or may not be a list
Returns:List
cis.utils.log_memory_profile(location)

Write the total memory to the log as a debug message

Parameters:location – location in the program where the memory measurement was taken
Returns:nothing
cis.utils.parse_distance_with_units_to_float_km(distance)

Parse a string such as ‘10km’ or ‘1.0e3m’ to a distance in km

Parameters:distance – string to parse
Returns:A distance in km
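
For example:

>>> from cis.utils import parse_distance_with_units_to_float_km
>>> parse_distance_with_units_to_float_km('1.0e3m')
1.0
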
cis.utils.parse_distance_with_units_to_float_m(distance)

Parse a string such as ‘10km’ or ‘1.0e3m’ to a distance in m

Parameters:distance – string to parse
Returns:A distance in m
cis.utils.parse_key_val_list(input_list)
Takes a list of keyword value strings (separated by =) and returns a dictionary with those keys and values. Note that if a key has no value, the key is stored and given the value True
Parameters:input_list – A list of strings which are keyword value pairs separated by =
Returns:A dictionary of the keywords and values
cis.utils.parse_key_val_string(arguments, separator)

Takes a (comma) separated list of keyword value pairs (separated by =) and returns a dictionary with those keys and values

Parameters:
  • arguments – A string which is a separated list of keyword value pairs
  • separator – String which is used to split the string into a list
Returns:

A dictionary of the keywords and values
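
A short sketch, splitting on a comma:

>>> from cis.utils import parse_key_val_string
>>> d = parse_key_val_string('x=1,y=2', ',')  # {'x': '1', 'y': '2'}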

cis.utils.remove_file_prefix(prefix, filepath)

Remove a prefix from a filename, taking into account any path that might be present before the actual filename

Parameters:
  • prefix – The prefix to remove
  • filepath – Filename, optionally including path
Returns:

A string with the full path to the un-prefixed file

cis.utils.set_cube_standard_name_if_valid(cube, standard_name)

Set a cube’s standard name if it is a valid CF compliant name, otherwise set it to None

Parameters:
  • cube – Cube to set standard name on
  • standard_name – Standard name to set
Returns:

cis.utils.split_into_float_and_units(measurement)

Split a string such as ‘1000m’ or ‘1.0e3’ to a value and, optionally, units

Parameters:measurement – string to parse
Returns:The numeric value and, if present, its units
cis.utils.unpack_data_object(data_object, x_variable, y_variable, x_wrap_start)

Parameters:data_object – A cube or an UngriddedData object
Returns:A dictionary containing x, y and data as numpy arrays

cis.utils.wrap_longitude_coordinate_values(x_min, x_max)
cis.exceptions module

Custom CIS exceptions

exception cis.exceptions.CISError

Bases: exceptions.Exception

exception cis.exceptions.ClassNotFoundError

Bases: cis.exceptions.CISError

exception cis.exceptions.CoordinateNotFoundError

Bases: cis.exceptions.CISError

exception cis.exceptions.DuplicateCoordinateError

Bases: cis.exceptions.CISError

exception cis.exceptions.FileFormatError(error_list, *args, **kwargs)

Bases: cis.exceptions.CISError

Thrown when there is an error determining the type of a file

error_list = ['Unknown error']
exception cis.exceptions.InconsistentDimensionsError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidCollocationMethodError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidCommandLineOptionError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidDataTypeError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidDimensionError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidFileExtensionError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidHistogramStyleError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidLineStyleError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidNumberOfDatagroupsSpecifiedError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidPlotFormatError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidPlotTypeError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidSliceIndexError

Bases: cis.exceptions.CISError

exception cis.exceptions.InvalidVariableError

Bases: cis.exceptions.CISError

exception cis.exceptions.NoDataInSubsetError

Bases: cis.exceptions.CISError

exception cis.exceptions.NotEnoughAxesSpecifiedError

Bases: cis.exceptions.CISError

exception cis.exceptions.UserPrintableException(message)

Bases: cis.exceptions.CISError

This exception is thrown if the program has failed for a known reason. The message is printed without a stack trace.
