5. CIS as a Python library (API)

5.1. Main API

CIS was designed as a command line tool rather than with a Python API in mind. There are, however, some utility functions that may provide a useful start for those who wish to use CIS as a Python library. For example, the functions in the base cis module provide a straightforward way to load your data. They can be easily imported using, for example: from cis import read_data. One of the advantages of using CIS as a Python library is that you can perform multiple operations in one go, that is, without writing to disk in between. In certain cases this may provide a significant speed-up.

Note

This section of the documentation expects a greater level of Python experience than the other sections. There are many helpful Python guides and tutorials available around the web if you wish to learn more.

The read_data() function is a simple way to read a single gridded or ungridded data object (e.g. a NetCDF variable) from one or more files. CIS will determine the best way to interpret the datafile by comparing the file signature with the built-in data reading plugins and any user defined plugins. Specifying a particular product allows the user to override this automatic detection.

cis.read_data(filenames, variable, product=None)

Read a specific variable from a list of files. Files can be either gridded or ungridded but not a mix of both. First tries to read the data as gridded; if that fails, tries as ungridded.

Parameters:

filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
variable (str) – The variable to read from the files
product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).

Returns:

The specified data as either a GriddedData or UngriddedData object.
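
The accepted forms of the filenames argument can be illustrated with a small standard-library sketch. The helper expand_filenames below is hypothetical and is not part of CIS; it only mirrors the behaviour described above (a single string, a comma separated string, or a list, with directories and the wildcards * and ? expanded). The sorting and de-duplication of the result are assumptions made for the illustration.

```python
import glob
import os


def expand_filenames(filenames):
    """Hypothetical helper (not part of CIS): expand a filenames argument.

    Accepts a single filename, a comma separated string, or a list of strings.
    Directories expand to the files they contain; * and ? act as wildcards.
    """
    if isinstance(filenames, str):
        filenames = filenames.split(",")

    expanded = []
    for entry in filenames:
        entry = entry.strip()
        if os.path.isdir(entry):
            # A directory stands for every regular file directly inside it.
            matches = [os.path.join(entry, name) for name in os.listdir(entry)]
            matches = [m for m in matches if os.path.isfile(m)]
        elif any(ch in entry for ch in "*?"):
            matches = glob.glob(entry)
        else:
            matches = [entry]
        expanded.extend(matches)

    # Assumption: sorted, de-duplicated output; CIS itself may behave differently.
    return sorted(set(expanded))
```

Under these assumptions, expand_filenames('x.nc, y.nc') and expand_filenames(['y.nc', 'x.nc']) describe the same two files.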

The read_data_list() function is very similar to read_data() except that it allows the user to specify more than one variable name. This function returns a list of data objects, either all of which will be gridded, or all ungridded, but not a mix. For ungridded data lists it is assumed that all objects share the same coordinates.

cis.read_data_list(filenames, variables, product=None, aliases=None)

Read multiple data objects from a list of files. Files can be either gridded or ungridded but not a mix of both.

Parameters:

filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
variables (string or list) – One or more variables to read from the files
product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
aliases (string or list) – List of aliases to put on each variable’s data object as an alternative means of identifying them.

Returns:

A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)

The get_variables() function returns a list of variable names from one or more specified files. This can be useful to inspect a set of files before calling the read routines described above.

cis.get_variables(filenames, product=None, type=None)

Get a list of variable names from a list of files. Files can be either gridded or ungridded but not a mix of both.

Parameters:

filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
type (str) – The type of HDF data to read, i.e. ‘VD’ or ‘SD’

Returns:

A list of the variables
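
As a rough sketch of how inspection and reading might be combined, the function below lists the variables in a set of files and then reads those whose names start with a given prefix. The helper name read_matching_variables and the prefix-matching rule are assumptions for illustration; only get_variables() and read_data_list() come from the API described above.

```python
def read_matching_variables(filenames, prefix, product=None):
    """Read every variable whose name starts with `prefix` (hypothetical helper)."""
    import cis  # imported lazily so the sketch can be defined without CIS installed

    # Inspect the files first, then read only the variables of interest.
    available = cis.get_variables(filenames, product=product)
    wanted = sorted(name for name in available if name.startswith(prefix))
    if not wanted:
        raise ValueError("no variable starting with %r in %r" % (prefix, filenames))
    return cis.read_data_list(filenames, wanted, product=product)
```

A call such as read_matching_variables('aerosol_*.nc', 'AOD') would then return a data list, where both the file pattern and the variable prefix are made-up examples.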

5.1.1. Data Objects

The read functions above return either GriddedData or UngriddedData objects (or lists of them). These objects are the main data handling objects used within CIS, and their main methods are discussed in the following section. These classes share a common interface, defined by the CommonData class, which is detailed below. For technical reasons some methods which are common to both GriddedData and UngriddedData are not defined in the CommonData interface. The most useful of these methods are probably summary() and save_data().

These objects can also be ‘sliced’ analogously to the underlying numpy arrays, and will return a copy of the requested data as a new CommonData object with the correct data, coordinates and metadata.

class cis.data_io.common_data.CommonData

Interface of common methods implemented for gridded and ungridded data.

alias

Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).

Returns:	The alias
Return type:	str

as_data_frame(copy)

Convert a CommonData object to a Pandas DataFrame.

Parameters:	copy – Create a copy of the data for the new DataFrame? Default is True.
Returns:	A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.

collocated_onto(sample, how='', kernel=None, missing_data_for_missing_sample=True, fill_value=None, var_name='', var_long_name='', var_units='', **kwargs)

Collocate the CommonData object with another CommonData object using the specified collocator and kernel.

Parameters:

sample (CommonData) – The sample data to collocate onto
how (str) – Collocation method (e.g. lin, nn, bin or box)
kernel (str or cis.collocation.col_framework.Kernel) – The kernel to use
missing_data_for_missing_sample (bool) – Should missing values in sample data be ignored for collocation?
fill_value (float) – Value to use for missing data
var_name (str) – The output variable name
var_long_name (str) – The output variable’s long name
var_units (str) – The output variable’s units
kwargs – Constraint arguments such as h_sep, a_sep, etc.

Returns:	The collocated dataset
Return type:	CommonData

get_all_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:	list-like object of data points

get_coordinates_points()

Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:	list-like object of data points

get_non_masked_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.

Returns:	list-like object of data points

history

Return the associated history of the object

Returns:	The history
Return type:	str

is_gridded()

Returns a value indicating whether the data/coordinates are gridded.

plot(*args, **kwargs)

Plot the data. A matplotlib Axes is created if none is provided.

The default method for series data is ‘line’; otherwise (for example, for a map plot) it is ‘scatter2d’ for UngriddedData and ‘heatmap’ for GriddedData.

Parameters:

how (string) – The method to use, one of: “contour”, “contourf”, “heatmap”, “line”, “scatter”, “scatter2d”, “comparativescatter”, “histogram”, “histogram2d” or “taylor”
ax (Axes) – A matplotlib axes on which to draw the plot
xaxis (Coord or CommonData) – The data to plot on the x axis
yaxis (Coord or CommonData) – The data to plot on the y axis
projection (string or cartopy.crs.Projection) – The projection to use for map plots (default is PlateCarree)
central_longitude (float) – The central longitude to use for PlateCarree (if no other projection specified)
label (string) – A label for the data. This is used for the title, colorbar or legend depending on plot type
args – Other plot-specific args
kwargs – Other plot-specific kwargs

Returns:	The matplotlib Axes on which the plot was drawn
Return type:	Axes

sampled_from(data, how='', kernel=None, missing_data_for_missing_sample=True, fill_value=None, var_name='', var_long_name='', var_units='', **kwargs)

Collocate the given data onto this CommonData object, which is used as the sample, using the specified collocator and kernel.

Parameters:

data (CommonData or CommonDataList) – The data to resample
how (str) – Collocation method (e.g. lin, nn, bin or box)
kernel (str or cis.collocation.col_framework.Kernel) – The kernel to use
missing_data_for_missing_sample (bool) – Should missing values in sample data be ignored for collocation?
fill_value (float) – Value to use for missing data
var_name (str) – The output variable name
var_long_name (str) – The output variable’s long name
var_units (str) – The output variable’s units
kwargs – Constraint arguments such as h_sep, a_sep, etc.

Returns:	The collocated dataset
Return type:	CommonData

set_longitude_range(range_start)

Rotates the longitude coordinate array and changes its values by 360 as necessary to force the values to be within a 360 range starting at the specified value.

Parameters:	range_start – Starting value of required longitude range
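
The value shift performed by set_longitude_range() can be sketched with plain arithmetic. The function wrap_longitudes below is a hypothetical stand-in, not the CIS implementation: it only moves each value by a multiple of 360 and does not reproduce the reordering (rotation) of the coordinate array. Treating the target range as half-open, [range_start, range_start + 360), is an assumption.

```python
def wrap_longitudes(longitudes, range_start):
    """Shift each longitude by multiples of 360 into [range_start, range_start + 360).

    Hypothetical illustration only; the half-open interval is an assumption.
    """
    return [(lon - range_start) % 360 + range_start for lon in longitudes]


# Move values given on a 0..360 convention onto a -180..180 convention.
shifted = wrap_longitudes([190, 350, 10], -180)
```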

subset(**kwargs)

Subset the CommonData object based on the specified constraints. Constraints on arbitrary coordinates are specified using keyword arguments. Each constraint must have two entries (a maximum and a minimum) although one of these can be None. Datetime objects can be used to specify upper and lower datetime limits, or a single PartialDateTime object can be used to specify a datetime range.

The keyword keys are used to find the relevant coordinate, they are looked for in order of name, standard_name, axis and var_name.

For example:

data.subset(time=[datetime.datetime(1984, 8, 28), datetime.datetime(1984, 8, 29)],
            altitude=[45.0, 75.0])

will subset the data from the start of the 28th of August 1984, to the end of the 29th, and between altitudes of 45 and 75 (in whatever units are used for that Coordinate).

And:

data.subset(time=[PartialDateTime(1984, 9)])

will subset the data to all of September 1984.

Parameters:	kwargs – The constraint arguments
Returns:	The subset of the data
Return type:	CommonData

var_name

Return the variable name associated with this data object

Returns:	The variable name

5.1.2. Pandas

All CommonData objects can be converted to Pandas DataFrames using the as_data_frame() method. This provides an easy interface to the powerful statistical tools available in Pandas.
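
A minimal sketch of handing a data object over to Pandas is given below. The helper name summarise is hypothetical; it relies only on as_data_frame() as documented above and on the standard DataFrame.describe() method, and it assumes that as_data_frame() can be called without arguments because copy defaults to True.

```python
def summarise(data):
    """Return summary statistics for a CIS data object (hypothetical helper)."""
    # The DataFrame holds the data and coordinates only; metadata is not carried over.
    frame = data.as_data_frame()
    # describe() reports count, mean, std, min, quartiles and max for each numeric column.
    return frame.describe()
```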

5.2. Analysis Methods

5.2.1. Collocation

Each data object provides both collocated_onto() and sampled_from() methods, which are different ways of calling the collocation depending on whether the object being called is the source or the sample. For example the function performed by the command line:

$ cis col Temperature:2010.nc 2009.nc:variable=Temperature

can be performed in Python using:

import cis

# read_data takes the filenames first and the variable name second
temperature_2010 = cis.read_data('2010.nc', 'Temperature')
temperature_2009 = cis.read_data('2009.nc', 'Temperature')
temperature_2010.collocated_onto(temperature_2009)

or, equivalently:

temperature_2009.sampled_from(temperature_2010)
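
Because every step returns a data object, several operations can be chained without writing intermediate files. The sketch below is one possible arrangement, not a prescribed workflow: the function name collocate_in_memory and its arguments are hypothetical, while read_data(), subset() and collocated_onto() are used as documented above (the empty how string matches the documented default).

```python
def collocate_in_memory(data_file, sample_file, variable, how="", **subset_limits):
    """Read, optionally subset, and collocate without touching disk in between.

    Hypothetical helper; `subset_limits` are passed to subset() as coordinate
    constraints, for example altitude=[45.0, 75.0].
    """
    import cis  # imported lazily so the sketch can be defined without CIS installed

    data = cis.read_data(data_file, variable)
    sample = cis.read_data(sample_file, variable)
    if subset_limits:
        # Reduce the data first so that less has to be collocated.
        data = data.subset(**subset_limits)
    # `sample` provides the points that `data` is collocated onto.
    return data.collocated_onto(sample, how=how)
```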

5.2.2. Aggregation

UngriddedData objects provide the aggregate() method to allow easy aggregation. Each dimension of the desired grid is specified as a keyword and the start, end and step as the argument (as a tuple, list or slice).

For example:

data.aggregate(x=[-180, 180, 360], y=slice(-90, 90, 10))

or:

data.aggregate(how='mean', t=[PartialDateTime(2008, 9), timedelta(days=1)])

Datetime objects can be used to specify upper and lower datetime limits, or a single PartialDateTime object can be used to specify a datetime range. The grid step can be specified as a datetime.timedelta object.

The keyword keys are used to find the relevant coordinate, they are looked for in order of name, standard_name, axis and var_name.
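
To make the start, end and step convention concrete, the sketch below turns a numeric grid specification into a list of values. The function grid_edges is hypothetical and is not how CIS builds its grid; in particular, reading the values as bin edges (rather than cell centres) and including the end point are assumptions.

```python
def grid_edges(spec):
    """Return the values implied by a (start, end, step) tuple, list or slice.

    Hypothetical illustration for numeric coordinates only.
    """
    if isinstance(spec, slice):
        start, end, step = spec.start, spec.stop, spec.step
    else:
        start, end, step = spec
    if step <= 0:
        raise ValueError("step must be positive")

    edges = []
    i = 0
    # Multiply rather than accumulate to limit floating-point drift.
    while start + i * step <= end:
        edges.append(start + i * step)
        i += 1
    return edges


# The two specifications used in the first aggregate() example above.
x_edges = grid_edges([-180, 180, 360])
y_edges = grid_edges(slice(-90, 90, 10))
```

Under these assumptions the x specification describes a single cell spanning all longitudes, while the y specification describes 18 latitude bands of 10 degrees each.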

GriddedData objects provide the collapsed() method which shadows the Iris method of the same name. Our implementation is a slight extension of the Iris method which allows partial collapsing of multi-dimensional auxiliary coordinates.

5.2.3. Subsetting

All objects have a subset() method for easily subsetting data across arbitrary dimensions. Constraints on arbitrary coordinates are specified using keyword arguments. Each constraint must have two entries (a maximum and a minimum) although one of these can be None. Datetime objects can be used to specify upper and lower datetime limits, or a single PartialDateTime object can be used to specify a datetime range.

The keyword keys are used to find the relevant coordinate, they are looked for in order of name, standard_name, axis and var_name.

For example:

data.subset(time=[datetime.datetime(1984, 8, 28), datetime.datetime(1984, 8, 29)],
            altitude=[45.0, 75.0])

will subset the data from the start of the 28th of August 1984, to the end of the 29th, and between altitudes of 45 and 75 (in whatever units are used for that Coordinate).

And:

data.subset(time=[PartialDateTime(1984, 9)])

will subset the data to all of September 1984.
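
The meaning of the two kinds of constraint can be sketched with the standard library alone. The helpers below are hypothetical and do not use CIS: within() models a [minimum, maximum] pair in which None leaves one side open, and month_limits() models the range implied by a year and month pair such as PartialDateTime(1984, 9). Whether CIS treats the limits as inclusive is an assumption.

```python
import datetime


def within(value, limits):
    """True if value lies between limits = [minimum, maximum]; None leaves a side open."""
    lower, upper = limits
    return (lower is None or value >= lower) and (upper is None or value <= upper)


def month_limits(year, month):
    """Return (first instant of the month, first instant of the following month)."""
    start = datetime.datetime(year, month, 1)
    if month == 12:
        end = datetime.datetime(year + 1, 1, 1)
    else:
        end = datetime.datetime(year, month + 1, 1)
    return start, end


def in_month(moment, year, month):
    """True if moment falls inside the given calendar month."""
    start, end = month_limits(year, month)
    return start <= moment < end
```

For example, within(80.0, [45.0, None]) accepts any altitude of at least 45, and in_month() accepts every instant from the first of September 1984 up to, but not including, the first of October.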

5.2.4. Plotting

Plotting can also easily be performed on these objects. Many options are available depending on the plot type, but CIS will attempt to make a sensible default plot regardless of the datatype or dimensionality. The default method for series data is ‘line’; otherwise (for example, for a map plot) it is ‘scatter2d’ for UngriddedData and ‘heatmap’ for GriddedData.

A matplotlib Axes is created if none is provided, meaning the user is able to reformat, or export the plot however they like.
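
Since plot() returns the matplotlib Axes it drew on, the result can be adjusted and saved with ordinary matplotlib calls. The helper plot_to_file below is a hypothetical sketch; it assumes only that plot() accepts the documented keyword arguments (such as how) and returns an Axes whose figure attribute supports savefig().

```python
def plot_to_file(data, path, title=None, **plot_kwargs):
    """Plot a CIS data object, optionally retitle it, and save it (hypothetical helper)."""
    # With no keyword arguments CIS picks its default plot type for the data.
    ax = data.plot(**plot_kwargs)
    if title is not None:
        ax.set_title(title)
    # Every matplotlib Axes keeps a reference to the Figure that contains it.
    ax.figure.savefig(path)
    return ax
```

A call such as plot_to_file(data, 'map.png', how='scatter2d') would then write the plot to a file, where the file name is a made-up example.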