17. CIS as a Python library (API)

17.1. Main API

As a command line tool, CIS has not been designed with a Python API in mind. There are, however, some utility functions that may provide a useful start for those who wish to use CIS as a Python library. For example, the functions in the base cis module provide a straightforward way to load your data. They can be easily imported using, for example: from cis import read_data. One of the advantages of using CIS as a Python library is that you can perform multiple operations in one go, that is, without writing to disk in between. In certain cases this may provide a significant speed-up.

Note

This section of the documentation expects a greater level of Python experience than the other sections. There are many helpful Python guides and tutorials available around the web if you wish to learn more.

The read_data() function is a simple way to read a single gridded or ungridded data object (e.g. a NetCDF variable) from one or more files. CIS will determine the best way to interpret the data file by comparing the file signature with the built-in data reading plugins and any user-defined plugins. Specifying a particular product allows the user to override this automatic detection.

cis.read_data(filenames, variable, product=None)

Read a specific variable from a list of files. Files can be either gridded or ungridded but not a mix of both. CIS first tries to read the data as gridded; if that fails, it tries as ungridded.

Parameters:

filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
variable (str) – The variable to read from the files
product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).

Returns:

The specified data as either a GriddedData or UngriddedData object.
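As a minimal sketch of the call documented above, the helper below wraps cis.read_data. The helper name load_variable is hypothetical and not part of CIS; the import is deferred so that CIS is only required when the function is actually called.

```python
def load_variable(filenames, variable, product=None):
    """Read one variable with cis.read_data (hypothetical convenience wrapper).

    Leaving product as None lets CIS choose a data reading plugin from the
    file signature; passing a plugin name (e.g. "Cloud_CCI") overrides that.
    """
    # Deferred import: CIS is only needed once the function is called.
    from cis import read_data

    return read_data(filenames, variable, product=product)
```

For instance, load_variable("data/*.nc", "AOD550") would read a variable called AOD550 from every matching file; both the path and the variable name here are made up for illustration.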

The read_data_list() function is very similar to read_data() except that it allows the user to specify more than one variable name. This function returns a list of data objects, either all of which will be gridded, or all ungridded, but not a mix. For ungridded data lists it is assumed that all objects share the same coordinates.

cis.read_data_list(filenames, variables, product=None, aliases=None)

Read multiple data objects from a list of files. Files can be either gridded or ungridded but not a mix of both.

Parameters:

filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
variables (string or list) – One or more variables to read from the files
product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
aliases (string or list) – List of aliases to put on each variable’s data object as an alternative means of identifying them.

Returns:

A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)
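The sketch below shows one way read_data_list might be used to fetch several variables and look them up by name afterwards. The helper load_variables is hypothetical; it relies only on the var_name and alias attributes of the CommonData interface, and assumes that each returned object reports the alias it was given.

```python
def load_variables(filenames, variables, aliases=None, product=None):
    """Read several variables and index them by alias (or by variable name).

    Hypothetical helper: returns a dict rather than the list CIS provides.
    """
    # Deferred import: CIS is only needed once the function is called.
    from cis import read_data_list

    data_list = read_data_list(filenames, variables,
                               product=product, aliases=aliases)

    # var_name and alias are both part of the CommonData interface.
    if aliases is None:
        return {data.var_name: data for data in data_list}
    return {data.alias: data for data in data_list}
```

Keying by alias is convenient when the real variable names are long or are not valid Python identifiers.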

17.1.1. Data Objects

Each of the above functions returns GriddedData or UngriddedData objects, either singly or as a list. These objects are the main data handling objects used within CIS, and the methods on each of these types are documented in the data modules section. These classes do, however, share a common interface, defined by the CommonData class, which is detailed below. For technical reasons, some methods which are common to both GriddedData and UngriddedData are not defined in the CommonData interface. The most useful of these methods are probably summary() and save_data().

class cis.data_io.common_data.CommonData

Interface of common methods implemented for gridded and ungridded data.

alias

Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a Python evaluation).

Returns:	The alias
Return type:	str

as_data_frame(copy)

Convert a CommonData object to a Pandas DataFrame.

Parameters:	copy – Create a copy of the data for the new DataFrame? Default is True.
Returns:	A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.

get_all_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:	list-like object of data points

get_coordinates_points()

Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:	list-like object of data points

get_non_masked_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.

Returns:	list-like object of data points

history

Return the associated history of the object

Returns:	The history
Return type:	str

is_gridded()

Returns a value indicating whether the data/coordinates are gridded.

var_name

Return the variable name associated with this data object

Returns:	The variable name
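Because GriddedData and UngriddedData share the CommonData interface, code written against that interface works with either. The sketch below is a hypothetical helper (describe is not a CIS function) that uses only the var_name, alias, history and is_gridded() members listed above.

```python
def describe(data):
    """Summarise a data object using only the CommonData interface."""
    kind = "gridded" if data.is_gridded() else "ungridded"
    lines = [
        "variable: {}".format(data.var_name),
        "alias:    {}".format(data.alias),
        "type:     {}".format(kind),
    ]
    # Only report the history when there is something to show.
    if data.history:
        lines.append("history:  {}".format(data.history))
    return "\n".join(lines)
```

Passing any object returned by read_data() to describe() should yield a short text summary without needing to know which kind of data was read.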

17.2. Unsupported API

Warning

While the above interfaces are designed as a ‘public’ API and unlikely to change over CIS versions, those documented below are not yet standardised and may change or be removed even between minor version revisions. It is expected however that these particular classes will be developed and stabilised over time to form part of the ‘public’ API.

17.2.1. Collocation

The main collocation class can be imported using from cis.collocation import Collocate. Its methods are outlined below:

class cis.collocation.Collocate(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)

Perform a general collocation

__init__(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)

Constructor

Parameters:

sample_points (CommonData) – Sample points to collocate onto
output_filename – Filename to output to
missing_data_for_missing_sample – Write missing values out when sample data is missing
collocator_factory (CollocatorFactory) – An optional configuration object

__weakref__: list of weak references to the object (if defined)

collocate(data, col_name=None, col_params=None, kern=None, kern_params=None)

Perform the collocation.

Parameters:

data (CommonData) – Data to collocate
col_name (str) – Name of the collocator
col_params (dict) – Parameters dictionary for the collocation and constraint
kern (str) – The kernel to use
kern_params (dict) – The kernel parameters to use

Returns:

The collocated data, as a CommonData object.

Raises:

CoordinateNotFoundError – If the collocator was unable to compare the sample and data points
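The following sketch illustrates how reading and collocation might be chained in memory, with no intermediate file. The function collocate_in_memory is hypothetical; it uses only the read_data, Collocate and collocate signatures documented in this section, and since this part of the API is unsupported the details may differ between CIS versions.

```python
def collocate_in_memory(sample_files, sample_variable,
                        data_files, data_variable,
                        col_name=None, col_params=None,
                        kern=None, kern_params=None):
    """Read two datasets and collocate the second onto the first.

    Hypothetical helper: returns whatever Collocate.collocate() returns.
    """
    # Deferred imports: CIS is only needed once the function is called.
    from cis import read_data
    from cis.collocation import Collocate

    sample = read_data(sample_files, sample_variable)
    data = read_data(data_files, data_variable)

    # The sample points are fixed at construction; data is passed per call.
    collocator = Collocate(sample)
    return collocator.collocate(data, col_name=col_name, col_params=col_params,
                                kern=kern, kern_params=kern_params)
```

The collocator and kernel names are left to the caller, since the valid values depend on what the installed CIS version provides.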

17.2.2. Aggregation

The main aggregation class can be imported using from cis.aggregation import Aggregate. Its methods are outlined below. Note that currently this object saves the output directly to file, but it is expected that in the future it will return the result for the user to output as needed.

class cis.aggregation.Aggregate(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

__init__(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Constructor

Parameters:

grid (dict) – A dictionary of dimension_name:AggregationGrid key value pairs.
output_file – The filename to output the result to
data_reader – Optional `DataReader` configuration object
data_writer – Optional `DataWriter` configuration object

__weakref__: list of weak references to the object (if defined)

aggregate(variables, filenames, product=None, kernel=None)

Aggregate the given variables based on the initialised grid

Parameters:

variables (string or list) – One or more variables to read from the files
filenames (string or list) – One or more filenames of the files to read
product (str) – Name of data product to use (optional)
kernel (str) – Name of kernel to use (the default is ‘moments’)
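A small sketch of the call sequence, based only on the signatures above: the grid and output file are given to the constructor, and the variables and files to the aggregate() call. The helper aggregate_to_file is hypothetical, and the grid dictionary must be built by the caller because the AggregationGrid constructor is not described here.

```python
def aggregate_to_file(grid, output_file, variables, filenames,
                      product=None, kernel=None):
    """Aggregate variables onto a grid and write the result to output_file.

    Hypothetical helper around cis.aggregation.Aggregate.
    """
    # Deferred import: CIS is only needed once the function is called.
    from cis.aggregation import Aggregate

    aggregator = Aggregate(grid, output_file)

    # Argument order: variables first, then filenames, which is the
    # reverse of the order used by cis.read_data().
    aggregator.aggregate(variables, filenames, product=product, kernel=kernel)

    # The result is written to disk, so hand back the path for convenience.
    return output_file
```

Returning the output path is a design choice of this sketch, made because the object currently writes straight to file rather than returning data.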

17.2.3. Subsetting

The main subsetting class can be imported using from cis.subsetting import Subset. Its methods are outlined below. Note that currently this object saves the output directly to file, but it is expected that in the future it will return the result for the user to output as needed.

class cis.subsetting.Subset(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Class for subsetting Ungridded or Gridded data either temporally, or spatially or both.

__init__(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Constructor

Parameters:

limits (dict) – A dictionary of dimension_name:SubsetLimits key value pairs.
output_file – The filename to output the result to
data_reader – Optional `DataReader` configuration object
data_writer – Optional `DataWriter` configuration object

__weakref__: list of weak references to the object (if defined)

subset(variables, filenames, product=None)

Subset the given variables based on the initialised limits

Parameters:

variables (string or list) – One or more variables to read from the files
filenames (string or list) – One or more filenames of the files to read
product (str) – Name of data product to use (optional)
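To illustrate what subsetting by a dictionary of per-dimension limits means, here is a toy, pure-Python sketch. It is not the CIS implementation: Limits is a hypothetical stand-in for SubsetLimits (whose actual fields are not described here), and points are plain dictionaries rather than CIS data objects.

```python
from collections import namedtuple

# Hypothetical stand-in for SubsetLimits: an inclusive [start, end] range.
Limits = namedtuple("Limits", ["start", "end"])


def subset_points(points, limits):
    """Keep the points whose coordinates fall inside every limit.

    points: list of dicts mapping dimension name -> coordinate value
    limits: dict mapping dimension name -> Limits
    """
    def inside(point):
        return all(lim.start <= point[dim] <= lim.end
                   for dim, lim in limits.items())

    return [point for point in points if inside(point)]


points = [
    {"lat": 10.0, "lon": 20.0},
    {"lat": 55.0, "lon": -3.0},
    {"lat": 60.0, "lon": 40.0},
]
limits = {"lat": Limits(50.0, 70.0), "lon": Limits(-10.0, 10.0)}

# Only the second point satisfies both the latitude and longitude limits.
selected = subset_points(points, limits)
```

A point is kept only if it lies within the limits on every constrained dimension; dimensions that are absent from the dictionary are left unconstrained.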

17.2.4. Stats

The main statistics class can be imported using from cis.stats import StatsAnalyzer. Its methods are outlined below:

class cis.stats.StatsAnalyzer(data1, data2)

Analyse datasets to produce statistics.

__init__(data1, data2)

Create a statistics analyser for two data sets

Parameters:

data1 (CommonData) – First data object
data2 (CommonData) – Second data object

analyze()

Perform a statistical analysis on two data sets.

Returns:	List of StatisticsResult instances.

points_count()

Count all points which will be used for statistical comparison operations (i.e. are non-missing in both datasets).

Returns:	List of StatisticsResults

means()

Means of two datasets

Returns:	List of StatisticsResults

stddevs()

Corrected sample standard deviation of datasets

Returns:	List of StatisticsResults

abs_mean()

Mean of absolute difference d2-d1

Returns:	List of StatisticsResults

abs_stddev()

Standard deviation of absolute difference d2-d1

Returns:	List of StatisticsResults

rel_mean()

Mean of relative difference (d2-d1)/d1

Returns:	List of StatisticsResults

rel_stddev()

Standard deviation of relative difference (d2-d1)/d1

Returns:	List of StatisticsResults

spearmans_rank()

Calculate the Spearman’s rank correlation coefficient for the data

Returns:	List of StatisticsResults

linear_regression()

Perform a linear regression on the data

Returns:	List of StatisticsResults

__weakref__: list of weak references to the object (if defined)
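The definitions above can be made concrete with a small standard-library sketch. This is an illustration of the formulas, not the StatsAnalyzer implementation: plain lists stand in for data objects, None or NaN marks a missing value, and the use of the corrected (n-1) standard deviation for the difference statistics is an assumption.

```python
import math
import statistics


def paired_stats(d1, d2):
    """Compute comparison statistics over points present in both inputs."""
    def present(x):
        return x is not None and not math.isnan(x)

    # Only points that are non-missing in both datasets are compared.
    pairs = [(a, b) for a, b in zip(d1, d2) if present(a) and present(b)]
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    abs_diff = [b - a for a, b in pairs]        # d2 - d1
    rel_diff = [(b - a) / a for a, b in pairs]  # (d2 - d1) / d1, needs d1 != 0

    return {
        "points_count": len(pairs),
        "means": (statistics.mean(first), statistics.mean(second)),
        # statistics.stdev is the corrected sample standard deviation (n - 1).
        "stddevs": (statistics.stdev(first), statistics.stdev(second)),
        "abs_mean": statistics.mean(abs_diff),
        "abs_stddev": statistics.stdev(abs_diff),
        "rel_mean": statistics.mean(rel_diff),
        "rel_stddev": statistics.stdev(rel_diff),
    }


# The third point is dropped because it is missing in the first dataset.
result = paired_stats([1.0, 2.0, None, 4.0], [1.5, 3.0, 5.0, 6.0])
```

Here three points survive the missing-value filter, and every surviving value in the second list is 1.5 times its partner, so the relative difference has a mean of 0.5 and no spread.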

17.2.5. Full Python reference documentation

The rest of the documentation below covers internal CIS functions and modules which are not intended to be used as an API at all. They are documented here as a reference for developers and other interested parties.