17. CIS as a Python library (API)

17.1. Main API

CIS was designed as a command line tool rather than with a Python API in mind. There are, however, some utility functions that provide a useful starting point for those who wish to use CIS as a Python library. For example, the functions in the base cis module provide a straightforward way to load your data. They can be imported easily using, for example: from cis import read_data. One advantage of using CIS as a Python library is that you can perform multiple operations in one go, that is, without writing to disk in between. In certain cases this may provide a significant speed-up.

Note

This section of the documentation expects a greater level of Python experience than the other sections. There are many helpful Python guides and tutorials available around the web if you wish to learn more.

The read_data() function is a simple way to read a single gridded or ungridded data object (e.g. a NetCDF variable) from one or more files. CIS will determine the best way to interpret the datafile by comparing the file signature with the built-in data reading plugins and any user defined plugins. Specifying a particular product allows the user to override this automatic detection.

cis.read_data(filenames, variable, product=None)

Read a specific variable from a list of files. Files can be either gridded or ungridded, but not a mix of both. CIS first tries to read the data as gridded; if that fails, it tries to read it as ungridded.

Parameters:
  • filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
  • variable (str) – The variable to read from the files
  • product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
Returns:

The specified data as either a GriddedData or UngriddedData object.
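
The accepted forms of the filenames argument can be illustrated with a short sketch that uses only the standard library. The helper expand_filenames below is hypothetical and is not part of CIS; it only mimics the behaviour described above (a single name, a comma separated string, a list, directories and wildcards). CIS's own resolution may differ in details such as ordering or how directories are traversed.

```python
import glob
import os


def expand_filenames(filenames):
    """Hypothetical helper: resolve the accepted ``filenames`` forms to a sorted list of paths."""
    # A single string may hold one name or a comma separated list of names.
    if isinstance(filenames, str):
        filenames = filenames.split(",")

    resolved = set()
    for name in filenames:
        name = name.strip()
        if os.path.isdir(name):
            # A directory stands for every regular file directly inside it.
            for entry in os.listdir(name):
                path = os.path.join(name, entry)
                if os.path.isfile(path):
                    resolved.add(path)
        else:
            # glob expands * and ? wildcards; a plain existing path matches itself.
            resolved.update(glob.glob(name))
    return sorted(resolved)
```

Any of the forms handled by this sketch can be passed directly as the first argument of read_data(); the helper is only useful for checking in advance which files a pattern would match.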

The read_data_list() function is very similar to read_data() except that it allows the user to specify more than one variable name. This function returns a list of data objects, either all of which will be gridded, or all ungridded, but not a mix. For ungridded data lists it is assumed that all objects share the same coordinates.

cis.read_data_list(filenames, variables, product=None, aliases=None)

Read multiple data objects from a list of files. Files can be either gridded or ungridded but not a mix of both.

Parameters:
  • filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
  • variables (string or list) – One or more variables to read from the files
  • product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
  • aliases (string or list) – List of aliases to put on each variable’s data object as an alternative means of identifying them.
Returns:

A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)
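
As a minimal sketch of how read_data_list() and its aliases argument might be used together, the function below reads several variables and indexes the resulting objects by alias. The name read_by_alias is hypothetical. The sketch assumes that aliases are applied to the variables in the order given and that each returned object exposes the alias attribute described under CommonData.

```python
def read_by_alias(filenames, variables, aliases, product=None):
    """Read several variables and index the resulting data objects by alias."""
    # Imported lazily so that this module can be imported without CIS installed.
    from cis import read_data_list

    data_list = read_data_list(filenames, variables, product=product, aliases=aliases)
    # Assumption: each object carries the alias it was given when it was read.
    return {data.alias: data for data in data_list}
```

The returned dictionary makes it possible to refer to each data object by a short name, which is convenient when the original variable names are long or are not valid Python identifiers.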

17.1.1. Data Objects

Each of the above functions returns GriddedData or UngriddedData objects (read_data_list() returns a list of them). These objects are the main data handling objects used within CIS, and the methods on each of these types are documented in the data modules section. Both classes share a common interface, defined by the CommonData class, which is detailed below. For technical reasons, some methods that are common to both GriddedData and UngriddedData are not defined in the CommonData interface. The most useful of these are probably summary() and save_data().

class cis.data_io.common_data.CommonData

Interface of common methods implemented for gridded and ungridded data.

alias

Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).

Returns:The alias
Return type:str
as_data_frame(copy)

Convert a CommonData object to a Pandas DataFrame.

Parameters:copy – Create a copy of the data for the new DataFrame? Default is True.
Returns:A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.
get_all_points()

Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:list-like object of data points
get_coordinates_points()

Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.

Returns:list-like object of data points
get_non_masked_points()

Returns a list-like object allowing access to all non-masked points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.

Returns:list-like object of data points
history

Return the associated history of the object

Returns:The history
Return type:str
is_gridded()

Returns a value indicating whether the data/coordinates are gridded.

var_name

Return the variable name associated with this data object

Returns:The variable name
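
Because gridded and ungridded objects share the CommonData interface, code can be written against that interface alone. The sketch below is an illustration only: describe is a hypothetical helper, and it relies solely on the members documented above (alias, var_name, is_gridded() and get_non_masked_points()).

```python
def describe(data):
    """Summarise any object that implements the CommonData interface."""
    kind = "gridded" if data.is_gridded() else "ungridded"
    # The returned object is only documented as iterable, so count by iterating.
    valid_points = sum(1 for _ in data.get_non_masked_points())
    return {
        "alias": data.alias,
        "variable": data.var_name,
        "kind": kind,
        "valid_points": valid_points,
    }
```

The same function can be applied to the result of read_data() or to each element of the list returned by read_data_list(), without checking which concrete class was returned.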

17.2. Unsupported API

Warning

While the above interfaces are designed as a ‘public’ API and unlikely to change over CIS versions, those documented below are not yet standardised and may change or be removed even between minor version revisions. It is expected however that these particular classes will be developed and stabilised over time to form part of the ‘public’ API.

17.2.1. Collocation

The main collocation class can be imported using from cis.collocation import Collocate; its methods are outlined below:

class cis.collocation.Collocate(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)

Perform a general collocation

__init__(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)

Constructor

Parameters:
  • sample_points (CommonData) – Sample points to collocate on to
  • output_filename – Filename to output to
  • missing_data_for_missing_sample – Write missing values out when sample data is missing
  • collocator_factory (CollocatorFactory) – An optional configuration object

collocate(data, col_name=None, col_params=None, kern=None, kern_params=None)

Perform the collocation.

Parameters:
  • data (CommonData) – Data to collocate
  • col_name (str) – Name of the collocator
  • col_params (dict) – Parameters dictionary for the collocation and constraint
  • kern (str) – The kernel to use
  • kern_params (dict) – The kernel parameters to use
Returns:The collocated data
Return type:CommonData

Raises:

CoordinateNotFoundError – If the collocator was unable to compare the sample and data points
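
The following sketch shows how reading and collocation might be combined in memory, so that no intermediate file is written between the two steps. The function name read_and_collocate is hypothetical. It uses only the signatures documented in this section, and leaves col_name and kern at their documented default of None unless the caller supplies them.

```python
def read_and_collocate(sample_files, sample_var, data_files, data_var,
                       col_name=None, kern=None):
    """Read a sample and a data variable, then collocate the data onto the sample."""
    # Imported lazily so that this module can be imported without CIS installed.
    from cis import read_data
    from cis.collocation import Collocate

    sample = read_data(sample_files, sample_var)
    data = read_data(data_files, data_var)

    collocator = Collocate(sample)
    # May raise CoordinateNotFoundError if the two sets of points cannot be compared.
    return collocator.collocate(data, col_name=col_name, kern=kern)
```

Since this class belongs to the unsupported API, the call signatures used here may change between CIS versions.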

17.2.2. Aggregation

The main aggregation class can be imported using from cis.aggregation import Aggregate; its methods are outlined below. Note that currently this object saves the output directly to file, but it is expected that in the future it will return the result for the user to output as needed.

class cis.aggregation.Aggregate(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)
__init__(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Constructor

Parameters:
  • grid (dict) – A dictionary of dimension_name:AggregationGrid key value pairs.
  • output_file – The filename to output the result to
  • data_reader – Optional DataReader configuration object
  • data_writer – Optional DataWriter configuration object

aggregate(variables, filenames, product=None, kernel=None)

Aggregate the given variables based on the initialised grid

Parameters:
  • variables (string or list) – One or more variables to read from the files
  • filenames (string or list) – One or more filenames of the files to read
  • product (str) – Name of data product to use (optional)
  • kernel (str) – Name of kernel to use (the default is ‘moments’)
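
Because the output file is fixed when an Aggregate object is constructed, producing several output files requires one object per file. The sketch below illustrates that pattern under stated assumptions: aggregate_each and the jobs mapping are hypothetical, and grid is taken to be a ready-made dictionary of dimension_name:AggregationGrid pairs, whose construction is not covered here.

```python
def aggregate_each(grid, jobs, variables, product=None, kernel=None):
    """Run one aggregation per (output_file, filenames) entry in ``jobs``."""
    # Imported lazily so that this module can be imported without CIS installed.
    from cis.aggregation import Aggregate

    requested = []
    for output_file, filenames in jobs.items():
        # output_file is a constructor argument, so build a new Aggregate each time.
        aggregator = Aggregate(grid, output_file)
        aggregator.aggregate(variables, filenames, product=product, kernel=kernel)
        requested.append(output_file)
    # These are the names that were requested; the sketch does not verify the files.
    return requested
```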

17.2.3. Subsetting

The main subsetting class can be imported using from cis.subsetting import Subset; its methods are outlined below. Note that currently this object saves the output directly to file, but it is expected that in the future it will return the result for the user to output as needed.

class cis.subsetting.Subset(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Class for subsetting Ungridded or Gridded data either temporally, or spatially or both.

__init__(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)

Constructor

Parameters:
  • limits (dict) – A dictionary of dimension_name:SubsetLimits key value pairs.
  • output_file – The filename to output the result to
  • data_reader – Optional DataReader configuration object
  • data_writer – Optional DataWriter configuration object

subset(variables, filenames, product=None)

Subset the given variables based on the initialised limits

Parameters:
  • variables (string or list) – One or more variables to read from the files
  • filenames (string or list) – One or more filenames of the files to read
  • product (str) – Name of data product to use (optional)

17.2.4. Stats

The main statistics class can be imported using from cis.stats import StatsAnalyzer; its methods are outlined below:

class cis.stats.StatsAnalyzer(data1, data2)

Analyse datasets to produce statistics.

__init__(data1, data2)

Create a statistics analyser for two data sets

Parameters:
analyze()

Perform a statistical analysis on two data sets.

Returns:List of StatisticsResult instances.
points_count()

Count all points which will be used for statistical comparison operations (i.e. are non-missing in both datasets).

Returns:List of StatisticsResults
means()

Means of two datasets

Returns:List of StatisticsResults
stddevs()

Corrected sample standard deviation of datasets

Returns:List of StatisticsResults
abs_mean()

Mean of absolute difference d2-d1

Returns:List of StatisticsResults
abs_stddev()

Standard deviation of absolute difference d2-d1

Returns:List of StatisticsResults
rel_mean()

Mean of relative difference (d2-d1)/d1

Returns:List of StatisticsResults
rel_stddev()

Standard deviation of relative difference (d2-d1)/d1

Returns:List of StatisticsResults
spearmans_rank()

Compute Spearman’s rank correlation coefficient for the data

Returns:List of StatisticsResults
linear_regression()

Perform a linear regression on the data

Returns:List of StatisticsResults
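
To make the difference statistics above concrete, the sketch below computes them for two plain Python lists using only the standard library. It is an independent illustration and not the StatsAnalyzer implementation. The name paired_stats is hypothetical, None is used here to mark a missing value, the corrected sample standard deviation is assumed throughout, and skipping points where d1 is zero in the relative difference is also an assumption.

```python
import statistics


def paired_stats(d1, d2):
    """Difference statistics over points that are present in both datasets."""
    # Keep only the pairs in which neither value is missing.
    pairs = [(a, b) for a, b in zip(d1, d2) if a is not None and b is not None]
    abs_diff = [b - a for a, b in pairs]
    # Assumption: the relative difference is skipped where d1 is zero.
    rel_diff = [(b - a) / a for a, b in pairs if a != 0]
    # statistics.stdev is the corrected (n - 1) estimator and needs two or more points.
    return {
        "points_count": len(pairs),
        "abs_mean": statistics.mean(abs_diff),
        "abs_stddev": statistics.stdev(abs_diff),
        "rel_mean": statistics.mean(rel_diff),
        "rel_stddev": statistics.stdev(rel_diff),
    }
```

For d1 = [1.0, 2.0, None, 4.0] and d2 = [1.5, None, 3.0, 5.0], only the first and last points are present in both datasets, so the count is 2, the mean difference is 0.75 and the mean relative difference is 0.375.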