17. CIS as a Python library (API)
17.1. Main API
CIS was designed primarily as a command line tool, not with a Python API in mind. There are, however, some utility functions
that may provide a useful start for those who wish to use CIS as a Python library. For example, the functions in the
base cis module provide a straightforward way to load your data. They can be imported easily using, for example, from cis import read_data.
One of the advantages of using CIS as a Python library is that you are able to perform multiple operations in one go,
that is, without writing to disk in between. In certain cases this may provide a significant speed-up.
Note
This section of the documentation expects a greater level of Python experience than the other sections. There are many helpful Python guides and tutorials available around the web if you wish to learn more.
The read_data() function is a simple way to read a single gridded or ungridded data object (e.g. a NetCDF
variable) from one or more files. CIS will determine the best way to interpret the data file by comparing the file
signature with the built-in data reading plugins and any user-defined plugins. Specifying a particular product
allows the user to override this automatic detection.
-
cis.read_data(filenames, variable, product=None)
Read a specific variable from a list of files. Files can be either gridded or ungridded but not a mix of both. First tries to read data as gridded; if that fails, tries as ungridded.
Parameters:
- filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma-separated list, or a list of string filenames. Filenames can include directories, which will be expanded to include all files in that directory, or wildcards such as * or ?.
- variable (str) – The variable to read from the files
- product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
Returns: The specified data as either a GriddedData or UngriddedData object.
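As a minimal sketch of how read_data() might be called from a script, the helper below wraps the call and imports CIS only when it is needed, so the helper can also be exercised with a stand-in reader when CIS is not installed. The helper name load_variable, its reader argument, and the file and variable names in the usage comments are illustrative assumptions, not part of CIS.

```python
def load_variable(filenames, variable, product=None, reader=None):
    """Read one variable through cis.read_data (or a stand-in reader).

    `reader` is a hypothetical hook used only in this sketch; when it is
    omitted, the real cis.read_data function is imported at call time.
    """
    if reader is None:
        from cis import read_data as reader  # requires CIS to be installed
    # Leaving product as None lets CIS choose a data reading plugin itself.
    return reader(filenames, variable, product=product)


# Usage (placeholder file and variable names):
# data = load_variable("my_data_*.nc", "my_variable")
# data = load_variable(["a.nc", "b.nc"], "my_variable", product="Cloud_CCI")
```

Passing product explicitly is only needed when the automatic detection described above picks the wrong plugin.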
The read_data_list() function is very similar to read_data(), except that it allows the user to specify
more than one variable name. This function returns a list of data objects, which will be either all gridded or all
ungridded, but not a mix. For ungridded data lists it is assumed that all objects share the same coordinates.
-
cis.read_data_list(filenames, variables, product=None, aliases=None)
Read multiple data objects from a list of files. Files can be either gridded or ungridded but not a mix of both.
Parameters:
- filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma-separated list, or a list of string filenames. Filenames can include directories, which will be expanded to include all files in that directory, or wildcards such as * or ?.
- variables (string or list) – One or more variables to read from the files
- product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
- aliases (string or list) – List of aliases to put on each variable’s data object as an alternative means of identifying them.
Returns: A list of the data read out (either a GriddedDataList or UngriddedDataList, depending on the type of data contained in the files)
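The following sketch shows one way read_data_list() could be used to load several variables at once and then look them up by name. It assumes the returned list can be iterated and that each element exposes the var_name property of the CommonData interface; the helper name load_variables and its reader argument are hypothetical.

```python
def load_variables(filenames, variables, product=None, aliases=None, reader=None):
    """Read several variables and key them by variable name.

    `reader` is a hypothetical hook for this sketch; by default the real
    cis.read_data_list function is imported at call time.
    """
    if reader is None:
        from cis import read_data_list as reader  # requires CIS to be installed
    data_list = reader(filenames, variables, product=product, aliases=aliases)
    # Each element is assumed to expose var_name (see the CommonData interface).
    return {data.var_name: data for data in data_list}


# Usage (placeholder file and variable names):
# by_name = load_variables("my_data.nc", ["var_a", "var_b"])
# var_a = by_name["var_a"]
```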
17.1.1. Data Objects
Each of the above functions returns either GriddedData or UngriddedData objects. These objects are the main
data handling objects used within CIS, and the methods on each of these types are documented in the
data modules section. These classes do, however, share a common interface, defined by the CommonData
class, which is detailed below. For technical reasons some methods which are common to both GriddedData
and UngriddedData are not defined in the CommonData interface. The most useful of these methods are probably summary()
and save_data().
-
class cis.data_io.common_data.CommonData
Interface of common methods implemented for gridded and ungridded data.
-
alias
Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a Python evaluation).
Returns: The alias
Return type: str
-
as_data_frame(copy)
Convert a CommonData object to a Pandas DataFrame.
Parameters: copy – Whether to create a copy of the data for the new DataFrame. Default is True.
Returns: A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.
-
get_all_points()
Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.
Returns: list-like object of data points
-
get_coordinates_points()
Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.
Returns: list-like object of data points
-
get_non_masked_points()
Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.
Returns: list-like object of data points
-
history
Return the associated history of the object.
Returns: The history
Return type: str
-
is_gridded()
Returns a value indicating whether the data/coordinates are gridded.
-
var_name
Return the variable name associated with this data object.
Returns: The variable name
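Because GriddedData and UngriddedData share the CommonData interface, code can be written against that interface alone. The sketch below collects a few of the documented members into a dictionary and converts the data to a DataFrame; it assumes var_name, alias and history are read as attributes (they are listed above without call parentheses) and that is_gridded() and as_data_frame() are called as methods. The function names describe and to_frame are hypothetical.

```python
def describe(data):
    """Summarise any object that follows the CommonData interface."""
    return {
        "var_name": data.var_name,    # variable name (attribute access assumed)
        "alias": data.alias,          # alternative identifier
        "gridded": data.is_gridded(), # gridded or ungridded data?
        "history": data.history,      # processing history string
    }


def to_frame(data, copy=True):
    """Return the data and coordinates as a Pandas DataFrame (no metadata)."""
    return data.as_data_frame(copy=copy)


# Usage, assuming `data` was returned by cis.read_data:
# print(describe(data))
# frame = to_frame(data)
```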
17.2. Unsupported API
Warning
While the above interfaces are designed as a ‘public’ API and unlikely to change over CIS versions, those documented below are not yet standardised and may change or be removed even between minor version revisions. It is expected however that these particular classes will be developed and stabilised over time to form part of the ‘public’ API.
17.2.1. Collocation
The main collocation class can be imported using from cis.collocation import Collocate; its methods are outlined below:
-
class cis.collocation.Collocate(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)
Perform a general collocation.
-
__init__(sample_points, missing_data_for_missing_sample=False, collocator_factory=<cis.collocation.col.CollocatorFactory object>)
Constructor.
Parameters:
- sample_points (CommonData) – Sample points to collocate on to
- output_filename – Filename to output to
- missing_data_for_missing_sample – Write missing values out when sample data is missing
- collocator_factory (CollocatorFactory) – An optional configuration object
-
__weakref__
List of weak references to the object (if defined).
-
collocate(data, col_name=None, col_params=None, kern=None, kern_params=None)
Perform the collocation.
Parameters:
- data (CommonData) – Data to collocate
- col_name (str) – Name of the collocator
- col_params (dict) – Parameters dictionary for the collocation and constraint
- kern (str) – The kernel to use
- kern_params (dict) – The kernel parameters to use
Returns: The collocated data (CommonData)
Raises: CoordinateNotFoundError if the collocator was unable to compare the sample and data points
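The constructor and collocate() signatures above suggest a two-step pattern: build a Collocate object around the sample points, then pass it the data to collocate. The sketch below follows that pattern under the documented signatures only; the helper name collocate_onto and its collocate_cls argument are hypothetical, and valid collocator and kernel names are left to the caller because they are not listed in this section.

```python
def collocate_onto(sample_points, data, col_name=None, col_params=None,
                   kern=None, kern_params=None, collocate_cls=None):
    """Collocate `data` onto `sample_points` and return the result in memory.

    `collocate_cls` is a hypothetical hook for this sketch; by default the
    real cis.collocation.Collocate class is imported at call time.
    """
    if collocate_cls is None:
        from cis.collocation import Collocate as collocate_cls  # requires CIS
    collocator = collocate_cls(sample_points)
    # May raise CoordinateNotFoundError if the two datasets cannot be compared.
    return collocator.collocate(data, col_name=col_name, col_params=col_params,
                                kern=kern, kern_params=kern_params)


# Usage, assuming `sample` and `data` were returned by cis.read_data and that
# the collocator and kernel names are taken from the CIS collocation docs:
# result = collocate_onto(sample, data, col_name=..., kern=...)
```

Since the result is returned rather than written to disk, it can be passed straight to a further operation.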
17.2.2. Aggregation
The main aggregation class can be imported using from cis.aggregation import Aggregate; its methods are outlined below.
Note that currently this object saves the output directly to file, but it is expected that in the future it will return
the result for the user to output as needed.
-
class cis.aggregation.Aggregate(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)
-
__init__(grid, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)
Constructor.
Parameters:
- grid (dict) – A dictionary of dimension_name:AggregationGrid key value pairs.
- output_file – The filename to output the result to
- data_reader – Optional DataReader configuration object
- data_writer – Optional DataWriter configuration object
-
__weakref__
List of weak references to the object (if defined).
-
aggregate(variables, filenames, product=None, kernel=None)
Aggregate the given variables based on the initialised grid.
Parameters:
- variables (string or list) – One or more variables to read from the files
- filenames (string or list) – One or more filenames of the files to read
- product (str) – Name of data product to use (optional)
- kernel (str) – Name of kernel to use (the default is ‘moments’)
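A minimal sketch of driving Aggregate from Python, based only on the signatures listed above, might look as follows. The grid dictionary must already contain AggregationGrid values; how those are constructed is not covered in this section, so the sketch takes the dictionary as given. The helper name aggregate_to_file and its aggregate_cls argument are hypothetical.

```python
def aggregate_to_file(grid, output_file, variables, filenames,
                      product=None, kernel=None, aggregate_cls=None):
    """Aggregate variables onto `grid`, writing the result to `output_file`.

    `aggregate_cls` is a hypothetical hook for this sketch; by default the
    real cis.aggregation.Aggregate class is imported at call time.
    """
    if aggregate_cls is None:
        from cis.aggregation import Aggregate as aggregate_cls  # requires CIS
    aggregator = aggregate_cls(grid, output_file)
    # The result is written to file by CIS, so nothing useful is returned here.
    aggregator.aggregate(variables, filenames, product=product, kernel=kernel)
    return output_file


# Usage, with `grid` a dict of dimension_name: AggregationGrid pairs built
# elsewhere, and placeholder file and variable names:
# aggregate_to_file(grid, "aggregated_output", "my_variable", "my_data.nc")
```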
17.2.3. Subsetting
The main subsetting class can be imported using from cis.subsetting import Subset; its methods are outlined below.
Note that currently this object saves the output directly to file, but it is expected that in the future it will return
the result for the user to output as needed.
-
class cis.subsetting.Subset(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)
Class for subsetting ungridded or gridded data temporally, spatially, or both.
-
__init__(limits, output_file, data_reader=<cis.data_io.data_reader.DataReader object>, data_writer=<cis.data_io.data_writer.DataWriter object>)
Constructor.
Parameters:
- limits (dict) – A dictionary of dimension_name:SubsetLimits key value pairs.
- output_file – The filename to output the result to
- data_reader – Optional DataReader configuration object
- data_writer – Optional DataWriter configuration object
-
__weakref__
List of weak references to the object (if defined).
-
subset(variables, filenames, product=None)
Subset the given variables based on the initialised limits.
Parameters:
- variables (string or list) – One or more variables to read from the files
- filenames (string or list) – One or more filenames of the files to read
- product (str) – Name of data product to use (optional)
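Because the output filename is fixed when a Subset object is constructed, applying the same limits to several inputs appears to need one Subset object per output file. The sketch below illustrates that under the signatures listed above; the limits dictionary of SubsetLimits values is taken as given, and the helper name subset_many, the jobs structure, and the subset_cls argument are all hypothetical.

```python
def subset_many(limits, jobs, product=None, subset_cls=None):
    """Apply the same limits to several inputs, one output file per job.

    `jobs` is an iterable of (variables, filenames, output_file) tuples.
    `subset_cls` is a hypothetical hook for this sketch; by default the real
    cis.subsetting.Subset class is imported at call time.
    """
    if subset_cls is None:
        from cis.subsetting import Subset as subset_cls  # requires CIS
    written = []
    for variables, filenames, output_file in jobs:
        # A new Subset per job, since output_file is a constructor argument.
        subset_cls(limits, output_file).subset(variables, filenames, product=product)
        written.append(output_file)
    return written


# Usage, with `limits` a dict of dimension_name: SubsetLimits pairs built
# elsewhere, and placeholder file and variable names:
# subset_many(limits, [("var_a", "a.nc", "a_subset"), ("var_b", "b.nc", "b_subset")])
```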
17.2.4. Stats
The main statistics class can be imported using from cis.stats import StatsAnalyzer; its methods are outlined below:
-
class cis.stats.StatsAnalyzer(data1, data2)
Analyse datasets to produce statistics.
-
__init__(data1, data2)
Create a statistics analyser for two data sets.
Parameters:
- data1 (CommonData) – First data object
- data2 (CommonData) – Second data object
-
analyze()
Perform a statistical analysis on two data sets.
Returns: List of StatisticsResult instances.
-
points_count()
Count all points which will be used for statistical comparison operations (i.e. are non-missing in both datasets).
Returns: List of StatisticsResults
-
means()
Means of two datasets
Returns: List of StatisticsResults
-
stddevs()
Corrected sample standard deviation of datasets
Returns: List of StatisticsResults
-
abs_mean()
Mean of absolute difference d2-d1
Returns: List of StatisticsResults
-
abs_stddev()
Standard deviation of absolute difference d2-d1
Returns: List of StatisticsResults
-
rel_mean()
Mean of relative difference (d2-d1)/d1
Returns: List of StatisticsResults
-
rel_stddev()
Standard deviation of relative difference (d2-d1)/d1
Returns: List of StatisticsResults
-
spearmans_rank()
Perform a Spearman’s rank correlation on the data
Returns: List of StatisticsResults
-
linear_regression()
Perform a linear regression on the data
Returns: List of StatisticsResults
-
__weakref__
List of weak references to the object (if defined).
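Each statistic above is exposed as its own method returning a list of results, so a caller can either run analyze() for everything or pick out individual statistics. The sketch below does the latter by method name; it relies only on the method names listed above and does not assume anything about the attributes of the returned StatisticsResult objects. The helper name selected_stats and its analyzer_cls argument are hypothetical.

```python
def selected_stats(data1, data2, names=("points_count", "means", "stddevs"),
                   analyzer_cls=None):
    """Run only the named StatsAnalyzer methods and concatenate their results.

    `analyzer_cls` is a hypothetical hook for this sketch; by default the
    real cis.stats.StatsAnalyzer class is imported at call time.
    """
    if analyzer_cls is None:
        from cis.stats import StatsAnalyzer as analyzer_cls  # requires CIS
    analyzer = analyzer_cls(data1, data2)
    results = []
    for name in names:
        # Each documented statistic method takes no arguments and returns a list.
        results.extend(getattr(analyzer, name)())
    return results


# Usage, assuming `data1` and `data2` are data objects returned by
# cis.read_data or by a collocation:
# for result in selected_stats(data1, data2, names=("means", "rel_mean")):
#     print(result)
```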
17.2.5. Full Python reference documentation
The remainder of this documentation covers internal CIS functions and modules which are not intended to be used as an API at all. They are documented here as a reference for developers and other interested parties.