16. How can I read my own data?
One of the key strengths of CIS is the ability for users to create their own plugins to read data which CIS doesn’t currently support. These plugins can then be shared with the community to allow other users access to that data. Although the plugins are written in Python this tutorial assumes no experience in Python. Some programming experience is however assumed.
Any technical details that may be useful to experienced Python programmers will be highlighted in this style - they aren’t necessary for completing the tutorial.
Here we describe the process of creating and sharing a plugin. A CIS plugin is simply a Python (.py) file with a set of methods (or functions) that describe how the plugin should behave.
The methods for each plugin are defined within a class, which gives the plugin a name and allows CIS to ensure that all of the necessary methods have been implemented.
There are a few methods that the plugin must contain, and some which are optional. A skeleton plugin would look like this:
    class MyProd(AProduct):
        def get_file_signature(self):
            # Code goes here

        def create_coords(self, filenames):
            ...

        def create_data_object(self, filenames, variable):
            ...
Note that in Python whitespace matters! When filling in the above methods, the code for each method should be indented from the signature by four spaces, like this:
    class MyProd(AProduct):
        def get_file_signature(self):
            # Code goes here
            foo = bar
Note also that the name of the plugin (MyProd in this case) should be changed to describe the data which it will read. (Don’t change the AProduct part though – this is important for telling CIS that this is a plugin for reading data.)
The plugin class subclasses AProduct, the abstract class that defines the methods the plugin needs to override. It also includes a few helper functions for error catching.
When CIS looks for data plugins it searches for all classes which subclass AProduct. There are also plugins available for collocation, with their own abstract base classes, so that users can store multiple plugin types in the same plugin directory.
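The role of the abstract base class can be illustrated with a minimal, self-contained sketch. The `PluginBase` class below is a hypothetical stand-in for AProduct built on Python's standard `abc` module; it is not the CIS implementation, and `find_plugins` only mimics the idea of searching for subclasses.

```python
from abc import ABC, abstractmethod


class PluginBase(ABC):
    """Hypothetical stand-in for AProduct (not the real CIS class)."""

    @abstractmethod
    def get_file_signature(self):
        """Return a list of regular expressions matching supported filenames."""

    @abstractmethod
    def create_coords(self, filenames):
        """Read only the coordinates from the given files."""

    @abstractmethod
    def create_data_object(self, filenames, variable):
        """Read the coordinates and one variable from the given files."""


class Incomplete(PluginBase):
    # Only one of the three required methods is implemented.
    def get_file_signature(self):
        return [r'.*\.nc$']


class Complete(PluginBase):
    def get_file_signature(self):
        return [r'.*\.nc$']

    def create_coords(self, filenames):
        return []

    def create_data_object(self, filenames, variable):
        return None


def find_plugins(base):
    """Return all direct subclasses of base, mimicking plugin discovery."""
    return base.__subclasses__()


try:
    Incomplete()
    incomplete_ok = True
except TypeError:
    # Python refuses to instantiate a class with missing abstract methods.
    incomplete_ok = False

plugin_names = sorted(cls.__name__ for cls in find_plugins(PluginBase))
```

In this sketch a plugin that forgets a mandatory method fails as soon as it is instantiated, which is the kind of check an abstract base class makes possible.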
In order to turn the above skeleton into a working plugin we need to fill in each of the methods with some code, which turns our data into something CIS will understand. Often it is easiest to start from an existing plugin that reads closely matching data. For example, creating a plugin to read some other CCI data would probably be easiest to start from the Cloud or Aerosol CCI plugins. We have created three different tutorials to walk you through the creation of some of the existing plugins to try and illustrate the process. The Easy tutorial walks through the creation of a basic plugin, the Medium tutorial builds on that by creating a plugin which has a bit more detail, and finally the Advanced tutorial talks through some of the main considerations when creating a large and complicated plugin.
Plugins aren’t the only way you can contribute though. CIS is an open source project hosted on GitHub, so please feel free to submit pull-requests for new features or bug-fixes – just check with the community first so that we’re not duplicating our effort.
16.1.1. Using and testing your plugin
It is important that CIS knows where to look to find your new plugin, and this is easily done by setting the environment variable CIS_PLUGIN_HOME to point to the directory within which your plugin is stored.
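As a minimal sketch, the variable can also be set from Python when CIS is launched as a child process. The directory name below is hypothetical, and the example only builds the environment; it does not run CIS.

```python
import os

# Hypothetical directory containing your plugin's .py file.
plugin_dir = os.path.join(os.path.expanduser("~"), "cis_plugins")

# Copy the current environment and add CIS_PLUGIN_HOME to the copy, so that
# the setting only applies to processes started with this environment.
cis_env = dict(os.environ, CIS_PLUGIN_HOME=plugin_dir)

print(cis_env["CIS_PLUGIN_HOME"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when invoking `cis`.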
Once you have done this CIS will automatically use your plugin for reading any files which match the file signature you used.
If you have any issues with this (because for example the file signature clashes with a built-in plugin) you can tell CIS to use your plugin when reading data by simply specifying it after the variable and filename in most CIS commands, e.g.:
cis subset a_variable:filename.nc:product=MyProd ...
16.3. Data plugin reference
This section provides a reference describing the expected behaviour of each of the functions a plugin can implement. The following methods are mandatory:
AProduct.get_file_signature()
This method should return a list of regular expressions, which CIS uses to decide which data product to use for a given file. If more than one regular expression is provided in the list then the file can match any of the expressions. The first product with a signature that matches the filename will be used. The order in which the products are searched is determined by the priority property, highest value first; internal products generally have a priority of 10.
For example, a signature could match all files with a name containing the string ‘CODE’ and with the ‘nc’ extension.
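A minimal sketch of a signature of this kind is shown below, written as a plain function rather than a method. The pattern is an illustrative assumption written to fit the description, and the check uses Python's `re.match` directly rather than CIS itself, so how CIS applies the expression (for example case sensitivity or path handling) may differ.

```python
import re


def get_file_signature():
    # One regular expression: any name containing 'CODE' and ending in '.nc'.
    return [r'.*CODE.*\.nc$']


def matches_signature(filename, signature):
    """Return True if the filename matches any expression in the signature."""
    return any(re.match(pattern, filename) for pattern in signature)


signature = get_file_signature()
print(matches_signature('station_CODE_2008.nc', signature))   # True
print(matches_signature('station_2008.nc', signature))        # False
print(matches_signature('station_CODE_2008.hdf', signature))  # False
```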
If the signature matches, the framework will call AProduct.get_file_type_error(), which gives the product a chance to open the file and check the contents.
Returns: A list of regex to match the product’s file naming convention.
Return type: list
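The selection logic described above can be sketched as follows. This is an illustration under stated assumptions, not the CIS implementation: the product classes and the `select_product` function are hypothetical, and the sketch assumes that a product whose file-type check reports errors is skipped in favour of the next candidate.

```python
import re


class ProductA:
    priority = 10  # the priority internal products generally have

    def get_file_signature(self):
        return [r'.*\.nc$']

    def get_file_type_error(self, filename):
        # Pretend the file contents were checked and found not to match.
        return ['Required attribute not found']


class ProductB:
    priority = 20  # higher value, so this product is tried first

    def get_file_signature(self):
        return [r'.*CODE.*\.nc$']

    def get_file_type_error(self, filename):
        return None  # None means the file is of the correct type


def select_product(filename, product_classes):
    """Return the first product, by descending priority, that accepts the file."""
    ordered = sorted(product_classes, key=lambda cls: cls.priority, reverse=True)
    for cls in ordered:
        product = cls()
        if any(re.match(p, filename) for p in product.get_file_signature()):
            if product.get_file_type_error(filename) is None:
                return cls
    return None


chosen = select_product('obs_CODE_2010.nc', [ProductA, ProductB])
fallback = select_product('obs_2010.nc', [ProductA, ProductB])
```

Here `chosen` is `ProductB`, because it has the higher priority and accepts the file, while `fallback` is `None`, because the only product whose signature matches rejects the file contents.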
AProduct.create_coords(filenames)
Reads the coordinates from one or more files. Note that this method may have to make certain assumptions about the file in order to return a single coordinate set. The user should be warned through the logger if this is the case.
Parameters: filenames (list) – List of filenames to read coordinates from
Returns:
AProduct.create_data_object(filenames, variable)
Create and return a CommonData object for a given variable from one or more files.
Parameters:
- filenames (list) – List of filenames of files to read
- variable (str) – Variable to read from the files
Returns: CommonData object representing the specified variable
Raises:
- FileIOError – Unable to read a file
- InvalidVariableError – Variable not present in file
While these may be implemented optionally:
AProduct.get_variable_names(filenames, data_type=None)
Get a list of available variable names from the filenames list passed in. This general implementation can be overridden in specific products to include/exclude variables which may or may not be relevant. The data_type parameter can be used to specify extra information.
Parameters:
- filenames (list) – List of string filenames of files to be read from
- data_type (str) – ‘SD’ or ‘VD’ to return only SD or VD variables from HDF files. This may take on other values in specific product implementations.
Returns: A set of variable names as strings
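A product might override this method to hide variables that are not useful to the user. The sketch below is hypothetical throughout: `read_all_variable_names` stands in for whatever file reading the plugin does, the variable names are invented for illustration, and the `data_type` argument is accepted but not used.

```python
def read_all_variable_names(filename):
    # Stand-in for reading variable names from a real file.
    return ['time', 'latitude', 'longitude', 'AOD550', 'AOD870']


def get_variable_names(filenames, data_type=None):
    """Return the set of data variable names found across all files."""
    coordinate_names = {'time', 'latitude', 'longitude'}
    variables = set()
    for filename in filenames:
        variables.update(read_all_variable_names(filename))
    # Exclude the coordinate variables so that only data variables remain.
    return variables - coordinate_names


names = get_variable_names(['a.nc', 'b.nc'])
```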
AProduct.get_file_type_error(filename)
Check a single file to see if it is of the correct type, and if not return a list of errors. If the return is None then there are no errors and this is the correct data product to use for this file.
This method gives a mechanism for a data product to identify itself as the correct product when a specific enough file signature cannot be provided. For example, GASSP files are NetCDF files and so have names ending in .nc, but so do other NetCDF files. The data product therefore opens the file and looks for the GASSP version attribute, and returns an error if it doesn’t find it.
Parameters: filename (str) – The filename for the file
Returns: List of errors, or None
Return type: list or None
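A minimal sketch of such a check is given below, written as a plain function. To keep it self-contained, the file's global attributes are supplied by a `read_attributes` callable, which is a hypothetical stand-in for opening the NetCDF file; the attribute name `GASSP_Version` is likewise an assumption for illustration.

```python
def get_file_type_error(filename, read_attributes):
    """Return a list of errors, or None if the file is of the expected type.

    read_attributes is a hypothetical callable returning a dict of the
    file's global attributes; a real plugin would open the file itself.
    """
    try:
        attributes = read_attributes(filename)
    except OSError as e:
        return ['Unable to open file {}: {}'.format(filename, e)]

    # 'GASSP_Version' is an assumed attribute name used for illustration.
    if 'GASSP_Version' not in attributes:
        return ['Attribute GASSP_Version not found in {}'.format(filename)]
    return None


def fake_reader(filename):
    # Stand-in for reading a real file: only one file has the attribute.
    files = {'gassp.nc': {'GASSP_Version': '1.0'}, 'other.nc': {}}
    if filename not in files:
        raise OSError('no such file')
    return files[filename]


print(get_file_type_error('gassp.nc', fake_reader))
print(get_file_type_error('other.nc', fake_reader))
```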
AProduct.get_file_format(filename)
Returns a file format hierarchy separated by slashes, of the form HDF4/CloudSat. This is mainly used within the ceda_di indexing tool. If not set it will default to the product’s name.
A filename of an example file can be provided to enable the determination of, for example, a dataset version number.
Parameters: filename (str) – Filename of file to be inspected
Returns: File format, of the form [parent/]format/specific instance/version, or the class name
Return type: str
Raises: FileFormatError if there is an error
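The default and an override can be sketched as follows. Both the base class and its subclasses are hypothetical stand-ins rather than CIS classes, and the method name `get_file_format` is assumed here; the default simply mirrors the behaviour described above of falling back to the product's name.

```python
class BaseProduct:
    """Hypothetical stand-in for the plugin base class."""

    def get_file_format(self, filename):
        # Default: fall back to the name of the product class.
        return self.__class__.__name__


class PlainProduct(BaseProduct):
    pass


class CloudSatLike(BaseProduct):
    def get_file_format(self, filename):
        # Override with a slash-separated hierarchy, parent format first.
        return 'HDF4/CloudSat'


default_format = PlainProduct().get_file_format('example.hdf')
custom_format = CloudSatLike().get_file_format('example.hdf')
```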