Introduction

About POPPy

The Plugin-Oriented Pipeline for Python (POPPy) framework provides a standard way to develop, install and run workflows. It is designed primarily for data processing pipelines that produce files.

Fig. 1 Schema of a pipeline built with POPPy

POPPy supports the basic features that are usually needed when building and executing pipelines, such as:

  • Modular architecture
  • Logging of job execution activity and status
  • Centralized command line interface to execute batch jobs
  • Traceability of input/output data handling
  • Standardized database communication

Getting started with POPPy

This section details how to install POPPy and use it to develop a pipeline.

System requirements

POPPy has been tested on the Debian Linux operating system.

Prerequisites

Make sure that the following software is installed on your system before deploying and using POPPy:

  • Python 3.6 or higher
  • Git

Additionally, a relational database management system (RDBMS) is required to run a POPPy pipeline.
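
These prerequisites can be checked from Python before installing anything. The snippet below is a minimal sketch and not part of POPPy: the helper name missing_prerequisites is hypothetical, and it relies only on the standard library (sys.version_info and shutil.which). It does not check for a database system, since this guide does not name a specific RDBMS.

```python
import shutil
import sys


def missing_prerequisites(version_info=None, git_path=None, min_version=(3, 6)):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper, not part of POPPy. The arguments default to the
    running interpreter and to the result of looking up "git" on the PATH.
    """
    if version_info is None:
        version_info = sys.version_info
    if git_path is None:
        git_path = shutil.which("git")  # None when git is not on the PATH

    problems = []
    if tuple(version_info[:2]) < min_version:
        problems.append("Python %d.%d or higher is required" % min_version)
    if not git_path:
        problems.append("git was not found on the PATH")
    return problems
```

Calling missing_prerequisites() with no arguments inspects the current interpreter and PATH; passing explicit values makes the check easy to test.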

Setting up development environment

POPPy must be installed before you can develop a pipeline.

It is strongly recommended to use POPPy inside a Python virtual environment (virtualenv) in order to avoid dependency conflicts. The venv module that provides this mechanism has been part of the Python standard library since version 3.3.

To create a virtualenv, open a terminal and enter:

$ python3 -m venv /path/to/myprojectvenv

Where /path/to/myprojectvenv is the path to the virtualenv’s directory.

Then, to activate it, enter the command:

$ source /path/to/myprojectvenv/bin/activate
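
To confirm that the environment is active, one option is to compare sys.prefix with sys.base_prefix from inside Python. This is a small sketch using only the standard library; the function name in_virtualenv is chosen here for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtualenv active:", in_virtualenv())
```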

For more details about Python virtual environments, see https://docs.python.org/3/tutorial/venv.html.

To install POPPy in the virtualenv, run the following three commands in order:

$ pip install git+https://gitlab.obspm.fr/POPPY/POPPyCore.git@develop#egg=poppy.core
$ pip install git+https://gitlab.obspm.fr/POPPY/POP.git@develop#egg=poppy.pop
$ pip install git+https://gitlab.obspm.fr/POPPY/PIPER.git@develop#egg=poppy.piper

The first command retrieves the POPPy core library from the remote Git server and installs it. The second and third commands do the same for the mandatory POP and PIPER plugins.
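
If you prefer to script the installation, the same three pip commands can be driven from Python. The sketch below is an illustration only: the names POPPY_SOURCES, pip_install_command and install_all are hypothetical, and the source strings are copied from the commands above. Using sys.executable ensures that pip runs for the interpreter of the active virtualenv.

```python
import subprocess
import sys

# Package sources copied from the three pip commands above.
POPPY_SOURCES = [
    "git+https://gitlab.obspm.fr/POPPY/POPPyCore.git@develop#egg=poppy.core",
    "git+https://gitlab.obspm.fr/POPPY/POP.git@develop#egg=poppy.pop",
    "git+https://gitlab.obspm.fr/POPPY/PIPER.git@develop#egg=poppy.piper",
]


def pip_install_command(source):
    """Build the argv list that installs one source with the current interpreter."""
    return [sys.executable, "-m", "pip", "install", source]


def install_all(sources=POPPY_SOURCES):
    """Install each source in order, stopping at the first failure."""
    for source in sources:
        # check_call raises CalledProcessError if pip exits with an error.
        subprocess.check_call(pip_install_command(source))
```

Calling install_all() from within the activated virtualenv performs the installation; nothing is installed when the module is merely imported.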

Create a pipeline

You can generate a pipeline, together with all the boilerplate code that a basic pipeline needs to use the framework, with the following command:

$ poppy create pipeline poppy_tuto

You will get a directory called poppy_tuto/ in the current directory containing several files:

poppy_tuto
├── config.json
├── descriptor.json
├── lib
├── manage.py
├── requirements.txt
└── settings.py

  • config.json : Contains the output path of the pipeline and the database credentials and address. This is the only file that should not be tracked by your version control system (VCS).
  • descriptor.json : Provides metadata associated with the pipeline, the project and the databases.
  • settings.py : Contains the list of active plugins, a variable pointing to the root directory and the identifier of the main database.
  • requirements.txt : Contains the list of Python library dependencies.
  • lib/ : Contains any external libraries (in the case of the RPW pipeline, this directory contains NASA’s CDF library and the Instrument Database).
  • manage.py : The entry point of the pipeline.
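
As a quick sanity check after generation, you can verify that the expected entries exist. This is a minimal sketch based only on the file list above; the helper name missing_entries is hypothetical and not part of POPPy.

```python
from pathlib import Path

# Entries listed above for a freshly generated pipeline.
EXPECTED_ENTRIES = [
    "config.json",
    "descriptor.json",
    "lib",
    "manage.py",
    "requirements.txt",
    "settings.py",
]


def missing_entries(pipeline_root):
    """Return the expected entries that are absent from pipeline_root."""
    root = Path(pipeline_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

For a complete skeleton, missing_entries("poppy_tuto") returns an empty list.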

Create a plugin

You can then create a plugin skeleton in the same way as the pipeline:

$ poppy create plugin guide.myplugin

Your plugin name must be of the form namespace.pluginname. Namespaces are another way to split the code meaningfully. To keep your code organized, you can create a directory called plugins/ in the root directory of your pipeline, although your plugins can live wherever you want.

Once again, this creates a set of files prefilled with common boilerplate code.

In order to use the namespace feature, the Python code of your plugin must be located in the directory plugin/namespace/plugin/ (see PEP 420 for more information on namespace packages).
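
The mapping from a plugin name to its source directory can be sketched as follows. The helper plugin_source_dir is hypothetical and not part of POPPy; it assumes the top-level directory is named after the plugin, as in the myplugin/guide/myplugin/ path used in this guide, and that both parts of the name are valid Python identifiers so that they can be imported.

```python
from pathlib import PurePosixPath


def plugin_source_dir(plugin_name):
    """Map 'namespace.pluginname' to 'pluginname/namespace/pluginname'.

    Hypothetical helper illustrating the layout described above;
    it is not part of POPPy.
    """
    parts = plugin_name.split(".")
    if len(parts) != 2 or not all(part.isidentifier() for part in parts):
        raise ValueError("expected a name of the form namespace.pluginname")
    namespace, name = parts
    return PurePosixPath(name, namespace, name)


print(plugin_source_dir("guide.myplugin"))  # myplugin/guide/myplugin
```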

The plugin root directory contains:

  • setup.py : a standard Python setup script that lets you install your plugin as a Python package using pip
  • system_reqs.ini : lists any external libraries needed by the plugin

In the myplugin/guide/myplugin/ directory you will find several Python files. POPPy does not require you to split your code into multiple files, but doing so is good practice, so the generated skeleton is already split this way.

  • descriptor.json : as for the pipeline, each plugin needs a descriptor file. For a plugin, it contains information about the plugin, the tasks it performs and their targets (input and output files).
  • commands.py : registers the commands you want to call from the command line interface (CLI).
  • tasks.py : contains the tasks of your plugin. Usually these tasks are simply decorated Python functions.
  • tests.py : a prefilled file meant to encourage you to write unit, functional and end-to-end tests for your pipeline. Testing is integrated into the POPPy framework, and wrapper classes and functions exist to help you.
  • models : a directory for all the database models belonging to your plugin.
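
To illustrate what "decorated Python functions" means for tasks, here is a generic sketch of the pattern. It deliberately does not use POPPy's API: the task decorator and the REGISTERED_TASKS registry below are hypothetical stand-ins written for this example, and the decorator generated in your tasks.py may look and behave differently.

```python
import functools

# Hypothetical registry and decorator for illustration only;
# POPPy's real task decorator may look and behave differently.
REGISTERED_TASKS = {}


def task(name):
    """Register the decorated function under the given task name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            print("running task:", name)
            return func(*args, **kwargs)
        REGISTERED_TASKS[name] = wrapper
        return wrapper
    return decorator


@task("double_values")
def double_values(values):
    """Toy task body: return each input value multiplied by two."""
    return [2 * value for value in values]


result = REGISTERED_TASKS["double_values"]([1, 2, 3])
print(result)  # [2, 4, 6]
```

The point of the pattern is that the function body stays plain Python, while the decorator handles registration and any bookkeeping around the call.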