Ruffus Manual

The chapters of this manual go through each of the features of Ruffus in turn.
Some of these (especially those labelled esoteric or deprecated) may not be of interest to all users of Ruffus.

If you are looking for a quick introduction to Ruffus, you may want to read the Simple Tutorial first. Some of its content is repeated or expanded in this manual.

Introduction

The Ruffus module is a lightweight way to run computational pipelines.

Computational pipelines often become quite simple if we break down the process into simple stages.

Note

Ruffus refers to each stage of your pipeline as a task.

Let us start with the usual “Hello World”.
We have the following two Python functions which we would like to turn into an automatic pipeline:

[Image: simple_tutorial_hello_world.png]
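
The screenshot itself is not reproduced in this text. As a minimal sketch, assuming the two functions do nothing more than print the two halves of the greeting, they might look like this. The names first_task and second_task are taken from later in this chapter; the function bodies are an assumption.

```python
def first_task():
    # First stage: print the first half of the greeting.
    print("Hello")


def second_task():
    # Second stage: print the second half of the greeting.
    print("World")


if __name__ == "__main__":
    # Without Ruffus, we have to call the stages in the right order ourselves.
    first_task()
    second_task()
```

Nothing here depends on Ruffus yet: these are ordinary functions, and the ordering is maintained by hand.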

The simplest Ruffus pipeline would look like this:

[Image: simple_tutorial_intro_follows.png]

The functions which do the actual work of each stage of the pipeline remain unchanged. The role of Ruffus is to make sure these functions are called in the right order, with the right parameters, running in parallel using multiprocessing if desired.

There are three simple parts to building a Ruffus pipeline:

  1. Importing ruffus
  2. “Decorating” functions which are part of the pipeline
  3. Running the pipeline!

Importing ruffus

The most convenient way to use ruffus is to import the various names directly:

from ruffus import *

This will allow ruffus terms to be used directly in your code. This is also the style we have adopted for this manual.

Category                              Terms
Pipeline functions                    pipeline_printout, pipeline_printout_graph, pipeline_run, register_cleanup
Decorators                            @follows, @files, @split, @transform, @merge, @collate, @posttask, @jobs_limit, @parallel, @check_if_uptodate, @files_re
Loggers                               stderr_logger, black_hole_logger
Parameter disambiguating indicators   suffix, regex, inputs, touch_file, combine, mkdir, output_from

If any of these clash with names in your code, you can use qualified names instead:

import ruffus

ruffus.pipeline_printout("...")

“Decorating” functions

You need to tag, or decorate, existing functions to tell Ruffus that they are part of the pipeline.

Note

Decorators are ways to tag or mark out functions.

They start with an @ prefix and take a number of parameters in parentheses.

[Image: simple_tutorial_decorator_syntax.png]

The ruffus decorator @follows makes sure that second_task follows first_task.

Multiple decorators can be used for each task function to add functionality to Ruffus pipeline functions.
However, the decorated python functions can still be called normally, outside of Ruffus.
Ruffus decorators can be added to (stacked on top of) any function in any order.
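
To make the idea of tagging concrete, here is a plain-Python sketch of a decorator that only records a label on a function. It is not how Ruffus is implemented; tag and the tags attribute are hypothetical names invented for this illustration.

```python
def tag(label):
    """Hypothetical decorator factory: records a label on the decorated function."""
    def decorator(func):
        # Attach metadata to the function and hand the same function back,
        # so ordinary calls behave exactly as they did before.
        func.tags = getattr(func, "tags", []) + [label]
        return func
    return decorator


# Two decorators stacked on one function; the one nearest the def is applied first.
@tag("reporting")
@tag("stage-2")
def second_task():
    return "World"


if __name__ == "__main__":
    print(second_task())      # still an ordinary callable
    print(second_task.tags)   # labels recorded by the stacked decorators
```

Because the toy decorator returns the original function unchanged, calling second_task() directly works exactly as before, which mirrors the point above that decorated functions can still be called outside the pipeline.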

Running the pipeline

We run the pipeline by specifying its last stage (task function). Ruffus works out which other functions this depends on and follows the chain of dependencies automatically, making sure that the entire pipeline is up-to-date.

In our example above, because second_task depends on first_task, both functions are executed in order.

>>> pipeline_run([second_task], verbose = 1)

Ruffus by default prints out the verbose progress through your pipeline, interleaved with our Hello and World.

[Image: simple_tutorial_hello_world_output.png]
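
The way a single target pulls in everything it depends on can be pictured with a small plain-Python sketch. This is a toy model under stated assumptions, not Ruffus code: toy_follows and toy_run are hypothetical stand-ins, and they do no up-to-date checking, logging, parallel execution, or cycle detection.

```python
def toy_follows(*parents):
    """Hypothetical stand-in for a 'follows'-style decorator (not Ruffus itself)."""
    def decorator(func):
        func.parents = list(parents)  # remember which tasks must run first
        return func
    return decorator


def toy_run(targets, _done=None):
    """Run each target after everything it depends on, each task at most once."""
    done = [] if _done is None else _done
    for task in targets:
        if task in done:
            continue
        # Depth-first: run the parents before the task itself.
        toy_run(getattr(task, "parents", []), done)
        task()
        done.append(task)
    return [t.__name__ for t in done]


def first_task():
    print("Hello")


@toy_follows(first_task)
def second_task():
    print("World")


if __name__ == "__main__":
    # Only the last stage is named; first_task is found through the dependency.
    toy_run([second_task])
```

Naming only second_task is enough here because the toy runner walks the recorded parents first, which is the same idea as specifying just the last stage of a pipeline.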