If you are looking for a quick introduction to Ruffus, you may want to look at the Simple Tutorial first; this manual shares some of its content and elaborates on it.
The Ruffus module is a lightweight way to run computational pipelines.
Computational pipelines often become quite simple if we break down the process into simple stages.
Ruffus refers to each stage of your pipeline as a task.
Let us start with the usual “Hello World”.
We have the following two Python functions which we would like to turn into an automatic pipeline:
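A minimal sketch of two such functions, assuming (from the output described at the end of this section) that each one simply prints a single word:

```python
# Two ordinary Python functions; nothing here depends on Ruffus.
# The function bodies are an assumption made for illustration.

def first_task():
    print("Hello")


def second_task():
    print("World")
```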
The simplest Ruffus pipeline would look like this:
The functions which do the actual work of each stage of the pipeline remain unchanged. The role of Ruffus is to make sure these functions are called in the right order, with the right parameters, running in parallel using multiprocessing if desired.
There are three simple parts to building a Ruffus pipeline:
- Importing ruffus
- “Decorating” functions which are part of the pipeline
- Running the pipeline!
The most convenient way to use Ruffus is to import the various names directly:
    from ruffus import *
This will allow Ruffus terms to be used directly in your code. This is also the style we have adopted for this manual.
Category: Terms
- Pipeline functions: pipeline_printout, pipeline_printout_graph, pipeline_run, register_cleanup
- Decorators: @follows, @files, @split, @transform, @merge, @collate, @posttask, @jobs_limit, @parallel, @check_if_uptodate, @files_re
- Loggers: stderr_logger, black_hole_logger
- Parameter disambiguating indicators: suffix, regex, inputs, touch_file, combine, mkdir, output_from
Alternatively, import the module and use qualified names:
    import ruffus
    ruffus.pipeline_printout("...")
You need to tag, or decorate, existing functions to tell Ruffus that they are part of the pipeline.
Decorators are a way to tag or mark out functions.
They start with an @ prefix and take a number of parameters in parentheses.
The Ruffus decorator @follows makes sure that second_task follows first_task.
Multiple decorators can be used for each task function to add functionality to Ruffus pipeline functions.
However, the decorated Python functions can still be called normally, outside of Ruffus.
Ruffus decorators can be added to (stacked on top of) any function in any order.
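To illustrate how a parameterised decorator can tag a function while leaving it callable, here is a plain-Python sketch. The `tag_follows` decorator, the `prerequisites` attribute and `setup_task` are hypothetical names invented for this illustration; this is not how Ruffus implements `@follows`.

```python
def tag_follows(*prerequisites):
    """Hypothetical decorator: record prerequisites on the function and return it unchanged."""
    def decorator(func):
        # Keep anything recorded by a decorator stacked beneath this one.
        existing = getattr(func, "prerequisites", ())
        func.prerequisites = tuple(prerequisites) + tuple(existing)
        return func  # same function object, so normal calls still work
    return decorator


def first_task():
    return "Hello"


def setup_task():
    return "setup"


# Two stacked decorators, each taking its parameters in parentheses.
@tag_follows(setup_task)
@tag_follows(first_task)
def second_task():
    return "World"


# The tags are metadata only; calling the function directly behaves as before.
print(second_task())
print([f.__name__ for f in second_task.prerequisites])
```

Because the decorator returns the original function, tagging it does not change what happens when it is called outside the pipeline.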
We run the pipeline by specifying its last stage (task function). Ruffus works out which other functions this depends on, follows the chain of dependencies automatically, and makes sure that the entire pipeline is up to date.
In our example above, because second_task depends on first_task, both functions are executed in order.
>>> pipeline_run([second_task], verbose = 1)
Ruffus by default prints out the verbose progress through your pipeline, interleaved with our Hello and World.
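The idea of naming only the last stage and having its prerequisites run first can be illustrated with a toy resolver in plain Python. The `resolve_order` function and the `depends_on` table are hypothetical and only sketch the concept; they are not Ruffus's scheduling code, and this sketch does not detect circular dependencies.

```python
def resolve_order(target, depends_on):
    """Return target and its prerequisites, each listed after everything it depends on."""
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return  # already scheduled
        seen.add(name)
        for prerequisite in depends_on.get(name, []):
            visit(prerequisite)  # schedule prerequisites first
        ordered.append(name)

    visit(target)
    return ordered


# Hypothetical dependency table mirroring the example: second_task follows first_task.
depends_on = {"second_task": ["first_task"], "first_task": []}
print(resolve_order("second_task", depends_on))
```

Asking for `second_task` alone is enough for `first_task` to be placed ahead of it in the returned order.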