files

@files (input1, output1, [extra_parameters1, ...])

@files for single jobs

Purpose:

Provides the parameters for running a task as a single job.

The first two parameters are the input and output, which are used to determine whether the job is out of date and needs to be (re-)run.

By default, out-of-date checking compares input and output file timestamps. (On some file systems, timestamps are only accurate to the nearest second.) See @check_if_uptodate() for alternatives.
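A minimal sketch of timestamp-based out-of-date checking, assuming the rule that a job is stale when an output is missing or older than the newest input. The helper `is_out_of_date` is hypothetical and not part of the Ruffus API; Ruffus's own check may differ in detail. The sketch also assumes both lists are non-empty.

```python
import os
import tempfile


def is_out_of_date(inputs, outputs):
    """Hypothetical helper: True if any output is missing or older than the newest input."""
    if any(not os.path.exists(path) for path in outputs):
        return True
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    # Strict comparison: with one-second timestamp resolution, files written
    # within the same second compare equal and count as up to date here.
    return oldest_output < newest_input


with tempfile.TemporaryDirectory() as tmp:
    infile = os.path.join(tmp, "a.1")
    outfile = os.path.join(tmp, "a.2")
    open(infile, "w").close()
    print(is_out_of_date([infile], [outfile]))  # True: a.2 does not exist yet

    open(outfile, "w").close()
    os.utime(infile, (1000, 1000))   # input modified at t=1000
    os.utime(outfile, (2000, 2000))  # output modified later, at t=2000
    print(is_out_of_date([infile], [outfile]))  # False: output is newer

    os.utime(outfile, (500, 500))    # output now older than input
    print(is_out_of_date([infile], [outfile]))  # True: output is stale
```

Setting modification times explicitly with `os.utime` keeps the demonstration independent of the file system's timestamp resolution.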

Example:
from ruffus import *

@files('a.1', 'a.2', 'A file')
def transform_files(infile, outfile, text):
    # infile == 'a.1', outfile == 'a.2', text == 'A file'
    pass

pipeline_run([transform_files])
If a.2 is missing or is older than a.1, the following call is made:
transform_files('a.1', 'a.2', 'A file')
Parameters:

input

Input file names

output

Output file names

extra_parameters

Optional extra_parameters are passed verbatim to the job.

Checking if jobs are up to date:

Strings in input and output (including those in nested sequences) are interpreted as file names and used to check whether the job is up to date.

See the rules listed under @files in parallel below for more details.
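The rule that strings anywhere in input and output, including inside nested lists or tuples, count as file names can be sketched with a small recursive helper. `collect_file_names` is a hypothetical name used for illustration, not a Ruffus function, and it assumes that only `str` values are file names while every other value (numbers, `None`, other objects) is ignored.

```python
def collect_file_names(param):
    """Hypothetical helper: gather every string found in a (possibly nested) parameter."""
    if isinstance(param, str):
        return [param]
    if isinstance(param, (list, tuple)):
        names = []
        for item in param:
            # Recurse so strings at any nesting depth are found, in order.
            names.extend(collect_file_names(item))
        return names
    # Numbers, None and other objects are not treated as file names.
    return []


inputs = ["a.1", ("a.extra", ["a.index"]), 42, None]
print(collect_file_names(inputs))  # ['a.1', 'a.extra', 'a.index']
```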

@files ( (( input, output, [extra_parameters,...] ), (...), ...) )

@files in parallel

Purpose:

Passes each set of parameters to a separate job; these jobs can run in parallel.

The first two parameters in each set are the input and output, which are used to determine whether that job is out of date and needs to be (re-)run.

By default, out-of-date checking compares input and output file timestamps. (On some file systems, timestamps are only accurate to the nearest second.) See @check_if_uptodate() for alternatives.

Example:
from ruffus import *

parameters = [
    ['a.1', 'a.2', 'A file'],  # 1st job
    ['b.1', 'b.2', 'B file'],  # 2nd job
]

@files(parameters)
def parallel_io_task(infile, outfile, text):
    pass

pipeline_run([parallel_io_task])
is equivalent to calling:
parallel_io_task('a.1', 'a.2', 'A file')
parallel_io_task('b.1', 'b.2', 'B file')
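The equivalence above can be sketched in plain Python: each parameter set becomes one independent call, so the calls can be dispatched concurrently. This uses `concurrent.futures.ThreadPoolExecutor` purely for illustration and is not how Ruffus schedules jobs; in Ruffus, the number of concurrent jobs is set through the `multiprocess` argument of `pipeline_run`. The runner `run_jobs` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

parameters = [
    ["a.1", "a.2", "A file"],  # 1st job
    ["b.1", "b.2", "B file"],  # 2nd job
]


def parallel_io_task(infile, outfile, text):
    # Stand-in task body: just report what the job was given.
    return f"{infile} -> {outfile} ({text})"


def run_jobs(task, parameter_sets, max_workers=2):
    """Hypothetical runner: call task(*params) once per parameter set, concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, *params) for params in parameter_sets]
        # Collect results in submission order, re-raising any job's exception.
        return [future.result() for future in futures]


for line in run_jobs(parallel_io_task, parameters):
    print(line)
```

Because each job receives only its own parameter set, the jobs share no state and their order of execution does not matter; only the order of the collected results is fixed here.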
Parameters:

input

Input file names

output

Output file names

extra_parameters

Optional extra_parameters are passed verbatim to each job.

Checking if jobs are up to date:

Strings in input and output (including those in nested sequences) are interpreted as file names and used to check whether jobs are up to date.

In the absence of input files (e.g. input is None), the job will run if any output file is missing.

In the absence of output files (e.g. output is None), the job will always run.

If any of the output files is missing, the job will run.

If any of the input files is missing when the job is run, a MissingInputFileError exception will be raised.
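The rules above can be combined into a single decision step. This is a minimal sketch under stated assumptions: input and output are given as lists of file names or `None`, `should_run` and `run_if_needed` are hypothetical names, and `MissingInputFileError` is defined locally to mirror the exception named above rather than imported from Ruffus. How a missing input interacts with existing outputs is also an assumption here (it forces a run, which then raises the error).

```python
import os


class MissingInputFileError(Exception):
    """Local stand-in for the exception named above; not imported from Ruffus."""


def should_run(inputs, outputs):
    """Hypothetical decision function following the rules listed above."""
    if not outputs:
        return True   # no output files (e.g. output is None): always run
    if any(not os.path.exists(path) for path in outputs):
        return True   # any missing output file: run
    if not inputs:
        return False  # no input files and every output exists: up to date
    if any(not os.path.exists(path) for path in inputs):
        return True   # assumption: let the run step report the missing input
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return oldest_output < newest_input


def run_if_needed(task, inputs, outputs, *extra):
    """Hypothetical runner: call task only when should_run() says so."""
    if not should_run(inputs, outputs):
        return False
    # Inputs are checked when the job is actually run, as described above.
    missing = [path for path in (inputs or []) if not os.path.exists(path)]
    if missing:
        raise MissingInputFileError(f"Missing input files: {missing}")
    task(inputs, outputs, *extra)
    return True
```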