# ruffus.Task

## Decorators

Basic Task decorators are:

Task decorators include:

More advanced users may require:

## Pipeline functions

### pipeline_run

ruffus.task.pipeline_run(target_tasks, forcedtorun_tasks=[], multiprocess=1, logger=stderr_logger, gnu_make_maximal_rebuild_mode=True)

Run pipelines.

Parameters:

- target_tasks – target task functions which will be run if they are out-of-date
- forcedtorun_tasks – task functions which will be run whether or not they are out-of-date
- multiprocess – The number of concurrent jobs running on different processes.
- multithread – The number of concurrent jobs running as different threads. If > 1, ruffus will use multithreading instead of multiprocessing (and ignore the multiprocess parameter). Multithreading is particularly useful for managing high performance clusters, which are otherwise prone to “processor storms” when a large number of cores finish jobs at the same time. (Thanks Andreas Heger)
- logger (logging objects) – Where progress will be logged. Defaults to stderr output.
- verbose –
  - level 0: nothing
  - level 1: Out-of-date Task names
  - level 2: All Tasks (including any task function docstrings)
  - level 3: Out-of-date Jobs in Out-of-date Tasks, no explanation
  - level 4: Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
  - level 5: All Jobs in Out-of-date Tasks (include only list of up-to-date tasks)
  - level 6: All jobs in All Tasks whether out of date or not
  - level 7: Show file modification times for All jobs in All Tasks
  - level 10: logs messages useful only for debugging ruffus pipeline code
- touch_files_only – Create or update input/output files only to simulate running the pipeline. Do not run jobs. If set to CHECKSUM_REGENERATE, will regenerate the checksum history file to reflect the existing i/o files on disk.
- exceptions_terminate_immediately – Exceptions cause immediate termination rather than waiting for N jobs to finish, where N = multiprocess.
- log_exceptions – Print exceptions to logger as soon as they occur.
- checksum_level – Several options for checking up-to-dateness are available. Default is level 1.
  - level 0: Use only file timestamps
  - level 1: above, plus timestamp of successful job completion
  - level 2: above, plus a checksum of the pipeline function body
  - level 3: above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
- one_second_per_job – Works around poor file timestamp resolution on some file systems. Defaults to True if checksum_level is 0, forcing Tasks to take a minimum of 1 second to complete.
- runtime_data – Experimental feature: pass data to tasks at run time
- gnu_make_maximal_rebuild_mode – Defaults to re-running all out-of-date tasks. Runs minimal set to build targets if set to False. Use with caution.
- history_file – Database file storing checksums and file timestamps for input/output files.
- verbose_abbreviated_path – whether input and output paths are abbreviated.
  - level 0: The full (expanded, abspath) input or output path
  - level > 1: The number of subdirectories to include. Abbreviated paths are prefixed with [,,,]/
  - level < 0: Input / Output parameters are truncated to MMM letters where verbose_abbreviated_path == -MMM. Subdirectories are first removed to see if this allows the paths to fit in the specified limit. Otherwise abbreviated paths are prefixed by
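The checksum levels above build on one another. The following is a minimal sketch of what levels 2 and 3 add, using only the standard library; it is not the ruffus implementation, and `body_checksum`, `defaults_checksum` and `job_signature` are hypothetical names chosen for illustration. Levels 0 and 1 rely on timestamps and contribute nothing here.

```python
import hashlib


def body_checksum(func):
    """Hash the compiled body of func: bytecode, constants and referenced names."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode() + repr(code.co_names).encode()
    return hashlib.sha256(payload).hexdigest()


def defaults_checksum(func, extra_args=()):
    """Hash the default arguments of func plus any extra decorator arguments."""
    payload = repr((func.__defaults__, tuple(extra_args))).encode()
    return hashlib.sha256(payload).hexdigest()


def job_signature(func, checksum_level, extra_args=()):
    """Return the checksums that would be compared at a given level (assumed scheme)."""
    signature = []
    if checksum_level >= 2:
        signature.append(body_checksum(func))
    if checksum_level >= 3:
        signature.append(defaults_checksum(func, extra_args))
    return tuple(signature)


# Three toy task functions: same body with different defaults, and a different body.
def scale(value, factor=2):
    return value * factor


def scale_by_three(value, factor=3):
    return value * factor


def add(value, factor=2):
    return value + factor
```

Under this scheme, changing only a default argument goes unnoticed at level 2 but is detected at level 3, while editing the function body is detected from level 2 upwards.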

### pipeline_printout

ruffus.task.pipeline_printout(output_stream=None, target_tasks=[], forcedtorun_tasks=[], verbose=None, indent=4, gnu_make_maximal_rebuild_mode=True, wrap_width=100, runtime_data=None, checksum_level=None, history_file=None, verbose_abbreviated_path=None, pipeline=None)

Prints out the parts of the pipeline which will be run.

Because the parameters of some jobs depend on the results of previous tasks, this function produces only the current snapshot of task jobs. In particular, tasks which generate a variable number of inputs for following tasks will not produce the full range of jobs.

- verbose = 0: Nothing
- verbose = 1: Out-of-date Task names
- verbose = 2: All Tasks (including any task function docstrings)
- verbose = 3: Out-of-date Jobs in Out-of-date Tasks, no explanation
- verbose = 4: Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
- verbose = 5: All Jobs in Out-of-date Tasks (include only list of up-to-date tasks)
- verbose = 6: All jobs in All Tasks whether out of date or not

Parameters:

- output_stream (file-like object with write() function) – where to print to
- target_tasks – target task functions which will be run if they are out-of-date
- forcedtorun_tasks – task functions which will be run whether or not they are out-of-date
- verbose –
  - level 0: nothing
  - level 1: Out-of-date Task names
  - level 2: All Tasks (including any task function docstrings)
  - level 3: Out-of-date Jobs in Out-of-date Tasks, no explanation
  - level 4: Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
  - level 5: All Jobs in Out-of-date Tasks (include only list of up-to-date tasks)
  - level 6: All jobs in All Tasks whether out of date or not
  - level 7: Show file modification times for All jobs in All Tasks
  - level 10: logs messages useful only for debugging ruffus pipeline code
- indent – How much indentation for pretty format.
- gnu_make_maximal_rebuild_mode – Defaults to re-running all out-of-date tasks. Runs minimal set to build targets if set to False. Use with caution.
- wrap_width – The maximum length of each line
- runtime_data – Experimental feature: pass data to tasks at run time
- checksum_level – Several options for checking up-to-dateness are available. Default is level 1.
  - level 0: Use only file timestamps
  - level 1: above, plus timestamp of successful job completion
  - level 2: above, plus a checksum of the pipeline function body
  - level 3: above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
- history_file – Database file storing checksums and file timestamps for input/output files.
- verbose_abbreviated_path – whether input and output paths are abbreviated.
  - level 0: The full (expanded, abspath) input or output path
  - level > 1: The number of subdirectories to include. Abbreviated paths are prefixed with [,,,]/
  - level < 0: Input / Output parameters are truncated to MMM letters where verbose_abbreviated_path == -MMM. Subdirectories are first removed to see if this allows the paths to fit in the specified limit. Otherwise abbreviated paths are prefixed by
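To make the non-negative verbose_abbreviated_path levels concrete, here is a small sketch of one plausible reading of the description. `abbreviate_path` is a hypothetical helper, not a ruffus function; treating every positive level as "number of subdirectories to keep" is an assumption, and the negative (fixed-width truncation) levels are deliberately left out.

```python
import posixpath


def abbreviate_path(path, level):
    """Shorten a POSIX-style path for display.

    level == 0: return the normalised absolute path.
    level >= 1: keep the file name and at most `level` parent directories,
                marking removed leading directories with "[,,,]/".
    """
    if level < 0:
        raise ValueError("negative levels (fixed-width truncation) are not sketched here")
    full = posixpath.normpath(posixpath.abspath(path))
    if level == 0:
        return full
    parts = full.strip("/").split("/")
    keep = level + 1  # `level` directories plus the file name itself
    if len(parts) <= keep:
        return full  # nothing to remove, so no prefix is added
    return "[,,,]/" + "/".join(parts[-keep:])
```

For example, `abbreviate_path("/data/project/run1/results/out.txt", 2)` keeps the two innermost directories and the file name.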

### pipeline_printout_graph

ruffus.task.pipeline_printout_graph(stream, output_format=None, target_tasks=[], forcedtorun_tasks=[], draw_vertically=True, ignore_upstream_of_target=False, skip_uptodate_tasks=False, gnu_make_maximal_rebuild_mode=True, test_all_task_for_update=True, no_key_legend=False, minimal_key_legend=True, user_colour_scheme=None, pipeline_name='Pipeline:', size=(11, 8), dpi=120, runtime_data=None, checksum_level=None, history_file=None, pipeline=None)

Prints out pipeline dependencies in various formats.

Parameters:

- stream (file-like object with write() function) – where to print to
- output_format – [“dot”, “jpg”, “svg”, “ps”, “png”]. All but the first depend on the dot program.
- target_tasks – target task functions which will be run if they are out-of-date.
- forcedtorun_tasks – task functions which will be run whether or not they are out-of-date.
- draw_vertically – Top to bottom instead of left to right.
- ignore_upstream_of_target – Don’t draw upstream tasks of targets.
- skip_uptodate_tasks – Don’t draw up-to-date tasks if possible.
- gnu_make_maximal_rebuild_mode – Defaults to re-running all out-of-date tasks. Runs minimal set to build targets if set to False. Use with caution.
- test_all_task_for_update – Ask all task functions if they are up-to-date.
- no_key_legend – Don’t draw key/legend for graph.
- minimal_key_legend – Only legend entries for used task types
- user_colour_scheme – Dictionary specifying flowchart colour scheme
- pipeline_name – Pipeline Title
- size – tuple of x and y dimensions
- dpi – print resolution
- runtime_data – Experimental feature: pass data to tasks at run time
- history_file – Database file storing checksums and file timestamps for input/output files.
- checksum_level – Several options for checking up-to-dateness are available. Default is level 1.
  - level 0: Use only file timestamps
  - level 1: above, plus timestamp of successful job completion
  - level 2: above, plus a checksum of the pipeline function body
  - level 3: above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
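As a rough illustration of the “dot” output format and the draw_vertically option, the sketch below renders a plain dictionary of task dependencies as Graphviz dot text. `to_dot` and its dictionary input are assumptions made for this example; the real function works on task functions and also colours tasks by their state, which is not attempted here.

```python
def to_dot(dependencies, pipeline_name="Pipeline:", draw_vertically=True):
    """Render {task name: [upstream task names]} as Graphviz dot text."""
    # Graphviz lays graphs out top-to-bottom (TB) or left-to-right (LR).
    rankdir = "TB" if draw_vertically else "LR"
    lines = [
        "digraph pipeline {",
        f"    rankdir={rankdir};",
        f'    label="{pipeline_name}";',
    ]
    for task in sorted(dependencies):
        lines.append(f'    "{task}";')
        for upstream in sorted(dependencies[task]):
            # An edge points from the upstream task to the task that depends on it.
            lines.append(f'    "{upstream}" -> "{task}";')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be written to a file and passed to the dot program to produce any of the image formats listed above.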

## Logging

class ruffus.task.t_black_hole_logger

Does nothing!

class ruffus.task.t_stderr_logger

Everything to stderr
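The interface of these two classes is not shown in this section. As a sketch of the same two behaviours using only the standard `logging` module (the helper names `make_black_hole_logger` and `make_stream_logger` are hypothetical), one logger discards everything and the other writes everything to a stream, stderr by default:

```python
import logging
import sys


def make_black_hole_logger(name="black_hole"):
    """Return a logger that silently discards every message."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(logging.NullHandler())
    logger.propagate = False  # keep messages away from the root logger
    return logger


def make_stream_logger(name="to_stderr", stream=None):
    """Return a logger that writes every message to stream (stderr by default)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream if stream is not None else sys.stderr))
    logger.setLevel(logging.DEBUG)  # let every level through
    logger.propagate = False
    return logger
```

Since the logger parameter of pipeline_run is documented as taking logging objects, loggers built this way are the kind of object that parameter describes.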

## Implementation:

### Parameter factories:

ruffus.task.merge_param_factory(input_files_task_globs, output_param, *extra_params)

Factory for task_merge

ruffus.task.collate_param_factory(input_files_task_globs, file_names_transform, extra_input_files_task_globs, replace_inputs, output_pattern, *extra_specs)

Factory for task_collate

Looks exactly like @transform, except that all inputs which lead to the same output / extra parameters are combined into a single job.
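The grouping idea can be sketched with a regular expression and a dictionary. This is only an illustration of the collate behaviour described above, not the factory itself; `collate_jobs` and its arguments are hypothetical names.

```python
import re
from collections import defaultdict


def collate_jobs(input_files, pattern, output_template):
    """Group inputs by the output name each one maps to."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in input_files:
        match = regex.search(name)
        if match is None:
            continue  # inputs that do not match the pattern produce no job
        # Substitute captured groups (\1, \2, ...) into the output template.
        groups[match.expand(output_template)].append(name)
    # One (inputs, output) job per distinct output name.
    return [(sorted(inputs), output) for output, inputs in sorted(groups.items())]
```

With the pattern `r"(\w+)\.\d+\.txt$"` and the template `r"\1.summary"`, the inputs `a.1.txt` and `a.2.txt` end up in the same job because both map to `a.summary`.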

ruffus.task.transform_param_factory(input_files_task_globs, file_names_transform, extra_input_files_task_globs, replace_inputs, output_pattern, *extra_specs)

Factory for task_transform

ruffus.task.files_param_factory(input_files_task_globs, do_not_expand_single_job_tasks, output_extras)

Factory for functions which yield tuples of inputs, outputs / extras

Note:

1. Each job requires input/output file names
2. Input/output file names can be a string or an arbitrarily nested sequence
3. Non-string types are ignored
4. Either the input or the output file names must contain at least one string
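The rules in this note can be sketched as a small recursive walk over the parameters. The helper names `iter_file_names` and `check_job_params` are hypothetical, and restricting "sequence" to lists and tuples is an assumption of this example.

```python
def iter_file_names(param):
    """Yield every string found in an arbitrarily nested list/tuple structure."""
    if isinstance(param, str):
        yield param
    elif isinstance(param, (list, tuple)):
        for item in param:
            yield from iter_file_names(item)
    # Any other type (numbers, None, ...) is ignored.


def check_job_params(inputs, outputs):
    """Return (input names, output names); raise if neither contains a string."""
    input_names = list(iter_file_names(inputs))
    output_names = list(iter_file_names(outputs))
    if not input_names and not output_names:
        raise ValueError("either inputs or outputs must contain at least one file name")
    return input_names, output_names
```

For instance, `check_job_params(["a.txt", ("b.txt", 3)], 42)` yields the two input names and an empty list of output names, which is acceptable because at least one side contains a string.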

ruffus.task.args_param_factory(orig_args)

Factory for functions which yield tuples of inputs, outputs / extras

Note:

1. Each job requires input/output file names
2. Input/output file names can be a string or an arbitrarily nested sequence
3. Non-string types are ignored
4. Either the input or the output file names must contain at least one string

ruffus.task.split_param_factory(input_files_task_globs, output_files_task_globs, *extra_params)

Factory for task_split

### Wrappers around jobs:

ruffus.task.job_wrapper_generic(params, user_defined_work_func, register_cleanup, touch_files_only)

Run the user-defined work function.

ruffus.task.job_wrapper_io_files(params, user_defined_work_func, register_cleanup, touch_files_only, output_files_only=False)

Run the user-defined work function on its input/output files if they are not up to date.

ruffus.task.job_wrapper_mkdir(params, user_defined_work_func, register_cleanup, touch_files_only)

Make missing directories including any intermediate directories on the specified path(s)
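The core of this behaviour maps onto `os.makedirs` from the standard library. The sketch below shows only that step; `make_missing_directories` is a hypothetical name, and the real wrapper's handling of params, cleanup registration and touch_files_only is not reproduced.

```python
import os


def make_missing_directories(*paths):
    """Create each directory, including intermediate ones, if it is missing."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            # exist_ok guards against another process creating it first.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

The function returns the paths it actually created, so calling it a second time with the same arguments returns an empty list.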

### Checking if job is up to date:

ruffus.task.needs_update_check_modify_time(*params, **kwargs)

Given input and output files, see if all exist and whether output files are later than input files. Each can be:

1. string: assumed to be a filename “file1”
2. any other type
3. arbitrary nested sequence of (1) and (2)
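A timestamp comparison of this kind can be sketched as follows. This is a simplified stand-in rather than the ruffus function: `needs_update` is a hypothetical name, it takes flat lists of file names only (no nested sequences or non-string parameters), and treating equal timestamps as up to date is an assumption.

```python
import os


def needs_update(input_files, output_files):
    """Return (True, reason) if the outputs must be rebuilt, else (False, reason)."""
    all_files = list(input_files) + list(output_files)
    missing = [name for name in all_files if not os.path.exists(name)]
    if missing:
        return True, "missing files: " + ", ".join(missing)
    if not input_files or not output_files:
        return False, "nothing to compare"
    newest_input = max(os.path.getmtime(name) for name in input_files)
    oldest_output = min(os.path.getmtime(name) for name in output_files)
    if oldest_output < newest_input:
        return True, "an output is older than an input"
    return False, "up to date"
```

Comparing the oldest output against the newest input means a single stale output is enough to mark the whole job as out of date.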

ruffus.task.needs_update_check_directory_missing(*params, **kwargs)

Called per directory: Does it exist? Is it an ordinary file not a directory? (throw exception