@combinations_with_replacement( input, filter, tuple_size, output, [extras,...] )


Generates the combinations with replacement of all the elements of a set of input (e.g. A B C D), i.e. r-length tuples of input elements in which an element may be repeated (A A) and where the order within a tuple is irrelevant (either A B or B A, not both).

The effect is analogous to the Python itertools function of the same name:

>>> from itertools import combinations_with_replacement
>>> # combinations_with_replacement('ABCD', 2)
>>> #   --> AA AB AC AD BB BC BD CC CD DD
>>> [ "".join(a) for a in combinations_with_replacement('ABCD', 2)]
['AA', 'AB', 'AC', 'AD', 'BB', 'BC', 'BD', 'CC', 'CD', 'DD']

Only out-of-date tasks (determined by comparing input and output files) will be run.
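
As a rough illustration of what "out of date" means here, the sketch below compares file modification times in plain Python. It is a simplification and not Ruffus's actual implementation; the helper name is_out_of_date is hypothetical.

```python
import os

def is_out_of_date(input_files, output_files):
    """Return True if the job should run (hypothetical helper, not a Ruffus API)."""
    # No outputs to check, or a missing output: the job has to run.
    if not output_files or any(not os.path.exists(f) for f in output_files):
        return True
    # No inputs to compare against: existing outputs count as up to date.
    if not input_files:
        return False
    # Assumption: a job is stale when any input is newer than the oldest output.
    newest_input = max(os.path.getmtime(f) for f in input_files)
    oldest_output = min(os.path.getmtime(f) for f in output_files)
    return newest_input > oldest_output
```

A job whose outputs all exist and are newer than every input would be skipped under this rule.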

Output file names and strings in the extra parameters are generated by string replacement via the formatter() filter from the input. The input can be, for example, a list of file names or the output of upstream tasks. The replacement strings require an extra level of nesting to refer to parsed components:

  1. The first level refers to which set in each tuple of input.
  2. The second level refers to which input file in any particular set of input.

This will be clear in the following example:


If input is four pairs of file names:

 input_files = [ [ 'A.1_start', 'A.2_start'],
                 [ 'B.1_start', 'B.2_start'],
                 [ 'C.1_start', 'C.2_start'],
                 [ 'D.1_start', 'D.2_start'] ]

The first job of:

@combinations_with_replacement(input_files, formatter(), 2, ...)

will be:

# Two file pairs at a time
['A.1_start', 'A.2_start'],      # 0
# versus itself
['A.1_start', 'A.2_start'],      # 1

First level of nesting:

['A.1_start', 'A.2_start']  # [0]
['A.1_start', 'A.2_start']  # [1]

Second level of nesting:

'A.2_start'                 # [0][1]
'A.2_start'                 # [1][1]

Parse filename without suffix:

'A'                         # {basename[0][1]}
'A'                         # {basename[1][1]}
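
The two levels of nesting can be reproduced in plain Python, without Ruffus. In this minimal sketch, approx_basename is a hypothetical helper that only approximates the basename component parsed by formatter() (file name without directory or final extension).

```python
import os

# First job: the first file pair combined with itself
job_input = (['A.1_start', 'A.2_start'],
             ['A.1_start', 'A.2_start'])

first_set = job_input[0]       # first level:  ['A.1_start', 'A.2_start']
second_file = job_input[1][1]  # second level: 'A.2_start'

def approx_basename(file_name):
    """File name without directory or final extension (approximation only)."""
    return os.path.splitext(os.path.basename(file_name))[0]

# Same nesting as the input: basename[set index][file index]
basename = [[approx_basename(f) for f in file_set] for file_set in job_input]
print(basename[0][1], basename[1][1])   # prints: A A
```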

Python code:

from ruffus import *
from ruffus.combinatorics import *

#   initial file pairs
@originate([ ['A.1_start', 'A.2_start'],
             ['B.1_start', 'B.2_start'],
             ['C.1_start', 'C.2_start'],
             ['D.1_start', 'D.2_start']])
def create_initial_files_ABCD(output_files):
    for output_file in output_files:
        with open(output_file, "w") as oo: pass

#   @combinations_with_replacement
@combinations_with_replacement(create_initial_files_ABCD,   # Input
              formatter(),                                  # match input files

              # tuple of 2 at a time
              2,

              # Output Replacement string
              "{path[0][0]}/"
              "{basename[0][1]}_vs_"
              "{basename[1][1]}.combinations_with_replacement",

              # Extra parameter: path for 1st set of files, 1st file name
              "{path[0][0]}",

              # Extra parameter. Basename for:
              ["{basename[0][0]}",  # 1st set of files, 1st file name
               "{basename[1][0]}",  # 2nd set of files, 1st file name
               ])
def combinations_with_replacement_task(input_file, output_parameter,
                                       shared_path, basenames):
    print(" - ".join(basenames))

#       Run
pipeline_run(verbose=0)

This results in:

>>> pipeline_run(verbose=0)
A - A
A - B
A - C
A - D
B - B
B - C
B - D
C - C
C - D
D - D
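
The same ten lines can be derived without Ruffus, which may help when checking replacement strings by hand. The sketch below uses itertools and Python's str.format, which accepts nested indices such as {basename[0][0]}. Whether Ruffus performs the substitution this way internally is an assumption, and the helper stem is hypothetical.

```python
import os
from itertools import combinations_with_replacement

input_files = [['A.1_start', 'A.2_start'],
               ['B.1_start', 'B.2_start'],
               ['C.1_start', 'C.2_start'],
               ['D.1_start', 'D.2_start']]

def stem(file_name):
    """File name without directory or final extension (approximation only)."""
    return os.path.splitext(os.path.basename(file_name))[0]

# First file of the 1st set, then first file of the 2nd set
template = "{basename[0][0]} - {basename[1][0]}"

lines = []
for job_input in combinations_with_replacement(input_files, 2):
    # Mirror the input nesting: basename[set index][file index]
    basename = [[stem(f) for f in file_set] for file_set in job_input]
    lines.append(template.format(basename=basename))

print("\n".join(lines))
```

Four input sets taken two at a time with replacement give ten jobs, matching the output above.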


  • input = tasks_or_file_names

    can be a:

    1. Task / list of tasks.

      File names are taken from the output of the specified task(s)

    2. (Nested) list of file name strings.

      File names containing *[]? will be expanded as a glob.

      E.g.: "a.*" => "a.1", "a.2"

  • tuple_size = N

    Select N elements at a time.

  • output = output

    Specifies the resulting output file name(s) after string substitution

  • extras = extras

    Any extra parameters are passed verbatim to the task function

    If you are using named parameters, these can be passed as a list, i.e. extras=[...]

    Any extra parameters are consumed by the task function and not forwarded further down the pipeline.
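
The glob expansion described for the input parameter can be pictured with the standard glob module. This is only a sketch of the idea, not Ruffus code: expand_file_names is a hypothetical helper, and the files are created in a temporary directory purely for the demonstration.

```python
import glob
import os
import tempfile

def expand_file_names(names):
    """Expand names containing *[]? as globs; keep plain names unchanged."""
    expanded = []
    for name in names:
        if any(ch in name for ch in "*[]?"):
            expanded.extend(sorted(glob.glob(name)))
        else:
            expanded.append(name)
    return expanded

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create three empty files; only two match the pattern "a.*"
    for name in ("a.1", "a.2", "b.1"):
        open(os.path.join(tmp_dir, name), "w").close()
    matches = expand_file_names([os.path.join(tmp_dir, "a.*"), "plain.txt"])
    demo = [os.path.basename(m) for m in matches]

print(demo)   # prints: ['a.1', 'a.2', 'plain.txt']
```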