drmaa functions

drmaa_wrapper is not exported automatically by ruffus and must be specified explicitly:

# imported ruffus.drmaa_wrapper explicitly
from ruffus.drmaa_wrapper import run_job, error_drmaa_job

run_job

run_job (cmd_str, job_name = None, job_other_options = None, job_script_directory = None, job_environment = None, working_directory = None, logger = None, drmaa_session = None, retain_job_scripts = False, run_locally = False, output_files = None, touch_only = False)

Purpose:

ruffus.drmaa_wrapper.run_job dispatches a command with arguments to a cluster or Grid Engine node and waits for the command to complete.

It is the semantic equivalent of calling os.system or subprocess.check_output.

Example:

from ruffus.drmaa_wrapper import run_job, error_drmaa_job
import drmaa
my_drmaa_session = drmaa.Session()
my_drmaa_session.initialize()

run_job("ls",
        job_name = "test",
        job_other_options="-P mott-flint.prja -q short.qa",
        job_script_directory = "test_dir",
        job_environment={ 'BASH_ENV' : '~/.bashrc' },
        retain_job_scripts = True, drmaa_session=my_drmaa_session)
run_job("ls",
        job_name = "test",
        job_other_options="-P mott-flint.prja -q short.qa",
        job_script_directory = "test_dir",
        job_environment={ 'BASH_ENV' : '~/.bashrc' },
        retain_job_scripts = True,
        drmaa_session=my_drmaa_session,
        working_directory = "/gpfs1/well/mott-flint/lg/src/oss/ruffus/doc")

#
#   catch exceptions
#
try:
    stdout_res, stderr_res  = run_job(cmd,
                                      job_name          = job_name,
                                      logger            = logger,
                                      drmaa_session     = drmaa_session,
                                      run_locally       = options.local_run,
                                      job_other_options = get_queue_name())

# relay all the stdout, stderr, drmaa output to diagnose failures
except error_drmaa_job as err:
    raise Exception("\n".join(map(str,
                        ["Failed to run:",
                         cmd,
                         err,
                         stdout_res,
                         stderr_res])))

my_drmaa_session.exit()

Parameters:

  • cmd_str

    The command which will be run remotely including all parameters

  • job_name

    A descriptive name for the command. This will be displayed by SGE qstat, for example. Defaults to “ruffus_job”

  • job_other_options

    Other drmaa parameters can be passed verbatim as a string.

    Examples for SGE include project name (-P project_name), parallel environment (-pe parallel_environ), account (-A account_string), resource (-l resource=expression), queue name (-q a_queue_name), queue priority (-p 15).

    These are parameters which you normally need to include when submitting jobs interactively, for example via SGE qsub or SLURM (srun)

  • job_script_directory

    The directory where drmaa temporary script files will be found. Defaults to the current working directory.

  • job_environment

    A dictionary of key / values with environment variables. E.g. "{'BASH_ENV': '~/.bashrc'}"

  • working_directory

    • Sets the working directory.
    • Should be a fully qualified path.
    • Defaults to the current working directory.
  • retain_job_scripts

    Do not delete temporary script files containg drmaa commands. Useful for debugging, running on the command line directly, and can provide a useful record of the commands.

  • logger

    For logging messages indicating the progress of the pipeline in terms of tasks and jobs. Takes objects with the standard python logging module interface.

  • drmaa_session

    A shared drmaa session created and managed separately.

    In the main part of your Ruffus pipeline script somewhere there should be code looking like this:

    #
    #   start shared drmaa session for all jobs / tasks in pipeline
    #
    import drmaa
    drmaa_session = drmaa.Session()
    drmaa_session.initialize()
    
    
    #
    #   pipeline functions
    #
    
    if __name__ == '__main__':
        cmdline.run (options, multithread = options.jobs)
        drmaa_session.exit()
    
  • run_locally

    Runs commands locally using the standard python subprocess module rather than dispatching remotely. This allows scripts to be debugged easily

  • touch_only

    Create or update Output files only to simulate the running of the pipeline. Does not dispatch commands remotely or locally. This is most useful to force a pipeline to acknowledge that a particular part is now up-to-date.

    See also: pipeline_run(touch_files_only=True)

  • output_files

    Output files which will be created or updated if touch_only =True