
When you submit a job through Tapis, you are not directly submitting a Slurm script. Tapis uses a two-script model that separates scheduler control from application logic. This page covers both scripts, MPI configuration, and deployment practices. For an overview of how wrapper scripts fit into the broader Tapis app model, see Tapis and Custom Apps.

The Two-Script Model

When running a ZIP-based Tapis application on an HPC system, two scripts work together at runtime.

tapisjob.sh (the Tapis-generated launcher)

tapisjob.sh is created automatically for every job submission. It plays the same role as a Slurm batch script you would submit with sbatch. It contains the scheduler directives for your job’s resource requirements (node count, cores, queue, walltime) based on the app definition and job request.

Before launching your application, tapisjob.sh sources tapisjob.env, which contains job metadata and resolved parameters (job UUIDs, allocated resources, input/output paths). It then calls your application entrypoint.

You never create or edit this file.

Example tapisjob.sh:

#!/bin/bash

# This script was auto-generated by the Tapis Jobs Service for the purpose
# of running a Tapis application.  The order of execution is as follows:
#
#   1. The batch scheduler options are passed to the scheduler, including any
#      user-specified, scheduler-managed environment variables.
#   2. The application container is run with container options, environment
#      variables and application parameters as supplied in the Tapis job,
#      application and system definitions.

# Slurm directives.
#SBATCH --account DS-HPC1
#SBATCH --job-name tapisjob.sh
#SBATCH --nodes 2
#SBATCH --ntasks 96
#SBATCH --output /scratch/XXXXX/username/tapis/966244f7-de44-4404-ac54-9f1da33cda3e-007/tapisjob.out
#SBATCH --partition skx
#SBATCH --time 120

module load opensees/3.6.0

# Issue launch command for application executable.
# Format: nohup ./tapisjob_app.sh > tapisjob.out 2>&1 &

# Export Tapis and user defined environment variables.
. ./tapisjob.env

# Launch app executable.
./tapisjob_app.sh OpenSeesMP simpleMP_WebSubmit.tcl > /scratch/XXXXX/username/tapis/966244f7-de44-4404-ac54-9f1da33cda3e-007/tapisjob.out 2>&1

Here is a second, more generic illustration of what Tapis generates:

#!/bin/bash
#SBATCH -J tapis-job
#SBATCH -N 2
#SBATCH --ntasks-per-node=48
#SBATCH -p normal
#SBATCH -t 02:00:00
#SBATCH -o tapisjob.out
#SBATCH -e tapisjob.err

# Move to the job execution directory
cd "$SLURM_SUBMIT_DIR"

# Load Tapis-provided environment variables
if [ -f tapisjob.env ]; then
  source tapisjob.env
fi

echo "Tapis Job UUID: $_tapisJobUUID"
echo "Allocated nodes: $_tapisNodes"

# Ensure the app script is executable
chmod +x ./tapisjob_app.sh

# Launch the application
./tapisjob_app.sh

# Capture exit code for Tapis bookkeeping
echo $? > tapisjob.exitcode

The Slurm directives come from your app definition and job request. tapisjob.env injects Tapis metadata and resolved paths. The script prepares the environment, calls your application, and reports status back to Tapis.

tapisjob_app.sh (your application workflow)

tapisjob_app.sh is the script you write and control. It contains the actual commands you want to run: loading modules, activating environments, launching MPI jobs, running your analysis. Tapis does not modify it.

During execution, tapisjob.sh calls it directly (e.g., ./tapisjob_app.sh), redirecting stdout and stderr to Tapis-managed log files.

If your ZIP archive does not include tapisjob_app.sh, Tapis looks for a tapisjob.manifest file that specifies an alternate executable using tapisjob_executable=<path>. If neither is present, the job fails. If both are present, tapisjob_app.sh takes precedence.
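If you use the manifest route, the file is a plain key=value text file. A minimal illustrative example (the path here is a placeholder for your own entrypoint):

```text
tapisjob_executable=bin/run_model.sh
```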

Example tapisjob_app.sh:

#!/bin/bash
set -x

BINARYNAME=$1
INPUTSCRIPT=$2
echo "INPUTSCRIPT is $INPUTSCRIPT"

# Strip any directory prefix to get the bare script name
TCLSCRIPT="${INPUTSCRIPT##*/}"
echo "TCLSCRIPT is $TCLSCRIPT"

cd "${inputDirectory}"

echo "Running $BINARYNAME"

ibrun "$BINARYNAME" "$TCLSCRIPT"
STATUS=$?
if [ $STATUS -ne 0 ]; then
      echo "OpenSees exited with an error status: $STATUS" >&2
      exit $STATUS
fi

cd ..

Side-by-side summary

| Script | Who owns it | Purpose | You edit it? |
|---|---|---|---|
| tapisjob.sh | Tapis | Scheduler glue, environment injection, monitoring | Never |
| tapisjob_app.sh | You | Scientific / computational workflow | Always |

Reserved Filenames

All filenames beginning with tapisjob are reserved by Tapis. Do not create files with this prefix. As an app developer, you supply only:

  - tapisjob_app.sh (or, alternatively, a tapisjob.manifest that points to your executable)

Everything else (tapisjob.sh, tapisjob.env, output logs, status files) is managed by Tapis.

Where Execution Happens

Neither tapisjob.sh nor tapisjob_app.sh runs on a login node. Both execute inside a Slurm job allocation on compute nodes.

The execution sequence:

  1. Tapis sends the job request to the HPC scheduler (e.g., Slurm on Stampede3).

  2. Slurm allocates the requested compute nodes.

  3. The ZIP runtime is unpacked into the job execution directory.

  4. Slurm launches tapisjob.sh on the first allocated compute node.

  5. tapisjob.sh invokes tapisjob_app.sh.

  6. tapisjob_app.sh launches the application binaries.

The compute-node environment is intentionally minimal. On systems using a tacc-no-modules profile, no modules are preloaded and no Python environment is configured. All environment setup must happen inside tapisjob_app.sh. Relying on login-node defaults will cause failures.

A typical tapisjob_app.sh explicitly loads everything it needs:

module load python/3.12.11
module load opensees
module load hdf5/1.14.4

Sample tapisjob_app.sh Scripts

Example A. Serial or threaded job

#!/bin/bash
set -e

echo "Running on host: $(hostname)"
echo "Working directory: $(pwd)"

# Explicit environment setup (compute node starts clean)
module purge
module load python/3.12.11
module load hdf5/1.14.4

# Optional: create a virtual environment
python -m venv venv
source venv/bin/activate

pip install numpy pandas

# Run your analysis
python run_analysis.py input.json

Example B. MPI-based OpenSees job

#!/bin/bash
set -e

echo "MPI job starting"
echo "Nodes: $_tapisNodes"
echo "Cores per node: $_tapisCoresPerNode"

module purge
module load opensees
module load openmpi

# Use Slurm-provided MPI launcher
mpirun -np $SLURM_NTASKS OpenSeesMP model.tcl

MPI is launched inside tapisjob_app.sh. Whether the app is declared isMpi: true or not, the MPI command lives here. tapisjob.sh does not care whether this is MPI, Python, or anything else.

Example C. Hybrid workflow (pre-processing + MPI + post-processing)

#!/bin/bash
set -e

module purge
module load python/3.12.11
module load opensees
module load hdf5

echo "Pre-processing inputs"
python generate_model.py

echo "Running OpenSees in parallel"
mpirun -np $SLURM_NTASKS OpenSeesMP model.tcl

echo "Post-processing results"
python extract_results.py results/

This is where the two-script model is most useful. Tapis handles scheduling and lifecycle. You orchestrate entire pipelines in one place. The script stays portable and readable.

MPI in Tapis Apps

Tapis v3 supports multi-node MPI workloads in two ways. The difference is not performance or capability, but who launches MPI and how your wrapper script is executed.

Input Staging

Tapis v3 stages inputs only once to the execution system: input data (inputDirectory), ZIP runtime contents, tapisjob.sh, tapisjob.env, and tapisjob_app.sh. These are unpacked into one shared working directory (ExecSystemExecDir) on a parallel filesystem (e.g., SCRATCH or WORK).

On systems like Stampede3, the filesystem is visible from all compute nodes. Every MPI rank can read/write the same files. No per-node file copying occurs. MPI jobs do not require node-local staging.

The isMpi Flag

In a Tapis app definition, the isMpi flag controls one thing: does Tapis wrap your job in an MPI launcher?

It does not control node allocation, MPI capability, performance, or whether your application can use MPI. Those are handled entirely by Slurm.

Non-MPI Launch Mode (isMpi: false)

This is the default and most flexible mode.

Tapis allocates multiple nodes (if requested) and runs tapisjob.sh and tapisjob_app.sh only on the first node. This is standard Slurm behavior for a batch script without srun/mpirun.

Inside tapisjob_app.sh, you explicitly launch MPI where needed:

ibrun python3 my_mpi_script.py
ibrun OpenSeesMP model.tcl

Slurm expands the MPI job across all allocated nodes, assigns ranks, and manages communication.

| Layer | Behavior |
|---|---|
| Tapis | Single-node launcher |
| Slurm | Full MPI orchestration |
| Filesystem | Shared across all nodes |

This works because inputs were staged once to shared storage and all ranks see the same directory.

This mode is ideal for real scientific workflows, especially when you have mixed serial and parallel logic, environment setup, selective MPI regions, Python with mpi4py, or OpenSeesMP.

App definition:

"isMpi": false,
"mpiCmd": null

Wrapper script:

# Serial sanity checks
python -V

# MPI only where needed
ibrun python -m mpi4py my_analysis.py

There is no “double MPI” risk and no implicit wrapping. Full Slurm context is available (SLURM_NODELIST, SLURM_NTASKS, _tapisNodes, _tapisCoresPerNode).
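To confirm what context is actually available, you can print these variables at the top of tapisjob_app.sh. A small sketch (the `:-unset` defaults are only so the snippet also runs outside an allocation):

```shell
# Capture and report the Slurm/Tapis context for this job
NODES="${SLURM_NODELIST:-unset}"
NTASKS="${SLURM_NTASKS:-unset}"
echo "Node list:      $NODES"
echo "Total tasks:    $NTASKS"
echo "Tapis nodes:    ${_tapisNodes:-unset}"
echo "Cores per node: ${_tapisCoresPerNode:-unset}"
```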

This approach is especially useful when you need to install or build mpi4py, write files exactly once, or generate shared metadata before the parallel phase begins.

Scheduler-Launched MPI (isMpi: true)

In this mode, Tapis injects the MPI launcher for you. tapisjob_app.sh runs on all MPI ranks. You must not call ibrun or mpirun yourself. mpiCmd must be defined.

App definition:

"isMpi": true,
"mpiCmd": "ibrun"

Wrapper script:

# Already running on all ranks
python -m mpi4py your_program.py

You must guard file creation, logging, and output on rank 0:

from mpi4py import MPI
if MPI.COMM_WORLD.rank == 0:
    write_summary()

Use this mode when the entire workflow is MPI, there is minimal serial logic, and you are already MPI-safe everywhere.

Avoiding Double MPI

Either Tapis launches MPI, or you do. Never both.

| Mode | isMpi | mpiCmd | Call ibrun yourself? |
|---|---|---|---|
| Scheduler-launched | true | "ibrun" | No |
| Script-launched | false | null | Yes |

If isMpi=false, mpiCmd must be null, not an empty string.

Practical Guidance by Application Type

| Application | Recommended approach |
|---|---|
| OpenSeesMP | isMpi: false with ibrun OpenSeesMP model.tcl |
| OpenSeesPy + mpi4py | Script-launched MPI, guard rank-0 I/O, explicit ibrun |
| Pure Python (non-MPI) | isMpi: false, request 1 node |
| End-to-end MPI codes | isMpi: true with no internal launcher calls |

Performance Note

There is no performance penalty for script-launched MPI. Shared scratch/work filesystems are designed for this pattern. Avoiding per-node duplication often reduces overhead. MPI scaling is identical. The difference is control, not speed.

HPC Launchers

A Tapis job on an HPC system is a standard Slurm batch job. Once the job starts, everything inside the allocation behaves exactly as if you had submitted it manually with sbatch. You can use HPC launcher tools inside a Tapis job.

PyLauncher

PyLauncher is a Python-based parametric job launcher developed at TACC. It runs many small, independent tasks within a single Slurm allocation by distributing them across all available cores and nodes. This makes it ideal for parameter sweeps, ensemble analyses, Monte Carlo simulations, and high-throughput workloads.

Because Tapis has already reserved the compute nodes, PyLauncher simply inherits the Slurm allocation. It detects available resources by reading standard Slurm environment variables and uses them automatically.

The DesignSafe Agnostic App supports PyLauncher jobs.

How PyLauncher fits into a Tapis job

  1. Tapis submits the job to Slurm. Nodes and cores are allocated according to your app definition.

  2. The job starts on the primary node. Your tapisjob_app.sh is executed.

  3. Required modules are loaded (Python, PyLauncher). Input files are already staged by Tapis.

  4. PyLauncher is invoked. It detects the Slurm allocation automatically and distributes tasks across all nodes and cores.

From PyLauncher’s perspective, there is no difference between a manually submitted Slurm job and a Tapis-launched job.

Using PyLauncher in a Tapis app

Many users rely on PyLauncher’s ClassicLauncher, which reads a list of shell commands and executes them concurrently (one per core), recycling cores as tasks complete. This lets you fully use your allocation, avoid submitting thousands of tiny Slurm jobs, and keep scheduling overhead low.
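A task list for ClassicLauncher is simply one shell command per line. An illustrative example (the script and file names are placeholders):

```text
./run_case.sh case_0001.json
./run_case.sh case_0002.json
./run_case.sh case_0003.json
```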

Writing a launcher-aware Tapis app

Using PyLauncher inside Tapis jobs is often the point where users move from consuming existing apps to building their own. The mental model is simple: Tapis allocates resources, your app decides how to use them.

In the app definition (app.json / profile.json), you define node count, cores per node, required modules, and user-facing parameters (number of tasks, parameter ranges, input files).

In the wrapper script (tapisjob_app.sh), you load modules, generate task lists dynamically from parameters or input files, invoke PyLauncher, and optionally collect outputs.

Launchers are most effective when each task is small relative to the allocation, tasks are independent (no inter-task communication), you want to amortize Slurm queue wait time across many runs, and you need predictable scaling. For tightly coupled MPI simulations, use Slurm’s native MPI launch instead.

Example wrapper for a launcher-based app:

#!/bin/bash
set -e

module purge
module load python3
module load pylauncher

echo "Generating task list from parameters"
python3 generate_tasks.py --num-samples ${numSamples} > tasklist.txt

echo "Launching tasks with PyLauncher"
python3 -c "
import pylauncher
pylauncher.ClassicLauncher('tasklist.txt', cores=${coresPerNode})
"

echo "Collecting results"
python3 aggregate_results.py
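The generate_tasks.py helper referenced in the wrapper above is not defined in this document; one possible sketch, assuming each task is a placeholder `./run_case.sh` invocation, is:

```python
import argparse

def make_tasks(num_samples):
    """Return one shell command per sample (command name is illustrative)."""
    return [f"./run_case.sh case_{i:04d}.json" for i in range(num_samples)]

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-samples", type=int, default=10)
    args = parser.parse_args()
    # One command per line, ready to redirect into tasklist.txt
    print("\n".join(make_tasks(args.num_samples)))
```

Redirecting its stdout into tasklist.txt, as the wrapper does, produces exactly the one-command-per-line format ClassicLauncher expects.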

Deployment Best Practices

Directory and versioning

Keep one directory per app version:

apps/<app_id>/<version>/

For example, apps/opensees-mp/1.0.0/. Treat each version directory as immutable after registration. If you need a change, cut a new version (e.g., 1.0.1) and re-register.

Use semantic versioning (MAJOR.MINOR.PATCH). Do not register a version literally named latest. If latest moves, old notebooks would silently run different code on re-submit.

You can keep a latest/ convenience folder as a copy of the newest version’s files for browsing, but always register and submit jobs with a pinned semantic version.

Promotion workflow:

  1. Upload and register apps/<app_id>/<new_version>/...

  2. Validate it (smoke tests)

  3. Copy files into apps/<app_id>/latest/ to match

Environments (dev, test, prod)

Two patterns work well.

Separate IDs per stage (most isolated):

opensees-mp-dev   / 0.9.x     # fast iteration
opensees-mp-test  / 1.0.0-rc1 # release candidate
opensees-mp       / 1.0.0     # production

Single ID with staged versions:

opensees-mp / 1.0.0-rc1  # test
opensees-mp / 1.0.0      # prod

Keep the input/parameter schema stable from test to prod to avoid breaking users mid-upgrade.

Files to include per version

Pin container/module versions per app version and echo them at runtime.

The separation across these files is intentional. app.json defines the “what” (app identity, schema, systems, resources). profile.json defines the “prepare” (modules, environment for the execution system). tapisjob_app.sh defines the “run” (container/module launch, scheduler integration, error handling). This lets you tweak the runtime environment without rewriting the app contract.
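As a rough illustration of the "what", an abbreviated app.json might look like the fragment below. Field names follow the Tapis v3 app schema as the author understands it; all values are placeholders, and a real definition carries many more fields:

```json
{
  "id": "opensees-mp",
  "version": "1.0.0",
  "jobType": "BATCH",
  "runtime": "ZIP",
  "jobAttributes": {
    "execSystemId": "stampede3",
    "nodeCount": 2,
    "coresPerNode": 48,
    "maxMinutes": 120
  }
}
```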

Registration and immutability

Upload, register, freeze. Do not edit files in place after registration. If you need a fix, bump the version (e.g., 1.0.1) and re-register. Record the registered app UUID and, if applicable, the git commit or tag that produced the artifacts.

Permissions

Containers and modules

Consistent inputs and defaults

Validation checklist

Provenance and logging

At job start, print app id and version, container image/tag (or module list with versions), Slurm environment (job id, node list), hostname, date/time, and working directory.

On exit, write an explicit exit code and summarize key outputs or where they were archived.
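A provenance banner covering the items above can be a few lines at the top of tapisjob_app.sh. This is a sketch: the `_tapisAppId`/`_tapisAppVersion` names are assumed to be among the variables tapisjob.env provides, and the fallbacks let the snippet run outside an allocation:

```shell
# Job-start provenance banner; Tapis/Slurm variables fall back to
# placeholders so this also works outside a Slurm allocation
STARTED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
echo "App:       ${_tapisAppId:-unknown} v${_tapisAppVersion:-unknown}"
echo "Slurm job: ${SLURM_JOB_ID:-interactive} on $(hostname)"
echo "Nodes:     ${SLURM_NODELIST:-n/a}"
echo "Started:   $STARTED_AT"
echo "Workdir:   $(pwd)"
```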

FAQ on latest vs. versions

Q. Can we register appVersion: "latest"?
A. Avoid it. Reproducibility breaks when latest changes.

Q. What is the point of apps/<app_id>/latest/ then?
A. Human-friendly browsing and ad-hoc tests. Jobs still use pinned versions.

Q. How do we keep latest/ fresh?
A. After promoting a new version, copy its files into latest/.