DesignSafe brings together the computational power of the Texas Advanced Computing Center (TACC) with cloud-based interfaces, allowing researchers to move between interactive exploration and large-scale, production-level computation. Whether the task is testing a small Python script in a Jupyter notebook or deploying thousands of simulations across TACC HPC systems, the platform scales from a single core to tens of thousands.
DesignSafe supports the full life cycle of computational research: developing models and scripts, running and monitoring simulations, managing input and output data, and sharing or reproducing results. All of this happens through a browser. The platform handles the details of moving files to the right place, submitting jobs to the right machine, and collecting results when they finish.
This page explains how DesignSafe works and how the pieces fit together. The rest of the guide covers specific tasks: compute environments (JupyterHub, VMs, HPC systems, queues, and allocations), storage and file management (storage areas, paths, file staging), submitting jobs, debugging failures, and parameter sweeps.
The DesignSafe Portal¶
Everything starts at the DesignSafe web portal. The portal is the single entry point for launching compute environments, submitting jobs, and managing data. From the workspace, researchers can open a JupyterHub notebook, start an interactive application like MATLAB or QGIS, submit a batch simulation to an HPC cluster, or browse and publish datasets in the Data Depot.
Researchers can also bypass the portal and submit jobs programmatically from a Jupyter notebook using dapi. This is the preferred approach for automated pipelines, parameter sweeps, and reproducible workflows.
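A minimal sketch of the programmatic path is shown below. The import path and method names are placeholders rather than the actual dapi interface; Running HPC Jobs documents the real calls, and the portal lists current app names.

```python
# Illustrative sketch only: DSClient and the method names below are
# placeholders, not the actual dapi API. See "Running HPC Jobs" for the
# real interface and the portal for current app ids.
from dapi import DSClient                 # hypothetical import

ds = DSClient()                           # authenticate against DesignSafe/Tapis
job = ds.jobs.submit(                     # placeholder submit-style call
    app_id="opensees-express",            # example app id
    input_dir="/MyData/fragility-study/run-001",  # inputs are staged for you
    max_minutes=120,
)
job.wait()                                # block until the batch job finishes
print(job.status)                         # e.g. FINISHED or FAILED
```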
Three compute environments¶
DesignSafe provides three places where computation can happen. Each serves a different purpose.
JupyterHub is where most day-to-day work happens. Each session gets a dedicated container (up to 8 CPU cores, 20 GB RAM) that starts immediately with no queue wait. Researchers write code, test models, visualize results, and submit HPC jobs from here. For heavier interactive work, Jupyter HPC Native sessions run directly on Stampede3 or Vista GPU nodes with full node resources, though these go through the SLURM queue.
Virtual machines run applications that need an interactive session without a queue wait. OpenSees Interactive, MATLAB, ADCIRC Interactive, STKO, and QGIS all run on shared VMs at TACC. STKO and QGIS provide a full graphical desktop through NICE DCV, which streams a remote desktop to the browser. VMs share hardware across users, so they work best for lightweight tasks and quick tests.
HPC systems handle production-scale computation. Stampede3, Frontera, and Lonestar6 are clusters of interconnected machines (nodes), each with dozens of CPU cores and hundreds of gigabytes of memory. Jobs can span multiple nodes using MPI (Message Passing Interface), a standard that lets parallel processes on different nodes exchange data during execution. A batch scheduler, SLURM, manages the queue and allocates nodes as they become available. Long-running simulations, multi-core parallel analyses, and parametric sweeps with hundreds of runs all belong on HPC. Even when launched through the portal’s graphical forms, HPC jobs are batch jobs that run unattended.
Which environment for what¶
| What you need to do | Environment | Example |
|---|---|---|
| Write and test code, visualize results | JupyterHub | Developing a post-processing script, plotting response spectra |
| Interactive GUI session | VM (DCV desktop) | Building a mesh in STKO, exploring spatial data in QGIS |
| Quick serial test | VM | Testing an OpenSees Tcl model, short MATLAB analysis |
| Large or long-running simulation | HPC batch | Nonlinear time-history analysis, ADCIRC storm-surge forecast |
| Hundreds of independent runs | HPC with PyLauncher | Fragility study across 500 ground-motion records |
| Parallel simulation across many cores | HPC with MPI | Multi-node OpenFOAM CFD, ADCIRC with millions of elements |
| GPU-accelerated work | Jupyter HPC Native (Vista) or GPU queue | ML training, GPU-accelerated simulation |
Most researchers follow a natural progression: develop and test interactively in JupyterHub, then submit production runs as batch jobs to HPC. Compute Environments covers each environment in detail, including node types, queues, and allocations.
Data and compute live together¶
Research data and compute hardware are co-located at TACC. A ground-motion database in CommunityData can be referenced directly from a simulation job without downloading it to a laptop and re-uploading it to the cluster. This is one of DesignSafe’s most important advantages.
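For example, a notebook running in JupyterHub can read a shared dataset in place. The sketch below assumes the default JupyterHub mount points and a hypothetical dataset path; see Storage and File Management for the exact paths.

```python
# Read a shared ground-motion record straight from CommunityData inside a
# JupyterHub notebook. The mount point and dataset path are illustrative.
from pathlib import Path
import pandas as pd

record = Path.home() / "CommunityData" / "example-ground-motions" / "record_001.csv"
motion = pd.read_csv(record)   # no download/re-upload step: the data already lives at TACC
print(motion.head())
```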
DesignSafe provides several storage areas with different tradeoffs between persistence and performance.
| Storage area | Purpose | Backed up |
|---|---|---|
| MyData | Private files (scripts, inputs, outputs) | Yes |
| MyProjects | Shared project files visible to collaborators | Yes |
| Work | Active workspace on the HPC system | No |
| Scratch | Temporary high-speed storage on HPC | No (purged) |
| CommunityData | Public datasets shared across DesignSafe | Yes |
| NHERI-Published | Archived NHERI datasets with DOIs | Yes |
| NEES | Legacy NEES datasets | Yes |
MyData and MyProjects live on Corral, TACC’s backed-up storage. Work and Scratch are fast but not backed up. Always copy important results to MyData or MyProjects when a job finishes.
When a job is submitted, Tapis automatically stages input files to the execution system before the job starts, and archives output back to DesignSafe storage after completion. There is no manual file transfer step. Storage and File Management covers paths across environments, dapi file operations, and transfer strategies in detail.
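The sketch below shows where staging and archiving are declared in a Tapis v3-style job request. The app id, storage system id, and paths are examples only; the portal forms and dapi assemble requests like this on your behalf.

```python
# Example only (not a literal DesignSafe job definition): field values are
# placeholders showing where input staging and output archiving are declared.
job_request = {
    "name": "fragility-run-001",
    "appId": "opensees-express",       # example app id
    "appVersion": "latest",            # example version string
    "fileInputs": [
        {
            "name": "Input Directory",
            # Tapis copies this directory to the execution system before the job starts.
            "sourceUrl": "tapis://designsafe.storage.default/username/fragility-study/run-001",
        }
    ],
    # When the job finishes, outputs are archived back to DesignSafe storage.
    "archiveSystemId": "designsafe.storage.default",
    "archiveSystemDir": "username/fragility-study/archive",
}
```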
Designing your workflow¶
A computational workflow is the sequence of steps that takes a research question from raw inputs to published results. It includes preparing input files, running simulations, analyzing the output, and often repeating the process with different parameters. On a laptop, these steps might all happen in a single folder. On DesignSafe, they may span multiple environments, but the platform is designed so the pieces connect.
A typical workflow has four stages.
| Stage | What happens | Where on DesignSafe |
|---|---|---|
| Input generation | Prepare models, parameters, ground motions, or meshes | JupyterHub (Python scripts, notebooks) or STKO (GUI mesh builder) |
| Execution | Run the simulation, ensemble, or training loop | HPC batch job (single run or PyLauncher sweep), or VM for quick tests |
| Post-processing | Extract results, compute statistics, generate figures | JupyterHub (pandas, matplotlib, custom scripts) |
| Iteration | Repeat with new parameters, Monte Carlo samples, or refined models | dapi from JupyterHub (programmatic resubmission) |
When these stages are separate, each can be reused across projects or swapped to a different environment. A mesh generator can change without touching the solver. The execution stage can move from JupyterHub to HPC without rewriting the post-processing code. The Data Depot ties the stages together by providing shared storage that is accessible from every environment.
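As a concrete illustration, a post-processing step written against a results directory does not care whether that directory was filled by a quick VM test or an HPC batch job. The file names and layout below are hypothetical.

```python
# Post-processing as a stand-alone stage: it depends only on where the
# outputs live, not on how the simulation was run. Output naming is hypothetical.
from pathlib import Path
import pandas as pd

def peak_responses(results_dir: Path) -> pd.DataFrame:
    """Collect the peak absolute displacement from every history file in a run directory."""
    rows = []
    for out_file in sorted(results_dir.glob("disp_*.csv")):
        history = pd.read_csv(out_file)
        rows.append({"run": out_file.stem, "peak_disp": history["disp"].abs().max()})
    return pd.DataFrame(rows)

# The same function works on a local test folder, MyData, or a Work directory.
summary = peak_responses(Path.home() / "MyData" / "fragility-study" / "results")
summary.to_csv("peak_responses.csv", index=False)
```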
Design workflows around the research question, not around a specific tool. A workflow that runs one model today should be able to scale to hundreds of runs tomorrow. Different workloads scale differently, and the right strategy depends on the problem. Most HPC workloads on DesignSafe fall into one of two patterns:
Embarrassingly parallel (parametric sweeps). Many independent runs that do not communicate with each other. Each run gets its own inputs, produces its own outputs, and can succeed or fail without affecting the others. A fragility study running the same OpenSeesPy model across 500 ground-motion records is a classic example. Use PyLauncher to dispatch all tasks inside a single SLURM allocation. See Parameter Sweeps.
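The heart of such a sweep is a plain-text task file with one independent command per line, which PyLauncher dispatches across the allocation. A minimal sketch of generating that file follows; run_model.py, the directory layout, and the .AT2 extension are hypothetical.

```python
# Generate a PyLauncher task file: one self-contained command per line.
# Each run has its own directory, inputs, and outputs, so any run can fail
# without affecting the others.
from pathlib import Path

records = sorted(Path("ground_motions").glob("*.AT2"))   # e.g. 500 records
with open("tasks.txt", "w") as tasks:
    for i, record in enumerate(records):
        run_dir = Path("runs") / f"run_{i:03d}"
        run_dir.mkdir(parents=True, exist_ok=True)
        tasks.write(f"cd {run_dir} && python ../../run_model.py --record ../../{record}\n")
```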
Tightly coupled parallel (MPI). One large model split across many cores that must communicate during execution. Each core (rank) works on a different part of the problem and exchanges data with its neighbors through MPI. A multi-node OpenFOAM simulation, ADCIRC storm surge model, or OpenSees MP domain-decomposed analysis all fall here. See Running HPC Jobs for MPI details.
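The sketch below illustrates the pattern with mpi4py: every rank owns a slice of the work and the ranks must combine partial results during execution. The array contents are arbitrary, and ibrun is TACC's MPI launch command.

```python
# Tightly coupled pattern in miniature: ranks cooperate on one answer.
# Launch with TACC's MPI wrapper, e.g.:  ibrun python mpi_sum.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id within the job
size = comm.Get_size()      # total number of MPI processes

# Each rank works on its own chunk of a (made-up) global array.
local = np.arange(rank * 1000, (rank + 1) * 1000, dtype=float)
local_sum = local.sum()

# Communication step: without it, no rank knows the global result.
global_sum = comm.allreduce(local_sum, op=MPI.SUM)
if rank == 0:
    print(f"global sum across {size} ranks: {global_sum}")
```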
These are not the only patterns — some workloads are memory-bound, GPU-accelerated, or combine both approaches — but the distinction between “many independent runs” and “one big coupled run” drives most decisions about nodes, cores, and which application to use.
| Workload pattern | How it scales | Example |
|---|---|---|
| Many independent runs | Add more tasks to a single allocation (PyLauncher) | Fragility study with 500 ground-motion records |
| One large model | Divide the domain across cores with MPI | 3D nonlinear structural analysis, ADCIRC storm surge |
| Memory-bound analysis | Use nodes with more RAM or fewer cores per node | Large stiffness matrix assembly |
| GPU-accelerated work | Use GPU queues on Stampede3 or Lonestar6 | ML training, dense linear algebra |
Where to go next¶
| I want to... | Read |
|---|---|
| Understand JupyterHub, HPC systems, queues, and allocations | Compute Environments |
| Understand storage areas, paths, and file staging | Storage and File Management |
| Submit a job through the portal | Portal Submission |
| Submit a job with dapi (serial or parallel) | Running HPC Jobs |
| Figure out why my job failed | Debugging Failed Jobs |
| Run hundreds of independent simulations | Parameter Sweeps |
| See what applications are available | DesignSafe Applications |
| Understand Tapis internals or build a custom app | Advanced Topics |