Research data and compute hardware are co-located at TACC. A ground-motion database in CommunityData can be referenced directly from a simulation job without downloading it to a laptop and re-uploading it. This co-location is one of DesignSafe's most important advantages, but taking advantage of it requires understanding where files live and how they move between environments.
Storage areas¶
| Storage Area | Backed Up | Accessible From | Best For |
|---|---|---|---|
| MyData | Yes | Data Depot, JupyterHub, VMs, Tapis | Personal files: scripts, inputs, outputs |
| MyProjects | Yes | Data Depot, JupyterHub, VMs, Tapis | Team collaboration, curation, publication |
| CommunityData | Yes | Data Depot, JupyterHub, VMs, Tapis | Public shared datasets (read-only) |
| NHERI-Published | Yes | Data Depot, JupyterHub, VMs, Tapis | Archived NHERI datasets with DOIs (read-only) |
| NEES | Yes | Data Depot, JupyterHub, VMs, Tapis | Legacy NEES datasets (read-only) |
| Work | No | Compute nodes, JupyterHub, Data Depot | Active HPC job I/O, staging large inputs |
| Scratch | No (purged) | Compute nodes only | Temporary high-speed storage during jobs |
MyData, MyProjects, CommunityData, NHERI-Published, and NEES all live on Corral, TACC’s networked storage with automatic backups. This is the long-term home for research data. Performance is moderate because access goes over the network.
Work and Scratch live on Lustre, a parallel filesystem that stripes files across many disks simultaneously. This makes large reads and writes significantly faster than Corral. Work and Scratch are not backed up. Use them for staging large inputs and holding outputs temporarily. Always copy important results back to MyData or MyProjects. The performance difference is especially noticeable for jobs that read or write many files, or that perform frequent I/O during execution.
Node-local storage (/tmp) on each compute node is the fastest option but files disappear when the job ends. Use it for scratch I/O during computation. See Running HPC Jobs for details on /tmp sizes and usage patterns.
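As a rough sketch, a job script might stage a large input from Work to node-local /tmp, run against the local copy, and copy results back to Work before the job ends. The directory and file names below are hypothetical placeholders, not part of any DesignSafe application:

```python
import os
import shutil

# Sketch only: assumes the script runs on a compute node where $WORK is set
# and /tmp is the node-local storage described above.
work = os.environ["WORK"]
local_dir = "/tmp/site-response"          # hypothetical scratch directory
os.makedirs(local_dir, exist_ok=True)

# Stage a large input from Work (Lustre) to node-local storage
shutil.copy(os.path.join(work, "inputs", "motions.tar.gz"), local_dir)  # hypothetical file

# ... run the analysis against the copy in /tmp ...

# Copy results back before the job ends; /tmp is cleared afterwards
os.makedirs(os.path.join(work, "outputs"), exist_ok=True)
shutil.copy(os.path.join(local_dir, "results.out"),        # hypothetical file
            os.path.join(work, "outputs", "results.out"))
```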
Prepare in Corral (MyData/MyProjects)
→ Stage to Work for large datasets
→ Run jobs (use /tmp for scratch I/O)
→ Archive results back to Corral

Paths across environments¶
The same storage area appears at different paths depending on the environment.
JupyterHub paths¶
| Data Depot Section | JupyterHub Directory | Path |
|---|---|---|
| My Data | MyData | /home/jupyter/MyData/ |
| My Projects | MyProjects | /home/jupyter/MyProjects/PRJ-XXXX/ |
| Community Data | CommunityData | /home/jupyter/CommunityData/ |
| Published | NHERI-Published | /home/jupyter/NHERI-Published/PRJ-XXXX/ |
| Published (NEES) | NEES | /home/jupyter/NEES/ |
| Work | Work | /home/jupyter/Work/stampede3/ (HPC Native sessions only) |
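For example, a notebook cell can read a public dataset directly from CommunityData and write results to MyData using these paths. The dataset folder name below is a hypothetical placeholder:

```python
from pathlib import Path

# In JupyterHub the storage areas are mounted under /home/jupyter/.
community = Path("/home/jupyter/CommunityData")
mydata = Path("/home/jupyter/MyData")

# Browse a public dataset (read-only); "example-dataset" is a hypothetical folder name
for f in sorted((community / "example-dataset").glob("*.csv")):
    print(f.name)

# Write results under MyData, which is backed up on Corral
(mydata / "results").mkdir(parents=True, exist_ok=True)
```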
HPC system paths¶
Each TACC system has its own $HOME and $SCRATCH filesystems. Only $WORK (the Stockyard global shared filesystem) is accessible across systems. The $WORK path includes the system name as a subdirectory.
Always use the environment variables ($HOME, $WORK, $SCRATCH) rather than hardcoded paths, since the underlying mount points can change. The examples below show typical paths, but echo $WORK will always give the correct current path.
| System | Storage Area | Typical Path | Environment Variable |
|---|---|---|---|
| Stampede3 | Home | /home1/<groupid>/<username>/ | $HOME |
| Stampede3 | Work | /work/<groupid>/<username>/stampede3/ | $WORK |
| Stampede3 | Scratch | /scratch/<groupid>/<username>/ | $SCRATCH |
| Frontera | Home | /home1/<groupid>/<username>/ | $HOME |
| Frontera | Work | /work/<groupid>/<username>/frontera/ | $WORK |
| Frontera | Scratch | use $SCRATCH (mount point varies) | $SCRATCH |
| Lonestar6 | Home | /home1/<groupid>/<username>/ | $HOME |
| Lonestar6 | Work | /work/<groupid>/<username>/ls6/ | $WORK |
| Lonestar6 | Scratch | /scratch/<groupid>/<username>/ | $SCRATCH |
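For example, a Python script on any of these systems can build its run directories from the environment variables instead of hardcoding mount points. The project subdirectory below is a hypothetical example:

```python
import os

# Read the variables at runtime; mount points may differ from the table above
work = os.environ["WORK"]
scratch = os.environ["SCRATCH"]

# "fragility-study" is a hypothetical project folder
staging_dir = os.path.join(work, "fragility-study", "inputs")
run_dir = os.path.join(scratch, "fragility-study", "run-001")

os.makedirs(run_dir, exist_ok=True)
print(staging_dir, run_dir)
```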
Tapis job directory¶
When Tapis runs a job, all input files are staged into a single working directory on the compute system, available as $TAPIS_JOB_WORKDIR. Every compute node in a multi-node job can see the same staged files through the shared parallel filesystem — inputs are not copied separately to each node.
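For example, a Python job script can locate its staged inputs through that variable. The input file name below is a hypothetical placeholder:

```python
import os
from pathlib import Path

# Tapis sets TAPIS_JOB_WORKDIR to the job's staged working directory
workdir = Path(os.environ["TAPIS_JOB_WORKDIR"])

# "motions.csv" is a hypothetical staged input; every node sees the same path
motions = workdir / "motions.csv"
print(motions, motions.exists())
```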
dapi path translation¶
dapi handles path translation automatically. Use DesignSafe paths (as seen in the Data Depot) and let dapi convert them to Tapis URIs:
```python
from dapi import DSClient

ds = DSClient()

# Convert a DesignSafe path to a Tapis URI for job submission
input_uri = ds.files.to_uri("/MyData/opensees/site-response/")

# Convert back
path = ds.files.to_path(input_uri)
```

Common path mappings (dapi translates these automatically):
| DesignSafe Path | Tapis URI |
|---|---|
| /MyData/folder/ | tapis://designsafe.storage.default/username/folder/ |
| /projects/PRJ-XXXX/folder/ | tapis://project-<uuid>/folder/ |
| /CommunityData/folder/ | tapis://designsafe.storage.community/folder/ |
For projects, dapi searches Tapis to resolve the PRJ number to the project’s UUID-based system ID (e.g., project-766bbc0e-a536-...).
NHERI-Published and NEES are read-only and not typically used as job inputs. Their Tapis system IDs are designsafe.storage.published and nees.public.
File operations with dapi¶
```python
# List the contents of a directory
ds.files.list("/MyData/results/")

# Upload a local file to a DesignSafe directory
ds.files.upload("/MyData/inputs/", "local_file.csv")

# Download a file from DesignSafe storage to the local machine
ds.files.download("/MyData/results/output.csv", "local_output.csv")
```

File staging and transfer¶
When a job is submitted, Tapis automatically stages input files to the execution system before the job starts and archives output back to DesignSafe storage after completion. There is no manual file transfer step.
Bundle small files. A directory with 1,000 small CSV files transfers much more slowly than a single tar.gz archive. Bundle inputs before staging, as in the sketch below.
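A minimal sketch using Python's tarfile module; the directory and archive names are hypothetical:

```python
import tarfile
from pathlib import Path

# Bundle a directory of many small CSV files into one archive before staging
src = Path("ground_motions")                     # hypothetical input directory
with tarfile.open("ground_motions.tar.gz", "w:gz") as tar:
    tar.add(src, arcname=src.name)

# On the compute side, extract once into the job working directory:
# with tarfile.open("ground_motions.tar.gz") as tar:
#     tar.extractall()
```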
Keep shared data in Work. If multiple jobs reuse the same input data (e.g., 500 ground-motion records for a fragility study), keep it in Work to avoid re-staging for every submission.
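One way to do this, sketched from a JupyterHub HPC Native session where Work is mounted; the folder names are hypothetical:

```python
import shutil
from pathlib import Path

# Copy the shared ground-motion set from MyData (Corral) into Work once,
# then point subsequent jobs at the copy in Work.
src = Path("/home/jupyter/MyData/ground_motions")            # hypothetical folder
dst = Path("/home/jupyter/Work/stampede3/ground_motions")    # hypothetical folder

if not dst.exists():
    shutil.copytree(src, dst)
```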
Avoid running against Corral. Large datasets benefit from the higher I/O bandwidth of Work and Scratch. Running jobs directly against MyData (Corral) is slower and not recommended for production simulations.
For transferring data to and from DesignSafe using Globus, Cyberduck, or command-line tools (scp/rsync), see the DesignSafe Data Transfer Guide.