
Research data and compute hardware are co-located at TACC. A ground-motion database in CommunityData can be referenced directly from a simulation job without downloading it to a laptop and re-uploading it. This co-location is one of DesignSafe's most important advantages, but taking advantage of it requires understanding where files live and how they move between environments.

## Storage areas

| Storage Area | Backed Up | Accessible From | Best For |
|---|---|---|---|
| MyData | Yes | Data Depot, JupyterHub, VMs, Tapis | Personal files: scripts, inputs, outputs |
| MyProjects | Yes | Data Depot, JupyterHub, VMs, Tapis | Team collaboration, curation, publication |
| CommunityData | Yes | Data Depot, JupyterHub, VMs, Tapis | Public shared datasets (read-only) |
| NHERI-Published | Yes | Data Depot, JupyterHub, VMs, Tapis | Archived NHERI datasets with DOIs (read-only) |
| NEES | Yes | Data Depot, JupyterHub, VMs, Tapis | Legacy NEES datasets (read-only) |
| Work | No | Compute nodes, JupyterHub, Data Depot | Active HPC job I/O, staging large inputs |
| Scratch | No (purged) | Compute nodes only | Temporary high-speed storage during jobs |

MyData, MyProjects, CommunityData, NHERI-Published, and NEES all live on Corral, TACC’s networked storage with automatic backups. This is the long-term home for research data. Performance is moderate because access goes over the network.

Work and Scratch live on Lustre, a parallel filesystem that stripes files across many disks simultaneously. This makes large reads and writes significantly faster than Corral. Work and Scratch are not backed up. Use them for staging large inputs and holding outputs temporarily. Always copy important results back to MyData or MyProjects. The performance difference is especially noticeable for jobs that read or write many files, or that perform frequent I/O during execution.

Node-local storage (/tmp) on each compute node is the fastest option but files disappear when the job ends. Use it for scratch I/O during computation. See Running HPC Jobs for details on /tmp sizes and usage patterns.

Prepare in Corral (MyData/MyProjects)
    → Stage to Work for large datasets
    → Run jobs (use /tmp for scratch I/O)
    → Archive results back to Corral
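Assuming a job script running on a system where both filesystems are mounted, the stage-and-archive steps of this lifecycle can be sketched in Python. The helper names and paths below are illustrative, not a DesignSafe API:

```python
import shutil
from pathlib import Path

def stage_inputs(corral_dir: str, work_dir: str) -> Path:
    """Copy an input directory from Corral (MyData/MyProjects) into Work before a job."""
    dest = Path(work_dir) / Path(corral_dir).name
    shutil.copytree(corral_dir, dest, dirs_exist_ok=True)
    return dest

def archive_results(results_dir: str, corral_dir: str) -> Path:
    """Copy a results directory from Work back to Corral after the job finishes."""
    dest = Path(corral_dir) / Path(results_dir).name
    shutil.copytree(results_dir, dest, dirs_exist_ok=True)
    return dest
```

The key point is the direction of the copies: durable storage is the source before the job and the destination after it, while Work only ever holds a disposable working copy.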

## Paths across environments

The same storage area appears at different paths depending on the environment.

### JupyterHub paths

| Data Depot Section | JupyterHub Directory | Path |
|---|---|---|
| My Data | MyData | `/home/jupyter/MyData/` |
| My Projects | MyProjects | `/home/jupyter/MyProjects/PRJ-XXXX/` |
| Community Data | CommunityData | `/home/jupyter/CommunityData/` |
| Published | NHERI-Published | `/home/jupyter/NHERI-Published/PRJ-XXXX/` |
| Published (NEES) | NEES | `/home/jupyter/NEES/` |
| Work | Work | `/home/jupyter/Work/stampede3/` (HPC Native sessions only) |

### HPC system paths

Each TACC system has its own $HOME and $SCRATCH filesystems. Only $WORK (the Stockyard global shared filesystem) is accessible across systems. The $WORK path includes the system name as a subdirectory.

Always use the environment variables (`$HOME`, `$WORK`, `$SCRATCH`) rather than hardcoded paths, since the underlying mount points can change. The examples below show typical paths, but `echo $WORK` will always give the correct current path.

| System | Storage Area | Typical Path | Environment Variable |
|---|---|---|---|
| Stampede3 | Home | `/home1/<groupid>/<username>/` | `$HOME` |
| Stampede3 | Work | `/work/<groupid>/<username>/stampede3/` | `$WORK` |
| Stampede3 | Scratch | `/scratch/<groupid>/<username>/` | `$SCRATCH` |
| Frontera | Home | `/home1/<groupid>/<username>/` | `$HOME` |
| Frontera | Work | `/work/<groupid>/<username>/frontera/` | `$WORK` |
| Frontera | Scratch | use `$SCRATCH` (mount point varies) | `$SCRATCH` |
| Lonestar6 | Home | `/home1/<groupid>/<username>/` | `$HOME` |
| Lonestar6 | Work | `/work/<groupid>/<username>/ls6/` | `$WORK` |
| Lonestar6 | Scratch | `/scratch/<groupid>/<username>/` | `$SCRATCH` |
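In a Python job script, the same advice applies: read the variables instead of hardcoding mounts. A minimal sketch (the helper function is this example's own, not a TACC utility):

```python
import os
from pathlib import Path

def tacc_dir(var: str) -> Path:
    """Resolve a TACC storage root (HOME, WORK, SCRATCH) from the environment."""
    value = os.environ.get(var)
    if value is None:
        raise RuntimeError(f"${var} is not set; are you running on a TACC system?")
    return Path(value)

# e.g. build a staging path under $WORK:
# stage_dir = tacc_dir("WORK") / "my-study" / "inputs"
```

Failing loudly when the variable is missing is deliberate: a script that silently falls back to a hardcoded mount will break when it moves between systems.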

### Tapis job directory

When Tapis runs a job, all input files are staged into a single working directory on the compute system, available as $TAPIS_JOB_WORKDIR. Every compute node in a multi-node job can see the same staged files through the shared parallel filesystem — inputs are not copied separately to each node.
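A script can locate the staged directory the same way on every node. A minimal sketch (the fallback to the current directory for local test runs is this example's choice, not Tapis behavior):

```python
import os
from pathlib import Path

def job_workdir() -> Path:
    """Locate the Tapis-staged working directory; fall back to CWD for local runs."""
    return Path(os.environ.get("TAPIS_JOB_WORKDIR", os.getcwd()))

# All staged inputs sit under this directory on every node, e.g.:
# records = sorted(job_workdir().glob("*.csv"))
```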

## dapi path translation

dapi handles path translation automatically. Use DesignSafe paths (as seen in the Data Depot) and let dapi convert them to Tapis URIs:

```python
from dapi import DSClient
ds = DSClient()

# Convert a DesignSafe path to a Tapis URI for job submission
input_uri = ds.files.to_uri("/MyData/opensees/site-response/")

# Convert back
path = ds.files.to_path(input_uri)
```

Common path mappings (dapi translates these automatically):

| DesignSafe Path | Tapis URI |
|---|---|
| `/MyData/folder/` | `tapis://designsafe.storage.default/username/folder/` |
| `/projects/PRJ-XXXX/folder/` | `tapis://project-<uuid>/folder/` |
| `/CommunityData/folder/` | `tapis://designsafe.storage.community/folder/` |

For projects, dapi searches Tapis to resolve the PRJ number to the project’s UUID-based system ID (e.g., project-766bbc0e-a536-...).
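For illustration, the MyData and CommunityData rows of the table are plain string rewrites. The sketch below mirrors them; the function is hypothetical and not part of dapi, and the project mapping is omitted because it requires a Tapis lookup to resolve the UUID:

```python
def designsafe_to_tapis(path: str, username: str) -> str:
    """Rewrite a DesignSafe Data Depot path as a Tapis URI (MyData/CommunityData only)."""
    if path.startswith("/MyData/"):
        # MyData lives under the user's directory on the default storage system
        return f"tapis://designsafe.storage.default/{username}/" + path[len("/MyData/"):]
    if path.startswith("/CommunityData/"):
        # CommunityData is a shared system with no per-user prefix
        return "tapis://designsafe.storage.community/" + path[len("/CommunityData/"):]
    raise ValueError(f"no static mapping for {path}; use dapi's to_uri")
```

In practice, prefer `ds.files.to_uri`, which also handles the project lookup.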

NHERI-Published and NEES are read-only and not typically used as job inputs. Their Tapis system IDs are designsafe.storage.published and nees.public.

## File operations with dapi

```python
# List, upload, and download using DesignSafe paths
ds.files.list("/MyData/results/")
ds.files.upload("/MyData/inputs/", "local_file.csv")
ds.files.download("/MyData/results/output.csv", "local_output.csv")
```

## File staging and transfer

When a job is submitted, Tapis automatically stages input files to the execution system before the job starts and archives output back to DesignSafe storage after completion. There is no manual file transfer step.

**Bundle small files.** A directory with 1,000 small CSV files transfers much more slowly than a single tar.gz archive. Bundle inputs before staging.

**Keep shared data in Work.** If multiple jobs reuse the same input data (e.g., 500 ground-motion records for a fragility study), keep it in Work to avoid re-staging for every submission.

**Avoid running against Corral.** Large datasets benefit from the higher I/O bandwidth of Work and Scratch. Running jobs directly against MyData (Corral) is slower and not recommended for production simulations.
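The bundling tip can be applied with Python's standard tarfile module before submission. A minimal sketch (directory and archive names are placeholders):

```python
import tarfile
from pathlib import Path

def bundle_inputs(input_dir: str, archive_path: str) -> Path:
    """Pack a directory of many small files into one tar.gz for faster staging."""
    archive = Path(archive_path)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps the archive rooted at the directory name, not its full path
        tar.add(input_dir, arcname=Path(input_dir).name)
    return archive
```

The job script then extracts the archive once in `$TAPIS_JOB_WORKDIR` instead of staging each file individually.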

For transferring data to and from DesignSafe using Globus, Cyberduck, or command-line tools (scp/rsync), see the DesignSafe Data Transfer Guide.