Loading Data from S3 to FSx and Using S3 as Backup
This section describes how to load data from Amazon S3 into an FSx for Lustre file system using Data Repository Associations (DRA), and presents a recommended strategy for using S3 as durable storage to back up data when using FSx scratch file systems.
Background and Design Philosophy
FSx for Lustre scratch file systems provide very high I/O performance at low cost, but they are not durable storage:
Scratch FSx does not replicate data
Scratch FSx does not support backups
Data loss is possible in the event of service or hardware failure
Therefore:
FSx scratch must never be treated as the only copy of important data
Amazon S3 should be used as the authoritative source and/or backup
The recommended design is:
Input data: stored durably in S3, imported into FSx for fast access
Output data: written to FSx during computation and exported back to S3
Data Repository Associations (DRA)
A Data Repository Association (DRA) links a directory inside FSx to an S3 bucket or prefix.
For example:
FSx path /ExtData ↔ s3://my-bucket/ExtData/
FSx path /output ↔ s3://my-bucket/output/
After mounting FSx locally at /fsx_input, these appear as:
/fsx_input/ExtData
/fsx_input/output
A file system may have multiple DRAs, provided their FSx paths do not overlap.
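As an illustration of the mapping above, the following shell sketch shows how a DRA like the /ExtData example might be created with the AWS CLI, and how its FSx path translates to a path on a mounted client. The file system ID fs-xxxxxxxx and the bucket my-bucket are placeholders, and the helper names create_dra and local_path are hypothetical; check the flags against your installed CLI version before relying on them.

```shell
# Hypothetical helper: create a DRA linking an FSx path to an S3 prefix.
# Arguments: file-system ID, FSx path, S3 URI.
create_dra() {
  aws fsx create-data-repository-association \
    --file-system-id "$1" \
    --file-system-path "$2" \
    --data-repository-path "$3" \
    --s3 "AutoImportPolicy={Events=[NEW,CHANGED,DELETED]}"
}

# Hypothetical helper: local path of a DRA directory on a mounted client.
# Arguments: local mount point, FSx path of the DRA.
local_path() {
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Example (not run here):
#   create_dra fs-xxxxxxxx /ExtData s3://my-bucket/ExtData/
#   local_path /fsx_input /ExtData
```

With the mount point used in this section, local_path /fsx_input /ExtData prints /fsx_input/ExtData.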
How Data Is Loaded from S3 to FSx
It is important to understand that creating a DRA does not immediately copy all file contents from S3.
By default:
FSx does not proactively import metadata or file contents
Metadata and data are imported lazily when files are accessed
This behavior saves time and storage for large datasets
There are three ways to populate data from S3 to FSx.
Method 1: On-Demand Loading (Default)
If auto-import policies are enabled on the DRA:
Files and directories appear immediately under FSx
File contents are fetched automatically when first accessed
Example:
ls /fsx_input/ExtData
cat /fsx_input/ExtData/example.nc
This method is sufficient for most workflows and requires no manual action.
Method 2: Batch Metadata Import (Recommended After Creation)
To ensure the full directory structure appears immediately, run a metadata import task.
Auto-import policies are enabled by default. You can confirm this as follows:
Checking Auto-Import Policies via AWS Console
Go to AWS Console → FSx
Select your FSx for Lustre file system
Open the Data repository tab
In the Data repository associations table, check the Import policy column for each DRA
Possible values include:
NEW, CHANGED, DELETED: Auto-import is enabled. FSx will automatically detect new, modified, and deleted objects in the associated S3 path.
None or an empty value: Auto-import is not enabled. FSx will not automatically notice changes in S3.
Checking Auto-Import Policies via AWS CLI
You can also verify auto-import policies using the AWS CLI:
aws fsx describe-data-repository-associations
For a more readable summary:
aws fsx describe-data-repository-associations \
  --query "Associations[*].{ \
    FSxPath:FileSystemPath, \
    ImportPolicy:S3.AutoImportPolicy.Events, \
    ExportPolicy:S3.AutoExportPolicy.Events \
  }"
Example output:
[
  {
    "FSxPath": "/ExtData",
    "ImportPolicy": ["NEW", "CHANGED", "DELETED"],
    "ExportPolicy": null
  },
  {
    "FSxPath": "/blended-tropomi",
    "ImportPolicy": ["NEW", "CHANGED", "DELETED"],
    "ExportPolicy": ["NEW", "CHANGED", "DELETED"]
  }
]
Interpretation:
ImportPolicy containing one or more events (NEW, CHANGED, DELETED) indicates that auto-import is enabled
ImportPolicy = null indicates that auto-import is disabled
ExportPolicy shows whether automatic export (FSx → S3) is enabled
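The interpretation rules above can be applied to every DRA in one pass. The sketch below asks the CLI for the number of auto-import events per association and labels each FSx path as enabled or disabled. The helper names auto_import_status and report_import_policies are hypothetical, and the JMESPath expression is an assumption about how a missing policy is returned (as null), so treat it as a starting point rather than a definitive check.

```shell
# Hypothetical helper: interpret the number of auto-import events on a DRA.
# A positive count means enabled; 0, empty, or a non-numeric value means disabled.
auto_import_status() {
  if [ "${1:-0}" -gt 0 ] 2>/dev/null; then
    echo enabled
  else
    echo disabled
  fi
}

# Sketch: print one line per DRA with its auto-import status.
# A null Events list is replaced by an empty list so its length is 0.
report_import_policies() {
  aws fsx describe-data-repository-associations \
    --query 'Associations[*].[FileSystemPath, length(S3.AutoImportPolicy.Events || `[]`)]' \
    --output text |
  while read -r fsx_path event_count; do
    printf '%s\t%s\n' "$fsx_path" "$(auto_import_status "$event_count")"
  done
}
```

Calling report_import_policies on a client with AWS credentials would list each FSx path next to its status.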
What Auto-Import Policies Do (and Do Not Do)
Auto-import policies provide the following behavior:
Changes in S3 are automatically detected
Directory structure and metadata in FSx are kept in sync
File contents are fetched on demand when accessed
Auto-import policies do not guarantee that:
All file contents have already been copied to FSx
FSx remains usable if the S3 bucket is deleted
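To fully copy all data from S3 to FSx, an explicit IMPORT_FROM_REPOSITORY task must be executed (see Method 3).
Example of a batch metadata import task:
aws fsx create-data-repository-task \
  --type IMPORT_METADATA_FROM_REPOSITORY \
  --file-system-id fs-xxxxxxxx \
  --paths s3://my-bucket/ExtData/ \
  --report Enabled=false
For import tasks, --paths takes the S3 bucket or prefix linked by the DRA, and --report is a required option. Repeat for other DRA paths as needed.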
This imports only metadata, not file contents.
Method 3: Full Data Import (Required Before Deleting S3)
If you intend to delete the associated S3 bucket or make FSx fully self-contained, you must explicitly import all data.
Example:
aws fsx create-data-repository-task \
--type IMPORT_FROM_REPOSITORY \
--file-system-id fs-xxxxxxxx \
--paths /ExtData
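This command is intended to copy all file contents from S3 to FSx. The FSx API reference lists IMPORT_METADATA_FROM_REPOSITORY, EXPORT_TO_REPOSITORY, RELEASE_DATA_FROM_FILESYSTEM, and AUTO_RELEASE_DATA as task types, so if your CLI rejects IMPORT_FROM_REPOSITORY, preload file contents from a mounted client with lfs hsm_restore instead.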
Warning
Do not delete the S3 bucket until all IMPORT_FROM_REPOSITORY tasks complete successfully.
DRA availability alone does not guarantee data has been copied.
Monitor progress with:
aws fsx describe-data-repository-tasks
Proceed only when every task reports Lifecycle = SUCCEEDED. Other lifecycle values are PENDING, EXECUTING, FAILED, CANCELING, and CANCELED.
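Checking task status by hand is easy to get wrong for long-running tasks, so a small polling loop can help. The sketch below waits for a single task and stops on a terminal state; the FSx API reports SUCCEEDED for a finished task and FAILED or CANCELED for an unsuccessful one. The helper names task_action and wait_for_task and the 30-second interval are hypothetical choices.

```shell
# Hypothetical helper: map a task Lifecycle value to an action.
task_action() {
  case "$1" in
    SUCCEEDED)       echo ready ;;
    FAILED|CANCELED) echo failed ;;
    *)               echo waiting ;;   # PENDING, EXECUTING, CANCELING, ...
  esac
}

# Sketch: poll one task until it reaches a terminal state.
# Argument: task ID. Returns 0 on success, 1 on failure, 2 if the CLI call fails.
wait_for_task() {
  while :; do
    lifecycle=$(aws fsx describe-data-repository-tasks \
      --task-ids "$1" \
      --query 'DataRepositoryTasks[0].Lifecycle' --output text) || return 2
    case "$(task_action "$lifecycle")" in
      ready)  return 0 ;;
      failed) echo "task $1 ended as $lifecycle" >&2; return 1 ;;
    esac
    sleep 30
  done
}
```

A script could then guard any destructive step with wait_for_task followed by the task ID, and continue only if it returns 0.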
FSx Storage Capacity Considerations
When performing a full import:
FSx storage capacity must be greater than or equal to the total logical size of the data in S3 (adding some headroom is recommended)
Do not rely on compression to reduce required capacity
Imports will fail if FSx runs out of space
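To make the capacity rule concrete, the sketch below estimates a file system size from the logical size of the S3 data plus a headroom percentage. It assumes the SCRATCH_2 sizing rule of 1200 GiB or a multiple of 2400 GiB, which should be verified against the current FSx documentation for your deployment type. The helper names scratch2_capacity_gib and s3_total_bytes are hypothetical.

```shell
# Hypothetical helper: smallest capacity (GiB) that holds the data plus headroom.
# Assumes SCRATCH_2 sizing: 1200 GiB, or a multiple of 2400 GiB.
# Arguments: logical size in bytes, headroom in percent.
scratch2_capacity_gib() {
  gib=1073741824
  # Ceiling of bytes * (100 + headroom) / 100, expressed in GiB.
  needed=$(( ($1 * (100 + $2) + 100 * gib - 1) / (100 * gib) ))
  if [ "$needed" -le 1200 ]; then
    echo 1200
  else
    echo $(( (needed + 2399) / 2400 * 2400 ))
  fi
}

# Sketch: total logical size in bytes under an S3 prefix.
# Argument: S3 URI, for example s3://my-bucket/ExtData/
s3_total_bytes() {
  aws s3 ls "$1" --recursive --summarize | awk '/Total Size:/ {print $3}'
}
```

For example, 1 TiB of data (1099511627776 bytes) with 20% headroom needs about 1229 GiB, which rounds up to 2400 GiB under this rule.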
Recommended Backup Strategy Using S3
The recommended and safe pattern for FSx scratch usage is:
Input Data (Read-Only)
Authoritative copy: S3
Working copy: FSx scratch
DRA policy:
Import: NEW, CHANGED, DELETED
Export: disabled
If FSx fails:
Recreate FSx
Re-import from S3
No data loss
Output Data (Write-Back)
Working location: FSx scratch
Backup location: S3 (separate bucket or prefix)
DRA policy:
Export enabled (auto-export or manual)
Output data is continuously or periodically written to S3
If FSx fails:
Completed output already exists in S3
Only in-progress work may be lost
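When auto-export is not enabled, output can be written back with a manual export task. The sketch below starts one for a single directory. Export tasks take paths relative to the file system mount point, so a small helper derives that relative path first. The helper names export_rel_path and export_to_s3 are hypothetical, and fs-xxxxxxxx is a placeholder.

```shell
# Hypothetical helper: path relative to the mount point, as export tasks expect.
# Arguments: local mount point, absolute local path under that mount point.
export_rel_path() {
  mount="${1%/}"
  case "$2" in
    "$mount"/*) printf '%s\n' "${2#"$mount"/}" ;;
    *) echo "path is not under $mount" >&2; return 1 ;;
  esac
}

# Sketch: start a manual export of one directory to the linked S3 prefix.
# Arguments: file-system ID, local mount point, local directory to export.
export_to_s3() {
  rel=$(export_rel_path "$2" "$3") || return 1
  aws fsx create-data-repository-task \
    --type EXPORT_TO_REPOSITORY \
    --file-system-id "$1" \
    --paths "$rel" \
    --report Enabled=false
}

# Example (not run here):
#   export_to_s3 fs-xxxxxxxx /fsx_input /fsx_input/output
```

Running such an export at the end of each job, or on a schedule, limits the loss after an FSx failure to work written since the last successful export.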
Note
S3 provides high durability at low cost and should always be used to store important or irreplaceable data when using FSx scratch.
When Is It Safe to Delete the S3 Bucket?
It is safe to delete the S3 bucket only if all of the following are true:
Full IMPORT_FROM_REPOSITORY tasks have completed
FSx storage usage is stable (verified with lfs df)
Files remain readable after cache drops
You understand that FSx scratch provides no recovery or backups
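One way to check the first three conditions from a mounted client is to look for files whose contents are still only in S3. The sketch below counts files that lfs hsm_state reports as released, meaning their data is not currently on FSx. It assumes that released files carry the word "released" in the lfs hsm_state output; the helper names count_released and released_under are hypothetical.

```shell
# Hypothetical helper: count files whose contents are not on FSx.
# Reads `lfs hsm_state` output on stdin, one file per line.
# Assumes released files are flagged with the word "released".
count_released() {
  grep -c ' released' || true
}

# Sketch: inspect every file under a directory and count the released ones.
# Argument: local directory on the mounted file system.
released_under() {
  find "$1" -type f -exec lfs hsm_state {} + | count_released
}

# Example (not run here):
#   released_under /fsx_input/ExtData
```

A result of 0 suggests every file under the directory has its contents on FSx; any other value means some files would become unreadable if the bucket were deleted.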
In most cases, deleting the S3 bucket is not recommended. Keeping S3 as the durable source of truth is safer and usually cheaper.
Summary
FSx scratch provides fast but non-durable storage
DRA does not automatically copy all file contents
Use explicit import tasks to fully populate FSx if needed
Always use S3 as the durable source and/or backup
Never treat FSx scratch as the only copy of important data