Data transfer between FSx and S3 bucket
This section describes two approaches for transferring data between an Amazon FSx for Lustre file system and an Amazon S3 bucket.
Launch an EC2 instance for data transfer
In this approach, data transfer is performed through a dedicated EC2 instance. This instance is used only for data movement, not for computation.
Launch EC2 instance
Launch a single EC2 instance (not a ParallelCluster)
The instance must be in the same VPC as the FSx file system
Mount the FSx file system on the instance
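The mount step can be sketched as follows, assuming the Lustre client is already installed on the instance. The helper name `fsx_mount_cmd` is hypothetical, and the DNS name and mount name are placeholders that must be read from the file system's page in the FSx console. The helper only prints the commands so they can be reviewed before being run.

```shell
# fsx_mount_cmd DNS_NAME MOUNT_NAME [MOUNT_POINT]
# Prints (does not run) the commands that mount an FSx for Lustre file system.
# DNS_NAME and MOUNT_NAME are placeholders taken from the FSx console.
# MOUNT_POINT defaults to /fsx, the path used in the transfer commands here.
fsx_mount_cmd() {
    dns_name="$1"
    mount_name="$2"
    mount_point="${3:-/fsx}"
    printf 'sudo mkdir -p %s\n' "$mount_point"
    printf 'sudo mount -t lustre -o relatime,flock %s@tcp:/%s %s\n' \
        "$dns_name" "$mount_name" "$mount_point"
}
```

Calling `fsx_mount_cmd <dns-name> <mount-name>` prints two lines that can be pasted into the shell once the values look right.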
Data transfer commands
Transfer data from S3 to FSx:
aws s3 sync s3://<bucket>/input /fsx/input
Transfer data from FSx to S3:
aws s3 sync /fsx/output s3://<bucket>/output
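Before moving a large dataset, it can help to preview what a sync would do. The sketch below wraps `aws s3 sync` with its `--dryrun` option; the function name `transfer` and the `APPLY` variable are hypothetical conveniences, not part of the AWS CLI.

```shell
# transfer SRC DST
# Previews the sync with --dryrun; performs the real copy only when APPLY=1.
transfer() {
    src="$1"
    dst="$2"
    # --dryrun prints the planned operations without copying anything
    aws s3 sync "$src" "$dst" --dryrun || return 1
    if [ "${APPLY:-0}" = "1" ]; then
        aws s3 sync "$src" "$dst"
    fi
}
```

For example, `transfer s3://<bucket>/input /fsx/input` only lists the planned copies, and setting `APPLY=1` beforehand runs the actual transfer after the preview.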
Terminate instance
After data transfer is complete, terminate the EC2 instance to avoid unnecessary charges.
Instance type recommendation: A compute-optimized instance type (e.g. c6i.large) is recommended for data transfer tasks.
Python tools for data transfer: A Python package such as boto3 can be used to download data from an S3 bucket. Refer to the detailed tutorial for downloading GEOS-Chem input data on AWS.
Example scripts for downloading data on AWS.
Data repository association (DRA)
FSx for Lustre Data Repository Associations (DRA) provide significantly
higher performance than aws s3 sync for transferring data between FSx
and S3.
With DRA, data transfer is handled natively by the AWS service rather than through an EC2 instance.
Requirements
The FSx file system and the S3 bucket reside in the same AWS account (account ID, not IAM user)
The FSx file system and the S3 bucket are in the same region
The permissions on the S3 bucket to be linked allow access from FSx
If these requirements are not met, DRA cannot be used or data cannot be loaded.
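The same-region requirement can be checked from the command line before creating an association. This is a minimal sketch: the function names are hypothetical, and the FSx region is passed in by hand rather than looked up.

```shell
# bucket_region BUCKET: print the region that hosts an S3 bucket.
# get-bucket-location reports a null LocationConstraint ("None" in text
# output) for buckets in us-east-1, so that case is mapped explicitly.
bucket_region() {
    loc=$(aws s3api get-bucket-location --bucket "$1" \
        --query LocationConstraint --output text) || return 1
    if [ "$loc" = "None" ]; then
        loc="us-east-1"
    fi
    printf '%s\n' "$loc"
}

# same_region BUCKET FSX_REGION: succeed only if the bucket is in FSX_REGION.
same_region() {
    [ "$(bucket_region "$1")" = "$2" ]
}
```

A call such as `same_region <bucket> us-east-1 || echo "region mismatch"` flags a bucket that cannot be linked to a file system in that region.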
Create DRA association
Through console
Specify the association in the Data repository import/export (DRA) tab when creating an FSx file system
Assume an FSx for Lustre file system is mounted on an EC2 instance at:
/fsx_input
Three data repository associations (DRAs) are created with the following settings:
DRA 1
File system path: /ExtData
S3 path: s3://dzhang-imi-gchp-test/ExtData
DRA 2
File system path: /blended-tropomi
S3 path: s3://dzhang-imi-gchp-test/blended-tropomi
DRA 3
File system path: /blended-boundary-conditions
S3 path: s3://dzhang-imi-gchp-test/blended-boundary-conditions
Note
Select all import policies and deselect all export policies so that S3 → FSx synchronization is enabled while FSx → S3 synchronization is disabled.
On the AWS console, multiple DRAs cannot be created at once. Additional DRAs can be added after the FSx file system is created:
- Go to AWS Console → FSx
- Select your FSx for Lustre file system
- Open the Data repository tab
- Click Create data repository association
In this case, the linked S3 data will appear locally on the EC2 instance as:
/fsx_input/ExtData/
/fsx_input/blended-tropomi/
/fsx_input/blended-boundary-conditions/
The local mount point (/fsx_input) corresponds to the root of the FSx file system.
Each data repository association creates a directory directly under this root,
with contents mirrored from the associated S3 prefix.
Through CLI
Data repository associations (DRA) can be added during creation or afterwards
using aws fsx create-data-repository-association.
FSx needs an IAM role that it can assume to access S3 (trusted by fsx.amazonaws.com and allowed to perform the required S3 actions). The create-data-repository-association calls below do not take a role ARN argument.
# Create three Data Repository Associations (DRAs).
# Note: DRA file system paths cannot overlap (e.g., /ExtData and /ExtData/subdir).
# DRAs are supported on FSx for Lustre 2.12/2.15 file systems (excluding scratch_1).
# DRA 1: Import-only (recommended for static input data like ExtData)
aws fsx create-data-repository-association \
--file-system-id "$FSX_ID" \
--file-system-path "/ExtData" \
--data-repository-path "s3://dzhang-imi-gchp-test/ExtData" \
--batch-import-meta-data-on-create \
--s3 '{
"AutoImportPolicy": {"Events": ["NEW","CHANGED","DELETED"]}
}' \
--tags Key=Name,Value=dra-extdata \
--client-request-token dra-extdata-001
# DRA 2: Import-only (or add AutoExportPolicy if you truly want FSx -> S3 sync)
aws fsx create-data-repository-association \
--file-system-id "$FSX_ID" \
--file-system-path "/blended-tropomi" \
--data-repository-path "s3://dzhang-imi-gchp-test/blended-tropomi" \
--batch-import-meta-data-on-create \
--s3 '{
"AutoImportPolicy": {"Events": ["NEW","CHANGED","DELETED"]}
}' \
--tags Key=Name,Value=dra-blended-tropomi \
--client-request-token dra-tropomi-001
# DRA 3: Import-only (or add AutoExportPolicy if you truly want FSx -> S3 sync)
aws fsx create-data-repository-association \
--file-system-id "$FSX_ID" \
--file-system-path "/blended-boundary-conditions" \
--data-repository-path "s3://dzhang-imi-gchp-test/blended-boundary-conditions" \
--batch-import-meta-data-on-create \
--s3 '{
"AutoImportPolicy": {"Events": ["NEW","CHANGED","DELETED"]}
}' \
--tags Key=Name,Value=dra-blended-bc \
--client-request-token dra-bc-001
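The non-overlap constraint noted in the comments above can be checked before any association is created. The helper below is a hypothetical sketch that treats two file system paths as overlapping when they are equal or when one is nested under the other.

```shell
# paths_overlap A B
# Succeeds (exit 0) if A and B are the same path or one contains the other.
# A trailing slash is normalized so that /ExtData and /ExtData2 do not match.
paths_overlap() {
    a="${1%/}/"
    b="${2%/}/"
    case "$a" in "$b"*) return 0 ;; esac
    case "$b" in "$a"*) return 0 ;; esac
    return 1
}
```

For example, `paths_overlap /ExtData /ExtData/subdir` succeeds, while `paths_overlap /ExtData /blended-tropomi` fails, so the three paths used above are compatible with each other.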
Verify DRA exists
aws fsx describe-data-repository-associations \
--filters Name=file-system-id,Values="$FSX_ID" \
--query "Associations[*].{Path:FileSystemPath,S3:DataRepositoryPath,State:Lifecycle}"
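A newly created association is not usable until its Lifecycle reaches AVAILABLE. The sketch below polls for that state; the function name `wait_for_dra` and the `POLL_SECONDS` variable are hypothetical, and the association ID is the `AssociationId` value returned by the create or describe call.

```shell
# wait_for_dra ASSOCIATION_ID [MAX_TRIES]
# Polls the association's Lifecycle until it is AVAILABLE.
# Returns 0 on AVAILABLE, 1 on FAILED/MISCONFIGURED or after MAX_TRIES polls.
wait_for_dra() {
    assoc_id="$1"
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        state=$(aws fsx describe-data-repository-associations \
            --association-ids "$assoc_id" \
            --query "Associations[0].Lifecycle" --output text) || return 1
        case "$state" in
            AVAILABLE) return 0 ;;
            FAILED|MISCONFIGURED)
                echo "DRA $assoc_id is $state" >&2
                return 1 ;;
        esac
        tries=$((tries - 1))
        # Wait between polls; 30 seconds is an arbitrary default
        sleep "${POLL_SECONDS:-30}"
    done
    echo "Timed out waiting for DRA $assoc_id" >&2
    return 1
}
```

A script can then gate later steps on it, for example `wait_for_dra <association-id> && echo "ready"`.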
Important
FSx for Lustre accesses S3 through an IAM service role trusted by
fsx.amazonaws.com. When enabling DRA, ensure that the associated
IAM role has permission to s3:GetObject, s3:PutObject, and
s3:ListBucket on the linked S3 bucket or prefix.
Import and Export Semantics of DRA
A Data Repository Association (DRA) does not provide real-time or bidirectional synchronization between FSx and S3. Instead, it implements directional, policy-driven, and largely lazy data movement.
Understanding these semantics is critical when using FSx scratch file systems.
FSx → S3 (Export Semantics)
One-time export required for pre-existing data
Files that already exist in FSx before the DRA is created are not exported automatically. A one-time export task is required to establish a baseline copy in S3.
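Such a one-time export can be started with `aws fsx create-data-repository-task`. The sketch below assumes a DRA already covers the exported path and that the path is given relative to the file system root; the function name `export_to_s3` is hypothetical, and completion reports are switched off to keep the call short.

```shell
# export_to_s3 FSX_ID FS_PATH
# Starts a one-time export of FS_PATH to the linked S3 location.
# Assumes FS_PATH lies under an existing data repository association.
export_to_s3() {
    aws fsx create-data-repository-task \
        --file-system-id "$1" \
        --type EXPORT_TO_REPOSITORY \
        --paths "$2" \
        --report Enabled=false
}
```

The call returns immediately with a task description; the export itself runs asynchronously.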
Auto-export (after DRA creation)
When auto-export is enabled:
Newly created or modified files in FSx are automatically exported to S3
The full file contents (not only metadata) are written to S3
Export occurs asynchronously but usually within minutes
Deletion behavior
Deleting a file in FSx does not delete the corresponding object in S3
S3 is treated as durable, append-oriented storage
S3 → FSx (Import Semantics)
Fully lazy import model
With auto-import enabled:
Files stored in S3 (created before or after the DRA) are not proactively copied to FSx
Both metadata and file contents are imported only when the file is accessed (e.g., ls, stat, file open, or model read)
Access-triggered behavior
On first access to a given file from FSx:
File metadata is imported into the FSx namespace
File data blocks are downloaded on demand
Subsequent accesses to the same file reuse the cached data and do not require re-downloading, unless the cache is evicted or the file changes
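Whether a file's contents are already loaded can be inspected with the Lustre `lfs hsm_state` command, which reports `released` for files whose data is not yet on FSx, and `lfs hsm_restore` requests the data ahead of the first read. The helper names below are hypothetical, and depending on file ownership the `lfs` calls may need to be run with sudo.

```shell
# is_released FILE
# Succeeds if lfs hsm_state reports the file as "released",
# meaning its contents have not been loaded onto FSx yet.
is_released() {
    lfs hsm_state "$1" | grep -q released
}

# preload_dir DIR
# Requests the contents of every released file under DIR.
# Assumes file names contain no newline characters.
preload_dir() {
    find "$1" -type f | while IFS= read -r f; do
        if is_released "$f"; then
            lfs hsm_restore "$f"
        fi
    done
}
```

Running `preload_dir /fsx_input/ExtData` before a model run moves the download cost out of the first read.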
Manual import tasks (optional)
Manual import tasks may be used to pre-populate directory structure and metadata, but file contents are still fetched lazily unless a full import is explicitly requested.
Behavior summary
| Operation | Result | Notes |
|---|---|---|
| FSx → S3 (existing files) | Not exported automatically | One-time export required |
| FSx → S3 (new or modified files) | Automatically exported | Full data copied asynchronously |
| FSx file deletion | No effect on S3 | No automatic deletion |
| S3 → FSx (any file) | Imported on access | Metadata and data are lazy |
| Auto-import policy | Enables access-triggered import | No proactive copying |
Note
DRA is particularly useful for large datasets
EC2-based aws s3 sync remains a flexible fallback option