AWS setup at Harvard
Knowledge base
Overview: How AWS works
High-level workflow
What this means in practice
How this documentation is organized
IAM identity
IAM user
IAM role
Assumed IAM role via Single Sign-On (SSO)
Miscellaneous IAM roles
IAM roles in ParallelCluster
Head node IAM role
Compute node IAM role
Data access considerations
AWS APIs
Checking AWS API permissions
Simulating API permissions
Practical considerations
AWS services and resources
Examples of AWS resources
Example AWS services and resources
Miscellaneous clarifications
CloudFormation
ParallelCluster
VPC
Glossary
Operational tutorials
Quickstart overview
High-level workflow
Assumptions
AWS utilities
Official installation instructions
Install using conda or mamba
Confirm installation
AWS account setup
SSO login
Create and store key pair
Store the key pair
AWS configure (IAM user)
Create IAM user
Add proper permissions
Create access key
Configure AWS with IAM user credential
(Advanced / emergency) Temporary admin IAM user
Create an admin IAM user
Create access key for admin user
Using admin to detach or delete IAM permission policies
Deactivate access key for admin IAM user
List AWS configured credentials
Build a custom Amazon Machine Image
Build AMI from AWS Console
Base AMI
Install required libraries
Create AMI
Build AMI with ParallelCluster based on an existing AMI
Prerequisites
Build process
Create an Amazon S3 Bucket
Prerequisites
Method 1: Create an S3 Bucket via AWS Management Console
Step 1: Open the S3 Console
Step 2: Configure Bucket Settings
Step 3: Create the Bucket
Method 2: Create an S3 Bucket via AWS CLI
Step 1: Configure AWS CLI
Step 2: Create the Bucket
Add FSx access to allow for DRA
Verification
Notes and Best Practices
Next Steps
Upload Data to an Amazon S3 Bucket
Prerequisites
Upload Input Data Using Python Scripts (Recommended)
Downloading scripts tutorials
Upload Output Data from FSx for Lustre (Best for Large Outputs)
Other Official Methods
Upload Data Using AWS CLI
Upload a Single File
Upload a Directory Recursively
Synchronize a Directory (Recommended)
Upload Data Using AWS Management Console
Common Permission Requirements
Verification
Checking S3 Bucket Size (for FSx Planning)
Use the AWS CLI (recommended)
Use the AWS console
Notes and Best Practices
Next Steps
Create an FSx file system
Create FSx through AWS Console
Specify file system details
Delete FSx
Create FSx through AWS CLI
Delete the file system
Mount FSx to an EC2 instance
Prerequisites
Mounting FSx to an EC2 instance
Change ownership for write permissions (FSx output data)
Data transfer between FSx and S3 bucket
Launch an EC2 instance for data transfer
Data repository association (DRA)
Requirements
Create a DRA
Through console
Through CLI
Import and Export Semantics of DRA
FSx → S3 (Export Semantics)
S3 → FSx (Import Semantics)
Behavior summary
Loading Data from S3 to FSx and Using S3 as Backup
Background and Design Philosophy
Data Repository Associations (DRA)
How Data Is Loaded from S3 to FSx
Method 1: On-Demand Loading (Default)
Method 2: Batch Metadata Import (Recommended After Creation)
Method 3: Full Data Import (Required Before Deleting S3)
FSx Storage Capacity Considerations
Recommended Backup Strategy Using S3
Input Data (Read-Only)
Output Data (Write-Back)
When Is It Safe to Delete the S3 Bucket?
Summary
Create a ParallelCluster via AWS CLI
Create the cluster
Checklist for pcluster-create.yml
Example pcluster-create.yml
Monitor the creation process
Delete ParallelCluster
Delete by cluster name
Find the cluster name (if forgotten)
Recover the configuration YAML used to create a cluster
Teardown and cleanup
EC2 instances
ParallelCluster
Resources
Templates
Template: AMI build configuration (ami-build.yaml)
Example template
Optional: Custom build steps
Template: Cluster configuration (pcluster-create.yml)
Example template (Slurm + FSx for Lustre)
Common placeholders
Quick validation commands
Cost considerations: S3 vs FSx for Lustre vs compute
High-level pricing overview
S3: lowest-cost persistent storage
FSx for Lustre: high-performance, higher-cost storage
Compute cost usually dominates overall spending
Recommended cost-efficient pattern
Reference links
Index