AWS setup at Harvard


Reference links

  • AWS documentation

  • AWS ParallelCluster documentation

  • FSx for Lustre documentation

  • Pricing quote

  • EC2 instance information

© Copyright 2026, Dandan Zhang.
