.. _pricing:

Cost considerations: S3 vs FSx for Lustre vs compute
------------------------------------------------------

When designing AWS workflows for data-intensive modeling (e.g., GEOS-Chem / GCHP), it is important to distinguish **persistent storage**, **high-performance scratch storage**, and **compute cost**, as they differ substantially in pricing and intended use.

High-level pricing overview
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The table below summarizes typical AWS costs relevant to data-intensive HPC workflows. Prices are approximate and may vary by region, availability zone, and market conditions.

.. list-table::
   :widths: 22 18 20 20
   :header-rows: 1

   * - Resource
     - Typical cost
     - Billing unit
     - Notes / intended use
   * - Amazon S3 Standard
     - ~$0.023
     - per GB-month
     - Lowest-cost persistent storage.
   * - FSx for Lustre (Scratch 2)
     - ~$0.140
     - per GB-month
     - High-performance parallel filesystem for runtime I/O. Significantly more expensive than S3; intended for short-lived use.
   * - EC2 Spot instances
     - ~$1
     - per node-hour
     - Compute cost usually dominates total spend. Price varies strongly by instance type and availability zone.

S3: lowest-cost persistent storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Amazon S3 is the most cost-effective option for long-term storage of model input data, restart files, and archived outputs.

- **S3 Standard (active storage)** costs approximately **$0.023 per GB-month** (`AWS S3 pricing `_).
- There is **no minimum storage duration** for S3 Standard.
- S3 is designed for **durability and scalability**, not low-latency parallel I/O, which makes it ideal as a *persistent data lake* rather than a runtime filesystem.
- Additional cost components to be aware of:

  - **Requests** (PUT, GET, LIST), which are typically negligible compared to storage for scientific workflows.
  - **Data transfer**:

    - Data transfer into S3 is free, whether uploading local files or transferring within the same region.
    - Data transfer from S3 to EC2 or FSx within the same region is free (common case for HPC workflows).
    - Data transfer **out of AWS** (to the public internet) is charged.

FSx for Lustre: high-performance, higher-cost storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Amazon FSx for Lustre provides a **parallel POSIX filesystem** optimized for high-throughput and low-latency I/O during model execution.

- **FSx for Lustre Scratch 2** costs approximately **$0.140 per GB-month** for provisioned storage (`AWS FSx for Lustre pricing `_).
- FSx is intended for:

  - Runtime model input/output
  - Checkpointing
  - High-frequency parallel reads and writes

- FSx storage is **significantly more expensive than S3** and should therefore be treated as **temporary or performance-critical storage**, not long-term storage.
- FSx file systems can be linked to S3 using **Data Repository Associations (DRA)**, enabling data to be staged from S3 into FSx and written back when jobs complete.

Compute cost usually dominates overall spending
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For most HPC workloads, **compute cost dominates total cost**, while storage (often S3) is comparatively cheap.

- Spot EC2 instances commonly cost **~$1 per node-hour**, depending on:

  - Instance type
  - Availability zone
  - Spot market conditions

- Pricing references:

  - `AWS EC2 Spot pricing `_
  - `Vantage instance price summary `_

As a result:

- **Reducing wall-clock runtime** (e.g., faster I/O using FSx) can save more money than minimizing storage costs.
- Paying for short-lived FSx storage is often justified if it substantially reduces expensive compute time.

Recommended cost-efficient pattern
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A common and cost-effective pattern is:

1. Store all persistent data (input datasets, restarts, archived outputs) in **S3**.
2. Stage required data from S3 to **FSx for Lustre** before or during job startup.
3. Run compute-intensive jobs using FSx for high-performance I/O.
4. Write final outputs back to S3.
5. Delete or reuse FSx file systems as needed to minimize storage duration.

This approach combines the **low cost of S3** with the **performance of FSx**, and reflects the fact that **compute time is the primary cost driver** for large-scale modeling workloads.
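
To make the comparison concrete, the pattern above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: it uses the approximate prices quoted in this section, assumes 730 hours per month, and assumes FSx storage is billed in proportion to the time the file system exists. The function name ``estimate_run_cost`` and the workload sizes are hypothetical, and request and data-transfer charges are ignored.

.. code-block:: python

   # Approximate prices quoted in this section; check current AWS pricing.
   S3_USD_PER_GB_MONTH = 0.023
   FSX_USD_PER_GB_MONTH = 0.140
   SPOT_USD_PER_NODE_HOUR = 1.0

   # Assumption: average number of hours in a month, used to pro-rate FSx.
   HOURS_PER_MONTH = 730.0


   def estimate_run_cost(s3_gb, s3_months, fsx_gb, fsx_hours, nodes, run_hours):
       """Return a rough cost breakdown in USD (hypothetical helper)."""
       s3 = s3_gb * S3_USD_PER_GB_MONTH * s3_months
       # FSx is priced per GB-month; scale by the fraction of a month it exists.
       fsx = fsx_gb * FSX_USD_PER_GB_MONTH * (fsx_hours / HOURS_PER_MONTH)
       compute = nodes * run_hours * SPOT_USD_PER_NODE_HOUR
       return {"s3": s3, "fsx": fsx, "compute": compute,
               "total": s3 + fsx + compute}


   if __name__ == "__main__":
       # Hypothetical workload: 10 TB kept in S3 for one month, a 2.4 TB
       # scratch file system kept for a 48-hour run on 16 Spot nodes.
       costs = estimate_run_cost(s3_gb=10_000, s3_months=1, fsx_gb=2_400,
                                 fsx_hours=48, nodes=16, run_hours=48)
       for name, usd in costs.items():
           print(f"{name:>8}: ${usd:,.2f}")

Under these assumed numbers the compute term is several times larger than the S3 and FSx terms combined, which is the situation this section describes. Changing ``fsx_hours`` shows how quickly FSx cost grows if the file system is left running after the job finishes.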