AI & LLM Training Cost Calculator
Enter a model size, a dataset size, and the GPU you want. Get the total compute, the wall-clock time, and the GPU training cost on Spheron, with live pricing benchmarked against AWS, Google Cloud, and Azure. Works for pretraining a model from scratch or fine-tuning an LLM with LoRA.
Cost on Spheron
This run takes over a year. Consider more GPUs or a smaller model.
Cost breakdown
| Provider and plan | Estimated cost |
| --- | --- |
| Spheron on-demand | $743,440 |
| Spheron reserved (35% off) | $483,236 |
| AWS on-demand | $1.44M |
| Google Cloud on-demand | $1.11M |
| Azure on-demand | $2.59M |
You save $701,005 versus AWS (49%).
Estimates use 8× H100 at $2.01/hr per GPU, on-demand. AWS, Google Cloud, and Azure figures are public list prices and exclude reserved discounts, egress, and storage. For the math, see "How LLM training cost is calculated" below.
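The reserved price and the savings percentage in the breakdown can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures shown above; the variable names are illustrative, and the AWS total is inferred as the Spheron on-demand cost plus the stated savings, which is an assumption consistent with the rounded $1.44M.

```python
# Figures copied from the cost breakdown above.
spheron_on_demand = 743_440   # USD
reserved_discount = 0.35      # "35% off"
savings_vs_aws = 701_005      # USD, as stated under the table

# Reserved price is the on-demand price minus the discount.
reserved = round(spheron_on_demand * (1 - reserved_discount))

# Assumption: the unrounded AWS total is Spheron cost plus stated savings.
aws_implied = spheron_on_demand + savings_vs_aws
savings_pct = round(100 * savings_vs_aws / aws_implied)

print(f"Reserved: ${reserved:,}")
print(f"Implied AWS total: ${aws_implied:,}")
print(f"Savings vs AWS: {savings_pct}%")
```

Under that assumption the sketch reproduces the $483,236 reserved price and the 49% savings figure.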
Training cost examples: Llama 3 70B, 405B, and a 7B LoRA
Click any example to load it into the calculator. Numbers are based on published training runs and live Spheron pricing.
How much does it cost to train an LLM?
Training an LLM costs anywhere from under $100 to tens of millions of dollars. Three inputs set the number: the size of the model, the number of tokens you train on, and the per-hour rate of the GPU you run on. The calculator above turns those into a dollar figure for any model. To put the range in context, here is what four common jobs cost on live Spheron H100 pricing.
The jump from fine-tuning to pretraining is why almost no one trains from scratch. Fine-tuning an open model on on-demand H100s gets you a custom model for what amounts to a rounding error next to a from-scratch run. For a full worked example, see how a small team trained a 70B model for $11,200 on spot GPUs, then price your own job in the calculator above.
How LLM training cost is calculated
Training a model takes a known amount of compute. The standard rule of thumb is 6 × parameters × tokens FLOPs for full training. LoRA fine-tunes use about two-thirds of that: the base weights are frozen, so you skip computing their gradients, but the full forward pass and the activation-gradient backward pass still run.
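As a rough illustration of the rule of thumb above, here is a minimal Python sketch. The function name `training_flops` and the example model and token counts are hypothetical, chosen only to show the arithmetic; this is not the calculator's actual code.

```python
def training_flops(params: float, tokens: float, lora: bool = False) -> float:
    """Approximate total training FLOPs with the 6 * parameters * tokens rule."""
    full = 6.0 * params * tokens
    # LoRA freezes the base weights, so their weight gradients are skipped.
    # The forward pass and the activation-gradient backward pass remain,
    # which leaves about two-thirds of the full-training compute.
    return full * (2.0 / 3.0) if lora else full


# Hypothetical job: a 7B-parameter model trained on 1B tokens.
full_run = training_flops(7e9, 1e9)
lora_run = training_flops(7e9, 1e9, lora=True)

print(f"Full training: {full_run:.2e} FLOPs")
print(f"LoRA fine-tune: {lora_run:.2e} FLOPs")
```

The two-thirds factor is an approximation: it ignores the small extra compute for the adapter weights themselves.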
We take that FLOPs total and divide by your GPU's baseline throughput at the chosen precision. The calculator uses roughly 700 TFLOPS at FP16 and 1,400 TFLOPS at FP8 for H100. B200 roughly doubles those numbers.
Then we apply model FLOPs utilization (MFU), the fraction of that throughput a training job actually achieves. Well-tuned dense transformer training on H100 lands around 0.45. The calculator defaults to that and lets you tune it in the advanced panel.
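The throughput and MFU steps can be sketched in Python as follows. The function name `wall_clock_hours` and the example job (70B parameters, 1T tokens, 8 GPUs) are illustrative assumptions, not the calculator's internals or defaults; the 700 TFLOPS and 0.45 values come from the text above.

```python
def wall_clock_hours(total_flops: float, gpu_tflops: float,
                     mfu: float, gpu_count: int) -> float:
    """Estimate wall-clock hours for a training run.

    gpu_tflops is the per-GPU throughput figure in TFLOPS; mfu scales it
    down to what the job is assumed to achieve in practice.
    """
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    # Effective FLOPs per second across the whole cluster.
    effective_flops_per_sec = gpu_tflops * 1e12 * mfu * gpu_count
    return total_flops / effective_flops_per_sec / 3600.0


# Hypothetical job: 6 * 70e9 params * 1e12 tokens, on 8 H100s at FP16.
total = 6.0 * 70e9 * 1e12
hours = wall_clock_hours(total, gpu_tflops=700, mfu=0.45, gpu_count=8)

print(f"{hours:,.0f} hours (about {hours / 8760:.1f} years)")
```

With these inputs the estimate comes out at several years on 8 GPUs, the kind of run where adding GPUs or shrinking the model is worth considering. The sketch assumes throughput scales linearly with GPU count, which real multi-node jobs only approximate.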
Cost is then wall-clock hours × GPU count × per-GPU rate. Spheron rates come live from the marketplace API. AWS, GCP, and Azure rates are public list prices for the same GPU and exclude reserved discounts, egress, and storage.
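The cost formula above is a single multiplication. A minimal sketch, assuming a hypothetical 1,000-hour run on 8 GPUs at the $2.01 per-GPU hourly rate quoted earlier on this page; the function name `training_cost` and the run length are illustrative only.

```python
def training_cost(wall_clock_hours: float, gpu_count: int,
                  rate_per_gpu_hour: float) -> float:
    """Cost = wall-clock hours x GPU count x per-GPU hourly rate."""
    return wall_clock_hours * gpu_count * rate_per_gpu_hour


# Hypothetical run: 1,000 wall-clock hours on 8 GPUs.
on_demand = training_cost(1_000, gpu_count=8, rate_per_gpu_hour=2.01)

# Same run at a 35% reserved discount, the rate shown in the cost breakdown.
reserved = on_demand * (1 - 0.35)

print(f"On-demand: ${on_demand:,.0f}")
print(f"Reserved:  ${reserved:,.0f}")
```

Like the list-price comparisons, this sketch leaves out egress and storage.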
Not sure which GPU to pick?
Browse the full GPU catalog with live per-hour pricing across H100, H200, B200, B300, A100, L40S, RTX PRO 6000, and more. Per-minute billing, no commitment.