Question 1

How is the AI training cost calculated?

Accepted Answer

Compute is estimated using the standard rule of 6 × parameters × tokens FLOPs for full training, or about two-thirds of that for LoRA fine-tunes (the frozen base weights skip their weight-gradient step, while the forward and activation-gradient passes still run in full). To get wall-clock time, we divide by the cluster's effective throughput: your GPU's peak TFLOPs at the chosen precision, multiplied by the GPU count and by a model FLOPs utilization (MFU) factor that reflects real-world throughput. Cost is then wall-clock hours × GPU count × per-GPU hourly rate.
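
The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual code: the function name, its parameters, the 989 TFLOPs figure (the H100 SXM dense BF16 datasheet peak), the 1,024-GPU cluster, and the $2.00 per GPU-hour rate are all assumptions chosen for the example. It also assumes the throughput figure is the datasheet peak, which MFU then discounts.

```python
def estimate_training_cost(
    params: float,
    tokens: float,
    peak_tflops: float,
    num_gpus: int,
    mfu: float,
    rate_per_gpu_hour: float,
    lora: bool = False,
) -> dict:
    """Estimate wall-clock hours, GPU-hours, and cost for one training run."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")

    # Standard rule: 6 FLOPs per parameter per token for full training.
    total_flops = 6.0 * params * tokens
    if lora:
        # LoRA skips the base-weight gradient, leaving about two-thirds.
        total_flops *= 2.0 / 3.0

    # Effective cluster throughput: per-GPU peak x GPU count x MFU.
    cluster_flops_per_sec = peak_tflops * 1e12 * num_gpus * mfu

    wall_clock_hours = total_flops / cluster_flops_per_sec / 3600.0
    gpu_hours = wall_clock_hours * num_gpus
    return {
        "wall_clock_hours": wall_clock_hours,
        "gpu_hours": gpu_hours,
        "cost": gpu_hours * rate_per_gpu_hour,
    }


# Hypothetical inputs: 70B parameters, 15T tokens, 1,024 GPUs, MFU 0.45.
example = estimate_training_cost(
    params=70e9,
    tokens=15e12,
    peak_tflops=989.0,       # assumed H100 dense BF16 peak
    num_gpus=1024,
    mfu=0.45,
    rate_per_gpu_hour=2.00,  # assumed rate, not a quoted price
)
print(f"{example['gpu_hours']:,.0f} GPU-hours, ${example['cost']:,.0f}")
```

Note that GPU count cancels out of the GPU-hours figure: more GPUs shorten the wall-clock time but, at a fixed MFU, leave the total bill unchanged.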

Question 2

How accurate is this GPU training cost estimate?

Accepted Answer

Within about 20% for most jobs. The math matches published training runs (Llama 3 70B at roughly 6.4M H100-hours, Llama 3 405B at roughly 30M H100-hours) when you use realistic MFU values of 0.4 to 0.55. Real costs vary with checkpoint frequency, data loading, eval passes, and the failure rate of your training stack.

Question 3

Why are the AWS and GCP prices only rough estimates?

Accepted Answer

We use each cloud's public on-demand list price for the GPU you picked. That ignores reserved discounts, savings plans, EDP credits, egress, and storage. It also assumes you can get capacity at list price, which is often not true for H100 and B200. The number is directional, not a quote.

Question 4

What is the difference between on-demand, spot, and reserved on Spheron?

Accepted Answer

On-demand bills per minute with no commitment and a 99.99% SLA. Spot is the same hardware at a discount and can be reclaimed when demand rises, which makes it a fit for fault-tolerant training with checkpointing. Reserved gives you a discount in exchange for a longer commitment and dedicated capacity for multi-week or multi-month runs.

Question 5

Does the calculator include networking, storage, or data transfer costs?

Accepted Answer

No. The calculator covers GPU compute only. Storage, ingress, and egress are not modeled here. Spheron does not charge egress fees; on AWS, GCP, and Azure, egress is often the largest hidden line item for training workloads.

Question 6

What is MFU and how do I pick a value?

Accepted Answer

Model FLOPs utilization (MFU) is the fraction of a GPU's peak FLOPs that your training job actually uses. A value of 0.45 is a reasonable default for dense transformer training on H100 with FlashAttention and FP8. Well-tuned Megatron-LM jobs hit 0.5 to 0.55. MoE models and unoptimized stacks land closer to 0.25 to 0.35.
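
If you already have a job running, you can back out your own MFU from measured throughput instead of guessing. The sketch below assumes full training (6 FLOPs per parameter per token) and that you know the aggregate tokens per second across the cluster. The function name and the sample numbers (a 7B model, 8 GPUs, 80,000 tokens per second, 989 TFLOPs peak) are hypothetical, not measurements.

```python
def measured_mfu(
    params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_tflops_per_gpu: float,
) -> float:
    """Return achieved model FLOPs per second divided by cluster peak."""
    # Useful model FLOPs per second, using the 6 x params rule per token.
    achieved_flops_per_sec = 6.0 * params * tokens_per_second
    # Theoretical peak of the whole cluster at the chosen precision.
    peak_flops_per_sec = num_gpus * peak_tflops_per_gpu * 1e12
    return achieved_flops_per_sec / peak_flops_per_sec


# Hypothetical run: 7B model on 8 GPUs at 80,000 tokens/s in aggregate.
mfu = measured_mfu(
    params=7e9,
    tokens_per_second=80_000,
    num_gpus=8,
    peak_tflops_per_gpu=989.0,  # assumed dense BF16 peak; use your own figure
)
print(f"MFU = {mfu:.2f}")
```

Use the peak figure that matches the precision you train in, since the same throughput yields a lower MFU against a higher FP8 peak.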

Question 7

Can I use this for LoRA and QLoRA fine-tuning?

Accepted Answer

Yes. Pick LoRA Fine-Tune as the training approach. The calculator scales compute to about two-thirds of a full fine-tune: LoRA freezes the base weights and skips their gradient, but it still runs the full forward and activation-gradient passes, so per-token throughput is close to full fine-tuning rather than 10x faster. LoRA's real wins are memory and the tiny number of trained parameters, not raw FLOPs. QLoRA is similar on the FLOPs side; the memory savings from 4-bit quantization do not change wall-clock time materially.
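
The two-thirds figure falls out of how the 6 × parameters rule is usually split: roughly 2 FLOPs per parameter for the forward pass, 2 for activation gradients, and 2 for weight gradients. A minimal sketch of that accounting follows; it ignores the adapter's own small FLOPs, and the constant and function names are made up for illustration.

```python
# Approximate FLOPs per parameter per token for each pass.
FORWARD = 2.0
ACTIVATION_GRAD = 2.0  # backward pass through activations
WEIGHT_GRAD = 2.0      # gradient with respect to the weights


def flops_per_token(params: float, lora: bool = False) -> float:
    """Approximate training FLOPs per token for a dense model."""
    passes = FORWARD + ACTIVATION_GRAD
    if not lora:
        # Full training also computes gradients for every base weight.
        passes += WEIGHT_GRAD
    return passes * params


full = flops_per_token(70e9)
lora = flops_per_token(70e9, lora=True)
print(f"LoRA / full = {lora / full:.3f}")
```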

Question 8

How much does it cost to train a 70B model?

Accepted Answer

It depends on whether you pretrain or fine-tune. A full pretrain of a 70B model on 15 trillion tokens takes a few million H100-hours, which puts the bill in the seven-figure dollar range. A supervised fine-tune of the same model is far cheaper: one team completed a full 70B fine-tune for $11,200 using spot H100s. Set 70B as the model size above, then switch between Pretrain and Full Fine-Tune to see both numbers on live pricing.

Question 9

Is it cheaper to fine-tune or train from scratch?

Accepted Answer

Fine-tuning is cheaper by orders of magnitude, but mostly because it runs on far fewer tokens, not because each token is cheaper. Pretraining spends compute on every token in a multi-trillion-token corpus, while a fine-tune runs on a small fraction of that. Per token, a LoRA fine-tune still costs about two-thirds of a full fine-tune, since the forward pass and activation gradients run in full. Unless you need a new base model, start from an open-weight checkpoint and fine-tune.
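
To see how strongly token count dominates, compare a fine-tune against a pretrain of the same model on the same hardware. Model size, GPU throughput, and hourly rate cancel, leaving only the token ratio and the per-token factor. The 1-billion-token fine-tuning set below is a hypothetical figure picked for illustration, as is the function name.

```python
def relative_cost(
    finetune_tokens: float,
    pretrain_tokens: float,
    lora: bool = False,
) -> float:
    """Fine-tune cost as a fraction of pretraining cost for the same model."""
    # LoRA costs about two-thirds as much per token as full training.
    per_token_factor = 2.0 / 3.0 if lora else 1.0
    return per_token_factor * finetune_tokens / pretrain_tokens


# Hypothetical: 1B fine-tuning tokens against a 15T-token pretrain.
full_ft = relative_cost(1e9, 15e12)
lora_ft = relative_cost(1e9, 15e12, lora=True)
print(f"full fine-tune: {full_ft:.2e} of pretrain cost")
print(f"LoRA fine-tune: {lora_ft:.2e} of pretrain cost")
```

In this example the token ratio accounts for a gap of more than four orders of magnitude, while switching to LoRA changes the result by only a third.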

Plan	Est. cost
Spheron on-demand	$743,769
Spheron reserved (35% off)	$483,450
AWS on-demand	$1.44M
Google Cloud on-demand	$1.11M
Azure on-demand	$2.59M

AI & LLM Training Cost Calculator
