NVIDIA GB200 NVL72
Grace Blackwell Rack-Scale GPU
72x B200 · 36x Grace CPU · 13.5 TB HBM3e · 130 TB/s NVLink · 1.4 EFLOPS FP4
The NVIDIA GB200 NVL72 is a rack-scale system pairing 72 Blackwell B200 GPUs with 36 Grace ARM CPUs over NVLink 5 at 130 TB/s of aggregate bandwidth. Each B200 ships with 192 GB of HBM3e and 8 TB/s of memory bandwidth, giving each NVL72 rack 13.5 TB of unified HBM and 1.4 EFLOPS of FP4 compute with sparsity. It is purpose-built for trillion-parameter LLM training and 100B+ parameter inference where rack-scale unified memory and NVLink fabric are the determining factors. Spheron reserves GB200 NVL72 capacity across data center partner regions in any count, from a single 8-GPU node to multi-rack deployments.
Tell us how many GPUs you need and what you're running. Our team confirms availability within one business day and gets you provisioned.
Submit Your Request
Reserve NVIDIA GB200 Blackwell GPUs on Spheron in any count, from a single 8-GPU node to multi-rack deployments. Each GPU is a Blackwell B200 with 192 GB HBM3e. Inside an NVL72 rack, 72 GPUs and 36 Grace CPUs share a single NVLink 5 domain at 130 TB/s, exposing 13.5 TB of unified memory for large-scale LLM training and inference. Submit the form with your GPU count and our team confirms availability within one business day. For workloads that don't need GB200 specifically, B200 per-GPU rentals and H200 are on per-minute billing today.
Where GB200 sits in the stack
GB200 is the broadly deployed Blackwell rack-scale system. GB300 is the Blackwell Ultra upgrade with 50% more memory. Rubin R100 is the next generation, available in H2 2026.
GB200 NVL72 specifications
Per-rack specifications. Smaller reservations inherit the same architecture at the slice they're allocated. All systems run in NVL72 reference configuration with liquid cooling, 2:1 InfiniBand fat-tree fabric across racks, and persistent NVMe per chassis.
GB200 NVL72 vs GB300 NVL72 vs HGX H100
| Spec (per rack) | GB200 NVL72 | GB300 NVL72 | HGX H100 (8x) |
|---|---|---|---|
| Architecture | Blackwell | Blackwell Ultra | Hopper |
| GPUs / rack | 72 × B200 | 72 × B300 | 8 × H100 |
| Total HBM | 13.5 TB HBM3e | 20.7 TB HBM3e | 640 GB HBM3 |
| FP4 sparse | 1.44 EFLOPS | 2.16 EFLOPS | N/A |
| FP4 dense per GPU | 10 PFLOPS | 15 PFLOPS | N/A |
| FP8 throughput | 720 PFLOPS | 720 PFLOPS | 16 PFLOPS |
| NVLink fabric | 130 TB/s | 130 TB/s | 7.2 TB/s (8 GPUs) |
| CPU | 36 × Grace | 36 × Grace | x86 host |
| Networking | ConnectX-7 · 400 Gb/s | ConnectX-8 · 800 Gb/s | ConnectX-7 · 400 Gb/s |
| Spheron access | By reservation | By reservation | Available |
Per-rack specifications for NVL72 systems. HGX H100 figures are for a standard 8-GPU node for relative scale, since H100 does not come in NVL72 configuration. GB200 specs match the NVIDIA GB200 NVL72 reference design.
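The FP4 rows in the table can be cross-checked against each other. The sketch below multiplies the per-GPU dense figure by 72 and compares the result with the listed per-rack sparse total. The dictionary keys and function names are illustrative, and the sparse-to-dense ratio it prints is only what the table's own numbers imply, not an additional spec.

```python
# Cross-check the table's FP4 rows: per-GPU dense PFLOPS x 72 GPUs
# versus the listed per-rack sparse EFLOPS. Figures are copied from the table.
GPUS_PER_RACK = 72

racks = {
    "GB200 NVL72": {"fp4_dense_per_gpu_pflops": 10, "fp4_sparse_rack_eflops": 1.44},
    "GB300 NVL72": {"fp4_dense_per_gpu_pflops": 15, "fp4_sparse_rack_eflops": 2.16},
}


def rack_dense_pflops(spec):
    """Per-rack dense FP4 throughput implied by the per-GPU row."""
    return spec["fp4_dense_per_gpu_pflops"] * GPUS_PER_RACK


def sparse_to_dense_ratio(spec):
    """Listed per-rack sparse figure divided by the implied dense figure."""
    return spec["fp4_sparse_rack_eflops"] * 1000 / rack_dense_pflops(spec)


for name, spec in racks.items():
    dense = rack_dense_pflops(spec)
    ratio = sparse_to_dense_ratio(spec)
    print(f"{name}: {dense} PFLOPS dense FP4 implied, sparse/dense ratio {ratio:.2f}")
```

For both NVL72 columns, the sparse total works out to twice the implied dense total, so the two rows agree with each other.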
Workloads built for GB200
Large-Scale LLM Pre-Training
Each B200 GPU has 192 GB HBM3e. A single 8-GPU node holds 1.5 TB; a full NVL72 rack holds 13.5 TB inside one NVLink domain. The rack's 1.44 EFLOPS of sparse FP4 is a 4x lift over equivalent H100 capacity for 200B to 1T parameter pre-training. Reserve the count that fits your run.
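The node and rack capacities quoted above follow from the per-GPU figure. A minimal sketch of the arithmetic, assuming the TB values are computed with 1 TB = 1024 GB (the helper names are illustrative):

```python
# Aggregate HBM capacity from the per-GPU figure quoted above (192 GB per B200).
HBM_PER_GPU_GB = 192


def total_hbm_gb(gpu_count):
    """Total HBM in GB for a reservation of gpu_count GPUs."""
    return gpu_count * HBM_PER_GPU_GB


def gb_to_tb(gb, binary=True):
    # Assumption: the quoted TB figures use 1 TB = 1024 GB.
    # Pass binary=False to use 1 TB = 1000 GB instead.
    return gb / (1024 if binary else 1000)


for count in (8, 72):
    gb = total_hbm_gb(count)
    print(f"{count} GPUs: {gb} GB = {gb_to_tb(gb):.1f} TB")
```

The same helper gives the capacity of any intermediate reservation, for example 16 or 32 GPUs.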
High-Throughput FP4 Inference
B200 adds native FP4 Tensor Core support, which H100 lacks, and delivers 8 TB/s of HBM bandwidth per GPU. 130 TB/s of NVLink fabric inside an NVL72 rack supports tensor and pipeline parallelism for large models without leaving the rack. Smaller reservations fit 70B to 200B parameter inference on 8 to 32 GPUs.
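As a rough sizing aid, the weight footprint of a model gives a lower bound on GPU count. The sketch below is weights-only and uses a hypothetical `usable_fraction` to leave headroom for KV cache, activations, and runtime overhead. Real deployments usually need more GPUs than this bound to hit throughput and concurrency targets.

```python
import math

HBM_PER_GPU_GB = 192  # per-B200 figure from the text

# Bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"fp4": 0.5, "fp8": 1.0, "bf16": 2.0}


def weight_footprint_gb(params_billion, fmt):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[fmt]


def min_gpus_for_weights(params_billion, fmt, usable_fraction=0.8):
    # usable_fraction is a hypothetical headroom factor, not a measured value.
    usable_gb = HBM_PER_GPU_GB * usable_fraction
    return math.ceil(weight_footprint_gb(params_billion, fmt) / usable_gb)


for params, fmt in [(70, "fp4"), (200, "fp8"), (200, "bf16")]:
    gb = weight_footprint_gb(params, fmt)
    print(f"{params}B @ {fmt}: {gb:.0f} GB weights, "
          f">= {min_gpus_for_weights(params, fmt)} GPU(s)")
```

Under these assumptions a 70B model in FP4 needs about 35 GB for weights and a 200B model in BF16 about 400 GB. The rest of an 8 to 32 GPU reservation goes to KV cache, batch size, and replicas.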
Reasoning and Agentic Workloads
Grace CPUs handle orchestration, retrieval, and tool calls in the same coherent address space as the GPUs over 900 GB/s NVLink-C2C, removing PCIe round-trips that slow down agent loops on x86 nodes. Reserve from a few GPUs for prototyping up to full racks for production serving.
Fine-Tuning and Post-Training
Reserve the GPU count that matches your run. DPO and GRPO on 70B models fit on a single 8-GPU node. Large-scale RL with rollouts, value heads, and reference models benefits from full-rack co-location inside one NVLink domain. Persistent NVMe per chassis handles checkpoint streaming at any scale.
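The single-node claim for 70B DPO can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes full fine-tuning with mixed-precision Adam at roughly 16 bytes per parameter (a common rule of thumb, not a Spheron or NVIDIA figure), plus a frozen BF16 reference model at 2 bytes per parameter. It ignores activations and rollout buffers, so treat the result as a floor.

```python
HBM_PER_GPU_GB = 192  # per-B200 figure from the text
NODE_GPUS = 8


def dpo_state_gb(params_billion, train_bytes_per_param=16, ref_bytes_per_param=2):
    """Approximate model-state memory in GB for DPO: trained policy + frozen reference.

    Assumption: 16 bytes/param covers BF16 weights and gradients plus FP32
    master weights and Adam moments; the reference model is held in BF16.
    """
    policy_gb = params_billion * train_bytes_per_param
    reference_gb = params_billion * ref_bytes_per_param
    return policy_gb + reference_gb


needed_gb = dpo_state_gb(70)
node_gb = NODE_GPUS * HBM_PER_GPU_GB
print(f"model state: {needed_gb} GB, node HBM: {node_gb} GB, "
      f"headroom: {node_gb - needed_gb} GB")
```

With these assumptions the model state is about 1,260 GB against 1,536 GB of node HBM, leaving a few hundred GB for activations. Parameter-efficient methods or lower-precision optimizer states would lower the requirement further.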
When to pick GB200
Pick GB200 if
You want Blackwell rack-scale at the best price-to-performance. Reserve any quantity, from a single 8-GPU node to multi-rack deployments. Models that need rack-scale memory and fit in 13.5 TB of HBM3e (most workloads under 1T parameters) get the same NVLink 5 fabric and Grace CPU coupling as GB300 at lower cost.
Pick GB300 instead if
Your workload needs 288 GB per GPU or is bottlenecked on attention compute. GB300 has 50% more HBM3e per GPU and 2x attention throughput over GB200, with ConnectX-8 (800 Gb/s) replacing ConnectX-7 (400 Gb/s). For 1T+ parameter training and reasoning workloads, GB300 is the upgrade.
Pick B200 (per GPU) instead if
You want per-minute billing with no commitment. B200 SXM5 is available on Spheron on a per-GPU or 8-GPU node basis with on-demand and spot pricing. For most 70B to 200B parameter inference and fine-tuning jobs that don't need a sales conversation, B200 is the simpler call.
Pick R100 if you can wait
R100 Rubin is available H2 2026 with 22 TB/s per-GPU bandwidth (2.75x B200) and 50 PFLOPS FP4 (5x B200 dense). For new training runs with flexible timelines, R100 is the higher-ceiling option. If you need to start now, GB200 is live today.
Other GPUs on Spheron
For workloads that don't need GB200 specifically, Spheron also offers B200, B300, and H200 on per-minute billing with no commitments.
Related resources
NVIDIA GB200 NVL72 Architecture Guide
How the NVL72 rack pairs 72 B200 GPUs with 36 Grace CPUs, 13.5 TB of unified HBM3e, NVLink 5 fabric, and what workloads benefit from each scale.
NVIDIA B200 GPU: Specs, Pricing, and Workloads
Single-GPU and 8-GPU B200 SXM5 access on Spheron. 192 GB HBM3e, FP4 Transformer Engine, and per-minute billing with no commitments.
NVIDIA GB300 NVL72 Architecture Guide
The Blackwell Ultra upgrade. 50% more HBM3e per GPU, 2x attention compute, ConnectX-8 networking. Where GB300 makes sense over GB200.