Spheron GPU Catalog
Blackwell · Grace + NVLink 5
Pipeline: Reservations Open

NVIDIA GB200 NVL72
Grace Blackwell Rack-Scale GPU

72x B200 · 36x Grace CPU · 13.5 TB HBM3e · 130 TB/s NVLink · 1.44 EFLOPS FP4

The NVIDIA GB200 NVL72 is a rack-scale system that pairs 72 Blackwell B200 GPUs with 36 Grace Arm CPUs over NVLink 5, with 130 TB/s of aggregate bandwidth. Each B200 has 192 GB of HBM3e and 8 TB/s of memory bandwidth, giving each NVL72 rack 13.5 TB of unified HBM3e and 1.44 EFLOPS of sparse FP4 compute. It is built for trillion-parameter LLM training and for inference on 100B+ parameter models, where rack-scale unified memory and the NVLink fabric are the limiting factors. Spheron reserves GB200 NVL72 capacity across data center partner regions in any count, from a single 8-GPU node to multi-rack deployments.

Tell us how many GPUs you need and what you're running. Our team confirms availability within one business day and gets you provisioned.

Per GPU: 192 GB HBM3e
Per rack: 1.44 EFLOPS FP4
NVLink: 130 TB/s
Starts at: 8 GPUs
Reserve: GB200 NVL72

At a glance

Reserve NVIDIA GB200 Blackwell GPUs on Spheron in any count, from a single 8-GPU node to multi-rack deployments. Each GPU is a Blackwell B200 with 192 GB HBM3e. Inside an NVL72 rack, 72 GPUs and 36 Grace CPUs share a single NVLink 5 domain at 130 TB/s, exposing 13.5 TB of unified memory for large-scale LLM training and inference. Submit the form with your GPU count and our team confirms availability within one business day. For workloads that don't need GB200 specifically, B200 per-GPU rentals and H200 are on per-minute billing today.

Where GB200 sits in the stack

GB200 is the broadly deployed Blackwell rack-scale system. GB300 is the Blackwell Ultra upgrade with 50% more memory per GPU. Rubin R100 is the next generation, available in H2 2026.

GB200 NVL72 specifications

GPUs per rack: 72 × B200 Blackwell
Grace CPUs: 36 × 72-core Neoverse V2
Total HBM3e: 13.5 TB unified
System memory: 17 TB LPDDR5X
FP4 compute: 1.44 EFLOPS
FP8 compute: 720 PFLOPS
NVLink 5 fabric: 130 TB/s aggregate
GPU-to-CPU: 900 GB/s NVLink-C2C
Networking: ConnectX-7 · 400 Gb/s
Rack power: ~120 kW liquid-cooled

Per-rack specifications. Smaller reservations inherit the same architecture at the slice they're allocated. All systems run in NVL72 reference configuration with liquid cooling, 2:1 InfiniBand fat-tree fabric across racks, and persistent NVMe per chassis.
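
The rack totals follow from the per-GPU figures by simple multiplication. The sketch below reproduces them under two assumptions that this page does not state explicitly: that the HBM totals use 1 TB = 1024 GB, and that the 1.44 EFLOPS FP4 figure is the sparse rate at 2x the dense per-GPU rate listed in the comparison table. The function and constant names are illustrative only.

```python
# Per-GPU figures quoted on this page.
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 192
FP4_DENSE_PER_GPU_PFLOPS = 10   # "FP4 dense per GPU" row of the comparison table

# Assumptions (not stated on the page): binary TB, and sparse FP4 = 2x dense.
GB_PER_TB = 1024
SPARSITY_SPEEDUP = 2


def hbm_total_tb(gpus: int = GPUS_PER_RACK) -> float:
    """Aggregate HBM3e across `gpus` GPUs, in TB."""
    return gpus * HBM_PER_GPU_GB / GB_PER_TB


def fp4_sparse_eflops(gpus: int = GPUS_PER_RACK) -> float:
    """Aggregate sparse FP4 throughput across `gpus` GPUs, in EFLOPS."""
    return gpus * FP4_DENSE_PER_GPU_PFLOPS * SPARSITY_SPEEDUP / 1000


print(hbm_total_tb())        # full rack: 13.5
print(hbm_total_tb(gpus=8))  # single 8-GPU node: 1.5
print(fp4_sparse_eflops())   # full rack: 1.44
```

The same helpers give the slice a smaller reservation would see, for example `hbm_total_tb(gpus=16)` for two nodes.
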

GB200 NVL72 vs GB300 NVL72 vs HGX H100

Spec (per rack)   | GB200 NVL72           | GB300 NVL72           | HGX H100 (8x)
Architecture      | Blackwell             | Blackwell Ultra       | Hopper
GPUs / rack       | 72 × B200             | 72 × B300             | 8 × H100
Total HBM         | 13.5 TB HBM3e         | 20.7 TB HBM3e         | 640 GB HBM3
FP4 sparse        | 1.44 EFLOPS           | 2.16 EFLOPS           | N/A
FP4 dense per GPU | 10 PFLOPS             | 15 PFLOPS             | N/A
FP8 throughput    | 720 PFLOPS            | 720 PFLOPS            | 16 PFLOPS
NVLink fabric     | 130 TB/s              | 130 TB/s              | 7.2 TB/s (8 GPUs)
CPU               | 36 × Grace            | 36 × Grace            | x86 host
Networking        | ConnectX-7 · 400 Gb/s | ConnectX-8 · 800 Gb/s | ConnectX-7 · 400 Gb/s
Spheron access    | By reservation        | By reservation        | Available

Per-rack specifications for NVL72 systems. HGX H100 figures are for a standard 8-GPU node for relative scale, since H100 does not come in NVL72 configuration. GB200 specs match the NVIDIA GB200 NVL72 reference design.
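
Because the HGX H100 column describes an 8-GPU node rather than a 72-GPU rack, dividing each total by its GPU count gives a fairer per-GPU comparison. The following is a minimal sketch using only the totals from the table above; the dictionary layout and the `per_gpu` helper are hypothetical names chosen for illustration.

```python
# System totals copied from the comparison table above.
systems = {
    "GB200 NVL72": {"gpus": 72, "nvlink_tbps": 130.0, "fp8_pflops": 720.0},
    "GB300 NVL72": {"gpus": 72, "nvlink_tbps": 130.0, "fp8_pflops": 720.0},
    "HGX H100 (8x)": {"gpus": 8, "nvlink_tbps": 7.2, "fp8_pflops": 16.0},
}


def per_gpu(totals: dict) -> dict:
    """Divide system-level totals by the GPU count, rounded to 2 decimals."""
    n = totals["gpus"]
    return {
        "nvlink_tbps": round(totals["nvlink_tbps"] / n, 2),
        "fp8_pflops": round(totals["fp8_pflops"] / n, 2),
    }


for name, totals in systems.items():
    print(name, per_gpu(totals))

# Ratio of per-GPU FP8 throughput, GB200 versus H100, as listed in the table.
gb200 = per_gpu(systems["GB200 NVL72"])
h100 = per_gpu(systems["HGX H100 (8x)"])
print("FP8 per-GPU ratio:", gb200["fp8_pflops"] / h100["fp8_pflops"])
```

On these figures, each B200 in the rack gets roughly twice the NVLink bandwidth of an H100 in an HGX node, and about five times the listed FP8 throughput.
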

Workloads built for GB200

Use case / 01
Large-Scale LLM Pre-Training

Each B200 GPU has 192 GB HBM3e. A single 8-GPU node holds 1.5 TB; a full NVL72 rack holds 13.5 TB inside one NVLink domain. 1.44 EFLOPS FP4 per rack is a 4x lift over equivalent H100 capacity for 200B to 1T parameter pre-training. Reserve the count that fits your run.

- 200B to 1T dense and MoE pre-training
- Multi-modal foundation models
- RLHF and post-training at scale
- Long-sequence transformer training
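
To pick a starting GPU count for a training run, a rough lower bound can be derived from model-state memory alone. The sketch below assumes the common mixed-precision Adam rule of thumb of 16 bytes per parameter (2 for bf16 weights, 2 for gradients, 12 for fp32 master weights and optimizer moments) and reserves an assumed 20% of HBM for activations and buffers. Neither number comes from this page, and real requirements depend on the parallelism strategy, sequence length, and framework.

```python
import math

HBM_PER_GPU_GB = 192            # per-GPU HBM3e quoted on this page
BYTES_PER_PARAM_TRAINING = 16   # assumption: bf16 weights + grads, fp32 Adam states
NODE_SIZE = 8                   # smallest reservation unit quoted on this page


def min_gpus_for_training(params_billion: float, headroom: float = 0.2) -> int:
    """Lower bound on GPUs needed to hold model states, rounded up to whole nodes.

    `headroom` is the assumed fraction of HBM kept free for activations,
    communication buffers, and fragmentation.
    """
    state_gb = params_billion * BYTES_PER_PARAM_TRAINING  # 1e9 params * bytes = GB
    usable_gb_per_gpu = HBM_PER_GPU_GB * (1 - headroom)
    gpus = math.ceil(state_gb / usable_gb_per_gpu)
    return math.ceil(gpus / NODE_SIZE) * NODE_SIZE


print(min_gpus_for_training(70))    # 8
print(min_gpus_for_training(200))   # 24
```

Treat the result as a floor for memory only; throughput targets and time-to-train usually push the reservation higher.
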
Use case / 02

High-Throughput FP4 Inference

B200 adds native FP4 compute, which H100 does not offer, and delivers 8 TB/s of HBM bandwidth per GPU. 130 TB/s of NVLink fabric inside an NVL72 rack supports tensor and pipeline parallelism for large models without leaving the rack. Smaller reservations fit 70B to 200B parameter inference on 8 to 32 GPUs.

- MoE serving at high concurrency
- Long context windows (256K to 1M tokens)
- Disaggregated prefill and decode
- Real-time agentic inference for 200B+ models
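
For inference, the weight footprint scales with the bytes per parameter of the serving precision: 0.5 for FP4, 1 for FP8, 2 for BF16. The sketch below estimates the minimum GPU count needed just to hold the weights, with an assumed half of each GPU's HBM set aside for KV cache. That fraction, the helper names, and the omission of quantization scale factors are all simplifying assumptions, not figures from this page.

```python
import math

HBM_PER_GPU_GB = 192  # per-GPU HBM3e quoted on this page
BYTES_PER_PARAM = {"fp4": 0.5, "fp8": 1.0, "bf16": 2.0}


def weight_footprint_gb(params_billion: float, dtype: str = "fp4") -> float:
    """Weight memory in GB (1e9 params * bytes per param), ignoring scale factors."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_gpus_for_inference(
    params_billion: float, dtype: str = "fp4", kv_cache_fraction: float = 0.5
) -> int:
    """Fewest GPUs whose non-KV-cache HBM can hold the weights.

    `kv_cache_fraction` is an assumed share of HBM reserved for KV cache.
    """
    usable_gb_per_gpu = HBM_PER_GPU_GB * (1 - kv_cache_fraction)
    needed = weight_footprint_gb(params_billion, dtype) / usable_gb_per_gpu
    return max(1, math.ceil(needed))


print(weight_footprint_gb(200))             # 100.0 GB of FP4 weights
print(min_gpus_for_inference(200))          # 2
print(min_gpus_for_inference(200, "bf16"))  # 5
```

This is a memory floor only. Concurrency, context length, and latency targets are what typically drive reservations into the 8 to 32 GPU range mentioned above.
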
Use case / 03
Reasoning and Agentic Workloads

Grace CPUs handle orchestration, retrieval, and tool calls in the same coherent address space as the GPUs over 900 GB/s NVLink-C2C, removing PCIe round-trips that slow down agent loops on x86 nodes. Reserve from a few GPUs for prototyping up to full racks for production serving.

- Test-time compute and chain-of-thought serving
- Tool-using agents with shared memory
- Code generation with 200K+ context
- Multi-step planning and self-reflection
Use case / 04
Fine-Tuning and Post-Training

Reserve the GPU count that matches your run. DPO and GRPO on 70B models fit on a single 8-GPU node. Large-scale RL with rollouts, value heads, and reference models benefits from full-rack co-location inside one NVLink domain. Persistent NVMe per chassis handles checkpoint streaming at any scale.

- DPO, GRPO, and PPO fine-tuning
- Reward modeling for production LLMs
- Distillation pipelines into smaller models
- Curriculum learning with millions of rollouts

When to pick GB200

Scenario 01

Pick GB200 if

You want Blackwell rack-scale at the best price-to-performance. Reserve any quantity, from a single 8-GPU node to multi-rack deployments. Models that need rack-scale memory and fit in 13.5 TB of HBM3e (most workloads under 1T parameters) get the same NVLink 5 fabric and Grace CPU coupling as GB300 at lower cost.

Scenario 02

Pick GB300 instead if

Your workload needs 288 GB per GPU or is bottlenecked on attention compute. GB300 has 50% more HBM3e per GPU and 2x attention throughput over GB200, with ConnectX-8 (800 Gb/s) replacing ConnectX-7 (400 Gb/s). For 1T+ parameter training and reasoning workloads, GB300 is the upgrade.

Scenario 03

Pick B200 (per GPU) instead if

You want per-minute billing with no commitment. B200 SXM5 is available on Spheron on a per-GPU or 8-GPU node basis with on-demand and spot pricing. For most 70B to 200B parameter inference and fine-tuning jobs that don't need a sales conversation, B200 is the simpler call.

Scenario 04

Pick R100 if you can wait

R100 Rubin is available in H2 2026 with 22 TB/s of per-GPU memory bandwidth (2.75x B200's 8 TB/s) and 50 PFLOPS of FP4 (5x B200's 10 PFLOPS dense). For new training runs with flexible timelines, R100 is the higher-ceiling option. If you need to start now, GB200 is live today.


Other GPUs on Spheron

For workloads that don't need GB200 specifically, Spheron also offers B200, B300, and H200 on per-minute billing with no commitments.
