Pricing depends on quantity, commitment length, region, and networking requirements. Smaller reservations run closer to per-hour rates; multi-rack reservations come with committed pricing and priority allocation. Submit the form with your GPU count and our team shares a quote within one business day.

Spheron GPU Catalog

Blackwell · Grace + NVLink 5

Pipeline · Reservations Open

Any Quantity

NVIDIA GB200 NVL72
Grace Blackwell Rack-Scale

Q: How do I reserve GB200 NVL72 capacity on Spheron?

Submit the form on this page with your expected GPU count, timeline, and intended workload. Our team confirms availability, region, and pricing within one business day. Reservations are allocated per request across Tier 3 and Tier 4 liquid-cooled data center partner regions.

Q: What's the minimum and maximum quantity I can reserve?

Reserve any count. The smallest configuration is a single 8-GPU node. From there you can scale to 36 GPUs (half rack), 72 GPUs (full NVL72 rack), or multi-rack deployments in the hundreds. Smaller allocations still sit inside an NVL72 NVLink domain, so you keep the same interconnect at any scale.
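As a rough illustration of the sizes above, the sketch below maps a requested GPU count to the number of NVL72 racks it would span. The function name and the assumption that GPUs are packed into as few racks as possible are hypothetical and are not Spheron's allocation logic.

```python
import math

GPUS_PER_RACK = 72  # one NVL72 rack, per this page


def racks_spanned(gpu_count: int) -> int:
    """Fewest NVL72 racks that can hold gpu_count GPUs (hypothetical packing)."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return math.ceil(gpu_count / GPUS_PER_RACK)


# Sizes named on this page, plus one multi-rack example.
for count in (8, 36, 72, 288):
    racks = racks_spanned(count)
    scope = "one NVLink domain" if racks == 1 else f"{racks} NVLink domains"
    print(f"{count} GPUs -> {racks} rack(s), {scope}")
```

Any count up to 72 stays inside a single NVLink domain under this packing assumption; larger counts cross racks over the inter-rack fabric.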

Q: How fast can I get provisioned after submitting the form?

Our team confirms availability within one business day. From there, smaller reservations (single 8-GPU nodes, half racks) typically come online within days. Multi-rack deployments take longer depending on region and networking requirements. We share the timeline along with the quote.

72x B200 · 13.5 TB HBM3e · 1.44 EFLOPS FP4

NVIDIA's Blackwell rack-scale system. 72 B200 GPUs and 36 Grace CPUs share one NVLink 5 domain with 13.5 TB of HBM3e and 1.44 EFLOPS of FP4. Built for trillion-parameter training and 100B+ inference where rack-scale memory and NVLink fabric decide the run. Reserve any count on Spheron, from a single 8-GPU node to multi-rack.

Tell us how many GPUs you need. We confirm availability within one business day.

Per GPU: 192 GB HBM3e

Per Rack: 1.44 EFLOPS FP4

NVLink: 130 TB/s

Starts at: 8 GPUs

Other GPU options

View all GPU pricing →

Reserve GB200 NVL72

Submit Your Request

Tell us how many GPUs you need. Our team confirms availability within one business day and gets you provisioned.

First Name *

Last Name *

Company *

Job Title

Work Email *

Phone *

GPU Count *

Heard via *

Intended Workload *

I agree to Spheron's privacy policy and to be contacted about GB200 availability and Spheron GPU offerings.

I'd like to receive product updates, announcements, and marketing communications from Spheron. You can unsubscribe at any time.

At a glance

Reserve NVIDIA GB200 Blackwell GPUs on Spheron in any count, from a single 8-GPU node to multi-rack deployments. Each GPU is a Blackwell B200 with 192 GB HBM3e. Inside an NVL72 rack, 72 GPUs and 36 Grace CPUs share a single NVLink 5 domain at 130 TB/s, exposing 13.5 TB of unified memory for large-scale LLM training and inference. Submit the form with your GPU count and our team confirms availability within one business day. For workloads that don't need GB200 specifically, B200 per-GPU rentals and H200 are on per-minute billing today.

Where GB200 sits in the stack

GB200 is the broadly deployed Blackwell rack-scale system. GB300 is the Blackwell Ultra upgrade with 50% more memory. Rubin R100 is the next generation, available H2 2026.

Blackwell · Reserve

GB200

192 GB HBM3e

This page

GB200 NVL72 specifications

Spec (per rack)	GB200 NVL72
GPUs per rack	72 × B200 Blackwell
Grace CPUs	36 × 72-core Neoverse V2
Total HBM3e	13.5 TB unified
System memory	17 TB LPDDR5X
FP4 compute	1.44 EFLOPS
FP8 compute	720 PFLOPS
NVLink 5 fabric	130 TB/s aggregate
GPU-to-CPU	900 GB/s NVLink-C2C
Networking	ConnectX-7 · 400 Gb/s
Rack power	~120 kW liquid-cooled

Per-rack specifications. Smaller reservations inherit the same architecture at the slice they're allocated. All systems run in NVL72 reference configuration with liquid cooling, 2:1 InfiniBand fat-tree fabric across racks, and persistent NVMe per chassis.
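The per-rack figures can be cross-checked against the per-GPU numbers quoted elsewhere on this page. The short sketch below does that arithmetic. Two things are assumptions rather than statements from the page: the 2x factor between dense and sparse FP4, which is what reconciles 10 PFLOPS dense per GPU with 1.44 EFLOPS per rack, and the use of 1 TB = 1024 GB, which is what makes 72 × 192 GB come out to 13.5 TB.

```python
GPUS_PER_RACK = 72
GRACE_PER_RACK = 36

hbm_per_gpu_gb = 192            # per-GPU HBM3e, from this page
fp4_dense_per_gpu_pflops = 10   # per-GPU dense FP4, from this page
sparsity_factor = 2             # assumption: sparse FP4 = 2 x dense FP4
nvlink_rack_tb_s = 130          # aggregate NVLink 5 bandwidth per rack
cores_per_grace = 72

# Rack totals derived from per-GPU figures.
total_hbm_tb = GPUS_PER_RACK * hbm_per_gpu_gb / 1024  # assumption: 1 TB = 1024 GB
fp4_sparse_eflops = GPUS_PER_RACK * fp4_dense_per_gpu_pflops * sparsity_factor / 1000
grace_cores = GRACE_PER_RACK * cores_per_grace

# Per-GPU share of the rack's aggregate NVLink bandwidth.
nvlink_per_gpu_tb_s = nvlink_rack_tb_s / GPUS_PER_RACK

print(f"Total HBM3e:    {total_hbm_tb} TB")
print(f"FP4 (sparse):   {fp4_sparse_eflops:.2f} EFLOPS")
print(f"Grace cores:    {grace_cores}")
print(f"NVLink per GPU: {nvlink_per_gpu_tb_s:.2f} TB/s")
```

The same arithmetic scales down to a slice: an 8-GPU reservation holds 8 × 192 GB = 1.5 TB of HBM3e under the same unit convention.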

GB200 NVL72 vs GB300 NVL72 vs HGX H100

Spec (per rack)	GB200 NVL72	GB300 NVL72	HGX H100 (8x)
Architecture	Blackwell	Blackwell Ultra	Hopper
GPUs / rack	72 × B200	72 × B300	8 × H100
Total HBM	13.5 TB HBM3e	20.7 TB HBM3e	640 GB HBM3
FP4 sparse	1.44 EFLOPS	2.16 EFLOPS	N/A
FP4 dense per GPU	10 PFLOPS	15 PFLOPS	N/A
FP8 throughput	720 PFLOPS	720 PFLOPS	16 PFLOPS
NVLink fabric	130 TB/s	130 TB/s	7.2 TB/s (8 GPUs)
CPU	36 × Grace	36 × Grace	x86 host
Networking	ConnectX-7 · 400 Gb/s	ConnectX-8 · 800 Gb/s	ConnectX-7 · 400 Gb/s
Spheron access	By reservation	By reservation	Available

Per-rack specifications for NVL72 systems. HGX H100 figures are for a standard 8-GPU node for relative scale, since H100 does not come in NVL72 configuration. GB200 specs match the NVIDIA GB200 NVL72 reference design.
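Because the HGX H100 column describes 8 GPUs while the NVL72 columns describe 72, a per-GPU normalization makes the rows easier to compare. The sketch below divides a few table rows by GPU count. The dictionary layout and function name are illustrative; the numbers are copied from the table above.

```python
# Selected rows from the comparison table, keyed by system.
systems = {
    "GB200 NVL72": {"gpus": 72, "fp8_pflops": 720, "nvlink_tb_s": 130},
    "GB300 NVL72": {"gpus": 72, "fp8_pflops": 720, "nvlink_tb_s": 130},
    "HGX H100 (8x)": {"gpus": 8, "fp8_pflops": 16, "nvlink_tb_s": 7.2},
}


def per_gpu(system: str) -> dict:
    """Divide each aggregate figure by the system's GPU count."""
    spec = systems[system]
    return {key: value / spec["gpus"] for key, value in spec.items() if key != "gpus"}


for name in systems:
    figures = per_gpu(name)
    print(f"{name}: {figures['fp8_pflops']:.1f} PFLOPS FP8 per GPU, "
          f"{figures['nvlink_tb_s']:.2f} TB/s NVLink per GPU")
```

On these figures, a GB200 GPU lands at 10 PFLOPS FP8 against 2 PFLOPS for an H100, and roughly twice the per-GPU NVLink bandwidth.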

Workloads built for GB200

Use case / 01

GB200

🔬

Large-Scale LLM Pre-Training

Each B200 GPU has 192 GB HBM3e. A single 8-GPU node holds 1.5 TB; a full NVL72 rack holds 13.5 TB inside one NVLink domain. 1.44 EFLOPS FP4 per rack is a 4x lift over equivalent H100 capacity for 200B to 1T parameter pre-training. Reserve the count that fits your run.

200B to 1T dense and MoE pre-training
Multi-modal foundation models
RLHF and post-training at scale
Long-sequence transformer training
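To get a feel for how model size maps to GPU count, the sketch below estimates the minimum number of 192 GB GPUs needed just to hold training state. It assumes a common rule of thumb of about 16 bytes per parameter for mixed-precision Adam (2 for bf16 weights, 2 for bf16 gradients, 12 for fp32 master weights and optimizer moments) and treats 1 GB as 10^9 bytes. Neither figure comes from this page, activations are ignored, and the function name is hypothetical.

```python
import math

HBM_PER_GPU_BYTES = 192 * 10**9  # 192 GB per B200; assumes 1 GB = 10^9 bytes
BYTES_PER_PARAM = 16             # assumption: mixed-precision Adam rule of thumb


def min_gpus_for_training_state(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Lower bound on GPUs needed to hold weights, gradients, and optimizer state."""
    total_bytes = params * bytes_per_param
    return math.ceil(total_bytes / HBM_PER_GPU_BYTES)


for params in (70 * 10**9, 200 * 10**9, 10**12):
    gpus = min_gpus_for_training_state(params)
    print(f"{params / 1e9:.0f}B params -> at least {gpus} GPUs for training state")
```

Treat the results as lower bounds: real runs also need activation memory, and lower-precision recipes or sharded optimizer state change the bytes-per-parameter figure.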

Use case / 02

GB200

⚡

High-Throughput FP4 Inference

B200 adds native FP4 Tensor Core support, which H100 lacks, and delivers 8 TB/s HBM bandwidth per GPU. 130 TB/s of NVLink fabric inside an NVL72 rack supports tensor and pipeline parallelism for large models without leaving the rack. Smaller reservations fit 70B to 200B parameter inference on 8 to 32 GPUs.

MoE serving at high concurrency
Long context windows (256K to 1M tokens)
Disaggregated prefill and decode
Real-time agentic inference for 200B+ models
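For inference sizing, a first-order check is whether the quantized weights alone fit in HBM. The sketch below computes the weight footprint at a given bit width and the fewest 192 GB GPUs that could hold it. It counts weights only (no KV cache, activations, or runtime overhead) and treats 1 GB as 10^9 bytes. The function names are hypothetical.

```python
import math

HBM_PER_GPU_GB = 192  # per-GPU HBM3e, from this page


def weight_footprint_gb(params: int, bits_per_weight: int = 4) -> float:
    """Size of the model weights in GB (10^9 bytes) at the given precision."""
    return params * bits_per_weight / 8 / 1e9


def min_gpus_for_weights(params: int, bits_per_weight: int = 4) -> int:
    """Fewest GPUs whose combined HBM holds the weights alone."""
    return math.ceil(weight_footprint_gb(params, bits_per_weight) / HBM_PER_GPU_GB)


for params in (70 * 10**9, 200 * 10**9, 10**12):
    size = weight_footprint_gb(params)
    gpus = min_gpus_for_weights(params)
    print(f"{params / 1e9:.0f}B params at FP4: {size:.0f} GB of weights, at least {gpus} GPU(s)")
```

Long context windows and high concurrency push KV cache well beyond the weight footprint, so practical deployments use more GPUs than this lower bound suggests.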

Use case / 03

GB200

🧠

Reasoning and Agentic Workloads

Grace CPUs handle orchestration, retrieval, and tool calls in the same coherent address space as the GPUs over 900 GB/s NVLink-C2C, removing PCIe round-trips that slow down agent loops on x86 nodes. Reserve from a few GPUs for prototyping up to full racks for production serving.

Test-time compute and chain-of-thought serving
Tool-using agents with shared memory
Code generation with 200K+ context
Multi-step planning and self-reflection

Use case / 04

GB200

🚀

Fine-Tuning and Post-Training

Reserve the GPU count that matches your run. DPO and GRPO on 70B models fit on a single 8-GPU node. Large-scale RL with rollouts, value heads, and reference models benefits from full-rack co-location inside one NVLink domain. Persistent NVMe per chassis handles checkpoint streaming at any scale.

DPO, GRPO, and PPO fine-tuning
Reward modeling for production LLMs
Distillation pipelines into smaller models
Curriculum learning with millions of rollouts

When to pick GB200

Scenario 01

Pick GB200 if

You want Blackwell rack-scale at the best price-to-performance. Reserve any quantity, from a single 8-GPU node to multi-rack deployments. Models that need rack-scale memory and fit in 13.5 TB of HBM3e (most workloads under 1T parameters) get the same NVLink 5 fabric and Grace CPU coupling as GB300 at lower cost.

Recommended fit

Scenario 02

Pick GB300 instead if

Your workload needs 288 GB per GPU or is bottlenecked on attention compute. GB300 has 50% more HBM3e per GPU and 2x attention throughput over GB200, with ConnectX-8 (800 Gb/s) replacing ConnectX-7 (400 Gb/s). For 1T+ parameter training and reasoning workloads, GB300 is the upgrade.

Recommended fit

Scenario 03

Pick B200 (per GPU) instead if

You want per-minute billing with no commitment. B200 SXM5 is available on Spheron on a per-GPU or 8-GPU node basis with on-demand and spot pricing. For most 70B to 200B parameter inference and fine-tuning jobs that don't need a sales conversation, B200 is the simpler call.

Recommended fit

Scenario 04

Pick R100 if you can wait

R100 Rubin is available H2 2026 with 22 TB/s per-GPU bandwidth (2.75x B200) and 50 PFLOPS FP4 (5x B200 dense). For new training runs with flexible timelines, R100 is the higher-ceiling option. If you need to start now, GB200 is live today.

Recommended fit

Other GPUs on Spheron

For workloads that don't need GB200 specifically, Spheron also offers B200, B300, and H200 on per-minute billing with no commitments.

B200 · 192 GB HBM3e

Same Blackwell silicon as GB200, billed by the minute.

From $5.34/hr · live

Rent B200 →

B300 · 288 GB HBM3e

Blackwell Ultra. 50% more memory per GPU than B200.

From $3.35/hr · live

Rent B300 →

H200 · 141 GB HBM3e

Hopper. Best value for inference and long-context serving.

From $1.76/hr · live

Rent H200 →

Related resources

01 · Read

NVIDIA GB200 NVL72 Architecture Guide

How the NVL72 rack pairs 72 B200 GPUs with 36 Grace CPUs, 13.5 TB of unified HBM3e, NVLink 5 fabric, and what workloads benefit from each scale.

02 · Read

NVIDIA B200 GPU: Specs, Pricing, and Workloads

Single-GPU and 8-GPU B200 SXM5 access on Spheron. 192 GB HBM3e, FP4 Transformer Engine, and per-minute billing with no commitments.

03 · Read

NVIDIA GB300 NVL72 Architecture Guide

The Blackwell Ultra upgrade. 50% more HBM3e per GPU, 2x attention compute, ConnectX-8 networking. Where GB300 makes sense over GB200.

FAQ / 07

GB200 FAQ

How do I reserve GB200 NVL72 capacity on Spheron?

What's the minimum and maximum quantity I can reserve?

What's the difference between GB200 and B200?

B200 is the individual GPU: 192 GB HBM3e, 8 TB/s bandwidth, 10 PFLOPS dense FP4. It runs in standard 8-GPU SXM5 nodes connected to an x86 host. GB200 is the same B200 silicon deployed in NVL72 racks that pair 72 GPUs with 36 Grace CPUs over NVLink 5, exposing 13.5 TB of unified memory inside one NVLink domain. B200 is rented per GPU on per-minute billing. GB200 is reserved by request in any quantity, and our team handles provisioning.

How is GB200 priced?

How does GB200 compare to GB300?

GB300 swaps B200 GPUs (192 GB HBM3e) for B300 GPUs (288 GB HBM3e), raising total rack memory from 13.5 TB to 20.7 TB. Per-GPU dense FP4 increases from 10 to 15 PFLOPS and B300 adds 2x attention compute. Networking upgrades from ConnectX-7 (400 Gb/s) to ConnectX-8 (800 Gb/s) per GPU. NVLink 5 fabric, Grace CPU layout, and software stack remain consistent, so workloads port directly between the two systems.

What workloads is GB200 built for?

Models in the 200B to 1T parameter range that benefit from rack-scale memory and NVLink interconnect. 1.44 EFLOPS FP4 per rack supports trillion-parameter dense and MoE training, frontier inference at high concurrency, long-context serving, and large-scale RL post-training. Smaller GB200 reservations also work for fine-tuning and inference on the latest Blackwell architecture at lower cost than GB300.

How fast can I get provisioned after submitting the form?

Also consider

GB300

288GB

Blackwell Ultra · 288 GB HBM3e · 8 TB/s

B200

192GB

Single-GPU Blackwell. Per-minute billing, no commit.

R100

288GB HBM4

Rubin pre-order. Next-gen architecture, H2 2026.

