Spheron GPU Catalog
Blackwell Ultra · Grace + NVLink 5
Capacity: 1024+ Available

NVIDIA GB300 NVL72
Grace Blackwell Ultra Rack-Scale GPU

72x B300 · 36x Grace CPU · 20.7 TB HBM3e · 130 TB/s NVLink · 1.44 EFLOPS FP4 (sparse) · 1024+ GPUs available

The NVIDIA GB300 NVL72 is the Blackwell Ultra rack-scale system. It pairs 72 B300 GPUs with 36 Grace Arm CPUs in a single NVLink 5 domain with 130 TB/s of aggregate bandwidth. Each B300 carries 288 GB of HBM3e (50% more than B200) with 8 TB/s of memory bandwidth, giving each NVL72 rack 20.7 TB of HBM3e and 1.44 EFLOPS of FP4 compute with sparsity (1.08 EFLOPS dense). It is NVIDIA's highest-memory rack-scale GPU system, built for frontier-scale pre-training, inference serving for 200B+ parameter models, and trillion-parameter MoE workloads. Spheron rents GB300 NVL72 across data center partner regions in any quantity, from a single 8-GPU node to multi-rack deployments.

Tell us how many GPUs you need and what you're running. Our team confirms availability within one business day and gets you provisioned.

Capacity: 1024+ GPUs
Per GPU: 288 GB HBM3e
Per rack: 1.44 EFLOPS FP4 (sparse)
Starts at: 8 GPUs
Available: GB300 NVL72


At a glance

Rent NVIDIA GB300 GPUs on Spheron. 1024+ GPUs are live across our data center partner regions and ready to provision. Rent any quantity, from a single 8-GPU node to multi-rack deployments. Each GPU is a Blackwell Ultra B300 with 288 GB HBM3e. Inside an NVL72 rack, 72 GPUs and 36 Grace CPUs share a single NVLink 5 domain at 130 TB/s, exposing 20.7 TB of unified memory for trillion-parameter training and frontier inference. Submit the form with your GPU count and our team confirms availability within one business day. For workloads that don't need GB300 specifically, B300 per-GPU rentals and B200 are on per-minute billing today.

Where GB300 sits in the stack

GB300 is NVIDIA's current top-end Blackwell Ultra platform, and Spheron has 1024+ GPUs available to rent in any quantity. Rubin R100 is the next generation, expected in H2 2026.

GB300 NVL72 specifications

GPUs per rack
72 × B300 SXM6
Grace CPUs
36 × 72-core Neoverse V2
Total HBM3e
20.7 TB unified
System Memory
17 TB LPDDR5X
FP4 Compute
1.44 EFLOPS sparse · 1.08 EFLOPS dense
FP8 Compute
720 PFLOPS sparse
NVLink 5 Fabric
130 TB/s aggregate
GPU-to-CPU
900 GB/s NVLink-C2C
Networking
ConnectX-8 · 800 Gb/s
Rack power
~132 kW liquid-cooled

Per-rack specifications. Smaller rentals run on the same architecture, scaled to the slice allocated. All systems run in the NVL72 reference configuration with liquid cooling, a 2:1 InfiniBand fat-tree fabric across racks, and persistent NVMe storage per chassis.
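The rack-level totals follow directly from per-device figures. This minimal sketch assumes the per-device values quoted on this page (288 GB HBM3e and 15 PFLOPS dense FP4 per B300, 72 cores per Grace CPU) and the NVL72 ratio of one Grace CPU per two B300s; the helper name `slice_totals` is hypothetical.

```python
# Per-device figures quoted on this page; treated as exact for this arithmetic.
B300_HBM_GB = 288            # HBM3e per B300 GPU
B300_FP4_DENSE_PFLOPS = 15   # dense FP4 per B300 GPU
GRACE_CORES = 72             # Neoverse V2 cores per Grace CPU


def slice_totals(gpus: int = 72) -> dict:
    """Scale per-device figures to a slice of `gpus` B300s.

    Assumes the NVL72 layout of one Grace CPU per two B300 GPUs.
    """
    cpus = gpus // 2
    return {
        "hbm_tb": gpus * B300_HBM_GB / 1000,  # decimal terabytes
        "fp4_dense_eflops": gpus * B300_FP4_DENSE_PFLOPS / 1000,
        "grace_cores": cpus * GRACE_CORES,
    }


rack = slice_totals(72)
node = slice_totals(8)
print(f"NVL72 rack: {rack['hbm_tb']:.1f} TB HBM3e, "
      f"{rack['fp4_dense_eflops']:.2f} EFLOPS FP4 dense, {rack['grace_cores']} Arm cores")
print(f"8-GPU slice: {node['hbm_tb']:.1f} TB HBM3e")
```

The same function gives the totals for any intermediate slice, for example `slice_totals(32)` for a 32-GPU rental.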

GB300 NVL72 vs GB200 NVL72 vs HGX H100

Spec (per rack) | GB300 NVL72 | GB200 NVL72 | HGX H100 (8x)
Architecture | Blackwell Ultra | Blackwell | Hopper
GPUs | 72 × B300 | 72 × B200 | 8 × H100
Total HBM | 20.7 TB HBM3e | 13.5 TB HBM3e | 640 GB HBM3
FP4 (sparse / dense) | 1.44 / 1.08 EFLOPS | 1.44 / 0.72 EFLOPS | N/A
FP8 (sparse) | 720 PFLOPS | 720 PFLOPS | ~32 PFLOPS
NVLink fabric | 130 TB/s | 130 TB/s | 7.2 TB/s (8 GPUs)
CPU | 36 × Grace | 36 × Grace | x86 host
Networking | ConnectX-8 · 800 Gb/s | ConnectX-7 · 400 Gb/s | ConnectX-7 · 400 Gb/s
Spheron availability | Live now | Available | Available

Per-rack specifications for NVL72 systems. HGX H100 figures are for a standard 8-GPU node for relative scale, since H100 does not come in NVL72 configuration. GB300 specs match the NVIDIA GB300 NVL72 reference design.

Workloads built for GB300

Use case / 01
🔬

Trillion-Parameter Pre-Training

Each B300 GPU has 288 GB of HBM3e. A single 8-GPU node holds 2.3 TB; a full NVL72 rack holds 20.7 TB inside one NVLink domain. With 1.44 EFLOPS of FP4 (sparse) and 720 PFLOPS of FP8 per rack, GB300 can cut wall-clock time on 1T+ parameter training by 4 to 6x versus equivalent H100 capacity, depending on precision and parallelism strategy. Rent the count that fits your run.

1T+ dense and MoE pre-training · Multi-modal foundation models · RLHF and post-training at scale · Long-sequence transformer training
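Why trillion-parameter training calls for a rack rather than a node comes down to memory. A common rule of thumb, used here as an assumption rather than a Spheron figure, is about 16 bytes per parameter for mixed-precision training with Adam. This sketch counts only fully sharded weights, gradients, and optimizer state; `min_gpus_for_training_state` is a hypothetical helper.

```python
import math

# Rule-of-thumb bytes per parameter for mixed-precision training with Adam (an assumption):
# bf16 weights (2) + bf16 grads (2) + fp32 master weights (4) + Adam m (4) + Adam v (4).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4
HBM_PER_GPU_BYTES = 288e9  # 288 GB HBM3e per B300


def min_gpus_for_training_state(params: float) -> int:
    """Smallest GPU count whose pooled HBM holds fully sharded weights, grads, and
    optimizer state. Ignores activations, buffers, and fragmentation."""
    state_bytes = params * BYTES_PER_PARAM
    return math.ceil(state_bytes / HBM_PER_GPU_BYTES)


for params in (70e9, 400e9, 1e12):
    tb = params * BYTES_PER_PARAM / 1e12
    print(f"{params / 1e9:>5.0f}B params: {tb:5.2f} TB of state, "
          f">= {min_gpus_for_training_state(params)} GPUs")
```

Under these assumptions a 1T-parameter model needs roughly 16 TB of state before activations: far more than an 8-GPU node's 2.3 TB, but within the 20.7 TB of a single NVL72 rack.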
Use case / 02

Frontier-Scale FP4 Inference

B300 delivers 1.5x the dense FP4 throughput of B200 (15 vs 10 PFLOPS) and doubles attention-layer compute. The 130 TB/s NVLink fabric inside an NVL72 rack supports tensor and pipeline parallelism for the largest models without leaving the rack. Smaller rentals of 8 to 32 GPUs fit 70B to 400B parameter inference.

Frontier MoE serving at high concurrency · 1M+ token context windows · Disaggregated prefill and decode · Real-time agentic inference for 400B+ models
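A rough way to see why FP4 matters for serving is to size the weights alone. This sketch assumes 0.5 bytes per parameter for FP4, 1 for FP8, and 2 for FP16, and reserves a hypothetical 50% of each GPU's 288 GB for KV cache and activations. It yields a floor on GPU count; the 8 to 32 GPU range above also reflects concurrency and throughput targets.

```python
import math

HBM_PER_GPU_GB = 288
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def min_gpus_for_weights(params_b: float, fmt: str, headroom: float = 0.5) -> int:
    """Minimum GPUs so model weights use at most (1 - headroom) of each GPU's HBM.

    params_b is the parameter count in billions, so params_b * bytes gives GB.
    headroom is the fraction reserved for KV cache and activations (an assumption).
    """
    weights_gb = params_b * BYTES_PER_PARAM[fmt]
    usable_gb = HBM_PER_GPU_GB * (1 - headroom)
    return math.ceil(weights_gb / usable_gb)


for size in (70, 200, 400):
    counts = {fmt: min_gpus_for_weights(size, fmt) for fmt in ("fp16", "fp8", "fp4")}
    print(f"{size}B: {counts}")
```

Halving bytes per parameter from FP8 to FP4 roughly halves the weight footprint, which frees HBM for longer contexts and larger batches.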
Use case / 03
🧠

Reasoning and Agentic Workloads

Long reasoning chains generate large KV-cache footprints. Grace CPUs handle orchestration, retrieval, and tool calls in the same coherent address space as the GPUs over 900 GB/s NVLink-C2C, removing PCIe round-trips that slow down agent loops on x86 nodes. Rent from a few GPUs for prototyping up to full racks for production serving.

Test-time compute and chain-of-thought serving · Tool-using agents with shared memory · Code generation with 200K+ context · Multi-step planning and self-reflection
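The KV-cache footprint of a long reasoning chain can be estimated with the standard formula: keys and values (a factor of 2) per layer, per KV head, per head dimension, per token. The model shape below is a hypothetical grouped-query-attention config resembling a 70B-class model (80 layers, 8 KV heads, head dimension 128) with an FP8 KV cache; substitute your own model's values.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes of KV cache: 2 (K and V) x layers x KV heads x head_dim x dtype size x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch


# Hypothetical 70B-class GQA shape with an FP8 (1-byte) KV cache.
per_seq = kv_cache_bytes(seq_len=200_000, n_layers=80, n_kv_heads=8,
                         head_dim=128, bytes_per_elem=1)
print(f"{per_seq / 1e9:.1f} GB per 200K-token sequence")

# Assume half of one B300's 288 GB is left for KV cache after weights.
print(f"~{int(288e9 * 0.5 // per_seq)} such sequences per GPU")
```

At tens of gigabytes per 200K-token sequence, per-GPU HBM capacity directly caps how many long reasoning chains can be served concurrently.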
Use case / 04
🚀

Fine-Tuning and Post-Training

Rent the GPU count that matches your run. DPO and GRPO on 70B models fit on a single 8-GPU node. Large-scale RL with rollouts, value heads, and reference models benefits from full-rack co-location inside one NVLink domain. Persistent NVMe per chassis handles checkpoint streaming at any scale.

DPO, GRPO, and PPO fine-tuning · Reward modeling for frontier LLMs · Distillation pipelines into smaller models · Curriculum learning with millions of rollouts
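A back-of-envelope check shows why full-parameter DPO on a 70B model can fit on one 8-GPU node. This sketch assumes fully sharded state (ZeRO-3 or FSDP style), about 16 bytes per parameter for the trainable policy with Adam, and a frozen bf16 reference model at 2 bytes per parameter. Activations are excluded, so the remaining headroom is what they have to fit in.

```python
HBM_PER_GPU_BYTES = 288e9  # 288 GB HBM3e per B300


def dpo_state_per_gpu(params: float, n_gpus: int) -> float:
    """Per-GPU bytes for fully sharded DPO state, excluding activations.

    Policy: bf16 weights + bf16 grads + fp32 master + Adam m and v = 16 bytes/param.
    Frozen reference model: bf16 weights only = 2 bytes/param.
    """
    policy = params * 16
    reference = params * 2
    return (policy + reference) / n_gpus


per_gpu = dpo_state_per_gpu(70e9, n_gpus=8)
print(f"{per_gpu / 1e9:.1f} GB per GPU of {HBM_PER_GPU_BYTES / 1e9:.0f} GB, "
      f"{(HBM_PER_GPU_BYTES - per_gpu) / 1e9:.1f} GB left for activations")
```

PPO-style runs add a value model and often a separate reward model, which is where full-rack co-location starts to pay off.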

When to pick GB300

Scenario 01

Pick GB300 if

You want Blackwell Ultra. We have 1024+ GPUs available to rent at any quantity, from a single 8-GPU node to multi-rack deployments. Models too large for one node sit inside an NVL72 rack with 20.7 TB of unified HBM3e at 130 TB/s. Rent the count you actually need.

Scenario 02

Pick B300 (per GPU) instead if

You want per-minute billing with no commitment. B300 SXM6 is available on Spheron on a per-GPU or 8-GPU node basis with on-demand and spot pricing. For most 70B to 200B parameter inference and fine-tuning jobs that don't need a sales conversation, B300 is the simpler call.

Scenario 03

Pick GB200 instead if

Budget matters and you don't need 288 GB per GPU. GB200 has 192 GB per GPU versus 288 GB on GB300, with identical NVLink fabric and Grace CPU layout. For sub-1T parameter workloads, GB200 is often the better price-to-performance pick.

Scenario 04

Pick R100 if you can wait

R100 (Rubin) is expected in H2 2026 with 22 TB/s of per-GPU memory bandwidth (2.75x B300) and 50 PFLOPS of FP4 (3.33x B300's 15 PFLOPS dense). For new training runs with flexible timelines, R100 is the higher-ceiling option. If you need to start now, GB300 is live today.
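Memory bandwidth matters most for token-by-token decoding. A simple roofline-style bound, which is an assumption that ignores KV-cache reads, compute limits, and multi-GPU overheads, says batch-1 decode streams all weights from HBM once per token. The 70B FP4 model below is hypothetical.

```python
def decode_tokens_per_s_upper_bound(weight_bytes: float, hbm_bw_bytes_per_s: float) -> float:
    """Batch-1 decode upper bound: each generated token reads every weight once from HBM."""
    return hbm_bw_bytes_per_s / weight_bytes


weights = 70e9 * 0.5  # hypothetical 70B-parameter model in FP4 -> 35 GB of weights
b300 = decode_tokens_per_s_upper_bound(weights, 8e12)   # 8 TB/s per B300
r100 = decode_tokens_per_s_upper_bound(weights, 22e12)  # 22 TB/s per R100
print(f"B300 <= {b300:.0f} tok/s, R100 <= {r100:.0f} tok/s, ratio {r100 / b300:.2f}x")
```

Under this bound the decode ceiling scales linearly with bandwidth, so the 2.75x bandwidth ratio carries straight through to single-stream decode.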


Other GPUs on Spheron

For workloads that don't need GB300 specifically, Spheron also offers B300, B200, and H200 on per-minute billing with no commitments.
