Q: How is GB300 priced?

Pricing depends on quantity, commitment length, region, and networking requirements. Smaller rentals run closer to per-hour rates; multi-rack reservations come with committed pricing and priority allocation. Submit the form with your GPU count and our team shares a quote within one business day.

Spheron GPU Catalog

Blackwell Ultra · Grace + NVLink 5

Capacity: 1024+ Available

NVIDIA GB300 NVL72
Grace Blackwell Ultra GPU

Q: How many GB300 GPUs can I rent on Spheron right now?

1024+ NVIDIA GB300 GPUs are live across our Tier 3 and Tier 4 liquid-cooled data center partner regions and ready to provision. We add more capacity through 2026. Submit the form with your GPU count and our team confirms availability, region, and pricing within one business day.

Q: What's the minimum and maximum quantity I can rent?

Rent any count from 8 GPUs up. The smallest configuration is a single 8-GPU node. From there you can scale to 36 GPUs (half rack), 72 GPUs (full NVL72 rack), or multi-rack deployments in the hundreds. Smaller allocations still sit inside an NVL72 NVLink domain, so you keep the same interconnect at any scale.

Q: How fast can I get provisioned after submitting the form?

Our team confirms availability within one business day. From there, smaller rentals (single 8-GPU nodes, half racks) typically come online within days. Multi-rack deployments take longer depending on region and networking requirements. We share the timeline along with the quote.

72x B300 · 20.7 TB HBM3e · 1.4 EFLOPS FP4

NVIDIA's Blackwell Ultra rack-scale system. 72 B300 GPUs and 36 Grace CPUs share one NVLink 5 domain with 20.7 TB of HBM3e, the most memory in any rack-scale GPU NVIDIA ships. Built for trillion-parameter training, 200B+ inference, and frontier MoE workloads. Rent any count on Spheron, from a single 8-GPU node to multi-rack.

Tell us how many GPUs you need. We confirm availability within one business day.

Capacity: 1024+ GPUs

Per GPU: 288 GB HBM3e

Per Rack: 2.16 EFLOPS FP4

Starts at: 8 GPUs

Submit Your Request

Tell us how many GPUs you need. Our team confirms availability within one business day and gets you provisioned.

At a glance

Rent NVIDIA GB300 GPUs on Spheron. 1024+ GPUs are live across our data center partner regions and ready to provision. Rent any quantity, from a single 8-GPU node to multi-rack deployments. Each GPU is a Blackwell Ultra B300 with 288 GB HBM3e. Inside an NVL72 rack, 72 GPUs and 36 Grace CPUs share a single NVLink 5 domain at 130 TB/s, exposing 20.7 TB of unified memory for trillion-parameter training and frontier inference. Submit the form with your GPU count and our team confirms availability within one business day. For workloads that don't need GB300 specifically, B300 per-GPU rentals and B200 are on per-minute billing today.

Where GB300 sits in the stack

GB300 is NVIDIA's latest Blackwell Ultra GPU. Spheron has 1024+ available to rent at any quantity. Rubin R100 is the next generation, available H2 2026.

Blackwell Ultra

GB300

288 GB HBM3e

Live

GB300 NVL72 specifications

GPUs per rack

72 × B300 SXM6

Grace CPUs

36 × 72-core Neoverse V2

Total HBM3e

20.7 TB unified

System Memory

17 TB LPDDR5X

FP4 Compute

2.16 EFLOPS sparse

FP8 Compute

720 PFLOPS

NVLink 5 Fabric

130 TB/s aggregate

GPU-to-CPU

900 GB/s NVLink-C2C

Networking

ConnectX-8 · 800 Gb/s

Rack power

~132 kW liquid-cooled

Per-rack specifications. Smaller rentals inherit the same architecture at the slice they're allocated. All systems run in NVL72 reference configuration with liquid cooling, 2:1 InfiniBand fat-tree fabric across racks, and persistent NVMe per chassis.

GB300 NVL72 vs GB200 NVL72 vs HGX H100

Spec (per rack)	GB300 NVL72 (New)	GB200 NVL72	HGX H100 (8x)
Architecture	Blackwell Ultra	Blackwell	Hopper
GPUs / rack	72 × B300	72 × B200	8 × H100
Total HBM	20.7 TB HBM3e	13.5 TB HBM3e	640 GB HBM3
FP4 sparse	2.16 EFLOPS	1.44 EFLOPS	N/A
FP8 throughput	720 PFLOPS	720 PFLOPS	16 PFLOPS
NVLink fabric	130 TB/s	130 TB/s	7.2 TB/s (8 GPUs)
CPU	36 × Grace	36 × Grace	x86 host
Networking	ConnectX-8 · 800 Gb/s	ConnectX-7 · 400 Gb/s	ConnectX-7 · 400 Gb/s
Spheron availability	Live now	Available	Available

Per-rack specifications for NVL72 systems. HGX H100 figures are for a standard 8-GPU node for relative scale, since H100 does not come in NVL72 configuration. GB300 specs match the NVIDIA GB300 NVL72 reference design.
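The rack totals follow from the per-GPU figures quoted on this page (288 GB HBM3e and 15 PFLOPS dense FP4 per B300). As a rough sketch, the same arithmetic gives the totals for any slice you rent. The function name `slice_totals` is illustrative, and the 2x sparse-to-dense factor is an assumption inferred from 2.16 EFLOPS across 72 GPUs.

```python
HBM_PER_GPU_GB = 288             # HBM3e per B300, from the specs on this page
DENSE_FP4_PFLOPS_PER_GPU = 15    # dense FP4 per B300, from the specs on this page
SPARSE_OVER_DENSE = 2            # assumption: 2.16 EFLOPS / 72 GPUs / 15 PFLOPS


def slice_totals(gpu_count):
    """Aggregate HBM and FP4 compute for a rental of gpu_count GPUs."""
    dense_eflops = gpu_count * DENSE_FP4_PFLOPS_PER_GPU / 1000
    return {
        "gpus": gpu_count,
        "hbm_tb": gpu_count * HBM_PER_GPU_GB / 1000,
        "fp4_dense_eflops": dense_eflops,
        "fp4_sparse_eflops": dense_eflops * SPARSE_OVER_DENSE,
    }


# Single node, half rack, full NVL72 rack.
for count in (8, 36, 72):
    print(slice_totals(count))
```

A full rack works out to 20.736 TB, which the page rounds to 20.7 TB; an 8-GPU node works out to 2.304 TB, rounded to 2.3 TB.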

Workloads built for GB300

Use case / 01

GB300

🔬

Trillion-Parameter Pre-Training

Each B300 GPU has 288 GB HBM3e. A single 8-GPU node holds 2.3 TB; a full NVL72 rack holds 20.7 TB inside one NVLink domain. 2.16 EFLOPS FP4 sparse per rack cuts wall-clock time on 1T+ parameter training by 4 to 6x versus equivalent H100 capacity. Rent the count that fits your run.

1T+ dense and MoE pre-training · Multi-modal foundation models · RLHF and post-training at scale · Long-sequence transformer training

Use case / 02

GB300

⚡

Frontier-Scale FP4 Inference

B300 delivers 1.5x the dense FP4 throughput of B200 (15 vs 10 PFLOPS) and adds 2x attention compute on top. 130 TB/s of NVLink fabric inside an NVL72 rack supports tensor and pipeline parallelism for the largest models without leaving the rack. Smaller rentals fit 70B to 400B parameter inference on 8 to 32 GPUs.

Frontier MoE serving at high concurrency · 1M+ token context windows · Disaggregated prefill and decode · Real-time agentic inference for 400B+ models
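For a first-pass sizing, it can help to check how many GPUs are needed just to hold the model weights. The sketch below is a memory lower bound only: the GPU counts quoted above also reflect throughput, concurrency, and KV-cache headroom. The function name and the `weight_fraction` default are hypothetical choices for illustration, not Spheron guidance.

```python
import math

HBM_PER_GPU_GB = 288  # HBM3e per B300, from the specs on this page

# Bytes per parameter at common serving precisions.
BYTES_PER_PARAM = {"fp4": 0.5, "fp8": 1.0, "bf16": 2.0}


def min_gpus_for_weights(params_billion, precision, weight_fraction=0.6):
    """Lower bound on GPUs needed to hold the weights alone.

    weight_fraction is a hypothetical share of HBM given to weights;
    the remainder is left for KV cache, activations, and runtime buffers.
    """
    # Billions of parameters times bytes per parameter gives gigabytes.
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    usable_gb = HBM_PER_GPU_GB * weight_fraction
    return math.ceil(weights_gb / usable_gb)


for size, precision in ((70, "bf16"), (400, "bf16"), (400, "fp4")):
    print(size, precision, min_gpus_for_weights(size, precision))
```

Lower precision shrinks the weight footprint, which is where FP4 serving frees HBM for longer contexts and more concurrent requests.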

Use case / 03

GB300

🧠

Reasoning and Agentic Workloads

Long reasoning chains generate large KV-cache footprints. Grace CPUs handle orchestration, retrieval, and tool calls in the same coherent address space as the GPUs over 900 GB/s NVLink-C2C, removing PCIe round-trips that slow down agent loops on x86 nodes. Rent from a few GPUs for prototyping up to full racks for production serving.

Test-time compute and chain-of-thought serving · Tool-using agents with shared memory · Code generation with 200K+ context · Multi-step planning and self-reflection
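To make the KV-cache point concrete, the footprint of one sequence can be estimated with the standard formula: two tensors (keys and values) per layer, times KV heads, head dimension, tokens, and bytes per value. The model dimensions below are hypothetical and chosen only for illustration; substitute your own model's values.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """KV-cache size in bytes for a single sequence."""
    # The factor of 2 covers the separate key and value tensors in each layer.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


# Hypothetical model: 80 layers, 8 KV heads (grouped-query attention),
# head dimension 128, 16-bit cache values, one 200K-token sequence.
per_sequence = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, tokens=200_000)
print(f"{per_sequence / 1e9:.1f} GB per 200K-token sequence")  # prints 65.5 GB
```

Under these assumptions a single 200K-token sequence takes roughly a quarter of one B300's 288 GB before weights are counted, which is why long-context serving leans on per-GPU memory capacity.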

Use case / 04

GB300

🚀

Fine-Tuning and Post-Training

Rent the GPU count that matches your run. DPO and GRPO on 70B models fit on a single 8-GPU node. Large-scale RL with rollouts, value heads, and reference models benefits from full-rack co-location inside one NVLink domain. Persistent NVMe per chassis handles checkpoint streaming at any scale.

DPO, GRPO, and PPO fine-tuning · Reward modeling for frontier LLMs · Distillation pipelines into smaller models · Curriculum learning with millions of rollouts

When to pick GB300

Scenario 01

Pick GB300 if

You want Blackwell Ultra. We have 1024+ GPUs available to rent at any quantity, from a single 8-GPU node to multi-rack deployments. Models too large for one node sit inside an NVL72 rack with 20.7 TB of unified HBM3e at 130 TB/s. Rent the count you actually need.

Recommended fit

Scenario 02

Pick B300 (per GPU) instead if

You want per-minute billing with no commitment. B300 SXM6 is available on Spheron on a per-GPU or 8-GPU node basis with on-demand and spot pricing. For most 70B to 200B parameter inference and fine-tuning jobs that don't need a sales conversation, B300 is the simpler call.

Recommended fit

Scenario 03

Pick GB200 instead if

Budget matters and you don't need 288 GB per GPU. GB200 has 192 GB per GPU versus 288 GB on GB300, with identical NVLink fabric and Grace CPU layout. For sub-1T parameter workloads, GB200 is often the better price-to-performance pick.

Recommended fit

Scenario 04

Pick R100 if you can wait

R100 Rubin is available H2 2026 with 22 TB/s per-GPU bandwidth (2.75x B300) and 50 PFLOPS FP4 (3.33x B300). For new training runs with flexible timelines, R100 is the higher-ceiling option. If you need to start now, GB300 is live today.

Recommended fit

Other GPUs on Spheron

For workloads that don't need GB300 specifically, Spheron also offers B300, B200, and H200 on per-minute billing with no commitments.

B300 · 288 GB HBM3e

Same Blackwell Ultra silicon as GB300, billed by the minute.

From $3.35/hr · live

Rent B300 →

B200 · 192 GB HBM3e

Blackwell. Most workloads under 200B parameters.

From $5.34/hr · live

Rent B200 →

H200 · 141 GB HBM3e

Hopper. Best value for inference and long-context serving.

From $1.76/hr · live

Rent H200 →

Related resources

01 · Read

NVIDIA B300 Blackwell Ultra Guide

B300 Blackwell Ultra chip specs, DGX B300 vs HGX B300 vs GB300 NVL72 form factors, FP4 throughput, and how the rack-scale system pairs 72 B300 GPUs with 36 Grace CPUs.

02 · Read

NVIDIA B300 GPU: Specs, Pricing, and Workloads

Single-GPU and 8-GPU B300 SXM6 access on Spheron. 288 GB HBM3e, FP4 Transformer Engine, and per-minute billing with no commitments.

03 · Read

Vera Rubin NVL72: The Next-Gen System

How NVL72 evolves from Blackwell to Rubin. HBM4, NVLink 6, and the upgrade path for teams running GB300 today.

FAQ / 07

GB300 FAQ

What's the difference between GB300 and B300?

B300 is the individual GPU: 288 GB HBM3e, 8 TB/s bandwidth, 15 PFLOPS FP4. It runs in standard 8-GPU SXM6 nodes connected to an x86 host. GB300 is the same B300 silicon deployed in NVL72 racks that pair 72 GPUs with 36 Grace CPUs over NVLink 5, exposing 20.7 TB of unified memory inside one NVLink domain. B300 is rented per GPU on per-minute billing. GB300 is rented by request in any quantity, and our team handles provisioning.

How is GB300 priced?

How does GB300 compare to GB200?

GB300 swaps B200 GPUs (192 GB HBM3e) for B300 GPUs (288 GB HBM3e), raising total rack memory from 13.5 TB to 20.7 TB. FP4 compute increases from 1.44 to 2.16 EFLOPS per rack and B300 adds 2x attention compute over B200. Power moves up to roughly 132 kW from GB200's 120 kW. NVLink 5 fabric, Grace CPU layout, and software stack remain consistent, while networking upgrades from ConnectX-7 (400 Gb/s) to ConnectX-8 (800 Gb/s) per GPU.
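The generational uplift can be read off as simple ratios. The sketch below uses only the per-rack figures quoted in this answer; the dictionary keys and the `uplift` helper are illustrative names.

```python
# Per-rack figures as quoted on this page.
GB200 = {"hbm_tb": 13.5, "fp4_sparse_eflops": 1.44, "rack_power_kw": 120, "nic_gbps": 400}
GB300 = {"hbm_tb": 20.7, "fp4_sparse_eflops": 2.16, "rack_power_kw": 132, "nic_gbps": 800}


def uplift(new, old):
    """Ratio of each new-generation figure to the old one, rounded to 2 places."""
    return {key: round(new[key] / old[key], 2) for key in old}


ratios = uplift(GB300, GB200)
print(ratios)

# FP4 compute gained per unit of additional rack power.
print(round(ratios["fp4_sparse_eflops"] / ratios["rack_power_kw"], 2))
```

On these numbers, memory and FP4 compute each rise by about 1.5x for roughly 1.1x the rack power, and per-GPU network bandwidth doubles.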

What workloads is GB300 built for?

Models that exceed 2.3 TB of HBM memory (the ceiling of a single 8x B300 node) are the clearest fit at full-rack scale: 1T+ parameter dense and MoE training, frontier inference at high concurrency, and long-context serving with large KV cache. Smaller GB300 allocations are also a strong pick when you specifically need Blackwell Ultra compute and bandwidth for inference serving, fine-tuning, or experimentation on the latest architecture.
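A quick way to see which side of the 2.3 TB ceiling a training run falls on is to estimate model state alone. The sketch below assumes the common mixed-precision Adam rule of thumb of about 16 bytes per parameter (16-bit weights and gradients plus 32-bit master weights and two optimizer moments) and ignores activations, so treat the result as a lower bound. The function name `training_fit` is hypothetical.

```python
import math

HBM_PER_GPU_BYTES = 288 * 10**9  # 288 GB HBM3e per B300, from the specs on this page
BYTES_PER_PARAM = 16             # assumption: mixed-precision Adam model state


def training_fit(params):
    """Return (minimum GPUs for model state, smallest configuration that holds it)."""
    state_bytes = params * BYTES_PER_PARAM
    gpus = math.ceil(state_bytes / HBM_PER_GPU_BYTES)
    if gpus <= 8:
        label = "single 8-GPU node"
    elif gpus <= 72:
        label = "single NVL72 rack"
    else:
        label = "multi-rack"
    return gpus, label


for params in (70 * 10**9, 10**12, 2 * 10**12):
    print(params, training_fit(params))
```

Under this assumption a 70B model's state fits within one node, a 1T-parameter model needs most of a rack, and 2T parameters spills into multi-rack territory.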