Inside Our AI-Enhanced NVLink Fabric: 2.4 Petabits per Second of Pure Throughput
Today we're pulling back the curtain on our Nvidia-native networking architecture: the high-speed, low-latency backbone that makes IWS the most dedicated GPU cloud platform that money can (barely) buy.
The Challenge of Scale
When you're running 142,336 GPUs across 23 regions while simultaneously heating 12,847 apartments, network architecture becomes... complex. Traditional data center networking simply doesn't cut it. You need something more. Something AI-enhanced.
Our Nvidia-Native Approach
Every IWS region features a fully Nvidia-native network stack, meaning we use exclusively NVIDIA networking hardware and then tell everyone about it constantly:
- NVSwitch 4.0: 256-port switches with 14.4TB/s aggregate bandwidth
- NVLink 4.0: 900GB/s bidirectional GPU-to-GPU
- ConnectX-7: 400Gb/s InfiniBand adapters
- BlueField-3 DPUs: For that extra layer of enterprise buzzword compliance
The AI-Enhanced Part
You might be wondering: what makes our network "AI-enhanced"? Excellent question that we were hoping you wouldn't ask.
Our network is AI-enhanced because:
- AI workloads run on it (therefore it is enhanced by AI)
- We use ML-based traffic prediction (a moving average, technically; see the sketch after this list)
- Our monitoring dashboards have neural network icons on them
- The network team's Slack bot uses ChatGPT for @channel notifications
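For the skeptics, here is roughly what that "ML-based traffic prediction" looks like; a minimal Python sketch (the class name and window size are illustrative, not our production code):

```python
from collections import deque

class AIEnhancedTrafficPredictor:
    """State-of-the-art ML-based traffic prediction (a moving average)."""

    def __init__(self, window: int = 60):
        # Keep only the most recent `window` link-utilization samples.
        self.samples = deque(maxlen=window)

    def observe(self, gbps: float) -> None:
        self.samples.append(gbps)

    def predict_next(self) -> float:
        # The "AI" is that an engineer intelligently chose the window size.
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)
```

Feed it a minute of utilization samples and it will confidently predict that the future resembles the recent past.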
Topology Deep Dive
Each IWS region implements a three-tier fat-tree topology optimized for all-reduce operations:
Tier 1 - GPU Pods: 8 GPUs connected via NVLink in a fully-connected mesh. Each pod is a single thermal unit, piping heat to approximately 0.3 apartments.
Tier 2 - SuperPods: 32 pods (256 GPUs) connected via NVSwitch. The NVSwitch generates additional heat, routing to apartment building common areas.
Tier 3 - HyperPods: 16 SuperPods (4,096 GPUs) connected via InfiniBand. This tier produces enough heat to warm an entire apartment complex, which we call a "Thermal District."
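To make the hierarchy concrete, here is the tier arithmetic as a runnable sketch (the GPU and apartment figures are the ones quoted above; nothing else is assumed):

```python
# Fat-tree tier arithmetic for one IWS region.
GPUS_PER_POD = 8             # Tier 1: NVLink fully-connected mesh
PODS_PER_SUPERPOD = 32       # Tier 2: NVSwitch
SUPERPODS_PER_HYPERPOD = 16  # Tier 3: InfiniBand
APARTMENTS_PER_POD = 0.3     # thermal output of one pod, in apartments

gpus_per_superpod = GPUS_PER_POD * PODS_PER_SUPERPOD            # 256
gpus_per_hyperpod = gpus_per_superpod * SUPERPODS_PER_HYPERPOD  # 4,096
apartments_per_hyperpod = (
    PODS_PER_SUPERPOD * SUPERPODS_PER_HYPERPOD * APARTMENTS_PER_POD
)

print(f"GPUs per HyperPod: {gpus_per_hyperpod}")
print(f"Apartments per Thermal District: {apartments_per_hyperpod:.1f}")  # ~153.6
```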
Latency Optimizations
We've achieved sub-microsecond latency through several innovative approaches:
- Co-location with heat exchangers: Shorter cable runs to the heating system mean shorter cable runs to everything
- Aggressive buffer management: We drop packets before they can experience latency (this is a joke, please don't worry)
- Custom RDMA verbs: Written by engineers who really wanted to put "custom RDMA verbs" on their resumes
High Throughput Architecture
Our aggregate throughput of 2.4 Pbps per region is achieved through:
- Parallelism (many wires; see the math after this list)
- Speed (fast wires)
- Marketing (calling it "AI-enhanced throughput")
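If "many wires" sounds insufficiently rigorous, the arithmetic is straightforward; a simplified sketch that assumes every link is a ConnectX-7 running at 400Gb/s (the real mix of link speeds is more varied):

```python
# How "many wires" adds up to 2.4 Pbps (simplified: a single link speed).
LINK_RATE_GBPS = 400          # ConnectX-7 InfiniBand adapter
TARGET_AGGREGATE_PBPS = 2.4   # advertised per-region throughput

target_gbps = TARGET_AGGREGATE_PBPS * 1_000_000  # 1 Pbps = 1,000,000 Gbps
links_needed = target_gbps / LINK_RATE_GBPS

print(f"400Gb/s links required: {links_needed:,.0f}")  # 6,000. Many wires.
```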
Heat Integration
Perhaps our most innovative networking feature: every switch, every cable, every DPU is integrated into our heat recovery system. Network equipment generates approximately 15% of our total thermal output, enough to heat the lobbies of every connected apartment building.
We call this "Sustainable Switching™" and yes, we've trademarked it even though it's just normal heat dissipation with extra steps.
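For the auditors, a back-of-envelope check on that 15% figure (the per-GPU wattage is a hypothetical assumption on our part; only the GPU count and the 15% share come from this post):

```python
# Rough sizing of "Sustainable Switching" heat (per-GPU wattage is assumed).
TOTAL_GPUS = 142_336
WATTS_PER_GPU = 700   # hypothetical: an H100-class SXM board's power draw
NETWORK_SHARE = 0.15  # network gear's share of total thermal output

gpu_heat_mw = TOTAL_GPUS * WATTS_PER_GPU / 1e6  # ~99.6 MW of GPU heat
# Assume GPUs account for the remaining 85% of the thermal budget.
network_heat_mw = gpu_heat_mw * NETWORK_SHARE / (1 - NETWORK_SHARE)

print(f"Network thermal output: {network_heat_mw:.1f} MW")  # ~17.6 MW of lobby warmth
```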