Our final estimate of the achievable inter-data-center bandwidth by 2030 is 4 to 20 Pbps, which would allow for training runs of 3e29 to 2e31 FLOP. In light of this, bandwidth is unlikely to be a major constraint for a distributed training run compared to securing the necessary power supply in the first place.
Expanding bandwidth capacity for distributed training networks is a relatively straightforward engineering challenge: it can be achieved by deploying additional fiber pairs between data centers. For AI training runs that could cost hundreds of billions of dollars, the investment required for such a bandwidth expansion appears comparatively modest.[44]
We conclude that, by 2030, training runs supported by a local power supply could likely draw 1 to 5 GW and reach 1e28 to 3e29 FLOP. Meanwhile, geographically distributed training runs could amass a supply of 2 to 45 GW and achieve 4 to 20 Pbps connections between data center pairs, allowing for training runs of 2e28 to 2e30 FLOP.[45] All in all, it seems likely that training runs of 2e28 to 2e30 FLOP will be possible by 2030.[46] The assumptions behind these estimates can be found in Figure 3 below.
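One way to see how the combined range follows from the two scenarios is a few lines of arithmetic. This is a minimal sketch under an assumption of ours, not necessarily the authors' method: at each end of the range, the overall estimate takes the larger of the local and distributed figures, on the view that a developer would pick whichever strategy reaches further. It also checks that the distributed range sits below the 3e29 to 2e31 FLOP that bandwidth alone would permit, which is the sense in which power, not bandwidth, is the tighter constraint. The constant and function names are hypothetical; only the numeric ranges come from the text.

```python
# FLOP ranges quoted in the text, as (low, high) estimates.
LOCAL_POWER_FLOP = (1e28, 3e29)        # single site, 1 to 5 GW
DISTRIBUTED_FLOP = (2e28, 2e30)        # distributed, 2 to 45 GW, 4 to 20 Pbps
BANDWIDTH_CEILING_FLOP = (3e29, 2e31)  # permitted by bandwidth alone


def best_of(*ranges):
    """Hypothetical helper (our assumption): take the larger estimate at each
    end of the range, i.e. assume the further-reaching strategy is chosen."""
    return (max(r[0] for r in ranges), max(r[1] for r in ranges))


def below_ceiling(estimate, ceiling):
    """True if the estimate stays under the ceiling at both ends of the range,
    meaning that ceiling is not the binding constraint."""
    return estimate[0] < ceiling[0] and estimate[1] < ceiling[1]


overall = best_of(LOCAL_POWER_FLOP, DISTRIBUTED_FLOP)
print(f"overall: {overall[0]:.0e} to {overall[1]:.0e} FLOP")
print("bandwidth binding?", not below_ceiling(DISTRIBUTED_FLOP, BANDWIDTH_CEILING_FLOP))
```

Under these assumptions the script reproduces the 2e28 to 2e30 FLOP range and reports that bandwidth is not binding. A different aggregation rule (for example, weighting the scenarios by likelihood) would give different bounds.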