Google Cloud recently announced a major upgrade to its AI infrastructure, introducing new hardware and software designed to meet the growing demands of artificial intelligence workloads. The centerpiece is Trillium, Google's sixth-generation Tensor Processing Unit (TPU).
Compared to its predecessor, the TPU v5e, Trillium delivers over four times the training performance and up to three times the inference throughput, while also improving energy efficiency by 67%.
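To make these multipliers concrete, the sketch below runs some back-of-the-envelope arithmetic on the figures quoted above. The baseline values (a hypothetical 100-hour training job and a normalized perf-per-watt of 1.0) are illustrative assumptions, not published benchmarks:

```python
# Illustrative arithmetic based on Google's stated Trillium-vs-v5e figures.
# Baseline inputs (100 hours, 1.0 perf/watt) are hypothetical placeholders.

TRAINING_SPEEDUP = 4.0    # "over four times the training performance"
INFERENCE_SPEEDUP = 3.0   # "up to three times the inference throughput"
EFFICIENCY_GAIN = 0.67    # "67% increase in energy efficiency"

def trillium_estimates(v5e_training_hours: float, v5e_perf_per_watt: float) -> dict:
    """Project hypothetical Trillium figures from TPU v5e baselines."""
    return {
        "training_hours": v5e_training_hours / TRAINING_SPEEDUP,
        "perf_per_watt": v5e_perf_per_watt * (1 + EFFICIENCY_GAIN),
    }

est = trillium_estimates(v5e_training_hours=100.0, v5e_perf_per_watt=1.0)
print(est["training_hours"])  # a 100-hour v5e job would take ~25 hours
print(est["perf_per_watt"])   # ~1.67x the baseline performance per watt
```

The point of the arithmetic is that the headline numbers compound: a workload that is both faster and more energy-efficient costs disproportionately less per training run.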
The new TPU also doubles both High Bandwidth Memory (HBM) capacity and Interchip Interconnect (ICI) bandwidth. These gains make Trillium particularly well suited to large language models such as Gemma 2 and Llama, as well as compute-intensive inference workloads such as diffusion models like Stable Diffusion XL.