Google Cloud today unveiled the fifth iteration of its tensor processing units (TPUs) for AI training and inference at Cloud Next, its annual user conference. Google announced the fourth iteration of its specialized AI chips in 2021, but developers couldn’t access them until 2022.
The company says efficiency was a primary consideration in designing this chip. Compared to the previous generation, this version promises a 2x improvement in training performance per dollar and a 2.5x improvement in inference performance per dollar.
“This is our most cost-efficient and accessible cloud TPU to date,” Mark Lohmeyer, VP and GM for compute and ML infrastructure at Google Cloud, said at a press conference ahead of today’s announcement.
Lohmeyer also stressed that the company has ensured customers can scale their TPU clusters beyond what was previously feasible.
“We’re enabling our customers to easily scale their AI models beyond the physical boundaries of a single TPU pod or a single TPU cluster,” he said. A single large AI workload can now span multiple physical TPU clusters, scaling to literally tens of thousands of chips, and do so cost-effectively. “As a result, we’re actually offering our customers a lot of choice, flexibility, and optionality across cloud GPUs and cloud TPUs to suit the needs of the diverse collection of AI workloads that we see coming.”
In addition to the next generation of TPUs, Google also announced today that it will make Nvidia’s H100 GPUs generally available to developers next month as part of its A3 series of virtual machines. You can read more about that here.