Google Cloud unveils AI-optimised infrastructure enhancements

Google Cloud has announced significant advancements in its AI-optimised infrastructure, including fifth-generation TPUs and A3 VMs based on NVIDIA H100 GPUs.

Traditional approaches to designing and building computing systems are proving inadequate for the surging demands of workloads such as generative AI and large language models (LLMs). Over the past five years, the number of parameters in LLMs has grown tenfold annually, creating a need for AI-optimised infrastructure that is both cost-effective and scalable.
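To put that growth rate in perspective, here is a back-of-envelope sketch. It assumes a constant 10x rate in every one of the five years, which is a simplification; the article gives only the annual figure.

```python
def compound_growth(annual_factor: float, years: int) -> float:
    """Total growth multiple after compounding a fixed annual factor."""
    return annual_factor ** years

# Assumption: a constant 10x per year, sustained over five years.
for year in range(1, 6):
    print(f"After year {year}: {compound_growth(10, year):,.0f}x")
```

Under that assumption, five years of tenfold growth compounds to roughly a 100,000x increase in parameter counts.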

From conceiving the Transformer architecture that underpins generative AI to building AI-optimised infrastructure for global-scale performance, Google Cloud has stood at the forefront of AI innovation.

Cloud TPU v5e headlines Google Cloud’s latest offerings. Designed for cost-efficiency, versatility, and scalability, the TPU is aimed at medium- and large-scale training and inference. Compared with its predecessor, Cloud TPU v4, it delivers up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar for LLMs and generative AI models.
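A "per dollar" figure is a ratio of ratios, so it does not by itself say how much faster a chip is. The sketch below shows two purely hypothetical scenarios, normalised so the older system has a performance and cost of 1; neither reflects actual TPU pricing or benchmarks.

```python
def perf_per_dollar_gain(new_perf: float, new_cost: float,
                         old_perf: float, old_cost: float) -> float:
    """Ratio of (performance / cost) for a new system versus an old one."""
    return (new_perf / new_cost) / (old_perf / old_cost)

# Hypothetical scenario A: same raw performance at 40% of the cost.
same_speed_cheaper = perf_per_dollar_gain(1.0, 0.4, 1.0, 1.0)
# Hypothetical scenario B: 2.5x raw performance at the same cost.
faster_same_cost = perf_per_dollar_gain(2.5, 1.0, 1.0, 1.0)

print(f"Scenario A: {same_speed_cheaper:.1f}x performance per dollar")
print(f"Scenario B: {faster_same_cost:.1f}x performance per dollar")
```

Both scenarios yield the same 2.5x gain in performance per dollar, which is why the figure is best read as a cost-efficiency metric rather than a raw speed-up.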

Wonkyum Lee, Head of Machine Learning at Gridspace, said:

“Our speed benchmarks are demonstrating a 5X increase in the speed of AI models when training and running on Google Cloud TPU v5e.

We are also seeing a tremendous improvement in the scale of our inference metrics: we can now process 1000 seconds in one real-time second for in-house speech-to-text and emotion prediction models, a 6x improvement.”
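Processing 1000 seconds of audio per real-time second is a throughput of 1000x real time. The sketch below works through what that implies. It assumes the quoted "6x improvement" refers to this same throughput metric, which the quote does not state explicitly.

```python
def realtime_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / wall_clock_seconds

current = realtime_factor(1000, 1)  # figure quoted by Gridspace
# Assumption: the "6x improvement" applies to this throughput metric.
previous = current / 6
# Wall-clock time to process one hour (3600 s) of audio at the current rate.
seconds_per_audio_hour = 3600 / current

print(f"Current: {current:.0f}x real time")
print(f"Implied previous: {previous:.1f}x real time")
print(f"One hour of audio takes about {seconds_per_audio_hour:.1f} s")
```

At that rate, an hour of audio would be processed in a few seconds of wall-clock time.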

Striking a balance between performance, flexibility, and efficiency, Cloud TPU v5e pods support up to 256 interconnected chips, with an aggregate bandwidth of more than 400 Tb/s and 100 petaOps of INT8 performance. The platform is also flexible: eight distinct virtual machine configurations accommodate a range of LLM and generative AI model sizes.
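Dividing the pod-level figures by the chip count gives a rough per-chip picture. This is a naive even split: both figures are quoted as pod-wide aggregates (the bandwidth as a lower bound), and the article does not say how bandwidth is distributed across chips.

```python
PETA = 10**15
TERA = 10**12

def per_chip(aggregate: float, chips: int) -> float:
    """Naive even split of a pod-level aggregate across its chips."""
    return aggregate / chips

CHIPS = 256                  # maximum pod size quoted above
int8_ops = 100 * PETA        # 100 petaOps of INT8 for a full pod
bandwidth_bits = 400 * TERA  # 400 Tb/s aggregate (quoted as "more than")

print(f"INT8 per chip: {per_chip(int8_ops, CHIPS) / TERA:.1f} TOPS")
print(f"Bandwidth per chip: {per_chip(bandwidth_bits, CHIPS) / TERA:.2f} Tb/s")
```

On this rough basis, each chip contributes on the order of 390 trillion INT8 operations per second to the pod total.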

Cloud TPUs are also easier to operate now that they are available on Google Kubernetes Engine (GKE), which streamlines the orchestration and management of AI workloads. For those who prefer managed services, Vertex AI offers training with a variety of frameworks and libraries on Cloud TPU VMs.

Google Cloud is also strengthening its support for leading AI frameworks, including JAX, PyTorch, and TensorFlow.

The PyTorch/XLA 2.1 release is on the horizon, adding Cloud TPU v5e support and model and data parallelism for large-scale model training. Multislice technology is also entering preview, allowing AI models to scale beyond the boundaries of a single physical TPU pod.

Meanwhile, the new A3 VMs are powered by NVIDIA’s H100 Tensor Core GPUs and are aimed at demanding generative AI workloads and LLMs.

A3 VMs are built for demanding training workloads and high networking bandwidth. Combined with Google Cloud’s infrastructure, they deliver 3x faster training and 10x greater networking bandwidth than the previous generation.

David Holz, Founder and CEO at Midjourney, commented:

“Midjourney is a leading generative AI service enabling customers to create incredible images with just a few keystrokes. To bring this creative superpower to users we leverage Google Cloud’s latest GPU cloud accelerators, the G2 and A3. 

With A3, images created in Turbo mode are now rendered 2x faster than they were on A100s, providing a new creative experience for those who want extremely quick image generation.”

With these announcements, Google Cloud aims to solidify its leadership in AI infrastructure and help innovators and enterprises build the most advanced AI models.

(Image Credit: Google Cloud)

  • Ryan Daws

    Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be sighted at tech conferences with a strong coffee in one hand and a laptop in the other. If it’s geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (@gadgetry@techhub.social)