Google Cloud adds vector support to all its database offerings

Google Cloud on Thursday said it is adding vector support and integrating LangChain with all of its database offerings in an effort to outdo rival cloud service providers, such as Amazon Web Services (AWS), Microsoft, and Oracle.

Cloud service providers have been locked in a race to add generative AI and AI-related capabilities to their database offerings to have the first mover advantage in order to garner a bigger pie of the growing AI and generative AI market.

The new updates to database offerings include the addition of vector support for relational, key value, document, and in-memory databases such as CloudSQL, Spanner, Firestore, Bigtable, and Memorystore for Redis.

Nearest neighbor search is a key differentiator

The vector capabilities added to the databases feature search capabilities including the approximate nearest neighbor search (ANN) and exact nearest neighbor search (KNN).

While ANN is used to optimize search, in other words, reduce latency, for large datasets, KNN is used to return more specific or precise search results on smaller datasets, said David Menninger, executive director at ISG’s Ventana Research.

“Support for ANN and KNN reflects that there isn’t a one-size-fits-all approach to vector search and that different use cases require different indexing algorithms to provide the required level of accuracy and performance,” Menninger explained, adding that this highlights that it is incumbent for developers to understand the nature of their data and application, and experiment with various databases to identify the capabilities that best fit the requirements of an individual project.

The other advantage from Google’s standpoint, according to Forrester’s principal analyst Noel Yuhanna, is that most database vendors don’t offer both ANN and KNN.

“Some vendors support KNN, while others support the ANN approach. ANN is more popular since it is scalable and performs well for large datasets and high-dimensional vectors,” Yuhanna said.

All the vector capabilities added to the database offerings are currently in preview. In July last year, Google launched support for the popular pgvector extension in AlloyDB and Cloud SQL to support building generative AI applications.

The addition of vector capabilities across multiple database offerings since July last year at regular intervals, seemingly, makes Google Cloud “more aggressive” than rival hyperscalers, according to Menninger.

However, he did point out that almost all database vendors are adding support for vector and vector search capabilities.

Microsoft, AWS, and Oracle, according to Yuhanna, have some level of vector support capabilities in the works in their respective database offerings.

The announcements by Google Cloud might just give it an edge over its rivals as it seems to be a bit further ahead in the journey than others in terms of making these capabilities generally available to enterprises, Yuhanna said.

Both analysts also pointed out that adding support for vector capabilities will soon become table stakes for data platform vendors to support the development of generative AI applications by complementing large language models (LLMs) with approved enterprise data to improve accuracy and trust.

ISG, according to Menninger, believes that almost all enterprises developing applications based on generative AI will explore the use of vector search and retrieval-augmented generation to complement foundation models with proprietary data and content by the end of 2026.

Rivalry between vector databases and traditional databases

The addition of vector capabilities by hyperscalers and other database vendors to their offerings has resulted in a growing rivalry between vector databases and traditional databases, according to analysts.

While traditional databases have been adding vector capabilities to make their case to enterprises, vector databases have been capabilities to make their products more easily consumable by non-experts, they added.

However, ISG’s Menninger believes that more than 50% of enterprises will use traditional database offerings with vector support by 2026, given their reliance on these traditional databases.

Specialized vector databases will still continue to exist, though only for more complex and sophisticated use cases, Menninger said. Pinecone, Chroma, Weaviate, Milvus, and Qdrant are examples of specialized databases.

Explaining further, Menninger said that whether vector search is best performed using a specialist vector database or a general-purpose database will depend on a variety of factors, including the relative reliance of an enterprise on an existing database, developer skills, the size of the dataset, and specific application requirements.

Integration of LangChain with all Google database offerings

Google Cloud is adding LangChain integrations for all of its databases. “We will support three LangChain Integrations that include vector stores, document loaders, and chat messages memory,” said Andi Gutmans, vice president of engineering for Google Cloud’s databases division.

LangChain is a framework for developing applications powered by LLMs and the integration into databases will allow developers built-in Retrieval Augmented Generation (RAG) workflows across their preferred data source, Gutmans added.

While the LangChain vector stores integration is available for AlloyDB, Cloud SQL for PostgreSQL, Cloud SQL for MySQL, Memorystore for Redis, and Spanner, the document loaders and chat messages memory integration is available for all databases, including Firestore, Bigtable, and SQL Server.

Analysts see the addition of LangChain integrations as an “assertive” move from Google.

“LangChain is currently the most popular framework for connecting LLMs to private sources of enterprise data, providing vendor-neutral integration with enterprise databases, as well as commercial machine learning development and deployment environments, such as SageMaker Studio and Vertex AI Studio,” Menninger explained.

AlloyDB AI made generally available

Google has made its AlloyDB AI offering generally available. It can be used via AlloyDB and AlloyDB Omni.

AlloyDB AI, which was moved into preview last year in August, is a suite of integrated capabilities that allow developers to build generative AI-based applications using real-time data.

It builds on the basic vector support available with standard PostgreSQL and can introduce a simple PostgreSQL function to generate embeddings on data.

AlloyDB AI is an integral part of AlloyDB and AlloyDB Omni, and is available at no additional charge, the company said.

Source

Join the Preview – AWS Glue Data Quality

AWS Week in Review – March 13, 2023

The Total Economic Impact™ of the Microsoft commercial marketplace