Google Cloud’s Vertex AI gets new grounding options

Google Cloud is introducing a new set of grounding options designed to help enterprises further reduce hallucinations across their generative AI-based applications and agents.

The large language models (LLMs) that underpin these generative AI-based applications and agents may start producing faulty output or responses as they grow in complexity. Such faulty outputs are termed hallucinations because they are not grounded in the input data.

Retrieval-augmented generation (RAG) is one of several techniques used to address hallucinations; others include fine-tuning and prompt engineering. RAG grounds the LLM by feeding the model facts from an external knowledge source or repository to improve the response to a particular query.
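For readers who want the mechanics, here is a minimal, self-contained sketch of the RAG pattern; the in-memory knowledge base and keyword-overlap retrieval are stand-ins for the vector stores and embedding search a production system would use.

```python
# Minimal RAG sketch: retrieve the most relevant facts from a small
# in-memory knowledge base and prepend them to the prompt, so the model
# answers from supplied facts rather than from memory alone.

KNOWLEDGE_BASE = [
    "Vertex AI is Google Cloud's AI and machine learning service.",
    "Grounding feeds an LLM facts from an external source to curb hallucinations.",
    "Gemini 1.5 Flash is a lightweight model in the Gemini family.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap (a stand-in for embedding similarity)."""
    terms = set(query.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_grounded_prompt(query: str) -> str:
    """Augment the user query with retrieved facts before sending it to an LLM."""
    facts = "\n".join(f"- {fact}" for fact in retrieve(query))
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {query}"

print(build_grounded_prompt("What is Vertex AI?"))
```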

The new set of grounding options introduced inside Google Cloud's AI and machine learning service, Vertex AI, includes dynamic retrieval, a "high-fidelity" mode, and grounding with third-party datasets, all of which can be seen as expansions of Vertex AI features unveiled at Google's annual Cloud Next conference in April.

Dynamic retrieval to balance cost and accuracy

The new dynamic retrieval capability, which will soon be offered as part of Vertex AI's feature for grounding LLMs in Google Search, looks to strike a balance between cost efficiency and response quality, according to Google.

As grounding LLMs in Google Search racks up additional processing costs for enterprises, dynamic retrieval allows Gemini to dynamically choose whether to ground end-user queries in Google Search or use the intrinsic knowledge of the models, Burak Gokturk, general manager of cloud AI at Google Cloud, wrote in a blog post.

The choice is left to Gemini because not all queries need grounding, Gokturk explained, adding that Gemini's training knowledge is already highly capable.

Gemini, in turn, decides whether to ground a query in Google Search by sorting each prompt or query into one of three categories based on how its answer could change over time: never changing, slowly changing, and fast changing.

This means that if Gemini were asked about the latest movie releases, it would look to ground the response in Google Search, but it wouldn't ground the response to a query such as "What is the capital of France?", since that answer is unlikely to change and Gemini already knows it.
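A rough sketch of that routing logic, assuming a keyword-based classifier and a 0.5 threshold purely for illustration (this is not Google's implementation), might look like this:

```python
# Illustrative router for dynamic retrieval: classify a query by how quickly
# its answer changes, then ground in Google Search only when freshness matters.
# The classifier, category values, and threshold are all assumptions.

from enum import Enum

class Volatility(Enum):
    NEVER_CHANGING = 0.0    # e.g., "What is the capital of France?"
    SLOWLY_CHANGING = 0.5   # e.g., "Who is the CEO of a given company?"
    FAST_CHANGING = 1.0     # e.g., "What is the latest movie in theaters?"

def classify(query: str) -> Volatility:
    """Crude stand-in classifier; in Vertex AI, Gemini itself makes this call."""
    freshness_markers = ("latest", "today", "current", "recent", "this week")
    if any(marker in query.lower() for marker in freshness_markers):
        return Volatility.FAST_CHANGING
    return Volatility.NEVER_CHANGING

def answer(query: str, threshold: float = 0.5) -> str:
    """Pay for Google Search grounding only when volatility crosses the threshold."""
    if classify(query).value >= threshold:
        return f"[grounded in Google Search] {query}"
    return f"[answered from model knowledge] {query}"

print(answer("What is the capital of France?"))    # no search cost incurred
print(answer("What is the latest Marvel movie?"))  # grounded for freshness
```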

High-fidelity mode aimed at healthcare and financial services sectors

Google Cloud also wants to help enterprises ground LLMs in their private enterprise data; to that end, it showcased a collection of APIs under the name APIs for RAG as part of Vertex AI in April.

APIs for RAG, which has been made generally available, includes APIs for document parsing, embedding generation, semantic ranking, and grounded answer generation, as well as a fact-checking service called check-grounding.
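As a sketch of how the fact-checking piece can be consumed, the snippet below calls the check-grounding service through the google-cloud-discoveryengine Python client; the request shape follows the v1 API as this author understands it and should be verified against current documentation.

```python
# Hedged sketch: scoring an answer against supplied facts with the
# check-grounding service (pip install google-cloud-discoveryengine).
# Field names follow the v1 API; verify against current Google Cloud docs.

from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.GroundedGenerationServiceClient()

request = discoveryengine.CheckGroundingRequest(
    # "my-project" is a placeholder project ID.
    grounding_config=(
        "projects/my-project/locations/global/"
        "groundingConfigs/default_grounding_config"
    ),
    # The candidate answer whose claims should be checked against the facts.
    answer_candidate="Titanic was directed by James Cameron and released in 1997.",
    facts=[
        discoveryengine.GroundingFact(
            fact_text="Titanic is a 1997 epic film directed by James Cameron.",
            attributes={"source": "films-corpus"},
        )
    ],
)

response = client.check_grounding(request=request)
print(response.support_score)  # 0-1 estimate of how well the facts support the answer
```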

High-fidelity experiment

As part of an extension to the grounded answer generation API, which uses Vertex AI Search data stores, custom data sources, and Google Search to ground a response to a user prompt, Google is introducing an experimental grounding option named grounding with high-fidelity mode.

The new grounding option, according to the company, is aimed at further grounding a response to a query by forcing the LLM not only to understand the context of the query but also to source its answer from a customer-provided data source.

This grounding option uses a Gemini 1.5 Flash model that has been fine-tuned to focus on a prompt's context, Gokturk explained, adding that the option attaches sources and grounding scores to the sentences in the response.

Grounding with high-fidelity mode currently supports key use cases such as summarization across multiple documents or data extraction against a corpus of financial data.
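Google has not published the request format for the experimental mode, so purely as an illustration of the behavior described above (answers drawn only from a supplied source, with per-sentence sources and grounding scores), a request and response might be shaped like this; every field name here is hypothetical.

```python
# Hypothetical illustration only: Google has not published the request format
# for high-fidelity mode. The shapes below mirror the behavior described in
# the article; none of these field names come from Google's documentation.

request = {
    "query": "Summarize Q2 revenue across the attached filings.",
    "data_sources": ["projects/my-project/locations/global/dataStores/filings"],
    "grounding_mode": "HIGH_FIDELITY",  # hypothetical flag: answer only from the sources
}

# Per the article, responses attach a source and a grounding score to each sentence:
response = {
    "answer": [
        {
            "sentence": "Q2 revenue rose 8% year over year.",
            "source": "filings/q2-2024.pdf#page=3",
            "grounding_score": 0.94,
        }
    ]
}

for part in response["answer"]:
    print(part["sentence"], part["source"], part["grounding_score"])
```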

This grounding option, according to Gokturk, is aimed at enterprises in the healthcare and financial services sectors, which cannot afford hallucinations; the sources provided in query responses help build trust in end-user-facing generative AI-based applications.

Other major cloud service providers, such as AWS and Microsoft Azure, currently don't have an exact feature that matches high-fidelity mode, but each has a system in place for evaluating the reliability of RAG applications, including metrics for response generation.

While Microsoft uses the Groundedness Detection API to check whether LLMs' text responses are grounded in the source materials provided by users, AWS' Amazon Bedrock service uses several metrics for the same task.
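Microsoft's check is exposed as a REST endpoint; a sketch against the preview version of the API (the api-version and body fields below follow the 2024-02-15-preview documentation and may have changed) could look like this:

```python
# Hedged sketch of Azure AI Content Safety's groundedness detection endpoint.
# The path, api-version, and body fields follow the 2024-02-15-preview docs
# and may have changed; the endpoint and key are placeholders.

import requests

ENDPOINT = "https://my-resource.cognitiveservices.azure.com"  # placeholder
URL = f"{ENDPOINT}/contentsafety/text:detectGroundedness?api-version=2024-02-15-preview"

body = {
    "domain": "Generic",
    "task": "QnA",
    "qna": {"query": "When was the Eiffel Tower completed?"},
    "text": "The Eiffel Tower was completed in 1899.",  # candidate LLM answer
    "groundingSources": ["The Eiffel Tower was completed in 1889."],
}

resp = requests.post(URL, json=body, headers={"Ocp-Apim-Subscription-Key": "<key>"})
# The mismatched year should surface as ungrounded content in the response.
print(resp.json().get("ungroundedDetected"))
```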

As part of Bedrock’s RAG evaluation and observability features, AWS uses metrics such as faithfulness, answer relevance, and answer semantic similarity to benchmark a query response.

The faithfulness metric measures whether the answer generated by the RAG system is faithful to the information contained in the retrieved passages, AWS said, adding that the aim is to avoid hallucinations and ensure the output is justified by the context provided as input to the RAG system.  
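As a toy illustration of what a faithfulness-style score measures, the sketch below counts how many of an answer's claims literally appear in the retrieved passages; real evaluators (for instance the open-source Ragas library, whose metric names these match) use an LLM to extract and verify claims instead of the substring check used here.

```python
# Toy faithfulness metric: the share of claims in a generated answer that are
# supported by the retrieved context. The substring check stands in for the
# LLM-based claim verification real evaluators perform.

def faithfulness(claims: list[str], passages: list[str]) -> float:
    """Fraction of claims contained in the concatenated retrieved passages."""
    context = " ".join(passages).lower()
    supported = sum(1 for claim in claims if claim.lower() in context)
    return supported / len(claims) if claims else 0.0

passages = ["The Eiffel Tower is in Paris. It was completed in 1889."]
claims = [
    "The Eiffel Tower is in Paris",   # supported by the context
    "It was completed in 1899",       # hallucinated year, unsupported
]
print(faithfulness(claims, passages))  # 0.5 -> half the claims are grounded
```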

Enabling third-party data for RAG via Vertex AI

In line with plans announced at Cloud Next in April, the company said it will introduce a new service within Vertex AI next quarter to allow enterprises to ground their models and AI agents with specialized third-party data.

Google said that it was already working with data providers such as Moody's, MSCI, Thomson Reuters, and ZoomInfo to bring their data to this service.

Copyright © 2024 IDG Communications, Inc.
