Amazon Bedrock: A solid generative AI foundation

Amazon Bedrock, Amazon Web Services’ fully managed service for building, deploying, and scaling generative AI applications, offers a catalog of foundation models, supports retrieval-augmented generation (RAG) with vector embeddings, hosts knowledge bases, supports fine-tuning of foundation models, and allows continued pre-training of selected foundation models.

Amazon Bedrock complements the almost 30 other Amazon machine learning services available, including Amazon Q, the AWS generative AI assistant.

There are currently six major features in Amazon Bedrock:

  • Experiment with different models: Use the API or GUI in the console to test various prompts and configurations with different foundation models.
  • Integrate external data sources: Improve response generation by incorporating external data sources into knowledge bases, which can be queried to augment the responses from foundation models.
  • Develop customer support applications: Build applications that use foundation models, API calls, and knowledge bases to reason and execute tasks for customers.
  • Customize models: Tailor a foundation model for particular tasks or domains by providing training data for fine-tuning or continued pre-training.
  • Boost application efficiency: Optimize the performance of foundation model-based applications by purchasing provisioned throughput.
  • Choose the most suitable model: Compare the outputs of various models using standard or custom prompt data sets to choose the model that best aligns with the requirements of your application.

One major competitor to Amazon Bedrock is Azure AI Studio, which, while still in preview and somewhat under construction, checks most of the boxes for a generative AI application builder. Azure AI Studio is a nice system for picking generative AI models, grounding them with RAG using vector embeddings, vector search, and data, and fine-tuning them, all to create what Microsoft calls copilots, or AI agents.

Another major competitor is Google Vertex AI’s Generative AI Studio, which allows you to tune foundation models with your own data, using tuning options such as adapter tuning and reinforcement learning from human feedback (RLHF), or style and subject tuning for image generation. Generative AI Studio complements the Vertex AI model garden and foundation models as APIs.

Other possible competitors include LangChain (and LangSmith), Poe, and the ChatGPT GPT Builder. LangChain does require you to do some programming.

Amazon Bedrock model setup

There are two setup tasks for Bedrock: model setup and API setup. You need to request access to models before you can use them. If you want to use the AWS command line interface or any of the AWS SDKs, you also need to install and configure the CLI or SDK.

I didn’t bother with API setup, as I’m concentrating on using the console for the purposes of this review. Completing the model access request form was easier than it looked, and I was granted access to models faster than I expected.


You can’t use a model in Amazon Bedrock until you’ve requested and received permission to use it. Most vendors grant access immediately. Anthropic takes a few minutes, and requires you to fill out a short questionnaire about your planned usage. This screenshot was taken just before my Claude access requests were granted.

Amazon Bedrock model inference parameters

Amazon Bedrock uses slightly different parameters than, say, OpenAI to control model responses. Bedrock controls randomness and diversity using the temperature of the probability distribution, top K, and top P. It controls the length of the output with the response length, penalties, and stop sequences.

Temperature modulates the probability for the next token. A lower temperature leads to more deterministic responses, and a higher temperature leads to more random responses. In other words, choose a lower temperature to increase the likelihood of higher-probability tokens and decrease the likelihood of lower-probability tokens; choose a higher temperature to increase the likelihood of lower-probability tokens and decrease the likelihood of higher-probability tokens. For example, a high temperature would allow the completion of “I hear the hoof beats of” to include unlikely beasts like unicorns, while a low temperature would weight the output to likely ungulates like horses.
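Each provider applies temperature inside its own sampler, but the textbook formulation is to divide the model’s raw scores (logits) by the temperature before applying softmax. The minimal sketch below shows that effect; the three candidate tokens and their logits are hypothetical, chosen to echo the hoofbeats example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token scores for "I hear the hoof beats of ..."
candidates = ["horses", "zebras", "unicorns"]
logits = [4.0, 2.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {c: round(p, 3) for c, p in zip(candidates, probs)})
```

At 0.2 nearly all of the probability lands on "horses"; at 2.0 "unicorns" gets a noticeably larger share.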

Top K is the number of most-likely candidates that the model considers for the next token. Lower values limit the options to more likely outputs, like horses. Higher values allow the model to choose less likely outputs, like unicorns.
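A top K filter can be sketched in a few lines. The idea is to keep the k highest-probability tokens and renormalize so that the model samples only among them. The distribution here is hypothetical, and real samplers work on full vocabularies rather than a three-word dictionary.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


# Hypothetical next-token distribution
probs = {"horses": 0.70, "zebras": 0.20, "unicorns": 0.10}
print(top_k_filter(probs, 1))  # {'horses': 1.0}
print(top_k_filter(probs, 3))  # all three remain candidates
```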

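Top P sets a cumulative probability cutoff for the candidates that the model considers for the next token: the most-likely candidates are included until their combined probability reaches P. (AWS describes this as the percentage of most-likely candidates.) As with top K, lower values limit the options to more likely outputs, and higher values allow the model to choose less likely outputs.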
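The usual implementation of top P is nucleus sampling. The tokens are sorted by probability, the smallest prefix whose cumulative probability reaches P is kept, and the survivors are renormalized. The sketch below assumes a hypothetical three-token distribution. It is meant to show the technique, not Bedrock’s internal sampler.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches p, then renormalize (nucleus sampling)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


probs = {"horses": 0.70, "zebras": 0.20, "unicorns": 0.10}
print(top_p_filter(probs, 0.5))  # {'horses': 1.0}
print(top_p_filter(probs, 0.8))  # horses and zebras
```

Unlike top K, the number of surviving candidates changes with the shape of the distribution. A confident model keeps few tokens, and an uncertain one keeps many.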

Response length controls the number of tokens in the generated response. Penalties can apply to length, repeated tokens, frequency of tokens, and type of tokens in a response. Stop sequences are sequences of characters that stop the model from generating further tokens.
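Response length and stop sequences both decide when generation ends. The sketch below simulates that over a hypothetical token stream and reports why it stopped. Whether the stop sequence itself is trimmed from the returned text varies by model, so the trimming is an assumption made for illustration.

```python
def generate_with_limits(token_stream, max_tokens, stop_sequences):
    """Collect tokens until max_tokens is reached or the text so far
    contains a stop sequence; the stop sequence itself is trimmed."""
    text = ""
    for count, token in enumerate(token_stream, start=1):
        text += token
        for stop in stop_sequences:
            index = text.find(stop)
            if index != -1:
                return text[:index], "stop_sequence"
        if count >= max_tokens:
            return text, "length"
    return text, "end_of_stream"


# Hypothetical token stream a model might emit
tokens = ["The", " answer", " is", " 42", ".", "\n\nHuman:", " next"]
print(generate_with_limits(tokens, max_tokens=100, stop_sequences=["\n\nHuman:"]))
print(generate_with_limits(tokens, max_tokens=3, stop_sequences=[]))
```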

Amazon Bedrock prompts, examples, and playgrounds

Amazon Bedrock currently displays 33 examples of generative AI model usage, and offers three playgrounds. Playgrounds provide a console environment to experiment with running inference on different models and with different configurations. You can start with one of the playgrounds (chat, text, or image), select a model, construct a prompt, and set the metaparameters. Or you can start with an example and open it in the appropriate playground with the model and metaparameters pre-selected and the prompt pre-populated. Note that you need to have been granted access to a model before you can use it in a playground.

Amazon Bedrock examples demonstrate prompts and parameters for various supported models and tasks. Tasks include summarization, question answering, problem solving, code generation, text generation, and image generation. Each example shows a model, prompt, parameters, and response, and presents a button you can press to open the example in a playground. The results you get in the playground may or may not match what is shown in the example, especially if the parameters allow for lower-probability tokens.

Our first example shows arithmetic word problem solving using a chain-of-thought prompt and the Llama 2 Chat 70B v1 model. There are several points of interest in this example. First, it works with a relatively small open-source chat model. (As an aside, there’s a related example that uses a 7B (billion) parameter model instead of the 70B parameter model used here; it also works.) Second, the chain-of-thought action is triggered by a simple addition to the prompt, “Let’s think step by step.” Note that if you remove that line, the model often goes off the rails and generates a wrong answer.


The chain-of-thought problem-solving example uses a Llama 2 chat model and presents a typical 2nd or 3rd grade arithmetic word problem. Note the [INST]You are a…[/INST] block at the beginning of the prompt. This seems to be specific to Llama. You’ll see other models respond to different formats for defining instructions or system prompts.
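If you build such prompts in code instead of the playground, a small helper can wrap them in the Llama 2 chat template and append the chain-of-thought trigger. The sketch below follows Meta’s published `[INST]`/`<<SYS>>` format and leaves the beginning-of-sequence token to the tokenizer. The function name and the sample word problem are hypothetical, and the problem is not the one in the Bedrock example.

```python
def llama2_chat_prompt(system_prompt, user_message, chain_of_thought=True):
    """Wrap a system prompt and user message in the Llama 2 chat template.

    The [INST] and <<SYS>> markers follow Meta's published Llama 2 chat
    format; the beginning-of-sequence token is left to the tokenizer.
    """
    if chain_of_thought:
        user_message = user_message.rstrip() + "\nLet's think step by step."
    return (
        f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


prompt = llama2_chat_prompt(
    "You are a helpful assistant who solves arithmetic word problems.",
    "A baker made 24 muffins and sold 15. Then she baked 12 more. "
    "How many muffins does she have now?",
)
print(prompt)
```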


The chain-of-thought problem-solving example running in the Amazon Bedrock Chat playground. This particular set of prompts and hyperparameters usually gives correct answers, although not in the exact same format every time. If you remove the “Let’s think step by step” part of the prompt it usually gives wrong answers. The temperature setting of 0.5 asks for moderate randomness in the probability mass function, and the top P setting of 0.9 allows the model to consider less likely outputs.
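The same settings can also be sent through the API rather than the playground. The sketch below assumes the boto3 `bedrock-runtime` client’s `invoke_model` call. It also assumes that Llama 2 request bodies use the fields `prompt`, `temperature`, `top_p`, and `max_gen_len`, that the model ID is `meta.llama2-70b-chat-v1`, and that the reply carries a `generation` field. Check all of these against the Bedrock documentation for your region. The `max_gen_len` of 512 is an arbitrary choice, not the example’s value.

```python
import json

# Assumed model ID for Llama 2 Chat 70B; confirm it in the Bedrock console.
MODEL_ID = "meta.llama2-70b-chat-v1"


def build_llama2_request(prompt, temperature=0.5, top_p=0.9, max_gen_len=512):
    """Build a JSON request body for a Llama 2 model on Bedrock
    (field names assumed from the Meta Llama 2 model parameters)."""
    return json.dumps({
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_gen_len": max_gen_len,
    })


def invoke_llama2(prompt):
    """Send the prompt to Bedrock. Requires boto3, configured AWS
    credentials, and granted access to the model."""
    import boto3  # imported here so the body builder works without boto3

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_llama2_request(prompt),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload.get("generation", "")
```

The body builder is kept separate from the network call so that you can inspect or log exactly what will be sent before spending tokens.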

Our second example shows contract entity extraction using Cohere’s Command text generation model. Text LLMs (large language models) often allow for many different text processing functions.


Amazon Bedrock contract entity extraction example using Cohere’s Command text generation model. Note that the instruction here is on the first line followed by a colon, and then the contract body follows.


Contract entity extraction example running in the Amazon Bedrock text playground. Note that there was an opportunity for additional interaction in the playground, which didn’t show up in the example. While the temperature of this run was 0.9, Cohere’s Command model takes temperature values up to 5. The top p value is set to 1 (and displayed as 0.99) and the top k parameter is not set. These allow for high randomness in the generated text.
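A comparable request can be assembled in code. The sketch assumes that Cohere Command on Bedrock accepts the body fields `prompt`, `temperature`, `p`, `k`, and `max_tokens`. That is our reading of the Cohere model parameters, so verify it before use. The prompt follows the example’s layout, with the instruction on the first line ending in a colon and the contract body after it. The contract text and `max_tokens` value are hypothetical.

```python
import json


def build_cohere_command_request(instruction, document, temperature=0.9,
                                 p=1.0, k=None, max_tokens=400):
    """Build a JSON body for Cohere Command on Bedrock (assumed field names).

    The prompt puts the instruction on its first line, ending in a colon,
    with the document body after it, as in the console example.
    """
    if not 0 <= temperature <= 5:
        raise ValueError("Cohere Command accepts temperatures from 0 to 5")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    body = {
        "prompt": f"{instruction.rstrip(':')}:\n{document}",
        "temperature": temperature,
        "p": p,
        "max_tokens": max_tokens,
    }
    if k is not None:  # leave top k unset, as in the playground run
        body["k"] = k
    return json.dumps(body)


# Hypothetical contract excerpt
contract = "This agreement is made on 1 March 2024 between Acme Corp and Example LLC."
request = build_cohere_command_request("Extract the parties from this contract", contract)
```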

Our final example shows image inpainting, an application of image generation that uses a reference image, a mask, and prompts to produce a new image. Up until now, I’ve only done AI image inpainting in Adobe Photoshop, which has had the capability for a while.


Amazon Bedrock’s image inpainting example uses the Titan Image Generator G1 model. Note the reference image and mask image in the image configuration.


In order to actually select the flowers for inpainting, I had to move the mask from the default selection of the backpack to the area containing the white flowers in the reference image. When I didn’t do that, orange flowers were generated in front of the backpack.


Successful inpainting in Amazon Bedrock. Note that I could have used the mask prompt to refine the mask for complex mask selections in noncontiguous areas, for example selecting the flowers and the books. You can use the Info links to see explanations of individual hyperparameters.
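Through the API, the mask can be given either as a text mask prompt or as a mask image. The sketch below assumes a Titan Image Generator G1 request with `taskType` set to `"INPAINTING"`, an `inPaintingParams` object holding a base64 `image`, a `text` prompt, and either `maskPrompt` or `maskImage`, plus an `imageGenerationConfig`. These field names come from our understanding of the Titan documentation and should be verified. The image bytes in the test are placeholders.

```python
import base64
import json


def build_titan_inpainting_request(image_bytes, prompt, mask_prompt=None,
                                   mask_image_bytes=None, seed=0):
    """Build an INPAINTING request body for Titan Image Generator G1
    (field names assumed). Supply either a text mask_prompt, such as
    "white flowers", or a mask image, but not both."""
    if (mask_prompt is None) == (mask_image_bytes is None):
        raise ValueError("provide exactly one of mask_prompt or mask_image_bytes")
    params = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "text": prompt,
    }
    if mask_prompt is not None:
        params["maskPrompt"] = mask_prompt
    else:
        params["maskImage"] = base64.b64encode(mask_image_bytes).decode("ascii")
    return json.dumps({
        "taskType": "INPAINTING",
        "inPaintingParams": params,
        "imageGenerationConfig": {"numberOfImages": 1, "seed": seed},
    })
```

A text mask prompt avoids repositioning a mask by hand, as described above, but the model then decides which pixels match the description.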

Amazon Bedrock orchestration

Amazon Bedrock orchestration currently includes importing data sources into knowledge bases that you can then use for setting up RAG, and creating agents that can execute actions. These are two of the most important techniques available for building generative AI applications, falling between simple prompt engineering and expensive and time-consuming continued pre-training or fine-tuning.

Using knowledge bases takes multiple steps. Start by importing your data sources into an Amazon S3 bucket. When you do that, specify the chunking you’d like for your data. The default is approximately 300 tokens per chunk, but you can set your own size. Then set up your vector store and embeddings model in the database you prefer, or allow AWS to use its default of Amazon OpenSearch Serverless. Then create your knowledge base from the Bedrock console, ingest your data sources, and test your knowledge base. Finally, you can connect your knowledge base to a model for RAG, or take the next step and connect it to an agent. There’s a good one-hour video about this by Mani Khanuja, recorded at AWS re:Invent 2023.
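The chunk, embed, retrieve, and augment pipeline can be sketched locally to show what the knowledge base does for you. This toy version has two simplifications: whitespace-separated words stand in for model tokens, and a bag-of-words count vector stands in for a real embeddings model and vector store. All function names are hypothetical, not Bedrock APIs.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=300, overlap=0):
    """Split text into chunks of about chunk_size "tokens" (words here)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def embed(text):
    """Toy stand-in for an embeddings model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_n=2):
    """Return the top_n chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_n]


def augment_prompt(query, chunks, top_n=2):
    """Prepend retrieved context to the user's question (the "AG" in RAG)."""
    context = "\n\n".join(retrieve(query, chunks, top_n))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
```

In Bedrock the embeddings model, vector store, and retrieval step are managed for you. This sketch only makes the data flow visible.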

Agents orchestrate interactions between foundation models, data sources, software applications, and prompts, and call APIs to take actions. In addition to the components of RAG, agents can follow instructions, use an OpenAPI schema to define the APIs that the agent can invoke, and/or invoke a Lambda function.
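When an agent invokes a Lambda function for an API defined in an OpenAPI schema, the function receives the chosen API path, method, and parameters and returns a structured response. The handler below is a sketch based on our understanding of the action-group event and response shape (`apiPath`, `httpMethod`, `parameters`, and `responseBody` keyed by content type). Confirm the exact format in the Bedrock Agents documentation. The `/orders/{orderId}` API and its "shipped" status are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Handle an action-group call from a Bedrock agent (assumed event shape)."""
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    if event.get("apiPath") == "/orders/{orderId}" and event.get("httpMethod") == "GET":
        # Hypothetical lookup; a real function would query an order system.
        body, status = {"orderId": params.get("orderId"), "status": "shipped"}, 200
    else:
        body, status = {"error": "unsupported action"}, 400
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": event.get("apiPath"),
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```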


Amazon Bedrock knowledge base creation and testing starts with this screen. There are several more steps.

Amazon Bedrock model assessment and deployment

The Assessment and Deployment panel in Amazon Bedrock contains functionality for model evaluation and provisioned throughput.

Model evaluation supports automatic evaluation of a single model, manual evaluation of up to two models using your own work team, and manual evaluation of as many models as you wish using an AWS-managed work team. Automatic evaluation uses recommended metrics, which vary depending on the type of task being evaluated, and can either use your own prompt data or built-in curated prompt data sets.
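Bedrock’s built-in metrics depend on the task type, and their implementation isn’t shown here. As a generic illustration of automatic evaluation against your own prompt data, the sketch below scores a single model with exact match and token-level F1, two common question-answering metrics. The `model_fn` callable and the tiny data set are hypothetical stand-ins.

```python
from collections import Counter


def normalize(text):
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" .!?").split())


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall."""
    pred, ref = normalize(prediction).split(), normalize(reference).split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(model_fn, dataset):
    """Average exact match and F1 of model_fn over (prompt, reference) pairs."""
    outputs = [(model_fn(prompt), reference) for prompt, reference in dataset]
    exact = sum(normalize(out) == normalize(ref) for out, ref in outputs)
    f1 = sum(token_f1(out, ref) for out, ref in outputs)
    return {"exact_match": exact / len(dataset), "f1": f1 / len(dataset)}
```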

Provisioned throughput allows you to purchase dedicated capacity to deploy your models. Pricing varies depending on the model that you use and the level of commitment you choose.


Automatic model evaluation selection in Amazon Bedrock. Bedrock can also set up human model evaluations. The metrics and data sets used vary with the task type being evaluated.


Amazon Bedrock’s provisioned throughput isn’t cheap, and it isn’t available for every model. Here we see an estimated monthly cost of $77.3K for five model units of the Llama 2 Chat 13B model on a one-month term. Upping the term to six months drops the monthly cost to $47.7K. You can’t edit the provisioned model units or term once you’ve purchased the throughput.
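The quoted estimates are easier to compare on a per-unit, per-hour basis. The arithmetic below back-calculates implied hourly rates from the two monthly figures. It assumes a 730-hour month (24 × 365 / 12), and because the screenshot figures are rounded to $0.1K, the rates are approximate.

```python
HOURS_PER_MONTH = 730  # assumption: 24 * 365 / 12


def implied_hourly_rate(monthly_cost, model_units):
    """Back out the per-model-unit hourly price from a monthly estimate."""
    return monthly_cost / model_units / HOURS_PER_MONTH


one_month = 77_300  # quoted monthly estimate, one-month term, five units
six_month = 47_700  # quoted monthly estimate, six-month term, five units

print(round(implied_hourly_rate(one_month, 5), 2))  # 21.18
print(round(implied_hourly_rate(six_month, 5), 2))  # 13.07
print(f"monthly savings: {1 - six_month / one_month:.0%}")  # monthly savings: 38%
print(f"six-month commitment: ${six_month * 6:,}")  # six-month commitment: $286,200
```

The longer term cuts the monthly bill by about 38 percent, but it locks you into roughly $286K of spending that, as noted above, can’t be edited afterward.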

Model customization methods

It’s worth discussing ways of customizing models in general at this point. Below we’ll talk specifically about the customization methods implemented in Amazon Bedrock.

Prompt engineering, as shown above, is one of the simplest ways to customize a generative AI model. Typically, models accept two prompts, a user prompt and a system or instruction prompt, and generate an output. The user prompt normally changes with every request, while the system prompt defines the general characteristics you want the model to take on. Prompt engineering is often sufficient to define the way you want a model to respond for a well-defined task, such as generating text in specific styles by presenting sample text or question-and-answer pairs. You can easily imagine creating a prompt for “Talk Like a Pirate Day.” Ahoy, matey.
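The split between a fixed system prompt, a few sample question-and-answer pairs, and a changing user prompt can be shown as a message list. The sketch below uses a generic role-based chat format as an assumption, not any specific Bedrock API. Each model has its own wire format, such as Llama 2’s `[INST]` blocks, so a real application would translate this list into that format. The pirate lines are invented.

```python
def build_messages(system_prompt, examples, user_prompt):
    """Assemble a system prompt, few-shot question/answer pairs, and the
    new user prompt into a generic chat-message list."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_prompt})
    return messages


pirate = build_messages(
    "You are a pirate. Answer every question in pirate speak.",
    [("How are you today?", "Arr, I be fine as a fair wind, matey!")],
    "What's the weather like?",
)
```

Only the last message changes from request to request. The system prompt and examples stay fixed and set the style.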
