Building a Secure, Serverless Multi-Tenant RAG Chatbot with Amazon Bedrock and Lambda


What is RAG?

Retrieval-Augmented Generation (RAG) combines a large language model (such as Anthropic Claude, OpenAI GPT, or Google Gemini) with a vector search layer to generate responses grounded in information retrieved from a knowledge base.
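To make the flow concrete, here is a toy sketch of the retrieve-then-generate pattern. The word-overlap "similarity" is only a stand-in for real embeddings (e.g. Amazon Titan) and a real vector store, and the documents are illustrative:

```python
# Toy sketch of the RAG flow: tokenize -> retrieve -> augment the prompt.
# Word overlap stands in for embedding similarity; not a production retriever.
import re

DOCS = [
    "Stranger Things season 1 has 8 episodes.",
    "The series is set in the fictional town of Hawkins, Indiana.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, top_k: int = 1) -> list[str]:
    # rank documents by naive token overlap with the query
    return sorted(DOCS, key=lambda d: len(tokens(query) & tokens(d)), reverse=True)[:top_k]

def build_prompt(query: str) -> str:
    # augment the user's question with the retrieved search results
    context = "\n".join(retrieve(query))
    return f"Answer using only these search results:\n{context}\n\nQuestion: {query}"

print(build_prompt("How many episodes are in season 1?"))
```

In a real system, `retrieve` is replaced by a vector-store query against embeddings, and the augmented prompt is sent to the LLM, which is exactly what Bedrock's `RetrieveAndGenerate` API does for us later in this article.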


Why multi-tenant RAG?

The client we worked with has a SaaS product used by enterprise customers, architected to provide each customer a dedicated set of infrastructure resources. Since the chatbot was to be integrated into the same product, it had to comply with the same isolation rules.

As per the AWS article, there are multiple ways to architect a multi-tenant RAG solution. We decided to go with the pool pattern, which uses a single knowledge base, vector store, and data bucket, and ensures data segregation using metadata files stored in the data bucket.

The reason for opting for the pool pattern over the others is that the majority of the data provided to the knowledge base is generated internally by the product team. This allows us to avoid data duplication, keep the architecture simple, and avoid administrative overhead while still complying with the existing policies.

Additionally, we applied isolation at the compute layer by using the Lambda tenant isolation feature.

Fig 1. Architecture Diagram

Enough talking. I believe the best way to learn is by doing, so let's get our hands dirty. We will start by creating the Bedrock Knowledge Base.


Bedrock Knowledge Base

Navigate to the Amazon Bedrock service and use the left panel to switch to Knowledge Bases to get started. Let's create a knowledge base with an unstructured data store.

Let AWS create a new service role for the knowledge base and choose S3 as the data source.

Fig 2.1 Bedrock Knowledge Base Setup

In the next step, select the S3 bucket that contains the documents you want Bedrock to parse, embed, and store in the vector store for response generation. Additionally, we need to select the appropriate option for parsing the source data to extract and structure the information. Let's select Amazon Bedrock Data Automation as the parser, but you can choose a different foundation model if that better fits your requirements.

Fig 2.2 Bedrock Knowledge Base Setup

The next step is about selecting the embeddings model you want to use for creating embeddings and the vector store where they will be stored. I prefer the Amazon Titan Text Embeddings model and the S3 vector store to keep costs at a minimum. Before opting for a vector store, make sure you understand the limitations associated with each option so you can pick the right one.

Fig 2.3 Bedrock Knowledge Base Setup

On the review screen, confirm all the inputs and proceed with creating the knowledge base. It will take a few minutes to set up the knowledge base and other related resources.

Fig 2.4 Bedrock Knowledge Base Setup

Before we can test the knowledge base, we need to upload some files to the source bucket. I downloaded a few PDF files about the Stranger Things series from Wikipedia. Along with these files, you will notice metadata files too. These metadata files will help us ensure tenant-level data separation when generating responses with Bedrock.

Fig 3. Bedrock KB Data Source

Stranger_Things_season_1.pdf.metadata.json

{
    "metadataAttributes" : {
      "tenantId" : "demo"
    }
}

List_of_Stranger_Things_episodes.pdf.metadata.json

{
    "metadataAttributes" : {
      "tenantId" : "test"
    }
}
Note: The metadata file name must be the same as the source file name and must end with the extension .metadata.json.

Pro Tip: You can include additional attributes such as user/group id to further restrict access to certain documents by a user/group.
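As a sketch, the sidecar files above can be generated programmatically before uploading. The hard-coded "demo" tenant below is illustrative; a real pipeline would look the tenant up from your product's database:

```python
# Sketch: generate the required <file>.metadata.json sidecar for each local PDF.
# Tenant assignment is hard-coded here purely for illustration.
import json
import pathlib

def sidecar_payload(tenant_id: str) -> str:
    """Return the metadata JSON Bedrock expects for tenant filtering."""
    return json.dumps({"metadataAttributes": {"tenantId": tenant_id}}, indent=2)

def sidecar_name(source_file: str) -> str:
    """The metadata file name must be <source file name>.metadata.json."""
    return source_file + ".metadata.json"

for pdf in pathlib.Path(".").glob("*.pdf"):
    pathlib.Path(sidecar_name(pdf.name)).write_text(sidecar_payload("demo"))
```

After generating the sidecars, upload each PDF together with its metadata file to the source bucket.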

The next step is to sync the data source. During this process, files are fetched from the S3 bucket, parsed, and split into chunks; embeddings are then created and stored in the vector store. This step can take a few minutes depending on the amount of data you are syncing.
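If you prefer the CLI over the console, the sync can also be triggered as an ingestion job. BEDROCK_KB_ID and DATA_SOURCE_ID below are placeholders for your own values:

```shell
# Look up the data source id for your knowledge base
aws bedrock-agent list-data-sources --knowledge-base-id BEDROCK_KB_ID

# Start the sync (ingestion job)
aws bedrock-agent start-ingestion-job --knowledge-base-id BEDROCK_KB_ID --data-source-id DATA_SOURCE_ID
```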

Fig 4. Bedrock KB Data Sync

Let’s do a quick check to confirm the knowledge base is working before we proceed to launch the remaining services.

Fig 5. Bedrock KB Test

Awesome! It seems to be working. Let’s move ahead to see how we integrated the knowledge base as a multi-tenant RAG chatbot within the SaaS product.


IAM Roles

Let’s start with creating two IAM roles: one for our Authorizer Lambda function (optional) and another one for the Bedrock Agent Lambda function.

We used a Lambda Authorizer to validate the request and extract the tenant id from the token. Compared to relying on a host header or request parameter, this approach ensures the tenant id cannot be spoofed. If you do not require a Lambda Authorizer, you can skip creating the role and policy for it.

IAM role for AuthZ lambda function (optional):

Step 1: Create Role

aws iam create-role --role-name rag-apigw-authz-lambda \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "lambda.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }'

Step 2: Attach Policy

aws iam attach-role-policy --role-name rag-apigw-authz-lambda --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

IAM role for Bedrock Agent lambda function:

Step 1: Create Role

aws iam create-role --role-name rag-bedrock-lambda \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "lambda.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }'

Step 2: Attach Policies. Save the following policy document as policy-lambda.json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "bedrock:Retrieve",
                "bedrock:RetrieveAndGenerate"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:bedrock:AWS_REGION:ACCOUNT_ID:knowledge-base/BEDROCK_KB_ID"
            ]
        },
        {
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:GetInferenceProfile"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:bedrock:AWS_REGION:ACCOUNT_ID:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0",
                "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0"
            ]
        }
    ]
}
Note: Make sure to replace the placeholders AWS_REGION, ACCOUNT_ID, and BEDROCK_KB_ID in the above IAM policy with their respective values.

To use a model other than Claude Sonnet 4, or a region other than the US, change the foundation model id and inference profile id accordingly. You can find them in the AWS doc.

aws iam put-role-policy --role-name rag-bedrock-lambda --policy-name rag-bedrock --policy-document file://policy-lambda.json
aws iam attach-role-policy --role-name rag-bedrock-lambda --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Now that we have the IAM roles ready, let’s deploy both the lambda functions.


Lambda Functions

AuthZ Lambda Function (Optional)

For simplicity, the snippet below extracts the tenant/client id from the Authorization header passed as plain text. In a real deployment, this must be extracted from a verified JWT to enforce tenant authenticity and prevent spoofing.

The tenant id is added to the context block so that it is available to the API Gateway integration request step for further use.
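The JWT verification mentioned above could be sketched as follows, using only the standard library with a shared HS256 secret. A real deployment would typically verify an asymmetrically signed token from your identity provider using a library such as PyJWT; the secret, the "tenantId" claim name, and the scheme here are assumptions for this demo:

```python
# Minimal HS256 JWT verification sketch (stdlib only, illustrative).
import base64
import hashlib
import hmac
import json

def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def tenant_from_jwt(token: str, secret: bytes) -> str:
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        # API Gateway maps this exact message to a 401 response
        raise Exception("Unauthorized")
    return json.loads(b64url_decode(payload_b64))["tenantId"]
```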

bedrock_apigw_authz.py:

import os
import boto3

sts = boto3.client("sts")


def handler(event: dict, context: dict) -> dict:
    """
    Entry point for the lambda function
    """

    aws_region = os.environ.get("AWS_REGION")
    account_id = sts.get_caller_identity().get("Account")

    # extract tenant/client id from the authz header ("Bearer <tenant-id>" here)
    auth_header = event["headers"].get("Authorization", "")
    parts = auth_header.split(" ")
    if len(parts) != 2:
        raise Exception("Unauthorized")  # API Gateway returns a 401 for this message
    tenant_id = parts[1]
    print(f"tenant_id: {tenant_id}")

    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow",
                    "Resource": f"arn:aws:execute-api:{aws_region}:{account_id}:*",
                }
            ],
        },
        "context": {"tenantId": tenant_id},
    }
Note: The region and account id are resolved at runtime via the AWS_REGION environment variable and STS, so there are no placeholders to replace in this file. To further restrict the generated policy, please refer to the AWS doc.

Zip the above python file and create the lambda function:

zip bedrock_apigw_authz.zip bedrock_apigw_authz.py
aws lambda create-function --function-name bedrock-apigw-authz --runtime python3.13 --role AUTHZ_IAM_ROLE_ARN --handler bedrock_apigw_authz.handler --timeout 3 --memory-size 256 --architectures arm64 --logging-config LogFormat=JSON,ApplicationLogLevel=INFO,SystemLogLevel=INFO --zip-file fileb://bedrock_apigw_authz.zip
Note: Make sure to replace the placeholder AUTHZ_IAM_ROLE_ARN in the above command with its respective value.

Bedrock Agent Lambda Function

The snippet below includes only the minimal logic to invoke Bedrock to retrieve relevant chunks and generate a natural language response. For a detailed understanding of the API, please refer to the documentation.

bedrock_agent.py:

import os
import json
import logging
import boto3

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# instantiate boto3 client
bedrock_agent_runtime = boto3.client(
    "bedrock-agent-runtime", region_name=os.getenv("AWS_REGION")
)

LLM_MODEL_INFERENCE_ARN = os.getenv("LLM_MODEL_INFERENCE_ARN")
KNOWLEDGE_BASE_ID = os.getenv("KNOWLEDGE_BASE_ID")


def prompt_template(type: str, context: str = None) -> str:
    """
    Generate prompt template for bedrock retrieval and generation.

    Args:
        type: generate prompt template for either "generation" or "orchestration"
        context: (optional) additional context to provide to the RAG

    Returns:
        str: prompt template
    """

    if type == "generation":
        return f"""
You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. Your job is to answer the user's question using only information from the search results and additional context provided to you.

IMPORTANT:
- If the search results do not contain information that can answer the question, please state that you could not find an exact answer to the question.
- If the additional context provided to you is not relevant to the question, please ignore it.

Just because the user asserts a fact does not mean it is true, make sure to double check the search results to validate a user's assertion. Never hallucinate facts. Provide concise answers and include explicit citations to the sources used.

User query:
$query$

Search results in numbered order from the knowledge base:
$search_results$

Additional context for answer generation:
{context}

$output_format_instructions$
        """
    elif type == "orchestration":
        return f"""
You are an assistant that translates a user's request into one or more focused search queries for the knowledge base based on the user's question and additional context.

Here are a few examples of queries formed by other search function selection and query creation agents:

<examples>
  <example>
    <question> What if my vehicle is totaled in an accident? </question>
    <generated_query> what happens if my vehicle is totaled </generated_query>
  </example>
  <example>
    <question> I am relocating within the same state. Can I keep my current agent? </question>
    <generated_query> can I keep my current agent when moving in state </generated_query>
  </example>
</examples>

You should also pay attention to the conversation history between the user and the search engine in order to gain the context necessary to create the query.
Here's another example that shows how you should reference the conversation history when generating a query:

<example>
  <example_conversation_history>
    <example_conversation>
      <question> How many vehicles can I include in a quote in Kansas </question>
      <answer> You can include 5 vehicles in a quote if you live in Kansas </answer>
    </example_conversation>
    <example_conversation>
      <question> What about texas? </question>
      <answer> You can include 3 vehicles in a quote if you live in Texas </answer>
    </example_conversation>
  </example_conversation_history>
</example>

IMPORTANT:
- the elements in the <example> tags should not be assumed to have been provided to you to use UNLESS they are also explicitly given to you below.
- All of the values and information within the examples (the questions, answers, and function calls) are strictly part of the examples and have not been provided to you.
- If the additional context provided to you is not relevant to the question, please ignore it while generating the query.

User query:
$query$

Additional context for retrieval query generation:
{context}

User conversation history:
$conversation_history$

$output_format_instructions$
"""


def invoke_rag(
    user_query: str, tenant_id: str, session_id: str = None, context: str = None
) -> dict | str:
    """
    Query bedrock knowledge base to retrieve and generate answer.

    Args:
        user_query: user query to send to RAG for analysis
        tenant_id: tenant/client id to filter the knowledge base results retrieved from vector store
        session_id: (optional) session id to maintain context and knowledge from previous interactions
        context: (optional) context to provide additional information to the RAG

    Returns:
        dict: response from RAG if successful, error message if failed
    """

    try:
        # ref: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html
        req_args = {
            "input": {"text": user_query},
            "retrieveAndGenerateConfiguration": {
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                    "modelArn": LLM_MODEL_INFERENCE_ARN,
                    "retrievalConfiguration": {
                        "vectorSearchConfiguration": {
                            "numberOfResults": 5,
                            "overrideSearchType": "SEMANTIC",
                            # allows retrieval of data that belong to the tenant
                            "filter": {
                                "equals": {"key": "tenantId", "value": tenant_id}
                            },
                        }
                    },
                    "generationConfiguration": {
                        "promptTemplate": {
                            "textPromptTemplate": prompt_template("generation", context)
                        },
                    },
                    "orchestrationConfiguration": {
                        "promptTemplate": {
                            "textPromptTemplate": prompt_template(
                                "orchestration", context
                            )
                        },
                        "queryTransformationConfiguration": {
                            "type": "QUERY_DECOMPOSITION"
                        },
                    },
                },
                "type": "KNOWLEDGE_BASE",
            },
        }

        # assign session id to maintain context and knowledge from previous interactions
        if session_id is not None:
            req_args["sessionId"] = session_id

        # query rag to generate response
        response = bedrock_agent_runtime.retrieve_and_generate(**req_args)

        return {"statusCode": 200, "message": response}
    except Exception:
        logger.exception("failed to generate response")
        return {
            "statusCode": 500,
            "message": {"error": "failed to generate response"},
        }


def handler(event: dict, context: dict) -> dict:
    """
    Entry point for the lambda function
    """
    req_body = json.loads(event["body"])

    # prepare request arguments
    req_args = {"user_query": req_body["user_query"], "tenant_id": context.tenant_id}

    # load session id to maintain context and knowledge from previous interactions
    if "session_id" in req_body:
        req_args["session_id"] = req_body["session_id"]

    # load context to provide additional information to the RAG
    if "context" in req_body:
        req_args["context"] = req_body["context"]

    response = invoke_rag(**req_args)

    return {
        "statusCode": response["statusCode"],
        "body": json.dumps(response["message"]),
        "headers": {
            "Content-Type": "application/json",
        },
    }
Note: Both prompt templates used in the above code are sourced from AWS.

The above snippet extracts user_query and, optionally, session_id and context from the request body, and reads the tenant id from the Lambda context (populated via the X-Amz-Tenant-Id header when tenant isolation is enabled). These values, along with a few others, are used to generate the natural language response.

The important point to focus on is the filter parameter within the vectorSearchConfiguration block when preparing the request for the bedrock_agent_runtime.retrieve_and_generate method. This filter ensures that when Bedrock fetches data from the vector store, it only considers the data owned by the respective tenant.

"retrievalConfiguration": {
    "vectorSearchConfiguration": {
        ...
        "filter": {
            "equals": {
                "key": "tenantId", 
                "value": tenant_id
            }
        },
    }
}

In case you are separating client data at the bucket or folder level, you can filter using S3 prefixes via the built-in source-URI metadata attribute. This avoids maintaining custom metadata files.

"retrievalConfiguration": {
    "vectorSearchConfiguration": {
        ...
        "filter": {
            "startsWith": {
                "key": "x-amz-bedrock-kb-source-uri", 
                "value": f"s3://{bucket_name}/"
            }
        },
    }
}
Note: startsWith and stringContains are not supported for the S3 vector store. Please refer to the AWS doc to learn more about using filters.
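Building on the earlier Pro Tip, multiple conditions can also be combined into one retrieval filter, for example restricting by tenant and group together. The andAll and in operators below follow the Bedrock metadata filtering syntax, while the groupId attribute is a hypothetical custom attribute you would add to the sidecar files:

```python
# Sketch: compose a tenant + group retrieval filter. "groupId" is a
# hypothetical custom metadata attribute, not a built-in one.
def build_filter(tenant_id: str, allowed_groups: list[str]) -> dict:
    return {
        "andAll": [
            {"equals": {"key": "tenantId", "value": tenant_id}},
            {"in": {"key": "groupId", "value": allowed_groups}},
        ]
    }

# drop-in value for the "filter" key inside vectorSearchConfiguration
print(build_filter("demo", ["support", "sales"]))
```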

Time to zip the above python file and create the lambda function with tenant-isolation mode enabled to ensure client data segregation at the compute layer.

zip bedrock_agent.zip bedrock_agent.py
aws lambda create-function --function-name bedrock-agent --runtime python3.13 --role BEDROCK_AGENT_LAMBDA_ROLE_ARN --handler bedrock_agent.handler --timeout 15 --memory-size 256 --architectures arm64 --logging-config LogFormat=JSON,ApplicationLogLevel=INFO,SystemLogLevel=INFO --tenancy-config TenantIsolationMode=PER_TENANT --zip-file fileb://bedrock_agent.zip --environment 'Variables={LLM_MODEL_INFERENCE_ARN=xxxx,KNOWLEDGE_BASE_ID=xxxx}'
Note: Make sure to replace the placeholder BEDROCK_AGENT_LAMBDA_ROLE_ARN in the above command with its respective value and update values for both the environment variables LLM_MODEL_INFERENCE_ARN and KNOWLEDGE_BASE_ID.

It’s time to deploy the final piece to complete the puzzle.


REST API Gateway

At the time of writing this article, an HTTP API Gateway cannot be used with a tenant-isolation-enabled lambda function because it does not support overriding the X-Amz-Tenant-Id header, hence we create a REST API Gateway. However, if you are deploying a lambda function without tenant-isolation mode, you can use an HTTP API Gateway instead.

openapi-schema.json

{
  "openapi" : "3.0.1",
  "info" : {
    "title" : "bedrock-rag-chatbot",
    "description" : "Bedrock RAG chatbot REST API Gateway",
    "version" : "1.0"
  },
  "paths" : {
    "/genai/query" : {
      "post" : {
        "security" : [ {
          "apigw-lambda-authz" : [ ]
        } ],
        "x-amazon-apigateway-integration" : {
          "type" : "aws_proxy",
          "httpMethod" : "POST",
          "uri" : "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/BEDROCK_LAMBDA_FUNCTION_ARN/invocations",
          "requestParameters" : {
            "integration.request.header.X-Amz-Tenant-Id" : "context.authorizer.tenantId"
          },
          "responses" : {
            "default" : {
              "statusCode" : "200"
            }
          }
        }
      }
    }
  },
  "components" : {
    "securitySchemes" : {
      "apigw-lambda-authz" : {
        "type" : "apiKey",
        "name" : "Unused",
        "in" : "header",
        "x-amazon-apigateway-authtype" : "custom",
        "x-amazon-apigateway-authorizer" : {
          "identitySource" : "method.request.header.Authorization,context.httpMethod,context.path",
          "authorizerUri" : "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/AUTHZ_LAMBDA_FUNCTION_ARN/invocations",
          "authorizerResultTtlInSeconds" : 300,
          "type" : "request"
        }
      }
    }
  }
}
Note: Do not forget to update the placeholders BEDROCK_LAMBDA_FUNCTION_ARN and AUTHZ_LAMBDA_FUNCTION_ARN in the above OpenAPI file with their actual values.

In case you are not using a Lambda Authorizer to extract the tenant id and add it to the context, follow the steps provided in the AWS blog to extract it from the HTTP request.

Step 1: Create/Import REST API

aws apigateway import-rest-api --body 'fileb://openapi-schema.json'

Step 2: Deploy API

aws apigateway create-deployment --rest-api-id REST_API_ID --stage-name demo

Next, we need to update resource policy for both the lambda functions to allow API gateway to invoke them.

Fetch Lambda Authorizer ID

aws apigateway get-authorizers --rest-api-id REST_API_ID

Permission for Authorizer Lambda

aws lambda add-permission --function-name bedrock-apigw-authz --action lambda:InvokeFunction --statement-id apigw-invoke --principal apigateway.amazonaws.com --source-arn arn:aws:execute-api:AWS_REGION:ACCOUNT_ID:REST_API_ID/authorizers/REST_API_AUTHORIZER_ID

Permission for Agent Lambda

aws lambda add-permission --function-name bedrock-agent --action lambda:InvokeFunction --statement-id apigw-invoke --principal apigateway.amazonaws.com --source-arn arn:aws:execute-api:AWS_REGION:ACCOUNT_ID:REST_API_ID/*/POST/*

Fun time

It’s finally time to validate that the entire setup works as expected. I’m using curl for the API calls, but you can use tools such as Postman as well.

We will test for the following cases:

  1. Client test has access to List_of_Stranger_Things_episodes.pdf, so it should be able to list episodes for all seasons.

  2. Client demo can only access Stranger_Things_season_1.pdf, so it can provide information about season 1 but nothing about any other season.

  3. Invalid client dummy should not be able to retrieve any information.
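For reference, a request for case 1 might look like the following. The invoke URL components are placeholders for your own API id, region, and stage, and the plain-text Bearer token matches the simplified authorizer used in this demo:

```shell
# Substitute REST_API_ID and AWS_REGION with your own values
curl -s -X POST \
  -H "Authorization: Bearer test" \
  -H "Content-Type: application/json" \
  -d '{"user_query": "List the episodes of all Stranger Things seasons"}' \
  "https://REST_API_ID.execute-api.AWS_REGION.amazonaws.com/demo/genai/query"
```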

Fig 6.1 Client test

Fig 6.2.1 Client demo - Unable to fetch information about season 2 episodes

Fig 6.2.2 Client demo - Showing information about season 1 episodes

Fig 6.3 Invalid Client dummy

Mission accomplished! We have successfully created a secure multi-tenant RAG chatbot.

Caveats

This article does not cover applying guardrails to the knowledge base, which is recommended to safeguard LLM-generated content against sensitive data exposure, profanity, and hallucination. Additionally, AWS WAF should be applied to the REST API Gateway to protect against Layer 7 attacks.

Last but not least, rigorously test the responses generated by the various models available in Bedrock and tweak the values passed to the Bedrock API to better meet the requirements of your use case.


Vimal Paliwal

Vim brings over eight years of practical experience in designing and deploying cloud-native solutions using AWS and a broad spectrum of DevOps tools. Throughout his career, he has collaborated with businesses of all sizes to build secure, scalable, and efficient infrastructure on the cloud, often leading strategic cloud transformation initiatives.

He is a part of the AWS Community Builders program, where he actively contributes to knowledge-sharing efforts across the cloud ecosystem by writing on real-world implementations and best practices. In addition, Vim has spent several years as an AWS Authorized Instructor, during which he trained over 1,000 professionals.
