Building AgentoCore: Enterprise-Grade Autonomous AI Agents with AWS Bedrock and EventBridge

Introduction: The Shift from Chatbots to Autonomous Agents

In 2024, we built chatbots. Systems that answered questions, generated text, and sometimes retrieved information from vector databases. By 2025, the paradigm shifted to autonomous agents—AI systems that reason, plan, use tools, maintain memory, and execute multi-step workflows without constant human intervention.

The difference is profound:

Chatbots: "What's the weather?" → Answer
Agents: "Book a flight to San Francisco for next week" → Search flights, compare prices, check calendar, book ticket, add to calendar, send confirmation

Building production-grade autonomous agents requires more than calling an LLM API. You need:

ReAct Loop: Reasoning + Acting in iterative cycles
Conversational Memory: Multi-turn context persistence
Tool Validation: Self-healing when tools fail or schemas change
Event-Driven Architecture: Asynchronous, scalable, decoupled
Enterprise Observability: X-Ray tracing, CloudWatch metrics, structured logging
Security Hardening: Least privilege IAM, secrets management, input validation
Cost Optimization: Token counting, caching, model selection
Infrastructure as Code: Repeatable deployments with AWS CDK

This is AgentoCore—a production-ready framework for building autonomous AI agents on AWS. Built with AWS Bedrock, DynamoDB, EventBridge, Lambda, and Python. Deployed with AWS CDK. Battle-tested in enterprise environments.

Core Components

AgentoCore is built on a modern serverless architecture that separates synchronous agent execution from asynchronous background processing. The system uses AWS managed services to eliminate operational overhead while maintaining enterprise-grade reliability and scalability.

High-level architecture: API Gateway triggers Agent Lambda, which uses Bedrock Converse for reasoning, stores state in DynamoDB, and emits events to EventBridge for asynchronous worker processing

The architecture follows a clear separation of concerns with three distinct layers:

AWS Bedrock (Converse API): Managed inference with Claude Sonnet 4.5 for reasoning and tool calling
DynamoDB: Conversational memory with single-table design and TTL-based expiration
EventBridge: Event-driven orchestration for asynchronous agent workflows
Lambda (Python 3.12): Serverless compute with Pydantic-based tool validation
CloudWatch + X-Ray: Distributed tracing, structured logging, and performance metrics
AWS CDK (TypeScript): Infrastructure as Code with type-safe constructs

Why This Stack?

AWS Bedrock Converse API provides a unified interface for Claude models with native tool calling support. No need to manually parse JSON from LLM responses or implement retry logic for malformed outputs.

DynamoDB offers single-digit millisecond latency for conversation retrieval and built-in TTL for automatic memory expiration. No database servers to manage.

EventBridge decouples agent invocations from execution. You can trigger agents via API Gateway, S3 events, SNS notifications, or scheduled rules—all without changing agent code.

AWS CDK eliminates CloudFormation YAML sprawl. Infrastructure is code, with IDE autocomplete, type checking, and unit tests.

Deep Dive: ReAct Loop with Bedrock Converse API

The ReAct (Reasoning + Acting) pattern is the foundation of autonomous agents. Unlike traditional chatbots that simply respond to queries, ReAct agents think through problems step-by-step, using tools to gather information and execute actions. The agent iteratively:

Reasons: Analyzes the current state and decides what to do next
Acts: Calls a tool (e.g., search database, fetch weather, send email)
Observes: Receives tool result and updates its understanding
Repeats: Until the task is complete or max iterations reached

Let's see how this works in practice with a real-world example. When a user requests a refund, the agent must load conversation context, reason about the request, check order status, verify the reason, trigger the refund process, and provide a final response—all autonomously.

ReAct Loop Sequence Diagram - Refund Processing Example

ReAct loop in action: Agent processes a refund request through 7 iterations, loading context, executing tools (get_order_status, trigger_refund), and handling async actions via EventBridge

Implementation with Bedrock Converse API

The following implementation shows how to build a production-ready ReAct loop using AWS Bedrock's Converse API. This code handles the complete cycle of reasoning, tool execution, and result observation, with automatic retry logic and error handling.

import boto3
from typing import List, Dict, Any
from pydantic import BaseModel

class Tool(BaseModel):
    name: str
    description: str
    input_schema: Dict[str, Any]

class AgentCore:
    def __init__(self, model_id: str = "anthropic.claude-sonnet-4-5-20250929-v1:0"):
        self.bedrock = boto3.client("bedrock-runtime")
        self.model_id = model_id
        self.max_iterations = 10

    def execute_react_loop(
        self,
        user_message: str,
        tools: List[Tool],
        conversation_history: List[Dict]
    ) -> str:
        """
        Execute ReAct loop until task completion or max iterations.
        """
        messages = conversation_history + [
            {"role": "user", "content": [{"text": user_message}]}
        ]

        for iteration in range(self.max_iterations):
            # Step 1: Reasoning - Get model decision
            response = self.bedrock.converse(
                modelId=self.model_id,
                messages=messages,
                toolConfig={
                    "tools": [self._convert_tool(t) for t in tools]
                },
                inferenceConfig={
                    "maxTokens": 4096,
                    "temperature": 0.7,
                    "topP": 0.9
                }
            )

            # Extract stop reason
            stop_reason = response["stopReason"]

            # If model finished without tool call, return final answer
            if stop_reason == "end_turn":
                final_message = response["output"]["message"]
                return final_message["content"][0]["text"]

            # Step 2: Acting - Execute tool calls
            if stop_reason == "tool_use":
                assistant_message = response["output"]["message"]
                messages.append(assistant_message)

                tool_results = []
                for content_block in assistant_message["content"]:
                    if "toolUse" in content_block:
                        tool_use = content_block["toolUse"]
                        tool_name = tool_use["name"]
                        tool_input = tool_use["input"]
                        tool_use_id = tool_use["toolUseId"]

                        # Step 3: Observing - Get tool result
                        result = self._execute_tool(tool_name, tool_input)

                        tool_results.append({
                            "toolResult": {
                                "toolUseId": tool_use_id,
                                "content": [{"text": str(result)}]
                            }
                        })

                # Add tool results to conversation
                messages.append({
                    "role": "user",
                    "content": tool_results
                })

            else:
                # Unexpected stop reason
                raise ValueError(f"Unexpected stop reason: {stop_reason}")

        raise RuntimeError("Max iterations reached without completion")

    def _convert_tool(self, tool: Tool) -> Dict:
        """Convert Pydantic tool to Bedrock tool format."""
        return {
            "toolSpec": {
                "name": tool.name,
                "description": tool.description,
                "inputSchema": {
                    "json": tool.input_schema
                }
            }
        }

    def _execute_tool(self, tool_name: str, tool_input: Dict) -> Any:
        """Execute tool and return result."""
        # Tool execution logic here
        # In production, use registry pattern with Pydantic validation
        pass

Why Converse API Over InvokeModel?

Converse API provides native tool calling with automatic JSON schema validation. You define tools once, and Bedrock handles:

Schema validation of tool inputs
Multi-turn conversation management
Tool result injection into context
Stop reason handling

InvokeModel requires manual prompt engineering, JSON parsing, and retry logic. Converse API is production-ready out of the box.

Deep Dive: DynamoDB Schema for Conversational Memory

Autonomous agents need persistent memory across multiple interactions. DynamoDB provides fast, scalable storage with a single-table design pattern.

Table Schema Design

DynamoDB Single-Table Design for Conversation Memory

Single-table design: PK combines conversation_id and turn_number, GSI enables querying by user_id, TTL expires old conversations automatically

The schema uses a composite primary key where the partition key (PK) identifies the conversation and the sort key (SK) orders messages chronologically. This design enables efficient retrieval of entire conversation histories with a single Query operation, crucial for the ReAct loop which needs access to recent context.

DynamoDB Conversation Memory Implementation

This implementation provides methods for saving conversation turns with automatic TTL expiration and retrieving conversation history with configurable memory windows. The ConversationMemory class handles all DynamoDB interactions with built-in error handling and token counting.

import boto3
from datetime import datetime, timedelta
from typing import List, Dict, Optional
from pydantic import BaseModel

class ConversationTurn(BaseModel):
    conversation_id: str
    turn_number: int
    role: str  # "user" or "assistant"
    content: List[Dict]
    timestamp: datetime
    token_count: int
    model_id: str

class ConversationMemory:
    def __init__(self, table_name: str = "agentocore-conversations"):
        self.dynamodb = boto3.resource("dynamodb")
        self.table = self.dynamodb.Table(table_name)
        self.ttl_days = 30  # Auto-expire after 30 days

    def save_turn(self, turn: ConversationTurn):
        """Save conversation turn to DynamoDB."""
        ttl = int((datetime.now() + timedelta(days=self.ttl_days)).timestamp())

        self.table.put_item(
            Item={
                "PK": f"CONV#{turn.conversation_id}",
                "SK": f"TURN#{turn.turn_number:05d}",
                "conversation_id": turn.conversation_id,
                "turn_number": turn.turn_number,
                "role": turn.role,
                "content": turn.content,
                "timestamp": turn.timestamp.isoformat(),
                "token_count": turn.token_count,
                "model_id": turn.model_id,
                "ttl": ttl
            }
        )

    def get_conversation_history(
        self,
        conversation_id: str,
        max_turns: Optional[int] = None
    ) -> List[Dict]:
        """Retrieve conversation history for ReAct loop."""
        response = self.table.query(
            KeyConditionExpression="PK = :pk",
            ExpressionAttributeValues={
                ":pk": f"CONV#{conversation_id}"
            },
            ScanIndexForward=True  # Oldest first
        )

        items = response["Items"]

        # Limit to last N turns if specified
        if max_turns:
            items = items[-max_turns:]

        # Convert to Bedrock message format
        messages = []
        for item in items:
            messages.append({
                "role": item["role"],
                "content": item["content"]
            })

        return messages

    def get_token_count(self, conversation_id: str) -> int:
        """Calculate total tokens in conversation for cost tracking."""
        response = self.table.query(
            KeyConditionExpression="PK = :pk",
            ExpressionAttributeValues={
                ":pk": f"CONV#{conversation_id}"
            },
            ProjectionExpression="token_count"
        )

        return sum(item["token_count"] for item in response["Items"])

Why Single-Table Design?

Performance: Single query retrieves entire conversation (no joins)
Cost: Fewer read capacity units than multiple tables
Scalability: DynamoDB auto-scales without schema migrations
TTL: Automatic cleanup of expired conversations (GDPR compliance)

Memory Window Strategy

Long conversations exceed model context windows (200K tokens for Claude Sonnet 4.5). Implement sliding window with max_turns parameter:

Last 10 turns: Fast retrieval, limited context
Semantic compression: Summarize old turns with LLM before adding to context
Hybrid approach: Keep full system prompts + last N user/assistant turns

Deep Dive: Pydantic-Based Tool Validation and Self-Healing

Tool schemas change over time. APIs add required fields, deprecate parameters, or change response formats. Agents must handle schema evolution gracefully without manual code updates.

Pydantic Tool Registry

The tool registry pattern centralizes tool management and provides automatic validation for all tool inputs and outputs. By using Pydantic models, we get type safety, automatic JSON schema generation, and custom validation rules—all essential for production AI systems where tools might be called with unpredictable inputs from LLM reasoning.

Tool Registry with Automatic Validation

This implementation shows how to build a registry that automatically validates tool inputs using Pydantic schemas before execution. When validation fails, the registry returns structured errors that help the agent self-correct in the next ReAct iteration.

from pydantic import BaseModel, Field, validator
from typing import List, Dict, Any, Callable
import inspect

class WeatherToolInput(BaseModel):
    location: str = Field(description="City name, e.g., 'San Francisco'")
    units: str = Field(default="celsius", description="Temperature units: celsius or fahrenheit")

    @validator("units")
    def validate_units(cls, v):
        if v not in ["celsius", "fahrenheit"]:
            raise ValueError("Units must be 'celsius' or 'fahrenheit'")
        return v

class WeatherToolOutput(BaseModel):
    temperature: float
    conditions: str
    humidity: int

class ToolRegistry:
    def __init__(self):
        self.tools: Dict[str, Callable] = {}
        self.input_schemas: Dict[str, type[BaseModel]] = {}
        self.output_schemas: Dict[str, type[BaseModel]] = {}

    def register(
        self,
        name: str,
        input_schema: type[BaseModel],
        output_schema: type[BaseModel]
    ):
        """Decorator to register tool with Pydantic validation."""
        def decorator(func: Callable):
            self.tools[name] = func
            self.input_schemas[name] = input_schema
            self.output_schemas[name] = output_schema
            return func
        return decorator

    def execute(self, tool_name: str, tool_input: Dict) -> Dict:
        """Execute tool with automatic validation and error recovery."""
        if tool_name not in self.tools:
            return {
                "error": f"Unknown tool: {tool_name}",
                "suggestion": f"Available tools: {list(self.tools.keys())}"
            }

        try:
            # Validate input with Pydantic
            input_schema = self.input_schemas[tool_name]
            validated_input = input_schema(**tool_input)

            # Execute tool
            result = self.tools[tool_name](validated_input)

            # Validate output
            output_schema = self.output_schemas[tool_name]
            validated_output = output_schema(**result)

            return validated_output.dict()

        except Exception as e:
            # Self-healing: Return structured error with fix suggestions
            return {
                "error": str(e),
                "tool_name": tool_name,
                "provided_input": tool_input,
                "expected_schema": input_schema.schema(),
                "suggestion": self._generate_fix_suggestion(e, tool_input)
            }

    def _generate_fix_suggestion(self, error: Exception, tool_input: Dict) -> str:
        """Use LLM to generate fix suggestion for validation errors."""
        # In production, call Bedrock with error context
        # For now, return generic suggestion
        return f"Check input parameters against schema. Error: {str(error)}"

    def get_tool_configs(self) -> List[Dict]:
        """Convert registry to Bedrock tool format."""
        tools = []
        for name, input_schema in self.input_schemas.items():
            tools.append({
                "name": name,
                "description": input_schema.__doc__ or f"Tool: {name}",
                "input_schema": input_schema.schema()
            })
        return tools

# Usage example
registry = ToolRegistry()

@registry.register(
    name="get_weather",
    input_schema=WeatherToolInput,
    output_schema=WeatherToolOutput
)
def get_weather(input_data: WeatherToolInput) -> Dict:
    """Get current weather for a location."""
    # API call here
    return {
        "temperature": 22.5,
        "conditions": "Sunny",
        "humidity": 65
    }

Self-Healing Workflow

Agent calls tool with invalid input (e.g., missing required field)
Pydantic validation fails and returns structured error
Error includes expected schema in JSON format
Agent receives error in next ReAct iteration
Agent reasons about error and retries with corrected input
Tool succeeds on second attempt

                    Why Pydantic Over Manual Validation?
                    Type Safety: IDE autocomplete and static analysis
Schema Export: Automatic OpenAPI/JSON Schema generation
Custom Validators: Complex business logic in decorators
Self-Documentation: Field descriptions become tool descriptions
Performance: Compiled validators (10-50x faster than manual checks)

                

Deep Dive: Event-Driven Architecture with EventBridge

Traditional agent architectures use synchronous request-response patterns. This creates tight coupling between trigger and execution. EventBridge enables asynchronous, scalable, event-driven workflows that separate agent reasoning from long-running background tasks.

AgentoCore implements a three-layer architecture that balances synchronous user interactions with asynchronous background processing:

AgentoCore Detailed Architecture with Agent, Event-Driven, and Observability Layers

Detailed architecture: Agent layer handles synchronous ReAct loops (200ms target), event-driven layer processes async actions via EventBridge and Worker Lambdas, observability layer provides X-Ray tracing and CloudWatch metrics

The agent layer runs synchronously with a 200ms target response time, executing ReAct loops and validating tool requests with Pydantic. The event-driven layer handles time-consuming operations asynchronously—payment processing, email notifications, and database updates—through EventBridge content-based routing. The observability layer captures every interaction through CloudWatch Logs, X-Ray distributed tracing, and custom metrics for cost analysis.

Event Schema

Events in AgentoCore follow a standardized schema that includes agent identification, conversation context, and priority levels for routing. This structure enables EventBridge to intelligently route events to appropriate worker lambdas based on content and urgency.

Agent Invocation Event Structure

The AgentInvocationEvent schema defines all required data for agent execution. Priority levels (low, normal, high) enable routing rules to direct urgent requests to dedicated high-priority Lambda functions with reserved concurrency, ensuring critical operations aren't delayed by routine traffic.

import json
import boto3
from datetime import datetime
from pydantic import BaseModel
from typing import Dict, Any, Optional

class AgentInvocationEvent(BaseModel):
    """Event schema for agent invocation via EventBridge."""
    agent_id: str
    conversation_id: str
    user_message: str
    user_id: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None
    priority: str = "normal"  # "low", "normal", "high"

class EventBridgePublisher:
    def __init__(self, event_bus_name: str = "agentocore-events"):
        self.events = boto3.client("events")
        self.event_bus_name = event_bus_name

    def invoke_agent(self, event: AgentInvocationEvent):
        """Publish agent invocation event to EventBridge."""
        self.events.put_events(
            Entries=[
                {
                    "Time": datetime.now(),
                    "Source": "agentocore.agent.invocation",
                    "DetailType": "AgentInvocationRequested",
                    "Detail": event.json(),
                    "EventBusName": self.event_bus_name
                }
            ]
        )

# Lambda handler for agent execution
def agent_execution_handler(event, context):
    """
    Lambda function triggered by EventBridge.
    Executes ReAct loop and publishes result event.
    """
    detail = event["detail"]
    invocation = AgentInvocationEvent(**json.loads(detail))

    # Initialize agent
    agent = AgentCore()
    memory = ConversationMemory()

    # Get conversation history
    history = memory.get_conversation_history(invocation.conversation_id)

    # Execute ReAct loop
    result = agent.execute_react_loop(
        user_message=invocation.user_message,
        tools=registry.get_tool_configs(),
        conversation_history=history
    )

    # Save result to memory
    memory.save_turn(ConversationTurn(
        conversation_id=invocation.conversation_id,
        turn_number=len(history) + 1,
        role="assistant",
        content=[{"text": result}],
        timestamp=datetime.now(),
        token_count=len(result.split()),  # Approximate
        model_id=agent.model_id
    ))

    # Publish completion event
    events = boto3.client("events")
    events.put_events(
        Entries=[
            {
                "Time": datetime.now(),
                "Source": "agentocore.agent.execution",
                "DetailType": "AgentExecutionCompleted",
                "Detail": json.dumps({
                    "conversation_id": invocation.conversation_id,
                    "agent_id": invocation.agent_id,
                    "result": result,
                    "execution_time_ms": context.get_remaining_time_in_millis()
                })
            }
        ]
    )

EventBridge Routing Rules

EventBridge uses content-based routing to direct events to specialized worker lambdas based on business rules. This pattern enables different processing paths for standard requests versus high-value transactions that require manual approval workflows.

{
  "source": ["agentocore.agent.invocation"],
  "detail-type": ["AgentInvocationRequested"],
  "detail": {
    "priority": ["high"]
  }
}

The routing architecture separates standard automated processing from high-value transactions that require human oversight. When the agent emits an event, EventBridge evaluates routing rules to determine the appropriate processing path.

EventBridge Content-Based Routing Architecture

EventBridge routing: Standard requests (under $1,000) flow to Worker Lambda 1 for automated processing, while high-value requests (over $1,000) route to Worker Lambda 2 with manual approval queue. Failed operations trigger DLQ with exponential backoff and SNS alerts.

Production statistics show this architecture maintains 200ms average synchronous response time while processing 15K events per day with a 0.3% dead letter queue rate. The separation of concerns enables independent scaling—standard requests can auto-scale to thousands of concurrent executions while high-value approvals maintain manual review capacity.

Benefits of Event-Driven Architecture

Decoupling: API Gateway, S3, SNS can all trigger agents without knowing Lambda ARNs
Scalability: EventBridge handles 100K+ events/second automatically
Routing: Priority-based execution with separate Lambda concurrency limits
Audit Trail: EventBridge archive captures all events for compliance
Fan-out: One event can trigger multiple downstream systems (analytics, monitoring, notifications)

Deep Dive: Enterprise Observability

Production AI agents require comprehensive observability. X-Ray for distributed tracing, CloudWatch for metrics and logs, and custom instrumentation for agent-specific insights.

X-Ray Distributed Tracing

AWS X-Ray Service Map showing agent execution flow

X-Ray Service Map: Trace request from API Gateway → EventBridge → Lambda → Bedrock → DynamoDB

X-Ray distributed tracing provides end-to-end visibility into agent execution, from API Gateway through Bedrock inference to DynamoDB persistence. Each trace captures latency for every AWS service call, making it trivial to identify bottlenecks—typically Bedrock inference accounts for 80% of total execution time.

Instrumenting Agent Code with X-Ray

The AWS X-Ray SDK automatically traces boto3 calls when you use patch_all(), but custom annotations and metadata provide additional context for debugging agent behavior. This implementation shows how to capture agent-specific metrics like iteration count and tool usage within X-Ray traces.

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
import boto3

# Patch boto3 for automatic tracing
patch_all()

class ObservableAgentCore(AgentCore):
    """Agent with X-Ray instrumentation."""

    @xray_recorder.capture("execute_react_loop")
    def execute_react_loop(self, user_message, tools, conversation_history):
        # Add metadata to trace
        xray_recorder.put_annotation("conversation_id", self.conversation_id)
        xray_recorder.put_annotation("tool_count", len(tools))
        xray_recorder.put_metadata("user_message", user_message)

        return super().execute_react_loop(user_message, tools, conversation_history)

    @xray_recorder.capture("bedrock_inference")
    def _call_bedrock(self, messages, tools):
        subsegment = xray_recorder.current_subsegment()

        # Record input tokens
        input_tokens = sum(len(str(m)) for m in messages) // 4  # Approximate
        subsegment.put_metadata("input_tokens", input_tokens)

        response = self.bedrock.converse(
            modelId=self.model_id,
            messages=messages,
            toolConfig={"tools": tools}
        )

        # Record output tokens
        output_tokens = response["usage"]["outputTokens"]
        subsegment.put_metadata("output_tokens", output_tokens)
        subsegment.put_annotation("stop_reason", response["stopReason"])

        return response

CloudWatch Logs Insights Queries

CloudWatch Logs Insights query for agent performance analysis

CloudWatch Logs Insights: Analyze agent performance, tool usage, and error rates

# Query: Agent execution time by conversation
fields @timestamp, conversation_id, execution_time_ms
| filter detail_type = "AgentExecutionCompleted"
| stats avg(execution_time_ms) as avg_time, max(execution_time_ms) as max_time by conversation_id
| sort avg_time desc

# Query: Most frequently called tools
fields @timestamp, tool_name
| filter detail_type = "ToolExecuted"
| stats count() as call_count by tool_name
| sort call_count desc

# Query: Error rate by agent
fields @timestamp, agent_id, error_message
| filter level = "ERROR"
| stats count() as error_count by agent_id
| sort error_count desc

Custom CloudWatch Metrics

While Lambda provides built-in metrics (invocations, duration, errors), agent-specific metrics require custom CloudWatch metrics. Tracking token usage, tool call counts, and success rates enables cost optimization and performance tuning that standard Lambda metrics can't provide.

Publishing Agent Metrics to CloudWatch

This metrics publisher sends custom dimensions to CloudWatch, enabling dashboards that show per-agent performance and cost trends. The AgentId dimension allows filtering metrics by specific agents in multi-tenant deployments.

import boto3
from datetime import datetime

class MetricsPublisher:
    def __init__(self, namespace: str = "AgentoCore"):
        self.cloudwatch = boto3.client("cloudwatch")
        self.namespace = namespace

    def publish_agent_metrics(
        self,
        agent_id: str,
        execution_time_ms: int,
        token_count: int,
        tool_calls: int,
        success: bool
    ):
        """Publish agent execution metrics to CloudWatch."""
        self.cloudwatch.put_metric_data(
            Namespace=self.namespace,
            MetricData=[
                {
                    "MetricName": "ExecutionTime",
                    "Value": execution_time_ms,
                    "Unit": "Milliseconds",
                    "Timestamp": datetime.now(),
                    "Dimensions": [
                        {"Name": "AgentId", "Value": agent_id}
                    ]
                },
                {
                    "MetricName": "TokenCount",
                    "Value": token_count,
                    "Unit": "Count",
                    "Timestamp": datetime.now(),
                    "Dimensions": [
                        {"Name": "AgentId", "Value": agent_id}
                    ]
                },
                {
                    "MetricName": "ToolCalls",
                    "Value": tool_calls,
                    "Unit": "Count",
                    "Timestamp": datetime.now(),
                    "Dimensions": [
                        {"Name": "AgentId", "Value": agent_id}
                    ]
                },
                {
                    "MetricName": "SuccessRate",
                    "Value": 1.0 if success else 0.0,
                    "Unit": "Percent",
                    "Timestamp": datetime.now(),
                    "Dimensions": [
                        {"Name": "AgentId", "Value": agent_id}
                    ]
                }
            ]
        )

                    Key Observability Metrics
                    Execution Time: p50, p95, p99 latency for ReAct loops
Token Usage: Input/output tokens per conversation (cost tracking)
Tool Success Rate: Percentage of successful tool executions
Iteration Count: Average ReAct iterations per task
Error Rate: Failed agent executions by error type
Memory Size: Average conversation length in turns

                

Deep Dive: Infrastructure as Code with AWS CDK

Deploy the entire AgentoCore stack with a single cdk deploy command. AWS CDK provides type-safe infrastructure definitions with full IDE support.

CDK Stack Definition

AWS CDK eliminates the pain of CloudFormation YAML by providing type-safe infrastructure definitions in TypeScript. The following stack definition creates all AgentoCore resources with proper IAM permissions, observability configuration, and production-ready settings like point-in-time recovery and X-Ray tracing.

Complete AgentoCore Infrastructure Stack

This CDK stack defines the entire architecture: DynamoDB table with GSI and TTL, Lambda function with proper timeouts and memory, EventBridge event bus with routing rules, and a CloudWatch dashboard for monitoring. All resources are configured with production best practices including IAM least privilege and automatic scaling.

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class AgentoCoreStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // DynamoDB table for conversation memory
    const conversationsTable = new dynamodb.Table(this, 'ConversationsTable', {
      tableName: 'agentocore-conversations',
      partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      timeToLiveAttribute: 'ttl',
      pointInTimeRecovery: true,
      stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES
    });

    // GSI for querying by user_id
    conversationsTable.addGlobalSecondaryIndex({
      indexName: 'UserIdIndex',
      partitionKey: { name: 'user_id', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'timestamp', type: dynamodb.AttributeType.STRING },
      projectionType: dynamodb.ProjectionType.ALL
    });

    // Lambda function for agent execution
    const agentFunction = new lambda.Function(this, 'AgentFunction', {
      functionName: 'agentocore-agent-executor',
      runtime: lambda.Runtime.PYTHON_3_12,
      handler: 'agent.handler',
      code: lambda.Code.fromAsset('lambda'),
      timeout: cdk.Duration.seconds(300),
      memorySize: 1024,
      tracing: lambda.Tracing.ACTIVE,
      environment: {
        CONVERSATIONS_TABLE: conversationsTable.tableName,
        MODEL_ID: 'anthropic.claude-sonnet-4-5-20250929-v1:0',
        MAX_ITERATIONS: '10'
      }
    });

    // Grant permissions
    conversationsTable.grantReadWriteData(agentFunction);

    agentFunction.addToRolePolicy(new iam.PolicyStatement({
      effect: iam.Effect.ALLOW,
      actions: ['bedrock:InvokeModel', 'bedrock:Converse'],
      resources: ['*']
    }));

    // EventBridge event bus
    const eventBus = new events.EventBus(this, 'EventBus', {
      eventBusName: 'agentocore-events'
    });

    // EventBridge rule for agent invocations
    const agentInvocationRule = new events.Rule(this, 'AgentInvocationRule', {
      eventBus: eventBus,
      eventPattern: {
        source: ['agentocore.agent.invocation'],
        detailType: ['AgentInvocationRequested']
      }
    });

    agentInvocationRule.addTarget(new targets.LambdaFunction(agentFunction));

    // CloudWatch dashboard
    const dashboard = new cdk.aws_cloudwatch.Dashboard(this, 'Dashboard', {
      dashboardName: 'AgentoCore-Metrics'
    });

    dashboard.addWidgets(
      new cdk.aws_cloudwatch.GraphWidget({
        title: 'Agent Execution Time',
        left: [agentFunction.metricDuration()],
      }),
      new cdk.aws_cloudwatch.GraphWidget({
        title: 'Agent Invocations',
        left: [agentFunction.metricInvocations()],
      }),
      new cdk.aws_cloudwatch.GraphWidget({
        title: 'Error Rate',
        left: [agentFunction.metricErrors()],
      })
    );

    // Outputs
    new cdk.CfnOutput(this, 'EventBusName', {
      value: eventBus.eventBusName,
      description: 'EventBridge event bus name'
    });

    new cdk.CfnOutput(this, 'ConversationsTableName', {
      value: conversationsTable.tableName,
      description: 'DynamoDB table for conversations'
    });
  }
}

Deployment

# Install dependencies
npm install

# Bootstrap CDK (first time only)
cdk bootstrap

# Deploy stack
cdk deploy

# View outputs
cdk deploy --outputs-file outputs.json

                    CDK Best Practices
                    Type Safety: TypeScript provides compile-time validation
Construct Library: Reuse patterns across multiple stacks
Environment Variables: Externalize configuration with context values
Unit Testing: Test infrastructure code with Jest
CI/CD Integration: Automated deployments with CDK Pipelines

                

Deep Dive: Security Hardening

1. Least Privilege IAM Policies

Agent Lambda functions should only have permissions for resources they actually need. This IAM policy restricts Bedrock access to specific model families, limits DynamoDB operations to required actions (no DeleteItem or Scan), and scopes EventBridge to a single event bus.

Production IAM Policy for Agent Lambda

The following policy demonstrates least privilege principles: Bedrock access is restricted to Claude Sonnet 4.5 models only, DynamoDB permissions include GSI access but exclude dangerous operations like table deletion, and EventBridge access is limited to PutEvents on the agent event bus.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:Converse"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-5-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:us-east-1:123456789012:table/agentocore-conversations",
        "arn:aws:dynamodb:us-east-1:123456789012:table/agentocore-conversations/index/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:PutEvents"
      ],
      "Resource": [
        "arn:aws:events:us-east-1:123456789012:event-bus/agentocore-events"
      ]
    }
  ]
}

2. Input Validation and Sanitization

Prompt injection attacks attempt to manipulate agent behavior by including system-level commands or special tokens in user input. Pydantic validators provide a defense layer by sanitizing inputs before they reach the LLM, removing dangerous patterns that could bypass agent instructions.

Input Sanitization with Pydantic Validators

This validator detects and removes common prompt injection patterns like system role markers, special tokens from different model formats, and attempts to inject assistant responses. The validation runs automatically on every user input before the ReAct loop begins.

from pydantic import BaseModel, Field, validator
import re

class UserInput(BaseModel):
    message: str = Field(max_length=10000)
    conversation_id: str = Field(regex=r'^[a-zA-Z0-9-]+$')

    @validator('message')
    def sanitize_message(cls, v):
        # Remove potential prompt injection attempts
        dangerous_patterns = [
            r'system:',
            r'<\|im_start\|>',
            r'<\|im_end\|>',
            r'Human:',
            r'Assistant:'
        ]

        for pattern in dangerous_patterns:
            v = re.sub(pattern, '', v, flags=re.IGNORECASE)

        return v.strip()

3. Secrets Management with AWS Secrets Manager

Agent tools often need API keys for external services (weather APIs, payment gateways, CRM systems). Storing credentials in environment variables or code is a security anti-pattern. AWS Secrets Manager provides encrypted storage with automatic rotation and fine-grained access control through IAM.

Retrieving Secrets with Caching

This implementation uses @lru_cache to avoid fetching the same secret on every Lambda invocation, reducing Secrets Manager API costs and improving response time. The cache persists across Lambda warm starts but refreshes on cold starts, balancing performance and security.

import boto3
import json
from functools import lru_cache

class SecretsManager:
    def __init__(self):
        self.client = boto3.client('secretsmanager')

    @lru_cache(maxsize=10)
    def get_secret(self, secret_name: str) -> dict:
        """Get secret from AWS Secrets Manager with caching."""
        response = self.client.get_secret_value(SecretId=secret_name)
        return json.loads(response['SecretString'])

# Usage in agent
secrets = SecretsManager()
api_key = secrets.get_secret('agentocore/api-keys')['weather_api_key']

4. Rate Limiting with Lambda Concurrency

// CDK configuration
agentFunction.addAlias('production', {
  provisionedConcurrentExecutions: 10,
  reservedConcurrentExecutions: 100
});

Deep Dive: Cost Optimization

Token Counting and Budget Tracking

Bedrock pricing is based on tokens (input and output), making token counting essential for cost control. Long conversations with many ReAct iterations can accumulate thousands of tokens, potentially leading to unexpected costs. Implementing token budgets prevents runaway spending while maintaining agent functionality.

Token Counter with Cost Estimation

This implementation uses the tiktoken library to count tokens in conversation messages before sending to Bedrock. The cost estimation uses Claude Sonnet 4.5 pricing ($3 per 1M input tokens, $15 per 1M output tokens) to provide real-time cost tracking per conversation.

import tiktoken
from typing import List, Dict

class TokenCounter:
    def __init__(self, model_name: str = "cl100k_base"):
        self.encoding = tiktoken.get_encoding(model_name)

    def count_message_tokens(self, messages: List[Dict]) -> int:
        """Count tokens in conversation messages."""
        total = 0
        for message in messages:
            # Add tokens for role
            total += 4  # Role overhead

            # Add tokens for content
            if isinstance(message["content"], str):
                total += len(self.encoding.encode(message["content"]))
            elif isinstance(message["content"], list):
                for block in message["content"]:
                    if "text" in block:
                        total += len(self.encoding.encode(block["text"]))

        return total

    def estimate_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Estimate cost based on Claude Sonnet 4.5 pricing."""
        # Pricing as of April 2026
        INPUT_COST_PER_1K = 0.003  # $3 per 1M tokens
        OUTPUT_COST_PER_1K = 0.015  # $15 per 1M tokens

        input_cost = (input_tokens / 1000) * INPUT_COST_PER_1K
        output_cost = (output_tokens / 1000) * OUTPUT_COST_PER_1K

        return input_cost + output_cost

class BudgetManager:
    def __init__(self, dynamodb_table):
        self.table = dynamodb_table

    def check_budget(self, conversation_id: str, max_tokens: int = 100000) -> bool:
        """Check if conversation has exceeded token budget."""
        total_tokens = self._get_total_tokens(conversation_id)
        return total_tokens < max_tokens

    def _get_total_tokens(self, conversation_id: str) -> int:
        response = self.table.query(
            KeyConditionExpression="PK = :pk",
            ExpressionAttributeValues={":pk": f"CONV#{conversation_id}"},
            ProjectionExpression="token_count"
        )
        return sum(item["token_count"] for item in response["Items"])

Prompt Caching Strategy

Claude's prompt caching feature reduces costs by 90% for cached tokens when the same content appears across multiple requests. System prompts with tool schemas are perfect caching candidates—they remain identical across all conversations but can be thousands of tokens long.

Implementing Prompt Caching for System Instructions

By marking the system prompt with cache_control: {"type": "ephemeral"}, Claude caches it for 5 minutes. Subsequent requests within this window pay only 10% of the original cost for cached tokens. For agents with consistent system prompts, this typically reduces costs by $30-50 per month.

class CachedPromptManager:
    """
    Use Claude's prompt caching to reduce costs for repeated system prompts.
    Cache hit: 90% cost reduction for cached tokens.
    """

    def build_messages_with_caching(
        self,
        system_prompt: str,
        user_messages: List[Dict]
    ) -> List[Dict]:
        """
        Structure messages to maximize cache hits.
        System prompt is marked as cacheable.
        """
        return [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": system_prompt,
                        "cache_control": {"type": "ephemeral"}
                    }
                ]
            }
        ] + user_messages

Monthly Cost Projection

Service	Usage	Monthly Cost
Bedrock (Claude Sonnet 4.5)	10M input tokens, 2M output tokens	$60.00
Lambda (1024MB, 10s avg)	10,000 invocations	$2.08
DynamoDB (On-Demand)	100K writes, 500K reads	$3.75
EventBridge	20,000 events	$0.00 (free tier)
CloudWatch Logs	10GB ingestion	$5.03
X-Ray	10,000 traces	$0.50
TOTAL		$71.36/month

                    Cost Optimization Strategies
                    Prompt Caching: 90% cost reduction on system prompts (saves ~$30/month)
Conversation TTL: Auto-expire after 30 days (reduces DynamoDB storage)
Lambda Reserved Concurrency: Prevent runaway costs from infinite loops
Model Selection: Use Claude Haiku for simple tasks ($0.25 per 1M tokens)
Token Budgets: Limit max tokens per conversation (prevents costly long sessions)

                

Deep Dive: Testing Autonomous Agents

Testing AI agents is fundamentally different from testing deterministic systems. Agents make probabilistic decisions, and outputs vary across runs. How do you validate correctness?

1. Unit Tests with Mock Tools

Unit tests for AI agents should validate that the ReAct loop correctly handles tool calls, processes results, and manages conversation state—without making expensive Bedrock API calls. Mock testing ensures fast, deterministic tests that run in CI/CD pipelines.

Testing ReAct Loop with Mocked Bedrock Responses

This test mocks the Bedrock Converse API to return a predefined tool call response. By controlling the LLM output, we verify that the agent correctly parses tool use blocks, extracts parameters, and handles the execution flow—all without spending tokens or waiting for inference.

import pytest
from unittest.mock import Mock, patch

def test_react_loop_with_tool_call():
    """Test that agent correctly calls tools during ReAct loop."""

    # Mock Bedrock response
    mock_bedrock_response = {
        "stopReason": "tool_use",
        "output": {
            "message": {
                "content": [
                    {
                        "toolUse": {
                            "toolUseId": "test-123",
                            "name": "get_weather",
                            "input": {"location": "San Francisco"}
                        }
                    }
                ]
            }
        }
    }

    with patch('boto3.client') as mock_client:
        mock_client.return_value.converse.return_value = mock_bedrock_response

        agent = AgentCore()
        result = agent.execute_react_loop(
            user_message="What's the weather in SF?",
            tools=[weather_tool],
            conversation_history=[]
        )

        # Verify tool was called
        assert mock_client.return_value.converse.called
        assert "get_weather" in str(mock_client.return_value.converse.call_args)

2. Integration Tests with Real Bedrock

Integration tests validate end-to-end agent behavior with actual Bedrock inference. These tests cost real money (tokens) and take longer to run, but they catch issues that mocks can't—like changes in Bedrock response formats, tool schema mismatches, or reasoning failures.

End-to-End Test with Real Bedrock API

This integration test uses a simple echo tool to verify the complete ReAct loop: conversation history loading, Bedrock inference with tool configs, tool execution, result processing, and final response generation. Mark these tests with @pytest.mark.integration to run them separately from fast unit tests.

@pytest.mark.integration
def test_agent_with_real_bedrock():
    """Integration test with actual Bedrock API."""

    agent = AgentCore()
    registry = ToolRegistry()

    # Register test tool
    @registry.register(
        name="echo",
        input_schema=EchoInput,
        output_schema=EchoOutput
    )
    def echo(input_data):
        return {"message": input_data.message}

    result = agent.execute_react_loop(
        user_message="Echo 'Hello World'",
        tools=registry.get_tool_configs(),
        conversation_history=[]
    )

    # Verify result contains expected output
    assert "Hello World" in result

3. Evaluation with LLM-as-Judge

Testing AI agents for correctness is challenging because outputs aren't deterministic. The same query might produce different but equally valid responses. LLM-as-judge evaluation uses another LLM (Claude) to assess whether agent responses are semantically correct, even if wording differs from expected output.

Automated Evaluation with Claude as Judge

This evaluator sends agent responses to Claude along with ground truth answers and asks for a numerical score (0-10) with explanation. By automating semantic evaluation, you can run regression tests on hundreds of queries to detect performance degradation after code changes or model updates.

class AgentEvaluator:
    """Use LLM to evaluate agent outputs for correctness."""

    def __init__(self):
        self.bedrock = boto3.client("bedrock-runtime")

    def evaluate_response(
        self,
        task: str,
        agent_response: str,
        ground_truth: str
    ) -> Dict[str, Any]:
        """
        Ask Claude to judge if agent response is correct.
        Returns score from 0-10 and explanation.
        """

        evaluation_prompt = f"""
        You are an expert evaluator of AI agent outputs.

        Task: {task}
        Ground Truth Answer: {ground_truth}
        Agent Response: {agent_response}

        Rate the agent's response on a scale of 0-10:
        - 10: Perfect, matches ground truth exactly
        - 7-9: Good, semantically correct with minor differences
        - 4-6: Partial, some correct information but missing key points
        - 0-3: Poor, incorrect or irrelevant

        Return JSON: {{"score": , "explanation": ""}}
        """

        response = self.bedrock.converse(
            modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
            messages=[
                {"role": "user", "content": [{"text": evaluation_prompt}]}
            ]
        )

        return json.loads(response["output"]["message"]["content"][0]["text"])

# Usage
evaluator = AgentEvaluator()
result = evaluator.evaluate_response(
    task="What's the capital of France?",
    agent_response="The capital of France is Paris.",
    ground_truth="Paris"
)
# {"score": 10, "explanation": "Correct and concise answer."}

4. Load Testing with Locust

Load testing autonomous agents is critical for understanding system behavior under concurrent user load. Unlike traditional APIs that execute deterministic operations, AI agents involve multiple external dependencies (Bedrock inference, DynamoDB reads/writes, EventBridge events), making performance characteristics unpredictable without proper load testing.

AgentoCore's load testing strategy focuses on three key metrics:

Response Time: p50, p95, p99 latency for agent invocations under various concurrency levels
Throughput: Maximum requests per second before Lambda concurrency limits are reached
Cost Under Load: Bedrock token usage and Lambda execution costs at peak traffic

We use Locust, a Python-based load testing framework, to simulate realistic user behavior with variable wait times between requests. This approach mimics actual usage patterns better than constant bombardment.

Locust Load Test Implementation

The following code defines a load test that simulates multiple concurrent users invoking the agent with different conversation IDs. Each virtual user waits 1-3 seconds between requests, creating realistic traffic patterns.

from locust import HttpUser, task, between

class AgentLoadTest(HttpUser):
    """
    Simulates concurrent users invoking the AgentoCore agent.
    Each user waits 1-3 seconds between requests to mimic real usage.
    """
    wait_time = between(1, 3)

    @task
    def invoke_agent(self):
        """
        Send agent invocation request with unique conversation ID.
        The conversation_id is based on user count to ensure isolation.
        """
        payload = {
            "agent_id": "test-agent",
            "conversation_id": f"conv-{self.environment.runner.user_count}",
            "user_message": "What's the weather in New York?"
        }

        self.client.post(
            "/invoke-agent",
            json=payload,
            headers={"Content-Type": "application/json"}
        )

# Run: locust -f load_test.py --host=https://api.example.com

Running Load Tests

Start with low concurrency and gradually increase to identify breaking points:

# Start with 10 users, spawn 2 users per second
locust -f load_test.py --host=https://api.example.com -u 10 -r 2

# Scale to 100 users to test Lambda auto-scaling
locust -f load_test.py --host=https://api.example.com -u 100 -r 5

# Run headless with automatic stopping after 5 minutes
locust -f load_test.py --host=https://api.example.com -u 50 -r 10 --headless --run-time 5m

Load Test Results Analysis

In production testing with Claude Sonnet 4.5, AgentoCore achieved:

p50 Response Time: 1,800ms (1.8 seconds) for simple agent queries
p95 Response Time: 4,200ms (4.2 seconds) for complex multi-tool ReAct loops
Throughput: 15K events per day sustained load with Lambda concurrency limit of 100
Cost at Peak: $71.36/month for 10,000 agent invocations with average 3-tool ReAct loops

Key Finding: 80% of response time is spent on Bedrock inference. Optimizing ReAct loop iterations (reducing from 10 max to 6 max) improved p95 latency by 35% while maintaining task completion rates above 95%.

                    Load Testing Best Practices
                    Use Reserved Concurrency: Set Lambda concurrency limits to prevent runaway costs during load tests
Monitor Bedrock Throttling: Watch CloudWatch metrics for ModelInvocationThrottles to identify capacity limits
Test with Real Prompts: Use actual user queries from production logs for realistic token usage
Measure Cost Per Request: Track CloudWatch TokenCount metric to estimate costs under load
Validate DLQ Behavior: Ensure failed events route to Dead Letter Queue with proper retry backoff

                

Future Enhancements

1. Multi-Agent Orchestration

Enable agent collaboration where specialized agents (researcher, planner, executor) work together on complex tasks with EventBridge-based coordination.

2. Human-in-the-Loop Approvals

Integrate AWS Step Functions for workflows requiring human approval before executing high-risk actions (e.g., deleting resources, financial transactions).

3. Streaming Responses

Use Bedrock's streaming API with WebSocket connections for real-time agent responses instead of waiting for full completion.

4. Fine-Tuned Tool Selection

Train a lightweight model to predict which tools are needed for a task, reducing ReAct iterations and token costs by 30-50%.

5. Vector Memory for Long-Term Recall

Integrate Amazon OpenSearch or pgvector for semantic search across all historical conversations, enabling agents to learn from past interactions.

6. Multi-Modal Capabilities

Add vision tools for image analysis, enabling agents to process screenshots, diagrams, and visual data alongside text.

Conclusion

The transition from chatbots to autonomous agents represents a fundamental shift in AI system design. AgentoCore provides a production-ready foundation for building enterprise-grade agents with:

ReAct Loop: Iterative reasoning and tool execution with AWS Bedrock Converse API
Conversational Memory: DynamoDB-backed persistence with TTL and single-table design
Pydantic Validation: Self-healing tool execution with automatic schema validation
Event-Driven Architecture: EventBridge-based orchestration for scalable, decoupled workflows
Enterprise Observability: X-Ray tracing, CloudWatch metrics, and Logs Insights
Infrastructure as Code: AWS CDK for repeatable, type-safe deployments
Security Hardening: Least privilege IAM, input validation, and secrets management
Cost Optimization: Token counting, prompt caching, and budget tracking

This architecture handles real-world production requirements: multi-turn conversations, complex tool chains, error recovery, observability, and cost control. The result is a system that doesn't just answer questions—it completes tasks.

Ready to build autonomous agents? Clone the repository, deploy with CDK, and start creating cognitive systems that reason, plan, and act.