Building Bulletproof Multi-Agent Systems Using Pydantic AI v2 and PyTorch 2.12

Building applications with Large Language Models often feels like trying to build a skyscraper on top of a fault line. The foundational models are incredibly powerful, but their outputs are inherently probabilistic. If you have spent any amount of time working with generative AI in a production environment, you know the pain of staring at a traceback because an LLM inexplicably decided to return a markdown block instead of a raw JSON string, or worse, hallucinated a data structure entirely.

For the past year, developers have relied heavily on Pydantic to tame this chaos. By enforcing rigid schemas and leveraging its high-performance Rust core, Pydantic became the unsung hero of the AI boom. However, as applications evolved from simple single-prompt text generators into complex multi-agent swarms, the ecosystem fragmented. Developers were forced to glue together heavy orchestration frameworks with custom validation logic, resulting in bloated dependencies and fragile codebases.

With the release of Pydantic AI v2, that era is coming to an end. This major release introduces native, highly ergonomic agent orchestration directly into the Pydantic ecosystem. It also brings tight integration with PyTorch 2.12, elevating multi-dimensional tensors to first-class citizens within your validation schemas. Today, we are going to dive deep into these new capabilities and build a robust, self-healing multi-agent pipeline from scratch.

Unpacking the Core Innovations of Pydantic AI v2

Before we write any code, it is crucial to understand why this update changes the architectural landscape of AI applications. Pydantic AI v2 is not just a wrapper around existing tools. It is a fundamental rethinking of how structured data, deep learning models, and LLM agents interact.

The framework introduces native dependency injection for agents to manage database connections and API clients without relying on messy global states.
The validation layer, built on Pydantic's Rust core, feeds precise validation errors into the agent's retry loop, so an agent can correct a malformed output on its next model call instead of crashing.
Streaming responses are fully supported out of the box with partial schema validation, meaning your application can start rendering UI components before the LLM has even finished generating the JSON payload.
Low-overhead integration with PyTorch 2.12 allows developers to validate tensor shapes, data types, and device placements automatically before tensors reach the GPU.
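To make the partial-validation idea concrete, here is a minimal sketch that uses Pydantic's own partial JSON parser, `pydantic_core.from_json(..., allow_partial=True)`, rather than Pydantic AI's streaming API. The `SummaryDraft` model and the hard-coded chunks are hypothetical stand-ins for a real schema and token stream; every field has a default so that an incomplete payload still validates.

```python
from pydantic import BaseModel
from pydantic_core import from_json


class SummaryDraft(BaseModel):
    # Hypothetical model: defaults let a half-finished payload validate.
    company_name: str = ""
    key_metrics: list[str] = []


# Hypothetical chunks, as they might arrive from a streaming LLM response.
deltas = ['{"company_name": "Acme Co', 'rp", "key_metrics": ["reve', 'nue", "prof']

buffer = ""
drafts = []
for delta in deltas:
    buffer += delta
    # allow_partial=True parses what is complete so far and discards an
    # unfinished trailing string instead of raising an error.
    partial = from_json(buffer, allow_partial=True)
    drafts.append(SummaryDraft.model_validate(partial))
    print(drafts[-1])
```

Each iteration yields a valid, progressively fuller `SummaryDraft`, which is what lets a UI render fields as soon as they are complete.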

Tip: Upgrading from earlier alpha versions of Pydantic AI requires updating your base Pydantic dependency to at least version 2.7. Make sure to check your environment constraints before running the installation commands.

First Class Tensor Validation with PyTorch 2.12

One of the most notoriously painful aspects of integrating local deep learning models with LLM orchestration workflows is dealing with tensor shape mismatches. In a multi-agent system, an LLM might generate a series of parameters that are then used to slice, reshape, or query an embedding model. If the LLM makes a mistake, passing a malformed tensor to a compiled PyTorch model will result in a catastrophic runtime crash.
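To see the failure mode concretely, here is a small illustration in plain PyTorch: an `nn.Linear` layer that expects 768 input features is handed a batch of 512-dimensional vectors, and the mismatch only surfaces as a `RuntimeError` when the layer actually executes. The layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# A layer that expects 768 input features per row.
projection = nn.Linear(768, 2)

# A malformed batch: 512 features per row, e.g. from a mistaken upstream step.
bad_batch = torch.randn(5, 512)

failed_at_runtime = False
try:
    projection(bad_batch)
except RuntimeError as exc:
    # PyTorch reports the matrix-multiplication shape mismatch only when the layer runs.
    failed_at_runtime = True
    print(f"RuntimeError: {exc}")
```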

Pydantic AI v2 solves this by introducing specialized annotations for PyTorch tensors. Let us look at how you can guarantee the shape and type of a tensor using standard Pydantic syntax.

from typing import Annotated
import torch
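from pydantic import BaseModel, ConfigDict
from pydantic_ai.ext.pytorch import TensorShape, TensorDType, TensorDevice

class EmbeddingResponse(BaseModel):
    # torch.Tensor is not a native Pydantic type, so allow it as a field type
    model_config = ConfigDict(arbitrary_types_allowed=True)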
    document_id: str
    text_content: str
    # Validate that the tensor is exactly 2D with 768 features on the CPU
    embeddings: Annotated[
        torch.Tensor, 
        TensorShape(-1, 768), 
        TensorDType(torch.float32),
        TensorDevice('cpu')
    ]

# Simulated output from an embedding model
raw_data = {
    "document_id": "doc_9942",
    "text_content": "Financial results for Q3 showed a 12% increase in revenue.",
    "embeddings": torch.randn(5, 768, dtype=torch.float32)
}

# Validation runs at construction time; a shape, dtype, or device mismatch raises ValidationError
validated_response = EmbeddingResponse(**raw_data)
print(f"Successfully validated tensor of shape {validated_response.embeddings.shape}")

This integration is particularly valuable because it works alongside the torch.compile() optimizations introduced in PyTorch 2.0 and refined in 2.12. The checks inspect only tensor metadata (shape, dtype, and device), never the tensor's contents, so their cost does not grow with the number of elements and is negligible next to the inference itself. If an agent attempts to pass a 512-dimensional vector to a model expecting 768 dimensions, Pydantic raises a precise validation error before the GPU ever sees the data.
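If the `pydantic_ai.ext.pytorch` annotations are not available in your environment, the same kind of check can be sketched with core Pydantic alone. This is a minimal sketch, not the library's implementation: the `tensor_spec` helper is hypothetical, it relies on Pydantic's `AfterValidator`, and it treats `-1` as a wildcard dimension to mirror `TensorShape(-1, 768)` above. Only metadata is inspected, never tensor values.

```python
from typing import Annotated

import torch
from pydantic import AfterValidator, BaseModel, ConfigDict, ValidationError


def tensor_spec(shape: tuple[int, ...], dtype: torch.dtype, device: str) -> AfterValidator:
    """Hypothetical helper: build a validator that checks tensor metadata only."""

    def check(t: torch.Tensor) -> torch.Tensor:
        if t.dim() != len(shape):
            raise ValueError(f"expected a {len(shape)}D tensor, got {t.dim()}D")
        for axis, (expected, actual) in enumerate(zip(shape, t.shape)):
            # -1 acts as a wildcard, e.g. for a variable batch dimension
            if expected != -1 and expected != actual:
                raise ValueError(f"axis {axis}: expected size {expected}, got {actual}")
        if t.dtype != dtype:
            raise ValueError(f"expected dtype {dtype}, got {t.dtype}")
        if t.device.type != device:
            raise ValueError(f"expected device {device!r}, got {t.device.type!r}")
        return t

    return AfterValidator(check)


class EmbeddingResponse(BaseModel):
    # torch.Tensor is not a native Pydantic type, so allow it explicitly
    model_config = ConfigDict(arbitrary_types_allowed=True)

    document_id: str
    embeddings: Annotated[torch.Tensor, tensor_spec((-1, 768), torch.float32, "cpu")]


ok = EmbeddingResponse(document_id="doc_9942", embeddings=torch.randn(5, 768))
print(ok.embeddings.shape)

try:
    EmbeddingResponse(document_id="doc_9942", embeddings=torch.randn(5, 512))
    rejection_message = None
except ValidationError as err:
    # The ValueError raised in check() surfaces as a regular Pydantic error
    rejection_message = err.errors()[0]["msg"]
print(rejection_message)
```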

Building a Multi-Agent System Architecture

To truly understand the power of Pydantic AI v2, we need to build something substantial. We are going to construct a Financial Intelligence Swarm. This system will consist of two distinct agents working in tandem.

The first agent is the Data Retrieval Specialist. Its job is to ingest a user query, determine which financial metrics are required, and fetch raw structured data. The second agent is the Deep Learning Evaluator. It takes the structured data, runs a local PyTorch sentiment analysis model over the text components, and synthesizes a final, rigidly typed executive summary.

Defining the Domain Models

Every reliable AI application starts with deterministic data structures. We will define our database connection dependencies and the exact schemas we expect our agents to produce.

from pydantic import BaseModel, Field
from typing import List

class SystemDependencies(BaseModel):
    database_url: str
    model_device: str
    api_key: str

class FinancialMetric(BaseModel):
    metric_name: str = Field(description="The official name of the financial metric")
    value: float = Field(description="The numerical value of the metric")
    trend: str = Field(description="Up, Down, or Flat")

class ExecutiveSummary(BaseModel):
    company_name: str
    overall_sentiment_score: float = Field(ge=-1.0, le=1.0)
    key_metrics: List[FinancialMetric]
    strategic_advice: str

By annotating our models with descriptions and constraints (such as bounding the sentiment score between -1.0 and 1.0), we give Pydantic AI the metadata it needs to prompt the LLM effectively. Pydantic AI automatically translates these models into JSON Schema, with the field descriptions and numeric bounds included, and sends that schema to the model alongside the prompt.
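To check which metadata the model will actually see, you can print the JSON Schema that Pydantic derives from these classes with `model_json_schema()`. This is a minimal sketch using core Pydantic only; the exact wrapper Pydantic AI places around this schema when talking to a provider may differ.

```python
import json
from typing import List

from pydantic import BaseModel, Field


class FinancialMetric(BaseModel):
    metric_name: str = Field(description="The official name of the financial metric")
    value: float = Field(description="The numerical value of the metric")
    trend: str = Field(description="Up, Down, or Flat")


class ExecutiveSummary(BaseModel):
    company_name: str
    overall_sentiment_score: float = Field(ge=-1.0, le=1.0)
    key_metrics: List[FinancialMetric]
    strategic_advice: str


schema = ExecutiveSummary.model_json_schema()
# ge/le become "minimum"/"maximum"; nested models land under "$defs"
print(json.dumps(schema["properties"]["overall_sentiment_score"], indent=2))
print(json.dumps(schema["$defs"]["FinancialMetric"]["properties"]["trend"], indent=2))
```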

Constructing the Retrieval Agent

Now we utilize the new Agent class. Pydantic AI v2 allows us to bind tools directly to agents using decorators while maintaining strict type safety for dependencies via the RunContext.

from pydantic_ai import Agent, RunContext

# Initialize the agent with a specific model and typed dependencies
retrieval_agent = Agent(
    'openai:gpt-4o',
    deps_type=SystemDependencies,
    system_prompt=(
        "You are an expert financial data extractor. "
        "Use your tools to query the database and gather relevant metrics."
    )
)

@retrieval_agent.tool
def query_financial_db(ctx: RunContext[SystemDependencies], company: str) -> str:
    # In a real application, we would use ctx.deps.database_url to query a DB
    print(f"[Tool Execution] Querying database on {ctx.deps.database_url} for {company}")
    return f"Raw data for {company}: Q3 Revenue $50M, Q3 Profit $5M, YoY Growth 15%."

Note: The RunContext provides a typed, per-run way to pass configuration and connections down to your tools. This is a significant improvement over older frameworks that required you to initialize tools with hardcoded state, which made testing and parallel execution difficult.
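The pattern behind `RunContext` can be illustrated without any framework: tools receive a typed context object as an argument instead of reaching for globals, so each run (or each test) can supply its own dependencies. The sketch below uses hypothetical, simplified stand-ins (`ToolContext`, `FakeDeps`); it is not Pydantic AI's implementation.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

DepsT = TypeVar("DepsT")


@dataclass(frozen=True)
class ToolContext(Generic[DepsT]):
    """Hypothetical per-run context, loosely modeled on RunContext."""
    deps: DepsT


@dataclass(frozen=True)
class FakeDeps:
    """Hypothetical dependency bundle; real code would hold pools or clients."""
    database_url: str


def query_financial_db(ctx: ToolContext[FakeDeps], company: str) -> str:
    # The tool only sees what the caller injected, so no global state is needed.
    return f"Queried {ctx.deps.database_url} for {company}"


production_ctx = ToolContext(FakeDeps(database_url="postgres://user:pass@localhost:5432/finance"))
test_ctx = ToolContext(FakeDeps(database_url="sqlite:///:memory:"))

print(query_financial_db(production_ctx, "Acme Corp"))
print(query_financial_db(test_ctx, "Acme Corp"))
```

Because the context is passed per call, two concurrent runs or a unit test can use different databases without touching shared state.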

Integrating PyTorch into the Evaluator Agent

Our second agent will consume the raw data, execute a local PyTorch model to calculate a sentiment score, and return our final ExecutiveSummary model.

import torch
import torch.nn as nn

# A mock PyTorch model for demonstration purposes
class SentimentModel(nn.Module):
    def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
        # Simulates returning a sentiment score between -1 and 1
        return torch.clamp(torch.sum(input_tensor) * 0.01, -1.0, 1.0)

local_sentiment_model = SentimentModel()
local_sentiment_model.eval()

# Define the Evaluator Agent
evaluator_agent = Agent(
    'openai:gpt-4o',
    deps_type=SystemDependencies,
    output_type=ExecutiveSummary,
    system_prompt=(
        "You are a principal financial analyst. Synthesize the provided raw data. "
        "Use the analyze_sentiment tool to evaluate the tone of the raw data. "
        "Return a fully structured ExecutiveSummary."
    )
)

@evaluator_agent.tool
def analyze_sentiment(ctx: RunContext[SystemDependencies], text: str) -> float:
    # Convert text length to a dummy tensor for the sake of the PyTorch example
    tensor_input = torch.tensor([float(len(text))], dtype=torch.float32)
    
    # Move to the device specified in our typed dependencies
    device = torch.device(ctx.deps.model_device)
    tensor_input = tensor_input.to(device)
    local_sentiment_model.to(device)
    
    with torch.no_grad():
        score = local_sentiment_model(tensor_input).item()
        
    print(f"[Tool Execution] PyTorch model calculated sentiment: {score}")
    return score

Notice how seamlessly we integrate PyTorch operations inside the agent's tool. Because our dependency injection ensures the correct model device is passed down, we can safely run this agent across different hardware environments (CUDA, MPS, or CPU) without altering the core logic.
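In practice, the `model_device` value still has to be chosen somewhere. One minimal sketch, assuming you want to prefer CUDA, then Apple's MPS backend, then CPU, is a small helper (the hypothetical `pick_device`) called once when building `SystemDependencies`. The sketch also moves the model to its device once at startup rather than inside every tool call, which keeps repeated `.to()` calls off the hot path.

```python
import torch
import torch.nn as nn


def pick_device() -> str:
    """Hypothetical helper: prefer CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


class SentimentModel(nn.Module):
    # Same mock model as above: a clamped, scaled sum.
    def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
        return torch.clamp(torch.sum(input_tensor) * 0.01, -1.0, 1.0)


device = torch.device(pick_device())
# Move the model once at startup; nn.Module.to() and .eval() both return the module.
model = SentimentModel().to(device).eval()


def score_text(text: str) -> float:
    # Build the input directly on the target device.
    x = torch.tensor([float(len(text))], dtype=torch.float32, device=device)
    with torch.inference_mode():
        return model(x).item()


print(device, score_text("Q3 revenue rose 12%"))
```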

Orchestrating the Swarm

With both agents defined, we simply need a lightweight orchestration function to tie them together. Because each agent's run() call returns a result object whose output attribute holds the validated result, we can feed one agent's output directly into the next agent's prompt.

import asyncio

async def run_financial_swarm(user_query: str, deps: SystemDependencies) -> ExecutiveSummary:
    print("--- Starting Retrieval Phase ---")
    retrieval_result = await retrieval_agent.run(user_query, deps=deps)
    # The retrieval agent has no structured output type, so its output is plain text
    raw_financial_data = retrieval_result.output

    print("\n--- Starting Evaluation Phase ---")
    # Pass the output of the first agent as the prompt for the second agent
    evaluator_prompt = f"Analyze this data based on the user query '{user_query}': {raw_financial_data}"
    evaluation_result = await evaluator_agent.run(evaluator_prompt, deps=deps)

    # The evaluator's output is an ExecutiveSummary that has already passed validation
    return evaluation_result.output

# Executing the swarm
deps = SystemDependencies(
    database_url="postgres://user:pass@localhost:5432/finance",
    model_device="cpu",
    # Only available to tools via ctx.deps; the OpenAI provider itself
    # reads its key from the OPENAI_API_KEY environment variable by default
    api_key="sk-dummy-key"
)

if __name__ == "__main__":
    query = "How is Acme Corp performing this quarter?"
    final_report = asyncio.run(run_financial_swarm(query, deps))

    print("\n--- Final Validated Output ---")
    print(final_report.model_dump_json(indent=2))
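The function above handles one query at a time. If the service needs to process several queries, the two-phase pipeline can be fanned out with `asyncio.gather`, with a semaphore capping how many runs hit the model provider at once. In this sketch, `fake_retrieval` and `fake_evaluation` are hypothetical stubs standing in for `retrieval_agent.run` and `evaluator_agent.run`, so it runs without API keys.

```python
import asyncio

from pydantic import BaseModel


class StubSummary(BaseModel):
    """Hypothetical, trimmed-down stand-in for ExecutiveSummary."""
    query: str
    analysis_input: str


async def fake_retrieval(query: str) -> str:
    await asyncio.sleep(0)  # stands in for the LLM call plus tool round trip
    return f"Raw data for '{query}'"


async def fake_evaluation(query: str, prompt: str) -> StubSummary:
    await asyncio.sleep(0)
    return StubSummary(query=query, analysis_input=prompt)


async def run_pipeline(query: str) -> StubSummary:
    # Same two-phase shape as run_financial_swarm: retrieval output feeds evaluation.
    raw = await fake_retrieval(query)
    prompt = f"Analyze this data based on the user query '{query}': {raw}"
    return await fake_evaluation(query, prompt)


async def run_many(queries: list[str], max_concurrency: int = 4) -> list[StubSummary]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(query: str) -> StubSummary:
        async with semaphore:  # at most max_concurrency pipelines in flight
            return await run_pipeline(query)

    # gather returns results in the same order as the input queries
    return await asyncio.gather(*(guarded(q) for q in queries))


results = asyncio.run(run_many(["Acme Corp Q3", "Globex Q3", "Initech Q3"]))
for summary in results:
    print(summary.query, "->", summary.analysis_input)
```

Capping concurrency is a design choice aimed at provider rate limits; the right value depends on your account's limits.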

Self Healing and the Retry Mechanism

One of the most remarkable features running under the hood in the code above is Pydantic AI's automatic self-healing capability. When evaluator_agent.run() is called, the framework instructs the LLM to output JSON matching the ExecutiveSummary schema. But what happens if the LLM hallucinates and provides a sentiment score of 1.5, violating our le=1.0 constraint?

In legacy frameworks, this would raise a validation exception, crashing your application unless you wrote custom retry blocks. Pydantic AI handles this natively. When Pydantic's Rust-based validator rejects the out-of-bounds value, the framework catches the resulting validation error instead of letting it propagate and automatically sends a new prompt back to the LLM. This prompt essentially says, "Your previous output failed validation because the sentiment score was 1.5, but it must be less than or equal to 1.0. Please correct this."

This retry loop runs transparently up to a configurable maximum number of retries (set with the retries argument when constructing the Agent). Because the LLM receives exact, machine-readable validation errors, it can usually correct its own mistakes, which substantially improves the reliability of your multi-agent system in production. If the limit is exhausted, the run raises an exception, so production code still needs a fallback path.
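The mechanism is easier to reason about when written out by hand. The sketch below is a simplified, framework-free validate-and-retry loop, not Pydantic AI's actual implementation: `run_with_retries` and the scripted `fake_llm` are hypothetical, and the `Verdict` model keeps only the bounded score from `ExecutiveSummary`. On failure, the structured errors from `ValidationError.errors()` become the next prompt.

```python
from typing import Callable

from pydantic import BaseModel, Field, ValidationError


class Verdict(BaseModel):
    # Same bound as ExecutiveSummary.overall_sentiment_score
    overall_sentiment_score: float = Field(ge=-1.0, le=1.0)


def run_with_retries(call_llm: Callable[[str], str], prompt: str, max_retries: int = 2) -> Verdict:
    """Hypothetical helper: validate LLM output, feeding errors back on failure."""
    message = prompt
    for _ in range(max_retries + 1):
        raw = call_llm(message)
        try:
            return Verdict.model_validate_json(raw)
        except ValidationError as exc:
            # Machine-readable errors (type, location, message) go back to the model
            message = (
                f"Your previous output failed validation: {exc.errors(include_url=False)}. "
                "Please return corrected JSON."
            )
    raise RuntimeError(f"Output still invalid after {max_retries} retries")


# Scripted stand-in for an LLM: the first answer violates le=1.0, the second is valid.
scripted_replies = iter(['{"overall_sentiment_score": 1.5}', '{"overall_sentiment_score": 0.8}'])
prompts_seen: list[str] = []


def fake_llm(message: str) -> str:
    prompts_seen.append(message)
    return next(scripted_replies)


verdict = run_with_retries(fake_llm, "Summarize Acme Corp's Q3 results as JSON.")
print(verdict.overall_sentiment_score)
print(prompts_seen[1])
```

The second prompt contains the `less_than_equal` error type, the offending input `1.5`, and the bound `1.0`, which is exactly the information a model needs to fix its answer.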

Scaling AI from Prototypes to Production

The transition from a neat Jupyter Notebook prototype to a resilient, production-ready AI service is fraught with architectural challenges. By unifying the orchestration of LLMs with the rigorous data validation of Pydantic and the computational power of PyTorch 2.12, developers finally have a cohesive stack.

We are moving past the era of treating language models as mysterious black boxes that return unpredictable strings. By defining strict interfaces, utilizing typed dependency injection, and leveraging native framework integrations, we can treat LLM agents as reliable software components within a deterministic engineering pipeline.

As you build out your next generation of AI applications, consider leaning heavily into these structured paradigms. The time saved on debugging malformed outputs and chasing down runtime tensor errors will allow your team to focus on what actually matters: shipping intelligent, highly capable features to your users.