Level 4: Advanced · ~35 min

Agent Pipelines

Build systems that reason, act, and learn. From simple tool use to complex multi-agent orchestration.

What is an Agent?

An agent is an LLM that can take actions in the world. Instead of just generating text, it can call functions, search the web, run code, or interact with APIs.

The core components of an agent:

LLM (Brain)

The reasoning engine. Decides what to do next based on the current state and goal.

Tools (Actions)

Functions the agent can call: search, calculate, code execution, API calls, database queries.

Memory (Context)

Conversation history, retrieved documents, and scratchpad for intermediate results.

Planning (Strategy)

How the agent breaks down complex tasks into steps and decides which tools to use.

// Agent = LLM + Tools + Memory + Planning

User: "What's the weather in Tokyo and should I bring an umbrella?"

Agent thinks: "I need to get weather data, then analyze it"

Agent calls: get_weather("Tokyo")

Agent observes: {temp: 18, conditions: "rain", precipitation: 80%}

Agent responds: "It's 18°C with an 80% chance of rain. Yes, bring an umbrella."

The ReAct Pattern

ReAct (Reason + Act) is the foundational agent pattern. The agent alternates between reasoning about what to do and taking action.

1. Thought: The LLM reasons about the current state. "I need to find X to answer the question."

2. Action: The LLM decides which tool to call and with what arguments. "search_web(query='X')"

3. Observation: The tool returns results. These become part of the context for the next thought.

4. Repeat or Finish: Continue the loop until the task is complete, then return the final answer.
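The four steps above can be sketched as a plain loop. This is a minimal illustration with a stubbed model (`fake_llm`) and a toy tool table standing in for real API calls; the names are illustrative, not from any framework.

```python
# Minimal ReAct loop sketch: Thought -> Action -> Observation, repeated until finish.
# `fake_llm` is a deterministic stand-in for a real model call.

def fake_llm(history):
    """Pretend model: asks for a search first, then answers from the observation."""
    if not any(kind == "observation" for kind, _ in history):
        return ("action", "search_web", {"query": "weather in Tokyo"})
    return ("finish", "It's raining in Tokyo; bring an umbrella.", None)

TOOLS = {
    "search_web": lambda query: f"Results for '{query}': rain, 18C",
}

def react_loop(task, max_steps=5):
    history = [("task", task)]
    for _ in range(max_steps):
        kind, payload, args = fake_llm(history)
        if kind == "finish":                               # step 4: done
            return payload
        history.append(("thought", f"calling {payload}"))  # step 1: thought
        observation = TOOLS[payload](**args)               # step 2: action
        history.append(("observation", observation))       # step 3: observation
    return "Max steps reached"
```

Swapping `fake_llm` for a real model call and `TOOLS` for real functions gives you the skeleton the rest of this lesson builds on.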

Tool Calling with Function Definitions

Modern LLMs support structured tool calling. You define functions with JSON schemas, and the model outputs structured calls you can execute.

Simple ReAct Agent

# Simple ReAct agent
from openai import OpenAI

client = OpenAI()
tools = [
    {"type": "function", "function": {
        "name": "search_web",
        "description": "Search the web for information",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}
        }, "required": ["query"]}
    }},
    {"type": "function", "function": {
        "name": "calculate",
        "description": "Perform mathematical calculations",
        "parameters": {"type": "object", "properties": {
            "expression": {"type": "string"}
        }, "required": ["expression"]}
    }}
]

def run_agent(task: str):
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            tools=tools
        )
        if response.choices[0].finish_reason == "stop":
            return response.choices[0].message.content
        # Execute tool and continue...

Complete Agent Loop

import json

def execute_tool(name: str, args: dict) -> str:
    """Execute a tool and return the result"""
    if name == "search_web":
        # In production, call actual search API
        return f"Search results for '{args['query']}': ..."
    elif name == "calculate":
        try:
            result = eval(args['expression'])  # unsafe; use a restricted evaluator in production
            return str(result)
        except Exception:
            return "Error in calculation"
    return "Unknown tool"

def run_agent(task: str, max_iterations: int = 10):
    messages = [{"role": "user", "content": task}]

    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            tools=tools
        )

        message = response.choices[0].message

        # Check if agent is done
        if response.choices[0].finish_reason == "stop":
            return message.content

        # Execute tool calls
        if message.tool_calls:
            messages.append(message)
            for tool_call in message.tool_calls:
                result = execute_tool(
                    tool_call.function.name,
                    json.loads(tool_call.function.arguments)
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })

    return "Max iterations reached"
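The `calculate` tool above flags its `eval()` as unsafe. One way to restrict it, sketched below, is to walk the expression's AST and allow only arithmetic operators, so calls like `__import__('os')` are rejected. This is an illustrative replacement, not the only option.

```python
import ast
import operator

# Safer drop-in for the calculate tool: evaluate only arithmetic expressions
# by walking the AST and mapping a small whitelist of operator nodes.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> str:
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")  # anything else is rejected
    try:
        return str(walk(ast.parse(expression, mode="eval").body))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return "Error in calculation"
```

`execute_tool` can call `safe_calculate(args['expression'])` in place of the `eval` branch.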

Multi-Agent Orchestration

Complex tasks benefit from multiple specialized agents working together. Each agent has different tools and expertise.

Hierarchical

A manager agent delegates to worker agents. Manager synthesizes results.

Manager -> [Researcher, Writer, Reviewer]

Peer-to-Peer

Agents communicate directly, passing context and results to each other.

Agent A <-> Agent B <-> Agent C

Multi-Agent Pattern

class Agent:
    def __init__(self, name: str, system_prompt: str, tools: list):
        self.name = name
        self.system_prompt = system_prompt
        self.tools = tools

    def run(self, task: str) -> str:
        # Agent execution logic
        pass

# Specialized agents
researcher = Agent(
    name="Researcher",
    system_prompt="You research topics thoroughly. Use search tools.",
    tools=[search_web, search_papers]
)

writer = Agent(
    name="Writer",
    system_prompt="You write clear, engaging content.",
    tools=[write_document, edit_text]
)

# Orchestrator pattern
def orchestrate(task: str):
    # Research phase
    research = researcher.run(f"Research: {task}")

    # Writing phase with research context
    draft = writer.run(f"Write about {task}. Context: {research}")

    return draft

Evaluating Agents

Agent evaluation is harder than LLM evaluation. You need to measure both task completion and efficiency.

Key Metrics

  • Task success rate: Did it complete the task?
  • Steps to completion: How many actions were needed?
  • Tool efficiency: Were the right tools used?
  • Cost: Total tokens and API calls.
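These metrics are easy to aggregate once each run is logged as a record. A minimal sketch, assuming a hypothetical `AgentRun` record (not from any framework):

```python
from dataclasses import dataclass

# Illustrative per-run record: success flag, action count, token spend.
@dataclass
class AgentRun:
    succeeded: bool
    steps: int
    tokens: int

def summarize(runs: list) -> dict:
    """Aggregate task success rate, average steps, and total cost over a batch."""
    n = len(runs)
    return {
        "success_rate": sum(r.succeeded for r in runs) / n,
        "avg_steps": sum(r.steps for r in runs) / n,
        "total_tokens": sum(r.tokens for r in runs),
    }
```

Tracking these per run makes regressions visible when you change prompts, tools, or models.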

Standard Benchmarks

  • SWE-bench: Real GitHub issues
  • METR: Multi-step reasoning tasks
  • WebArena: Browser automation
  • GAIA: General AI assistants

Common Pitfalls

Infinite Loops

Agent keeps calling tools without making progress. Solution: Set max iterations, detect repeated actions, add escape conditions.
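Detecting repeated actions can be as simple as remembering every (tool, arguments) pair already executed. A minimal sketch, with illustrative names:

```python
# Loop guard sketch: abort or re-plan when the agent repeats an identical tool call.
def is_repeat(seen: set, name: str, args: dict) -> bool:
    """Return True if this exact (tool, args) call was already made."""
    key = (name, tuple(sorted(args.items())))  # hashable fingerprint of the call
    if key in seen:
        return True
    seen.add(key)
    return False
```

Inside the agent loop, a `True` result would trigger an escape condition (e.g. forcing a final answer) instead of executing the tool again.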

Tool Misuse

Agent calls wrong tools or with wrong arguments. Solution: Better tool descriptions, examples in prompts, argument validation.

Context Overflow

Long agent runs exceed context limits. Solution: Summarize history, use RAG for memory, prune irrelevant observations.
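One pruning strategy is to always keep the original task, then retain as many of the most recent messages as fit a budget. The sketch below uses a crude word count as a stand-in for a real tokenizer; in practice you would count tokens with the model's tokenizer and summarize, not just drop, old observations.

```python
# History pruning sketch: keep the task plus the newest messages under a budget.
def prune_history(messages: list, max_tokens: int = 1000) -> list:
    def size(m):
        return len(m["content"].split())  # word count as a rough token proxy

    kept = [messages[0]]                  # always keep the original task
    budget = max_tokens - size(messages[0])
    tail = []
    for m in reversed(messages[1:]):      # newest messages get priority
        if size(m) <= budget:
            tail.append(m)
            budget -= size(m)
    return kept + list(reversed(tail))
```

This keeps the loop running on long tasks at the cost of forgetting older observations, which is why it pairs well with RAG-backed memory.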

Runaway Costs

Complex tasks with many iterations get expensive. Solution: Budget limits, cheaper models for simple steps, caching.
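A budget limit can be enforced with a small guard object charged after every model call. A minimal sketch with illustrative names and numbers:

```python
# Cost guard sketch: raise once a per-run token budget is exhausted.
class BudgetExceeded(Exception):
    pass

class CostGuard:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Record token usage; raise if the run has gone over budget."""
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"spent {self.used} of {self.max_tokens} tokens")
```

The agent loop would call `guard.charge(response.usage.total_tokens)` after each completion and catch `BudgetExceeded` to stop gracefully.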

Key Takeaways

  1. Agent = LLM + Tools + Memory + Planning - The LLM reasons, tools act, memory persists, planning coordinates.

  2. ReAct is the foundation - the Thought, Action, Observation loop. Most agent frameworks build on this.

  3. Function calling is structured - JSON schemas for tools, structured outputs for reliable execution.

  4. Measure task completion AND efficiency - Use benchmarks like SWE-bench and METR for rigorous evaluation.