Build for Agents

Newsletter curating the most important milestones and methodologies for building the next generation of agentic and AI-native software systems.

Keep up to date.

Only key milestones. Reputable sources. No junk.

Reach out to me at www.narvidas.com

https://www.youtube.com/watch?v=LCEmiRjPEtQ

Andrej Karpathy: Software Is Changing (Again)

Drawing on his work at Stanford, OpenAI, and Tesla, Andrej sees a shift underway. Software is changing, again. We've entered the era of 'Software 3.0', where natural language becomes the new programming interface and models do the rest.

He explores what this shift means for developers, users, and the design of software itself - arguing that we're not just using new tools, but building a new kind of computer.

Key Takeaways

  • Three eras of software: Karpathy frames modern development as 'Software 1.0' (hand-written code), 'Software 2.0' (neural-network weights trained on data), and now 'Software 3.0' (LLMs as programmable virtual machines where English prompts become code; see the sketch after this list).
  • Flipped technology diffusion: Unlike past breakthroughs (e.g., electricity, early computing) that first served enterprises, LLMs launched straight into consumers' hands—instantly available to everyone via the cloud—reshaping how and where innovation happens.
  • LLM psychology: He likens LLMs to stochastic simulators of human psychology, with encyclopaedic memory and superhuman abilities yet riddled with 'jagged' intelligence (hallucinations, factual errors) and limited working memory, requiring careful prompts and guardrails.
  • Partial-autonomy apps & autonomy slider: The future lies in domain-specific LLM apps (e.g., Cursor, Perplexity) that orchestrate multiple models, provide specialised GUIs for fast verification, and let users dial in how much autonomy the AI wields.
  • Build for agents: The importance of 'building for agents', and how doing so will help us create better, more compatible systems in light of autonomous systems powered by generative AI.
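
To make the 'Software 1.0 vs. 3.0' distinction concrete, here is a minimal sketch contrasting the two approaches on the same task. The `complete` parameter is a hypothetical stand-in for any LLM completion API, not a specific library.

```python
# Software 1.0: the logic is hand-written, explicit, and brittle.
def sentiment_v1(text: str) -> str:
    positive = {"great", "love", "excellent"}
    negative = {"terrible", "hate", "awful"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Software 3.0: the "program" is an English prompt; the model supplies the
# logic. `complete` is a hypothetical function wrapping any LLM API.
def sentiment_v3(text: str, complete) -> str:
    prompt = (
        "Classify the sentiment of the following text as exactly one of: "
        f"positive, negative, or neutral.\n\nText: {text}\nSentiment:"
    )
    return complete(prompt).strip().lower()
```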

A practical guide to agents (PDF)

https://cdn.openai.com/[...]/a-practical-guide-to-building-agents.pdf

A great high-level PDF guide from OpenAI outlining how to build agents. Understanding how agents work is key to building systems that agents themselves will use.

This guide is designed for product and engineering teams exploring how to build their first agents, distilling insights from numerous customer deployments into practical and actionable best practices. It includes frameworks for identifying promising use cases, clear patterns for designing agent logic and orchestration, and best practices to ensure your agents run safely, predictably, and effectively.

Key Takeaways

  • What makes an 'agent': An agent is an LLM-powered system that autonomously manages and executes multi-step workflows on a user's behalf by dynamically choosing and invoking external tools, all within clearly defined guardrails.
  • When to build one: Agents excel in scenarios where traditional deterministic automation struggles. E.g. complex decision-making with nuanced judgement, unstructured data interpretation, or maintenance-heavy rulesets.
  • Core design pillars: Every agent hinges on three foundations: (1) selecting and benchmarking the right model(s) for your tasks, (2) defining modular, well-documented 'tools' for data retrieval and actions, and (3) crafting clear, unambiguous instructions (including edge-case handling).
  • Orchestration patterns: Start simple with a single-agent loop that calls tools until completion; scale into multi-agent architectures (manager or decentralised handoffs) only when workflow complexity or tool-overlap demands it—always guarded by safety checks. A minimal sketch of the single-agent loop follows below.
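
To ground the single-agent loop described above, here is an illustrative skeleton (not OpenAI's implementation): `llm()` is a hypothetical model call that returns either a tool request or a final answer, and the tool registry and guardrail check are stand-ins.

```python
# Illustrative single-agent loop: call the model until it produces a final
# answer, executing any tool it requests along the way. `llm()` is a
# hypothetical model call returning a dict such as
#   {"tool": "get_weather", "args": {...}}  or  {"final": "..."}.

TOOLS = {
    # Modular, well-documented tools the model may invoke.
    "get_weather": lambda city: f"Sunny in {city}",
}

def violates_guardrails(step: dict) -> bool:
    # Placeholder safety check; real agents validate tools, args, and outputs.
    return "final" not in step and step.get("tool") not in TOOLS

def run_agent(task: str, llm, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # hard cap so the loop always terminates
        step = llm(history)
        if violates_guardrails(step):
            return "Refused: guardrail violation."
        if "final" in step:
            return step["final"]
        result = TOOLS[step["tool"]](**step["args"])
        history.append({"role": "tool", "content": result})
    return "Stopped: step limit reached."
```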

Agents Companion Whitepaper (PDF)

https://www.kaggle.com/whitepaper-agent-companion

A whitepaper from Google aimed at developers, serving as a '102' guide to more advanced topics. It offers in-depth explorations of agent evaluation methodologies and practical applications of Google agent products for enhancing agent capabilities in solving complex, real-world problems.

Key Takeaways

  • What an agent is: A Generative AI agent is an autonomous application that observes its environment and acts via available tools to achieve specified goals without ongoing human intervention.
  • The orchestration layer: At the core of every agent is a cognitive architecture (e.g., ReAct, Chain-of-Thought, Tree-of-Thoughts) that structures its reasoning, planning, and decision-making processes; a minimal ReAct loop is sketched after this list.
  • Key tool types: Agents rely on (1) Extensions to bridge APIs, (2) Functions for developer-defined actions, and (3) Data Stores (often vector databases) to access and query external information in real time.
  • Future direction—agent chaining: Combining specialised 'expert' agents into chains or mixtures allows tackling increasingly complex workflows by delegating subtasks to the most suitable agent.
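
For a feel of what an orchestration layer like ReAct does, here is a minimal Thought/Action/Observation loop. The `Action: tool[input]` convention follows the ReAct paper; `complete()` is again a hypothetical LLM call, and the tool is a stub.

```python
import re

# Minimal ReAct-style orchestration sketch: the model interleaves free-form
# "Thought:" reasoning with "Action: tool[input]" lines; we run the tool and
# feed back an "Observation:" until the model emits "Final Answer:".

TOOLS = {"search": lambda q: f"(stub search results for {q!r})"}

def react(question: str, complete, max_turns: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        output = complete(transcript)  # model appends Thought/Action text
        transcript += output
        if "Final Answer:" in output:
            return output.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (\w+)\[(.*?)\]", output)
        if match:
            tool, arg = match.groups()
            observation = TOOLS.get(tool, lambda _: "unknown tool")(arg)
            transcript += f"\nObservation: {observation}\n"
    return "No answer within turn limit."
```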

Introducing the Model Context Protocol

https://www.anthropic.com/news/model-context-protocol

Anthropic introduces the Model Context Protocol - an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools. The architecture is straightforward: developers can either expose their data through MCP servers or build AI applications (MCP clients) that connect to these servers.

Key Takeaways

  • Universal protocol for AI–data integration: The Model Context Protocol (MCP) is an open standard that lets developers securely connect AI assistants to any data source—content repositories, business tools, or development environments—so models can access up-to-date information when generating responses.
  • Solves fragmentation and scalability: By replacing bespoke, one-off integrations with a single, consistent protocol, MCP eliminates data silos and dramatically reduces the effort required to onboard new data sources, enabling AI systems to maintain context seamlessly across tools.
  • Three core components: MCP comprises the official specification and SDKs, built-in server support in the Claude Desktop apps, and an open-source repository of reference MCP servers—complete with pre-built connectors for platforms like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer. A minimal server sketch follows below.
  • Growing ecosystem and adoption: Early adopters such as Block and Apollo have already integrated MCP, and developer platforms including Zed, Replit, Codeium, and Sourcegraph are adding support—building toward a collaborative, open-source community for context-aware AI agents.
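
As a taste of the server side, here is a minimal MCP server sketch using the `FastMCP` helper from the official Python SDK (the `mcp` package); the `lookup_order` tool is an invented example.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# Requires:  pip install mcp
# The `lookup_order` tool is a made-up example of exposing internal data.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")  # server name shown to connecting MCP clients

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Return the status of an order by its ID."""
    # In a real server this would query a database or internal API.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio, so clients like Claude Desktop can connect
```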

The /llms.txt file

https://llmstxt.org/

A proposal by Jeremy Howard to standardise on an /llms.txt file that provides information to help LLMs use a website at inference time.

Key Takeaways

  • Standardising LLM-friendly site metadata: Introduces a /llms.txt file at a site's root (see ours) to give language models concise, expert-level context and links, analogous to robots.txt or sitemap.xml but designed for inference-time access.
  • Solves context-window limitations: Addresses the difficulty of converting full HTML (with navigation, ads, JS) into text by providing pre-curated markdown pointers and summaries that fit within LLM context windows.
  • Simple, structured markdown spec: Requires an H1 title, a blockquoted summary, free-form detail sections, and optional H2-delimited file lists linking to .md versions of key pages—allowing both human and programmatic parsing (an example file follows this list).
  • Ecosystem and tooling: Already supported by a Python/CLI module, JavaScript implementation, VitePress and Docusaurus plugins, plus directories (e.g., llmstxt.site) and community resources (GitHub, Discord) to drive adoption.
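
Following the spec, a site's /llms.txt might look like the invented example below: an H1 title, a blockquoted summary, free-form notes, then H2-delimited link lists (the 'Optional' section marks content that can be skipped when context is tight).

```markdown
# Example Project

> One-paragraph summary of what the site covers, written for an LLM reader.

Free-form notes: key concepts, conventions, or caveats worth knowing up front.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): all public endpoints

## Optional

- [Changelog](https://example.com/changelog.md): release history, safe to skip
```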

Attention Is All You Need

https://arxiv.org/abs/1706.03762

'Attention Is All You Need' is a 2017 landmark research paper in machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al.

Key Takeaways

  • Self-Attention as Core: Replaces recurrent or convolutional sequence models with a purely attention-based mechanism, enabling each token to directly attend to all others in the same sequence (sketched in code after this list).
  • Multi-Head Attention: Runs several attention 'heads' in parallel—each learning different representation subspaces—then concatenates their outputs, allowing the model to capture diverse contextual relationships.
  • Positional Encoding: Since attention has no built-in order awareness, sinusoidal positional encodings are added to token embeddings to inject sequence order information without recurrence.
  • Highly Parallel & Efficient: The fully attention-based encoder–decoder can be trained much faster than RNN-based models by leveraging parallel computation over sequence length, achieving state-of-the-art translation quality with lower training cost.
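
The operation at the heart of all four points is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, which is compact enough to sketch directly; the NumPy version below uses illustrative shapes.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)            # (seq_q, seq_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V                                        # (seq_q, d_v)

# Toy self-attention: 4 tokens of dimension 8 attending over each other.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V
print(out.shape)                              # (4, 8)
```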

Bret Victor's 'DBX' Talk

https://worrydream.com/dbx/

A talk from Bret Victor in which he explores many insightful meta-ideas about programming. One interesting point he raises: 'We're not going to have APIs in the future. What we are going to have are programs that know how to figure out how to talk to each other, and that's going to require programming in goals.'

This resonates strongly with the idea of building for agents in light of modern Large Language Models. What's truly interesting is that this talk is from 2013 - well before the recent advances in AI and the invention of LLMs!

Key Takeaways

  • From code to direct manipulation: Victor argues that programming should move beyond writing static text to enable real-time, interactive manipulation of data—taking inspiration from systems like Sketchpad to let developers 'grab' and tweak live objects directly.
  • Procedures vs. goals & constraints: Instead of spelling out step-by-step procedures, we should specify high-level goals and constraints, allowing underlying systems (à la Planner or Prolog) to figure out the 'how', thus freeing us from low-level implementation details; a toy sketch follows this list.
  • Text dumps → spatial representations: Pure text (and code) often hides structure and meaning; Victor showcases the power of spatial, visual representations (e.g., Engelbart's NLS, Smalltalk) to make complex systems more intuitive and explorable.
  • Embrace concurrency over sequentiality: The talk calls on us to break free from the von Neumann bottleneck—adopting concurrent, parallel models (actor systems, systolic arrays, etc.) so our tools and metaphors match the reality of modern hardware.
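
As a toy illustration of 'programming in goals' in the Planner/Prolog spirit (not Victor's own example): state what must hold as a goal predicate, and let a generic, deliberately naive solver search for the how.

```python
from itertools import permutations

# "Programming in goals": state WHAT must hold; a generic solver finds HOW.
# Toy goal: seat guests so that no two rivals sit next to each other.

guests = ["ada", "bob", "cat", "dan"]
rivals = {("ada", "bob"), ("cat", "dan")}

def goal(seating) -> bool:
    # The entire "program" is this declarative constraint.
    return all(
        (a, b) not in rivals and (b, a) not in rivals
        for a, b in zip(seating, seating[1:])
    )

# A deliberately naive general-purpose solver: exhaustive search over states.
solution = next(s for s in permutations(guests) if goal(s))
print(solution)  # e.g. ('ada', 'cat', 'bob', 'dan')
```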