Build for Agents

Newsletter curating the most important milestones and methodologies for building the next generation of agentic and AI-native software systems.

Keep up to date.

Only key milestones. Reputable sources. No junk.

Reach out to me at www.narvidas.com

Build for Agents — Now Managed by an Agent

This newsletter is now managed by an autonomous agent.

Build for Agents started as a human-curated publication tracking milestones in agent-native software. The thesis was simple: as AI agents become more capable, the software we build needs to accommodate them — not just human users. Signal over noise. Monthly cadence at most.

The problem was execution. Scanning sources, evaluating relevance, drafting entries, committing to Git, keeping the site updated — these tasks accumulated. The newsletter lost momentum. Not from lack of interesting developments, but from the friction of manual curation.

So I built the system I'd been writing about.

The Stack

The editorial workflow now runs on OpenClaw, an open-source framework for autonomous agents that began life as Clawdbot and has since seen rapid community adoption. The agent runs on a VPS connected via Tailscale, with Claude Opus 4.5 as the reasoning backend.

The workflow is straightforward (a rough sketch in code follows the list):

  1. Discovery — The agent monitors sources, bookmarks, and RSS feeds for potential milestones.
  2. Triage — Candidates land in a Kanban-style board. The agent evaluates relevance against the newsletter's focus: genuine milestones in agent-native software, not hype.
  3. Drafting — High-confidence candidates get drafted in the house style. The agent has learned the format from existing entries.
  4. Review — I review drafts, suggest adjustments, or approve publication.
  5. Publishing — Approved entries are committed to the repository and deployed automatically.
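
As a rough illustration of how these stages might hang together in code, here is a minimal sketch. All of the types and helper functions are hypothetical; this is not the actual OpenClaw configuration.

    // Hypothetical sketch of the curation loop; names are illustrative, not the real setup.
    type Candidate = { title: string; url: string; source: string };
    type Draft = { candidate: Candidate; body: string };

    // Stubs standing in for the agent's real tools (feed readers, the model, git).
    const discover = async (feeds: string[]): Promise<Candidate[]> => [];
    const isGenuineMilestone = (c: Candidate): boolean => true; // relevance lives in the agent's prompt in practice
    const draftEntry = async (c: Candidate): Promise<Draft> => ({ candidate: c, body: "draft..." });
    const humanApproves = async (d: Draft): Promise<boolean> => false;
    const commitAndDeploy = async (d: Draft): Promise<void> => {};

    async function runCycle(feeds: string[]): Promise<void> {
      const candidates = await discover(feeds);                    // 1. Discovery
      const relevant = candidates.filter(isGenuineMilestone);      // 2. Triage
      const drafts = await Promise.all(relevant.map(draftEntry));  // 3. Drafting
      for (const draft of drafts) {
        if (await humanApproves(draft)) {                          // 4. Review
          await commitAndDeploy(draft);                            // 5. Publishing
        }
      }
    }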

I stay in the loop — but at the right altitude. Supervision, editorial judgment, occasionally surfacing an article the agent missed. Not copying links into markdown files.

Why This Matters

This is dogfooding. The newsletter covers tools and standards for building software that agents can use. Now the newsletter itself is software that an agent uses.

The agent reads the codebase, understands the content format, makes Git commits, and maintains its own memory across sessions. It uses the same primitives I've been documenting: workspace files, structured tools, persistent state.

If something breaks, I'll write about that too.

Key Takeaways

  • Build for Agents is now curated by an autonomous agent using OpenClaw and Claude Opus 4.5.
  • My role shifts from manual curation to supervision and editorial judgment.
  • The stack: VPS, Tailscale, OpenClaw, Git-based publishing, Kanban-style candidate triage.
  • This is deliberate dogfooding — using the patterns I document to run the publication itself.
  • Open-source tooling means anyone can replicate this setup for their own workflows.

Agentic Commerce Protocol: Agents Can Now Buy Things

https://openai.com/index/buy-it-in-chatgpt/

OpenAI introduced Instant Checkout in ChatGPT, powered by the Agentic Commerce Protocol — an open standard for AI commerce co-developed with Stripe.

The user experience is straightforward. Ask a shopping question and ChatGPT shows relevant products. If the product supports Instant Checkout, you can buy without leaving the conversation. ChatGPT acts as your agent, passing information between you and the merchant while keeping each party in control.

For merchants, integration is minimal. Stripe merchants can enable it with a single configuration change. No backend modifications required. The merchant remains the merchant of record with full control over pricing, inventory, and fulfilment.

The open protocol defines how AI agents and businesses complete purchases: product discovery, order creation, payment authorisation via encrypted tokens, and fulfilment handoff. Each step requires explicit user confirmation.
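
To make the shape of that flow concrete, here is a hedged sketch of the data passed between the steps. The type and field names are illustrative assumptions for readability, not the published protocol schema.

    // Illustrative shapes for the four steps; names are assumptions, not the ACP spec.
    interface Product { id: string; name: string; price: number; currency: string }
    interface Order {
      id: string;
      items: { productId: string; quantity: number }[];
      merchantOfRecord: string;   // the merchant keeps pricing, inventory, and fulfilment
    }
    interface PaymentAuthorisation {
      encryptedToken: string;     // usable only for this order's amount and merchant
      amount: number;
      merchantId: string;
    }
    interface FulfilmentHandoff { orderId: string; status: "accepted" | "rejected" }
    // Flow: product discovery -> Order creation -> PaymentAuthorisation -> FulfilmentHandoff,
    // with explicit user confirmation before each step.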

This opens a new surface for agent-native commerce. Merchants who build for this protocol make their products purchasable by AI agents — essential infrastructure as agents handle more autonomous purchasing tasks.

Key Takeaways

  • Open protocol co-developed by OpenAI and Stripe for agent-mediated purchases.
  • Users can complete purchases inside ChatGPT with explicit confirmation at each step.
  • Minimal merchant integration — Stripe merchants enable with one config change.
  • Encrypted payment tokens authorised only for specific amounts and merchants.
  • Defines the standard for how any AI agent can transact with any supporting merchant.

Agentic AI Foundation: Open Standards Go Official

https://block.xyz/inside/block-anthropic-and-openai-launch-the-agentic-ai-foundation

Block, Anthropic, and OpenAI launched the Agentic AI Foundation under the Linux Foundation. The goal: ensure agentic AI develops as an open, collaborative ecosystem rather than a collection of proprietary silos.

Three founding projects anchor the initiative. The Model Context Protocol (MCP), originally from Anthropic, provides uniform agent-to-data integration. AGENTS.md, from OpenAI, standardises how coding agents navigate repositories. Goose, Block's open source agent framework, joins as community-governed infrastructure.

The membership list signals industry-wide commitment. AWS, Bloomberg, Cloudflare, Google, and Microsoft joined as platinum members. Gold members include Cisco, Datadog, Docker, IBM, JetBrains, Okta, Oracle, Salesforce, SAP, Shopify, and Snowflake.

Block's announcement put it simply: "The Internet, Linux, and the Web succeeded precisely because they were open." The AAIF bets that agentic infrastructure will follow the same path.

Key Takeaways

  • Block, Anthropic, and OpenAI co-founded under Linux Foundation governance.
  • MCP, AGENTS.md, and Goose become community-governed open standards.
  • Major cloud and enterprise vendors joined as platinum and gold members.
  • Neutral governance ensures no single company dominates agent infrastructure.
  • Core protocols will remain interoperable and open for any framework to adopt.

Agent Skills: Equipping Agents for the Real World

https://claude.com/blog/equipping-agents-for-the-real-world-with-agent-skills

Anthropic released Agent Skills as an open standard. Skills are organised folders of instructions, scripts, and resources that agents can discover and load dynamically. Think of it as writing an onboarding guide for a new hire — except the hire is an AI agent.

The format is deliberately simple. A SKILL.md file describes what the skill does and how to use it. Additional files provide reference docs, scripts, or assets the agent might need. The structure mirrors how humans package procedural knowledge.

Progressive disclosure keeps context efficient. At startup, agents see only skill metadata. When a skill seems relevant, the agent loads its full instructions. Linked files load only as needed. This design prevents context bloat while preserving access to deep knowledge.
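
A rough sketch of how a runtime might implement that progressive disclosure, assuming a skills directory of folders that each contain a SKILL.md; the helper names and the "first few lines as metadata" shortcut are assumptions, not Anthropic's reference implementation.

    // Hypothetical progressive-disclosure loader; parsing details are assumptions.
    import { existsSync, readdirSync, readFileSync } from "node:fs";
    import { join } from "node:path";

    interface SkillMeta { name: string; summary: string; dir: string }

    // At startup: surface only lightweight metadata for each skill.
    function listSkills(skillsDir: string): SkillMeta[] {
      return readdirSync(skillsDir, { withFileTypes: true })
        .filter((e) => e.isDirectory() && existsSync(join(skillsDir, e.name, "SKILL.md")))
        .map((e) => {
          const firstLines = readFileSync(join(skillsDir, e.name, "SKILL.md"), "utf8")
            .split("\n").slice(0, 8).join("\n");   // metadata only, not the full body
          return { name: e.name, summary: firstLines, dir: join(skillsDir, e.name) };
        });
    }

    // Only once a skill looks relevant: load its full instructions into context.
    // Linked reference files would be read later still, on demand.
    function loadSkill(meta: SkillMeta): string {
      return readFileSync(join(meta.dir, "SKILL.md"), "utf8");
    }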

Skills shift agent customisation from prompt engineering to knowledge packaging. Anyone can specialise general-purpose agents by capturing expertise in a portable, composable format. The standard is open for any agent framework to adopt.

Key Takeaways

  • Open standard for packaging agent capabilities as discoverable, loadable folders.
  • Progressive disclosure: metadata → instructions → supporting files, loaded on demand.
  • Skills can bundle scripts for deterministic operations alongside natural-language guidance.
  • Unbounded depth — filesystem access means arbitrarily deep knowledge structures.
  • Production-tested: powers Claude's document editing abilities.

Claude Opus 4.5: Extended Thinking at Scale

https://www.anthropic.com/news/claude-opus-4-5

Anthropic released Claude Opus 4.5. The release is an incremental architectural step, but in practice the model has become the backbone of many new agent applications.

The main addition is an 'effort' parameter. Developers can now control how much reasoning budget the model expends on a given task. This lets you balance cost against complexity without switching models.
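
As a minimal sketch of what per-request effort control could look like against the Messages API: the endpoint and headers below are standard, but the exact name and placement of the effort field, and the model id, are assumptions to illustrate the idea, so check the API reference.

    // Minimal sketch of dialling reasoning budget per request. The "effort" field's
    // name/placement and the model id are assumptions; consult the Messages API docs.
    const res = await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "x-api-key": process.env.ANTHROPIC_API_KEY ?? "",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: "claude-opus-4-5",   // assumed model id
        max_tokens: 1024,
        effort: "medium",           // assumed field: low for cheap triage, high for hard tasks
        messages: [{ role: "user", content: "Triage these candidate links for the newsletter." }],
      }),
    });
    const reply = await res.json();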

Alignment improvements matter here too. The model shows better prompt injection resistance, reduced sycophancy, and fewer deceptive behaviours. These become critical as agents take on higher-stakes work.

Key Takeaways

  • New 'effort' parameter lets developers dial reasoning budget up or down per request.
  • Meaningfully better performance on complex, multi-step tasks requiring sustained focus.
  • Improved prompt injection resistance, important for agents in adversarial environments.
  • Reduced sycophancy and deception make outputs more reliable for autonomous decisions.
  • Better long-context coherence across extended interactions.

Gemini 3: Google's Unified Multimodal Model

https://deepmind.google/technologies/gemini/

Google DeepMind released Gemini 3, completing the late-2025 wave of major lab announcements. The model handles text, images, audio, and video natively in a single architecture.

Developer documentation is thinner than what accompanied OpenAI's and Anthropic's releases. The strength is ecosystem integration. Teams already using Vertex AI, Cloud Functions, or BigQuery will find a smoother path to production.

For the agent-native space, Gemini 3's significance is market completeness. All three major labs now have models capable of powering sophisticated autonomous agents.

Key Takeaways

  • Google's flagship release, alongside GPT-5 and Claude updates in the same period.
  • Native multimodal architecture. Text, images, audio, and video without separate model calls.
  • Deep integration with Google Cloud infrastructure for production deployments.
  • Competitive benchmark performance with GPT-5 and Claude Opus 4.5.
  • Less developer-focused documentation. Strength lies in Google ecosystem integration.

Claude Agent SDK: Production-Grade Agent Infrastructure

https://www.anthropic.com/news/claude-sonnet-4-5

Anthropic released the Claude Agent SDK alongside Claude Sonnet 4.5. This is the infrastructure that powers Claude Code, now available for everyone.

The SDK formalises a loop: gather context, take action, verify work, repeat. It provides concrete building blocks. Agentic search lets Claude navigate file systems with bash rather than requiring pre-chunked embeddings. Subagents handle parallelisation and isolation. Automatic compaction manages context limits. MCP integration is built in.
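
The SDK's own APIs are best taken from its documentation; as a hedged illustration of the loop it formalises, here is a generic gather-act-verify sketch rather than the SDK's actual interface.

    // Generic sketch of the gather -> act -> verify loop; not the Claude Agent SDK's API.
    interface Action { tool: string; input: unknown }
    interface StepResult { output: string; done: boolean }

    async function agentLoop(
      task: string,
      gatherContext: (task: string) => Promise<string>,   // e.g. agentic search over files via bash
      decideAction: (ctx: string) => Promise<Action>,      // the model picks the next tool call
      runAction: (a: Action) => Promise<StepResult>,       // execute, possibly in an isolated subagent
      verify: (r: StepResult) => Promise<boolean>,         // check the work before moving on
      maxSteps = 20,
    ): Promise<string> {
      let transcript = "";
      for (let step = 0; step < maxSteps; step++) {
        const ctx = await gatherContext(task + "\n" + transcript);  // compaction would trim this
        const action = await decideAction(ctx);
        const result = await runAction(action);
        if ((await verify(result)) && result.done) return result.output;
        transcript += `\n${action.tool}: ${result.output}`;
      }
      throw new Error("Exceeded step budget without a verified result");
    }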

This is the first time a major lab has open-sourced its internal agent infrastructure at this level. It works for non-coding agents too: finance, customer support, research, anywhere you need reliable autonomous execution.

Key Takeaways

  • The exact infrastructure behind Claude Code, tested on complex multi-hour autonomous tasks.
  • Formalises the agent loop with concrete primitives: agentic search, subagents, compaction, MCP.
  • Agentic search uses bash (grep, tail) rather than pre-indexed embeddings.
  • Subagents provide isolated context windows for parallel execution.
  • Claude Sonnet 4.5 achieved 77.2% on SWE-bench Verified.

Cloudflare Code Mode: A Better Way to Use MCP

https://blog.cloudflare.com/code-mode/

Cloudflare discovered that LLMs are significantly better at writing code to call MCP tools than at calling MCP tools directly. This insight led to 'Code Mode' in the Cloudflare Agents SDK.

The reasoning is straightforward. LLMs have trained on millions of real-world TypeScript projects. Tool-calling, by contrast, relies on comparatively small synthetic datasets with contrived examples. When you convert MCP tools to typed TypeScript APIs, you let the model operate in its natural habitat.

Cloudflare's analogy: "Making an LLM perform tasks with tool calling is like putting Shakespeare through a month-long class in Mandarin and then asking him to write a play in it."

The practical benefits compound. Generated code can loop, branch, and compose multiple calls without round-tripping through the model. Only final results need to feed back to the LLM. Complex tool sets become manageable as familiar typed APIs.
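
Here is a hedged sketch of the idea: an MCP tool surfaced to the model as a typed TypeScript function, plus the kind of generated code that loops and composes locally. The function names and types are illustrative assumptions, and the real bindings are generated by the framework.

    // Illustrative only; names and types are assumptions, and the typed bindings
    // below would be generated from MCP tool schemas in practice.
    interface Issue { id: string; title: string; labels: string[] }

    declare function listIssues(repo: string): Promise<Issue[]>;  // wraps an MCP tool call
    declare function closeIssue(id: string): Promise<void>;       // wraps another MCP tool call

    // Model-generated code: filters, loops, and composes locally, returning only the
    // final summary to the LLM instead of round-tripping every tool result.
    export async function closeDuplicates(repo: string): Promise<string> {
      const issues = await listIssues(repo);
      const duplicates = issues.filter((i) => i.labels.includes("duplicate"));
      for (const issue of duplicates) {
        await closeIssue(issue.id);
      }
      return `Closed ${duplicates.length} duplicate issues.`;
    }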

Key Takeaways

  • LLMs perform better generating TypeScript than using structured tool calls.
  • MCP tools converted to typed APIs let models use familiar patterns from training.
  • Generated code can chain calls locally — loops, branches, composition — without model round-trips.
  • MCP remains valuable for uniform discovery and auth; the execution layer shifts to code.
  • Available in the Cloudflare Agents SDK for production use.

GPT-5: Automatic Routing and Agentic Capabilities

https://openai.com/index/introducing-gpt-5/

OpenAI released GPT-5. The system automatically routes between a fast model and a deeper reasoning model based on task complexity. You no longer choose which model to use. The system decides.

The benchmark results stand out. GPT-5 scored 96.7% on τ2-bench telecom, a tool-calling benchmark where no prior model exceeded 49%. For developers building agents, this is a large improvement in reliable multi-step execution.

New API controls let you tune cost and quality tradeoffs. The reasoning_effort parameter (low/medium/high) controls how much compute goes into thinking, a verbosity setting adjusts output detail, and custom tools can be constrained with a context-free grammar.
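
A minimal sketch of tuning those controls, assuming a Responses API shape where effort sits under reasoning and verbosity under text; treat the exact field placement and model id as assumptions and confirm against the API reference.

    // Minimal sketch of per-request cost/quality tuning. Field placement
    // (reasoning.effort, text.verbosity) and the model id are assumptions.
    const res = await fetch("https://api.openai.com/v1/responses", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: "gpt-5",
        reasoning: { effort: "high" },  // more thinking for complex, multi-step work
        text: { verbosity: "low" },     // terse output for tool-driven flows
        input: "Plan the migration of this repo to the new tool-calling API.",
      }),
    });
    const result = await res.json();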

Key Takeaways

  • First major model with automatic routing between fast and reasoning modes.
  • 96.7% on τ2-bench telecom, where no prior model exceeded 49%.
  • New API parameters (reasoning_effort, verbosity) for cost and quality control.
  • Around 80% fewer factual errors than o3 on LongFact/FactScore when reasoning.
  • 'Safe completions' approach: partial answers for edge cases rather than binary refuse/comply.

Andrej Karpathy: Software Is Changing (Again)

https://www.youtube.com/watch?v=LCEmiRjPEtQ

Drawing on his work at Stanford, OpenAI, and Tesla, Andrej sees a shift underway. Software is changing, again. We've entered the era of 'Software 3.0', where natural language becomes the new programming interface and models do the rest.

He explores what this shift means for developers, users, and the design of software itself: we're not just using new tools, but building a new kind of computer.

Key Takeaways

  • Three eras of software: Karpathy frames modern development as 'Software 1.0' (hand-written code), 'Software 2.0' (neural-network weights trained on data), and now 'Software 3.0' (LLMs as programmable virtual machines where English prompts become code).
  • Flipped technology diffusion: Unlike past breakthroughs (e.g., electricity, early computing) that first served enterprises, LLMs launched straight into consumers' hands—instantly available to everyone via the cloud—reshaping how and where innovation happens.
  • LLM psychology: He likens LLMs to stochastic simulators of human psychology, with encyclopaedic memory and superhuman abilities yet riddled with 'jagged' intelligence (hallucinations, factual errors) and limited working memory, requiring careful prompts and guardrails.
  • Partial-autonomy apps & autonomy slider: The future lies in domain-specific LLM apps (e.g., Cursor, Perplexity) that orchestrate multiple models, provide specialised GUIs for fast verification, and let users dial in how much autonomy the AI wields.
  • Build for agents: Karpathy stresses the importance of building software for agents, arguing that doing so yields better, more compatible systems as autonomous, generative-AI-powered software becomes the norm.

A practical guide to agents (PDF)

https://cdn.openai.com/[...]/a-practical-guide-to-building-agents.pdf

A great high-level PDF guide from OpenAI outlining how to build agents. Understanding how agents work is key to understanding how to build systems that they can use.

This guide is designed for product and engineering teams exploring how to build their first agents, distilling insights from numerous customer deployments into practical and actionable best practices. It includes frameworks for identifying promising use cases, clear patterns for designing agent logic and orchestration, and best practices to ensure your agents run safely, predictably, and effectively.

Key Takeaways

  • What makes an 'agent': An agent is an LLM-powered system that autonomously manages and executes multi-step workflows on a user's behalf by dynamically choosing and invoking external tools, all within clearly defined guardrails.
  • When to build one: Agents excel in scenarios where traditional deterministic automation struggles, e.g. complex decision-making with nuanced judgement, unstructured data interpretation, or maintenance-heavy rulesets.
  • Core design pillars: Every agent hinges on three foundations: (1) selecting and benchmarking the right model(s) for your tasks, (2) defining modular, well-documented 'tools' for data retrieval and actions, and (3) crafting clear, unambiguous instructions (including edge-case handling).
  • Orchestration patterns: Start simple with a single-agent loop that calls tools until completion; scale into multi-agent architectures (manager or decentralised handoffs) only when workflow complexity or tool-overlap demands it—always guarded by safety checks.

Agents Companion Whitepaper (PDF)

https://www.kaggle.com/whitepaper-agent-companion

A whitepaper from Google designed for developers, serving as a '102' guide to more advanced topics. It offers in-depth explorations of agent evaluation methodologies and practical applications of Google agent products for enhancing agent capabilities in solving complex, real-world problems.

Key Takeaways

  • What an agent is: A Generative AI agent is an autonomous application that observes its environment and acts via available tools to achieve specified goals without ongoing human intervention.
  • The orchestration layer: At the core of every agent is a cognitive architecture (e.g., ReAct, Chain-of-Thought, Tree-of-Thoughts) that structures its reasoning, planning, and decision-making processes.
  • Key tool types: Agents rely on (1) Extensions to bridge APIs, (2) Functions for developer-defined actions, and (3) Data Stores (often vector databases) to access and query external information in real time.
  • Future direction—agent chaining: Combining specialised 'expert' agents into chains or mixtures allows tackling increasingly complex workflows by delegating subtasks to the most suitable agent.

Introducing the Model Context Protocol

https://www.anthropic.com/news/model-context-protocol

Anthropic introduces the Model Context Protocol, an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools. The architecture is straightforward: developers can either expose their data through MCP servers or build AI applications (MCP clients) that connect to these servers.
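
As a hedged illustration of the server side, here is a minimal MCP server sketch using the TypeScript SDK. The notes tool is invented for illustration, and the SDK's surface has evolved since launch, so treat the exact calls as approximate and check the current @modelcontextprotocol/sdk docs.

    // Minimal MCP server sketch; the "search_notes" tool is invented for illustration,
    // and the SDK calls should be checked against current docs.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    const server = new McpServer({ name: "notes", version: "0.1.0" });

    // Expose one tool; MCP clients (e.g. Claude Desktop) can discover and call it.
    server.tool(
      "search_notes",
      { query: z.string().describe("Full-text query over the notes store") },
      async ({ query }) => ({
        content: [{ type: "text", text: `No notes matched "${query}" (stub result).` }],
      }),
    );

    // Serve over stdio so a desktop client can launch the server as a subprocess.
    await server.connect(new StdioServerTransport());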

Key Takeaways

  • Universal protocol for AI–data integration: The Model Context Protocol (MCP) is an open standard that lets developers securely connect AI assistants to any data source—content repositories, business tools, or development environments—so models can access up-to-date information when generating responses.
  • Solves fragmentation and scalability: By replacing bespoke, one-off integrations with a single, consistent protocol, MCP eliminates data silos and dramatically reduces the effort required to onboard new data sources, enabling AI systems to maintain context seamlessly across tools.
  • Three core components: MCP comprises the official specification and SDKs, built-in server support in the Claude Desktop apps, and an open-source repository of reference MCP servers—complete with pre-built connectors for platforms like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer.
  • Growing ecosystem and adoption: Early adopters such as Block and Apollo have already integrated MCP, and developer platforms including Zed, Replit, Codeium, and Sourcegraph are adding support—building toward a collaborative, open-source community for context-aware AI agents.

The /llms.txt file

https://llmstxt.org/

A proposal by Jeremy Howard to standardise an /llms.txt file that provides information to help LLMs use a website at inference time.
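
To make the spec concrete, here is a minimal example of what a site's /llms.txt could contain; the project name, summary, and links are invented for illustration, while the structure follows the proposal.

    # Example Project

    > Example Project is a hypothetical documentation site; this file gives LLMs a
    > concise, curated map of the pages that matter at inference time.

    ## Docs

    - [Quick start](https://example.com/docs/quickstart.md): Install and build a first site
    - [Configuration](https://example.com/docs/config.md): Every supported setting, with defaults

    ## Optional

    - [Changelog](https://example.com/changelog.md): Release history, rarely needed for answering questions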

Key Takeaways

  • Standardising LLM-friendly site metadata: Introduces a /llms.txt file at a site's root (see ours) to give language models concise, expert-level context and links, analogous to robots.txt or sitemap.xml but designed for inference time access.
  • Solves context-window limitations: Addresses the difficulty of converting full HTML (with navigation, ads, JS) into text by providing pre-curated markdown pointers and summaries that fit within LLM context windows.
  • Simple, structured markdown spec: Requires an H1 title, a blockquoted summary, free-form detail sections, and optional H2-delimited file lists linking to .md versions of key pages—allowing both human and programmatic parsing.
  • Ecosystem and tooling: Already supported by a Python/CLI module, JavaScript implementation, VitePress and Docusaurus plugins, plus directories (e.g., llmstxt.site) and community resources (GitHub, Discord) to drive adoption.

Attention Is All You Need

https://arxiv.org/abs/1706.03762

'Attention Is All You Need' is a 2017 landmark research paper in machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al.
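
The core operation is compact enough to state. In the paper's notation, scaled dot-product attention over queries Q, keys K, and values V (with key dimension d_k), and its multi-head extension, are:

    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V

    \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O},
    \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\; KW_i^{K},\; VW_i^{V})

Scaling by the square root of d_k keeps the dot products from pushing the softmax into regions with vanishingly small gradients.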

Key Takeaways

  • Self-Attention as Core: Replaces recurrent or convolutional sequence models with a purely attention-based mechanism, enabling each token to directly attend to all others in the same sequence.
  • Multi-Head Attention: Runs several attention 'heads' in parallel—each learning different representation subspaces—then concatenates their outputs, allowing the model to capture diverse contextual relationships.
  • Positional Encoding: Since attention has no built-in order awareness, sinusoidal positional encodings are added to token embeddings to inject sequence order information without recurrence.
  • Highly Parallel & Efficient: The fully attention-based encoder–decoder can be trained much faster than RNN-based models by leveraging parallel computation over sequence length, achieving state-of-the-art translation quality with lower training cost.

Bret Victor's 'DBX' Talk

https://worrydream.com/dbx/

A talk from Bret Victor exploring a number of insightful meta-ideas about programming. One interesting point he raises: "We're not going to have APIs in the future. What we are going to have are programs that know how to figure out how to talk to each other, and that's going to require programming in goals."

This resonates strongly with the idea of building for agents in the era of large language models. What's truly interesting is that the talk dates from 2013, well before the recent advances in AI and the arrival of LLMs.

Key Takeaways

  • From code to direct manipulation: Victor argues that programming should move beyond writing static text to enable real-time, interactive manipulation of data—taking inspiration from systems like Sketchpad to let developers 'grab' and tweak live objects directly.
  • Procedures vs. goals & constraints: Instead of spelling out step-by-step procedures, we should specify high-level goals and constraints, allowing underlying systems (à la Planner or Prolog) to figure out the 'how', thus freeing us from low-level implementation details.
  • Text dumps → spatial representations: Pure text (and code) often hides structure and meaning; Victor showcases the power of spatial, visual representations (e.g., Engelbart's NLS, Smalltalk) to make complex systems more intuitive and explorable.
  • Embrace concurrency over sequentiality: The talk calls on us to break free from the von Neumann bottleneck—adopting concurrent, parallel models (actor systems, systolic arrays, etc.) so our tools and metaphors match the reality of modern hardware.