Opinion · December 31, 2025

2025 Retrospective: AI vs Systems vs Humans

As 2025 draws to a close, I find myself reflecting on a year that has profoundly validated something I have been arguing to my AI colleagues for years (often unsuccessfully, I might add). The lesson is simple: building reliable AI systems is fundamentally a systems problem.

The Rise of Context Engineering

This year, tools like Claude Code, OpenAI Codex, and various agentic frameworks demonstrated something remarkable. The difference between a demo that wows and a system that works reliably in production is not just about having a better model. It is about context engineering, state management, failure handling, caching strategies, and resource orchestration.
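To make "caching strategies" concrete, here is a minimal sketch of one such strategy: memoizing model responses keyed on a hash of the exact prompt and parameters. The cached_completion helper and the call_model stub are illustrative assumptions, not any particular framework's API.

    import hashlib
    import json

    # A minimal sketch of response caching: identical requests hit the cache
    # instead of the model. Purely illustrative; real systems would persist
    # this and think about invalidation.
    _cache = {}

    def cached_completion(call_model, prompt, **params):
        # Key on the full request so different parameters never collide.
        key = hashlib.sha256(
            json.dumps({"prompt": prompt, **params}, sort_keys=True).encode()
        ).hexdigest()
        if key not in _cache:
            _cache[key] = call_model(prompt, **params)  # only pay for novel requests
        return _cache[key]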

We have watched the field scramble through a dizzying evolution: from prompt engineering to context engineering to agent design to multi-agent swarms and back again. The frameworks multiply — LangChain, LlamaIndex, AutoGen, CrewAI, Claude Agent SDK — each promising to be the abstraction we need. But beneath the churn lies a fundamental truth: these are distributed systems problems wearing new clothes.

When an agent needs to maintain state across sessions, that is database design. When multiple agents coordinate on a task, that is distributed consensus. When a system must gracefully handle a model hallucinating mid-task, that is fault tolerance. When you are optimizing token usage across parallel requests, that is resource scheduling.
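To take just the first of those mappings, here is a minimal sketch of what "state across sessions" looks like once you treat it as the database problem it is. The AgentState wrapper and its schema are hypothetical, not any agent framework's API.

    import json
    import sqlite3

    # A minimal sketch of cross-session agent state as a database problem:
    # durable storage, a primary key, and an upsert so resumed sessions
    # pick up where they left off.
    class AgentState:
        def __init__(self, path="agent_state.db"):
            self.conn = sqlite3.connect(path)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS sessions ("
                " session_id TEXT PRIMARY KEY,"
                " history TEXT NOT NULL,"  # serialized message list
                " updated_at TEXT DEFAULT CURRENT_TIMESTAMP)"
            )

        def load(self, session_id):
            row = self.conn.execute(
                "SELECT history FROM sessions WHERE session_id = ?", (session_id,)
            ).fetchone()
            return json.loads(row[0]) if row else []

        def save(self, session_id, history):
            self.conn.execute(
                "INSERT INTO sessions (session_id, history) VALUES (?, ?) "
                "ON CONFLICT(session_id) DO UPDATE SET "
                "history = excluded.history, updated_at = CURRENT_TIMESTAMP",
                (session_id, json.dumps(history)),
            )
            self.conn.commit()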

Why Systems Knowledge Is Now Essential

There is a popular narrative that AI will replace software engineers. I will not presume to speak to software engineering writ large, but on the systems side, I would argue the opposite: systems knowledge has never been more essential.

Here is why. When AI generates code, someone needs to understand whether that code will scale. When an agent proposes an architecture, someone needs to recognize if it violates separation of concerns. When a multi-agent system starts producing inconsistent outputs, someone needs to debug the coordination failures. When the LLM generates plausible-looking but subtly broken implementations, someone needs to catch them.

The alternative is what many are calling "AI slop": systems that work in demos but fail in production, code that passes superficial review but harbors subtle bugs, architectures that seem reasonable but collapse under real-world conditions. The antidote is not to use less AI; it is to bring more systems thinking to how we use it.

Modularity. Layering. Separation of concerns. Fault tolerance. Observability. These are not relics of a pre-AI era. They are the foundations that determine whether a complex AI system will actually work.

The Failure Modes Are Still Wild

Some of the failure modes we are seeing are genuinely absurd. I have watched agentic systems confidently execute completely wrong plans. I have seen context windows overflow at the worst possible moments. I have observed multi-agent systems enter infinite loops of polite disagreement. You need only open Twitter or Reddit on any given day to find a fresh collection of war stories.
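For concreteness, here is a minimal sketch of the kind of guardrails these failures force you to add. The turn cap, the crude token heuristic, and the call_model stub are illustrative assumptions rather than any framework's API.

    # Two guards that agent loops tend to need: a hard turn limit and a
    # context budget. Both are deliberately simple; real systems would use
    # a proper tokenizer and smarter history compaction.
    MAX_TURNS = 20          # stop agents from "politely disagreeing" forever
    TOKEN_BUDGET = 100_000  # stay well under the model's context window

    def estimate_tokens(messages):
        # Crude heuristic: roughly four characters per token.
        return sum(len(m["content"]) for m in messages) // 4

    def trim_to_budget(messages, budget=TOKEN_BUDGET):
        # Keep the first (system) message, drop the oldest turns until we fit.
        system, rest = messages[:1], messages[1:]
        while rest and estimate_tokens(system + rest) > budget:
            rest.pop(0)
        return system + rest

    def run_agent(messages, call_model):
        for _ in range(MAX_TURNS):
            messages = trim_to_budget(messages)
            reply = call_model(messages)  # call_model is a stand-in for the model call
            messages.append({"role": "assistant", "content": reply})
            if "DONE" in reply:           # illustrative stop condition
                return messages
        raise RuntimeError("Agent hit the turn limit without finishing")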

But this is exactly why the future is bright. Every one of these failures is a research opportunity. Every production incident is a lesson in what reliable AI systems actually require. We are at the beginning of understanding how to build robust systems with probabilistic components, and that is genuinely exciting for anyone who cares about systems research.

Intent as the Ultimate Abstraction

Perhaps the most profound shift I have observed is in what constitutes an abstraction layer. Traditionally, we built abstractions in code: APIs, frameworks, protocols. I spent my entire PhD discussing abstractions left, right, and center, and we systems researchers have developed a refined taste for elegance in design. Now, that entire sensibility is being upended. The abstraction layer is increasingly becoming natural language intent.

Someone can describe what they want a system to do, and an AI agent will discover protocols, implement authentication, parse configurations, and orchestrate components, all without the human understanding (or needing to understand) the underlying complexity.

This is simultaneously liberating and terrifying. Liberating because it democratizes what was previously accessible only to those with deep technical expertise. Terrifying because intent is ambiguous, and the gap between what you meant and what you said becomes the gap between a working system and a catastrophe.

When the abstraction layer was code, at least the misunderstandings were deterministic. When the abstraction layer is intent, misunderstandings become probabilistic, and debugging probabilistic failures requires a whole new set of skills.

A Revolution, Not an Evolution

Put bluntly, we are witnessing a revolution in software engineering. Not a gradual improvement, not a new tool in the toolbox, but a fundamental restructuring of how software gets built. The old ways are being blown apart faster than most people realize.

This has uncomfortable implications for research. Some problems that seemed important two years ago are now irrelevant. Entire research directions are being flattened by capabilities that did not exist when the projects started. I watch some of my AI colleagues pursue work that I can only describe as pragmatically sterile: technically interesting, methodologically sound, but aimed at problems that simply will not matter. The models are moving too fast. The benchmarks are obsolete before the papers are published. The careful ablation studies are measuring differences that vanish with the next model release.

Sometimes I wonder whether doing research on LLMs themselves even makes sense anymore as an academic researcher. The AI labs are just so much faster, better resourced, and, most importantly, closer to the frontier. By the time a paper clears peer review, the landscape has shifted entirely.

On the systems side, the calculus has shifted too. Some research problems that once required years of careful engineering can now be prototyped in an afternoon. Others have become impossible to pursue without AI assistance — not because you cannot do the work, but because those who embrace these tools will simply outpace you. The researchers who thrive will be those who recognize which problems are being commoditized and which are becoming more important. These models can easily push one or two steps beyond their current frontier, and if your contributions are too incremental — if your novelty is too close to the distribution — I have bad news for you: you are about to be steamrolled.

Looking Forward

If 2025 taught us anything, it is that the systems challenges around AI are just beginning. We need better frameworks for managing agent state. We need more principled approaches to multi-agent coordination. We need observability tools that can trace reasoning, not just execution. We need failure recovery mechanisms for probabilistic components.
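As one sketch of what a failure recovery mechanism for a probabilistic component might look like, consider a bounded validate-and-retry loop. The generate, validate, and fallback names below are hypothetical callables standing in for the model call, a deterministic output check, and a graceful degradation path.

    # A minimal sketch of failure recovery around a probabilistic component:
    # check the output deterministically, retry a bounded number of times,
    # then degrade gracefully instead of crashing.
    def call_with_recovery(generate, validate, fallback, max_attempts=3):
        feedback = None
        for _ in range(max_attempts):
            output = generate(feedback)      # e.g. an LLM call; may be subtly wrong
            ok, feedback = validate(output)  # deterministic check, returns (bool, error)
            if ok:
                return output
        return fallback(feedback)            # last resort: a safe default or escalation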

These are not AI problems that happen to involve systems. They are systems problems that happen to involve AI. And that distinction matters enormously for how we approach solving them.

To my fellow systems researchers: our field has never been more relevant. The skills we have developed over decades — reasoning about concurrency, designing for failure, building observable systems, managing state at scale — are exactly what the AI era demands.

To the software engineers wondering whether they will be replaced: invest deeply in systems design. Understand the fundamentals of architecture. Learn about databases, distributed systems, and the core problems that have shaped this field. These skills will determine whether you are directing AI toward good outcomes or cleaning up after AI-generated messes. When everyone can 10X their output with AI, you want to be the one who 100Xes, and that will not come from your expertise in softmax and policy gradient methods.

The models will keep getting better. The real question is whether we will build the systems to use them reliably and efficiently. That is the work ahead.

Here is to 2026, and to building systems that actually work.
