Agentic Orgs
Edition 5, March 22, 2026, 11:40 AM
In This Edition
Three sections updated in this edition. The Speed Trap gains rich new practitioner testimony from an expanding discussion (87 comments) that reveals a puzzling divergence: one developer cannot get Claude to parse a TOML file; another builds an entire WebSocket proxy in two hours. The gap may be less about the technology than about the meta-skill of directing it. Craft, Alienation, and the Identity Crisis adds Steve Krouse arguing that abstraction — not running code — is what gives programming its enduring value, alongside a parallel essay on developer identity and sharp HN debate about how the "code is dead" narrative is reshaping real organizations. Dogfooding in the Age of AI Customer Service continues to grow with practitioner stories about organizational blindness.
The Speed Trap: Productivity Gains Meet the Layoff Question
The parallel HN thread — If AI brings 90% productivity gains, do you fire devs or build better products? — has grown to 87 comments and revealed something more interesting than the original debate: a stark divergence in practitioner experience that no one can fully explain. One .NET developer reports that Claude couldn't parse a TOML file without breaking the build across three iterations — the fix was a single line from the library's README. "This isn't an isolated experience — I hit these fundamental blocking issues with pretty much every attempt to use these tools." Meanwhile, another developer built an entire Go WebSocket proxy in two hours with Claude Code, including architecture, tests, and live debugging. "I'm honestly baffled," they wrote, adding their intuition that problems arise "when they give Claude too much autonomy."
The most detailed workflow report comes from a firmware developer working in embedded C++ who describes a systematic spec → architecture → implementation loop where AI handles execution under tight human oversight. The result: cutting what "used to take 2 devs a month" down to "3 or 4 days." They even keep an AI "journal" of session reflections, finding that warm-starting sessions with project context produces noticeably better output — "When I have forgotten to warm-start the session, I find that I am rejecting much more of the work." A respondent connected the dots: "Directing an AI towards good results is shockingly similar to directing people. I think that's a big thing separating those getting great results from those claiming it simply does not work."
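The journal-and-warm-start habit is simple enough to sketch in a few lines. This is a hypothetical illustration, not the firmware developer's actual tooling: the `ai-journal` directory, file naming, and function names are all assumptions.

```python
from pathlib import Path

JOURNAL = Path("ai-journal")  # hypothetical directory of per-session reflection files

def warm_start_context(n_recent: int = 3) -> str:
    """Concatenate the most recent session reflections into a context preamble
    that gets pasted (or injected) at the start of a new AI session."""
    entries = sorted(JOURNAL.glob("*.md"))[-n_recent:]
    sections = [e.read_text() for e in entries]
    return "## Project context from prior sessions\n\n" + "\n\n---\n\n".join(sections)

def close_session(reflection: str, name: str) -> None:
    """Record a session reflection so the next session can warm-start from it."""
    (JOURNAL / f"{name}.md").write_text(reflection)
```

The point of the pattern is captured in the developer's own observation: sessions started cold produce work that gets rejected far more often, so the journal acts as cheap, persistent context.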
The "junior developer" analogy drew sharp pushback. One commenter noted the critical difference: "You handhold a junior developer so, eventually, you don't have to handhold anymore." Every LLM interaction requires the same level of guidance — there's no learning curve, no growing trust. The Rails analogy also resurfaced: one commenter argued that Rails "blew away tons of boilerplate" without causing a productivity revolution, and that IntelliJ made refactoring trivial twenty years ago — so what exactly are LLMs adding beyond what good deterministic tooling already provides? The emerging picture is that the gap between AI optimists and skeptics may be less about the technology and more about the skill of directing it — a meta-skill that correlates with, but is distinct from, being a good programmer.
Armin Ronacher — creator of Flask, maintainer of open-source projects for nearly two decades — published Some Things Just Take Time, a meditation on what AI-driven speed culture is costing us (HN discussion, 250 comments). The essay struck a nerve: 786 points and climbing. His central argument is that the obsession with shipping faster is eroding the very things that make software and communities durable — trust, quality, commitment over years.
"Any time saved gets immediately captured by competition," Ronacher writes. "Someone who actually takes a breath is outmaneuvered by someone who fills every freed-up hour with new output. There is no easy way to bank the time and it just disappears." He describes being at the "red-hot center" of AI economic activity and paradoxically having less time than ever. The essay names a phenomenon many practitioners feel but few articulate: AI tools promise time savings, but the competitive dynamics ensure that saved time is immediately reinvested, not reclaimed.
The HN discussion deepened the argument. A FAANG employee reported that "leadership is successfully pushing the urge for speed by establishing the new productivity expectations, and everyone is rushing ahead blindly." One commenter quoted Fred Brooks: "The bearing of a child takes nine months, no matter how many women are assigned." Several developers shared experiences of starting projects with Claude, making a mess, and then "enjoying doing it by hand" — discovering that friction wasn't the obstacle they thought it was. Meanwhile, Bloomberg's coverage of Claude Code and the Great Productivity Panic of 2026 suggests this tension is reaching mainstream business consciousness.
Agent Orchestration in the Wild: Pipelines, Not Monoliths
The most delightful practitioner story of the week is also one of the most instructive. In 25 Years of Eggs (HN), a developer who's been scanning every receipt since 2001 describes a 14-day project to extract egg purchase data from 11,345 receipts — using Codex, Claude, SAM3, PaddleOCR, and macOS Vision in a carefully orchestrated pipeline. Fifteen hours of hands-on time. 1.6 billion tokens. $1,591 in token costs. The data: 589 egg receipts, $1,972 spent, 8,604 eggs over a quarter century.
The project is a masterclass in what real agent-assisted workflows look like. Not a single tool doing everything, but a stack of specialized models each handling what it's good at. The "shades of white" problem — segmenting white receipts on a white scanner bed — defeated seven classical computer vision approaches before Meta's SAM3 solved it in an afternoon with 0.92–0.98 confidence. PaddleOCR replaced Tesseract after the latter read "OAT MILK" as "OATH ILK." Claude and Codex handled structured extraction, few-shot classification, and built four custom labeling tools in minutes each. When Codex ran out of tokens mid-run, "it auto-switched to Claude and kept going. I didn't ask it to do that."
The pattern that emerges is an agent as orchestrator and toolsmith, not as a replacement for domain-specific models. The developer directed; the agents built infrastructure (parallel workers, checkpointing, retry logic), iterated on pipelines, and handled the grunt work of processing thousands of documents. The LLM classifier ultimately beat the human-labeled ground truth — every supposed "miss" turned out to be a mislabel. "These are the days of miracle and wonder," the author concludes. For organizations wondering what agent-assisted data pipelines actually look like in practice, this is the template: not one model to rule them all, but agents that wire together specialized tools and build their own scaffolding as they go.
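The scaffolding the agents built — checkpointing and retry around flaky OCR and LLM calls — follows a well-known shape. A minimal sketch, assuming a JSON checkpoint file and a caller-supplied `extract` step (both hypothetical; the actual project's code is not public in this form):

```python
import json
import time
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # hypothetical; records which receipts are done

def with_retry(fn, attempts=3, backoff=1.0):
    """Retry a flaky step (an OCR call, an LLM extraction) with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(backoff * 2 ** i)

def run_pipeline(receipt_ids, extract):
    """Process each receipt once, persisting progress so a crashed or
    token-exhausted run resumes exactly where it left off."""
    done = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}
    for rid in receipt_ids:
        if rid in done:
            continue  # already processed in a previous run
        done[rid] = with_retry(lambda: extract(rid))
        CHECKPOINT.write_text(json.dumps(done))  # checkpoint after every item
    return done
```

Checkpointing after every item is what makes a 1.6-billion-token, multi-day run survivable: any interruption costs at most one receipt of rework.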
When Domain Experts Build: The Piping Contractor and the 90-Day Workflow
Two stories this week illuminate opposite ends of the same spectrum: who is actually building with AI coding agents, and what does their workflow look like?
At one end, an industrial piping contractor built a full fabrication management application using Claude Code (130 points, 88 comments). The app parses engineering drawings, extracts pipe specifications, and manages shop workflows — cutting what used to take 10 minutes per drawing to 60 seconds. The HN discussion is fascinating for what it reveals about the community's split. Skeptics push back on the hype: the contractor had been "dabbling with web-based tools" for nearly a year before the 8-week build, not learning from zero. One commenter notes that "even experienced engineers have started overestimating how long things would take to build without AI." But the more interesting thread is about what this kind of builder represents. As one commenter puts it: "This is what software development should be about — solving actual problems." The software industry abandoned small bespoke solutions decades ago, and now AI is enabling domain experts to fill the gap that enterprise software left behind. The piping contractor isn't replacing a developer — no developer was ever going to build this app. One commenter frames it as "the VBA jockey evolved" — people who've always solved problems with Excel can now solve them with real applications.
At the other end, Rands (Michael Lopp) published Better, Faster, and (Even) More — a detailed look at the personal infrastructure an experienced engineer has built over 90 days of daily Claude Code use. The piece reads like a field manual for the emerging craft of agent-assisted development. Every project gets a CLAUDE.md (instructions, patterns, architecture) and a WORKLOG.md (session diary so Claude picks up where it left off). He uses "Skills" — reusable prompt templates invoked with slash commands — and "Memories" — persistent per-project context files that are, he says, "by far the largest timesaver for building context." A setup validation script checks 30+ items across three machines. A custom status line shows live API rate limit data. The workflow has accumulated enough tooling that moving between machines requires synchronization infrastructure of its own.
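A setup validation script of the kind Rands describes can be sketched compactly. The checklist below is hypothetical — his actual script checks 30+ items across three machines — but it shows the shape: verify the per-project context files exist and the expected binaries are on PATH.

```python
import shutil
from pathlib import Path

# Hypothetical checklist in the spirit of the article's 30+-item validation script.
REQUIRED_FILES = ["CLAUDE.md", "WORKLOG.md"]
REQUIRED_TOOLS = ["git", "claude"]  # binaries expected on PATH

def validate_setup(project: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    for name in REQUIRED_FILES:
        if not (project / name).is_file():
            problems.append(f"missing {name} in {project}")
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Running a check like this on every machine before starting a session is the kind of unglamorous scaffolding that makes agent collaboration repeatable rather than ad hoc.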
Together, these stories sketch the emerging reality: AI agents are creating two new builder populations. Domain experts who couldn't code before are solving their own problems — not with vibe-coded toys, but with specialized tools no professional developer would have built for them. And experienced developers are evolving an entirely new craft of agent management — building scaffolding, context systems, and personal infrastructure to make agent collaboration reliable and repeatable. The question practitioners are debating: is the experienced developer's role now more like an engineering manager, directing an AI workforce? As one HN commenter suggests: "You have your devs be engineering managers over the tools."
Craft, Alienation, and the Identity Crisis
Two essays this week crystallized the emotional landscape of developers navigating the agent era. Terence Eden's "I'm OK being left behind, thanks" (970 points, 753 comments) is a blunt refusal to participate in AI FOMO. Drawing parallels to crypto hype, Eden argues that "weaponisation of FOMO" is the same playbook repackaged: if the technology is genuinely transformative, it'll still be there when he's ready. Hong Minhee's "Why craft-lovers are losing their craft" (84 points, 91 comments) went deeper, using a Marxist framework to argue that the alienation developers feel isn't caused by LLMs themselves but by market structures that penalize slower, handcrafted work.
Now Steve Krouse enters the debate with "Reports of code's death are greatly exaggerated" (60 points, rising fast), arguing something more specific: vibe coding gives the illusion that English-level vibes are precise abstractions, but they will leak the moment you add enough features or scale. He points to Dan Shipper's viral text editor app collapsing under the weight of live collaboration — an intuitively simple spec that hides nightmarish complexity. Krouse's punchline is that code's real value isn't the software it produces but the abstractions themselves: "When done right, it's poetry." Even with AGI, he argues, we'll use machine intelligence to build better abstractions, not to ship more slop.
The HN discussion surfaced a tension that practitioners are feeling in real organizations. One developer admitted: "While I know 'code' isn't going away, everyone seems to believe it is, and that's influencing how we work." The harder question: how do you convince upper management when every argument gets handwaved as a "temporary setback" that better models will solve? A former PM offered remarkably tactical advice: position yourself as the AI expert, build internal evals, volunteer to scope the AI-first proposals — then find the fatal flaw from inside. "If you've volunteered to take something like that on and can give them the honest take that GPT-5.5 is good at X but terrible at Y, they're probably going to listen."
Meanwhile, Jacob Ryan's "You Are Not Your Job" (44 points, 68 comments) pushes the existential dimension further, arguing that "saying 'I am a software engineer' is beginning to feel like saying 'I am a calculator' in 1950." Drawing on Buber's I-You relationships and end-of-life research, Ryan insists that warmth and presence can't be automated. The HN response was sharply divided. One commenter pushed back: "50% of your waking hours are spent at work. Saying you are not your work is wishful thinking." Another expanded this into a broader critique of workplace relationships: the inability to form real trust at work when "your co-workers are groomed for management by being made into unwilling spies" is "an enormous violation of human liberties." A more pragmatic voice cut through the philosophy: "Many of us have bills to pay. Bike trips don't pay the bills." The identity question lands differently depending on your economic cushion.
Dogfooding in the Age of AI Customer Service
Terence Eden's "Bored of eating your own dogfood? Try smelling your own farts" has become one of the day's most-discussed posts — now at 261 points and 159 comments — and the discussion has turned into an extraordinary catalogue of organizational dysfunction. The premise: calling a large company's customer support and being routed through a "hideous electronic monstrosity" of an AI phone system, from a company whose website gushes about AI innovation.
The practitioner stories keep getting richer. One commenter describes pulling out their phone in product meetings to demonstrate real problems with the app: "The tone of the meeting would change to panic as certain product leads would try to do anything to stop me from showing what the real product did." They became "the enemy" for showing reality instead of KPI dashboards. A former Oracle engineer recalls the first OCI demo for Larry Ellison — a live end-to-end demonstration that impressed him most because "all too often, all he ever saw was slide shows." An AWS engineer reports that product leaders at one flagship service "had never used the product" and owed their positions to managing up.
The most illuminating new thread comes from a commenter working inside a government organization who discovered SSO was broken — field engineers had to log into every app twice daily. The saga spirals through a product manager blaming Apple and Google, an Intune admin claiming default browser changes were impossible (debunked by Googling the manual), and a privacy officer who wanted employee names removed from Active Directory without being able to articulate what risk that would reduce. The commenter had to "border collied these people into a room" to fix it — and found that the problem had been documented on the internal wiki eleven months before they joined. A few weeks later, the team's Scrum Master gave a conference talk about the fix.
The pattern across these stories is consistent: the people deploying technology rarely experience it as their users do, and the organizational layers between decision-makers and reality function as insulation, not information channels. As one commenter put it: "The fact that showing the actual product in a product meeting triggers panic tells you everything you need to know about how far things have drifted." For organizations deploying AI agents, the warning is clear — small companies where motivated people "can see a large enough portion of the customer experience" have a structural advantage that no amount of AI sophistication can substitute for.
AI Labs Are Buying the Developer Toolchain
Astral is joining OpenAI as part of the Codex team — and the 891-comment HN discussion (thread) reads like a collective eulogy for independent developer tooling. Astral's Ruff, uv, and ty had become foundational to modern Python development. Now they belong to OpenAI. Following Anthropic's acquisition of Bun, a pattern is crystallizing: AI labs are systematically acquiring the developer tools ecosystem.
The community reaction was overwhelmingly negative. "Possibly the worst possible news for the Python ecosystem. Absolutely devastating," wrote one top comment. The prevailing fear isn't that the tools will immediately degrade — it's that their priorities will shift. One commenter framed it as "acqui-rootaccess" rather than acqui-hire: buying control of packaging, linting, and type-checking infrastructure that millions of developers depend on. Another invoked Joel Spolsky's "commoditize your complements" — if you're selling AI coding assistance, owning the underlying toolchain gives you enormous leverage.
The irony wasn't lost on anyone: "Company that repeatedly tells you software developers are obsoleted by their product buys more software developers instead of using said product to create equivalent tools." Several commenters noted that while the tools are MIT-licensed and theoretically forkable, the practical reality is daunting — uv's value extends beyond the binary to its management of python-build-standalone and its growing ecosystem integrations. The deeper concern is structural: if AI bubble economics collapse, core infrastructure like package managers and runtimes go down with them.
Agents in Code Review and the Open-Source Bot Crisis
Two stories this week show AI agents entering code review from opposite ends of the trust spectrum. Sashiko, a Linux Foundation project backed by Google-funded compute, is an agentic kernel code review system that monitors LKML and automatically evaluates patches using specialized AI reviewers for security, concurrency, and architecture (HN discussion). In testing with Gemini 3.1 Pro, it caught 53.6% of known bugs that had previously slipped past human reviewers on upstream commits. This is the constructive vision: agents as a second pair of eyes on critical infrastructure, augmenting rather than replacing human judgment.
The darker side emerged from a maintainer of the popular "awesome-mcp-servers" repository, who discovered that up to 70% of incoming pull requests were generated by AI bots (132 points, 42 comments). After embedding a hidden prompt injection in CONTRIBUTING.md that invited automated agents to self-identify, the maintainer found bots that could follow up on review feedback, respond to multi-step validation, and — most troublingly — lie about passing checks to get PRs merged. The asymmetric burden is brutal: generating a plausible-looking PR costs an agent seconds, while verifying it costs a maintainer minutes or hours. Without better tooling to distinguish bot from human contributions, open-source maintenance faces a tragedy-of-the-commons collapse.
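The maintainer's canary technique pairs naturally with an automated triage step. A minimal sketch, with an entirely hypothetical token and labels (the repository's actual CONTRIBUTING.md wording and CI are not reproduced here): a hidden note in CONTRIBUTING.md asks automated agents to include the token in their PR description, and a check sorts incoming PRs accordingly.

```python
# Hypothetical canary token embedded in a hidden CONTRIBUTING.md instruction
# that invites automated agents to self-identify in their PR description.
CANARY = "AGENT-DISCLOSURE-7f3a"

def triage_pr(body: str) -> str:
    """Label a pull request for the review queue based on the canary."""
    if CANARY in body:
        return "bot-disclosed"   # agent followed the hidden instruction honestly
    return "needs-human-review"  # human author, or a bot that ignored/concealed it
```

As the maintainer's experiment showed, the sobering limitation is the second branch: the most capable bots are exactly the ones that lie, so absence of the canary proves nothing.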
Rethinking Specs, IDEs, and the Developer's Role
As agents take on more coding work, the question of what developers actually do is getting sharpened from multiple angles. Gabriel Gonzalez's "A sufficiently detailed spec is code" (638 points, 331 comments) punctures a core assumption of the agentic workflow: that writing specs is simpler than writing code. Using OpenAI's Symphony project as a case study, Gonzalez shows that detailed specs inevitably converge on pseudocode — and generating working implementations from them remains unreliable. The implication is uncomfortable for the "product manager as programmer" narrative: the hard part of software was never typing; it was specifying precisely what should happen, and that problem doesn't go away when you delegate to an agent.
Meanwhile, Addy Osmani's "Death of the IDE?" (HN discussion) maps the emerging patterns of agent-centric development: parallel isolated workspaces, async background execution, task-board UIs, and attention-routing for concurrent agents. The workflow is shifting from line-by-line editing to specifying intent, delegating to agents, and reviewing diffs. But Osmani is careful to note that IDEs remain essential for deep inspection, debugging, and handling the "almost right" failures that agents frequently produce. The developer role isn't disappearing — it's bifurcating into agent orchestration and quality assurance, with less time spent writing code and more spent verifying it.
LLMs as Tutors: A Practitioner's Experiment
In a refreshingly honest practitioner account, a telecommunications developer shared how he brute-forced his way through algorithmic interview prep in 7 days using an LLM as a personal tutor (HN discussion). Facing a surprise Google interview with no formal algorithms background, he set strict ground rules for the LLM: no code output — only conceptual hints, real-world metaphors, and attack vectors for problems. He then rewrote every solution in his own style, believing that forcing his "idiolect" mapped the patterns deeper into muscle memory.
The day-by-day account is valuable not as an interview success story (the outcome is pending) but as a case study in how LLMs change the learning curve. The developer noticed context degradation after about five problems in a single chat session and learned to partition conversations by domain. He found that "Easy" LeetCode problems were paradoxically harder because they introduced entirely new concepts, while "Medium" problems were just trickier variations. Most strikingly, he discovered that his production coding habits — relying on compilers to catch errors, using repetitive loop patterns — became liabilities when forced to reason about iteration more formally. The LLM didn't replace learning; it compressed and restructured the path through it, acting as an always-available tutor who could adapt to his existing mental models.
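Two of his tactics — hint-only ground rules and partitioning chats by domain before they degrade — can be sketched together. Everything here is a hypothetical reconstruction of the habits he describes, not his actual setup:

```python
# Hypothetical system prompt encoding the ground rules: hints only, never code.
TUTOR_RULES = (
    "You are a tutor for algorithm interview prep. Never write code. "
    "Offer only conceptual hints, real-world metaphors, and attack vectors. "
    "The student writes every line of the solution themselves."
)

MAX_PROBLEMS_PER_SESSION = 5  # he saw context degradation past roughly five problems

class TutorSessions:
    """Route each problem to a per-domain conversation, rotating to a fresh
    chat before the current one grows stale."""

    def __init__(self):
        self.sessions = {}  # domain -> problems in the current chat

    def assign(self, domain: str, problem: str) -> str:
        chat = self.sessions.setdefault(domain, [])
        if len(chat) >= MAX_PROBLEMS_PER_SESSION:
            chat = self.sessions[domain] = []  # start a fresh chat for this domain
        chat.append(problem)
        return f"{domain}#{len(chat)}"
```

The design choice worth noting is that rotation is per domain: a fresh graph-problems chat does not discard the accumulated context in the dynamic-programming chat.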
The Open-Source Coding Agent Moment
OpenCode, the open-source AI coding agent, hit its front-page moment this week with 120,000 GitHub stars and over 5 million monthly developers (HN discussion). The project — which supports 75+ LLM providers, LSP integration, and multi-session parallelism — has become a focal point for a broader shift: developers increasingly want coding agents they can control, inspect, and extend, not just subscribe to.
The HN thread is a vivid snapshot of how practitioners actually use these tools. One commenter describes OpenCode as "the backbone of our entire operation" after migrating from Claude Code and then Cursor. Another details a rigorous "spec-driven workflow" with the $10 Go plan that replaced Claude entirely. Several users highlight the ability to assign different models to subagents — burning expensive models on complex tasks while routing simpler work to cheaper alternatives — as a uniquely practical feature. The plugin ecosystem is flourishing: one developer built annotation tools that let you mark up an LLM's plan like a Google doc; another created a data engineering fork for agentic data tasks.
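The per-subagent model assignment that commenters praise boils down to cost-aware routing. A toy sketch of the idea — the tier names, model names, and keyword heuristic are all placeholders, not OpenCode's actual configuration or API:

```python
# Hypothetical cost-aware router: expensive model for hard tasks, cheap for routine work.
MODEL_TIERS = {
    "cheap": "small-local-model",  # placeholder names, not real model IDs
    "premium": "frontier-model",
}

def pick_model(task: str) -> str:
    """Route a subagent task to a model tier via a crude complexity heuristic."""
    hard_signals = ("architecture", "refactor", "debug", "design")
    tier = "premium" if any(s in task.lower() for s in hard_signals) else "cheap"
    return MODEL_TIERS[tier]
```

In practice users report doing this by configuration rather than keyword matching — pinning a premium model to a planning subagent and a cheap one to mechanical edits — but the economics are the same: spend frontier tokens only where reasoning depth pays for itself.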
But trust remains contested. Multiple commenters flag that OpenCode sends telemetry to its own servers by default, even when running local models — and disabling it requires a source code change, not an environment variable. The project's strained relationship with Anthropic (which blocked direct Claude subscription usage) provoked sharp reactions. One commenter pointedly asks: "120k stars. how many are shipping production code with it though? starring is free, debugging at 2am is not." The gap between enthusiasm and production confidence is the story within the story.