<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.massaad.ca/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.massaad.ca/" rel="alternate" type="text/html" /><updated>2026-04-27T15:54:52+00:00</updated><id>https://www.massaad.ca/feed.xml</id><title type="html">Alex Massaad | Blog of various wonders</title><subtitle>The blog of Alexander Massaad. I write about various topics including ruby code, software/hardware hacks,  side projects, and whatever else I find interesting.</subtitle><entry><title type="html">The team member helping you just clocked out</title><link href="https://www.massaad.ca/ai/developer-tools/opinion/2026/04/20/the-team-member-just-clocked-out.html" rel="alternate" type="text/html" title="The team member helping you just clocked out" /><published>2026-04-20T14:00:00+00:00</published><updated>2026-04-20T14:00:00+00:00</updated><id>https://www.massaad.ca/ai/developer-tools/opinion/2026/04/20/the-team-member-just-clocked-out</id><content type="html" xml:base="https://www.massaad.ca/ai/developer-tools/opinion/2026/04/20/the-team-member-just-clocked-out.html"><![CDATA[<p><img src="/assets/img/blog/claude-outage-space.jpg" alt="Astronaut denied entry to a spaceship while holding a laptop — the AI access denied at the launchpad" /></p>

<p>An AI tool going down mid-workday is like a fire alarm going off all afternoon. You can’t focus, you can’t ignore it, and there’s no actual fire. Just a building full of people standing in the parking lot wondering if they should go get coffee.</p>

<p>March 2, 2026: Claude.ai down for about fourteen hours. March 11: an OAuth bug locks Claude Code users out while the API itself is happily running. April 15: Opus 4.6 throwing errors across the chatbot, the CLI, and the API. Three big ones in six weeks, plus the smaller ones I’ve stopped counting.</p>

<p>I’m not panicking. We don’t run any LLMs in production at Victoria Garland. But I am annoyed, because the “team member” who was helping me build this morning just clocked out without telling anyone, and they took my context with them.</p>

<hr />

<h2 id="what-991-actually-buys-you">What 99.1% actually buys you</h2>

<p>Claude’s API has been running at about 99.1% uptime over the last 90 days. That sounds great. It’s the kind of number you’d put on a slide.</p>

<p>It is, until you do the math. 99.1% over a quarter is roughly nineteen hours of downtime. Nineteen hours a quarter that your most-loaded teammate, the one with all your context, all your half-formed plans, all your in-flight refactors, is unreachable.</p>
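<p>The arithmetic is worth doing once:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Downtime implied by 99.1% uptime over a 90-day window.
hours_in_window = 90 * 24                  # 2160
downtime_hours = (1 - 0.991) * hours_in_window
puts downtime_hours.round(1)               # =&gt; 19.4
</code></pre></div></div>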

<p>Now scale that. 95% of developers report using AI coding tools at least weekly. 75% lean on them for more than half their actual coding. The average experienced dev runs 2.3 AI tools concurrently. We have, collectively and very quickly, made an entire profession’s productivity dependent on a handful of providers running in a handful of data centers.</p>

<p>We’re starting to find out what that costs.</p>

<hr />

<h2 id="build-time-vs-production-ai">Build-time vs. production AI</h2>

<p>Here’s the distinction I keep coming back to, because it changes the whole conversation.</p>

<p>If you put an LLM in your production path (your checkout flow, your support chat, your search ranking), an outage is catastrophic. Customers see it. Revenue stops. The pager goes off and someone has to explain it to a stakeholder who didn’t know there was an LLM in there to begin with.</p>

<p>If you put an LLM in your build process (generating code, drafting copy, researching, writing tests), an outage is annoying. The fire alarm. You can stand in the parking lot for a few hours, you’ll probably survive.</p>

<p>We’re squarely in the second camp. AI is part of how we build, but the things we ship to clients don’t call out to a model in real time. Build, then share. That’s a <em>for now</em> rule, not a forever rule. For now, the math works. Build-time outages are a productivity tax, not a customer-facing incident.</p>

<p>The catch is that “productivity tax” is doing a lot of heavy lifting in that sentence.</p>

<hr />

<h2 id="the-thing-that-actually-hurts-is-in-flight-context">The thing that actually hurts is in-flight context</h2>

<p>When Claude blinks out at hour three of a session, the loss isn’t the API call that failed. The loss is the conversation behind it. The plan you talked through. The five files you’d already pulled in. The mental model you co-built with the tool, half of it sitting in the chat history and half of it sitting in your head.</p>

<p>You can switch to another model. I do. The problem is that the other model wasn’t <em>there</em>. It doesn’t know what you decided fifteen turns ago. It doesn’t know which approach you ruled out. You either re-explain everything (slow, lossy, irritating), or you start over and pretend the last two hours didn’t happen.</p>

<p>The pain scales with how multi-step the workflow is. Single-shot autocomplete? Trivial to swap. Copilot, Cursor, whatever, doesn’t matter. A long agentic session orchestrating multiple steps with ambiguous boundaries (is this a chat? is it a model generating something in the cloud? am I waiting on the agent or on a render?) is a different problem. Mid-stream is the worst place to lose your tool.</p>

<p>This is where I want software to catch up. I’d love a local transcript layer where the conversation isn’t owned by any single provider, so when one model goes dark, a different model can pick up the thread and continue. Maybe replay the last few turns to re-prime, then keep going. I’m guessing this exists in some form already. I haven’t had time to build it. If it doesn’t exist yet, somebody please ship it.</p>
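<p>A minimal sketch of what I mean, with every name hypothetical since (as far as I know) this tool doesn’t exist yet: the transcript lives locally as plain JSONL, and any client that accepts a message history can replay it.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical sketch: a provider-neutral transcript you own locally.
require "fileutils"
require "json"

class Transcript
  PATH = File.expand_path("~/.ai-transcripts/session.jsonl")

  def self.append(role, content)
    FileUtils.mkdir_p(File.dirname(PATH))
    File.open(PATH, "a") { |f| f.puts({ role: role, content: content }.to_json) }
  end

  # Re-prime a different model with the last few turns when one goes dark.
  def self.replay(last_n: 10)
    return [] unless File.exist?(PATH)
    File.readlines(PATH).last(last_n).map { |line| JSON.parse(line) }
  end
end

# fallback_client.chat(messages: Transcript.replay)   # client is a stand-in
</code></pre></div></div>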

<hr />

<h2 id="we-cant-go-to-space-with-claude-yet">We can’t go to space with Claude (yet)</h2>

<p>Here’s the bigger structural point, because it’s not really about Claude.</p>

<p>As a tool, the current generation of AI is excellent. I’d be a worse engineer without it. I’m not turning it off. But as a <em>redundant, reliable system</em>, the kind you’d build a real piece of infrastructure on, we’re seeing the flaws in real time. Three significant outages in six weeks isn’t a fluke. It’s the natural consequence of one company hitting #1 on the App Store while simultaneously serving a meaningful chunk of the global developer workforce. Consumer demand crushes the same infra that paid devs depend on. Welcome to the trade-off.</p>

<p>We can’t go to space with Claude yet. We can’t run the air traffic control system on Claude. We probably shouldn’t run our checkout on Claude either, unless we’ve thought very hard about the fallback path.</p>

<p>The fix isn’t “stop using Claude.” The fix is treating any single-provider AI dependency the way you’d treat any single point of failure. Have a backup model ready. Keep your prompts portable. Don’t let the multi-step workflow get so tangled with one provider’s quirks that switching costs you a day. Pay attention to where the LLM lives in your system. Build-time you can ride out, customer-facing you cannot.</p>
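<p>The backup-model part doesn’t need to be fancy. A sketch of the shape, with the client objects as stand-ins for whatever SDKs you actually use:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: walk a list of providers until one answers.
def complete(prompt, providers)
  providers.each do |client|
    return client.complete(prompt)
  rescue StandardError =&gt; e
    warn "#{client.class} failed (#{e.message}); trying the next one"
  end
  raise "Every provider is down. Go walk the dog."
end
</code></pre></div></div>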

<p>And when the next outage comes, and it’s coming, don’t take it personally. Your teammate just clocked out. They’ll be back. Go take the dog for a walk.</p>

<p><em>Shameless plug: At <a href="https://victoriagarland.ca">Victoria Garland</a> we build serious Shopify infrastructure for merchants who want their store to keep running whether or not the AI hype cycle is having a good day.</em></p>]]></content><author><name></name></author><category term="ai" /><category term="developer-tools" /><category term="opinion" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Shopify Kills the $1M Exemption: The Platform Grew Up and We’re Surprised?</title><link href="https://www.massaad.ca/shopify/apps/platform-economics/business/2026/04/06/shopify-kills-the-1m-exemption.html" rel="alternate" type="text/html" title="Shopify Kills the $1M Exemption: The Platform Grew Up and We’re Surprised?" /><published>2026-04-06T14:00:00+00:00</published><updated>2026-04-06T14:00:00+00:00</updated><id>https://www.massaad.ca/shopify/apps/platform-economics/business/2026/04/06/shopify-kills-the-1m-exemption</id><content type="html" xml:base="https://www.massaad.ca/shopify/apps/platform-economics/business/2026/04/06/shopify-kills-the-1m-exemption.html"><![CDATA[<p>If you build Shopify apps, you probably got The Email.</p>

<p>The one that said your annual $1M revenue exemption is now a lifetime $1M exemption. As in: once you’ve made a million total, you pay 15% on everything, forever.</p>

<p>The developer community lost its mind. I read the threads, the hot takes, the LinkedIn outrage. And I kept thinking the same thing: <em>this is a small ask.</em></p>

<!--more-->

<hr />

<h2 id="the-old-deal-was-absurd-in-a-good-way">The old deal was absurd (in a good way)</h2>

<p>Let’s be clear about what we had. Every year, Shopify app developers kept the first $1M in revenue completely free. No rev share. Zero. Then it reset in January and you got another million, free again.</p>

<p>If you were making $2M a year, you were paying 15% on only half your revenue. Every single year. That’s an incredibly generous deal. I don’t think people fully appreciated how unusual it was.</p>

<p>Apple takes 30%. Google takes 30%. Even their small business programs cap at $1M <em>annually</em> and still charge 15% on everything under it. Shopify was handing developers the first million and walking away. Every January, like clockwork.</p>

<p>According to Glen Coates, Shopify’s VP of Product, only “a few hundred” developers out of tens of thousands were actually benefiting from the annual reset. A few hundred. Out of 16,000+ apps. That’s who this change affects.</p>

<p>The developers making $50K a year from their app? Nothing changed. The first $1M is still free — they’ll never hit it. The developers making $200K? Same story. The lifetime cap is so far above them that this policy is invisible.</p>

<p>The people this hurts are the ones who were already doing very well.</p>

<hr />

<h2 id="i-remember-the-old-shopify">I remember the old Shopify</h2>

<p>Here’s where it gets personal for me.</p>

<p>I worked at Shopify. I remember when it felt like a startup that genuinely loved its developers. The API was open, the documentation was solid (mostly), the partner program was a real invitation. Build something useful, put it in the app store, and Shopify would stay out of your way.</p>

<p>That era shaped how I think about platforms. It’s why Victoria Garland builds on Shopify. It’s why I still tell clients that the Shopify ecosystem is one of the best places to build an ecommerce business. I believe that.</p>

<p>But I also remember that Shopify went public in 2015. And public companies have shareholders. And shareholders want growth. And growth, eventually, means extracting more value from the ecosystem you built.</p>

<p>This isn’t betrayal. It’s arithmetic.</p>

<p>Every platform follows the same arc. You attract developers with generous terms. You build an ecosystem. You go public. And then, slowly, the terms get less generous. Not because anyone in the building is evil, but because the incentives shifted the day the stock started trading.</p>

<p>I’ve watched this happen at Apple. I’ve watched it happen at Google. Shopify held out longer than most, honestly. The $1M annual exemption lasted years. That’s more patience than most public companies show their developer ecosystems.</p>

<hr />

<h2 id="the-outrage-is-real-but-misplaced">The outrage is real but misplaced</h2>

<p>I’m not dismissing the frustration. If you’re an app developer who was counting on that annual reset, and now you’re looking at $450K more in rev share over five years, that’s a real number. It changes your margins. It changes your roadmap. Maybe it changes whether you hire that next developer.</p>
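<p>The exact delta depends on your revenue curve, but the shape of it is easy to sketch. Illustrative numbers only:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: annual reset vs. lifetime cap, 15% rev share.
REV_SHARE = 0.15
EXEMPTION = 1_000_000

def old_deal(annual_revenue, years)   # first $1M free, every year
  years * [annual_revenue - EXEMPTION, 0].max * REV_SHARE
end

def new_deal(annual_revenue, years)   # first $1M free, once, ever
  [annual_revenue * years - EXEMPTION, 0].max * REV_SHARE
end

puts old_deal(2_000_000, 5)   # =&gt; 750000.0
puts new_deal(2_000_000, 5)   # =&gt; 1350000.0
</code></pre></div></div>

<p>At $2M a year, that’s $600K more over five years than the old terms. The number scales with how good your years are.</p>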

<p>But the framing online — that Shopify is killing the developer ecosystem, that this is the end of independent app development on the platform — is wildly out of proportion.</p>

<p>The first million is still free. The rate is 15%, not 30%. Shopify paid out over $1B to app developers. The ecosystem is massive and growing.</p>

<p>What actually happened is that a very good deal became a pretty good deal. And a few hundred developers who had the best deal in the industry now have a deal that’s merely better than most.</p>

<p>That’s the part nobody wants to say out loud.</p>

<hr />

<h2 id="so-is-it-still-worth-building-on-shopify">So is it still worth building on Shopify?</h2>

<p>Yes. But with eyes open.</p>

<p>I’m saying this as someone who builds Shopify apps right now at Victoria Garland. We’re in it. JourneyGlow, PriceGlow, CrowdShop, StockGlow, SpeedGlow. We’re not on the sidelines commenting. We’re writing Liquid and deploying to Gadget and dealing with the same API versioning headaches as everyone else.</p>

<p>It’s still worth it because the merchant base is real. The problems are real. The money is real. And 15% after your first million is still a better deal than most platforms offer from dollar one.</p>

<p>What you can’t do is build on any platform like it owes you something. Shopify doesn’t owe developers a $1M annual gift. Apple doesn’t owe developers a 15% rate. No platform owes its ecosystem permanent generosity.</p>

<p>The deal is: you get access to merchants, distribution, and infrastructure. In exchange, the platform takes a cut. If the cut changes, you adapt. If the platform builds something that competes with your app, you find a new angle. That’s the game.</p>

<p>I remember the small-business, entrepreneurship-loving Shopify. I do. And there’s a version of me that misses it.</p>

<p>But the Shopify that exists today, the one with shareholders and quarterly earnings and 16,000 apps, was always where this was heading. The generous era wasn’t the real Shopify. It was the startup phase of a company that was always going to grow up.</p>

<p>The rest of us just have to grow up with it.</p>

<p><em>Shameless plug: At <a href="https://victoriagarland.ca">Victoria Garland</a>, we build Shopify apps and custom integrations for merchants who need things done right. We’ve been in this ecosystem long enough to know where the lines are, and how to build something worth keeping.</em></p>]]></content><author><name></name></author><category term="shopify" /><category term="apps" /><category term="platform-economics" /><category term="business" /><summary type="html"><![CDATA[If you build Shopify apps, you probably got The Email. The one that said your annual $1M revenue exemption is now a lifetime $1M exemption. As in: once you’ve made a million total, you pay 15% on everything, forever. The developer community lost its mind. I read the threads, the hot takes, the LinkedIn outrage. And I kept thinking the same thing: this is a small ask.]]></summary></entry><entry><title type="html">The 100-Hour Gap Between a Vibecoded Prototype and a Working Product</title><link href="https://www.massaad.ca/ai/development/vibe-coding/2026/03/23/the-100-hour-gap.html" rel="alternate" type="text/html" title="The 100-Hour Gap Between a Vibecoded Prototype and a Working Product" /><published>2026-03-23T13:00:00+00:00</published><updated>2026-03-23T13:00:00+00:00</updated><id>https://www.massaad.ca/ai/development/vibe-coding/2026/03/23/the-100-hour-gap</id><content type="html" xml:base="https://www.massaad.ca/ai/development/vibe-coding/2026/03/23/the-100-hour-gap.html"><![CDATA[<p>I had a working scheduling app in three days.</p>

<p>Recurring events rendered on screen, synced to Google Calendar, the whole thing. I called it Claro. It looked like a product. It felt like a product. Then the app started putting fourteen events on a Tuesday morning, some of them in the past. It took months to fix.</p>

<p>The prototype was the easy part. It’s always the easy part.</p>

<!--more-->

<hr />

<h2 id="the-fun-part">The fun part</h2>

<p>Claro was a personal project, a rebuild of a scheduling app I’d been thinking about for a while. I wanted to see how much AI could actually do, so I gave it a long leash. I didn’t prompt it on the data model. I didn’t spec out the recurrence algorithm. I just described what I wanted and watched what it planned.</p>

<p>And for the first few days, it felt like flying. I had screens, a calendar view, events rendering, Google Calendar sync. The momentum was intoxicating. You don’t notice you’re accumulating technical debt when the code is appearing that fast. Everything looks right because everything <em>looks</em> right: the UI is there, the data flows, the buttons work. You mistake the presence of a working demo for the presence of a working product.</p>

<p>The METR study I’ve <a href="https://www.linkedin.com/in/alexthegreat/">talked about before</a> found that AI made experienced developers 19% slower while they believed they were 20% faster. That 39-point gap between perception and reality? I was living inside it. The prototype was real. My understanding of how it worked was not.</p>

<hr />

<h2 id="fourteen-events-on-a-tuesday">Fourteen events on a Tuesday</h2>

<p>The recurrence logic was the first thing to break. The algorithm had no concept of a calculation horizon. It would generate events into the future with no boundary, or sometimes not far enough. One morning I opened Google Calendar and there were fourteen instances of the same event stacked on a single day. Some of them were in the past, as if the app had decided retroactively that I’d been busy yesterday morning.</p>
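<p>What was missing is conceptually tiny: a hard boundary on the expansion window. A minimal sketch of the idea (the real engine also has to survive time zones and DST, which is where the actual pain lived):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch: expand a weekly series only inside a bounded horizon.
require "date"

def expand_weekly(series_start, horizon_start:, horizon_end:)
  occurrence = series_start
  occurrence += 7 until occurrence &gt;= horizon_start   # never emit the past
  occurrences = []
  while occurrence &lt;= horizon_end
    occurrences &lt;&lt; occurrence
    occurrence += 7
  end
  occurrences
end

today = Date.today
expand_weekly(today - 70, horizon_start: today, horizon_end: today + 28)
# =&gt; five dates, none of them in the past
</code></pre></div></div>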

<p>The thing is, I didn’t design the recurrence engine. The AI did. I approved it without fully understanding the horizon calculation, the series generation, the edge cases around time zones and daylight saving. It’s like inheriting a codebase from a developer who quit (except that developer was me ten minutes ago).</p>

<p>I’ve always considered writing software to be like telling a story. It has to make sense. There should be a narrative you can follow from data model to UI. When AI writes the first draft of the story, the code compiles and the tests pass, but the narrative is missing. I couldn’t explain why Claro calculated horizons the way it did. And if you can’t explain your own code, you can’t debug it and you definitely can’t hand it to someone else.</p>

<hr />

<h2 id="the-ai-fixed-its-own-mess-eventually">The AI fixed its own mess (eventually)</h2>

<p>Here’s the part that complicates the narrative: the AI did eventually solve the recurrence problem. Not in a flash of brilliance, but in a slow, iterative feedback loop over months.</p>

<p>I set up Playwright tests that could see the actual output in Google Calendar. The app syncs events to your calendar, so when the recurrence logic was wrong, the evidence was right there: fourteen events on a Tuesday, events in the past, missing series. I’d describe the problem, point the AI at the Playwright results, and let it write tests and code itself out of the hole.</p>
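<p>The checks themselves were boring, which is the point. In spirit, they looked like this (the real versions ran through Playwright against the live calendar):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># In spirit: no stacked duplicates, nothing scheduled in the past.
require "date"

def recurrence_sane?(events, today: Date.today)
  stacked = events.group_by { |e| [e[:series_id], e[:date]] }
                  .values.any? { |group| group.size &gt; 1 }
  in_the_past = events.any? { |e| e[:date] &lt; today }
  !stacked &amp;&amp; !in_the_past
end

recurrence_sane?([{ series_id: "standup", date: Date.today + 1 }])   # =&gt; true
</code></pre></div></div>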

<p>It worked. Months later, the recurrence engine was solid. But “months later it was resolved” is doing a lot of heavy lifting in that sentence. The prototype took days. The fix took months of coming back to it, giving it new test cases, letting it see its own failures. The 100 hours aren’t gone. They’re just spent differently. Instead of writing the fix yourself, you’re supervising an intern who’s very fast and very confident and occasionally puts fourteen events on a Tuesday.</p>

<hr />

<h2 id="the-agency-version-of-this">The agency version of this</h2>

<p>At Victoria Garland, we see the same gap from the other side of the table.</p>

<p>Coding is maybe 40% of a Shopify build. The rest is QA, client feedback rounds, content migration, launch checklists, training the client’s team on how to actually use the thing. AI speeds up that 40%, and then the client looks at the timeline and says <em>why does this still take six weeks?</em></p>

<p>Because the six weeks was never about writing code. It was about everything around the code.</p>

<p>We produce code but allow about a month for the feature to bake in. This is when the edge cases hit. This is when someone says “I wish the button was over there, it would be ten times faster for my workflow.” You can’t compress that. It literally takes time for humans to use software, perceive it, and tell you everything you got wrong. Even if the code can be built in a flash, we can’t comprehend what we’ve built until we’ve lived with it for a while. The perception takes time.</p>

<p>A <a href="https://news.ycombinator.com/item?id=47386636">Hacker News thread with 332 comments</a> put it plainly: “Testing workloads that take hours to run still take hours to run with either a human or LLM.” AI writes the code. It doesn’t watch the pipeline. It doesn’t sit with the client while they try to figure out why the button isn’t where they expected it.</p>

<hr />

<h2 id="the-gap-between-prototype-and-product">The gap between prototype and product</h2>

<p>The 100-hour gap is a description of what software actually is.</p>

<p>Software is code plus the slow, human process of discovering whether it actually works for the people using it. That process hasn’t gotten faster. The first day got faster. The next thirty didn’t.</p>

<p>The developers are the gatekeepers of this. We’re the ones who need to guide the client toward the correct process: slowly and intentionally building something that runs solidly for a long time. You can build a hasty tent in an afternoon. It’ll blow over with the next weekend’s wind gust. Or you can take the time to anchor it properly and end up with something that actually shelters people.</p>

<p>AI made the prototype trivial. The product is still the hard part. The 100 hours are still the 100 hours. They just look different now.</p>

<hr />

<p><em>Shameless plug: At <a href="https://victoriagarland.ca/">Victoria Garland</a>, we build Shopify stores that survive past the prototype phase. We’ve done our time in the 100-hour gap so our clients don’t have to. If you’re building something on Shopify and want it done right, <a href="https://victoriagarland.ca/">let’s talk</a>.</em></p>]]></content><author><name></name></author><category term="ai" /><category term="development" /><category term="vibe-coding" /><summary type="html"><![CDATA[I had a working scheduling app in three days. Recurring events rendered on screen, synced to Google Calendar, the whole thing. I called it Claro. It looked like a product. It felt like a product. Then the app started putting fourteen events on a Tuesday morning, some of them in the past. It took months to fix. The prototype was the easy part. It’s always the easy part.]]></summary></entry><entry><title type="html">The Worst API Integration I Ever Built: A Horror Story in Three Acts</title><link href="https://www.massaad.ca/api/shopify/zendesk/war-stories/career/2026/03/16/the-worst-api-integration-i-ever-built.html" rel="alternate" type="text/html" title="The Worst API Integration I Ever Built: A Horror Story in Three Acts" /><published>2026-03-16T14:00:00+00:00</published><updated>2026-03-16T14:00:00+00:00</updated><id>https://www.massaad.ca/api/shopify/zendesk/war-stories/career/2026/03/16/the-worst-api-integration-i-ever-built</id><content type="html" xml:base="https://www.massaad.ca/api/shopify/zendesk/war-stories/career/2026/03/16/the-worst-api-integration-i-ever-built.html"><![CDATA[<p>The reason I learned to code was a TV on a wall.</p>

<p>Shopify’s support team in Ottawa had these leaderboard dashboards: who closed the most tickets, who had the best Smiley scores, who was winning. I wanted to put my own data on those screens. I wanted to integrate with the internet, build things, display information <em>my</em> way. So I taught myself Ruby, started deploying little apps to Heroku, and eventually someone noticed.</p>

<p><img src="/assets/img/blog/dashing.jpg" alt="Shopify's Dashing framework — the default dashboard that started it all" />
<em>The default Dashing dashboard. Shopify open-sourced this framework and every team ran their own version. It’s defunct now, but in 2013 these screens were everywhere.</em></p>

<!--more-->

<p><img src="/assets/img/blog/my-dash.jpg" alt="My personal Dashing dashboard — hydro data, cat gifs, weather, and whatever else I felt like scraping" />
<em>My own dashboard. Hydro Ottawa usage graphs (apparently lagging badly!), cat gifs, weather, and countdowns — if I could scrape it, it went on the screen.</em></p>

<p>“Hey, you know code now, right? Can you (you can) fix our Zendesk integration?”</p>

<p>I should have said no.</p>

<hr />

<h2 id="act-1-optimism">Act 1: Optimism</h2>

<p>It was 2013. Shopify had maybe 100,000 merchants, and support was a massive part of the business. Every ticket, every call, every chat flowed through Zendesk. The sales team lived and died by attribution. If a ticket wasn’t synced properly, their numbers didn’t show up on the leaderboard. With no leaderboard credit, there was no bonus. People cared about those dashboards the way traders care about Bloomberg terminals.</p>

<p><img src="/assets/img/blog/waterloo-dash.jpg" alt="The mega dashboard at Shopify's Waterloo office" />
<em>The Waterloo office mega dashboard. This thing was massive in the former distillery’s barrel-lined walls.</em></p>

<p><img src="/assets/img/blog/mtl-dash.jpg" alt="Dashing dashboards lining the walls of Shopify's Montreal office" />
<em>The Montreal office dashboards. This is the best representation of what those screens actually looked like day-to-day: mounted on brick walls, always on, always watching.</em></p>

<p>The integration was supposed to be straightforward. Sync data between Shopify’s internal systems, our phone system, and our Zendesk instance. Push events in, pull reports out. I’d been writing Ruby for a few months. I’d deployed a few apps. How hard could it be?</p>

<p>I want to tell you that I had a bad feeling from the start, but I didn’t. I was a junior developer who’d just discovered that code could make things happen in the real world. The developers at Shopify were rockstars. They could push one change and it would ripple across thousands of stores and be seen by millions. I wanted that. And now someone was handing me the same kind of blast-radius work. Me, the former magician and support guy who taught himself Ruby off a leaderboard TV.</p>

<p>That thrill lasted about 48 hours.</p>

<hr />

<h2 id="act-2-the-documentation-lies">Act 2: The Documentation Lies</h2>

<p>Here’s the thing about the Zendesk API in 2013: it wasn’t <em>broken</em>, exactly. It was just non-obvious in a hundred small ways that compounded into something painful.</p>

<p>The docs would say one thing. The API would do something slightly different. Rate limits would kick in at seemingly random thresholds, which matters a lot when you’re a hypergrowth company sending all sorts of data to a single Zendesk instance. And we were sending <em>a lot</em>. Probably too much, honestly.</p>

<p>I’d write a sync job, test it with a handful of records, ship it, and then watch it choke at scale. Retry logic wrapped in more retry logic. Error handling that was 80% of the codebase:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># I wish I was kidding</span>
<span class="k">def</span> <span class="nf">sync_ticket</span><span class="p">(</span><span class="n">ticket</span><span class="p">)</span>
  <span class="n">retries</span> <span class="o">=</span> <span class="mi">0</span>
  <span class="k">begin</span>
    <span class="n">zendesk_client</span><span class="p">.</span><span class="nf">tickets</span><span class="p">.</span><span class="nf">update</span><span class="p">(</span><span class="n">ticket</span><span class="p">)</span>
  <span class="k">rescue</span> <span class="no">ZendeskAPI</span><span class="o">::</span><span class="no">Error</span><span class="o">::</span><span class="no">RateLimited</span>
    <span class="n">retries</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="nb">sleep</span><span class="p">(</span><span class="n">retries</span> <span class="o">*</span> <span class="mi">30</span><span class="p">)</span> <span class="c1"># "exponential" backoff (it's hope)</span>
    <span class="k">retry</span> <span class="k">if</span> <span class="n">retries</span> <span class="o">&lt;</span> <span class="mi">5</span> <span class="c1"># why 5? no reason. felt right.</span>
  <span class="k">rescue</span> <span class="no">ZendeskAPI</span><span class="o">::</span><span class="no">Error</span><span class="o">::</span><span class="no">RecordNotFound</span>
    <span class="c1"># it existed five seconds ago but sure, ok</span>
    <span class="n">create_ticket_instead</span><span class="p">(</span><span class="n">ticket</span><span class="p">)</span>
  <span class="k">rescue</span> <span class="no">ZendeskAPI</span><span class="o">::</span><span class="no">Error</span><span class="o">::</span><span class="no">InvalidEndpoint</span>
    <span class="c1"># the docs said this endpoint works. the docs lied.</span>
    <span class="n">try_the_other_endpoint</span><span class="p">(</span><span class="n">ticket</span><span class="p">)</span>
  <span class="k">rescue</span> <span class="o">=&gt;</span> <span class="n">e</span>
    <span class="c1"># look, if we've gotten here, god help us</span>
    <span class="n">log_and_hope_nobody_notices</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The worst part wasn’t any single bug. It was the slow accumulation of workarounds. Each one made sense in isolation. Together, they formed something that looked less like an integration and more like a Jenga tower held together with hope.</p>

<p>And then someone from the sales floor would walk over: “Hey, my tickets aren’t showing up on the board. My numbers look wrong.”</p>

<p>Translation: your code is costing me money.</p>

<p>No pressure.</p>
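<p>For the record, the sane version isn’t much longer. Something like this, with real exponential backoff, a hard cap, and a loud failure instead of a silent one (a sketch, not the code I actually shipped):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># What I'd write today: bounded retries, actual exponential backoff
# with jitter, and a re-raise instead of log_and_hope_nobody_notices.
MAX_RETRIES = 5

def sync_ticket(ticket)
  retries = 0
  begin
    zendesk_client.tickets.update(ticket)
  rescue ZendeskAPI::Error::RateLimited
    raise if retries &gt;= MAX_RETRIES
    retries += 1
    sleep((2**retries) + rand)   # 2, 4, 8... seconds, plus jitter
    retry
  end
end
</code></pre></div></div>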

<hr />

<h2 id="act-3-the-jaded-pixels">Act 3: The Jaded Pixels</h2>

<p>Shopify’s original corporate name was Jaded Pixel Technologies Inc. Our emails were @jadedpixel.com. I didn’t think much of the name at the time.</p>

<p>But when you’re a junior developer staring at a failing sync job at 6 PM, and you walk over to a senior dev’s desk hoping for guidance, and they glance at your screen, say “yeah, that endpoint’s weird,” and turn back to their monitor: you start to understand the name.</p>

<p>They weren’t bad people. They’d just seen it all. Every API quirk, every rate limit dance, every integration that was supposed to take a week and took three months. They were jaded. And when you’re jaded, you don’t have the energy to walk a junior through the ninth weird thing they’ve hit today. You give them a cryptic one-liner and hope they figure it out.</p>

<p>I figured most of it out. Eventually. Through the time-honored method of reading Stack Overflow until 11 PM and deploying code that technically worked but that I wouldn’t want anyone to review.</p>

<p>The integration stabilized. The leaderboards updated. The sales team got their numbers. Nobody sent me flowers.</p>

<hr />

<h2 id="the-contrast-that-stuck-with-me">The Contrast That Stuck With Me</h2>

<p>Years later, I heard the story about Twilio’s CEO walking into a pitch meeting with nothing but a laptop. He live-coded an integration right there and sent a text message from a few lines of code. That was the whole pitch deck. The API <em>was</em> the product demo because it was simple enough to explain to non-technical people in real time.</p>

<p>I think about that a lot. A great API is so simple it’s a pitch deck. A bad API is a horror story in three acts.</p>

<p>The Zendesk API wasn’t malicious. It wasn’t even the worst API I’ve encountered since. But it was my first, and I was alone with it. That’s the part that actually matters.</p>

<hr />

<h2 id="what-junior-devs-absorb">What Junior Devs Absorb</h2>

<p>Bad tooling hurts everyone, but junior developers absorb the pain differently. When a senior dev hits a weird API, they know the API is weird. When a junior dev hits a weird API, they think <em>they’re</em> weird. They can’t tell if the docs are wrong or if they’re reading them wrong. They don’t know if the rate limit is unreasonable or if they’re doing something unreasonable.</p>

<p>This matters more now than it did in 2013. AI tools are turning non-technical people into junior developers overnight. Someone who’s never written code before can vibe their way into a working app with Claude or Copilot, and that’s genuinely exciting.</p>

<p>I’ve always thought individuals from a non-technical background (e.g., non-CompSci grads) make the best developers because they bring a fresh perspective and experience to the table. But the moment they hit a bad API, a confusing error, or docs that don’t match reality, they have zero frame of reference. At least I knew what an HTTP status code was. The new wave of AI-assisted builders might not even have that.</p>

<p>I spent months wondering if I was just bad at this. It turns out that the API was genuinely painful, but I had no frame of reference to know that! I just assumed everyone else would have figured it out faster.</p>

<p>If you’re senior and you see a junior wrestling with a bad integration: don’t just say “yeah, that endpoint’s weird.” Sit down for ten minutes. Confirm that the API is, in fact, the problem and not them. Pair programming is a force-multiplier, and that validation is worth more than any code review.</p>

<p>And if you’re a junior reading this and currently fighting an API that makes you feel stupid: it might not be you. Some APIs are just bad. The fact that you’re still trying means you’re doing fine.</p>

<hr />

<p><em>Shameless plug: At <a href="https://victoriagarland.ca/">Victoria Garland</a>, we build Shopify integrations that don’t make people cry. I’ve done my time in API purgatory so our clients don’t have to. If you’re building something on Shopify and want it done right, <a href="https://victoriagarland.ca/">let’s talk</a>.</em></p>]]></content><author><name></name></author><category term="api" /><category term="shopify" /><category term="zendesk" /><category term="war-stories" /><category term="career" /><summary type="html"><![CDATA[The reason I learned to code was a TV on a wall. Shopify’s support team in Ottawa had these leaderboard dashboards: who closed the most tickets, who had the best Smiley scores, who was winning. I wanted to put my own data on those screens. I wanted to integrate with the internet, build things, display information my way. So I taught myself Ruby, started deploying little apps to Heroku, and eventually someone noticed. The default Dashing dashboard. Shopify open-sourced this framework and every team ran their own version. It’s defunct now, but in 2013 these screens were everywhere.]]></summary></entry><entry><title type="html">CLI Tools Beat MCPs Every Time</title><link href="https://www.massaad.ca/ai/developer-tools/2026/03/09/cli-tools-beat-mcps.html" rel="alternate" type="text/html" title="CLI Tools Beat MCPs Every Time" /><published>2026-03-09T08:00:00+00:00</published><updated>2026-03-09T08:00:00+00:00</updated><id>https://www.massaad.ca/ai/developer-tools/2026/03/09/cli-tools-beat-mcps</id><content type="html" xml:base="https://www.massaad.ca/ai/developer-tools/2026/03/09/cli-tools-beat-mcps.html"><![CDATA[<p>Here’s my hot take after a year of experimenting with AI agents: MCPs are mostly a solution to a problem that doesn’t exist.</p>

<p>Not all of them. But most of the ones being built right now, for tools that already have a solid CLI? They’re unnecessary. The AI doesn’t need them. It was already doing the thing.</p>

<p>I say this as someone who went on a genuine MCP binge. New integration drops, I installed it. Shiny new server for a tool I use daily, sign me up. I’ve tried probably a dozen at this point. The pattern became hard to ignore.</p>

<hr />

<h2 id="the-github-mcp-problem">The GitHub MCP problem</h2>

<p>The GitHub MCP was the one that finally made it click.</p>

<p>I set it up properly: authenticated, configured, the whole thing. It worked. I could ask my AI agent to check PR statuses, create issues, all of it through this nice structured integration. Great!</p>

<p>Then I noticed something. In other contexts, without the MCP at all, the agent was running <code class="language-plaintext highlighter-rouge">gh pr list</code>, <code class="language-plaintext highlighter-rouge">gh issue create</code>, <code class="language-plaintext highlighter-rouge">gh repo clone</code>: the whole GitHub CLI suite, completely unprompted, completely correctly. No special configuration. Just… it knew.</p>

<p>The MCP was a middleman between the model and a tool it already spoke fluently. I was adding plumbing to a tap that wasn’t dry.</p>

<p>So I uninstalled it. Shrugged. Moved on.</p>

<p>Then reinstalled it two months later when I forgot this lesson. Then uninstalled it again. I’ve done this cycle three times now.</p>

<hr />

<h2 id="why-the-cli-is-already-ais-first-language">Why the CLI is already AI’s first language</h2>

<p>Here’s what makes this obvious in retrospect: these models were trained on an enormous amount of terminal output. Man pages. README files. Stack Overflow answers packed with CLI commands and their outputs. GitHub issues full of shell sessions. Every <code class="language-plaintext highlighter-rouge">--help</code> flag, every <code class="language-plaintext highlighter-rouge">brew install</code>, every curl-pipe-bash that somebody pasted into a forum thread.</p>

<p>The command line isn’t just something AI agents can figure out. It’s closer to their native environment. They understand flags, exit codes, piped output, environment variables: not because someone wrote a custom integration, but because humans have been writing about this stuff for decades and it all ended up in the training data.</p>

<p>When you give an AI agent access to a terminal and tell it to interact with GitHub, it doesn’t need a structured API wrapper. It reaches for <code class="language-plaintext highlighter-rouge">gh</code> the same way a senior engineer would.</p>

<p>The same logic applies to most standard developer tooling. Good CLIs are well-documented, well-used, and therefore well-understood by the model. Shopify CLI, AWS CLI, <code class="language-plaintext highlighter-rouge">kubectl</code>, <code class="language-plaintext highlighter-rouge">docker</code>, <code class="language-plaintext highlighter-rouge">git</code>: I haven’t needed an MCP for any of them. I just let the agent use the terminal.</p>

<hr />

<h2 id="where-mcps-actually-make-sense">Where MCPs actually make sense</h2>

<p>I want to be fair here, because MCPs aren’t useless. They solve real problems in specific situations.</p>

<p>If you need complex OAuth flows and token management, a CLI isn’t going to cut it. If your tool streams data bidirectionally and needs the model to react to events in real time, you need something more than stdin/stdout. If your tool has <em>no</em> CLI at all (some proprietary internal system, some third-party service with only a web UI) then building an MCP is genuinely the right call.</p>

<p>The test I’ve landed on is simple: does a good CLI exist for this? If yes, try the CLI first. Don’t build the MCP until you’ve confirmed the AI can’t handle it through the terminal.</p>

<p>Most of the time, it can.</p>

<hr />

<h2 id="mcps-are-the-new-microservices">MCPs are the new microservices</h2>

<p>Here’s what’s actually happening culturally: MCPs have become a buzzword inside a buzzword.</p>

<p>We’re deep in an AI hype cycle, and MCPs are a shiny sub-ecosystem within it. There’s a particular kind of developer energy (I recognize it because I’ve felt it) that gets excited about building integrations. About making things connect. About having a little server that talks to another little server in a structured, typed, elegant way.</p>

<p>That energy is great. It builds ecosystems. It’s also how we ended up with microservices architectures for apps that would have been fine as a monolith.</p>

<p>The MCP space right now has a lot of “I built this because it was interesting to build” energy, and I say that without judgment because I’ve absolutely done the same thing. But if you’re reaching for an MCP because MCPs are <em>the thing you do</em> when you want AI to interact with a tool (rather than because the CLI genuinely wasn’t good enough) you might want to pump the brakes.</p>

<p>Shipping an MCP for a tool that already has an excellent, widely-documented CLI isn’t an integration. It’s a wrapper around something that didn’t need wrapping, with a maintenance burden attached.</p>

<hr />

<h2 id="the-practical-test">The practical test</h2>

<p>Before you build (or install) an MCP for a developer tool, ask yourself two things:</p>

<ol>
  <li>Does a CLI exist for this?</li>
  <li>Have I actually tried giving the AI access to the terminal and seeing what happens?</li>
</ol>

<p>If you’ve done step 2 and the model struggled, great! Now you have a real reason to build the MCP. But in my experience, step 2 usually surprises you.</p>

<p>I’ve watched Claude Code run Shopify CLI commands I didn’t know existed. I’ve seen it compose multi-step <code class="language-plaintext highlighter-rouge">gh</code> workflows that would have taken me longer to type manually. The agents aren’t doing anything magical. The CLI is just a format they know well: better than most people expect.</p>

<p>Try the CLI first. You might find you don’t need the MCP, and that’s one less thing to maintain, configure, and update when the underlying tool changes its API.</p>

<hr />

<p><em>I build integrations for a living at <a href="https://victoriagarland.ca/">Victoria Garland</a>, which maybe makes me the wrong person to tell you not to build integrations. But sometimes the CTO job is knowing which complexity to skip.</em></p>]]></content><author><name></name></author><category term="ai" /><category term="developer-tools" /><summary type="html"><![CDATA[Here’s my hot take after a year of experimenting with AI agents: MCPs are mostly a solution to a problem that doesn’t exist.]]></summary></entry><entry><title type="html">Vibe Coding Is Real, But It’s Not What Twitter Thinks</title><link href="https://www.massaad.ca/ai/developer-tools/2026/03/03/vibe-coding-is-real-but-its-not-what-twitter-thinks.html" rel="alternate" type="text/html" title="Vibe Coding Is Real, But It’s Not What Twitter Thinks" /><published>2026-03-03T15:00:00+00:00</published><updated>2026-03-03T15:00:00+00:00</updated><id>https://www.massaad.ca/ai/developer-tools/2026/03/03/vibe-coding-is-real-but-its-not-what-twitter-thinks</id><content type="html" xml:base="https://www.massaad.ca/ai/developer-tools/2026/03/03/vibe-coding-is-real-but-its-not-what-twitter-thinks.html"><![CDATA[<p>AI is your copilot (aptly named, GitHub) and you should always be ready to take over command of the aircraft.</p>

<p>A recent study found that experienced developers were 19% slower when using AI coding tools. The kicker? Those same developers believed they were 20% faster. That gap, between how productive AI <em>feels</em> and how productive it actually <em>is</em>, is the entire vibe coding conversation in one data point.</p>

<p>I use AI tools every day. Claude Code for maybe 20-30% of my work, VS Code and my own brain for the rest. And I’ve felt that gap firsthand. There are days where I feel like I’m flying and then I look at the clock and realize I spent 45 minutes going back and forth on something I could’ve written in 20.</p>

<p>The productivity gain is real. But it’s not where people think it is.</p>

<hr />

<h2 id="the-twitter-version-vs-the-real-version">The Twitter version vs. the real version</h2>

<p>The discourse around vibe coding has split into two camps, and both are wrong.</p>

<p>Camp one: people building todo apps in 5-minute demos and declaring that software engineering is over. Camp two: senior devs swearing off AI tools entirely because “real programmers type their own code.” Most working developers are somewhere in the middle: too busy actually shipping to argue about it online.</p>

<p>Here’s my version. I use Claude Code to scaffold new features, write tests, refactor messy Liquid templates. It handles a real chunk of the work. But I’m reading every line it gives me. Every single one. The “vibe” part is that I’m describing what I want in plain English instead of typing every character. The “not vibing” part is that I still have to know if what it gave me is good.</p>

<p>Those two things aren’t contradictory. They’re the whole point.</p>

<hr />

<h2 id="the-typing-was-never-the-bottleneck">The typing was never the bottleneck</h2>

<p>The biggest misconception about vibe coding is that it means you don’t need to understand code anymore. That you can just prompt your way to a production app and call it a day.</p>

<p>Building the thing was never the hard part. Maintaining it is. Debugging it at 2am is. Understanding why it breaks when a real user touches it in a way you didn’t anticipate. That’s the actual job. And none of that gets easier because an AI wrote the first draft.</p>

<p>If anything, it gets harder. When you write code yourself, you build a mental model of what it’s doing as you go. When AI writes it and you just approve it, that mental model has gaps. You don’t notice the gaps until something breaks and you’re staring at code you technically “wrote” but can’t quite explain.</p>

<p>The METR study I mentioned isn’t just a fun stat. It’s pointing at something real: AI tools make you <em>feel</em> faster because the typing part is fast. But the typing was never the bottleneck. The thinking was. And the thinking still takes the same amount of time, you just don’t notice you’re skipping it until later.</p>

<hr />

<h2 id="the-open-source-warning-sign">The open-source warning sign</h2>

<p>If you want to see what happens when vibe coding goes wrong at scale, look at open source right now.</p>

<p>Daniel Stenberg shut down cURL’s bug bounty after AI-generated submissions hit 20%. Not because AI can’t find bugs (it can) but because people were pointing AI at the issue tracker and hitting submit without reading what it produced. Mitchell Hashimoto banned AI code from Ghostty entirely. Steve Ruiz closed all external PRs to tldraw.</p>

<p>These aren’t Luddites. These are maintainers of critical, widely-used projects who got buried under confident-sounding contributions from people who never read the codebase. The PRs looked good. The grammar was perfect. The code compiled. And the maintainers still had to spend hours explaining why the approach was wrong, because the person who submitted it couldn’t defend it.</p>

<p>That’s vibe coding without the pilot. All copilot, no one at the controls.</p>

<hr />

<h2 id="where-the-real-gain-is">Where the real gain is</h2>

<p>So if the gain isn’t “write code faster,” where is it?</p>

<p>For me, it’s exploring ideas faster. I can sketch out three different approaches to a problem in the time it used to take to build one. I can ask “what if we structured it this way instead?” and get a working prototype in minutes, not hours. The speed isn’t in production, it’s in iteration.</p>

<p>It’s also in the boring stuff. Boilerplate, test scaffolding, file conversions, repetitive refactors. The stuff that was never intellectually challenging but ate hours anyway. AI is genuinely great at that, and offloading it frees up time for the thinking work that actually matters.</p>

<p>But here’s the thing nobody talks about: to use AI well for any of this, you need to already know what good looks like. You need to read the output and catch the subtle bug. You need to know when the approach is wrong even though the code runs. You need the fundamentals.</p>

<hr />

<h2 id="a-tool-not-a-shortcut">A tool, not a shortcut</h2>

<p>I think about this a lot when it comes to junior devs learning through AI tools. And my take is pretty simple: it’s a tool. Use it to ask questions. Use it to learn the system. Ask it to include you in the investigation and learn along the way.</p>

<p>But don’t skip the part where you enrich your own brain. We have incredibly powerful tools right now, and that’s great. Maybe in ten years the relationship between humans and code looks completely different. But right now, understanding the code is still part of the job. There is no shortcut to reading, understanding, and being able to explain what your software does.</p>

<p>And honestly? That part is fun. The learning, the debugging, the moment where something finally clicks. That’s not the chore AI is saving you from. That’s the craft. Skipping it doesn’t make you faster. It makes you dependent.</p>

<hr />

<h2 id="the-one-rule">The one rule</h2>

<p>If I had to boil my entire AI workflow down to a single principle, it’s this: AI is your copilot, and you should always be ready to take over command of the aircraft.</p>

<p>Not because the autopilot is bad. Modern autopilot is incredible. But when something unexpected happens (and in software, something unexpected <em>always</em> happens) the person in the seat needs to understand the instruments, the terrain, and the plan well enough to take the controls and land the thing safely.</p>

<p>Vibe coding is real. I do it every day. But the version that works isn’t the one Twitter is selling. It’s not “prompt and ship.” It’s “prompt, read, think, adjust, and ship.” The AI does real work. But you’re still the pilot.</p>

<p>Don’t let anyone tell you otherwise. And definitely don’t let the autopilot tell you, either.</p>

<p><em>If you want to see vibe coding in production, the careful kind, we build Shopify apps and integrations at <a href="https://victoriagarland.ca/">Victoria Garland</a>. The AI helps. The humans decide.</em></p>]]></content><author><name></name></author><category term="ai" /><category term="developer-tools" /><summary type="html"><![CDATA[AI is your copilot (aptly named, GitHub) and you should always be ready to take over command of the aircraft.]]></summary></entry><entry><title type="html">The Human in the Loop Isn’t Optional</title><link href="https://www.massaad.ca/ai/productivity/2026/03/02/human-in-the-loop-isnt-optional.html" rel="alternate" type="text/html" title="The Human in the Loop Isn’t Optional" /><published>2026-03-02T09:00:00+00:00</published><updated>2026-03-02T09:00:00+00:00</updated><id>https://www.massaad.ca/ai/productivity/2026/03/02/human-in-the-loop-isnt-optional</id><content type="html" xml:base="https://www.massaad.ca/ai/productivity/2026/03/02/human-in-the-loop-isnt-optional.html"><![CDATA[<p>Intel had a great marketing trick: they made you care about a chip you’d never see. The sticker was on the outside of the laptop. The processor was sealed inside. You had no idea what it looked like or how it worked — but “Intel Inside” meant <em>something</em>. It meant the thing was engineered well. It meant someone was accountable for the core.</p>

<p>I’ve been thinking about a version of that for AI workflows. The label I want to see on my work is <em>Alex Inside</em>. Not because I wrote every line, but because there’s a human behind the judgment, the direction, and the calls that actually matter. The chip is impressive. But someone still has to be the computer.</p>

<p>Here’s the thing though: that label is a lot easier to slap on than to earn.</p>

<hr />

<h2 id="the-part-where-i-lost-the-thread">The part where I lost the thread</h2>

<p>A while back, I handed off a significant chunk of a project to AI. Not trivial stuff, the kind of thing that shapes core behavior of what you’re building. It went smoothly. The output was good. I moved on.</p>

<p>A few weeks later, I needed to evolve it. Add a new constraint. Rethink the logic given some new information. And I realized, with a kind of quiet dread, that I couldn’t quite do it. Not because the AI had written bad code or bad reasoning (it hadn’t). But because I had never actually <em>owned</em> any of it. I had approved it, not understood it. I had agreed, not decided.</p>

<p>“How do we evolve the algorithm if we didn’t write it, or even carefully ask for it?”</p>

<p>That question sat with me. Because the drift isn’t dramatic. You don’t suddenly find yourself unable to function. It’s subtle. One project you handed off 20%. Next one, 40%. Then you’re reviewing AI output more than you’re doing actual thinking, and you’ve quietly outsourced the part of the job that keeps you sharp. The part that builds intuition. The part that lets you change direction when direction needs to change.</p>

<p>The algorithm doesn’t drift. You do.</p>

<hr />

<h2 id="ai-for-ais-sake">AI for AI’s sake</h2>

<p>Here’s a flavor of drift that’s less personal and more systemic: the reflexive addition of AI to things that didn’t need it.</p>

<p>Someone in the meeting asks, “Should we add a chat interface so users can query their data?” And because it’s 2026 and you’ve got an LLM API key sitting right there, the answer is almost always “yes, obviously.” But it’s worth actually asking: <em>does this help</em>?</p>

<p>As information gets easier to find — and it has never been easier — the real skill isn’t retrieval anymore. It’s distillation. Not asking Jeeves, but knowing what question to ask and what to do with the answer. A chat interface that lets users ask “what happened last Tuesday?” is impressive. But if what they actually need is a clean weekly summary that surfaces the three things they should act on, the chat interface is a party trick.</p>

<p>Someone still has to decide what matters. Someone still has to make the judgment call about what the user actually needs versus what they asked for. That’s not something you can put in a prompt.</p>

<p>The best AI integrations I’ve seen feel invisible. They’re doing real work somewhere under the hood, and the person using the product gets a cleaner experience. They feel like a product that actually understood what you needed. The worst ones feel like AI with a voice skin on top. Someone decided the experience should be AI because AI is the thing right now, not because it was the right answer.</p>

<hr />

<h2 id="where-are-you-in-the-loop">Where are you in the loop?</h2>

<p>There’s a version of “human in the loop” that’s mostly decorative. You’re in the loop in the sense that you clicked approve. You read the summary. You nodded. But you’re not actually doing anything the loop couldn’t do without you. You’re just there for liability reasons.</p>

<p>And then there’s a version that actually matters.</p>

<p>The difference is roughly this: asking and agreeing vs. directing, evaluating, and evolving.</p>

<p>Asking and agreeing is: “Write me a strategy for X.” Read it. Looks reasonable. Ship it.</p>

<p>Directing, evaluating, and evolving is: knowing enough about X to shape the input, reading the output with a critical eye, pushing back on things that don’t sit right, and building on the result in a way that adapts when the situation changes. It’s the difference between having a strong opinion you can defend and having a summary you half-remember.</p>

<p>This matters even more when other humans are involved. Clients, users, teammates: people don’t just want polished output. They want to feel heard. Trust doesn’t transfer through the AI’s eloquence. It transfers through the sense that a real person thought about their specific situation and gave a damn. A beautifully worded recommendation that clearly came from a template (even a good template) does not have the same weight as a recommendation from someone who was clearly paying attention.</p>

<p>You can automate a lot of the generation. You can’t automate the relationship.</p>

<hr />

<h2 id="alex-inside">Alex inside</h2>

<p>So here’s where I’ve landed.</p>

<p>AI is embedded in nearly everything I do now. Research, drafts, code, proposals. It’s in the stack whether I name it or not. And I think that’s fine. More than fine, actually. The output is often better, faster, more thorough than what I’d have produced grinding through it alone.</p>

<p>But the thing I keep coming back to is: what’s the sticker on the outside?</p>

<p>“Intel Inside” worked because Intel stood for something. The chip did something specific and did it well, and the company was accountable for that. The sticker was a promise.</p>

<p>“Alex Inside” is a promise too. It means I directed this. I evaluated it. I could defend the reasoning. I can evolve it when the situation changes. I’m not the person who approved the output — I’m the person who owns it.</p>

<p>The AI is doing real work in there. But <em>I’m</em> the processor.</p>

<p>Using AI well is less about what you hand off and more about staying the kind of person who can hand things off and still know what’s happening. Because a loop without a human who’s actually <em>in</em> it isn’t a loop — it’s just a machine running until it breaks, with no one quite sure how to fix it.</p>

<p>Don’t lose the thread.</p>]]></content><author><name></name></author><category term="ai" /><category term="productivity" /><summary type="html"><![CDATA[Intel had a great marketing trick: they made you care about a chip you’d never see. The sticker was on the outside of the laptop. The processor was sealed inside. You had no idea what it looked like or how it worked — but “Intel Inside” meant something. It meant the thing was engineered well. It meant someone was accountable for the core.]]></summary></entry><entry><title type="html">Zen and the Art of Infinite Support Loops: My 17-Day Battle to Cancel Zendesk</title><link href="https://www.massaad.ca/customer-support/zendesk/2025/12/16/zen-and-the-art-of-infinite-support-loops.html" rel="alternate" type="text/html" title="Zen and the Art of Infinite Support Loops: My 17-Day Battle to Cancel Zendesk" /><published>2025-12-16T14:00:00+00:00</published><updated>2025-12-16T14:00:00+00:00</updated><id>https://www.massaad.ca/customer-support/zendesk/2025/12/16/zen-and-the-art-of-infinite-support-loops</id><content type="html" xml:base="https://www.massaad.ca/customer-support/zendesk/2025/12/16/zen-and-the-art-of-infinite-support-loops.html"><![CDATA[<p>They say irony is dead, but I’m pretty sure it’s just stuck in a ticket queue somewhere at Zendesk.</p>

<p>Recently, I decided to do some digital spring cleaning. I’ve had less time for side-projects lately, and you know the old saying: the nail that sticks out gets hammered down. For me, that nail was an old Zendesk developer account, a relic from a long-forgotten failed app that I didn’t need anymore. I only noticed it was still active because, ironically, it started spamming me with a ridiculous amount of emails.</p>

<p>“No problem,” I thought. “Zendesk is <em>the</em> industry leader in customer support software. Surely, using Zendesk to get support from Zendesk to cancel Zendesk will be a seamless experience.”</p>

<p>Holy heck, I was so wrong!</p>

<p>What followed was 17 days, multiple agents, an AI chatbot named “Zea,” and a Kafkaesque journey through the bowels of enterprise support infrastructure. I experienced it all, from A to Zea. Here is the chronicle of my descent into customer service purgatory.</p>

<hr />

<h2 id="phase-1-the-managed-trap">Phase 1: The “Managed” Trap</h2>

<p>It started innocently on November 29th. I had received a few dozen spammer/junk notifications from my old Zendesk. I logged in, looking for a simple “Cancel Account” button. Instead, I was informed my account was “managed” and I’d need to contact support to make any changes.</p>

<p><em>“Contact Zendesk Customer Support to make changes to your account.”</em></p>

<p>Enter Zea, Zendesk’s AI agent.</p>

<p><img src="/assets/img/blog/s1.png" alt="Zea AI Chat - The AI agent cheerfully offers to help me WATCH NEXT LEVEL CX WEBINAR SERIES while I'm trying to escape" /></p>

<p>Zea was <em>very</em> eager to help me “WATCH NEXT LEVEL CX WEBINAR SERIES” or “LEARN ABOUT ZENDESK.” When I typed “cancel,” I was met with a menu of options, none of which was “cancel.” I had to navigate through “Manage subscription” → wait for another menu → type “cancel” again → get told it’s a managed account → agree to speak to a human → answer <em>more</em> qualifying questions about my request.</p>

<p>It’s 7:40 AM. I just want to click an off button. The AI wants to know if this is about billing or technical support. <em>It’s about leaving, Zea.</em></p>

<hr />

<h2 id="phase-2-the-warm-hand-off-into-the-cold-void">Phase 2: The Warm Hand-off into the Cold Void</h2>

<p>Finally, a human! On November 30th, Jeric from the “Zendesk Advocacy Team” entered the chat. Relief washed over me.</p>

<p>Then I read his message:</p>

<p><img src="/assets/img/blog/s2.png" alt="Jeric's response - He thinks I want to &quot;cancel or change the message&quot; about it being a managed account" /></p>

<p>“I understand that you’re looking to cancel or change the message stating, ‘This is a managed account.’”</p>

<p>No, Jeric. I don’t want to change the <em>message</em>. I want to cancel the <em>account</em>. The message is fine. The account’s existence is the problem.</p>

<p>He then asked me to “share more about your goal for updating your account management” and “what specific changes are you hoping to make.”</p>

<p><em>I am hoping to make the account not exist anymore, Jeric.</em></p>

<p><img src="/assets/img/blog/s3.png" alt="The handoff - Jeric passes me to &quot;the team which handles cancellation requests&quot;" /></p>

<p>Eventually, he understood and said he would “connect me over to our team, which handles cancellation requests.” Great! A specialist team! I assumed this would take, maybe, an hour.</p>

<p>Four days passed. Silence.</p>

<p>On December 4th, I nudged them. The ticket looked like it was auto-closing due to inactivity. <em>I</em> was inactive? I literally replied “Hi there, I’m looking to cancel” to make sure there was no ambiguity.</p>

<p><em>Sidenote: I want to be clear: I’m not dunking on individual support agents here. My first role at Shopify back in February 2013 was as a support agent. I’ve answered the angry phone calls, the frustrated emails, the impatient chats. I commiserate. I’ve made my share of mistakes under time pressure: like the time I addressed a miffed “Carla” as “Carlos.” I get it. What I’m critiquing is the system, not the people trapped inside it.</em></p>

<hr />

<h2 id="phase-3-who-are-you-again">Phase 3: Who Are You Again?</h2>

<p>This is where it got truly Kafkaesque.</p>

<p>Rachel from the “Renewals team” finally responded. After being in an email thread associated with my account for five days, a thread where I’m logged in, where my email is visible, where the entire context is about canceling <em>my</em> account, she asked:</p>

<p>“Could you kindly confirm the subdomain you are requesting to cancel?”</p>

<p><img src="/assets/img/blog/s4.png" alt="Rachel asking for subdomain confirmation - after 5 days in a thread about my account" /></p>

<p>I stared at my screen. You are Zendesk. I am emailing you through Zendesk. About my Zendesk account. Which you can see. Because you are Zendesk.</p>

<p>By this point, I was responding from my iPhone in increasingly terse messages: “Perhaps something went wrong. I got this message that the ticket will be closed but I am waiting for a response. Trying to cancel this account. Thanks.”</p>

<p>The replies were coming from different time zones. I was waking up to messages sent at 2:05 PM CST. 1:25 AM CST. 11:40 PM CST. This ticket was circumnavigating the globe while going absolutely nowhere.</p>

<hr />

<h2 id="phase-4-spam-inception">Phase 4: Spam Inception</h2>

<p>By December 9th, <em>ten days</em> into this saga, I had reached my breaking point. The irony had become so thick you could cut it with a knife:</p>

<p><strong>While I was waiting for Zendesk to cancel my account to stop the spam notifications, Zendesk’s system was sending me more spam notifications about the support ticket I opened to stop the spam.</strong></p>

<p><img src="/assets/img/blog/s5.png" alt="My frustrated message showing the spam I'm receiving while trying to cancel" /></p>

<p>I literally had to copy-paste their own system spam back to them as evidence. “Here is a sample of the spam I am still receiving from your systems,” I wrote, including a forwarded notification about a ticket from some gibberish email address that my <em>still-active</em> Zendesk account was dutifully alerting me about.</p>

<p>“I keep getting passed around, can someone complete this task please?”</p>

<p>Meanwhile, Rachel responded: “I am still waiting for confirmation regarding the account cancellation.”</p>

<p>Waiting for confirmation <em>from whom</em>? I had confirmed. Multiple times. Through multiple channels. Was there a secret council that needed to convene? A board of directors vote? Did the Finance team need to consult the ancient scrolls?</p>

<hr />

<h2 id="the-resolution">The Resolution</h2>

<p>Finally, on December 16th at 8:13 AM (seventeen days after I first typed “cancel” into a chat window) the golden email arrived:</p>

<p><img src="/assets/img/blog/s6.png" alt="The final resolution - Finance has completed the cancellation" /></p>

<p>“We would like to inform you that the Finance team has completed the cancellation of your account with the subdomain: d3v-atam.”</p>

<p><em>The Finance team.</em> It took escalation to <em>Finance</em> to flip a boolean from <code class="language-plaintext highlighter-rouge">active: true</code> to <code class="language-plaintext highlighter-rouge">active: false</code>.</p>

<p>Seventeen days. Four agents (Zea, Jeric, Rachel, and the mysterious Finance team). Countless emails. Multiple time zones. All to delete a developer sandbox account.</p>

<hr />

<h2 id="the-takeaway">The Takeaway</h2>

<p>There’s a particular flavor of dystopian comedy in watching a customer support software company struggle with customer support. It’s like watching a fire extinguisher factory burn down, or a locksmith locked out of their own shop.</p>

<p>If Zendesk (the company that literally sells the tools to handle these interactions efficiently) can’t manage a simple cancellation request without a 17-day odyssey through their own labyrinthine processes, what hope is there for the rest of us?</p>

<p>So the next time you feel bad about your own company’s support response times, take heart: Even the people who build the customer experience infrastructure can’t figure out how to deliver a good customer experience.</p>

<p>At least the webinar series is still available, I guess.</p>

<hr />

<p><em>Shameless plug: At <a href="https://victoriagarland.ca/">Victoria Garland</a>, we design software that’s actually human-focused and easy to use, the kind where “cancel account” is a button, not a 17-day expedition. I’m the CTO, and if you’re building something and want to avoid putting your users through support purgatory, <a href="https://victoriagarland.ca/">let’s chat</a>.</em></p>]]></content><author><name></name></author><category term="customer-support" /><category term="zendesk" /><summary type="html"><![CDATA[They say irony is dead, but I’m pretty sure it’s just stuck in a ticket queue somewhere at Zendesk.]]></summary></entry><entry><title type="html">Where Background Agents Are At Today (And How They Shaped Claro)</title><link href="https://www.massaad.ca/claro/ai/cursor/productivity/2025/09/03/where-background-agents-are-at-today-and-how-they-shaped-claro.html" rel="alternate" type="text/html" title="Where Background Agents Are At Today (And How They Shaped Claro)" /><published>2025-09-03T14:00:00+00:00</published><updated>2025-09-03T14:00:00+00:00</updated><id>https://www.massaad.ca/claro/ai/cursor/productivity/2025/09/03/where-background-agents-are-at-today-and-how-they-shaped-claro</id><content type="html" xml:base="https://www.massaad.ca/claro/ai/cursor/productivity/2025/09/03/where-background-agents-are-at-today-and-how-they-shaped-claro.html"><![CDATA[<p>When I first started experimenting with background agents in Cursor, I’ll admit: I wasn’t sure what to expect. The idea was ambitious: let an AI quietly work in the background, tackling tickets, cleaning up code, maybe even handling the chores that developers usually push off until “later.” For a long time, “later” meant <em>never</em>. Background agents promised a world where those small but necessary changes actually happened without me lifting a finger.</p>

<p>I’ve been leaning on Cursor pretty heavily while building <a href="https://clarocal.com">Claro</a>, my new task and calendar management app. Claro itself is an attempt to rethink scheduling: lightweight, flexible, and native-friendly. But in practice, most of my energy over the past 48 hours hasn’t gone into building shiny new features. It’s gone into seeing just how far background agents can stretch.</p>

<hr />

<h4 id="the-early-days-rough-around-the-edges">The Early Days: Rough Around the Edges</h4>

<p>When background agents first launched a few weeks (or was it months?) back, I wanted them to do the kinds of things I usually procrastinate on: removing debug logging, enforcing consistency across files, or patching up stray TODOs. In theory, perfect use cases. In reality, the first wave of agents struggled. They could follow simple commands, but as soon as the request got layered or touched multiple parts of the repo, things went sideways.</p>

<p>I remember one of my earliest attempts: “remove all debug logging unless we’re in development mode.” The agent gamely opened a PR… but it was riddled with inconsistencies and missed half the cases. Worse, when I tried to merge other changes into main, the agent’s PR was immediately out of date. It didn’t know how to rebase against my recent work, so the result was a PR I couldn’t use. The net time saved? Negative.</p>
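<p>To make that concrete, here’s a hypothetical sketch of the mechanical edit I was asking for (assuming a Rails app; the variable and value are invented for illustration):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical example of the edit, repeated across the repo.
# `payload` stands in for whatever is being logged.
payload = { user_id: 42 }

# Before: logs in every environment, production included
Rails.logger.debug("payload: #{payload.inspect}")

# After: logs only in development mode
Rails.logger.debug("payload: #{payload.inspect}") if Rails.env.development?
</code></pre></div></div>

<p>Trivial in isolation. The hard part is applying it consistently across dozens of files, which is exactly where the early agents fell over.</p>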

<p>Back then, the failure rate was high enough that I stopped bothering. I’d spend more time cleaning up their mess than just doing it myself.</p>

<hr />

<h4 id="the-gpt-5-shift">The GPT-5 Shift</h4>

<p>Fast forward to the last couple of days, and it feels like we’ve crossed a line. The background agents now run on GPT-5, and it shows. They can actually follow a detailed prompt without spiraling off into nonsense. Instead of coming back with a barely related PR, they now seem to “get it” in a way that’s closer to a junior developer who actually reads your instructions.</p>

<p>That same “remove all debug logging” request? The new agent handled it cleanly, scoped the changes, and explained what it had done. That’s a leap forward. The precision makes them feel less like a demo feature and more like a real tool I might trust.</p>

<p>But it’s not perfect. They’re still brittle around changes in the codebase. If I’ve merged a handful of PRs since they started their run, chances are I’ll end up tossing their work because it no longer applies cleanly. This feels like the biggest limiter right now: they operate in a snapshot of time, but my repo is alive. Unless they can adapt and rebase, a lot of the value gets stranded.</p>

<hr />

<h4 id="the-cost-of-delegation">The Cost of Delegation</h4>

<p>One thing I’ve noticed is that using agents is cheap in <em>money</em> but expensive in <em>trust</em>. In the past 48 hours, I’ve run half a dozen background jobs, and I’m probably out less than ten bucks. That’s nothing compared to the hours of engineering time it <em>should</em> have saved. But in practice, I still find myself babysitting their output.</p>

<p>This is the catch: an agent that creates PRs I can’t merge isn’t just wasted money, it’s wasted mental energy. I need to review their work, test it, and often redo it by hand. When that happens, I end up feeling like I paid to be distracted.</p>

<p>That said, when they <em>do</em> land a clean PR, the ROI is massive. It feels like magic to wake up, check the repo, and see chores completed while I slept.</p>

<hr />

<h4 id="how-claro-benefited">How Claro Benefited</h4>

<p>Claro has already benefited in small but meaningful ways. For example, I had background agents clean up type annotations across the codebase, enforce consistent error handling, and even refactor a gnarly function into something modular. None of those were glamorous tasks, but together they smoothed out the developer experience.</p>

<p>I like to think of it as the AI doing janitorial work. Not glamorous, not headline-worthy, but it makes the house feel clean. And for a new project like Claro, that’s actually huge. A tidy codebase compounds: you move faster, onboard more easily, and make fewer mistakes.</p>

<hr />

<h4 id="whats-still-missing">What’s Still Missing</h4>

<p>The glaring gap today is context continuity. Agents don’t really understand what’s happened in the repo since they started working. They can’t rebase intelligently. Until that changes, they’ll remain more like eager interns than autonomous collaborators. You have to supervise them closely and discard work when it gets stale.</p>

<p>Another missing piece is multi-step strategy. An agent can execute a single command beautifully, but ask it to stage a plan with dependencies, and it struggles. Humans think in sequences (“first remove logging, then update configs, then adjust docs”), but agents right now are one-shotters.</p>

<p>If Cursor cracks those two challenges (continuous context and multi-step planning) I could imagine background agents becoming indispensable.</p>

<hr />

<h4 id="where-this-is-going">Where This Is Going</h4>

<p>For now, I see background agents as a glimpse of the near future, not the end state. They’re useful in bursts, and they hint at what’s coming. But if you’re expecting them to run like autopilot, you’ll be disappointed.</p>

<p>That said, I can’t shake how different my experience has been compared to just a few months ago. With GPT-5 agents, they’ve gone from frustrating to genuinely helpful. And I suspect six months from now, we’ll look back on this current state as just another stepping stone.</p>

<p>Claro is my testbed for all this. As I push forward on the app, I plan to keep experimenting with agents, not because they’re perfect, but because they’re improving at a pace I’ve rarely seen in any tool. I want Claro’s codebase to be a living proof of what’s possible when humans and AI collaborate. Not as hype, but as daily reality.</p>

<hr />

<h3 id="closing-thoughts">Closing Thoughts</h3>

<p>Background agents today are like interns who show up early, work hard, and sometimes deliver gold. You wouldn’t leave them in charge of the repo just yet. Still, even that level of help can make a tangible difference when you’re building something from scratch.</p>

<p>If you’re a developer curious about AI in your workflow, background agents are worth exploring. Just don’t expect autopilot. Think of them as assistants who are finally getting good enough to trust with real tasks and who might grow into something more.</p>

<p>And if you’re curious about Claro, well, stay tuned. It’s the app that’s teaching me as much about AI development as I am teaching it about calendars. Learn more at <a href="https://clarocal.com">clarocal.com</a>.</p>]]></content><author><name></name></author><category term="claro" /><category term="ai" /><category term="cursor" /><category term="productivity" /><summary type="html"><![CDATA[When I first started experimenting with background agents in Cursor, I’ll admit: I wasn’t sure what to expect. The idea was ambitious: let an AI quietly work in the background, tackling tickets, cleaning up code, maybe even handling the chores that developers usually push off until “later.” For a long time, “later” meant never. Background agents promised a world where those small but necessary changes actually happened without me lifting a finger.]]></summary></entry><entry><title type="html">Heroku Scheduler style rake tasks on fly.io</title><link href="https://www.massaad.ca/ruby/heroku/fly/rake/2024/01/15/fly-io-rake-tasks-like-heroku.html" rel="alternate" type="text/html" title="Heroku Scheduler style rake tasks on fly.io" /><published>2024-01-15T14:16:51+00:00</published><updated>2024-01-15T14:16:51+00:00</updated><id>https://www.massaad.ca/ruby/heroku/fly/rake/2024/01/15/fly-io-rake-tasks-like-heroku</id><content type="html" xml:base="https://www.massaad.ca/ruby/heroku/fly/rake/2024/01/15/fly-io-rake-tasks-like-heroku.html"><![CDATA[<p>This year I’ve started migrating many of my <a href="https://github.com/amassaad?tab=repositories">personal software</a> projects from Heroku to fly.io to try and keep my costs down. One of the major pain points with Heroku is the pricing of their managed database service Heroku Postgres. Don’t get me wrong, its great but it also adds up quickly. For example a tiny, anemic instance is available for $5/month and a slightly more moderate basic instance is available for $9/month. This adds up quick if you’re like me and launch a few apps every month.</p>

<p>Migrating from Heroku to Fly is quite simple, and they have <a href="https://fly.io/docs/rails/getting-started/migrate-from-heroku/">excellent docs</a> to help you get to the finish line. Fly’s pricing is also friendlier: it’s pay-as-you-go rather than the minimum payments Heroku requires. The one aspect of the migration I wasn’t able to easily figure out was scheduled tasks.</p>

<p>Heroku has a very nice feature called the <a href="https://elements.heroku.com/addons/scheduler">Heroku Scheduler</a> that lets you run jobs with cron-like regularity, but I wasn’t able to grok fly.io’s docs well enough to find an equivalent.</p>

<p>Eventually I found a blog post announcing a schedule option for Fly Machines, but its code examples were unclear, and it wasn’t obvious how to set up something as simple as an hourly or daily task. After some digging, I landed on the following syntax:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fly machine run . --dockerfile Dockerfile bin/rails data:fetch --schedule daily</code></pre></div></div>

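<p>For context, <code class="language-plaintext highlighter-rouge">data:fetch</code> here is just an ordinary rake task. A minimal, hypothetical sketch of what such a task might look like (the body is invented for illustration):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># lib/tasks/data.rake (hypothetical sketch; the real task body will vary)
namespace :data do
  desc "Fetch fresh data for the app"
  # Depending on :environment boots Rails, so models and the DB are available
  task fetch: :environment do
    puts "Fetching data at #{Time.current}"
    # e.g. refresh records from an external API here
  end
end
</code></pre></div></div>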
<p>The command is run from the repo where the fly app was launched, and it references the Dockerfile to build the app. Passing <code class="language-plaintext highlighter-rouge">bin/rails data:fetch</code> after the Dockerfile replaces the image’s entry command, so on a daily schedule Fly boots a one-off machine, runs the task, and shuts the machine down. This lets me reuse the same generated Dockerfile that runs the Rails app while skipping the booting of the server. From what I can tell, the <code class="language-plaintext highlighter-rouge">--schedule</code> flag accepts other intervals too, such as <code class="language-plaintext highlighter-rouge">hourly</code>.</p>]]></content><author><name></name></author><category term="ruby" /><category term="heroku" /><category term="fly" /><category term="rake" /><summary type="html"><![CDATA[This year I’ve started migrating many of my personal software projects from Heroku to fly.io to try and keep my costs down. One of the major pain points with Heroku is the pricing of their managed database service, Heroku Postgres. Don’t get me wrong, it’s great, but it adds up quickly: a tiny, anemic instance goes for $5/month, and a slightly more moderate basic instance goes for $9/month. Those charges pile up if you’re like me and launch a few apps every month.]]></summary></entry></feed>