Vibe Coding

A new paradigm is coming: AI is, like it or not, “taking over our jobs,” and this is the beginning of that transition

Vibe coding (or vibecoding) is an approach to producing software by using artificial intelligence (AI), where a person describes a problem in a few natural language sentences as a prompt to a large language model (LLM) tuned for coding. The LLM generates software based on the description, shifting the programmer's role from manual coding to guiding, testing, and refining the AI-generated source code. – Wikipedia

What do I think about the application of vibe coding for “commercial-grade” software, i.e., for startups in the 7-8 figure revenue range? (Where I have the most experience and network.)

I don’t typically comment on “recent events” (at least partly because I’m typically out of the loop on anything recent) but I’ve talked to enough people about this one that I think I can help you navigate it.

And besides, this also applies to my own work, so it’s good for me to think about it.

In summary:

  • Many of us can’t tell how good AI code really is: We struggle to evaluate it because in vibe coding we’re often leveraging it in areas outside of our expertise.
  • The cost function for vibe coding is incorrect: Companies’ goals aren’t correctly aligned to vibe coding’s trade-offs.
  • A new paradigm is coming: AI is, like it or not, “taking over our jobs,” and this is the beginning of that transition. You need to work with AI, and the sooner you learn it, the better.

How good are current AIs at coding for us?

I’m gonna talk about expertise in evaluating capabilities, so some context on the origin of the term vibe coding is important.

The term vibe coding was coined and popularized by Andrej Karpathy. The quote that blew up on X, among others, was “It's not really coding - I just see things, say things, run things, and copy-paste things, and it mostly works.”

Now here’s what you may or may not know about Andrej Karpathy: He was a founding member of OpenAI, holds a PhD from Stanford under AI legend Fei-Fei Li, and was Director of AI at Tesla, among other accomplishments. In short, Andrej is in the very top percentile of people most knowledgeable about AI in the world today, and is a prolific computer scientist to boot.

Andrej Karpathy is not your “everyday” engineer, let alone a non-engineer. There’s, presumably, a big difference between him “saying things, running things, and copy-pasting things”, and us doing it.

Which leads me to a new term: expertise opacity.

Expertise opacity

In software engineering, we use the terms black box and white box, particularly for testing, but also other areas like API design:

  • Black box means you can only see the interface but not the inner workings of a system, like a coffee machine: insert a capsule, press a button, coffee comes out.
  • White box means the inner workings are visible – like an Italian moka pot: you add the coffee and water, and you see how the water heats, pressure builds, and coffee brews.
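The distinction can be sketched in a few lines of Python (a hypothetical example, with a made-up `brew` function standing in for any system): a black-box check only exercises the interface, while a white-box check also inspects the internals.

```python
# Hypothetical example: the same "coffee" system tested two ways.
def brew(capsule):
    # Inner workings: the steps a moka pot would make visible.
    steps = ["heat water", "build pressure", "extract coffee"]
    return {"drink": "coffee from " + capsule, "steps": steps}

# Black box: only the interface is checked -- capsule in, coffee out.
result = brew("espresso capsule")
assert result["drink"].startswith("coffee")

# White box: the inner workings are inspected too.
assert result["steps"] == ["heat water", "build pressure", "extract coffee"]
```

Both tests pass here, but only the white-box one would catch a machine that produces coffee by skipping the pressure step.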

To illustrate expertise opacity, I asked ChatGPT, a tool I use all the time, to write a blog post on how to be a productive software engineer – a topic I know well – in Chinese. I don’t speak Chinese, and the output is quite interesting:

Through my own capabilities, I have no way to tell whether that’s a good or a bad blog post. I know it “works” in that I have very little concern that it’s a blog post on global warming or the fall of the Roman Empire instead of about software engineering, but expertise opacity prevents me from looking at this as a white box.

I have no way to tell whether the blog post is full of good or bad advice, whether it says things I agree with or not, whether its advice is applicable to my audience, or even whether it touches on topics I find critical for productivity. I know that it works, but not how it works.

Expertise opacity happens when the internals of a system are available, but cognitively out of reach. Like big finance explaining how derivatives and credit default swaps work, the difference between “this is a bad idea” or “this is sorcery” lies entirely in the reader’s expertise.

And like in finance, there are a lot of market incentives for saying “this is sorcery” and not many for saying “this is a bad idea” about bad AI code.

As the saying goes: “He whose bread I eat, whose song I sing.”

Which leads me to my next point: vibe coding’s cost function is incorrect.

Artificial mechanical turks

History has a way of going in circles, but this time it’s done it more like a Möbius strip.

Mechanical Turk is a term that comes from an 18th-century machine that appeared to play chess but turned out to be an illusion: there was a person inside the box playing all along. Today, we use mechanical turk for any case where people pretend to be machines.

Today, machines pretend to be people, but that too can be an illusion unless you can look inside the box. Like the journalist who, in 2023, got into a conversation with Bing’s chatbot and was soon being urged to leave his wife, anthropomorphizing AI without understanding how it works can lead to quite incorrect beliefs and judgments about its equivalence to people.

To talk about vibe coding’s cost function, I want to take the opportunity to quote Andrej again on his term-defining quote: it’s “not too bad for throwaway weekend projects.”

In AI and optimization, a cost function is just a fancy way of saying “what is the system trying to minimize?” When your cost function is incorrect, you get unpredictable results, but only in hindsight – the system is still doing exactly what you told it to do, just not what you intended it to do.

Some illustrative examples of misaligned cost functions from Wikipedia:

A 2016 OpenAI algorithm trained on the CoastRunners racing game unexpectedly learned to attain a higher score by looping through three targets rather than ever finishing the race. Some evolutionary algorithms that were evolved to play Q*Bert in 2018 declined to clear levels, instead finding two distinct novel ways to farm a single level indefinitely. – Wikipedia, Reward Hacking

These agents behave kind of like people, but not exactly. It’s like the Wells Fargo fraud scandal, where employees met unrealistic sales targets by secretly opening fraudulent accounts, on steroids. If you have KPIs and make them the sole thing that matters (i.e., the cost function), people and AIs will find novel, often unintended ways to hit the number, sometimes with shocking consequences.
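The pattern above can be compressed into a toy sketch (everything here is hypothetical: a trivial `optimize` function and two candidate snippets). The optimizer minimizes exactly the cost we wrote, not the one we meant:

```python
# Toy sketch of a misaligned cost function.
# The optimizer minimizes exactly the cost we give it, nothing more.
def optimize(candidates, cost):
    return min(candidates, key=cost)

# Intended goal: a short *and correct* greeting function.
candidates = [
    "def greet(name): return 'Hello, ' + name",  # correct
    "def greet(name): return ''",                # degenerate, but shorter
]

# Misaligned cost: "shorter is better", with no term for correctness.
winner = optimize(candidates, cost=len)

# The system does exactly what we told it to: it picks the degenerate code.
assert winner == "def greet(name): return ''"
```

Nothing here is broken: `optimize` did its job perfectly. The bug lives entirely in the cost function we handed it.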

Back to vibe coding: when you tell AI to write code to achieve a particular goal, that’s all it optimizes for: writing code that fulfills your prompt’s intent. But humans operate with not only different skill sets but far more complex, implicit, cost functions: not just whether the code works but how it’s written, how it will be maintained, who will read it, and how it fits into a larger system.

While there aren’t many studies on the cost of creation versus maintenance of software, most follow a kind of Pareto principle: around 80% of the time and effort in a system is spent on maintenance, not creation. That’s where all those R&D dollars typically go in a company at the 7-8 figure revenue range.

When Andrej said vibe coding is “not too bad for throwaway weekend projects,” that’s why. The current cost function of vibe coding favors rapid, low-effort creation, while shifting complexity and cost into maintenance.

In other words, vibe coding optimizes for exactly the opposite of where the real cost of commercial-grade software actually lies!
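A back-of-envelope sketch makes the trade-off concrete. The 20/80 split comes from the Pareto claim above; the “creation halves, maintenance grows 50%” numbers are purely assumed for illustration:

```python
# Assumed lifecycle split from the Pareto claim: 20% creation, 80% maintenance.
creation, maintenance = 0.2, 0.8
baseline_total = creation + maintenance  # 1.0 unit of lifecycle cost

# Hypothetical vibe-coding trade-off: creation halves, maintenance grows 50%.
vibe_total = creation * 0.5 + maintenance * 1.5

# Creation felt dramatically cheaper, yet the system costs 30% more overall.
assert round(vibe_total, 2) == 1.3
```

Under these assumptions, even a spectacular win on the small slice of the cost pie loses to a modest penalty on the large slice.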

In a way, vibe coding in commercial-grade software is like technical debt accrual on steroids. While debt can be a useful tool in many facets of a company’s lifetime, you must use it strategically for specific purposes, and make sure you’re not being deluded by the person inside of the box pretending to play chess, or by the machine inside of the box pretending to be a software engineer.

But if vibe coding in commercial-grade software is such a bad idea, then why is it being so widely adopted?

Vibe coding is the future... most likely

I honestly think vibe coding is a sign of things to come. It’s hard to predict the future, and people are notoriously bad at it (see Tetlock’s books), but the incentives are just too swayed towards AI writing code instead of people, and I expect this sway will keep increasing.

In fact, I don’t think vibe coding is a bad idea at all. It’s just currently poorly aligned to current commercial-grade software engineering’s reality, that’s all.

Note the use of “current” twice in the above paragraph? Either of these things could change in the future:

1) The cost function of vibe coding becomes properly optimized for maintainable code. Through a mix of programming language and compiler design, specific model training, bigger context windows, multi-agent workflows, and other innovations, vibe coding could become much superior to humans at writing maintainable code.

2) The cost of commercial-grade software engineering inverts its Pareto principle to 80% on creation. If creation is cheapened by an order of magnitude, the RoI of rewrites becomes much higher, and what’s now a somewhat common practice becomes the default practice. Software becomes like smartphones: you just get a new one when your current one is a few years old.
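Scenario 2 can also be sketched with made-up numbers. Assume (purely for illustration) a rewrite eliminates the legacy system’s maintenance bill; the question is how many years of avoided maintenance it takes for the rewrite to pay for itself:

```python
# All numbers are hypothetical cost units, purely for illustration.
maintenance_per_year = 100.0   # ongoing cost of keeping the legacy system alive
rewrite_cost_manual = 500.0    # rewriting the system by hand today
rewrite_cost_ai = rewrite_cost_manual / 10  # creation cheapened by 10x

# Years of avoided maintenance needed before each rewrite pays for itself:
breakeven_manual = rewrite_cost_manual / maintenance_per_year  # 5.0 years
breakeven_ai = rewrite_cost_ai / maintenance_per_year          # 0.5 years

assert breakeven_ai < breakeven_manual
```

When breakeven drops from years to months, “just rewrite it” stops being an occasional strategic call and starts looking like the default.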

I expect these transitions to happen gradually, but quickly: AI becomes incrementally better at writing maintainable code, and creation becomes incrementally cheaper.

I also expect the software engineers who are able to leverage full AI capabilities as they incrementally improve to become way superior to the software engineers who aren’t able to use them.

The market is already signaling that humans aren’t on the winning side of the “who writes code” battle. So being a human that can efficiently leverage AI code-writing capabilities will be the real differential for strong software engineers in the future.

Or so I predict.

In Summary

Vibe coding is probably the future. But because nobody can predict whether, or how soon, the necessary innovations to enable vibe coding for commercial-grade software will come in a way that leads it to a real competitive advantage instead of an illusory one, you should approach vibe coding carefully and only for specific use cases, like prototyping.

The main reason for this need to be careful is that the cost function for vibe coding is incorrect for a tech company’s cost structure. While companies spend most of their money on maintenance, vibe coding reduces the cost of creating software at the expense of increasing its maintenance cost.

The reason we make bad decisions about the use of vibe coding in commercial-grade software, despite its incorrect cost function, is that we lack the expertise to evaluate vibe coding as a white box, and instead judge it by what it creates rather than by how it works – what I call expertise opacity. This often happens when software engineers work in areas where they aren’t experts, or when non-engineers drive adoption.

Once vibe coding’s cost function becomes aligned with a tech company’s, it will most likely become how most commercial software is written, given its current traction despite a very misaligned cost function. That will happen as AI learns to write more maintainable code, and as companies lean more towards spending money on writing new code rather than maintaining it.

Despite this cost function misalignment, the market is signaling that AI will win the “who writes code” battle. A strong software engineer, therefore, will be the one who can effectively leverage full AI code-writing in a way aligned to a tech company’s needs.

And these, folks, are my current thoughts on vibe coding.