Ask Good Questions: When AI Defaults to Newer, Better, Faster

I put up a new guide and a matching field note on Ask Good Questions that I think may resonate with developers using AI in real projects:

Guide:
When AI Defaults to Newer, Better, Faster
https://askgoodquestions.dev/guides/when-ai-defaults-to-newer-better-faster/

Field note:
The Moment I Realized the AI Needed the Rules First
https://askgoodquestions.dev/field-notes/the-moment-i-realized-the-ai-needed-the-rules-first/

One of the patterns I keep seeing is that if you let AI move into implementation without first loading the real project constraints, it tends to start “helping” by leaning toward newer libraries, cleaner rewrites, more modern approaches, and other things that may be completely wrong for the actual environment.

That may be fine in theory, but in real projects, especially older or mixed environments, it can get you into trouble fast.

This will also be part of my upcoming book, Real Programmers Use AI.

How would you address this, or will your book not go into this level of depth?

This is Anthropic's response:

But others are seeing this.

https://www.reddit.com/r/ClaudeCode/comments/1s7r3xr/i_can_no_longer_in_good_conscience_recommend/

https://www.reddit.com/r/ClaudeCode/comments/1sc9ayy/my_morning_with_opus/

Yes, absolutely. That is exactly the kind of thing I’m covering, because in practice it is one of the main ways AI gets people into trouble.

A lot of the discussion starts with whether the AI knows enough Clarion, and that does matter.

But just as important is the fact that AI tends to lean toward newer, faster, cleaner, and more modern-looking answers, even when what you really needed was “leave the working parts alone and make the smallest correct change”.

So I’m not treating that as some special case. I’m treating it as part of the everyday discipline of using AI well. A big part of that is showing how to frame the job so the AI understands what must be preserved, what rules it has to follow, and what kind of “help” is actually unwanted.

And then just as important, how to check the work so you do not get fooled by something that looks polished but quietly changed behavior or drifted away from the real requirement.
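One concrete way to do that "check the work" step is a characterization test: pin down the current behavior before letting the AI touch anything, so a polished-looking rewrite that quietly changed behavior fails immediately. A minimal sketch in Python; the function, values, and discount rule are hypothetical stand-ins for real legacy code:

```python
# Characterization test: capture what the code does TODAY, quirks included,
# before any AI-assisted change. The routine below is a hypothetical example.

def legacy_price(qty: int) -> float:
    """Stand-in for an existing, working routine we must not break."""
    # Quirky but intentional: orders of 10 or more get the old 7% discount.
    total = qty * 4.50
    if qty >= 10:
        total *= 0.93
    return round(total, 2)

# Pin the current behavior. If an AI "cleanup" changes any of these
# results, the tests fail before the change ships.
def test_small_order():
    assert legacy_price(2) == 9.00

def test_discount_boundary():
    assert legacy_price(10) == 41.85

def test_below_boundary_gets_no_discount():
    assert legacy_price(9) == 40.50
```

The point is not that these tests prove the code is good; they prove the "smallest correct change" stayed small.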

So yes, I’m definitely going into that level, because that is where AI stops being a novelty and starts becoming something you can actually use for real development.

Hi Charles,

I know you saw that Arnor posted this link in CLC, and I think it is strong support for your contention that AI tends to think that “new is better”…

I like this statement from the post…

‘As reported by the Financial Times, Amazon Web Services suffered a 13-hour outage in December after engineers let its Kiro AI coding tool update code without requiring any oversight. Kiro decided the best solution was to “delete and recreate the environment.” That’s one way to fix a problem, I suppose.’

I think this should be in bold and underlined. That's a management cock-up, though!

Corporate environments can breed conformism, especially under “strong leadership”, with naysayers usually being ousted if they even make it into such a workplace.

AI has closed the gap with the Clarion Templates, but code quality in my experience is still below par and requires too much human intervention. That might be a different experience in other languages, with way more GitHub repos to trawl through, but for now, AI is not cutting the mustard for me.

However, these big AI names are under massive pressure to bring money in now. The last round of funding dictates that income needs to ramp up, so the cheap honeymoon period is over for the public, and it's time for these AIs to start repaying their investors. They are exploring new initiatives to see what sells. Only today Anthropic announced Glasswing.

These AIs are only where they are because of data centres, more powerful and faster hardware, and the internet.

If they can't generate revenue, an AI winter rapidly approaches. That's why I'm sticking with local AI for now. I remember the dotcom bust.

Tech debt is very much an issue, which hinges on the fact that someone needs to review and sign off on the code, and that is a harder, slower job than coding it by hand, IMO. I've taken on enough apps from others to know the problems with offshored coders and bad management oversight.

Even a Rackspace VP, with a vested interest in these AI LLMs, has penned a piece for Forbes today, because things could go pear-shaped quickly. Personally I would be shorting these companies, because the AI bubble is 8x bigger than the 2008 financial crisis, so there will be global fallout. The US loves blowing bubbles, and the contagion ripples around the planet when they burst, harming other countries with useless politicians, which probably explains Trump's drive to blow the economy back up…

https://www.forbes.com/councils/forbestechcouncil/2026/04/07/ai-for-tech-debt-clean-before-you-code/

Yes, I did see that, and I think it supports the point pretty well.

“Delete and recreate the environment” is exactly the kind of answer that can sound clean and efficient if you ignore the real-world constraints around it. That’s one of the big traps with AI. It often gravitates toward what looks like the neatest technical solution instead of the safest or most practical one in the situation it’s actually being used in.

And that’s really the heart of what I was getting at in the article. The problem isn’t just that AI suggests something newer or different. It’s that if the rules and constraints aren’t established first, it will often optimize for the wrong thing.

I think the key phrase there really is “without requiring any oversight.”

That is the part people ought to underline, because that is where the real failure starts.

AI can be useful, but in real development it still needs context, constraints, and knowledgeable review. Otherwise it is very easy for it to produce something that looks clean or efficient while missing what actually matters in the project.

So to me the lesson is not just about AI. It is also about process. If the expectation is that the tool can act without oversight, then the problem is already bigger than the tool.

Well, yes and no <g>

I’m using AI to write code using Flutter and Dart - and I know next to nothing about Flutter and Dart, so I can’t provide a human oversight!

What I’m finding useful is a technique for using AI that I came across in a YouTube tutorial…

  1. Get the agent to review its own code.
  2. Use an alternative agent to re-review the code and offer improvements.
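Steps 1 and 2 can be wired up as a small pipeline. The sketch below is plain Python, under the assumption that each "agent" is simply a callable that takes code and returns a revised version plus review notes; the stub reviewers are illustrative only and stand in for real Copilot and Claude calls:

```python
from typing import Callable, NamedTuple

class Review(NamedTuple):
    revised_code: str
    notes: list[str]

# An "agent" here is any callable: code in, Review out. In practice each one
# would wrap a real tool (a CLI or API call); these stubs are hypothetical.
Agent = Callable[[str], Review]

def cross_review(code: str, agents: list[Agent]) -> Review:
    """Run each agent over the previous agent's output, collecting notes."""
    notes: list[str] = []
    for agent in agents:
        result = agent(code)
        code = result.revised_code
        notes.extend(result.notes)
    return Review(code, notes)

# Stub reviewers standing in for two different model families.
def self_review(code: str) -> Review:
    return Review(code, ["agent A: no issues found"])

def second_opinion(code: str) -> Review:
    # A second family often catches what the first missed.
    return Review(code.replace("  ", " "), ["agent B: collapsed double spaces"])

final = cross_review("x =  1", [self_review, second_opinion])
print(final.notes)  # both agents' findings, in order
```

The useful property is that the second reviewer sees the first reviewer's *output*, not its reasoning, so it can't just agree its way through.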

I’m doing this more as a learning exercise than a code writing exercise and it is quite informative.

I used Copilot to write an app “with the appropriate folder structure” and when it had finished I asked it to review the code. It suggested a completely different folder structure, which I got it to implement.

Then, I used Claude to review the code and folder structure, and it found that the rewrite was completely stuffed up, with code duplication and new files whose only purpose was to call the old files! I got Claude to tidy it up and to my untrained eye it looked much improved.

I’m tempted to get Copilot to review Claude’s code, but I think you need to recognise when you’re ahead!

Having an appropriate agent.md file seems to be an essential first step, and fortunately, for the more widely used programming languages, there are plenty of examples.

That’s a fair point, and I think there’s a middle ground here.

Sometimes we are using AI in a language or framework where we’re not the strongest expert in the syntax or the platform details. I’ve run into that myself. In that case, the oversight may not look like “I know this language inside out,” but it can still look like real programmer judgment.

What I tend to do in that situation is lean much harder on testing, behavior, and explanation.

I’ll have the AI explain what the code is doing, I’ll walk through the logic, and I’ll scrutinize whether it actually behaves the way the requirement says it should. In some ways my measuring stick goes into overdrive precisely because I know I’m not judging it from deep language fluency.
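One way to make "does it behave the way the requirement says" concrete, even without deep language fluency, is to restate the requirement itself as executable checks and run the AI's output against them. A hypothetical Python example; the validator plays the role of AI-generated code, and the requirement here (usernames of 3-20 lowercase letters or digits, starting with a letter) is invented for illustration:

```python
import re

# Suppose the AI produced this validator (hypothetical output).
def is_valid_username(name: str) -> bool:
    return re.fullmatch(r"[a-z][a-z0-9]{2,19}", name) is not None

# The requirement restated as executable cases. I don't need to be fluent
# in the regex to judge the behavior against the spec.
requirement_cases = [
    ("bob",       True),   # minimum length 3
    ("bo",        False),  # too short
    ("a" * 20,    True),   # maximum length 20
    ("a" * 21,    False),  # too long
    ("9lives",    False),  # must start with a letter
    ("user_name", False),  # underscore not allowed
    ("Carol",     False),  # uppercase not allowed
]

for name, expected in requirement_cases:
    assert is_valid_username(name) == expected, name
print("all requirement cases pass")
```

The cases come straight from the stated requirement, so they stay valid even if the AI rewrites the implementation underneath.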

So yes, I think AI reviewing AI can be useful, and I think explanation and comparison are useful too. I just think we have to be careful not to confuse that with the problem being fully solved.

It’s still a limitation, just one that can be managed better or worse depending on how disciplined the developer is.

Worth mentioning that some of the newer tooling is starting to improve specifically in the “review your own output” space, which has always been one of AI’s weaker areas.

For example, the new “Rubber Duck” feature in GitHub Copilot CLI is quite interesting. It effectively brings in a second model from a different AI family to review the first model’s plan/code before it proceeds.

The key idea is that different models have different blind spots, so instead of self-review (which tends to reinforce mistakes), you get a genuine second opinion that can challenge assumptions, spot edge cases, or catch cross-file issues.

In practice it kicks in at useful points: after planning, after complex changes, even before tests run. It can be triggered manually as well.

Feels like a step towards AI being less about “generate and hope” and more about something closer to a peer-review workflow. Still not perfect of course, but definitely moving in a better direction.

That sounds like a sensible step in the right direction.

One of the long-standing weak spots has been self-review. If the same model that made the mistake is the one checking the mistake, you are often just getting the same assumptions repeated back to you. Using a second model from a different family is a much more interesting idea because at least now you have a chance of different blind spots, different pattern recognition, and a real second pass.

That starts to look more like a peer-review workflow, which is a lot healthier than “generate and hope.”

I’d still put it in the category of improving the process rather than solving the problem. It makes the workflow safer, but I would still want a knowledgeable human making the final call, especially on production code.

Released 6th April.

It is an interesting approach because, in the “[MODEL] Claude Code is unusable for complex engineering tasks with the Feb updates #42796” post above, the poster did use Claude to highlight its own problems.

Remember these are only opinions, the proof is whether the code quality improves enough to be trusted like management can trust the good programmers in their team.

In some respects these AIs are good training tools, forcing you to come up with better prompts or instructions.

Someone good at coming up with prompts will likely become a good manager of a team of coders, be it human or software-based.

Good coders need real world experience to know about the hidden pitfalls of their code running in different environments.

I think that’s about right.

The real test is whether the code quality gets good enough to be trusted, not whether the AI can produce something that looks impressive on first pass.

And yes, there is definitely a management aspect to prompting. You’re really defining work, setting boundaries, reviewing what comes back, and deciding whether it is acceptable.

But the part that still matters most is the real-world experience piece. That’s where programmers learn the hidden traps, odd environments, edge cases, and all the things that don’t show up in the clean demo version of the problem.

So to me AI can help a lot, but that doesn’t mean experience suddenly stops mattering.

This is a distinct problem which isn't a hallucination and might explain the problem Amazon experienced. I haven't experienced this myself, but it's certainly a heads-up on a new problem.

At this early stage and rate of adoption, I would say many LLM/GPT AIs are in a paid-for public beta testing stage. Good enough for production? Not in my opinion. :smiley:

That does sound like a different kind of problem from an ordinary hallucination.

If the model starts losing track of who said what, then you are not just dealing with a bad answer. You are dealing with the conversation state itself becoming unreliable, which is a bigger issue.

So yes, I think that is a real heads-up. It is also one more reason I keep coming back to the same point: these tools can be very useful, but trusting them as if they are already mature production-grade engineering partners is still a stretch.

To me a lot of this does still feel like paid public beta. Useful beta, sometimes very useful beta, but beta all the same.