The great big Ai LLM thread. Github code, blogs & opinions, walkthru's, trainer's & more

I say this having not looked at the code or maths involved. So I might change me mind in the future.

But! Never say never! Life is scattered with breakthroughs from singletons or small teams that develop an edge. Any follower of F1 will know that.

Sure, big business likes to use its budget & team numbers as a weapon of deterrence, to put others off from competing. And whilst they appear to have made substantial gains, it’s still just average in the scheme of things.

Some examples of wildcards and underdogs beating the best Frontier models:

Goran Ivanišević (2001 Wimbledon)
The 2010-11 Green Bay Packers (Super Bowl XLV)
Greece National Football Team (Euro 2004)
Netflix vs. Blockbuster
The Blair Witch Project
Skype vs. Telecom Giants
Barry Marshall and Robin Warren (Peptic Ulcers)
Kary Mullis (PCR - Polymerase Chain Reaction)
Jennifer Doudna and Emmanuelle Charpentier (CRISPR-Cas9)
Gregor Mendel (Genetics)

The CRISPR-Cas9 example would perhaps be the most relevant to Ai models.

  • The Status Quo: Gene editing was a field dominated by giant, heavily bankrolled genomic centers, utilizing expensive and clunky protein-based tools like zinc fingers.
  • The Wild Card: A small, collaborative team led by Doudna and Charpentier (who started her work on a shoestring budget in Europe) zeroed in on a defense mechanism used by bacteria. They figured out how to harness this into a programmable, incredibly cheap, and accessible gene-editing tool. Their wild-card discovery completely eclipsed the older, better-funded methods and won the 2020 Nobel Prize in Chemistry.

So sure, these well-funded Ai teams might currently have Frontier models, but how long until they become eclipsed?

I agree with the “never say never” part.

Small teams and outsiders absolutely can produce breakthroughs. History is full of examples where someone outside the center of power saw the problem differently and changed the direction of an entire field. I would not argue against that at all.

Where I think we may be talking about two different things is the difference between a breakthrough and operating a frontier-scale LLM.

A small team might absolutely discover a better architecture, a better training method, a better way to use memory, a better reasoning loop, a better retrieval system, or a way to make a smaller model punch far above its weight. That kind of thing can happen. In fact, I expect a lot of useful AI progress to come from exactly that sort of work.

But that’s different from saying an individual developer or small company should “develop their own AI model” as a practical replacement for OpenAI, Claude, Gemini, Meta, Microsoft, or Amazon. A frontier model is not just clever code. It is also massive compute, enormous data pipelines, evaluation systems, safety work, deployment infrastructure, monitoring, continuous improvement, and the ability to serve millions of users at scale.

CRISPR is a good example of a breakthrough changing the rules. But even there, discovering the method and turning it into worldwide medical platforms, regulatory systems, manufacturing, delivery mechanisms, trials, and commercial therapies are different layers of the problem.

That’s how I see AI models too.

A small team may well produce the next big idea. They may even produce a smaller model or tool that beats the big models at a narrow task. But for most of us as working developers, the practical question is not whether an underdog can someday change the whole field. The practical question is what we should build our workflow on today.

My answer would still be: do not depend blindly on one vendor, keep your own data portable, keep your prompts and project context under your own control, use local or open models where they make sense, and use the large hosted models where their scale actually gives you a real advantage.
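As a rough illustration of "do not depend blindly on one vendor", here is a minimal Python sketch. It assumes every backend can be reduced to "prompt in, text out". The names `Router`, `fake_local` and `fake_hosted` are made up for this example; real backends would wrap whatever local runtime or hosted API you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Assumption: a backend is any function that takes a prompt and returns text.
Backend = Callable[[str], str]


@dataclass
class Router:
    """Keeps calling code independent of any single vendor."""
    backends: Dict[str, Backend]
    default: str

    def complete(self, prompt: str, prefer: Optional[str] = None) -> str:
        # Fall back to the default backend if the preferred one is not configured.
        name = prefer if prefer in self.backends else self.default
        return self.backends[name](prompt)


# Hypothetical stand-ins; neither calls a real model.
def fake_local(prompt: str) -> str:
    return f"[local] {prompt}"


def fake_hosted(prompt: str) -> str:
    return f"[hosted] {prompt}"


router = Router(backends={"local": fake_local, "hosted": fake_hosted}, default="local")
print(router.complete("Summarize my notes"))
print(router.complete("Refactor this module", prefer="hosted"))
```

The point is only that prompts and project context live on your side of the interface, so swapping or adding a backend is a configuration change rather than a rewrite.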

That gives us room for the underdogs without pretending that every small shop can realistically duplicate frontier-scale infrastructure.

Just by clicking a link, it’s possible for an attacker to steal a GitHub token that can read and write to your repos, including private ones.

No surprise there :slight_smile:

For the last couple of years I have been using AI or researching it about 10-12 hours a day, pretty much 7 days a week.

I expect my wife to start writing prompts for me any day…

A model optimised for mobile devices.

That’s an interesting direction.

To me, this is where local/on-device AI starts getting a lot more practical.

Not because I think a phone is suddenly going to replace the big frontier models, but because there are a lot of tasks where you don’t need the biggest model in the world. You need something fast, private, cheap to run, and “good enough” for the job.

For things like summarizing notes, cleaning up text, helping with prompts, light coding questions, documentation lookup, recipe cleanup, quick translation, or a local helper inside an app, this kind of model could be very useful.

The important distinction is that “runs locally” does not automatically mean “as capable as ChatGPT/Claude/Gemini in the cloud.” But it does mean the cost, privacy, latency, and availability equation changes.

That matters.

Especially for apps where you want AI assistance without every small feature depending on a paid cloud API call.

You’ve hit the nail on the head but not realised it.

Dedicated Ai’s run local. I say “run local” because it doesn’t need to run externally. For coding it will become like a predictive text facility in your browser, but next level.

Yes, I think that is probably the direction this goes, at least for some kinds of work.

But I’d put one big qualifier on it.

For something mainstream, a small local model may already know enough to be useful as “predictive text on steroids.”

For something like Clarion however, the model itself is only part of the answer.

A generic local model does not magically know Clarion, ABC, templates, generated code, legacy patterns, or the way a particular codebase is structured. So the useful version is probably not just “run Gemma locally and away we go.”

It is more likely:

local model + Clarion-specific knowledge + indexed project source + examples + documentation + a controlled workflow.
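To make that formula a little more concrete, here is a small Python sketch of the "indexed project source" part: pick the most relevant notes for a question and wrap them into a prompt. It is only a sketch under simple assumptions. The scoring is plain keyword overlap (a real setup would more likely use embeddings or a proper search index), and the file names and note text in `SNIPPETS` are invented for illustration.

```python
import re
from typing import List, Tuple

Snippet = Tuple[str, str]  # (source name, text)


def tokens(text: str) -> set:
    """Lowercase word set used for naive keyword matching."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def top_snippets(question: str, snippets: List[Snippet], k: int = 2) -> List[Snippet]:
    """Return up to k snippets sharing at least one word with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(body)), name, body) for name, body in snippets]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, then by name
    return [(name, body) for _, name, body in scored[:k]]


def build_prompt(question: str, snippets: List[Snippet], k: int = 2) -> str:
    """Assemble the text that would be sent to whichever model is in use."""
    parts = ["Answer using only the project context below."]
    for name, body in top_snippets(question, snippets, k):
        parts.append(f"--- {name} ---\n{body}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


# Hypothetical project notes, not taken from any real codebase.
SNIPPETS = [
    ("notes/embeds.txt", "Embed points let hand-written code run inside generated procedures."),
    ("notes/browse.txt", "The browse procedure lists customer records."),
    ("notes/build.txt", "Release builds are produced by the nightly script."),
]

print(build_prompt("Where do embed points run?", SNIPPETS))
```

Even in this toy form, the model only sees what the retrieval step hands it, which is why the quality of the knowledge base matters as much as the model.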

That is why John Hickey’s Clarion assistant approach makes sense to me. The value is not simply that AI exists. The value is in building the support structure around it so the assistant has the right context to work from.

So yes, I agree with the idea. I can absolutely see local dedicated AI becoming a normal part of development tools.

But for niche development environments, especially something like Clarion, I think the practical path is less “small model replaces the big models” and more “small model becomes useful when it is backed by the right local knowledge base and guardrails.”

That still makes it valuable.

It just means the real product is not the model by itself. The real product is the workflow wrapped around the model.

That MCP-Eval link is interesting.

That feels like the next practical layer in this whole discussion.

It’s one thing to build an MCP server or expose tools to an AI assistant. It’s another thing to know whether that tool layer is actually doing the right thing consistently.

For Clarion tooling, that could matter quite a bit.

If an assistant can call into the IDE, inspect source, read generated code, search embeds, or manipulate project files, then “it seemed to work once” probably isn’t good enough.

You’d want a repeatable set of tests that ask things like:

Can it find the right procedure?

Can it identify the correct embed point?

Can it distinguish generated code from hand-written code?

Can it explain what it changed?

Can it avoid touching files it should not touch?
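Questions like these can be turned into a small repeatable suite. The Python sketch below is one possible shape, not how MCP-Eval itself works: each case is run several times and passes only if every run gives the expected answer. `fake_find_procedure` and the procedure names are hypothetical stand-ins for a real tool call into the IDE.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    name: str
    request: str
    expected: str


def run_suite(tool: Callable[[str], str], cases: List[Case], repeats: int = 3) -> Dict[str, bool]:
    """Run each case `repeats` times; a case passes only if all runs match."""
    results = {}
    for case in cases:
        outputs = [tool(case.request) for _ in range(repeats)]
        results[case.name] = all(out == case.expected for out in outputs)
    return results


# Hypothetical lookup table standing in for a real "find procedure" tool.
PROCEDURES = {"update customer": "UpdateCustomer", "browse invoices": "BrowseInvoices"}


def fake_find_procedure(request: str) -> str:
    return PROCEDURES.get(request.lower(), "NOT_FOUND")


CASES = [
    Case("finds update procedure", "Update customer", "UpdateCustomer"),
    Case("finds browse procedure", "Browse invoices", "BrowseInvoices"),
    Case("admits unknown procedure", "Print labels", "NOT_FOUND"),
]

report = run_suite(fake_find_procedure, CASES)
for name, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

Repeating each case is the important part: with a model in the loop, "it worked once" and "it works every time" are different claims.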

That’s where evaluation starts to become part of the product, not just an academic exercise.

The more power we give these tools, the more we need a way to test the tool layer itself.

An early experiment in carrying out an XZ-style attack by using an agent to build trust (and hacking/impersonating a known-good contributor identity). The agent is obeying commands it was given, the exact opposite of running amok, and although the execution isn’t particularly effective, it is having some success (patches have been accepted).

That Fedora/XZ-style example is the part that worries me more than the usual “AI will replace programmers” argument.

The real risk is not that an agent magically runs amok.

It’s that someone can use agents to do patient, boring trust-building work at scale: file issues, submit small patches, join discussions, imitate normal contributor behavior, and slowly get code accepted.

That changes the economics of supply-chain attacks.

For me, the takeaway is the same one I keep coming back to with agentic coding: useful, yes. But only with limited permissions, isolated workspaces, clean diffs, repeatable tests, and human review.
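For the "limited permissions, isolated workspaces" part, a minimal Python sketch of a write guard might look like the following. It assumes POSIX-style paths, a single workspace root, and a hypothetical `generated` folder the agent must never write to. It does not resolve symlinks, so treat it as an illustration of the idea rather than a complete sandbox.

```python
import posixpath
from typing import Tuple


def is_write_allowed(workspace: str, target: str,
                     protected: Tuple[str, ...] = ("generated",)) -> bool:
    """Allow writes only inside the workspace and outside protected folders."""
    root = posixpath.normpath(workspace)
    # normpath collapses ".." segments; join lets an absolute target replace the root.
    full = posixpath.normpath(posixpath.join(root, target))
    if not full.startswith(root + "/"):
        return False  # escaped the workspace, or points at the root itself
    rel = posixpath.relpath(full, root)
    # Reject anything whose top-level folder is protected.
    return rel.split("/")[0] not in protected


print(is_write_allowed("/work/sandbox", "src/main.txt"))
print(is_write_allowed("/work/sandbox", "../secrets.txt"))
print(is_write_allowed("/work/sandbox", "generated/app.txt"))
```

A check like this belongs in the tool layer, not in the prompt: the agent can be asked nicely to stay in its lane, but the guard is what actually enforces it.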

An AI agent should not be treated like a trusted maintainer just because it can produce plausible work.

well, the problem is one just does not know what to expect from the modern inheritors of the founding fathers anymore..

the great migration of European brain power might start moving the other way again…

industry will surely protest.. 4.8 was like a steam train that constantly needs coal fed to the boiler, and if Mythos is the start of improvement then surely no one wants to go backward…

and 4.8 feels backwards compared to Mythos… surely business will rally and realise what they want, and it isn’t to go backwards…

Foreign national is not a business.
So does it really matter?

Anyway I’ve not been impressed enough to part with any cash.

Without researching further, 4.8 says that Fable 5 is not available for anyone, and if you’re running multi-language complex projects 4.8 simply does not make for happy travellers..

https://www.youtube.com/watch?v=41yzx5uohHI

This made me laugh! :grin:

Fable5/Mythos 5 progress in the cyber security arena. Not fully convinced this is “progress”.

What I mean is, provided a suitable prompt et al is delivered, you generally get what you ask for.
The “progress” is that its handling of the unwritten data missing from a prompt et al is improving; it’s better at interpreting nuance and delivering it.

https://xcancel.com/AISecurityInst/status/2054589763173126339

Anyway Amazon CEO got WH to block Fable 5 / Mythos 5.