The great big AI LLM thread. GitHub code, blogs & opinions, walkthroughs, trainers & more

https://efficienist.com/claude-code-may-be-burning-your-limits-with-invisible-tokens-you-cant-see-or-audit/

A developer’s experiment suggests Anthropic may be silently injecting thousands of tokens into Claude Code requests, and users’ usage limits are taking the hit.

ccaudit — Claude Code Token Usage Explorer

ccaudit is a terminal UI for exploring how Claude Code spends your token budget. It reads the JSONL session logs that Claude Code writes to ~/.claude/projects/ and breaks down token usage by session, exchange, and content category.
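For anyone who wants to poke at those logs by hand, here is a minimal sketch of the general idea (not ccaudit's actual implementation). It assumes each line of a session file is a JSON object and that assistant entries carry a `message.usage` object with the counters named in `USAGE_FIELDS`. Those field names are an assumption on my part, so check them against your own files first. The function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# ASSUMPTION: these usage field names are a guess at the log schema;
# adjust them to whatever your own session files actually contain.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def usage_from_lines(lines):
    """Sum the assumed usage counters over an iterable of JSONL lines."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # entries without a usage object contribute nothing
        for field in USAGE_FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    return totals


def usage_by_session(root):
    """Map each .jsonl file under root to a Counter of its token totals."""
    result = {}
    for path in sorted(Path(root).expanduser().rglob("*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            result[str(path)] = usage_from_lines(handle)
    return result

# Example call: usage_by_session("~/.claude/projects")
```

Comparing the summed input and cache counters against what you actually typed is a rough way to see how much of a session's budget went to content you never wrote yourself.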

Why observability matters

When Claude Code runs autonomously — spawning subagents, calling tools, reading files, executing commands — you have no visibility into what’s actually happening. The terminal shows a fraction of the activity. Subagents are invisible. Tool calls blur together. And when something goes wrong three agents deep in a parallel execution, you’re left reading through logs after the fact.

Claude Observe captures every hook event as it happens and streams it to a live dashboard. You see exactly what each agent is doing, which tools it’s calling, what files it’s touching, and how subagents relate to their parents.

If you create the MCP server or control the endpoint connection, you control what connections Claude makes. The configuration files in the Claude setup let you configure which tools it uses. In the small demo we posted, what comes back to be executed from the Claude response is handled locally, not by some agent on a far-off machine. You, the developer, have full control over the resulting output actions.

Claude Opus 4.7 released, along with some performance stats: https://www.anthropic.com/news/claude-opus-4-7

By now some of you will have tried 4.7. Here is an example of Claude Code compiling the Qt editor with no errors:

https://claude.ai/public/artifacts/d36c51ac-e890-4f18-ad9e-5459296b3438

Adding highlighting for a language or construct, and of course then creating a platform for direct AI interaction without the need for MCP servers, while also reducing the stack to mere kilobytes rather than megabytes and gigabytes:

https://claude.ai/public/artifacts/a9d547ff-100c-4736-94db-8b2d9aad2baf

No need for the bloat. It should become a movement like Bauhaus, and of course it means AI will program your programs in real time, without an MCP server.

Once you remove the MCP server and AI becomes part of your application, the fastest way to let it drive the app is to bypass the normal script-driven paradigm.

And this is what the AI proposes as it determines the services that the host can provide to drive the product:

https://claude.ai/public/artifacts/a395f84c-17ca-4bda-9d04-475d60d1d584

https://openai.com/index/work-with-codex-from-anywhere/

Observing their users’ activities has created these two spin-offs: Finance & Legal.

Although the solution suggested in this issue is to use plugins, I do think there is a point to be made for having switchable personas.

Hey @Mark_Sarson, yesterday I was testing Hermes Agent and it may be a replacement for a paid subscription to Claude Code, or at least a good “let’s see the capabilities for free and see how it works”.

The catch: it uses owl-alpha, which as far as I know is not a model trained for Clarion. The good stuff: it is free, and with Hermes (the coding agent) it is supposed to get smarter over time.

From this ClarionHub thread

We have found the real benefit of Claude Code is that it can port code and also test code. Here is Claude Code creating, compiling, and reading the logs from a program to develop it in a continuous cycle. It also records video of the running program at the same time. No human interaction was required in this process other than a conversation. The program was testing a mapping solution for automated control rescaling. It is a cartograph mapping solution we use on Linux, back-ported to Clarion code.


I don’t have a clue what I’m supposed to be looking at here.

https://znetwork.org/znetarticle/9-trillion-collapse-machine/

The Reddit comments on this article are interesting, some might even say prophetic.

The article may be overstated, but the underlying point is important: AI is being sold as software while being scaled as heavy industry.

That doesn’t mean AI is useless.

It does mean developers should be cautious about assuming unlimited cheap compute, stable pricing, or magical agentic workflows.

My own takeaway remains the same: use AI aggressively where it helps, but keep the programmer in the loop and keep the workflow grounded.

The AI models seem to be cultural in nature, and even Claude Code is not really what one would like from a European perspective.

I think that’s a fair observation. The models are definitely not culturally neutral. They reflect a lot of the language, assumptions, priorities, and working habits present in the material they were trained on.

That shows up in coding tools too. The default posture often feels very American tech-industry: cloud-first, move-fast, automate aggressively, connect everything, and assume the platform knows best.

That does not make the tools useless, but it does mean they need boundaries. Different developers, companies, and regions may have different expectations around privacy, control, auditability, maintainability, and how much autonomy they are willing to hand over to a tool.

So I think the better way to look at these systems is not as neutral experts, but as powerful assistants whose defaults need to be adjusted to the environment they are being used in.

Seven new AI models from Microsoft, developed by its Microsoft AI Superintelligence Team, in an attempt to reduce dependency on OpenAI.

Independents preferred its midrange MAI-Thinking-1 model to Anthropic’s Claude Sonnet Opus 4.6, and in benchmarks MAI-Thinking-1 matched Opus 4.6’s coding ability.

MAI-Code-1-Flash is an inference-efficient, 5-billion-parameter coding model developed by Microsoft as part of their homegrown MAI series. It is specifically built for autonomous agentic coding, deeply integrated into GitHub Copilot and VS Code.

Because it is a native Microsoft product embedded into their proprietary developer ecosystem, it is not currently hosted as a public model on the Hugging Face hub.

It is designed to compete with Anthropic’s Claude Haiku. Despite its small size, it scores an impressive 51.2% on SWE-Bench Pro by autonomously planning and completing complex coding tasks from start to finish.

If you are looking for highly efficient coding and reasoning models currently available on the Hugging Face hub, consider these widely used alternatives:

  • Step 3.5 Flash by StepFun: A 197B-parameter MoE model (with ~11B active parameters) optimized for high-speed coding and agentic workflows.
  • Ling-flash-2.0 by inclusionAI: A tiny-activation MoE model that achieves major inference speed advantages for local coding environments.
  • Qwen3.5-35B: A powerful open-weight base built for complex reasoning and coding tasks.

Not strictly anything to do with AI, but also released today.

Kapa builds AI assistants that answer questions from technical documentation. The knowledge bases we process hold millions of images: screenshots, architecture diagrams, circuit schematics, annotated UI walkthroughs. We spent several months working out how to make them useful in our RAG pipeline.

Tue, Jun 2, 8:30 PM - 9:45 PM BST (duration 1 hour 15 minutes). In San Francisco only. Not being recorded!

https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/

That is interesting, and it actually looks like a pretty natural move by Microsoft.

I don’t think it necessarily means they are moving away from OpenAI entirely, but it does suggest they do not want to be completely dependent on someone else’s frontier models for something as central as Copilot, VS Code, GitHub, Azure, and developer tooling.

The MAI-Code-1-Flash angle is especially interesting because it sounds less like “build the smartest general model” and more like “build a fast, cheap, deeply integrated coding model that Microsoft can control and run at scale.” That may matter more inside Copilot than winning every general benchmark.

For Clarion, though, I would still be cautious. Most of these coding models are going to be strongest where they have the most training data and feedback: C#, TypeScript, Python, Java, C++, web frameworks, and so on. Clarion is still niche enough that the real advantage will probably come from giving the AI very explicit Clarion-specific rules, examples, and tightly bounded tasks.

So I see this less as “one new model wins” and more as another sign that the AI coding tools are becoming a normal part of the developer stack. The trick is still to use them in reviewable chunks, keep the programmer in charge, and not let the tool wander off pretending that almost-right code is good code.

Back in January, Microsoft and OpenAI restructured their partnership, dropping Microsoft’s exclusive rights to OpenAI’s models. This cleared the way for a massive $50 billion alliance between OpenAI and Amazon. OpenAI is now offering its technology directly on Amazon Web Services, while Microsoft will no longer pay a revenue share to OpenAI.

https://openai.com/index/next-phase-of-microsoft-partnership/

https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/

That’s always been the case, which is why you won’t find an AI writing good Template Language code.

There are just not enough examples on GitHub, but more importantly, there are bugs and plenty of undocumented features because of the help docs.

Karpathy’s AutoResearch might work well training itself on the template language, but I’ve not had the time to get into it yet.

The freebie ChatGPT I have here, not that I really use it for coding, can’t do template language code. It just hallucinates like it’s found the main stash in William Leonard Pickard’s nuclear silo! :grin:

Yes, that looks like the important distinction.

Microsoft is not exactly being cut loose from OpenAI, but the relationship is clearly becoming less exclusive and more complicated.

The OpenAI/Microsoft announcement says Microsoft remains OpenAI’s primary cloud partner, and OpenAI products still ship first on Azure unless Microsoft cannot or chooses not to support the needed capabilities. But it also says Microsoft’s license to OpenAI models is now non-exclusive, and OpenAI can serve its products through other cloud providers.

That makes the AWS announcement much more understandable. OpenAI can now meet large enterprise customers where they already live, including AWS and Amazon Bedrock. For big companies, that matters because procurement, security review, compliance, billing, and governance are often already built around AWS.

So I think this supports the larger point: Microsoft still benefits from OpenAI, but it has every reason to reduce dependency and build more of its own model stack. That makes the MAI models less surprising. Microsoft wants Copilot, VS Code, GitHub, Azure, and Windows to be built on technology it can control, tune, price, and deploy without being completely dependent on another company’s roadmap.

For developers, I still think the practical takeaway is the same.

The model landscape is going to keep shifting. OpenAI, Microsoft, Anthropic, Google, Amazon, Meta, and others will keep making moves. The safer long-term habit is not to marry one model or one vendor, but to build workflows that use AI in bounded, reviewable chunks, with enough project-specific context and human review to keep the work honest.

Because those companies will keep moving, not always in the right direction if their past is any example, and because the corporate practice of enshittification will still exist to maximise profits, the best practice is to develop one’s own AI model to remain in control.

I agree with the concern about vendor lock-in and enshittification. We have all seen enough software and cloud services get worse, more expensive, more restrictive, or more intrusive over time that it would be foolish not to think about control.

Where I would make a distinction is between several very different things that often get lumped together.

Running a local model can make sense. Using an open-weight model can make sense. Fine-tuning a smaller model for a narrow purpose can make sense. Using RAG or local search over your own documents can also make a lot of sense, especially when privacy, repeatability, or disconnected use matters.

But that is not the same thing as developing your own AI model in the sense of competing with OpenAI, Anthropic, Google, Microsoft, Meta, or Amazon. No individual developer, and probably no ordinary small company, is going to reproduce the compute, training data, research staff, infrastructure, evaluation pipeline, safety work, and ongoing retraining needed to build and maintain a frontier model.

So to me this is partly an apples-to-oranges comparison.

For some tasks, a local or specialized model may be the better answer because it gives you control and keeps the work close to your own data. For other tasks, especially broad reasoning, coding help across many languages, writing assistance, research, and general problem solving, the large hosted models are going to remain far ahead for the foreseeable future.

The practical answer may not be “build your own model.” It may be “avoid building your workflow so that it depends completely on one vendor.” Use portable prompts, keep your own data and documentation, design your workflow so different models can be swapped in, and use local models where they actually fit the job.
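As a rough illustration of what "swappable" could look like in practice, here is a small sketch, not a recommendation of any particular library. The prompt lives in a plain template that you own, and each model sits behind the same one-argument callable. `ModelRouter`, `fake_local`, and `fake_hosted` are hypothetical names, and the two backends are stand-ins that only echo the prompt; in real use each would wrap whatever local runtime or vendor SDK you have chosen.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A backend is any callable that takes a prompt string and returns text.
Backend = Callable[[str], str]

# The prompt is plain text you own, not tied to any vendor's format.
REVIEW_PROMPT = "Review this {language} code and list likely bugs:\n{code}"


@dataclass
class ModelRouter:
    """Fills in portable prompt templates and dispatches to a named backend."""

    backends: Dict[str, Backend]
    default: str

    def ask(self, template: str, backend: Optional[str] = None, **fields: str) -> str:
        name = backend or self.default
        if name not in self.backends:
            raise KeyError(f"unknown backend: {name}")
        prompt = template.format(**fields)
        return self.backends[name](prompt)


# HYPOTHETICAL stand-ins: they only echo the prompt with a tag.
def fake_local(prompt: str) -> str:
    return "[local] " + prompt


def fake_hosted(prompt: str) -> str:
    return "[hosted] " + prompt


router = ModelRouter(
    backends={"local": fake_local, "hosted": fake_hosted},
    default="local",
)

# Same template, different model: only the backend name changes.
local_answer = router.ask(REVIEW_PROMPT, language="Clarion", code="x = 1")
hosted_answer = router.ask(
    REVIEW_PROMPT, backend="hosted", language="Clarion", code="x = 1"
)
```

The point of the design is that changing vendors means replacing one function in the `backends` dictionary, while the prompts, project rules, and review steps stay where they are.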

That gives you some control without pretending that a desktop machine or small office server can replace a frontier-scale LLM.