Microgpt - a small 200-line, 4192-parameter Generative Pre-trained Transformer

Saw this microgpt.

200 lines of Python, a 4192-parameter Generative Pre-trained Transformer.

There’s some good information here on this type of GPT model, which helps people understand the problems of building these LLMs/GPTs.

My thoughts were: could this be recoded in Clarion? Probably; we have a maths library, so we should be able to handle the statistical side of things. I think that would be a nice little side project in itself.

The training data would appear to be key.

The cloud data centres make LLMs like ChatGPT or Claude possible, but they are simply too big and expensive to run on-prem, so a smaller version geared towards dedicated tasks, be it generating Clarion code or analysing customer data, whilst providing privacy, might be an alternative solution. Cue microgpt.

It’s suggested this will be too small and will need to be scaled up.

Towards the end of the linked page, the section titled “Real stuff” covers much of what is required to make this a usable GPT, but this is where you create and train what you need, not what OpenAI or Anthropic think is needed.

In some ways, you have the base to apply your own creative direction and create a specialised AI for your own coding needs or your end users’ needs.

In the FAQs, I particularly like this…

What’s the deal with “hallucinations”? The model generates tokens by sampling from a probability distribution. It has no concept of truth, it only knows what sequences are statistically plausible given the training data. microgpt “hallucinating” a name like “karia” is the same phenomenon as ChatGPT confidently stating a false fact. Both are plausible-sounding completions that happen not to be real.

What are other people’s thoughts on microgpt being used as a base to develop their own LLM/GPT AI for their own needs?

Worth the effort?

The maths people say the bigger the model, the better the chance the softmax has of giving back something that looks like it’s usable?

As for connecting to Clarion, we are about to test that…

This is the C++ end, but in Clarion we use an Interface,COM to get, set and invoke the BindMachine, which hosts the bindables; in this case, an AI-generated AI bindable… Once AI goes live in the cloud and has access to C++ compilers in an always-on Linux container app, it could in theory generate and consume its own software… It won’t need a human unless it makes a mess…

// aibindable.h
// (c) Copyright Quantum Dynamics Ltd 2025
// AI ScriptBindable for UBS (Universal Binding Service)
// 
//
// Self-contained AI service bindable. Wraps HTTP + JSON into
// uniform ScriptBindable interface.
//
// UBS Script usage:
//   ai.provider('claude');
//   ai.key('sk-ant-...');
//   ai.model('claude-sonnet-4-20250514');
//   ai.system('You are a helpful assistant');
//   ai.ask('What is 2+2?');
//   answer = ai.response;
//   tokens = ai.tokens;
//   err = ai.error;
//
// REGISTRATION:
//   bindBindable("ai", new AIBindable());
//
// selectMember maps member name to equate once.
// invoke/get/set switch on equate. No string matching at runtime.

#ifndef AIBINDABLE_H
#define AIBINDABLE_H

#include "..\include\scriptinterface.h"
#include "..\include\scriptbindable.h"
#include <string>
#include <windows.h>
#include <winhttp.h>

// Member equates for switch dispatch
enum
{
    AI_EQU_NONE     = 0,
    AI_EQU_PROVIDER = 1,
    AI_EQU_KEY      = 2,
    AI_EQU_MODEL    = 3,
    AI_EQU_ASK      = 4,
    AI_EQU_RESPONSE = 5,
    AI_EQU_TOKENS   = 6,
    AI_EQU_ERROR    = 7,
    AI_EQU_STATUS   = 8,
    AI_EQU_SYSTEM   = 9
};

class AIBindable : public ScriptBindable
{
public:
    AIBindable();
    ~AIBindable();

    // ScriptBindable interface
    integer_t CALL_TYPE getAsInteger();
    number_t  CALL_TYPE getAsNumber();
    string_t  CALL_TYPE getAsString();
    ScriptBindable* CALL_TYPE copy();

    void CALL_TYPE setToInteger(integer_t integer);
    void CALL_TYPE setToNumber(number_t number);
    void CALL_TYPE setToString(string_t string);
    void CALL_TYPE setToBindable(ScriptBindable* b);

    integer_t CALL_TYPE invoke(ScriptInterface* ifc);
    ScriptBindable* CALL_TYPE selectMember(string_t memberName, long equ, long addr);
    void CALL_TYPE removeMember(string_t memberName);
    integer_t CALL_TYPE existsMember(string_t memberName);
    integer_t CALL_TYPE nextMemberName(string_t* memberName);

    void CALL_TYPE share();
    void CALL_TYPE unShare();
    string_t CALL_TYPE getDescription(byte metatype);
    void* CALL_TYPE getBinding();

    // Direct C++ methods for owning classes (JarvisBindable brain thread)
    std::string askDirect(const std::string &question);
    std::string getError();

private:
    // Dispatch
    int memberEqu;
    int shareCount;

    // Configuration (set by user)
    std::string provider;   // "claude", "openai", "deepseek"
    std::string apiKey;
    std::string model;
    std::string systemPrompt;

    // Result state (set after ask)
    std::string response;
    std::string error;
    std::string tokensUsed;
    int httpStatus;

    // Temp return buffer for getAsString
    std::string tempReturn;

    // Internal HTTP POST - returns response body
    std::string doPost(const std::string &host, const std::string &path,
                       const std::string &body, const std::string &headers);

    // Build provider-specific request
    void doAsk(const std::string &prompt);
};

#endif // AIBINDABLE_H

The Clarion side uses Interface,COM… we have been doing derivatives of this for a decade or more in Clarion…

 ! Script Bindable - direct UBS bindable

IUniBindable     INTERFACE,com  ! Standard Script Bindable at the CORE of UBS
	
 !	// @brief get as an integer
getAsInteger     Procedure(),proc,long  
!	// @brief get as a number
getAsNumber      Procedure(),proc,real 
!	// @brief get as a (utf8) string
getAsString      Procedure(),proc,*cstring 
!	// @brief  A copy of the bindable is wanted. This might be thought of as getAsBindable
copy             Procedure(),proc,long 
!	// @brief set to an integer
setToInteger     Procedure(long integer)  
!	// @brief set to a number
setToNumber      Procedure(real  number) 
!	// @brief set to a string
setToString      Procedure(const *cstring  string)  
!	// @brief set to a bindable
setToBindable    Procedure(long b)  
!	// @brief invoke
!	// Note: it is not usual to call this directly from addons or as an embedded
invoke           Procedure(long ifc),proc,long  

!	// @brief Select a member of the bindable
selectMember     Procedure(const *cstring memberName,long equ, long returnequaddress) ,proc,long 
!	// @brief Remove a member from the bindable
removeMember     Procedure(const *cstring memberName) 
!	// @brief  Test if a member exists
existsMember     Procedure(const *cstring memberName)  ,proc,long 
!	// @brief  Get the name of the next member in alphabetically sorted order
nextMemberName   Procedure(const *cstring memberName)   ,proc,long 

!	// @brief  Something has another reference to the bindable
share            Procedure() 
!	// @brief  Something no longer has a reference to the bindable
unShare          Procedure()  
!	// @brief  Get the name (usually the typename) of the item that has been bound
getDescription   Procedure(byte metatype),proc,*cstring  	
!	// @brief  Get the pointer to the item (or interface) that has been bound
getBinding       Procedure(),proc,long 
              END

The training data needs to be good; it’s the “authority” in probability training. But then you need a feedback loop to keep things up to date, otherwise you have to keep updating your training data to keep things relevant, or the probabilities get left behind, e.g. the change in slang words between generations, or musical taste, partly dictated by the availability of musical instruments, etc. The feedback loop, aka training data updates, is what I suspect the incremental dot updates are.

This is why an on-prem specialist AI could be more useful, and less resource-hungry and costly, because some things just don’t change very often, like the Clarion programming language and other programming languages.

What Google has to say…

Small Data Sets: The Precision Specialist
Small datasets can be highly effective when the data is high-quality and tailored to a specific domain.

- Contextual Excellence: For specialized tasks like medical imaging or manufacturing defect detection, a small, curated dataset can outperform a large, generic one.
- Transfer Learning: You can achieve high accuracy with small data by fine-tuning a model that was pre-trained on a large dataset (like ImageNet or BERT).
- Efficiency: Small datasets are faster to process, cheaper to annotate, and require less computational power.

Performance Trade-offs for Softmax
- Accuracy vs. Size: As dataset size decreases, most predictive models show a drop in accuracy and an increase in performance variability.
- Computational Bottlenecks: For Softmax specifically, large output spaces (many classes) can become a bottleneck during training, as the function depends on every element in the scores vector.
- Overfitting Risk: Neural networks are particularly sensitive to small datasets; without enough examples, they may struggle to learn meaningful patterns and instead overfit the limited training samples.

Technically, the hard part is knowing whether the training set is correct or not: back to garbage in, garbage out, and it’s where experience counts. Size is irrelevant, but it’s generally better to be a small specialist than a big generalist, which is how Google’s points come across.

Every biological entity on this planet has the same problem: nerve conduction velocity limits your physical ability, because signals travel at anything from 9mph to 134mph in the nerves serving muscle, and your brain signals travel at anything from 1.1mph to over 275mph to potentially come up with an idea.

Whilst CPU makers like Intel, AMD and increasingly ARM have stepped up hardware performance with out-of-order engines to absorb latency, amongst other developments, there’s only so much they can squeeze out of CPUs without drastically altering the way data is processed, stepping towards greater parallelisation to compensate for these bottlenecks, which then takes you back down the road towards specialisation.

If it’s any consolation, nature favours specialisation, as seen with various biological entities.

Eons old, and really an extension or aspect of the accuracy-vs-size argument; it comes down to the training data. There are countless stories circulating where AIs were overtrained and rare but often cataclysmic screw-ups occurred, which only showed up in post-mortems.

Generally I agree with the point, but the expertise comes in knowing what the appropriately sized training data set should be. Is a set of staff manuals enough, knowing there could be mistakes and conflicts in those manuals, or should more data be included?

Something else which has set a new precedent: it seems data centres are now legitimate targets in war.

So whilst a data centre is your eggs cloned across many baskets within one or a few data centres, data protection rules restrict data leaving countries or regions, so is there increased legitimacy in having a smaller, specialised on-prem AI to avoid becoming collateral damage for the actions of one’s own Govt?

Who would have thought Data Protection laws could put data at risk en masse? :wink:

Should give people a better idea of how things work…

C++ port of the original. Twice the number of lines.

Rust port.

JavaScript port which can run in browsers and Node.js, so possibly useful for NetTalk users.

Dependency-free C port.

The C port has some interesting analysis as to why this implementation might be better for not needing an expensive GPU, just as Win12 is reportedly to be released later this year, heavily AI-focused and only running on machines with NPUs, much like Win11 would only run on machines with TPM 2.0.

This appears to be a good comparison of the performance of different AIs run locally, so it’s certainly possible to scale up home-grown AIs.

Change the hardware of your desktop by either selecting your graphics card from the drop-down list, or specifying the graphics card RAM and throughput first; it then re-sorts the list of AIs which will run well on your machine.

This runs best on the lowest of the low: a 0.5GB RAM, ~50GB/s throughput graphics card.

One day, frontier AI research used to be done by meat computers in between eating, sleeping, having other fun, and synchronizing once in a while using sound wave interconnect in the ritual of “group meeting”. That era is long gone. Research is now entirely the domain of autonomous swarms of AI agents running across compute cluster megastructures in the skies. The agents claim that we are now in the 10,205th generation of the code base, in any case no one could tell if that’s right or wrong as the “code” is now a self-modifying binary that has grown beyond human comprehension. This repo is the story of how it all began. -@karpathy, March 2026.

The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model. The training code here is a simplified single-GPU implementation of nanochat. The core idea is that you’re not touching any of the Python files like you normally would as a researcher. Instead, you are programming the program.md Markdown files that provide context to the AI agents and set up your autonomous research org. The default program.md in this repo is intentionally kept as a bare bones baseline, though it’s obvious how one would iterate on it over time to find the “research org code” that achieves the fastest research progress, how you’d add more agents to the mix, etc. A bit more context on this project is here in this tweet and this tweet.

The idea “It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats.” is something which should be more common imo.

Self-learning, or reinforcement learning based on results.

The C7+ IDE has a command line which enables this sort of testing and reinforcement…
SMuRF worked on that, IIRC.