Microgpt - a small 200-line, 4192-parameter Generative Pre-trained Transformer

Saw this microgpt.

200 lines of Python, a 4192-parameter Generative Pre-trained Transformer.

There’s some good information here on this type of GPT model, which helps people understand the problems of building these LLMs/GPTs.

My thoughts were: could this be recoded in Clarion? Probably; we have a maths library, so it should be able to handle the statistical side of things. I think that would be a nice little side project in itself.

The training data would appear to be key.

The cloud data centres make LLMs like ChatGPT or Claude possible, but they are simply too big and expensive to run on on-prem hardware. So a smaller version geared towards dedicated tasks, be it generating Clarion code or analysing customer data whilst providing privacy, might be an alternative solution. Cue microgpt.

It’s suggested this will be too small and will need to be scaled up.

Towards the end of the linked page, the section titled “Real stuff” covers much of what is required to make this a usable GPT, but this is where you create and train what you need, not what OpenAI or Anthropic think is needed.

In some ways, you have the base to apply your own creative direction and create a specialised AI for your own coding needs or your end users’ needs.

In the FAQs, I particularly like this:

What’s the deal with “hallucinations”? The model generates tokens by sampling from a probability distribution. It has no concept of truth, it only knows what sequences are statistically plausible given the training data. microgpt “hallucinating” a name like “karia” is the same phenomenon as ChatGPT confidently stating a false fact. Both are plausible-sounding completions that happen not to be real.
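The FAQ’s point can be made concrete in a few lines. Here is a minimal, illustrative C++ sketch (not microgpt’s actual code, which is Python) of sampling a token from a softmax distribution: the model never consults truth, it just draws a token in proportion to its probability, so a plausible-but-wrong token can always come out.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Turn raw model scores (logits) into a probability distribution
// with softmax, then draw one token index from it. A low-probability
// token is always drawable, just less often than the front-runner.
int sampleToken(const std::vector<double>& logits, std::mt19937& rng)
{
    // Subtract the max logit before exponentiating, for numerical stability.
    double maxLogit = logits[0];
    for (double l : logits) if (l > maxLogit) maxLogit = l;

    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;   // normalise: probs now sum to 1

    // Sample in proportion to probability.
    std::discrete_distribution<int> pick(probs.begin(), probs.end());
    return pick(rng);
}
```

Run it thousands of times and the highest-scoring token dominates but never wins every draw; the occasional low-probability pick is exactly the “hallucination” mechanism described above.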

What are other people’s thoughts on microGPT being used as a base to develop their own LLM/GPT AI for their own needs?

Worth the effort?

Maths people say the bigger the model, the better the chance the softmax has of giving back something that looks like it’s usable?

As for connecting to Clarion, we are about to test that…

This is the C++ end, but in Clarion we use an INTERFACE,COM to get, set and invoke the bind machine which hosts the bindables, in this case an AI-generated AI bindable. Once AI goes live in the cloud and has access to C++ compilers in an always-on Linux container app, it could in theory generate and consume its own software. It won’t need a human unless it makes a mess.

// aibindable.h
// (c) Copyright Quantum Dynamics Ltd 2025
// AI ScriptBindable for UBS (Universal Binding Service)
//
//
// Self-contained AI service bindable. Wraps HTTP + JSON into
// uniform ScriptBindable interface.
//
// UBS Script usage:
// ai.provider('claude');
// ai.key('sk-ant-…');
// ai.model('claude-sonnet-4-20250514');
// ai.system('You are a helpful assistant');
// ai.ask('What is 2+2?');
// answer = ai.response;
// tokens = ai.tokens;
// err = ai.error;
//
// REGISTRATION:
// bindBindable("ai", new AIBindable());
//
// selectMember maps member name to equate once.
// invoke/get/set switch on equate. No string matching at runtime.

#ifndef AIBINDABLE_H
#define AIBINDABLE_H

#include "..\include\scriptinterface.h"
#include "..\include\scriptbindable.h"
#include <string>
#include <windows.h>
#include <winhttp.h>

// Member equates for switch dispatch
enum
{
AI_EQU_NONE = 0,
AI_EQU_PROVIDER = 1,
AI_EQU_KEY = 2,
AI_EQU_MODEL = 3,
AI_EQU_ASK = 4,
AI_EQU_RESPONSE = 5,
AI_EQU_TOKENS = 6,
AI_EQU_ERROR = 7,
AI_EQU_STATUS = 8,
AI_EQU_SYSTEM = 9
};

class AIBindable : public ScriptBindable
{
public:
AIBindable();
~AIBindable();

// ScriptBindable interface
integer_t CALL_TYPE getAsInteger();
number_t  CALL_TYPE getAsNumber();
string_t  CALL_TYPE getAsString();
ScriptBindable* CALL_TYPE copy();

void CALL_TYPE setToInteger(integer_t integer);
void CALL_TYPE setToNumber(number_t number);
void CALL_TYPE setToString(string_t string);
void CALL_TYPE setToBindable(ScriptBindable* b);

integer_t CALL_TYPE invoke(ScriptInterface* ifc);
ScriptBindable* CALL_TYPE selectMember(string_t memberName, long equ, long addr);
void CALL_TYPE removeMember(string_t memberName);
integer_t CALL_TYPE existsMember(string_t memberName);
integer_t CALL_TYPE nextMemberName(string_t* memberName);

void CALL_TYPE share();
void CALL_TYPE unShare();
string_t CALL_TYPE getDescription(byte metatype);
void* CALL_TYPE getBinding();

// Direct C++ method for owning classes (JarvisBindable brain thread)
std::string askDirect(const std::string &question);
std::string getError();

private:
// Dispatch
int memberEqu;
int shareCount;

// Configuration (set by user)
std::string provider;   // "claude", "openai", "deepseek"
std::string apiKey;
std::string model;
std::string systemPrompt;

// Result state (set after ask)
std::string response;
std::string error;
std::string tokensUsed;
int httpStatus;

// Temp return buffer for getAsString
std::string tempReturn;

// Internal HTTP POST - returns response body
std::string doPost(const std::string &host, const std::string &path,
                   const std::string &body, const std::string &headers);

// Build provider-specific request
void doAsk(const std::string &prompt);

};

#endif // AIBINDABLE_H
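For anyone who hasn’t seen the pattern before, here is a minimal standalone sketch of the equate dispatch the header’s comments describe. The class and member names here are made up for illustration, not the actual UBS API: selectMember resolves the member name to an integer equate once, and invoke then switches on that integer with no string matching at runtime.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Illustrative equates, mirroring the style of the AI_EQU_* enum.
enum { EQU_NONE = 0, EQU_ASK = 4, EQU_RESPONSE = 5 };

class MiniBindable
{
public:
    MiniBindable() : memberEqu(EQU_NONE) {}

    // Called once when the script references a member by name:
    // string comparison happens here, and only here.
    void selectMember(const char* name)
    {
        if (std::strcmp(name, "ask") == 0)
            memberEqu = EQU_ASK;
        else if (std::strcmp(name, "response") == 0)
            memberEqu = EQU_RESPONSE;
        else
            memberEqu = EQU_NONE;
    }

    // Called on the hot path: pure integer switch, no string matching.
    std::string invoke(const std::string& arg)
    {
        switch (memberEqu) {
        case EQU_ASK:      response = "answer to: " + arg; return response;
        case EQU_RESPONSE: return response;
        default:           return "";
        }
    }

private:
    int memberEqu;        // cached equate from the last selectMember()
    std::string response;
};
```

The design choice is the usual one: pay the string lookup once at bind time, then every subsequent get/set/invoke is a jump table.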

The Clarion side uses INTERFACE,COM… we have been doing derivatives of this for a decade or more in Clarion…

 ! Script Bindable - direct UBS bindable

IUniBindable     INTERFACE,com  ! Standard Script Bindable at the CORE of UBS
	
 !	// @brief get as an integer
getAsInteger     Procedure(),proc,long  
!	// @brief get as a number
getAsNumber      Procedure(),proc,real 
!	// @brief get as a (utf8) string
getAsString      Procedure(),proc,*cstring 
!	// @brief  A copy of the bindable is wanted. This might be thought of as getAsBindable
copy             Procedure(),proc,long 
!	// @brief set to an integer
setToInteger     Procedure(long integer)  
!	// @brief set to a number
setToNumber      Procedure(real  number) 
!	// @brief set to a string
setToString      Procedure(const *cstring  string)  
!	// @brief set to a bindable
setToBindable    Procedure(long b)  
!	// @brief invoke
!	// Note: it is not usual to call this directly from addons or as an embedded
invoke           Procedure(long ifc),proc,long  

!	// @brief Select a member of the bindable
selectMember     Procedure(const *cstring memberName,long equ, long returnequaddress) ,proc,long 
!	// @brief Remove a member from the bindable
removeMember     Procedure(const *cstring memberName) 
!	// @brief  Test if a member exists
existsMember     Procedure(const *cstring memberName)  ,proc,long 
!	// @brief  Get the name of the next member in alphabetically sorted order
nextMemberName   Procedure(const *cstring memberName)   ,proc,long 

!	// @brief  Something has another reference to the bindable
share            Procedure() 
!	// @brief  Something no longer has a reference to the bindable
unShare          Procedure()  
!	// @brief  Get the name (usually the typename) of the item that has been bound
getDescription   Procedure(byte metatype),proc,*cstring  	
!	// @brief  Get the pointer to the item (or interface) that has been bound
getBinding       Procedure(),proc,long 
              END

The training data needs to be good; it’s the “authority” in probability training. But then you need a feedback loop to keep things up to date, otherwise you have to keep updating your training data to keep things relevant, or the probabilities get left behind, e.g. the change in slang words between generations, or musical taste, partly dictated by the availability of musical instruments, etc. The feedback loop, aka training data updates, is what I suspect the incremental dot updates are.

This is why an on-prem specialist AI could be more useful, and less resource-hungry and costly, because some things just don’t change very often, like the Clarion programming language and other programming languages.

What Google has to say…

Small Data Sets: The Precision Specialist
Small datasets can be highly effective when the data is high-quality and tailored to a specific domain.

Contextual Excellence: For specialized tasks like medical imaging or manufacturing defect detection, a small, curated dataset can outperform a large, generic one.
Transfer Learning: You can achieve high accuracy with small data by fine-tuning a model that was pre-trained on a large dataset (like ImageNet or BERT).
Efficiency: Small datasets are faster to process, cheaper to annotate, and require less computational power.

Performance Trade-offs for Softmax
Accuracy vs. Size: As dataset size decreases, most predictive models show a drop in accuracy and an increase in performance variability.
Computational Bottlenecks: For Softmax specifically, large output spaces (many classes) can become a bottleneck during training, as the function depends on every element in the scores vector.
Overfitting Risk: Neural networks are particularly sensitive to small datasets; without enough examples, they may struggle to learn meaningful patterns and instead overfit the limited training samples.
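Google’s softmax-bottleneck point is easy to see in code. In this minimal sketch (illustrative, not from any particular library), every one of the V output scores feeds into the normalising sum, so the cost per prediction is O(V) in the number of classes or vocabulary size:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Softmax over V scores: the normalising sum touches every element,
// so the work per prediction grows linearly with the output space.
// Doubling the vocabulary doubles the cost of every single token.
std::vector<double> softmax(const std::vector<double>& scores)
{
    double maxScore = scores[0];
    for (double s : scores) if (s > maxScore) maxScore = s;  // stability shift

    std::vector<double> probs(scores.size());
    double sum = 0.0;                        // first O(V) pass
    for (size_t i = 0; i < scores.size(); ++i) {
        probs[i] = std::exp(scores[i] - maxScore);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;        // second O(V) pass
    return probs;
}
```

This is one reason a small, specialised model with a modest output space can be so much cheaper to run than a big generalist.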

Technically, the hard part is knowing whether the training set is correct or not. Back to garbage in, garbage out; it’s where experience counts. Size is irrelevant, but it’s generally better to be a small specialist than a big generalist, which is how Google’s points come across.

Every biological entity on this planet has the same problem: your nerve conduction velocity limits your physical ability, because signals travel at anything from 9 mph to 134 mph in muscle nerves, and your brain signals travel at anything from 1.1 mph to over 275 mph to potentially come up with an idea.

Whilst CPU vendors like Intel, AMD and increasingly ARM have stepped up hardware performance with out-of-order engines to absorb latency, amongst other developments, there’s only so much they can squeeze out of CPUs without drastically altering the way data is processed, stepping towards greater parallelisation to compensate for these bottlenecks, which then takes you back down the road towards specialisation.

If it’s any consolation, nature favours specialisation, as seen with various biological entities.

Eons old, and really an extension or aspect of the accuracy-vs-size argument; it comes down to the training data. There are countless stories circulating where AIs were overtrained and rare but often cataclysmic screw-ups occurred which only showed up in postmortems.

Generally I agree with the point, but the expertise comes in knowing what the appropriately sized training data set should be. Is a set of staff manuals enough, knowing there could be mistakes and conflicts in those manuals, or should there be more data included?

Something else which has set a new precedent: it seems data centres are now legitimate targets in war.

So whilst a data centre is your eggs cloned into many baskets within a single data centre or a few of them, data protection rules restrict data leaving countries or regions. So is there increased legitimacy in having smaller, specialised on-prem AI, to avoid becoming collateral damage for the actions of one’s own government?

Who would have thought Data Protection laws could put data at risk en masse? :wink: