How do LLMs Work

From OpenTakshashila
Revision as of 14:41, 31 August 2023 by Tshila.admin

Disclaimer

This is a page started by me (Pranay) to document my understanding of LLMs. All comments and suggestions are welcome. This document will remain in work-in-progress mode for a substantial amount of time, and will likely contain a lot of errors. You've been warned.

Prior Knowledge

We know that:

  • LLMs are trained to predict the next word
  • LLMs can produce erroneous results
  • LLMs are trained on the public data available on the internet. Hence they hold far more potential pre-formed connections than any human
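The first point above can be sketched in code. At its core, a language model maps a context to a probability distribution over possible next words and picks (or samples) from it. This is a toy illustration, not a real model: the context, vocabulary, and probabilities below are all made up.

```python
# Toy next-word predictor: a lookup table from context to a
# (made-up) probability distribution over candidate next words.
next_word_probs = {
    ("the", "capital", "of"): {"india": 0.4, "france": 0.3, "the": 0.1, "a": 0.2},
}

def predict_next(context):
    """Return the most likely next word for a known context."""
    dist = next_word_probs[context]
    return max(dist, key=dist.get)

print(predict_next(("the", "capital", "of")))  # -> india
```

A real LLM does the same thing, except the distribution is computed by a neural network over a vocabulary of tens of thousands of tokens, and the output is often sampled rather than always taking the maximum (which is one source of erroneous results).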

What I Understand After the First Pass (31 August 2023)

  1. Each word is converted into a floating-point vector. Vectors allow a word to have hundreds of dimensions, and numerical vectors have the additional advantage that you can perform mathematical operations on them.
  2. The floating-point vectors of similar words lie closer to each other in the vector space, i.e. many of their hundreds of dimensions have similar values.
  3. Humans can't think of the words in so many dimensions, but computers can!
[Image: WebVectors result for similar words]
Sidebar 1: To see the raw vectors associated with each word, I went to this model's website, and checked for the semantic associates of the word "India". These are the results based on two different training sets. 
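The points above can be sketched as a small program. The 4-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions and come from training); the standard way to measure "closeness" in vector space is cosine similarity, shown here in plain Python.

```python
import math

# Toy 4-dimensional word vectors. Values are invented for illustration;
# a real model learns them from data.
embeddings = {
    "india":    [0.9, 0.1, 0.3, 0.7],
    "pakistan": [0.8, 0.2, 0.4, 0.6],
    "banana":   [0.1, 0.9, 0.8, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related words score higher than unrelated ones.
print(cosine_similarity(embeddings["india"], embeddings["pakistan"]))
print(cosine_similarity(embeddings["india"], embeddings["banana"]))
```

This is the same kind of computation WebVectors performs when it lists "semantic associates" of a word: it ranks every word in the vocabulary by cosine similarity to the query vector.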


Notes

I began by reading the article that Mihir Mahajan shared on Takshashila's Mattermost server. This was the first article on the subject that made me feel I could go further in understanding LLMs. Most concise explainers end up confusing rather than illuminating; this one was an exception.