Disclaimer
This is a page started by me (Pranay) to document my understanding of LLMs. All comments and suggestions are welcome. This document will remain in work-in-progress mode for a substantial amount of time, and will likely contain a lot of errors. You've been warned.
Prior Knowledge
We know that:
- LLMs are trained to predict the next word (a minimal sketch of this objective follows this list)
- LLMs can produce erroneous results
- LLMs are trained on the public data available on the internet. Hence they hold far more potential pre-formed connections than any human.
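To make the first point concrete, here is a minimal sketch of the next-word objective. The vocabulary, the context sentence, and the scores are all toy values I made up for illustration; real models have vocabularies of tens of thousands of tokens. The idea is the same: the model scores every word in its vocabulary, a softmax turns the scores into probabilities, and training pushes up the probability of the word that actually came next.

```python
import numpy as np

# Toy vocabulary and made-up model scores (logits) for the context
# "New Delhi is the capital of ___". Both are assumptions for illustration.
vocab = ["India", "Pakistan", "cricket", "banana"]
logits = np.array([4.0, 2.0, 1.0, -1.0])

# Softmax turns raw scores into a probability distribution over the vocabulary.
probs = np.exp(logits) / np.exp(logits).sum()

for word, p in zip(vocab, probs):
    print(f"P(next word = {word!r}) = {p:.3f}")

# Training minimises the cross-entropy loss: -log of the probability
# assigned to the word that actually came next ("India" here).
target = vocab.index("India")
loss = -np.log(probs[target])
print(f"cross-entropy loss = {loss:.3f}")
```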
What I Understand After the First Pass (31 August 2023)
- Each word is converted into a floating-point vector. Vectors allow a word to be represented in hundreds of dimensions, and numerical vectors have the additional advantage that you can perform mathematical operations on them.
- The vectors of similar words lie closer to each other in the vector space, i.e., their values along many of those hundreds of dimensions will be similar. (A toy sketch of this follows the list.)
- Humans can't think of words in so many dimensions, but computers can!
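Here is a tiny sketch of the two points above. The 4-dimensional vectors are invented purely for illustration (real embeddings have hundreds of dimensions and are learned from data, not hand-written): similar words get vectors pointing in similar directions, and cosine similarity measures exactly that.

```python
import numpy as np

# Made-up 4-dimensional vectors; real models use 300+ dimensions,
# and these values are invented, not taken from any trained model.
vectors = {
    "india":    np.array([0.9, 0.1, 0.3, 0.7]),
    "pakistan": np.array([0.8, 0.2, 0.4, 0.6]),
    "banana":   np.array([0.1, 0.9, 0.8, 0.1]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words score higher than unrelated ones.
print(cosine_similarity(vectors["india"], vectors["pakistan"]))  # high
print(cosine_similarity(vectors["india"], vectors["banana"]))    # low

# Because the vectors are just numbers, you can also do arithmetic on
# them; in real models this gives analogies like king - man + woman ≈ queen.
```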
Sidebar 1: To see the raw vectors associated with each word, I went to this model's website (http://vectors.nlpl.eu/explore/embeddings/en/MOD_googlenews_upos_skipgram_300_xxx_2013/India_NOUN/) and checked the semantic associates of the word "India". These are the results based on two different training sets.

[Image: WebVectors result for similar words]
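For anyone who wants to reproduce this locally rather than through the WebVectors site, here is a hedged sketch using the gensim library. The file path is a placeholder (you would download pretrained vectors, e.g. the Google News set or a model from vectors.nlpl.eu, yourself), and most_similar is gensim's standard nearest-neighbour query.

```python
from gensim.models import KeyedVectors

# Assumption: a pretrained word2vec file has been downloaded locally.
# The file name below is a placeholder, not something this page provides.
# Note: the NLPL models tag words with their part of speech, so the key
# there would be "India_NOUN" rather than "India".
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Nearest neighbours by cosine similarity -- gensim's equivalent of the
# "semantic associates" list shown on the WebVectors site.
for word, score in kv.most_similar("India", topn=10):
    print(f"{word:20s} {score:.3f}")

# The raw 300-dimensional vector for a single word:
print(kv["India"][:10])  # first 10 of the 300 floating-point values
```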
Notes
I began by reading the article that Mihir Mahajan shared on Takshashila's Mattermost server. This was the first article on the subject that made me feel I could go further in understanding LLMs. Most concise explainers end up confusing rather than illuminating; this one did not.