Word Vectors, LLMs, and Retrieval-Augmented Generation
Ever wondered how machines go from seeing random characters like "C-A-T" to actually understanding what a cat is? No, it’s not sorcery (though it sometimes feels that way). It all starts with word vectors, the building blocks of how language models like ChatGPT make sense of human language.
Let’s break it all down.
From Letters to Numbers: Word Vectors
Humans read and understand words as sequences of letters — “C-A-T” rings a bell because you’ve seen it a thousand times. But language models? They’re playing a whole different game.
Machines translate “cat” into a long list of numbers, like: [0.0074, 0.0030, -0.0105, 0.0742, 0.0765, -0.0011, 0.0265, 0.0106, 0.0191, 0.0038, -0.0468, -0.0212, 0.0091, 0.0030, -0.0563, -0.0396, -0.0998, -0.0796, …, 0.0002] (The full vector is 300 numbers long).
These are called word vectors, and they live in an imaginary “word space” — a high-dimensional universe where words are points, and their distances actually mean something.
To make that less abstract: think of a map. You might describe two locations like this:
{
"Carbonteq": [31.46, 74.42],
"Dolmen Mall Lahore": [31.46, 74.43]
}
They're super close in terms of coordinates. Similarly, word vectors that are close together usually represent words that are semantically related — like “cat” and “kitten,” or “car” and “automobile.”
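To make "close together" concrete, here's a minimal sketch of cosine similarity, a standard way to measure how aligned two vectors are. The three-dimensional vectors below are made up purely for illustration; real word vectors have hundreds of dimensions:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point in the same direction; values near 0 mean unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional "word vectors" (invented for illustration)
cat    = np.array([0.80, 0.30, 0.10])
kitten = np.array([0.75, 0.35, 0.15])
car    = np.array([0.10, 0.90, 0.70])

print(cosine_similarity(cat, kitten))  # high: semantically related
print(cosine_similarity(cat, car))     # lower: less related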
Why Word Vectors Matter
Word vectors aren’t just geeky numerical tricks. They’re powerful because they let us compute with language.
A key advantage of representing words as vectors of real numbers (as opposed to strings of letters like "C-A-T") is that numbers support operations that letters don't. Word vectors are a useful building block for language models because they encode subtle but important information about the relationships between words. For example, Google researchers took the vector for biggest, subtracted big, and added small; the word closest to the resulting vector was smallest.
You can use the same vector arithmetic to draw analogies. For example, Paris is to France as Berlin is to Germany (both are capital cities): Paris - France + Germany lands closest to Berlin.
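Here's a rough sketch of that arithmetic using gensim and a pretrained word2vec model. The model name and the download step are assumptions; any pretrained embedding set containing these words would do, and the exact neighbours you get depend on the model:

import gensim.downloader as api

# Downloads the pretrained Google News vectors on first run (large file)
model = api.load("word2vec-google-news-300")

# "biggest" - "big" + "small" should land near "smallest"
print(model.most_similar(positive=["biggest", "small"], negative=["big"], topn=3))

# "Paris" - "France" + "Germany" should land near "Berlin"
print(model.most_similar(positive=["Paris", "Germany"], negative=["France"], topn=3))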
But Wait - Bias Comes Along for the Ride
Here’s the not-so-fun part. Because these word vectors are built from real-world text — the stuff we humans write — they pick up our biases. In some word vector models:
doctor - man + woman ≈ nurse
Yikes. That’s gender bias being encoded into a model. And it’s not just this one case — many professional or social stereotypes creep into models trained on public data. Detecting and mitigating these biases is a big, ongoing area of research.
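You can probe for this kind of bias with the same arithmetic. A sketch, using the same pretrained vectors as above (actual results depend on the model and its training data):

import gensim.downloader as api

model = api.load("word2vec-google-news-300")

# If "doctor" - "man" + "woman" lands near "nurse", the vectors have absorbed a gender stereotype
print(model.most_similar(positive=["doctor", "woman"], negative=["man"], topn=5))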
What Are Large Language Models (LLMs) Really Doing?
At their core, LLMs like ChatGPT are just really good at predicting the next word.
They’ve read massive amounts of text — books, websites, articles — and use word vectors to understand patterns, context, and structure. This allows them to perform everything from sentiment analysis and question-answering to code generation.
Pretty cool, right?
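To see "predicting the next word" in action, here's a small sketch using Hugging Face's transformers library with GPT-2. GPT-2 is just a convenient, freely downloadable stand-in; ChatGPT-class models work the same way at a much larger scale:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on the", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Turn the scores for the next position into probabilities and show the top candidates
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item()):>8}  {prob.item():.3f}")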
The Limitations: Why LLMs Sometimes Get It Wrong
They’re impressive, but not perfect. Here’s why:
- Outdated Knowledge: LLMs are trained on data up to a certain cutoff, so they don't know what happened after that. (GPT-4o, for instance, cuts off at October 2023.)
- Factual Errors: They can sound confident but still get facts wrong, especially on niche topics.
- Hallucinations: Yep, that's a real term. Sometimes LLMs just make stuff up. Fluent, convincing, but totally fabricated.
RAG: Retrieval-Augmented Generation
Let’s fix that with a concept called RAG — Retrieval-Augmented Generation.
LLM Without RAG = “Closed-Book Exam”
Imagine you’re taking an exam with no reference material. You’ve studied hard, but if a weird question pops up? You might guess. You might even confidently guess wrong.
That’s how a vanilla LLM operates: it relies entirely on what it memorized during training.
LLM With RAG = “Open-Book Exam”
Now imagine you can bring books. You still use your knowledge, but when you don’t know something, you look it up.
That’s what RAG enables. The model retrieves external information, augments the query with it, and then generates a better, more accurate response.
Let’s break that down.
The RAG Breakdown: R → A → G
R: Retrieval
Think of it like a search engine — but for word vectors. We use a vector database to store content as embeddings. When a user asks something, the same embedding model converts the question into a vector. We search the database to find the closest matches using similarity measures like cosine similarity or dot product.
So instead of guessing, the model pulls in relevant info - fast.
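A minimal sketch of that retrieval step, using sentence-transformers for embeddings and plain numpy in place of a real vector database. The model name and the toy documents are assumptions for illustration:

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

documents = [
    "Carbonteq is a software consultancy based in Lahore.",
    "Cosine similarity measures the angle between two vectors.",
    "GPT-4o has a knowledge cutoff of October 2023.",
]

# "Index" the documents; in production these embeddings live in a vector database
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question, k=2):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                 # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:k]      # indices of the k closest documents
    return [documents[i] for i in best]

print(retrieve("When does GPT-4o's training data end?"))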
A: Augmentation
Now that we have relevant chunks of information, we add them to the original user question. These documents act like reference material, giving the model more context before it answers.
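Augmentation is mostly prompt construction. A sketch of one common pattern (the template wording is an assumption, not a fixed standard):

def augment(question, retrieved_chunks):
    # Put the retrieved material in front of the question so the model can lean on it
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer isn't in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

chunks = ["GPT-4o has a knowledge cutoff of October 2023."]  # e.g. output of the retrieval step
print(augment("When does GPT-4o's training data end?", chunks))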
G: Generation
Finally, the LLM takes your question plus the extra context and generates a response that's more accurate, better grounded in facts, and less likely to hallucinate.
Boom - smarter answers, thanks to the RAG pipeline.
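Putting R, A, and G together, here's a sketch of the whole pipeline. It reuses the retrieve() and augment() helpers from the sketches above and assumes the openai Python client (v1+) with an API key in the environment; any other LLM client would slot in the same way:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def generate_answer(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def rag_answer(question):
    chunks = retrieve(question)          # R: find the most relevant documents
    prompt = augment(question, chunks)   # A: add them to the user's question
    return generate_answer(prompt)       # G: let the LLM write the grounded answer

print(rag_answer("When does GPT-4o's training data end?"))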
RAG isn't perfect: it can still produce misleading content if the retrieved documents aren't accurate or relevant. To address these problems, many RAG variants have been introduced, such as Corrective RAG and Fusion RAG.
From word vectors to RAG, we’ve gone from how machines understand language to how they answer our questions. As we push LLMs to do more, from coding assistants to virtual doctors, understanding their strengths and limitations helps us build and use them more responsibly.