Word Vectors, LLMs and Retrieval Augmented Generation

Ever wondered how machines go from seeing random characters like "C-A-T" to actually understanding what a cat is? No, it’s not sorcery (though it sometimes feels that way). It all starts with word vectors, the building blocks of how language models like ChatGPT make sense of human language.

Let’s break it all down.

From Letters to Numbers: Word Vectors

Humans read and understand words as sequences of letters — “C-A-T” rings a bell because you’ve seen it a thousand times. But language models? They’re playing a whole different game.

Machines translate “cat” into a long list of numbers, like: [0.0074, 0.0030, -0.0105, 0.0742, 0.0765, -0.0011, 0.0265, 0.0106, 0.0191, 0.0038, -0.0468, -0.0212, 0.0091, 0.0030, -0.0563, -0.0396, -0.0998, -0.0796, …, 0.0002] (The full vector is 300 numbers long).

These are called word vectors, and they live in an imaginary “word space” — a high-dimensional universe where words are points, and their distances actually mean something.

To make that less abstract: think of a map. You might describe two locations like this:
{
  "Carbonteq": [31.46, 74.42],
  "Dolmen Mall Lahore": [31.46, 74.43]
}

Figure: mapped coordinates of five locations based on their distance

They're super close in terms of coordinates. Similarly, word vectors that are close together usually represent words that are semantically related — like “cat” and “kitten,” or “car” and “automobile.”

Figure: word embedding visualization for "cat" (spaCy word embeddings)
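
If you want to poke at these vectors yourself, here's a minimal sketch using spaCy's en_core_web_md model, which ships 300-dimensional word vectors; any spaCy model with vectors would work the same way.

import spacy

# Load a spaCy model that includes word vectors (300 dimensions in en_core_web_md).
# Install first: pip install spacy && python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

cat, kitten, car = nlp("cat"), nlp("kitten"), nlp("car")

print(cat.vector.shape)        # (300,) -- the 300 numbers mentioned above
print(cat.similarity(kitten))  # high score: semantically close
print(cat.similarity(car))     # lower score: less related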

Why Word Vectors Matter

Word vectors aren’t just geeky numerical tricks. They’re powerful because they let us compute with language.

A key advantage of representing words with vectors of real numbers (as opposed to a string of letters, like “C-A-T”) is that numbers enable operations that letters don’t. Word vectors are a useful building block for language models because they encode subtle but important information about the relationships between words. Google researchers took the vector for biggest, subtracted big, and added small. The word closest to the resulting vector was smallest.

You can use vector arithmetic to draw analogies: Paris is to France as Berlin is to Germany. In vector terms, Paris - France + Germany lands closest to Berlin, because the vectors encode the "capital of" relationship.
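
Here's a hedged sketch of that arithmetic using gensim and one of its downloadable GloVe models; the model name glove-wiki-gigaword-100 is just a convenient stand-in (the original results came from word2vec), and any word-vector model with most_similar() behaves the same way.

import gensim.downloader as api

# Download a small set of pretrained vectors (tokens in this model are lowercase)
vectors = api.load("glove-wiki-gigaword-100")

# paris - france + germany ≈ berlin
print(vectors.most_similar(positive=["paris", "germany"], negative=["france"], topn=3))

# biggest - big + small ≈ smallest
print(vectors.most_similar(positive=["biggest", "small"], negative=["big"], topn=3))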

But Wait - Bias Comes Along for the Ride

Here’s the not-so-fun part. Because these word vectors are built from real-world text — the stuff we humans write — they pick up our biases. In some word vector models:

doctor - man + woman ≈ nurse

Yikes. That’s gender bias being encoded into a model. And it’s not just this one case — many professional or social stereotypes creep into models trained on public data. Detecting and mitigating these biases is a big, ongoing area of research.
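
You can probe for this yourself with the same gensim setup as in the analogy sketch above. Results vary by embedding model, but in many older models gendered occupation words do surface near the top.

import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# doctor - man + woman: in some embeddings, "nurse" ranks surprisingly high
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=5))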

What Are Large Language Models (LLMs) Really Doing?

At their core, LLMs like ChatGPT are just really good at predicting the next word.

They’ve read massive amounts of text — books, websites, articles — and use word vectors to understand patterns, context, and structure. This allows them to perform everything from sentiment analysis and question-answering to code generation.
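
To make "predicting the next word" concrete, here's a small sketch with Hugging Face transformers and GPT-2, a small open model used purely for illustration (not what ChatGPT runs on). It shows the model's top guesses for the next token.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Score every possible next token for this prompt
inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Turn the scores for the last position into probabilities and show the top 5 guesses
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, 5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")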

Pretty cool, right?

The Limitations: Why LLMs Sometimes Get It Wrong

They’re impressive, but not perfect. Here’s why:

  • Outdated Knowledge
    • LLMs are trained on data up to a certain point — they don’t know what happened after that. (GPT-4o, for instance, cuts off at October 2023.)
  • Factual Errors
    • They can sound confident, but still get facts wrong — especially on niche topics.
  • Hallucinations
    • Yep, that’s a real term. Sometimes, LLMs just make stuff up. Fluent, convincing, but totally fabricated.

RAG: Retrieval-Augmented Generation

Let’s fix that with a concept called RAG — Retrieval-Augmented Generation.

LLM Without RAG = “Closed-Book Exam”

Imagine you’re taking an exam with no reference material. You’ve studied hard, but if a weird question pops up? You might guess. You might even confidently guess wrong.

That’s how a vanilla LLM operates: it relies entirely on what it memorized during training.

LLM With RAG = “Open-Book Exam”

Now imagine you can bring books. You still use your knowledge, but when you don’t know something, you look it up.

That’s what RAG enables. The model retrieves external information, augments the query with it, and then generates a better, more accurate response.

Let’s break that down.

The RAG Breakdown: R → A → G

R: Retrieval

Think of it like a search engine — but for word vectors. We use a vector database to store content as embeddings. When a user asks something, the same embedding model converts the question into a vector. We search the database to find the closest matches using similarity measures like cosine similarity or dot product.

So instead of guessing, the model pulls in relevant info - fast.
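
Here's a bare-bones version of the retrieval step in plain numpy. In practice the hard-coded vectors below would come from a real embedding model and live in a vector database, but the ranking logic is the same.

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means "pointing the same way", values near 0 mean unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pretend these embeddings came from the same model used for the query
documents = {
    "Cats are small domesticated felines.": np.array([0.9, 0.1, 0.0]),
    "Cars need regular oil changes.":       np.array([0.1, 0.9, 0.2]),
}
query_vector = np.array([0.8, 0.2, 0.1])  # embedding of "Tell me about cats"

# Rank documents by similarity to the question and keep the best match
ranked = sorted(documents.items(),
                key=lambda item: cosine_similarity(query_vector, item[1]),
                reverse=True)
top_chunks = [text for text, _ in ranked[:1]]
print(top_chunks)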

A: Augmentation

Now that we have relevant chunks of information, we add them to the original user question. These documents act like reference material, giving the model more context before it answers.
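
In code, augmentation can be as simple as a prompt template; the wording below is just one way to phrase it, not a standard.

def augment(question, retrieved_chunks):
    # Prepend the retrieved text to the user's question as reference material
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = augment("Tell me about cats",
                 ["Cats are small domesticated felines."])
print(prompt)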

G: Generation

Finally, the LLM takes your question plus the extra context and generates a response that's more accurate, better grounded in facts, and less likely to hallucinate.
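
Wiring this last step up to an actual model might look like the sketch below, using the OpenAI Python client. The model name gpt-4o-mini is an assumption, and the prompt is the augmented one built in the previous step; swap in whichever LLM and prompt you actually use.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The augmented prompt from the previous step
prompt = ("Answer the question using only the context below.\n\n"
          "Context:\nCats are small domesticated felines.\n\n"
          "Question: Tell me about cats")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)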

Boom - smarter answers, thanks to the RAG pipeline.

Advantages and Disadvantages of RAG

The advantages are clear: fresher knowledge, answers grounded in your own documents, and fewer hallucinations. But RAG isn't perfect; it can still produce misleading content if the retrieved documents are inaccurate or irrelevant. To address these problems, several RAG variants have been introduced, such as Corrective RAG and Fusion RAG.

From word vectors to RAG, we’ve gone from how machines understand language to how they answer our questions. As we push LLMs to do more, from coding assistants to virtual doctors, understanding their strengths and limitations helps us build and use them more responsibly.