Large Language Models (LLMs) are like the autocomplete feature on your phone.
But if your phone's autocomplete is a toaster, an LLM is a Komatsu D575A Super
Dozer.
Toaster not shown.
LLMs are trained on massive amounts of text data: books, articles, code,
and the internet at large, to learn patterns in language and generate
human-like responses.
Think of them as brains that can:
Answer questions
Write code
Summarize documents
Translate languages
Generate content
But unlike humans, they do not actually "understand" anything. Computers have
always been, and still are, dumb; they can only do what humans make them do.
LLMs are simply predicting what comes next based on patterns they have learned.
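To make "predicting what comes next" concrete, here is a toy sketch that learns word-pair patterns from a tiny made-up corpus; real LLMs use neural networks over billions of examples, but the idea of scoring likely continuations is the same.

```python
# Toy "autocomplete": learn which word tends to follow which from a tiny
# made-up corpus, then predict the most likely next word.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count the observed word pairs (the "learned patterns").
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Pick the most frequently seen follower of `word`, if any.
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))   # -> 'cat' (seen most often after 'the')
print(predict_next("cat"))   # -> 'sat' (ties broken by insertion order)
```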
For early LLMs, a calculation like the one in the meme would have been
difficult, because the statement "my baby is twice as big in 3 months, so in 10
years he will be 7.5 trillion pounds" follows the arithmetic but is invalid: it
commits the linear/exponential extrapolation fallacy, assuming that a pattern
observed over a short period will continue unchanged indefinitely, without
considering limiting factors. In this case, human babies slow their growth and
eventually stop.
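Here is the meme's math spelled out; the starting weight is an assumption on my part, since the meme only gives the final number. The arithmetic works, which is exactly why the fallacy is easy to miss.

```python
# The meme's extrapolation, assuming a ~7 lb newborn that "doubles in size"
# every 3 months and keeps doing so for 10 years.
birth_weight_lbs = 7.0          # assumed starting weight
doublings = 10 * 12 // 3        # 10 years = 40 three-month periods

extrapolated = birth_weight_lbs * 2 ** doublings
print(f"{extrapolated:.2e} lbs")   # ~7.7e+12 lbs, i.e. trillions of pounds
```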
Key Concepts
Parameters: The Brain Size
When you say "hi" to another human, you are unconsciously taking in dozens of
factors.
Who they are
Your mood and theirs
What they are wearing
Whether they look different from the last time you saw them
When you last saw them
Then you decide what tone to use when you speak, what subject to talk about,
and what emotion to convey to the other person.
Parameters work the same way: each factor is an input to the situation, and you
make a decision based on learned biases and prior knowledge.
Now imagine you only use half the factors listed above. If you cannot tell the
other person's mood or do not know who they are, you may not make an appropriate
decision about the interaction.
Parameters are the learned weights and biases in a neural network that determine
how the model processes and generates text. Think of them as the "knowledge" the
model has acquired during training. This is similar to how humans learn: we read
books and watch movies and pick up patterns we may later refer to.
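As a toy illustration (every number below is invented, not taken from any real model): each factor is an input, each input has a learned weight, and a bias nudges the final decision one way or the other.

```python
# Hypothetical "how do I greet this person?" decision as a weighted sum.
# The weights and bias are invented for illustration; in a real model they
# are learned from data, and there are billions of them.
factors = {"know_this_person": 1.0, "their_mood_positive": 0.3, "months_since_last_meeting": 0.8}
weights = {"know_this_person": 0.9, "their_mood_positive": 0.6, "months_since_last_meeting": -0.2}
bias = 0.1

score = bias + sum(weights[name] * value for name, value in factors.items())
greeting = "Hey, long time no see!" if score > 0.5 else "Hi."
print(round(score, 2), greeting)
```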
Generally speaking, the higher the parameter count, the higher the capability,
but also the higher the resource requirements:
1B parameters: basic capability, a CPU will do (a GPU would be great), best for simple Q&A
7B parameters: generally good capability, a common GPU is needed, the sweet spot for most applications with decent reasoning (most widely used models are in this range, as of 15 July 2025)
13B parameters: advanced capability, multiple GPUs or hosted services needed, best for complex tasks beyond Q&A
70B+ parameters: state-of-the-art capability, heavy GPU investment required, the highest level of analysis
As of 15 July 2025. Advancements are made weekly.
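One rough way to see why bigger models need bigger hardware: every parameter has to sit in memory. Assuming 16-bit weights (2 bytes per parameter), which is a common default, the weights alone take:

```python
# Rough memory needed just to hold a model's weights (ignoring activations,
# KV cache, and other overhead). Assumes 2 bytes per parameter (16-bit).
BYTES_PER_PARAM = 2

for name, params in [("1B", 1e9), ("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    gb = params * BYTES_PER_PARAM / 1e9
    print(f"{name:>4} parameters -> ~{gb:.0f} GB for the weights alone")
# Quantizing to 4-bit roughly quarters these numbers, which is how 7B models
# end up running on ordinary laptops.
```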
Context Window: The Amount I Can Understand For Now
Has anyone ever talked so fast and said so much that you did not have time to
write it down or process what was being said? That is what hitting the context
window limit is like for an LLM.
The context window is how much text an LLM can "remember" at once. It is like
short-term memory: everything the model considers when generating a single
response.
You will also hear the term token, which means a single chunk of data fed
into the context window. For example, ChatGPT may have a context window of
128,000 tokens.
Tokens can be an ambiguous metric: what counts as a token depends on the LLM
and the infrastructure around it, but in general a token falls somewhere
between a syllable and a full word.
You should start caring about tokens if you plan to feed 2,000 legal
documents or entire books into an LLM; if you are only asking it questions,
then not so much.
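If you ever do need to count tokens, you do not have to guess. Here is a minimal sketch using OpenAI's tiktoken package (one tokenizer among many; other models split text differently):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models

text = "Tokens can be whole words, sub-words, or even punctuation."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
print([enc.decode([t]) for t in tokens])   # see how the text was split up
```

Running this shows that short common words tend to be one token each, while longer or rarer words get split into several.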
Training: Pre-training and Fine-tuning
Humans go through education where they learn general math, history, language,
and science. Then a human may specialize in a field such as Engineering, Business, English,
Music, Research, or Art.
The same goes for LLMs. LLMs go through two main training phases:
Pre-training: Trained on massive general datasets like Wikipedia, books,
and code.
Fine-tuning: Specialized for specific tasks like classification, question
answering, or code generation.
In terms of resources, pre-training is cost-prohibitive for most businesses.
Gemini 1.0 cost Google $192 million to train. So, as you can imagine, not many
companies train their own model. But that is okay, because there are far cheaper
and easier ways to get the effect of specializing your model.
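For a loose feel of the two phases, here is a toy sketch assuming PyTorch; the model and data are placeholders nowhere near a real LLM, but the shape of the process, a big general pass followed by a small specialized pass at a lower learning rate, is the same.

```python
# Toy two-phase training loop, assuming PyTorch is installed. The model and
# data are stand-ins; a real LLM has billions of parameters and trains on
# trillions of tokens.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(100, 16), nn.Flatten(), nn.Linear(16 * 4, 100))

def train(model, batches, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for inputs, targets in batches:
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()

# Phase 1: pre-training on a large, general corpus (random toy batches here).
general_data = [(torch.randint(0, 100, (32, 4)), torch.randint(0, 100, (32,)))
                for _ in range(50)]
train(model, general_data, lr=1e-3)

# Phase 2: fine-tuning on a small, task-specific dataset, typically at a
# lower learning rate so the general knowledge is not overwritten.
task_data = [(torch.randint(0, 100, (8, 4)), torch.randint(0, 100, (8,)))
             for _ in range(5)]
train(model, task_data, lr=1e-4)
```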
When you want to make an LLM work with your specific data, you have two main
options:
RAG (Retrieval Augmented Generation)
RAG is like giving your LLM a test but the test is open-book.
When you ask a question, it first searches a database for relevant information,
then uses that context to generate an answer. The data is injected into the
context window, just like when you type a question into ChatGPT: RAG finds the
relevant data and appends it to your question.
Pros:
Lower cost and GPU requirements (no pre-training or fine-tuning needed for RAG)
Privacy (the external database can be secured separately)
Cons:
Shallower specialization than fine-tuning (the model itself is never trained on your data)
Results may be less consistent compared to fine-tuning
How it works:
Your documents get encoded into vector embeddings: long lists of numbers that
computers can "understand", for example [0.12, -0.83, 0.47, ...].
Depending on the algorithm configured in the vector database, the stored
documents are ranked by how closely related they are to your question.
For example, "dog" and "cat" may be given a score of 0.7 because they are
both animals, both are common pets, but are different species as per the
training data.
"Cat" and "cow" may be given a score of 0.2 because though they are both
animals, they are less seen together in the training data.
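Putting it together, here is a minimal sketch of the retrieve-then-generate flow. The bag-of-words "embedding" and the documents are invented for illustration; a real setup uses a neural embedding model and a vector database, but the pipeline has the same shape: embed, rank by similarity, append the winners to your question.

```python
# Minimal RAG sketch: embed documents, rank them against the question, and
# append the best match to the prompt. The bag-of-words "embedding" below is
# a crude stand-in for a real embedding model.
import math
import re

documents = [
    "Our return policy allows refunds within 30 days of purchase.",
    "The office is closed on public holidays.",
    "Support is available by email at all hours.",
]
question = "How long do I have to return an item?"

def words(text):
    return re.findall(r"[a-z']+", text.lower())

# One shared vocabulary so every text maps to a vector of the same length.
vocab = sorted(set(w for text in documents + [question] for w in words(text)))

def embed(text):
    tokens = words(text)
    return [tokens.count(w) for w in vocab]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

q_vec = embed(question)
ranked = sorted(documents, key=lambda d: cosine_similarity(q_vec, embed(d)), reverse=True)
context = ranked[0]  # the most relevant document

# The retrieved text is appended to your question inside the context window;
# a real system would now send this prompt to the LLM.
prompt = f"Use this context to answer.\nContext: {context}\nQuestion: {question}"
print(prompt)
```

Here the refund-policy document wins because it shares the word "return" with the question; a real embedding model would also catch matches with no shared words at all.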
LLMs are powerful automation machines, but they are not magic. They do not truly
"understand" anything; they are predicting what text should come next based on
patterns.
Despite these limits, the pattern matching is so sophisticated that it is useful
for a huge range of tasks. The trick is understanding what LLMs are good at
(pattern recognition, text generation, reasoning over provided context) and what
they are not (factual accuracy without verification, consistent logic).
Start simple, experiment with the tools, and gradually build complexity as you
understand what works for your specific use case.