RAG Chunking Strategies: Why Fixed-Size Chunking Fails

This is Part 13 of the AI Agents series. Parts 11–12 built a working RAG pipeline. This post goes deeper on the most common reason RAG retrieval fails: poor chunking.

1. Why retrieval fails in RAG

When a RAG system returns irrelevant chunks, the root cause is usually one of two things:

Weak embedding model — the model can’t accurately represent the meaning of the text as a vector
Bad chunking — the text is split in a way that destroys context before it even reaches the embedding model

A bad embedding model is fixable by switching models. Bad chunking is subtler — it silently poisons the entire index, and no embedding model, no matter how good, can recover meaning from a fragment that has none.

2. What a chunk needs to be useful

An embedding is a compressed numerical representation of meaning. For that compression to work, the input text must carry complete, coherent meaning.

Consider the fragment: "rises in"

No human, and no embedding model, can tell what this means. Is it the sun? A temperature? A stock price? The model will still return a vector for the fragment, but that vector is too ambiguous to match any specific question reliably.

Now consider: "The sun rises in the east"

The meaning is complete. The embedding model can represent this accurately, and similarity search will match it correctly against questions like “where does the sun rise?”

The rule: every chunk must be semantically complete on its own. If a human can’t understand it without surrounding context, the embedding model can’t either. Garbage in, garbage vectors out.
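
To make the rule concrete, here is a toy comparison. It assumes a bag-of-words vector as a crude stand-in for a real embedding model (the `bag_of_words` and `cosine` helpers are hypothetical and not part of this series' pipeline), so the absolute scores mean little. What matters is the gap between the fragment and the complete sentence.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercased word counts: a crude, hypothetical stand-in for an embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = bag_of_words("Where does the sun rise?")
fragment_score = cosine(query, bag_of_words("rises in"))
sentence_score = cosine(query, bag_of_words("The sun rises in the east"))

print(f"fragment: {fragment_score:.2f}  sentence: {sentence_score:.2f}")
```

In this toy setup the fragment scores 0.00 against the question and the full sentence scores 0.47. A real embedding model would produce different numbers, but the fragment still gives it far less to work with.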

3. Fixed-size chunking

Fixed-size chunking splits text at a fixed character count, regardless of sentence or paragraph boundaries.

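def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    # Slice the text every chunk_size characters, ignoring word and sentence
    # boundaries; the final chunk may be shorter than chunk_size.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]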


sample = (
    "The quick brown fox jumps over the lazy dog. "
    "This process is called tokenization. Tokenization is the first step "
    "in natural language processing. Each token represents a word or subword unit."
)

for size in [20, 40, 50]:
    chunks = fixed_size_chunks(sample, size)
    print(f"\n--- chunk_size={size} ({len(chunks)} chunks) ---")
    for i, chunk in enumerate(chunks):
        print(f"  [{i}] '{chunk}'")

Output for chunk_size=20:

  [0] 'The quick brown fox '
  [1] 'jumps over the lazy '
  [2] 'dog. This process is'
  [3] ' called tokenization'
  [4] '. Tokenization is th'
  ...

Chunk [3] is " called tokenization": a sentence fragment that has lost both its subject and its verb ("This process is" ended up in chunk [2]). Chunk [4] opens with a stray period and ends in "th", the word "the" split mid-word. Neither chunk produces a useful vector.

Increasing chunk size to 40 or 50 helps some chunks but the problem doesn’t go away — sentence boundaries and chunk boundaries rarely align, so cuts will still land mid-sentence.
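
One way to check this claim is to count where the cuts land. The sketch below is a rough measurement, not a rigorous one: `splits_word` and `splits_sentence` are hypothetical helpers, and the sentence test naively treats any `.`, `!`, or `?` as a sentence end.

```python
def cut_positions(text: str, chunk_size: int) -> range:
    # Indices where one fixed-size chunk ends and the next begins.
    return range(chunk_size, len(text), chunk_size)


def splits_word(text: str, i: int) -> bool:
    # The cut splits a word when letters or digits sit on both sides of it.
    return text[i - 1].isalnum() and text[i].isalnum()


def splits_sentence(text: str, i: int) -> bool:
    # The cut is clean only if the text before it ends with sentence punctuation.
    return not text[:i].rstrip().endswith((".", "!", "?"))


sample = (
    "The quick brown fox jumps over the lazy dog. "
    "This process is called tokenization. Tokenization is the first step "
    "in natural language processing. Each token represents a word or subword unit."
)

report = {}
for size in [20, 40, 50]:
    cuts = cut_positions(sample, size)
    report[size] = (
        len(cuts),
        sum(splits_word(sample, i) for i in cuts),
        sum(splits_sentence(sample, i) for i in cuts),
    )
    total, mid_word, mid_sentence = report[size]
    print(f"chunk_size={size}: {total} cuts, {mid_word} mid-word, {mid_sentence} mid-sentence")
```

On this sample, every cut at all three sizes lands inside a sentence (9 of 9, 4 of 4, and 3 of 3). Larger chunks only reduce how many cuts there are and how often they split a word (5, 2, and 1).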

4. The context loss problem, concretely

Here’s what happens end-to-end when you have a bad chunk:

Text: "...Hyderabad, the capital of Telangana, is known for its biryani.
       The city was founded in 1591 by..."

Fixed-size cut at 48 chars:
  chunk A: "Hyderabad, the capital of Telangana, is known fo"
  chunk B: "r its biryani. The city was founded in 1591 by..."

Chunk A loses the end of the sentence — “known fo” means nothing. Chunk B starts with “r its biryani” — the subject of the sentence is gone.

If a user asks “What is Hyderabad known for?”, the similarity search has to match that question against vectors built from broken fragments. The answer might technically be in the data, but it won’t be retrieved reliably.
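
The split can be reproduced in a few lines. This is a minimal sketch, assuming the passage starts at "Hyderabad"; a 48-character cut then yields chunk A and chunk B exactly as shown above. The source text is truncated after "by" and is kept truncated here.

```python
def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# The passage from the example, truncated after "by" as in the original.
text = (
    "Hyderabad, the capital of Telangana, is known for its biryani. "
    "The city was founded in 1591 by"
)
chunk_a, chunk_b = fixed_size_chunks(text, 48)

answer = "known for its biryani"
print(repr(chunk_a))
print(repr(chunk_b))
print(answer in text)                        # the answer exists in the data
print(answer in chunk_a, answer in chunk_b)  # but no single chunk contains it
print("Hyderabad" in chunk_b)                # and chunk B has lost the subject
```

The last three lines print `True`, `False False`, and `False`: the answer is present in the source, but it is split across two chunks, and the half that mentions biryani no longer mentions Hyderabad.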

5. When fixed-size chunking is acceptable

Despite its flaws, fixed-size chunking is the right choice in a narrow set of cases (for line-oriented data, count the size in lines or records rather than characters):

Highly structured documents with predictable line lengths — log files, CSV rows, fixed-format records where each line is self-contained
Code files where you want to index by line range
Rapid prototyping where you just need something working and will improve chunking later
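
For line-oriented data, the fixed size is most naturally counted in lines, so no record is ever cut in half. A minimal sketch, assuming a hypothetical `chunk_by_lines` helper and a three-line log excerpt invented for illustration:

```python
def chunk_by_lines(text: str, lines_per_chunk: int) -> list[str]:
    # Group whole lines, so a chunk boundary always falls between records.
    lines = text.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


# Hypothetical log excerpt, invented for illustration.
log = (
    "2024-01-01 10:00:01 INFO service started\n"
    "2024-01-01 10:00:02 WARN cache miss for key=user:42\n"
    "2024-01-01 10:00:03 ERROR upstream timeout after 30s\n"
)

log_chunks = chunk_by_lines(log, lines_per_chunk=2)
for i, chunk in enumerate(log_chunks):
    print(f"[{i}] {chunk!r}")
```

Because each line is self-contained, every chunk is too, which is exactly the property that character-based cuts destroy in prose.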

For any document with natural language — articles, handbooks, research papers, emails — fixed-size chunking is not suitable for production.

6. Fixed-size chunking with overlap

One partial mitigation is adding overlap — repeating the last N characters of each chunk at the start of the next one. This preserves some cross-boundary context.

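def fixed_size_chunks_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    # overlap must be smaller than chunk_size, or the loop would never advance.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so the final chunk
        # is never just a repeat of the previous chunk's tail.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks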


chunks = fixed_size_chunks_with_overlap(sample, chunk_size=50, overlap=10)
for i, chunk in enumerate(chunks):
    print(f"[{i}] '{chunk}'")

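With overlap=10, the last 10 characters of chunk N are repeated at the start of chunk N+1. This softens the worst case: a word cut off at the end of chunk N appears whole in chunk N+1, as long as it begins inside the overlap window. It doesn’t solve the fundamental problem, though. Both edges of every chunk still land at arbitrary positions, and 10 characters is far too short to carry a sentence’s subject or a pronoun’s antecedent across the boundary.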

Overlap helps. It doesn’t make fixed-size chunking good.
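
Both halves of that statement can be checked on the sample text. The sketch below uses a compact, hypothetical `overlapping_chunks` variant written for this check (it assumes 0 <= overlap < chunk_size) and an `is_mid_word` helper that treats any run of letters or digits as a word.

```python
def overlapping_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    # Compact variant for this check; assumes 0 <= overlap < chunk_size.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def is_mid_word(text: str, i: int) -> bool:
    # True when position i falls between two letters or digits.
    return 0 < i < len(text) and text[i - 1].isalnum() and text[i].isalnum()


sample = (
    "The quick brown fox jumps over the lazy dog. "
    "This process is called tokenization. Tokenization is the first step "
    "in natural language processing. Each token represents a word or subword unit."
)

chunk_size, overlap = 50, 10
chunks = overlapping_chunks(sample, chunk_size, overlap)

# Each chunk's last `overlap` characters reappear at the start of the next one.
shared = [a[-overlap:] == b[:overlap] for a, b in zip(chunks, chunks[1:])]

# Count chunk edges that still fall partway through a word.
step = chunk_size - overlap
edges = [(k * step, min(k * step + chunk_size, len(sample))) for k in range(len(chunks))]
broken_starts = sum(is_mid_word(sample, start) for start, _ in edges)
broken_ends = sum(is_mid_word(sample, end) for _, end in edges)

print(f"{len(chunks)} chunks, overlap intact: {all(shared)}")
print(f"{broken_starts} chunks start mid-word, {broken_ends} end mid-word")
```

On this sample all four adjacent pairs share their 10 boundary characters, yet 2 chunks still start and 3 still end partway through a word.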

7. Summary: fixed-size chunking

Property	Value
Implementation complexity	Very low
Context preservation	Poor
Suitable for natural language	No
Suitable for structured/log data	Yes
Production use for documents	Not recommended

Think of it as the “Hello World” of chunking — useful for learning the mechanics, not something you ship in a real RAG system.

What’s next

Part 14 covers sentence-based chunking — splitting text at sentence boundaries instead of character counts. This preserves semantic completeness in every chunk and produces dramatically better retrieval than fixed-size splitting.
