RAG Chunking: Sliding Window Strategy

This is Part 16 of the AI Agents series. Parts 13–15 covered fixed-size, sentence-based, and recursive character splitting. This post covers the fourth main chunking strategy: sliding window.

1. How it differs from previous strategies

Every strategy so far tried to find a good place to split:

Fixed-size: split every N characters
Sentence-based: split at sentence boundaries
Recursive: split at paragraph → sentence → word → character, in that priority order

Sliding window takes a different approach entirely. It doesn’t look for split points. Instead, it moves a window of fixed size forward by a fixed number of words or characters (the stride), so each chunk overlaps its neighbors whenever the stride is smaller than the window.

Two parameters control everything:

Window size — how many words (or characters) each chunk contains
Stride — how many words (or characters) to advance before starting the next chunk

2. The mechanics

With window_size=5, stride=3 over the text:

RAG enhances LLMs by retrieving external data. This process reduces hallucinations.

Chunk	Words
1	RAG enhances LLMs by retrieving
2	by retrieving external data This
3	data This process reduces hallucinations

The window starts at word 1, takes 5 words, then moves forward 3. Because the stride (3) is smaller than the window size (5), the last 2 words of each chunk appear again at the start of the next one. That repetition is the overlap, and it is what preserves context: a phrase cut off at the end of one chunk, such as “by retrieving external data”, still appears whole in the next.

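If stride equals window size, there’s zero overlap, which is identical to fixed-size word chunking. If stride is 1, every chunk overlaps with the next by window_size - 1 words: maximum density, maximum redundancy. If stride exceeds window size, consecutive windows leave gaps, and the words in those gaps never appear in any chunk, so keep stride at or below window_size.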
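To see the three stride regimes side by side, here is a minimal sketch that works only with word-index spans rather than text. It assumes the window stops advancing once it reaches the last word (the behavior the table above shows); `window_spans` and `shared_words` are illustrative helpers, not part of any library.

```python
def window_spans(n_words: int, window_size: int, stride: int) -> list[tuple[int, int]]:
    """Half-open (start, end) word-index spans, one per window.

    Stops once a window reaches the last word, matching the table above.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    spans = []
    start = 0
    while start < n_words:
        spans.append((start, min(start + window_size, n_words)))
        if start + window_size >= n_words:
            break
        start += stride
    return spans


def shared_words(spans: list[tuple[int, int]]) -> list[int]:
    """Words shared by each pair of neighboring windows.

    A negative value is a gap: that many words fall between two windows
    and never appear in any chunk.
    """
    return [prev_end - next_start for (_, prev_end), (next_start, _) in zip(spans, spans[1:])]


# The 11-word example sentence from the table above.
print(window_spans(11, window_size=5, stride=3))  # [(0, 5), (3, 8), (6, 11)]
print(shared_words(window_spans(11, 5, 3)))       # [2, 2]   stride < window: overlap
print(shared_words(window_spans(11, 5, 5)))       # [0, 0]   stride == window: no overlap
print(shared_words(window_spans(11, 3, 5)))       # [-2, -2] stride > window: gaps
```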

3. Python implementation

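def sliding_window_chunks(text: str, window_size: int, stride: int) -> list[str]:
    """Split text into windows of window_size words, advancing by stride words."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive integers")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunk = words[start:start + window_size]
        chunks.append(" ".join(chunk))
        # Stop once this window reaches the last word: any later window
        # would only repeat words this chunk already contains.
        if start + window_size >= len(words):
            break
        start += stride
    return chunks


# Example from the video
text = "The quick brown fox jumps over the lazy dog"

chunks = sliding_window_chunks(text, window_size=8, stride=4)
for i, chunk in enumerate(chunks):
    print(f"[{i}] {chunk}")

Output:

[0] The quick brown fox jumps over the lazy
[1] jumps over the lazy dog

Chunk [1] starts at the fifth word, “jumps”, because stride=4 moves the window past the first four words. That window already reaches the last word, so the loop stops: the final chunk holds whatever words remain and can be shorter than a full window.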
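If you also want to know where each chunk sits in the document, a generator variant can yield the starting word index alongside the text. This is a sketch, not a required part of the technique; `iter_sliding_windows` is a hypothetical name, and it uses the same stop-at-the-last-word rule as the function above.

```python
from collections.abc import Iterator


def iter_sliding_windows(
    text: str, window_size: int, stride: int
) -> Iterator[tuple[int, str]]:
    """Yield (start_word_index, chunk_text) pairs one window at a time."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    words = text.split()
    for start in range(0, len(words), stride):
        yield start, " ".join(words[start:start + window_size])
        # Stop after the first window that reaches the last word.
        if start + window_size >= len(words):
            break


for start, chunk in iter_sliding_windows(
    "The quick brown fox jumps over the lazy dog", window_size=8, stride=4
):
    print(start, chunk)
# 0 The quick brown fox jumps over the lazy
# 4 jumps over the lazy dog
```

Keeping the start index with each chunk (for instance as metadata in a vector store) makes it possible to tell later which retrieved chunks are neighbors.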

4. Stride controls the overlap

The relationship between stride and overlap is direct:

$$\text{overlap (words)} = \text{window\_size} - \text{stride}$$

window_size	stride	overlap
8	8	0 (no overlap)
8	6	2 words
8	4	4 words (50%)
8	2	6 words (75%)
8	1	7 words (maximum density)

A stride of 1 produces the most context-preserving chunks but also the most chunks — for a 1000-word document with window size 8 and stride 1, you get ~993 chunks. That’s expensive to index and query.
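The ~993 figure comes from counting window start positions. Writing N for the document length in words, W for window_size and S for stride, and assuming the window stops advancing once it reaches the last word (the behavior the ~993 figure implies), the chunk count is:

$$
n_{\text{chunks}} =
\begin{cases}
0, & N = 0 \\
1, & 0 < N \le W \\
\left\lceil \dfrac{N - W}{S} \right\rceil + 1, & N > W
\end{cases}
$$

For the 1000-word example with W = 8, stride 1 versus stride 4:

$$\left\lceil \frac{1000 - 8}{1} \right\rceil + 1 = 993, \qquad \left\lceil \frac{1000 - 8}{4} \right\rceil + 1 = 249$$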

Rule of thumb: start with stride at roughly 50–60% of window size. Adjust based on retrieval quality and index size tradeoffs.
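To put numbers on that tradeoff, here is a small sketch that turns a stride fraction into an overlap and a chunk count. `chunk_count` uses the closed form ⌈(N − W)/S⌉ + 1 for N > W, assuming the window stops once it reaches the last word; both helpers are hypothetical.

```python
def chunk_count(n_words: int, window_size: int, stride: int) -> int:
    """Number of windows, assuming the loop stops once a window reaches the last word."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if n_words <= 0:
        return 0
    if n_words <= window_size:
        return 1
    # Integer ceiling of (n_words - window_size) / stride, plus the first window.
    return (n_words - window_size + stride - 1) // stride + 1


def stride_tradeoffs(n_words: int, window_size: int, fractions: list[float]) -> list[dict]:
    """Overlap and chunk count for strides set as a fraction of window_size."""
    rows = []
    for fraction in fractions:
        stride = max(1, round(window_size * fraction))
        rows.append({
            "stride": stride,
            "overlap_words": window_size - stride,
            "chunks": chunk_count(n_words, window_size, stride),
        })
    return rows


# 1000-word document, window_size=8, as in the paragraph above.
for row in stride_tradeoffs(1000, window_size=8, fractions=[1.0, 0.6, 0.5, 0.25, 0.125]):
    print(row)
```

Going from stride 8 to stride 4 roughly doubles the index (125 to 249 chunks), and stride 1 gives the 993 quoted above.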

5. Character-based vs word-based

The implementation above counts words. If you need a hard cap on chunk length, for example to stay under an embedding model’s input limit, you can slide over characters instead:

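def sliding_window_char_chunks(text: str, window_size: int, stride: int) -> list[str]:
    """Split text into windows of window_size characters, advancing by stride characters."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive integers")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window_size])
        # Stop once this window reaches the end of the text; later windows
        # would only be substrings of this one.
        if start + window_size >= len(text):
            break
        start += stride
    return chunks


chunks = sliding_window_char_chunks(
    "The quick brown fox jumps over the lazy dog",
    window_size=20,
    stride=10
)

for i, chunk in enumerate(chunks):
    print(f"[{i}] '{chunk}'")

Output:

[0] 'The quick brown fox '
[1] 'brown fox jumps over'
[2] 'jumps over the lazy '
[3] ' the lazy dog'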

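Character-based windows give precise control over chunk length in characters, which is useful when you’re working close to an embedding model’s input limit. Keep some headroom, though: that limit is counted in tokens, not characters. The tradeoff is that a character window ignores word boundaries, so a chunk can start or end in the middle of a word. Word-based windows are more readable and easier to reason about.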
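A quick way to see the difference in size control is to measure both kinds of chunk. This sketch redefines both functions (with a stop-at-the-end rule) so it runs on its own; the sample sentence is made up for illustration. It prints each chunk’s length in characters and in UTF-8 bytes.

```python
def sliding_window_chunks(text: str, window_size: int, stride: int) -> list[str]:
    """Word-based windows that stop once a window reaches the last word."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break
    return chunks


def sliding_window_char_chunks(text: str, window_size: int, stride: int) -> list[str]:
    """Character-based windows with the same stopping rule."""
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break
    return chunks


text = "A naïve café résumé explains retrieval-augmented generation to beginners."

for label, chunks in [
    ("words (window=5, stride=3)", sliding_window_chunks(text, 5, 3)),
    ("chars (window=30, stride=15)", sliding_window_char_chunks(text, 30, 15)),
]:
    # (characters, UTF-8 bytes) per chunk. Accented letters such as "é" and
    # "ï" take two bytes each, so the byte count can exceed the character count.
    lengths = [(len(c), len(c.encode("utf-8"))) for c in chunks]
    print(label, lengths)
```

In this sample the word windows range from 24 to 49 characters, while every character window is capped at 30. The byte counts also show that a character cap is not a byte cap.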

6. Integrating with ChromaDB

import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="sliding_window")

document = """
Nerchuko was founded in 2024 to make data science education accessible.
The platform offers structured learning paths in Python, SQL, statistics, and machine learning.
Learners can track their progress and take assessments after each module.
The AI Agents series is one of the flagship courses on the platform.
"""

# Reuses sliding_window_chunks from section 3.
chunks = sliding_window_chunks(document.strip(), window_size=15, stride=8)

collection.upsert(
    documents=chunks,
    ids=[f"sw_{i}" for i in range(len(chunks))]
)

results = collection.query(
    query_texts=["What courses does Nerchuko offer?"],
    n_results=3
)

# Chroma returns distances, so a lower value means a closer match.
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"[{dist:.4f}] {doc}")
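Because neighboring windows share words, a top-k query over sliding-window chunks can return two or three hits that are largely the same passage. One way to handle this, sketched here as an assumption rather than a built-in ChromaDB feature, is to store each chunk’s starting word index (for example through the `metadatas` argument of `upsert`) and stitch overlapping hits back together before building the prompt. `merge_overlapping_hits` is a hypothetical helper and runs without ChromaDB.

```python
def merge_overlapping_hits(
    words: list[str], hit_starts: list[int], window_size: int
) -> list[str]:
    """Merge retrieved windows that overlap or touch into contiguous passages.

    words       -- the document split with str.split(), as in chunking
    hit_starts  -- start word index of each retrieved chunk (hypothetical
                   metadata stored alongside the chunk at indexing time)
    window_size -- the window size used for chunking
    """
    spans = sorted((s, min(s + window_size, len(words))) for s in hit_starts)
    merged: list[list[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous passage: extend it instead of
            # returning the shared words twice.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [" ".join(words[s:e]) for s, e in merged]


words = "RAG enhances LLMs by retrieving external data. This process reduces hallucinations.".split()

# Suppose a query returned the windows starting at words 3 and 0
# (window_size=5, stride=3, as in section 2).
print(merge_overlapping_hits(words, [3, 0], window_size=5))
# ['RAG enhances LLMs by retrieving external data. This']
```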

7. When to use sliding window

Scenario	Sliding window?
Narrative text, novels, articles	Good fit — no hard topic boundaries
Dense technical docs with clear sections	Worse than recursive splitting
Semantic search across continuous prose	Strong fit
FAQ / policy documents	Overkill — sentence chunking is simpler
Small documents with short answers	Avoid — too many overlapping chunks

Sliding window shines when there are no natural structural boundaries to exploit and context flows continuously across the text. For structured documents (handbooks with sections, code with functions, papers with headings), recursive splitting or sentence-based chunking is usually better.

8. Chunking strategies: full comparison

Strategy	Context preserved	Handles structure	Overlap control	Complexity
Fixed-size	Poor	No	Manual	Trivial
Sentence-based	Good	Partial	None	Low
Recursive character	Very good	Yes	Yes	Medium
Sliding window	Good	No	Precise	Low

No single strategy is universally best. Match the strategy to the document type, and when in doubt, test retrieval quality empirically with a set of representative questions.
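Here is a minimal sketch of that kind of empirical check: a hit rate over a few representative questions, comparing two window configurations. To keep it self-contained, `toy_score` (shared-word counting) stands in for embedding similarity, and the questions and expected phrases are made up for illustration; this shows the procedure, not a benchmark result.

```python
def sliding_window_chunks(text: str, window_size: int, stride: int) -> list[str]:
    """Word windows that stop once a window reaches the last word."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break
    return chunks


def toy_score(query: str, chunk: str) -> int:
    """Hypothetical stand-in for embedding similarity: count shared words."""
    def norm(s: str) -> set[str]:
        return {w.strip(".,?!").lower() for w in s.split()}
    return len(norm(query) & norm(chunk))


def hit_rate(chunks: list[str], eval_set: list[tuple[str, str]], k: int = 1) -> float:
    """Fraction of questions whose top-k chunks contain the expected phrase."""
    hits = 0
    for question, expected in eval_set:
        ranked = sorted(chunks, key=lambda c: toy_score(question, c), reverse=True)
        if any(expected in c for c in ranked[:k]):
            hits += 1
    return hits / len(eval_set)


document = (
    "Nerchuko was founded in 2024 to make data science education accessible. "
    "The platform offers structured learning paths in Python, SQL, statistics, and machine learning. "
    "Learners can track their progress and take assessments after each module. "
    "The AI Agents series is one of the flagship courses on the platform."
)

# Hypothetical evaluation set: (question, phrase a good chunk must contain).
eval_set = [
    ("When was Nerchuko founded?", "founded in 2024"),
    ("Which education does Nerchuko make accessible?", "data science education"),
    ("Does the platform offer learning paths in Python?", "structured learning paths in Python"),
]

for window_size, stride in [(8, 8), (8, 4)]:
    chunks = sliding_window_chunks(document, window_size, stride)
    print(f"window={window_size} stride={stride}: hit rate {hit_rate(chunks, eval_set):.2f}")
```

With these made-up questions the no-overlap configuration scores 0.33 and the 50%-overlap one scores 1.00, because two of the expected phrases straddle a chunk boundary when stride equals window size. In practice you would swap `toy_score` for your real retriever (such as the ChromaDB query in section 6) and use questions your users actually ask.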

What’s next

Part 17 covers semantic chunking — the most accurate strategy, which groups sentences by meaning rather than size. It uses embeddings and a similarity threshold to decide where one topic ends and the next begins.

Full video walkthrough is embedded above.