RAG Chunking: Sliding Window Strategy
Sliding window chunking ignores paragraph and sentence boundaries entirely. Instead, it moves a fixed-size window forward by a configurable stride, creating dense, overlapping chunks that carry context across every split.
This is Part 16 of the AI Agents series. Parts 13–15 covered fixed-size, sentence-based, and recursive character splitting. This post covers the fourth main chunking strategy: sliding window.
1. How it differs from previous strategies
Every strategy so far tried to find a good place to split:
- Fixed-size: split every N characters
- Sentence-based: split at sentence boundaries
- Recursive: split at paragraph → sentence → word → character, in that priority order
Sliding window takes a different approach entirely. It doesn’t look for split points. Instead, it moves a window of fixed size forward by a fixed number of words or characters (the stride). Whenever the stride is smaller than the window, each chunk overlaps its neighbors.
Two parameters control everything:
- Window size — how many words (or characters) each chunk contains
- Stride — how many words (or characters) to advance before starting the next chunk
2. The mechanics
With window_size=5, stride=3 over the text:
RAG enhances LLMs by retrieving external data. This process reduces hallucinations.
| Chunk | Words |
|---|---|
| 1 | RAG enhances LLMs by retrieving |
| 2 | by retrieving external data This |
| 3 | data This process reduces hallucinations |
The window starts at word 1, takes 5 words, then moves forward 3 words. Because the stride (3) is smaller than the window size (5), the last 2 words of each chunk reappear at the start of the next one. That repetition is the overlap, and it means text cut off at the end of one chunk continues in the next instead of losing its surrounding context.
If stride equals window size, there’s zero overlap, and the result is identical to fixed-size word chunking. If stride is 1, every chunk overlaps the next by window_size - 1 words: maximum density, maximum redundancy. A stride larger than the window size skips the words between windows, so those words never reach the index.
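To make the mechanics concrete, here is a minimal sketch that computes the word index range of each window for the example sentence and lists the words shared by neighboring chunks. The helper name `window_spans` is made up for illustration, and the sketch assumes the window stops advancing once it reaches the end of the text.

```python
def window_spans(n_words, window_size, stride):
    """Return (start, end) word indices for each window; end is exclusive."""
    spans = []
    start = 0
    while start < n_words:
        end = min(start + window_size, n_words)
        spans.append((start, end))
        if end == n_words:  # the window has reached the end of the text
            break
        start += stride
    return spans


text = "RAG enhances LLMs by retrieving external data. This process reduces hallucinations."
words = text.split()
spans = window_spans(len(words), window_size=5, stride=3)

for i, (start, end) in enumerate(spans, 1):
    print(i, (start, end), " ".join(words[start:end]))
# spans -> [(0, 5), (3, 8), (6, 11)]

# Words shared by each pair of consecutive windows
for (s1, e1), (s2, e2) in zip(spans, spans[1:]):
    print("shared:", words[s2:e1])
# shared: ['by', 'retrieving']
# shared: ['data.', 'This']
```

Note that `str.split()` keeps punctuation attached to words, so `data.` counts as one word.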
3. Python implementation
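def sliding_window_chunks(text: str, window_size: int, stride: int) -> list[str]:
    """Split text into overlapping chunks of up to window_size words."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + window_size]))
        # Stop once a window reaches the end of the text; a further
        # window would only repeat words that are already covered.
        if start + window_size >= len(words):
            break
        start += stride
    return chunks

# Example from the video
text = "The quick brown fox jumps over the lazy dog"
chunks = sliding_window_chunks(text, window_size=8, stride=4)
for i, chunk in enumerate(chunks):
    print(f"[{i}] {chunk}")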
Output:
[0] The quick brown fox jumps over the lazy
[1] jumps over the lazy dog
Chunk [1] starts at the fifth word, "jumps", because a stride of 4 skips the first four words. It is also the last chunk: only five words remain, so it is shorter than the full window of 8.
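If a short final chunk is a problem for your pipeline, one possible alternative is to pull the last window back so it ends exactly at the end of the text. The sketch below is an illustration of that design choice, not the approach from the video; `sliding_window_full_chunks` is a hypothetical name.

```python
def sliding_window_full_chunks(text, window_size, stride):
    """Variant (assumption): every chunk has exactly window_size words
    whenever the text is at least that long."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    words = text.split()
    if len(words) <= window_size:
        return [" ".join(words)] if words else []
    last_full_start = len(words) - window_size
    starts = list(range(0, last_full_start + 1, stride))
    # Add a final window aligned to the end of the text if the stride missed it
    if starts[-1] != last_full_start:
        starts.append(last_full_start)
    return [" ".join(words[s:s + window_size]) for s in starts]


full_chunks = sliding_window_full_chunks(
    "The quick brown fox jumps over the lazy dog", window_size=8, stride=4
)
for i, chunk in enumerate(full_chunks):
    print(f"[{i}] {chunk}")
# [0] The quick brown fox jumps over the lazy
# [1] quick brown fox jumps over the lazy dog
```

The tradeoff is that the last two chunks can overlap by more than `window_size - stride` words.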
4. Stride controls the overlap
The relationship between stride and overlap is direct:
$$\text{overlap (words)} = \text{window_size} - \text{stride}$$
| window_size | stride | overlap |
|---|---|---|
| 8 | 8 | 0 (no overlap) |
| 8 | 6 | 2 words |
| 8 | 4 | 4 words (50%) |
| 8 | 2 | 6 words (75%) |
| 8 | 1 | 7 words (maximum density) |
A stride of 1 produces the most context-preserving chunks but also the most chunks: for a 1000-word document with window size 8 and stride 1, you get 993 chunks (1000 - 8 + 1). That’s expensive to index and query.
Rule of thumb: start with stride at roughly 50–60% of window size, which gives 40–50% overlap, then adjust by trading retrieval quality against index size.
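The chunk counts behind this tradeoff can be estimated without chunking anything. The sketch below assumes the window stops once it reaches the end of the text, and `chunk_count` is a hypothetical helper name.

```python
import math


def chunk_count(n_words, window_size, stride):
    """Number of chunks when the window stops as soon as it reaches the end."""
    if n_words <= 0:
        return 0
    if n_words <= window_size:
        return 1
    return math.ceil((n_words - window_size) / stride) + 1


for stride in (8, 6, 4, 2, 1):
    overlap = 8 - stride
    print(f"stride={stride} overlap={overlap} chunks={chunk_count(1000, 8, stride)}")
# stride=8 overlap=0 chunks=125
# stride=6 overlap=2 chunks=167
# stride=4 overlap=4 chunks=249
# stride=2 overlap=6 chunks=497
# stride=1 overlap=7 chunks=993
```

Halving the stride roughly doubles the number of chunks to embed and store.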
5. Character-based vs word-based
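The implementation above uses words. When an embedding model imposes an input-length limit, a character-based window gives a tighter bound on chunk length than a word-based one:
def sliding_window_char_chunks(text: str, window_size: int, stride: int) -> list[str]:
    """Split text into overlapping chunks of up to window_size characters."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window_size])
        # Stop once a window reaches the end of the text
        if start + window_size >= len(text):
            break
        start += stride
    return chunks

chunks = sliding_window_char_chunks(
    "The quick brown fox jumps over the lazy dog",
    window_size=20,
    stride=10
)
for i, chunk in enumerate(chunks):
    print(f"[{i}] '{chunk}'")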
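Character-based chunking gives precise control over chunk length in characters, which is useful when you’re working close to an embedding model’s input limit. Keep in mind that character count equals byte size only for ASCII text, and that most models count tokens, so character length is a proxy rather than an exact measure. Word-based is more readable and easier to reason about.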
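One caveat worth checking on your own data: a character window bounds the number of characters, not the number of UTF-8 bytes. The sketch below illustrates this with a made-up sample string; `char_windows` is a hypothetical stand-in for a character-based chunker that stops once the window reaches the end of the text.

```python
def char_windows(text, window_size, stride):
    """Overlapping character windows; stops once a window reaches the end."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break
        start += stride
    return chunks


sample = "Crème brûlée, piñata, 東京タワー"  # illustrative text with non-ASCII characters
char_chunks = char_windows(sample, window_size=10, stride=5)

for chunk in char_chunks:
    n_chars = len(chunk)
    n_bytes = len(chunk.encode("utf-8"))
    print(f"chars={n_chars:>2} bytes={n_bytes:>2} {chunk!r}")
```

Every chunk stays within 10 characters, but chunks containing accented or CJK characters take more than 10 bytes. Token counts depend on the model's tokenizer and need to be measured separately.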
6. Integrating with ChromaDB
import chromadb

# Persist the index on disk so it survives restarts
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="sliding_window")

document = """
Nerchuko was founded in 2024 to make data science education accessible.
The platform offers structured learning paths in Python, SQL, statistics, and machine learning.
Learners can track their progress and take assessments after each module.
The AI Agents series is one of the flagship courses on the platform.
"""

# Reuses sliding_window_chunks from section 3
chunks = sliding_window_chunks(document.strip(), window_size=15, stride=8)

# upsert overwrites existing ids, so re-running the script does not duplicate chunks
collection.upsert(
    documents=chunks,
    ids=[f"sw_{i}" for i in range(len(chunks))]
)

results = collection.query(
    query_texts=["What courses does Nerchuko offer?"],
    n_results=3
)

# A lower distance means a closer match
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"[{dist:.4f}] {doc}")
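Because neighboring chunks share words, the top results for a query are often adjacent windows that repeat each other. One way to detect that, sketched here under assumptions, is to store each chunk's word offsets as metadata. The helper names `word_windows_with_offsets` and `overlaps` are hypothetical, and the sample text is shortened; the `metadatas` argument of `collection.upsert` is the only ChromaDB piece involved, and it is shown as a comment so the sketch runs without a database.

```python
def word_windows_with_offsets(text, window_size, stride):
    """Return (chunk, start_word, end_word) tuples; end_word is exclusive."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    words = text.split()
    records = []
    start = 0
    while start < len(words):
        end = min(start + window_size, len(words))
        records.append((" ".join(words[start:end]), start, end))
        if end == len(words):
            break
        start += stride
    return records


document = (
    "Nerchuko was founded in 2024 to make data science education accessible. "
    "The platform offers structured learning paths in Python, SQL, statistics, "
    "and machine learning."
)
records = word_windows_with_offsets(document, window_size=15, stride=8)
documents = [chunk for chunk, _, _ in records]
ids = [f"sw_{i}" for i in range(len(records))]
metadatas = [{"start_word": s, "end_word": e} for _, s, e in records]

# With a ChromaDB collection, these could be stored together:
# collection.upsert(documents=documents, ids=ids, metadatas=metadatas)


def overlaps(a, b):
    """True if two metadata records cover at least one common word."""
    return a["start_word"] < b["end_word"] and b["start_word"] < a["end_word"]


print(metadatas)
print(overlaps(metadatas[0], metadatas[1]))  # neighboring windows share words
```

After a query, results whose metadata overlap could be merged or de-duplicated before being passed to the LLM.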
7. When to use sliding window
| Scenario | Sliding window? |
|---|---|
| Narrative text, novels, articles | Good fit — no hard topic boundaries |
| Dense technical docs with clear sections | Worse than recursive splitting |
| Semantic search across continuous prose | Strong fit |
| FAQ / policy documents | Overkill — sentence chunking is simpler |
| Small documents with short answers | Avoid — too many overlapping chunks |
Sliding window shines when there are no natural structural boundaries to exploit and context flows continuously across the text. For structured documents (handbooks with sections, code with functions, papers with headings), recursive splitting or sentence-based chunking is usually better.
8. Chunking strategies: full comparison
| Strategy | Context preserved | Handles structure | Overlap control | Complexity |
|---|---|---|---|---|
| Fixed-size | Poor | No | Manual | Trivial |
| Sentence-based | Good | Partial | None | Low |
| Recursive character | Very good | Yes | Yes | Medium |
| Sliding window | Good | No | Precise | Low |
No single strategy is universally best. Match the strategy to the document type, and when in doubt, test retrieval quality empirically with a set of representative questions.
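As a rough illustration of what testing empirically can look like, the sketch below compares strides by hit rate over a few questions. Everything here is an assumption made for the example: the questions and expected phrases are invented from the sample document, `word_windows` and `hit_rate` are hypothetical helpers, and `retrieve` is a simple word-overlap stand-in for a real vector store query.

```python
import re


def word_windows(text, window_size, stride):
    """Overlapping word windows; stops once a window reaches the end."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break
        start += stride
    return chunks


def tokens(s):
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(chunks, question, k):
    """Stand-in retriever (assumption): rank chunks by words shared with the question."""
    q = tokens(question)
    return sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)[:k]


def hit_rate(chunks, cases, k=2):
    """Fraction of questions whose expected phrase appears in the top-k chunks."""
    hits = 0
    for question, expected in cases:
        top = retrieve(chunks, question, k)
        if any(expected.lower() in c.lower() for c in top):
            hits += 1
    return hits / len(cases)


document = (
    "Nerchuko was founded in 2024 to make data science education accessible. "
    "The platform offers structured learning paths in Python, SQL, statistics, and machine learning. "
    "Learners can track their progress and take assessments after each module. "
    "The AI Agents series is one of the flagship courses on the platform."
)
# Illustrative test cases: (question, phrase the retrieved chunk should contain)
cases = [
    ("When was Nerchuko founded?", "founded in 2024"),
    ("What can learners track?", "track their progress"),
    ("Which series is a flagship course?", "AI Agents series"),
]

for stride in (15, 8, 4):
    chunks = word_windows(document, window_size=15, stride=stride)
    print(f"stride={stride:>2} chunks={len(chunks):>2} hit_rate={hit_rate(chunks, cases):.2f}")
```

In a real setup, `retrieve` would call the vector store, and the question set would come from actual user queries.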
What’s next
Part 17 covers semantic chunking, which groups sentences by meaning rather than size and is often the most accurate strategy. It uses embeddings and a similarity threshold to decide where one topic ends and the next begins.
Full video walkthrough is embedded above.