Hybrid RAG with Semantic Chunking

Scaling RAG Hybrid Search on 780K Pages

Piotr Chlebek · 2026-4-(work in progress)

Abstract: (work in progress)

Keywords: (work in progress)

(work in progress)

Architecting a Private, Vertical Search Engine for a Library of 6,000 Documents

The Library I Always Wanted: Fast, Local, and Under Control

This project was born from my love for non-fiction books and knowledge-dense industry articles. For a long time, I dreamed of building a system that could "understand" and process my large collection of documents on topics I care about.

This text was born from a mix of three sparks. First, I wanted to expand on the ideas there won't be enough time for during my presentation at the AI Tinkerers Gdańsk Meetup on April 23rd. Second, writing is pure joy for me and the best way to organize my thoughts. But the most important catalyst was my colleagues in the industry: their encouragement and curiosity about my struggles with new technologies finally got me to sit down at the keyboard. Since they felt these experiences were worth writing about, I couldn't let them down.

The 5 Million Prompts

After processing ~6k PDFs, I ended up with 780,000 pages. I decided to run several metadata-extraction prompts against each page, which left me with about 5 million LLM queries to process.

Processing 5 million prompts in the cloud is not cheap. Even with modern features like prompt caching or batch pricing for slower results, the costs are still too high. To save money, I decided to use my own local hardware instead. As a major bonus, running everything locally ensures total privacy for my data. You can find out how I managed to do it with vLLM here: Serving the LLM locally with vLLM.
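The scale above is easy to sanity-check with a quick back-of-envelope calculation. Note that the per-page prompt count (~6) and the local throughput figure (20 requests/second) are my own illustrative assumptions, chosen only to roughly reproduce the "about 5 million" total, not measured numbers from this setup:

```python
# Back-of-envelope estimate for the prompt volume described above.
PAGES = 780_000
PROMPTS_PER_PAGE = 6  # assumed: several metadata-extraction prompts per page

total_queries = PAGES * PROMPTS_PER_PAGE
print(f"{total_queries:,} LLM queries")  # 4,680,000 — roughly the ~5M mentioned

# At a hypothetical sustained local throughput of 20 requests/second:
REQUESTS_PER_SECOND = 20
seconds = total_queries / REQUESTS_PER_SECOND
print(f"~{seconds / 86_400:.1f} days of continuous processing")  # ~2.7 days
```

Even under optimistic throughput assumptions, a workload like this runs for days, which is exactly why batch-style offline processing on owned hardware pays off compared with per-token cloud pricing.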

Of course, there are challenges. Local AI usually has higher latency (slower response times) than big cloud providers. Because of this, I have to be careful about how I use LLMs for certain real-time tasks, such as reranking search results.

In this post:

  • (work in progress)

Related Posts:

References:

Image source: Google DALL-E 3 (04.2026).
