Building a Full-Stack Knowledge Aggregator with FastAPI, ChromaDB, and Next.js - Part 1

The breaking point wasn't one single event, but a death by a thousand paper cuts. Every week, our team would lose hours to 'archaeology digs'—piecing together answers scattered across Confluence, GitHub, and ancient JIRA tickets. We all felt the pain, so we decided to do something about it.

We held a team-wide vote on a few AI features that could solve our biggest headaches. The winner was overwhelmingly clear: a KB aggregator with AI-assisted triage. Seeing the team rally behind that idea was the final push I needed. I decided to stop hunting and start building.

I built a lightweight Knowledge Base (KB) Aggregator to make that process conversational and dependable. Now, I can just ask a question and get a grounded, concise answer with direct links to the sources.

The "Aha!" Moment

The core idea isn't rocket science. It's a pretty standard Retrieval-Augmented Generation (RAG) pipeline. The system sucks in documents from all our scattered sources, chops them up, and embeds them into a vector store (I chose ChromaDB). When a user asks a question, we find the most relevant chunks of text and feed them to a language model with a simple instruction: "Answer the question using only this information."
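The "chop them up" step can be sketched in a few lines. Here's a minimal sliding-window chunker; the size and overlap values are illustrative defaults, not the exact ones I use:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context isn't cut mid-thought."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping some shared context
    return chunks
```

In practice, a structure-aware splitter (by heading, function, or paragraph) retrieves better than fixed windows, but the principle is the same.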

The backend is a simple FastAPI server, and the frontend is a Next.js app with a pre-built chat UI that saved me from my own terrible CSS.

My Toolkit, and Why I Chose It

  • Backend: FastAPI. As someone who loves Python for data work, FastAPI feels like a superpower. It's blazing fast to develop with, and the automatic OpenAPI docs are a lifesaver.
  • Vector DB: ChromaDB. I wanted something I could run on my laptop without a credit card. ChromaDB is open-source and incredibly simple to get running for a proof-of-concept. It just works.
  • LLMs: I'm using Ollama to run Llama 3 locally. This was crucial. It meant I could develop the entire thing on a plane without internet and, more importantly, without sending any proprietary code to a third-party API. Swapping to a cloud provider like OpenAI or Anthropic later is trivial.
  • Frontend: Next.js with assistant-ui. Look, I'm a backend developer. The thought of centering a div gives me cold sweats. Using a polished, off-the-shelf component library like this meant I got a ChatGPT-like experience in an afternoon, not a month.

The Dirty Work: Ingesting Content

Getting the data in is 90% of the battle. My first connectors were naive scripts that pulled everything from our GitHub repos. Big mistake. They timed out constantly and indexed thousands of useless files from node_modules.
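The fix was to filter before fetching. A sketch of the allow/deny logic; the specific lists here are examples, not my exact config:

```python
from pathlib import PurePosixPath

# Hypothetical filter lists -- tune these per repository.
SKIP_DIRS = {"node_modules", ".git", "dist", "build", "__pycache__"}
KEEP_EXTENSIONS = {".md", ".py", ".ts", ".yaml", ".yml"}

def should_index(path: str) -> bool:
    """Index only docs and source files; never vendored or generated trees."""
    p = PurePosixPath(path)
    if any(part in SKIP_DIRS for part in p.parts):
        return False
    return p.suffix in KEEP_EXTENSIONS
```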

The key lesson was to be surgical and, most importantly, to enrich every single chunk with metadata. That source link is everything. Without it, you just have a confident-sounding black box. With it, you have verifiable proof.

GitHub connector (pseudocode)

# This is simplified, but the core idea is to tag every chunk
for chunk in file_chunks:
    documents_to_add.append(chunk)
    metadatas_to_add.append({
        "repository": repo_name,
        "module": module_path,
        "file": filename,
        "link": github_link,  # The golden ticket!
    })
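Feeding those enriched chunks into ChromaDB then looks roughly like this. `collection` is a Chroma collection handle (from `client.get_or_create_collection(...)`); the helper below just pairs each chunk with its metadata and a stable ID:

```python
import hashlib

def index_chunks(collection, chunks, repo_name, module_path, filename, github_link):
    """Add chunks to a Chroma collection, one metadata dict per chunk."""
    ids, documents, metadatas = [], [], []
    for i, chunk in enumerate(chunks):
        # Stable IDs make re-indexing idempotent: same content, same ID.
        ids.append(hashlib.sha1(f"{github_link}#{i}".encode()).hexdigest())
        documents.append(chunk)
        metadatas.append({
            "repository": repo_name,
            "module": module_path,
            "file": filename,
            "link": github_link,  # the golden ticket
        })
    collection.add(ids=ids, documents=documents, metadatas=metadatas)
    return ids
```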

Teaching the Machine Not to Lie

Here's the flow of the RAG pipeline:

  • Get the user's question.

  • (Optional) I added a quick keyword-based filter to immediately reject questions that are obviously off-topic, like "what's for lunch?"

  • Query ChromaDB to find the top 5 or so document chunks most similar to the question.

  • Shove those chunks and the original question into a carefully crafted prompt.

  • Stream the model's response back to the UI so it feels fast and responsive.
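A condensed sketch of the filter and retrieval steps. The keyword list is illustrative, and `collection` is the Chroma collection handle from ingestion (Chroma's `query` returns lists-of-lists, one inner list per query):

```python
import re

OFF_TOPIC_WORDS = {"lunch", "weather", "horoscope"}  # illustrative quick-reject list

def retrieve_context(collection, question: str, k: int = 5):
    """Cheap keyword gate, then vector search; returns (chunks, sources) or None."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    if tokens & OFF_TOPIC_WORDS:
        return None  # obviously off-topic: don't even hit the vector store
    result = collection.query(query_texts=[question], n_results=k)
    return result["documents"][0], [m["link"] for m in result["metadatas"][0]]
```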

My first prompt was way too loose. I asked a question, and the LLM confidently hallucinated a new deployment process that sounded plausible but was completely fictional. That's when I learned the magic of a strong system prompt.

My "Please Don't Make Stuff Up" Prompt


from langchain_core.prompts import ChatPromptTemplate

RAG_PROMPT = ChatPromptTemplate.from_template(
    """
    You are an expert programmer assistant. I am going to give you some context
    (code snippets, documentation) and a question. Your task is to answer the
    question based *only* on the provided context.

    If the answer is not in the context, you MUST say "I could not find an
    answer in the available documents." Do not, under any circumstances,
    invent information or use your outside knowledge.

    Context:
    {context}

    Question:
    {question}

    Helpful Answer:
    """
)
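Before that template is filled, the retrieved chunks have to become the `{context}` string. I've found it helps to inline each chunk's source link so the model can cite it; this assembly step needs nothing more than plain Python:

```python
def build_context(chunks: list[str], links: list[str]) -> str:
    """Format retrieved chunks with their source links for the {context} slot."""
    sections = []
    for chunk, link in zip(chunks, links):
        sections.append(f"[Source: {link}]\n{chunk}")
    return "\n\n---\n\n".join(sections)
```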

Frontend: Plug and Play

As I said, I wanted to spend my time on the backend logic, not fighting with React hooks. The Vercel AI SDK and the assistant-ui library are a dream team for this. The code to get a working chat interface is almost laughably simple.


'use client';

// Import paths may differ slightly between assistant-ui / AI SDK versions.
import { useChat } from 'ai/react';
import { AssistantRuntimeProvider, Thread } from '@assistant-ui/react';
import { useVercelUseChatRuntime } from '@assistant-ui/react-ai-sdk';

export const Assistant = () => {
  // Point this to your FastAPI streaming endpoint
  const chat = useChat({ api: 'http://127.0.0.1:8000/api/v1/chat/' });
  const runtime = useVercelUseChatRuntime(chat);

  return (
    <AssistantRuntimeProvider runtime={runtime}>
      {/* This gives you the entire chat UI for free! */}
      <Thread />
    </AssistantRuntimeProvider>
  );
};

The Big Question: Why Not Just Buy Glean?

My manager asked me a very fair question early on: "Why are you building this? Can't we just buy a solution like Glean?" It's a great point. Off-the-shelf products are polished, supported, and work instantly. But for us, it came down to a few key issues.

Pros of pre-built solutions (The temptation)

  • Instant Gratification: You can be up and running in days, not months.

  • No Ops Headache: They handle scaling, uptime, and maintenance.

  • Polished UX: They've spent years on user experience and analytics.

Cons (Why I chose to build)

  • Data Privacy: We handle sensitive IP and customer data. We needed absolute certainty about where our data was stored and processed. Running our own stack on-prem or in our own VPC was non-negotiable for our compliance team.

  • The Weird Connectors: Commercial products have great connectors for Salesforce and Slack. They don't have connectors for our ancient, home-grown wiki or our bizarre custom deployment log format. I knew we'd need to write our own.

  • Cost at Scale: The pricing models for many of these services can get scary once you have millions of documents. The cost of running our own models and vector DB on our existing infrastructure turned out to be much lower in the long run.

  • Vendor Lock-in: Migrating millions of vector embeddings out of a proprietary system sounded like a nightmare I wanted to avoid.

My Decision Checklist

I ended up making this simple checklist. We fell squarely into the "build" category.

Go with a pre-built product if:

  • Speed is your number one priority.

  • Your engineering team is already swamped.

  • Your data isn't subject to strict regulation.

Build your own if:

  • You have strict data residency or compliance needs (SOC2, HIPAA, etc.).

  • You need deep customization or connectors for proprietary systems.

  • You're operating at a scale where running it yourself is cheaper.

What's Next on My List

This is just Part 1. The proof-of-concept works, and people are excited. Here's what I'm tackling next:

  • Agentic integrations: Letting the model do things, like looking up live metrics from Datadog or creating a JIRA ticket.

  • More connectors: ServiceNow, Slack, and letting users upload their own PDFs.

  • Better ops: Building tooling for re-indexing our data with zero downtime.

A Few Scars from the Trenches

Metadata is not optional. My first version didn't link back to the source documents. The first piece of feedback I got was, "This is cool, but how do I know I can trust it?" Add source links to every chunk. Period.

Be ruthless in your prompt. I can't stress this enough. If you give the model an inch, it will take a mile and invent a whole new universe of facts. Constrain it to only use the context you provide.

Stream everything. Don't make the user wait 10 seconds for a full response. Streaming the tokens back as they're generated makes the entire experience feel ten times faster.


Sign up to my newsletter to learn more about this project's progress!
