Overview
RAG Knowledge Search is a small AI lab project that I built to show what applied AI engineering looks like when it is treated as a real product instead of a toy.
It lets users:
- Add their own documents into a knowledge base
- Ask natural language questions about that content
- Get streaming answers from GPT-4o-mini that are grounded in retrieved text
- See which source chunks were used for each answer, with match scores
Under the hood it uses:
- OpenAI text-embedding-3-small for embeddings
- An in-memory vector store for similarity search (ready to swap for Pinecone or pgvector later)
- A Next.js 16 API layer with shared state, chunking, and security constraints
- A React-based chat UI that handles streaming and context visualization
This project lives at /ai/rag and powers the first tile in my AI lab.
Problem
Most AI demos show a model that answers questions out of thin air. That is fine for a quick prototype, but it is not how teams actually ship reliable AI features in production.
I wanted a small, end-to-end example that solves a more realistic problem:
Given a set of arbitrary documents, let a user ask questions and get answers that are grounded in those documents, with sources and safety limits, in a way that feels like a real product.
The constraints I set for myself:
- No offline preprocessing step. The system should be able to ingest documents on demand.
- Answers must be based on retrieved chunks, not free-form guessing.
- The UI should feel fast and modern, including streaming.
- The APIs should be structured and secure enough that I would be comfortable exposing them in a real app.
Approach
I split the project into four parts:
1. Knowledge store
A central module that tracks documents and their embedded chunks in memory. This isolates all retrieval logic from the API surface and the UI.
2. Ingestion pipeline
A dedicated endpoint that accepts a title and body text, validates the input, splits it into chunks, embeds those chunks, and stores them in the shared vector store.
3. Query and retrieval
An endpoint that embeds the user query, runs cosine similarity against all chunks (both default and user-provided), selects the top matches, and constructs an LLM prompt that includes the question and retrieved context.
4. Streaming chat UI
A React page that lets users ingest docs, see a list of what they have added, and talk to the system through a streaming chat interface. The UI tracks conversation history and highlights which source chunks were used.
Architecture
Frontend
- /ai is the AI lab hub with project tiles.
- /ai/rag is the main RAG interface, built with Next.js 16 and React 19.
The page is split into:
- A document ingestion panel with title and content fields, character counters, and status messages
- A list of user documents with basic metadata
- A chat panel with message history and a side panel for active sources
AI responses stream token-by-token into the last assistant message for a natural feel.
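To make that concrete, here is a rough sketch of the client-side loop, assuming a plain fetch to the query route and a hypothetical appendToLastMessage helper that updates the last assistant message in state:

```typescript
// Sketch of the client-side streaming loop. appendToLastMessage is a
// hypothetical helper that appends text to the last assistant message in state.
async function askQuestion(
  question: string,
  appendToLastMessage: (token: string) => void
) {
  // The UI has already pushed an empty assistant message before this call,
  // so incoming tokens have somewhere to land and the layout does not jump.
  const res = await fetch("/api/ai/rag/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    appendToLastMessage(decoder.decode(value, { stream: true }));
  }
}
```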
Backend
All RAG logic lives under /api/ai/rag:
ragStore.ts
A shared module that holds docs and chunks in memory. This makes it easy to support both default knowledge and user-uploaded content, and gives me a clean seam for swapping in a real database or vector service later.
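A minimal sketch of what that shared module can look like; the type and function names here are illustrative, not the exact implementation:

```typescript
// ragStore.ts (sketch): shared in-memory state for docs and embedded chunks.
// Names and shapes are illustrative assumptions.

export interface RagDoc {
  id: string;
  title: string;
  ownerIp: string; // used for per-IP listing and document limits
  createdAt: number;
}

export interface RagChunk {
  docId: string;
  text: string; // the chunk content that gets injected into the prompt
  embedding: number[]; // text-embedding-3-small vector
}

// Module-level state is shared by every route handler in the same process,
// which is what lets ingest, docs, and query all see the same data.
const docs: RagDoc[] = [];
const chunks: RagChunk[] = [];

export function addDoc(doc: RagDoc, docChunks: RagChunk[]): void {
  docs.push(doc);
  chunks.push(...docChunks);
}

export function listDocs(ownerIp: string): RagDoc[] {
  return docs.filter((d) => d.ownerIp === ownerIp);
}

export function allChunks(): RagChunk[] {
  return chunks;
}
```

Because the state lives at module scope, it is per-process and disappears on restart or redeploy; that limitation is intentional, and it is the seam where Pinecone or pgvector would slot in later.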
POST /api/ai/rag/ingest
- Validates a title and content payload
- Enforces limits (title length, 10K character content cap, per-IP document limit)
- Splits the content into chunks using a simple paragraph- and sentence-aware strategy capped at 800 characters per chunk (sketched after this list)
- Calls OpenAI to embed each chunk with text-embedding-3-small
- Stores the resulting vectors and metadata in the shared store
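A sketch of the chunking and embedding steps under those constraints; chunkText, the splitting heuristics, and the exact limits are assumptions based on the description above:

```typescript
// Sketch of the ingest pipeline's chunking and embedding steps.
// The splitting heuristics and limits are assumptions, not the exact code.
import OpenAI from "openai";

const openai = new OpenAI();
const MAX_CHUNK_CHARS = 800;

// Split on blank lines first, then pack sentences into chunks of at most
// 800 characters. A single sentence longer than the cap is kept whole here.
function chunkText(text: string): string[] {
  const chunks: string[] = [];
  for (const paragraph of text.split(/\n\s*\n/)) {
    const sentences = paragraph.match(/[^.!?]+[.!?]*/g) ?? [paragraph];
    let current = "";
    for (const sentence of sentences) {
      if (current && (current + sentence).length > MAX_CHUNK_CHARS) {
        chunks.push(current.trim());
        current = "";
      }
      current += sentence;
    }
    if (current.trim()) chunks.push(current.trim());
  }
  return chunks;
}

// One batched embeddings call covers every chunk of a document.
async function embedChunks(texts: string[]): Promise<number[][]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return res.data.map((d) => d.embedding);
}
```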
GET /api/ai/rag/docs
- Returns the list of ingested documents associated with the current IP
- Used purely for the UI list
POST /api/ai/rag/query
- Validates the question text and length
- Embeds the question
- Runs cosine similarity across all available chunks (default and user)
- Selects the top matches with scores
- Builds a system prompt that instructs the model to stay grounded in the given context
- Calls GPT-4o-mini with stream: true
- Wraps the OpenAI stream in a ReadableStream and sends it back as text
- Includes an X-RAG-Sources header with the selected sources and scores so the client can update the side panel immediately (the full retrieval and streaming flow is sketched after this list)
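Putting those steps together, the query handler looks roughly like the sketch below; helper names such as cosineSimilarity and allChunks, the top-k cutoff, and the exact prompt wording are assumptions:

```typescript
// Sketch of the query route: embed, score, prompt, and stream back.
// Helper names, paths, and the top-k value are illustrative assumptions.
import OpenAI from "openai";
import { allChunks } from "../ragStore";

const openai = new OpenAI();

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export async function POST(req: Request) {
  const { question } = await req.json();

  // Embed the question with the same model used at ingest time.
  const embeddingRes = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });
  const qEmbedding = embeddingRes.data[0].embedding;

  // Score every chunk (default and user-provided) and keep the best matches.
  const top = allChunks()
    .map((chunk) => ({ chunk, score: cosineSimilarity(qEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 4);

  const context = top.map((t) => t.chunk.text).join("\n---\n");
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    stream: true,
    messages: [
      {
        role: "system",
        content: `Answer using only the context below. If the answer is not in the context, say so.\n\n${context}`,
      },
      { role: "user", content: question },
    ],
  });

  // Re-wrap the SDK stream as a plain text ReadableStream for the browser.
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      for await (const part of completion) {
        const token = part.choices[0]?.delta?.content ?? "";
        if (token) controller.enqueue(encoder.encode(token));
      }
      controller.close();
    },
  });

  return new Response(body, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      // Sources metadata rides along in a header so the client can render it
      // before the first token arrives.
      "X-RAG-Sources": JSON.stringify(
        top.map((t) => ({ docId: t.chunk.docId, score: t.score }))
      ),
    },
  });
}
```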
Security and Robustness
This project is still a lab, but I treated the API like something that might be exposed to real users:
Rate limiting:
Each RAG route is limited per IP (roughly 20-30 requests per minute, adjustable) to prevent abuse.
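A minimal sliding-window limiter is enough to illustrate the approach; the real limits and storage are details of the route configuration:

```typescript
// Sketch of a per-IP sliding-window rate limiter kept in memory.
// The limit and window values are examples, not the deployed settings.
const hits = new Map<string, number[]>();

export function isRateLimited(ip: string, limit = 20, windowMs = 60_000): boolean {
  const now = Date.now();
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    hits.set(ip, recent);
    return true;
  }
  recent.push(now);
  hits.set(ip, recent);
  return false;
}
```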
Input validation:
Titles and content are length-limited. Questions are capped at a reasonable character count. Empty or malformed payloads are rejected early.
Per-IP document limits:
Each IP can only add a small number of documents into the in-memory store. This avoids unbounded growth and makes it harder to spam.
Origin checks and headers:
The RAG endpoints enforce origin constraints and add strict response headers through Next configuration.
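On the headers side, a sketch of how that can be expressed in next.config.ts; the matcher and the exact header set are assumptions:

```typescript
// next.config.ts (sketch): strict response headers scoped to the RAG routes.
// The matcher and header list are illustrative, not the exact config.
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  async headers() {
    return [
      {
        source: "/api/ai/rag/:path*",
        headers: [
          { key: "X-Content-Type-Options", value: "nosniff" },
          { key: "X-Frame-Options", value: "DENY" },
          { key: "Referrer-Policy", value: "same-origin" },
        ],
      },
    ];
  },
};

export default nextConfig;
```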
This is not a full security appliance, but it already follows some good habits I would apply in a client context.
UX Details
Small UX choices help the project feel like something I would actually ship to a client:
- Streaming responses give a fast-feedback feel, even when the model is still working.
- The assistant message placeholder is created before the stream begins so the layout does not jump.
- The sources panel updates as soon as the response starts, using the metadata in the X-RAG-Sources header (see the snippet after this list).
- Errors surface as clear inline messages near the form that triggered them.
- The RAG lab is linked from both the AI hub and the global navigation so it is easy to discover.
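The sources panel trick boils down to reading the header before consuming the body; setActiveSources below is a hypothetical state setter for the side panel:

```typescript
// Read the X-RAG-Sources metadata from the response headers before the body
// starts streaming. setActiveSources is a hypothetical state setter.
function readSources(res: Response, setActiveSources: (sources: unknown) => void): void {
  const header = res.headers.get("X-RAG-Sources");
  if (header) {
    setActiveSources(JSON.parse(header));
  }
}
```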
What I Learned and Why It Matters
This project is small on purpose, but it captures most of what an AI engineer does in day-to-day work:
- Building ingestion and chunking pipelines
- Managing embeddings and vector search
- Designing prompts that keep the model grounded in retrieved context
- Handling streaming and incremental UI updates
- Thinking about rate limits and abuse even in small demos
It also gives me a concrete, live example I can point to when talking about RAG, instead of hand-waving through it. Future work on this lab will involve swapping out the in-memory store for a persistent vector database and experimenting with lightweight evaluation of retrieval quality.