I want to run tomorrow, but will consult with my leg muscles.
My friend asked me to teach him about Docker/Docker Compose and RAG, and we ended up implementing a simple RAG system with FastAPI and ChromaDB. I was surprised by how easy it was to configure ChromaDB and use it to store embeddings and perform vector search.
My initial thought was to use the OpenAI embeddings API to generate embeddings and store them in ChromaDB, but it turned out that ChromaDB natively supports that flow within the library, so I could simply store and fetch similar documents without worrying about the underlying vector records. (Not sure whether that's good for me, though; I'm missing the opportunity to deeply understand the actual embedding records.)
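To illustrate, a minimal sketch (the collection name and documents are made up; as far as I understand, ChromaDB falls back to its default embedding function, an all-MiniLM-L6-v2 model, when you don't pass embeddings yourself):

import chromadb

# In-memory client, enough for a quick experiment.
client = chromadb.Client()
collection = client.get_or_create_collection("demo")

# No explicit embedding step: ChromaDB embeds the raw text itself.
collection.add(documents=["Tokyo is the capital of Japan."], ids=["doc-1"])

# The query text is embedded internally too, before the vector search runs.
result = collection.query(query_texts=["What is Japan's capital?"], n_results=1)
print(result["documents"][0][0])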
I have two separate containers; one is an API server, and the other is a vector database.
services:
  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - chromadb_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma
      - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
    ports:
      - 8001:8000
  fastapi:
    container_name: fastapi
    build: .
    restart: always
    volumes:
      - .:/app
    ports:
      - 8000:8000
    env_file:
      - .env
    depends_on:
      chromadb:
        condition: service_started
volumes:
  chromadb_data:
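One gotcha: the host reaches ChromaDB on port 8001, while the fastapi container talks to it inside the Compose network as chromadb:8000. A quick sanity check from the host, as a sketch:

import chromadb

# 8001 is the host port published in the compose file above.
client = chromadb.HttpClient(host="localhost", port=8001)
print(client.heartbeat())  # prints a timestamp if the server is reachable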
And I have two simple API endpoints:
import uuid

import chromadb
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel, Field

app = FastAPI()
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment (.env)
# "chromadb" is the service name from the compose file; 8000 is the container port
chroma_client = chromadb.HttpClient(host="chromadb", port=8000)
collection = chroma_client.get_or_create_collection("knowledge")  # collection name assumed

class ChatRequest(BaseModel):
    message: str = Field(description="The message to be sent to the chatbot")

class ChatResponse(BaseModel):
    message: str = Field(description="The message returned by the chatbot")

@app.post("/query", response_model=ChatResponse, status_code=200)
def query(chat_request: ChatRequest):
    # Perform similarity search and fetch the most relevant document
    rag_result = collection.query(
        query_texts=[chat_request.message],
        n_results=1,
    )["documents"][0][0]
    # Perform chat completion using the retrieved information
    # TODO: Clean up the query text
    # TODO: Add integration tests
    query_text = f"Use this information: {rag_result}\n"
    query_text += "=" * 80 + "\n"
    query_text += "Reply 'I don't know' if you don't find the answer in the given context.\n"
    query_text += "=" * 80 + "\n"
    query_text += chat_request.message
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query_text}],
    )
    return ChatResponse(message=response.choices[0].message.content)

class LearnRequest(BaseModel):
    text: str = Field(description="The text to be learned by the chatbot")

@app.post("/learn", status_code=204)
def learn(learn_request: LearnRequest):
    # Store the raw text; ChromaDB embeds it with its default embedding function
    collection.add(
        documents=[learn_request.text],
        ids=[uuid.uuid4().hex],
    )
    return
Using the /learn endpoint, I can add knowledge to the vector database. Using the /query endpoint, I can ask for knowledge from the vector database. As my friend asked, there is a strict safeguard: the prompt tells the model to reply "I don't know" when the answer isn't in the retrieved context.
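An end-to-end run looks something like this (a sketch with httpx; any HTTP client works, and the facts are made up):

import httpx

base = "http://localhost:8000"

# Teach the chatbot a fact, then ask about it.
httpx.post(f"{base}/learn", json={"text": "The standup meeting starts at 10:00 JST."})
response = httpx.post(f"{base}/query", json={"message": "When does the standup start?"})
print(response.json()["message"])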
I might not be following the strict definition of RAG, but with these endpoints I managed to augment the generation of the LLM (GPT-4o) by performing a similarity search and providing it with relevant information.
My understanding might be wrong; I'll correct this if there are any flaws.
LangChain seems to be just a convenient tool for building LLM applications, wrapping APIs to simplify interactions and managing resources without much configuration.
I won't touch the library anytime soon, but its existence seems to be worth remembering.
I knew I could use ChromaDB as a database to store embeddings for RAG, but I didn't know how to deploy it. It turns out to be as simple as:
$ pip install chromadb
$ chroma run --host localhost --port 8000 --path ./my_chroma_data
Or, with Docker Compose:
version: '3.9'
networks:
  net:
    driver: bridge
services:
  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - ./chromadb:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma # this is the default path, change it as needed
      - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
    ports:
      - 8000:8000
    networks:
      - net
It looks very easy to configure. I might want to use it for my personal project.
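Connecting from Python is equally small. A sketch (the path matches the chroma run command above; the collection name is made up), and as I understand it ChromaDB can also run embedded in-process without any server:

import chromadb

# Talk to the standalone server started with `chroma run` (or the compose service).
http_client = chromadb.HttpClient(host="localhost", port=8000)

# Or skip the server entirely: an embedded client persisting to a local directory.
local_client = chromadb.PersistentClient(path="./my_chroma_data")

collection = local_client.get_or_create_collection("notes")
collection.add(documents=["ChromaDB can run embedded or as a server."], ids=["1"])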
Chipotle: 1000 kcal
Bagels: 600 kcal
Sushi bowl: 800 kcal
Protein bar: 200 kcal
Yogurt: 300 kcal
Total: 2900 kcal