20240906

I want to run tomorrow, but will consult with my leg muscles.

Very Simple RAG Implementation

My friend asked me to teach him about Docker/Docker Compose and RAG, and we ended up implementing a simple RAG system with FastAPI and ChromaDB. I was surprised by how easy it was to configure ChromaDB and use it to store embeddings and perform vector search.

My initial thought was to use the OpenAI embeddings API to generate embeddings and store them in ChromaDB, but it turned out that ChromaDB natively supports that flow within the library, and I could simply store and fetch similar documents without worrying about the underlying vector records. (Not sure whether that's good for me or not, though; I'm missing the opportunity to deeply understand the actual embedding records.)
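
A minimal sketch of that native flow (the collection name and documents here are made up for illustration; Chroma computes the embeddings itself with its default embedding function, a local MiniLM model):

import chromadb

client = chromadb.Client()  # ephemeral in-memory client, enough for a demo
collection = client.get_or_create_collection("notes")

# No embeddings are passed in; Chroma embeds the documents itself
collection.add(
    documents=["Docker Compose lets you define multi-container apps in one file."],
    ids=["doc-1"],
)

result = collection.query(query_texts=["What defines multi-container apps?"], n_results=1)
print(result["documents"][0][0])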

gomyway1216/rag

I have two separate containers; one is an API server, and the other is a vector database.

services:
  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - chromadb_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma
      - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
    ports:
      - 8001:8000

  fastapi:
    container_name: fastapi
    build: .
    restart: always
    volumes:
      - .:/app
    ports:
      - 8000:8000
    env_file:
      - .env
    depends_on:
      chromadb:
        condition: service_started

volumes:
  chromadb_data:
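
One thing worth noting from this file: inside the Compose network, the fastapi container reaches the database by its service name and the container-internal port, while the host uses the mapped port. A sketch of both (the values come straight from the compose file above):

import chromadb

# From inside the fastapi container: the service name resolves as a hostname
# on Compose's default network, and 8000 is the container-internal port
client = chromadb.HttpClient(host="chromadb", port=8000)

# From the host machine, the same server is reachable on the mapped port 8001:
# client = chromadb.HttpClient(host="localhost", port=8001)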

And I have two simple API endpoints:

import uuid

import chromadb
from fastapi import FastAPI, HTTPException
from openai import OpenAI
from pydantic import BaseModel, Field

# App and client wiring; the collection name "knowledge" is a placeholder
app = FastAPI()
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma_client = chromadb.HttpClient(host="chromadb", port=8000)
collection = chroma_client.get_or_create_collection("knowledge")


class ChatRequest(BaseModel):
    message: str = Field(description="The message to be sent to the chatbot")


class ChatResponse(BaseModel):
    message: str = Field(description="The message returned by the chatbot")


@app.post("/query", response_model=ChatResponse, status_code=200)
def query(chat_request: ChatRequest):
    # Perform similarity search and fetch relevant information
    rag_result = collection.query(
        query_texts=[chat_request.message],
        n_results=1,
    )["documents"][0][0]

    # Perform chat completion using the retrieved information
    # TODO: Clean up the query text
    # TODO: Add integration tests
    query_text = f"Use this information: {rag_result}\n"
    query_text += "=" * 80 + "\n"
    query_text += "Reply 'I don't know' if you don't find the answer in the given context.\n"
    query_text += "=" * 80 + "\n"
    query_text += chat_request.message

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query_text}],
    )
    return ChatResponse(message=response.choices[0].message.content)


class LearnRequest(BaseModel):
    text: str = Field(description="The text to be learned by the chatbot")


@app.post("/learn", status_code=204)
def learn(learn_request: LearnRequest):
    collection.add(
        documents=[learn_request.text],
        ids=[uuid.uuid4().hex],
    )
    return

Using the /learn endpoint, I can add knowledge to the vector database.

Using the /query endpoint, I can ask for that knowledge. As my friend requested, there is a strict safeguard: the prompt instructs the model to reply "I don't know" when the answer isn't found in the retrieved context.
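
A quick smoke test of both endpoints (assuming the compose stack is up and the API is on localhost:8000; the facts are made up for illustration):

import requests

# Teach the bot one fact...
requests.post("http://localhost:8000/learn", json={"text": "My dog's name is Pochi."})

# ...then ask about it
resp = requests.post("http://localhost:8000/query", json={"message": "What is my dog's name?"})
print(resp.json()["message"])  # should mention Pochi

# Ask something outside the stored knowledge
resp = requests.post("http://localhost:8000/query", json={"message": "Who won the 2002 World Cup?"})
print(resp.json()["message"])  # should be "I don't know"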

I might not follow the strict definition of RAG, but with these endpoints I managed to augment the LLM's generation (GPT-4o) by performing a similarity search and providing it with the relevant information.

My understanding might be wrong; I'll correct this if there are any flaws.

Browsed LangChain

Build a Chatbot | LangChain

LangChain seems to be just a convenient tool for building LLM applications: wrapping APIs, simplifying interactions, and managing resources without much configuration.

I won't touch the library anytime soon, but its existence seems to be worth remembering.

Running Chroma

I know I can use ChromaDB as a database to store embeddings for RAG, but I didn't know how to deploy it.

Running Chroma - ChromaDB

Chroma CLI

As simple as

$ pip install chromadb
$ chroma run --host localhost --port 8000 --path ./my_chroma_data
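
Once the server is up, connecting from Python is just the HTTP client (host and port match the command above):

import chromadb

# Connects to the server started by `chroma run` above
client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())  # returns a nanosecond timestamp when the server is reachable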

Docker Compose

version: '3.9'

networks:
  net:
    driver: bridge
services:
  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - ./chromadb:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma # this is the default path, change it as needed
      - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
    ports:
      - 8000:8000
    networks:
      - net

It looks very easy to configure. I might want to use it for my personal project.


Chipotle 1000
Bagels 600
Sushi bowl 800
Protein bar 200
Yogurt 300

Total 2900 kcal


MUST:

TODO:

