I want to run tomorrow, but will consult with my leg muscles.
My friend asked me to teach him about Docker/Docker Compose and RAG, and we ended up implementing a simple RAG system with FastAPI and ChromaDB. I was surprised by how easy it was to configure ChromaDB and use it to store embeddings and perform vector search.
My initial thought was to use the OpenAI embeddings API to generate embeddings and store them in ChromaDB, but it turned out that ChromaDB natively supports that flow within the library, so I could simply store and fetch similar documents without worrying about the underlying vector records. (Not sure whether that's good for me, though; I'm missing the opportunity to deeply understand the actual embedding records.)
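To illustrate, a minimal sketch (the collection name and documents are made up; as far as I understand, ChromaDB falls back to its default embedding function, an all-MiniLM-L6-v2 model, when you don't pass embeddings yourself):

import chromadb

# In-memory client, enough for a quick experiment.
client = chromadb.Client()
collection = client.get_or_create_collection("demo")

# No explicit embedding step: ChromaDB embeds the raw text itself.
collection.add(documents=["Tokyo is the capital of Japan."], ids=["doc-1"])

# The query text is embedded internally too, before the vector search runs.
result = collection.query(query_texts=["What is Japan's capital?"], n_results=1)
print(result["documents"][0][0])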
I have two separate containers; one is an API server, and the other is a vector database.
services:
  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - chromadb_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma
      - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
    ports:
      - 8001:8000
  fastapi:
    container_name: fastapi
    build: .
    restart: always
    volumes:
      - .:/app
    ports:
      - 8000:8000
    env_file:
      - .env
    depends_on:
      chromadb:
        condition: service_started
volumes:
  chromadb_data:
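One gotcha: the host reaches ChromaDB on port 8001, while the fastapi container talks to it inside the Compose network as chromadb:8000. A quick sanity check from the host, as a sketch:

import chromadb

# 8001 is the host port published in the compose file above.
client = chromadb.HttpClient(host="localhost", port=8001)
print(client.heartbeat())  # prints a timestamp if the server is reachable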
And I have two simple API endpoints:
import uuid

import chromadb
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel, Field

app = FastAPI()
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment (.env)
# "chromadb" is the service name from the compose file; 8000 is the container port
chroma_client = chromadb.HttpClient(host="chromadb", port=8000)
collection = chroma_client.get_or_create_collection("knowledge")  # collection name assumed

class ChatRequest(BaseModel):
    message: str = Field(description="The message to be sent to the chatbot")

class ChatResponse(BaseModel):
    message: str = Field(description="The message returned by the chatbot")

@app.post("/query", response_model=ChatResponse, status_code=200)
def query(chat_request: ChatRequest):
    # Perform similarity search and fetch the most relevant document
    rag_result = collection.query(
        query_texts=[chat_request.message],
        n_results=1,
    )["documents"][0][0]
    # Perform chat completion using the retrieved information
    # TODO: Clean up the query text
    # TODO: Add integration tests
    query_text = f"Use this information: {rag_result}\n"
    query_text += "=" * 80 + "\n"
    query_text += "Reply 'I don't know' if you don't find the answer in the given context.\n"
    query_text += "=" * 80 + "\n"
    query_text += chat_request.message
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query_text}],
    )
    return ChatResponse(message=response.choices[0].message.content)

class LearnRequest(BaseModel):
    text: str = Field(description="The text to be learned by the chatbot")

@app.post("/learn", status_code=204)
def learn(learn_request: LearnRequest):
    # Store the raw text; ChromaDB embeds it with its default embedding function
    collection.add(
        documents=[learn_request.text],
        ids=[uuid.uuid4().hex],
    )
    return
Using the /learn endpoint, I can add knowledge to the vector database. Using the /query endpoint, I can ask for knowledge from the vector database. As my friend asked, there is a strict safeguard: the prompt tells the model to reply "I don't know" when the answer isn't in the retrieved context.
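An end-to-end run looks something like this (a sketch with httpx; any HTTP client works, and the facts are made up):

import httpx

base = "http://localhost:8000"

# Teach the chatbot a fact, then ask about it.
httpx.post(f"{base}/learn", json={"text": "The standup meeting starts at 10:00 JST."})
response = httpx.post(f"{base}/query", json={"message": "When does the standup start?"})
print(response.json()["message"])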
I might not be following the strict definition of RAG, but with these endpoints I managed to augment the generation of the LLM (GPT-4o) by performing a similarity search and providing it with relevant information.
My understanding might be wrong; I'll correct this if there are any flaws.
LangChain seems to be just a convenient tool for building LLM applications, wrapping APIs to simplify interactions and managing resources without much configuration.
I won't touch the library anytime soon, but its existence seems to be worth remembering.
I knew I could use ChromaDB as a database to store embeddings for RAG, but I didn't know how to deploy it. It turns out to be as simple as:
$ pip install chromadb
$ chroma run --host localhost --port 8000 --path ./my_chroma_data
Or, with Docker Compose:
version: '3.9'
networks:
  net:
    driver: bridge
services:
  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - ./chromadb:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma # this is the default path, change it as needed
      - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
    ports:
      - 8000:8000
    networks:
      - net
It looks very easy to configure. I might want to use it for my personal project.
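Connecting from Python is equally small. A sketch (the path matches the chroma run command above; the collection name is made up), and as I understand it ChromaDB can also run embedded in-process without any server:

import chromadb

# Talk to the standalone server started with `chroma run` (or the compose service).
http_client = chromadb.HttpClient(host="localhost", port=8000)

# Or skip the server entirely: an embedded client persisting to a local directory.
local_client = chromadb.PersistentClient(path="./my_chroma_data")

collection = local_client.get_or_create_collection("notes")
collection.add(documents=["ChromaDB can run embedded or as a server."], ids=["1"])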
Chipotle: 1000 kcal
Bagels: 600 kcal
Sushi bowl: 800 kcal
Protein bar: 200 kcal
Yogurt: 300 kcal
Total: 2900 kcal