I talked with my friends for a long time. I'm tired.
Motivation: I got curious about how ChromaDB stores embeddings and other data.
From "Troubleshooting | ChromaDB Docs":
Chroma requires SQLite > 3.35
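A quick way to see which SQLite version a Python environment is linked against is the standard `sqlite3` module. This is a minimal sketch: the helper names `parse_version` and `meets_requirement` are mine, not part of ChromaDB, and I am assuming that 3.35.0 and later count as acceptable, which is my reading of the requirement above.

```python
import sqlite3


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '3.43.2' into (3, 43, 2)."""
    return tuple(int(part) for part in text.split("."))


def meets_requirement(version: str, minimum: tuple[int, ...] = (3, 35, 0)) -> bool:
    """Assumption: versions 3.35.0 and later satisfy the requirement."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    # sqlite3.sqlite_version is the version of the SQLite library in use at
    # runtime, not the version of the Python module itself.
    print(sqlite3.sqlite_version, meets_requirement(sqlite3.sqlite_version))
```

Note that this reports the SQLite library used by the local Python interpreter; the `sqlite3` command-line tool or a Docker image may ship a different version.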
ChromaDB uses SQLite to store its data.
It's interesting that it uses SQLite, which is not designed to handle a large number of concurrent write requests. ChromaDB may therefore not be the best option for a Web API that has to handle many requests at the same time.
SQLite allows at most one writer at a time; any other writer has to wait until the first one finishes.
I thought ChromaDB was the go-to option, but because its documentation is not very helpful and its write throughput is limited, I'm inclined to explore other options.
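The single-writer behavior can be reproduced with plain SQLite, without ChromaDB. This is a minimal sketch: the function and table names are made up for illustration, and it only shows how SQLite itself behaves, not how ChromaDB schedules its writes internally (which I have not verified).

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path: str) -> bool:
    """Return True if a second connection cannot start a write transaction
    while the first connection still holds an open one."""
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are explicit.
    # timeout=0: fail immediately instead of waiting for the lock.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
        first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        first.execute("INSERT INTO notes VALUES ('from the first writer')")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer tries as well
        except sqlite3.OperationalError:
            blocked = True  # SQLite reports "database is locked"
        else:
            second.execute("ROLLBACK")
            blocked = False
        first.execute("COMMIT")  # releases the write lock
        return blocked
    finally:
        first.close()
        second.close()


def demo() -> bool:
    # Use a throwaway database file so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        return second_writer_is_blocked(os.path.join(tmp, "demo.sqlite3"))


if __name__ == "__main__":
    print("second writer blocked:", demo())
```

With a non-zero `timeout`, the second connection would wait for the lock instead of failing right away, so writes are serialized rather than rejected.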
$ docker cp rag-chromadb-1:chroma/chroma/chroma.sqlite3 .
Successfully copied 169kB to /Users/username/rag/.
$ sqlite3 chroma.sqlite3
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite>
sqlite> .tables
collection_metadata                embeddings
collections                        embeddings_queue
databases                          embeddings_queue_config
embedding_fulltext_search          maintenance_log
embedding_fulltext_search_config   max_seq_id
embedding_fulltext_search_content  migrations
embedding_fulltext_search_data     segment_metadata
embedding_fulltext_search_docsize  segments
embedding_fulltext_search_idx      tenants
embedding_metadata
sqlite> .headers ON
sqlite> SELECT * FROM collections;
id|name|dimension|database_id|config_json_str
fa8b28c4-2e56-4bd9-aaf5-53ae438f1423|learn|384|00000000-0000-0000-0000-000000000000|{"hnsw_configuration": {"space": "l2", "ef_construction": 100, "ef_search": 10, "num_threads": 8, "M": 16, "resize_factor": 1.2, "batch_size": 100, "sync_threshold": 1000, "_type": "HNSWConfigurationInternal"}, "_type": "CollectionConfigurationInternal"}
sqlite> SELECT * FROM embeddings;
id|segment_id|embedding_id|seq_id|created_at
1|f425325a-7d10-4faa-873a-e86495072384|9527d8eb60f04fa7b109891f396ea1a0||2024-10-12 12:16:25
sqlite> SELECT * FROM embedding_fulltext_search;
string_value
Yudai Yaguchi is 9 feet tall
sqlite> SELECT * FROM embedding_fulltext_search_content;
id|c0
1|Yudai Yaguchi is 9 feet tall
sqlite> SELECT * FROM embedding_fulltext_search_data;
id|block
1|
10|
137438953473|
However, as far as I could tell from browsing the tables, the vectors do not seem to be stored in the SQLite database. The embeddings table, for example, holds only IDs and a timestamp. SQLite appears to store only the data other than the actual embeddings.
I browsed the GitHub repository and some web pages but could not find a definitive answer. My guess is that ChromaDB keeps the vectors in memory for fast access, and that the data persists in my Docker Compose setup because it is backed by a Docker volume.
But I'm not sure. I'll correct my understanding later; alternatively, I would appreciate it if you could fork the repository and open a pull request against this page.
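One way to check the guess more systematically than browsing tables by hand is to list every column declared as a BLOB, since raw vectors would most likely be stored as binary data. This is a minimal sketch using only the standard `sqlite3` module. The function name `blob_columns` is mine, and the idea that vectors would sit in a BLOB column is an assumption, not something I confirmed in the ChromaDB docs.

```python
import sqlite3


def blob_columns(db_path: str) -> dict[str, list[str]]:
    """Map each table name to the columns whose declared type contains BLOB."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        result: dict[str, list[str]] = {}
        for table in tables:
            quoted = table.replace('"', '""')
            try:
                # Rows are (cid, name, type, notnull, dflt_value, pk).
                columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            except sqlite3.OperationalError:
                # For example, a virtual table whose module is not available.
                continue
            blobs = [
                name
                for _, name, col_type, *_ in columns
                if "BLOB" in (col_type or "").upper()
            ]
            if blobs:
                result[table] = blobs
        return result
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumes the copied file from the docker cp step is in the current directory.
    for table, columns in blob_columns("chroma.sqlite3").items():
        print(table, columns)
```

Any table this reports is a candidate worth inspecting with `SELECT`, but a BLOB column alone does not prove that it holds embeddings.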
Rice Roni 400
Mashed potatoes 400
Protein shake 200
Japanese Matcha Soy 600
Vietnamese Cafe 200
Total 1800 kcal
4 mi run
MUST:
uv
TODO: