20241031 ◎

I talked with my friends for a long time. I'm tired.

ChromaDB and SQLite

Motivation: I got curious how ChromaDB stores embeddings or other data.

Troubleshooting | ChromaDB Docs

Chroma requires SQLite > 3.35

ChromaDB simply uses SQL to store its data.

It's interesting that it uses SQLite. SQLite is not designed to handle large amount of requests. Therefore, ChromaDB may not be the option when developing Web API that need to handle multiple requests at the same time.

SQLite: Begin Concurrent

Usually, SQLite allows at most one writer to proceed concurrently.

I thought ChromaDB was the go-to option, but because its documentation is not very helpful, and the write throughput is limited, I'm inclined to explore other options.

$ docker cp rag-chromadb-1:chroma/chroma/chroma.sqlite3 .
Successfully copied 169kB to /Users/username/rag/.
$ sqlite3 chroma.sqlite3
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite>
sqlite> .tables
collection_metadata                embeddings
collections                        embeddings_queue
databases                          embeddings_queue_config
embedding_fulltext_search          maintenance_log
embedding_fulltext_search_config   max_seq_id
embedding_fulltext_search_content  migrations
embedding_fulltext_search_data     segment_metadata
embedding_fulltext_search_docsize  segments
embedding_fulltext_search_idx      tenants
embedding_metadata
sqlite> .headers ON
sqlite> SELECT * FROM collections;
id|name|dimension|database_id|config_json_str
fa8b28c4-2e56-4bd9-aaf5-53ae438f1423|learn|384|00000000-0000-0000-0000-000000000000|{"hnsw_configuration": {"space": "l2", "ef_construction": 100, "ef_search": 10, "num_threads": 8, "M": 16, "resize_factor": 1.2, "batch_size": 100, "sync_threshold": 1000, "_type": "HNSWConfigurationInternal"}, "_type": "CollectionConfigurationInternal"}
sqlite> SELECT * FROM embeddings;
id|segment_id|embedding_id|seq_id|created_at
1|f425325a-7d10-4faa-873a-e86495072384|9527d8eb60f04fa7b109891f396ea1a0||2024-10-12 12:16:25
sqlite> SELECT * FROM embedding_fulltext_search;
string_value
Yudai Yaguchi is 9 feet tall
sqlite> SELECT * FROM embedding_fulltext_search_content;
id|c0
1|Yudai Yaguchi is 9 feet tall
sqlite> SELECT * FROM embedding_fulltext_search_data;
id|block
1|
10|
137438953473|

However, as I browsed the tables, vectors seem not to be stored in the SQL database. SQLite is just to store other data than actual embeddings.

I browsed the GitHub repository and some web pages, but I could not find the exact answer, but my guesstimate is that, ChromaDB stores vectors in memory for fast access, and the reason the data persists in my Docker Compose is that the memory is configured as Docker Volume.

But I'm not sure. I'll correct my understanding later, or I would appreciate it if you could fork and create a pull request toward this page.


Rice Roni 400 Mashed potatoes 400 Protein shake 200 Japanese Matcha Soy 600 Vietnamese Cafe 200

Total 1800 kcal

4 mi run


MUST:

TODO:


index 20241030 20241101