Integration · Chroma
The fastest local-dev vector DB — Chroma + parsr chunks for prototypes and edge deployments.
Chroma (~14K GitHub stars) is the most-used vector DB for prototyping: pip install, no daemon, an embedded SQLite-backed store by default, and a path to a single-node Chroma Cloud deployment for production. The wedge for parsr customers is iteration speed: a fintech team can stand up RAG over a parsr-extracted dataset in under 50 lines, with no infrastructure ticket. Chroma's metadata filters operate on flat key/value pairs, so scalar fields from `chunk.metadata` pass directly into `collection.add(..., metadatas=[...])`; list-valued fields such as page numbers need flattening first, as the quickstart does with `page_numbers_first`. Production-scale users typically move to Chroma Cloud or pgvector; local Chroma is the right starting point.
Install
One command
pip install parsr-sdk chromadb openai
Code
Working sample
from parsr_sdk import AsyncParsr
import chromadb
parsr = AsyncParsr(api_key="sk_eu_live_...")
client = chromadb.PersistentClient(path="./.chroma")
What you get
Highlights
- Zero-ops local dev — pip install, no daemon
- Chroma metadata filters map directly onto the scalar fields of parsr chunk metadata
- Persistent client backed by SQLite — your dev DB survives restarts
- Production migration path to Chroma Cloud that changes only the client constructor (moving to pgvector means rewriting the storage layer)
- Built-in OpenAI / Cohere / Voyage embedding functions — no separate embedder wiring
Architecture
How the pieces fit
Single Chroma collection (or one per doc_type). PersistentClient writes to local SQLite for development; HttpClient targets a remote Chroma server (Chroma Cloud or self-hosted) for production. The flow is parsr.parse(*, include_chunks=True) → collection.add(ids, documents, metadatas, embeddings). Embeddings can be supplied explicitly (the OpenAI route) or computed by the collection's embedding_function.
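The explicit-embedding route could be sketched roughly as follows. This is a minimal illustration, not parsr's or Chroma's own code: `Chunk` is a hypothetical stand-in for the chunk objects parsr returns (its fields mirror the ones the quickstart reads), `build_add_payload` is an illustrative helper name, and `embed` is any callable you supply that maps a list of texts to a list of vectors.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class Chunk:
    """Hypothetical stand-in for a parsr chunk (assumed shape, not the SDK type)."""
    id: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    page_numbers: List[int] = field(default_factory=list)


def build_add_payload(
    chunks: Sequence[Chunk],
    embed: Callable[[List[str]], List[List[float]]],
    org_id: str,
) -> Dict[str, list]:
    """Build keyword arguments for collection.add() with explicit embeddings."""
    texts = [c.text for c in chunks]
    vectors = embed(texts)  # one batched call to whatever embedder was passed in
    if len(vectors) != len(texts):
        raise ValueError("embedder returned a different number of vectors than texts")

    metadatas = []
    for c in chunks:
        # Keep only scalar values (str, int, float, bool); drop lists, dicts and None.
        meta = {
            k: v for k, v in c.metadata.items()
            if isinstance(v, (str, int, float, bool))
        }
        meta["org_id"] = org_id
        if c.page_numbers:
            # Flatten the page list to its first entry, as the quickstart does.
            meta["page_numbers_first"] = c.page_numbers[0]
        metadatas.append(meta)

    return {
        "ids": [c.id for c in chunks],
        "documents": texts,
        "metadatas": metadatas,
        "embeddings": vectors,
    }
```

With a real collection this would be called as `collection.add(**build_add_payload(result.chunks, embed, "org_acme"))`. When `embeddings` is passed, Chroma stores the vectors as given instead of calling the collection's embedding function.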
Quickstart
End-to-end example
Parse a document with `include_chunks=True`, add the chunks to Chroma (the collection's embedding function embeds them on insert), then run a filtered query.
import asyncio
import os

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
from parsr_sdk import AsyncParsr

parsr = AsyncParsr(api_key=os.environ["PARSR_API_KEY"])
client = chromadb.PersistentClient(path="./.chroma")
embedder = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)
collection = client.get_or_create_collection(
    name="parsr_invoices",
    embedding_function=embedder,
    metadata={"hnsw:space": "cosine"},
)


async def main() -> None:
    # 1. Parse + chunks.
    result = await parsr.parse_invoice(
        document_url="https://files.example.com/invoice.pdf",
        include_chunks=True,
        chunking={"strategy": "block"},
    )

    # 2. Insert. Chroma calls the embedder for us.
    collection.add(
        ids=[c.id for c in result.chunks],
        documents=[c.text for c in result.chunks],
        metadatas=[
            {
                "doc_type": c.metadata.get("doc_type", "invoice"),
                "org_id": "org_acme",
                "section": c.metadata.get("section", ""),
                # Metadata values must be scalars, so keep the first page only.
                # -1 marks "no page"; some Chroma versions reject None values.
                "page_numbers_first": (c.page_numbers or [-1])[0],
            }
            for c in result.chunks
        ],
    )

    # 3. Filtered semantic query.
    hits = collection.query(
        query_texts=["Largest line item"],
        where={"$and": [
            {"org_id": {"$eq": "org_acme"}},
            {"doc_type": {"$eq": "invoice"}},
        ]},
        n_results=3,
    )
    # Results are nested per query text; [0] selects the single query above.
    for doc, meta, dist in zip(hits["documents"][0], hits["metadatas"][0], hits["distances"][0]):
        print(dist, meta["section"], doc[:80])


asyncio.run(main())
Cost
What you'll actually pay
Chroma local is free — your only cost is the embedding provider (~€0.02 per 1K chunks at OpenAI text-embedding-3-small). Chroma Cloud Starter is $0/mo up to ~3M vectors as of writing; production tiers start around $30/mo. parsr cost is unchanged. The combined dev-time pipeline costs <€5 for a 10K-chunk demo on a developer's laptop.
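As a rough check on those figures, the embedding share of the bill can be estimated from the per-chunk rate quoted above. The helper below is an illustrative sketch: `embedding_cost_eur` is a hypothetical name, and its default rate is this page's approximate figure rather than a quoted provider price.

```python
def embedding_cost_eur(n_chunks: int, eur_per_1k_chunks: float = 0.02) -> float:
    """Estimate embedding spend from an approximate per-1K-chunk rate."""
    if n_chunks < 0:
        raise ValueError("n_chunks must be non-negative")
    return n_chunks / 1000 * eur_per_1k_chunks


# A 10K-chunk demo at roughly EUR 0.02 per 1K chunks.
demo_cost = embedding_cost_eur(10_000)
print(f"~EUR {demo_cost:.2f} for embeddings")
```

At that rate the embeddings for a 10K-chunk demo come to roughly €0.20, which sits well inside the <€5 figure.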
Performance
Tuning tips
- Switch from PersistentClient to HttpClient (Chroma Cloud or self-hosted) once your collection passes ~500K vectors — local SQLite gets slow
- Use the OpenAI embedding function with text-embedding-3-small — it batches calls automatically (saves round-trip latency)
- Use a separate collection per doc_type rather than a single mixed collection at scale: querying a smaller per-type collection is cheaper than filtering one large mixed collection with `where`
- Don't store full chunk text in metadata — Chroma already stores `documents`; duplicating bloats the collection and slows queries
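The per-doc_type tip could be implemented by routing chunks to collection names before inserting. The sketch below assumes chunks are plain dicts with a `metadata` mapping; the `parsr_` prefix and the `collection_name_for` and `group_by_collection` helpers are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def collection_name_for(doc_type: str, prefix: str = "parsr_") -> str:
    """Derive a conservative name: lowercase letters, digits and underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", doc_type.lower()).strip("_")
    return prefix + (slug or "unknown")


def group_by_collection(
    chunks: Iterable[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket chunks by the collection their doc_type maps to."""
    buckets: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for chunk in chunks:
        doc_type = chunk.get("metadata", {}).get("doc_type", "unknown")
        buckets[collection_name_for(doc_type)].append(chunk)
    return dict(buckets)
```

Each bucket can then go to `client.get_or_create_collection(name=...)` followed by `collection.add(...)`, so a query for invoices never touches chunks of other document types.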
Three lines and you're calling parsr from Chroma.
Start building