Integration · Qdrant
Self-hosted-or-managed vector DB with Rust performance — parsr chunks + Qdrant + payload filters.
Qdrant (~22K GitHub stars) is the leading Rust-built vector database, optimised for production-scale recall. It comes in two flavours: Qdrant Cloud (managed, with EU regions including Frankfurt) and self-hosted Qdrant via Docker. The pull for parsr customers is twofold: payload filters (payload is Qdrant's term for per-point metadata) compose powerfully with HNSW search via the `must`/`should`/`must_not` query language, and self-hosting gives EU compliance teams a path to vector RAG with zero third-party data residency. Qdrant's quantization modes (scalar, product, binary) are the fastest path to fitting >10M finance-document chunks on a single box.
Install
One command
pip install parsr-sdk qdrant-client openai
Working sample
from parsr_sdk import AsyncParsr
from qdrant_client import AsyncQdrantClient
parsr = AsyncParsr(api_key="sk_eu_live_...")
qd = AsyncQdrantClient(host="localhost", port=6333)
What you get
Highlights
- Rust core — sub-5ms p95 retrieval at 10M+ vectors on a single box
- Self-hosted via Docker keeps data in your VPC — strongest EU compliance story
- Payload filters compose with HNSW (must/should/must_not) for complex finance queries
- Quantization (scalar/product/binary) trades a few % recall for 4–32x storage savings
- Qdrant Cloud Frankfurt + parsr EU = managed end-to-end EU residency
Architecture
How the pieces fit
One Qdrant collection per doc_type (or a single collection with a doc_type payload field). `parsr.parse(..., include_chunks=True)` → embed each chunk → upsert into the collection with the chunk's text + metadata as `payload`. Query via `query_points` with the question embedding plus a payload filter on org_id and doc_type.
Quickstart
End-to-end example
Parse a document with `include_chunks=True`, embed each chunk, upsert into Qdrant, query.
import asyncio
import os

from openai import AsyncOpenAI
from parsr_sdk import AsyncParsr
from qdrant_client import AsyncQdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)

COLLECTION = "parsr-invoices"
DIM = 1536


async def main() -> None:
    parsr = AsyncParsr(api_key=os.environ["PARSR_API_KEY"])
    qd = AsyncQdrantClient(
        url=os.environ["QDRANT_URL"], api_key=os.environ.get("QDRANT_API_KEY")
    )
    openai = AsyncOpenAI()

    # 1. Collection setup (run once).
    if not await qd.collection_exists(COLLECTION):
        await qd.create_collection(
            collection_name=COLLECTION,
            vectors_config=VectorParams(size=DIM, distance=Distance.COSINE),
        )

    # 2. Parse + chunks.
    result = await parsr.parse_invoice(
        document_url="https://files.example.com/invoice.pdf",
        include_chunks=True,
        chunking={"strategy": "block"},
    )

    # 3. Embed + upsert. UUIDs from chunk.id keep upserts idempotent.
    texts = [c.text for c in result.chunks]
    embeds = await openai.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    points = [
        PointStruct(
            id=c.id,
            vector=e.embedding,
            payload={
                "text": c.text,
                "org_id": "org_acme",
                "doc_type": c.metadata.get("doc_type", "invoice"),
                "page_numbers": c.page_numbers,
                "section": c.metadata.get("section", ""),
            },
        )
        for c, e in zip(result.chunks, embeds.data)
    ]
    await qd.upsert(collection_name=COLLECTION, points=points)

    # 4. Filtered semantic query.
    question = "Largest line item"
    qe = await openai.embeddings.create(
        model="text-embedding-3-small", input=[question]
    )
    hits = await qd.query_points(
        collection_name=COLLECTION,
        query=qe.data[0].embedding,
        query_filter=Filter(
            must=[
                FieldCondition(key="org_id", match=MatchValue(value="org_acme")),
                FieldCondition(key="doc_type", match=MatchValue(value="invoice")),
            ]
        ),
        limit=3,
    )
    for h in hits.points:
        print(h.score, h.payload["section"], h.payload["page_numbers"])


asyncio.run(main())
Cost
What you'll actually pay
Qdrant Cloud's free tier covers up to 1 GB of storage (a few hundred thousand vectors, depending on dimension) for development. Production EU clusters start around €25/mo for 4 GB. Self-hosted on a small ARM box (e.g. Hetzner's CAX21 at ~€7/mo with 8 GB RAM) holds roughly 1M unquantized 1536-dim vectors in RAM; quantization or on-disk vector storage stretches that considerably further. parsr cost is unchanged. Qdrant is the cheapest path to >10M-vector production deployments thanks to quantization: binary quantization can drop vector storage 32x, typically with <2% recall loss on finance docs.
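The sizing claims above come from straightforward arithmetic: raw float32 vectors cost `n_vectors × dim × 4` bytes, and binary quantization stores 1 bit per dimension (a 32x reduction). A back-of-envelope helper, ignoring HNSW graph links and payload storage, which add real overhead on top:

```python
def vector_ram_gb(n_vectors: int, dim: int = 1536, bytes_per_dim: float = 4.0) -> float:
    """Rough RAM footprint of raw vectors only (no index graph, no payloads)."""
    return n_vectors * dim * bytes_per_dim / 1024**3

full = vector_ram_gb(10_000_000)                          # float32
binary = vector_ram_gb(10_000_000, bytes_per_dim=1 / 8)   # 1 bit per dim
print(f"{full:.1f} GB raw, {binary:.1f} GB binary-quantized")  # 57.2 GB raw, 1.8 GB binary-quantized
```

This is why 10M unquantized 1536-dim chunks need a large-memory box, while the binary-quantized equivalent fits on very modest hardware.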
Performance
Tuning tips
- Enable scalar (or product, or binary) quantization once you cross ~1M vectors — recall typically stays within 1–2% on finance docs while RAM drops 4–32x
- Index the payload fields you filter on (`create_payload_index` for org_id, doc_type) — without indexes, large filters degrade to full scans
- Raise `hnsw_config` above the defaults (m=16, ef_construct=100) — e.g. m=32, ef_construct=256 — when recall matters more than index build time and RAM
- Use named vectors when you mix embedding models (e.g. a `dense` and a `sparse` vector per chunk for hybrid retrieval)
A few lines and your parsr chunks are searchable in Qdrant.
Start building