Release history
This page summarizes the mnestic fork releases. The authoritative,
fully-detailed log lives in
CHANGELOG-FORK.md.
0.8.5
Index builds, 15× faster — plus the read path stops paying for transactions it doesn't need.
- Flat in-RAM parallel index builds —
::hnsw createused to build the graph through the storage engine's tuple encoding: every neighbour visit paid a decode, a hash of the full compound key, and allocator traffic — a profile showed less than half the build was distance math. The build now works the way hnswlib and pgvector do: a contiguous vector slab, integer-ID adjacency arrays, parallel insertion with per-node locks, then one serialisation pass into the existing on-disk format. Nothing changes for readers: same tuple layout, same search path, same incremental maintenance, and the non-blocking build (reads proceed during construction) is preserved. Measured on the 40k × 384-dim RocksDB corpus:::hnsw create294 s → 19 s synthetic, 89.1 s → 8.1 s with real embeddings, recall@10 unchanged. Decomposition: 3.2× from the flat layout (serial), ~5× from parallel insertion.MNESTIC_INDEX_BUILD_THREADS=1restores serial insertion; parallel builds produce equivalent-recall graphs, not byte-identical ones — guarded by a recall-agreement test.::fts creategets the same treatment (no deletion pass against a provably-empty index, parallel tokenisation): ~2× on short documents, more on long ones. - Plain-snapshot read path (RocksDB) — read-only scripts no longer open a pessimistic transaction; they read through a plain snapshot (the standard MVCC read pattern — TiKV and CockroachDB made the same call). Same consistent view as before, no lock-manager bookkeeping, and reads structurally cannot wait on writer locks. Keyed point reads: p50 28.5 → 23.9 µs (−16%), p99 −19%. Retrieval-scale queries on a cache-resident corpus are unchanged, as expected — and a write attempted through a read-only transaction now errors explicitly. Isolation semantics are pinned by tests.
- Batched HNSW neighbour reads —
hnsw_search_levelfetches unvisited neighbours' vectors through one RocksDBMultiGetper expansion step (shared bloom probes, batched block reads) instead of one serial point get per neighbour. Neutral on a cache-resident corpus; the win case is cold-cache and larger-than-RAM data. ::describeworks now — documented, implemented, and tested at the operation level upstream, but never reachable: the grammar rule was missing from the script-level alternations, since before the fork. Wired in, with a read-only-mode guard. A small fix, and a fair illustration of what "maintained" means: someone is reading this code.- Requires
mnestic-rocks0.1.9 for the rocksdb feature.
0.8.4
A defect fix plus the fusion-explainability primitive. Also the first release
with the Python family on PyPI: mnestic (abi3 wheels for Linux x86_64 +
aarch64, macOS x86_64 + arm64, Windows), langchain-mnestic, and
llama-index-vector-stores-mnestic.
- Per-leg fusion detail —
ReciprocalRankFusion(..., detailed: true)switches the output to one row per (item, contributing list):[item, fused_score, list_id, leg_rank, leg_score], whereleg_rankis the 1-based within-list rank the fusion actually used. The fused score reconstructs exactly asΣ 1/(k + leg_rank).HybridSearch::detailedplumbs it through the one-call helper (Python:detailed=True) — the mechanism behind "why was this retrieved" surfaces. - Concurrency fix — 0.8.3's durable
avgdlcounter was one shared storage key written inside every document transaction; under RocksDB pessimistic transactions, concurrent writers to any FTS-indexed relation conflicted on that key (and the unlocked read-modify-write lost updates). The doc-stats counter is now process-cached and scan-seeded — one scan per index per process, maintained in memory, no shared hot-path key. Per-queryavgdlstays O(1); BM25 scores are unchanged.
0.8.3
Two agentic-memory wedge features, validated end-to-end on the hybrid-recall benchmark (40k chunks, vs SQLite / DuckDB / LanceDB / Kuzu).
- Native 3-way fused recall — graph proximity is now a typed
GraphLegonhybrid_search. Each leg expands from seed nodes over a stored edge relation up tomax_hops, scores every reached node by its minimum hop distance, and feeds that ranked list into the same RRF as the vector and keyword legs — one call, one transaction, no hand-written recursion. The fused call runs all three signals at 41.55 ms p50 (recall 0.873), ~4× faster than the hand-decomposed path, fusing a signal no 2-way engine can. - BM25-correct full-text search — the default
::ftsscore kind is now Okapi BM25 (term-frequency saturationk1+ document-length normalizationb), andORnow sums per-term contributions instead of taking the max.avgdlis an O(1) read instead of a per-query index scan (the 0.8.3 durable-counter design was replaced by a process-level cache in 0.8.4 — see above). Fused recall 0.75 → 0.954 (parity with DuckDB's 0.957); cold p99 tail 2,900 → 258 ms.
Caution
Behavior change: the default ::fts scorer moved from tf_idf to bm25.
Pass score_kind: 'tf_idf' (or 'tf') for byte-identical upstream scoring.
→ Hybrid retrieval (RRF + MMR + graph legs)
0.8.2
Non-blocking HNSW index builds. ::hnsw create no longer holds the base
relation's write lock during graph construction, so concurrent reads no longer
stall for the build's duration. Measured: 90,507 reads completed (slowest 0.8 ms)
during a ~5.6 s, 40,000-vector build that would previously have blocked them all.
RocksDB only.
0.8.1
- One-call hybrid retrieval —
hybrid_searchruns HNSW + FTS (+ optional graph traversal), fuses with RRF, and optionally diversifies with MMR in a single typed call. - HNSW index build ~3× faster (20k × 128: 135 s → 43.6 s, measured release); the built graph is byte-identical.
mnestic-rocks— the C++/RocksDB bridge is now a maintained fork (importable name stayscozorocks).- Blocking clippy CI gate.
→ Hybrid retrieval (RRF + MMR)
0.8.0
First fork release: CozoDB 0.7.6 plus 30 unreleased upstream commits (the fork point), bumped to 0.8.0 to mark the fork's identity.
- Equality pushdown — keyed
stored_prefix_joinfor equality post-filters (~28–29× faster single-row lookups at 5k rows). ReciprocalRankFusion/MaximalMarginalRelevancefixed rules (aliasesRRF/MMR).- ULID functions —
rand_ulid(),ulid_timestamp(). - Parser fix — keyword-prefixed identifiers parse correctly (upstream #281).
env_loggermoved to a dev-dependency (upstream #287).
→ Equality pushdown · ULID identifiers
Upstream CozoDB history
CozoDB's own release notes (v0.1 through v0.7) — covering the introduction of HNSW vector search, full-text search, MinHash-LSH, JSON values, and time travel — remain available in the original documentation. mnestic inherits all of that functionality.