mnestic

Release history

This page summarizes the mnestic fork releases. The authoritative, fully-detailed log lives in CHANGELOG-FORK.md.

0.8.5

Index builds, 15× faster — plus the read path stops paying for transactions it doesn't need.

  • Flat in-RAM parallel index builds: ::hnsw create used to build the graph through the storage engine's tuple encoding, so every neighbour visit paid a decode, a hash of the full compound key, and allocator traffic; a profile showed less than half the build was distance math. The build now works the way hnswlib and pgvector do: a contiguous vector slab, integer-ID adjacency arrays, parallel insertion with per-node locks, then one serialisation pass into the existing on-disk format. Nothing changes for readers: same tuple layout, same search path, same incremental maintenance, and the non-blocking build (reads proceed during construction) is preserved. Measured on the 40k × 384-dim RocksDB corpus: ::hnsw create 294 s → 19 s synthetic, 89.1 s → 8.1 s with real embeddings, recall@10 unchanged. Decomposition: 3.2× from the flat layout (serial), ~5× from parallel insertion. MNESTIC_INDEX_BUILD_THREADS=1 restores serial insertion; parallel builds produce equivalent-recall graphs, not byte-identical ones, and a recall-agreement test guards that equivalence. ::fts create gets the same treatment (no deletion pass against a provably-empty index, parallel tokenisation): ~2× on short documents, more on long ones.
  • Plain-snapshot read path (RocksDB) — read-only scripts no longer open a pessimistic transaction; they read through a plain snapshot (the standard MVCC read pattern — TiKV and CockroachDB made the same call). Same consistent view as before, no lock-manager bookkeeping, and reads structurally cannot wait on writer locks. Keyed point reads: p50 28.5 → 23.9 µs (−16%), p99 −19%. Retrieval-scale queries on a cache-resident corpus are unchanged, as expected — and a write attempted through a read-only transaction now errors explicitly. Isolation semantics are pinned by tests.
  • Batched HNSW neighbour reads: hnsw_search_level fetches unvisited neighbours' vectors through one RocksDB MultiGet per expansion step (shared bloom probes, batched block reads) instead of one serial point get per neighbour. Neutral on a cache-resident corpus; the gain shows up on cold-cache and larger-than-RAM data.
  • ::describe works now — documented, implemented, and tested at the operation level upstream, but never reachable: the grammar rule was missing from the script-level alternations, since before the fork. Wired in, with a read-only-mode guard. A small fix, and a fair illustration of what "maintained" means: someone is reading this code.
  • Requires mnestic-rocks 0.1.9 for the rocksdb feature.
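
The flat in-RAM layout described in the first item above can be pictured with a small sketch. This is not mnestic's implementation: the class name FlatGraph and its methods are hypothetical, it covers only the contiguous slab and integer-ID adjacency arrays, and it leaves out parallel insertion, per-node locks, and the serialisation pass.

```python
import math
from array import array


class FlatGraph:
    """Hypothetical flat layout: one contiguous float slab plus integer-ID adjacency arrays."""

    def __init__(self, dim):
        self.dim = dim
        self.slab = array("f")   # every vector stored back to back, no per-vector object
        self.neighbours = []     # neighbours[i] is an array of integer node IDs

    def add(self, vec):
        """Append a vector to the slab and return its integer node ID."""
        if len(vec) != self.dim:
            raise ValueError("wrong dimension")
        node_id = len(self.neighbours)
        self.slab.extend(float(x) for x in vec)
        self.neighbours.append(array("I"))
        return node_id

    def vector(self, node_id):
        """Slice the node's vector out of the slab by offset arithmetic alone."""
        start = node_id * self.dim
        return self.slab[start:start + self.dim]

    def link(self, a, b):
        """Record an undirected edge as two integer appends."""
        self.neighbours[a].append(b)
        self.neighbours[b].append(a)

    def distance(self, a, b):
        """Euclidean distance between two stored vectors."""
        va, vb = self.vector(a), self.vector(b)
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

A neighbour visit in this layout is an index lookup and a slice, with no key hashing or tuple decoding on the way to the distance computation.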

0.8.4

A defect fix plus the fusion-explainability primitive. Also the first release with the Python family on PyPI: mnestic (abi3 wheels for Linux x86_64 + aarch64, macOS x86_64 + arm64, Windows), langchain-mnestic, and llama-index-vector-stores-mnestic.

  • Per-leg fusion detail: ReciprocalRankFusion(..., detailed: true) switches the output to one row per (item, contributing list): [item, fused_score, list_id, leg_rank, leg_score], where leg_rank is the 1-based within-list rank the fusion actually used. The fused score reconstructs exactly as Σ 1/(k + leg_rank). HybridSearch::detailed plumbs it through the one-call helper (Python: detailed=True). This is the mechanism behind "why was this retrieved" surfaces.
  • Concurrency fix — 0.8.3's durable avgdl counter was one shared storage key written inside every document transaction; under RocksDB pessimistic transactions, concurrent writers to any FTS-indexed relation conflicted on that key (and the unlocked read-modify-write lost updates). The doc-stats counter is now process-cached and scan-seeded — one scan per index per process, maintained in memory, no shared hot-path key. Per-query avgdl stays O(1); BM25 scores are unchanged.
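
The row shape and the reconstruction property of the per-leg fusion detail can be illustrated in plain Python. This is a sketch, not a call into mnestic: the function name rrf_detailed and the dict-of-lists input are hypothetical, and k=60 is the value commonly used for RRF, assumed here because mnestic's default is not stated on this page.

```python
from collections import defaultdict


def rrf_detailed(lists, k=60):
    """Fuse ranked lists with RRF and return one row per (item, contributing list).

    lists maps a list_id to [(item, leg_score), ...], already sorted best-first.
    Each row is [item, fused_score, list_id, leg_rank, leg_score].
    """
    fused = defaultdict(float)
    legs = []
    for list_id, ranked in lists.items():
        for leg_rank, (item, leg_score) in enumerate(ranked, start=1):  # 1-based rank
            fused[item] += 1.0 / (k + leg_rank)
            legs.append((item, list_id, leg_rank, leg_score))
    rows = [[item, fused[item], list_id, leg_rank, leg_score]
            for item, list_id, leg_rank, leg_score in legs]
    # Best fused score first; ties broken by item, then list_id, for a stable order.
    rows.sort(key=lambda r: (-r[1], r[0], r[2]))
    return rows
```

Summing 1/(k + leg_rank) over the rows of one item gives back that item's fused_score, which is what lets a caller explain which list contributed how much.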

0.8.3

Two agentic-memory wedge features, validated end-to-end on the hybrid-recall benchmark (40k chunks, vs SQLite / DuckDB / LanceDB / Kuzu).

  • Native 3-way fused recall — graph proximity is now a typed GraphLeg on hybrid_search. Each leg expands from seed nodes over a stored edge relation up to max_hops, scores every reached node by its minimum hop distance, and feeds that ranked list into the same RRF as the vector and keyword legs — one call, one transaction, no hand-written recursion. The fused call runs all three signals at 41.55 ms p50 (recall 0.873), ~4× faster than the hand-decomposed path, fusing a signal no 2-way engine can.
  • BM25-correct full-text search — the default ::fts score kind is now Okapi BM25 (term-frequency saturation k1 + document-length normalization b), and OR now sums per-term contributions instead of taking the max. avgdl is an O(1) read instead of a per-query index scan (the 0.8.3 durable-counter design was replaced by a process-level cache in 0.8.4 — see above). Fused recall 0.75 → 0.954 (parity with DuckDB's 0.957); cold p99 tail 2,900 → 258 ms.
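
The graph leg in the first item above (expand from seeds, cap at max_hops, score by minimum hop distance) amounts to a bounded multi-source breadth-first search. The sketch below is an assumption-laden illustration, not mnestic's code: the function name graph_leg is hypothetical, edges are treated as directed (src, dst) pairs, and seeds are kept in the output at distance 0.

```python
from collections import deque


def graph_leg(edges, seeds, max_hops):
    """Return [(node, min_hops), ...] ranked nearest-first, for nodes within max_hops of any seed."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    hops = {seed: 0 for seed in seeds}
    queue = deque(hops)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in hops:  # BFS reaches each node first via a shortest path
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    # Fewer hops ranks higher; ties are broken by node name for determinism.
    return sorted(hops.items(), key=lambda kv: (kv[1], kv[0]))
```

The resulting ranked list has the same shape as a vector or keyword result list, which is why it can be fed into the same rank fusion step.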

Caution

Behavior change: the default ::fts scorer moved from tf_idf to bm25. Pass score_kind: 'tf_idf' (or 'tf') for byte-identical upstream scoring.
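
For readers comparing the two scorers, here is a minimal sketch of Okapi BM25 with OR semantics that sum per-term contributions. It is illustrative only: the function name bm25_or_scores is hypothetical, k1=1.2 and b=0.75 are the conventional defaults rather than values confirmed for mnestic, and the IDF form ln(1 + (N - n + 0.5)/(n + 0.5)) is one common variant assumed here.

```python
import math
from collections import Counter


def bm25_or_scores(docs, query_terms, k1=1.2, b=0.75):
    """Score tokenised docs ({doc_id: [token, ...]}) against an OR query."""
    n_docs = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # Length normalisation: longer-than-average documents are penalised via b.
        norm = k1 * (1 - b + b * len(tokens) / avgdl)
        total = 0.0
        for term in set(query_terms):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation: the fraction approaches k1 + 1 as tf grows.
            total += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        if total > 0:
            scores[doc_id] = total  # OR sums per-term contributions
    return scores
```

Because contributions are summed, a document matching both query terms outranks one that matches a single term with similar strength, which a max-based OR cannot express.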

See also: Hybrid retrieval (RRF + MMR + graph legs)

0.8.2

Non-blocking HNSW index builds. ::hnsw create no longer holds the base relation's write lock during graph construction, so concurrent reads no longer stall for the build's duration. Measured: 90,507 reads completed (slowest 0.8 ms) during a ~5.6 s, 40,000-vector build that would previously have blocked them all. RocksDB only.

See also: Non-blocking HNSW builds

0.8.1

  • One-call hybrid retrieval: hybrid_search runs HNSW + FTS (+ optional graph traversal), fuses with RRF, and optionally diversifies with MMR in a single typed call.
  • HNSW index build ~3× faster (20k × 128: 135 s → 43.6 s, measured on a release build); the built graph is byte-identical.
  • mnestic-rocks — the C++/RocksDB bridge is now a maintained fork (importable name stays cozorocks).
  • Blocking clippy CI gate.
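
The MMR diversification step mentioned in the first item above can be sketched as a greedy selection loop. This is a generic Maximal Marginal Relevance illustration, not mnestic's implementation: the function names, the (item, relevance, vector) candidate shape, the cosine similarity, and the default lam=0.5 are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mmr(candidates, top_n, lam=0.5):
    """Greedily pick top_n items, trading relevance against similarity to earlier picks.

    candidates is a list of (item, relevance, vector); lam=1.0 means pure relevance.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < top_n:
        def mmr_score(candidate):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((cosine(candidate[2], s[2]) for s in selected), default=0.0)
            return lam * candidate[1] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [item for item, _, _ in selected]
```

With lam below 1.0, a near-duplicate of an already selected result is pushed down in favour of a less similar one, even if its raw relevance is higher.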

See also: Hybrid retrieval (RRF + MMR)

0.8.0

First fork release: CozoDB 0.7.6 plus 30 unreleased upstream commits (the fork point), bumped to 0.8.0 to mark the fork's identity.

  • Equality pushdown — keyed stored_prefix_join for equality post-filters (~28–29× faster single-row lookups at 5k rows).
  • ReciprocalRankFusion / MaximalMarginalRelevance fixed rules (aliases RRF / MMR).
  • ULID functions: rand_ulid(), ulid_timestamp().
  • Parser fix — keyword-prefixed identifiers parse correctly (upstream #281).
  • env_logger moved to a dev-dependency (upstream #287).
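
As background for the ULID functions above, a ULID is a 128-bit value (a 48-bit millisecond timestamp followed by 80 random bits) written as 26 Crockford base32 characters. The sketch below shows that structure in Python; make_ulid and ulid_timestamp_ms are hypothetical helpers, not the mnestic functions, and the unit returned by mnestic's ulid_timestamp() is not stated on this page.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # base32 alphabet without I, L, O, U


def make_ulid(timestamp_ms=None):
    """Build a 26-character ULID: 48-bit ms timestamp, then 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    # 26 characters of 5 bits each, most significant first.
    return "".join(CROCKFORD[(value >> shift) & 31] for shift in range(125, -1, -5))


def ulid_timestamp_ms(ulid):
    """Decode the millisecond timestamp from the first 10 characters."""
    ts = 0
    for ch in ulid[:10].upper():
        ts = ts * 32 + CROCKFORD.index(ch)
    return ts
```

Because the timestamp occupies the leading characters and the alphabet is in ASCII order, ULIDs created at different milliseconds sort by creation time as plain strings.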

See also: Equality pushdown · ULID identifiers

Upstream CozoDB history

CozoDB's own release notes (v0.1 through v0.7) — covering the introduction of HNSW vector search, full-text search, MinHash-LSH, JSON values, and time travel — remain available in the original documentation. mnestic inherits all of that functionality.