mnestic

Hybrid retrieval (RRF + MMR)

Vector similarity, keyword match, and graph proximity are three different signals about what to recall right now. mnestic fuses them with Reciprocal Rank Fusion (RRF) and de-duplicates the result with Maximal Marginal Relevance (MMR) — exposed both as composable Datalog fixed rules and as a single typed Rust call.

mnestic

This entire page is specific to mnestic. RRF and MMR landed in 0.8.0; the one-call hybrid_search API landed in 0.8.1; native graph legs (GraphLeg) and BM25-default full-text scoring landed in 0.8.3; per-leg detail rows (detailed) landed in 0.8.4.

DbInstance::hybrid_search (and Db::hybrid_search) assemble the proven CozoScript pattern, pass the query vector and text as script parameters (never string-interpolated), validate every interpolated identifier against injection, and run it read-only.

use cozo::{DbInstance, GraphLeg, HybridSearch, MmrParams};
 
let recalls = db.hybrid_search(&HybridSearch {
    relation:     "memory".into(),
    vector_index: "embedding".into(),
    query_vector: cue,              // Vec<f32> from your embedder
    vector_k:     24,
    ef:           80,
    fts_index:    "summary_fts".into(),
    query_text:   "pricing decision".into(),
    fts_k:        24,
    // graph leg: 2-hop proximity from a seed over *recalls,
    // ranked by minimum hop distance, fused in the same call.
    graph_legs:   vec![GraphLeg {
        edge_relation: "recalls".into(),
        seeds:         vec![seed.into()],
        max_hops:      2,
        ..GraphLeg::default()
    }],
    rrf_k:        60.0,
    mmr: Some(MmrParams {
        lambda: 0.5,
        k: 12,
        embedding_col: "embedding".into(),
    }),
    ..HybridSearch::default()
})?;

To see (or hand-tune) the CozoScript it generates rather than run it, call hybrid_search_script with the same HybridSearch.

HybridSearch fields

| Field | Default | Meaning |
| --- | --- | --- |
| relation | | The base stored relation, e.g. "memory". |
| id_col | "id" | Key column holding the item id. |
| vector_index | | HNSW index name (the <name> in relation:<name>). |
| query_vector | | Query embedding for the semantic leg. |
| vector_f64 | false | Send the query vector as F64 to match an F64 index. |
| vector_k | 10 | k for the HNSW search. |
| ef | 50 | HNSW search breadth. |
| fts_index | | Full-text index name. |
| query_text | | Query text for the keyword leg (a CozoScript FTS expression). |
| fts_k | 10 | k for the FTS search. |
| graph_legs | [] | Typed graph-proximity legs (GraphLeg): bounded-hop traversal fused as ranked lists. (0.8.3) |
| extra_lists | [] | Low-level escape hatch: raw ranked lists spliced into the fusion. Prefer graph_legs for traversal. |
| rrf_k | 60.0 | RRF rank-bias damping constant. |
| mmr | None | Optional MMR diversity rerank. None returns the fused ranking directly. |
| limit | 10 | Max rows when no MMR rerank is applied. |
| detailed | false | Per-leg contribution rows; see below. (0.8.4) |

The graph signal: typed GraphLeg (0.8.3)

A GraphLeg expands from a set of seeds over a stored edge relation up to max_hops, scores every reached node by its minimum hop distance (closer ⇒ higher rank), and contributes that ranked list to the same Reciprocal Rank Fusion as the vector and keyword legs. mnestic generates the recursive shortest-path rule for you — a seed relation, a hop-1 base rule, and a min(dist) recursive rule gated at max_hops — so there is no hand-written recursion.

use cozo::GraphLeg;
 
GraphLeg {
    label:         "graph".into(),   // fusion list tag (validated)
    edge_relation: "recalls".into(), // the stored edge relation to traverse
    from_col:      "from".into(),    // source-id column   (default "from")
    to_col:        "to".into(),      // dest-id column     (default "to")
    seeds:         vec![seed.into()],// query anchors (not themselves scored)
    max_hops:      2,                // bounded expansion; must be >= 1
    undirected:    false,            // also follow to_col -> from_col when true
}
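
To make the minimum-hop scoring concrete, here is a minimal std-only sketch of what a graph leg computes. It is not mnestic's implementation (mnestic generates a recursive Datalog rule, not a Rust BFS); the function names min_hops and graph_leg_scores, the in-memory edge list, and the tie-break by id are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Minimum hop distance from any seed to each reachable node, up to `max_hops`.
/// Seeds are anchors only, so they never appear in the result.
/// (Hypothetical helper, not part of the cozo crate.)
fn min_hops(
    edges: &[(&str, &str)],
    seeds: &[&str],
    max_hops: usize,
    undirected: bool,
) -> HashMap<String, usize> {
    // Adjacency list; also add the reverse direction when `undirected` is set.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().push(to);
        if undirected {
            adj.entry(to).or_default().push(from);
        }
    }

    let mut visited: HashSet<&str> = seeds.iter().copied().collect();
    let mut queue: VecDeque<(&str, usize)> = seeds.iter().map(|s| (*s, 0)).collect();
    let mut dist: HashMap<String, usize> = HashMap::new();

    // Breadth-first search: the first visit to a node is its minimum distance.
    while let Some((node, d)) = queue.pop_front() {
        if d == max_hops {
            continue; // hop budget exhausted along this path
        }
        if let Some(nexts) = adj.get(node) {
            for &next in nexts {
                if visited.insert(next) {
                    dist.insert(next.to_string(), d + 1);
                    queue.push_back((next, d + 1));
                }
            }
        }
    }
    dist
}

/// Ranked list of (id, score) with score = -distance, so closer nodes rank higher.
/// Ties are broken by id here; that tie-break is an assumption of this sketch.
fn graph_leg_scores(dist: &HashMap<String, usize>) -> Vec<(String, f64)> {
    let mut out: Vec<(String, f64)> =
        dist.iter().map(|(id, d)| (id.clone(), -(*d as f64))).collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    out
}
```

With edges a→b, b→c, c→d, seed a, and max_hops 2, this yields b at distance 1 and c at distance 2; d is out of range and the seed a is not scored. The resulting list is what would enter the fusion alongside the vector and keyword lists.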

mnestic

Injection-safe. Seed values are passed as query parameters ($hg{i}_seed{j}), never string-interpolated; the label, edge relation, and column names are validated as bare identifiers, and empty seeds / max_hops == 0 are rejected. Multiple seeds are unioned, and multiple graph_legs each become their own ranked list in the fusion. An empty graph_legs generates the exact pre-0.8.3 script (backward compatible).

Escape hatch: extra_lists

If you need a ranked list that isn't a bounded-hop traversal, extra_lists takes a HybridList { label, rule_body }. The rule_body is a Datalog rule body that must bind two variables — id (the item key, matching the fused output) and score (higher is better). Prefer graph_legs for graph proximity: an extra_lists entry is a single spliced rule body and cannot express the recursive shortest-path rule that bounded-hop proximity needs.

use cozo::HybridList;
 
HybridList {
    label: "recency".into(),
    rule_body: "*memory{ id, created_at }, score = created_at".into(),
}

Caution

rule_body is your own Datalog and is spliced verbatim — it is not sanitized. Only label is validated. Keep it free of untrusted input.

Per-leg detail: recall that explains itself (0.8.4)

With detailed: true the output switches to one row per (item, contributing leg), with columns [id, score, list_id, leg_rank, leg_score]:

  • list_id: the leg that surfaced the result.
  • leg_rank: the 1-based within-leg rank the fusion actually used (after best-score dedup).
  • leg_score: the leg's raw score (cosine for the vector leg, BM25 for the keyword leg, negative hop distance for a graph leg).

Legs an item did not appear in contribute no row, and the fused score reconstructs exactly:

score = Σ over legs of 1 / (rrf_k + leg_rank)
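
As a worked check of that formula, the sketch below recomputes a fused score from detail rows. The DetailRow struct simply mirrors the documented column order; it and the helper functions are hypothetical, and the sketch assumes the fused score is repeated on every row of the same item.

```rust
/// One detailed row: [id, score, list_id, leg_rank, leg_score].
/// (Hypothetical struct mirroring the documented columns.)
struct DetailRow {
    id: String,
    score: f64,      // fused score (assumed repeated on each row of the item)
    list_id: String, // leg that contributed this row
    leg_rank: u32,   // 1-based rank within that leg
    leg_score: f64,  // raw leg score
}

/// Recompute the fused score of `id` as the sum of 1 / (rrf_k + leg_rank).
fn reconstruct(rows: &[DetailRow], id: &str, rrf_k: f64) -> f64 {
    rows.iter()
        .filter(|r| r.id == id)
        .map(|r| 1.0 / (rrf_k + r.leg_rank as f64))
        .sum()
}

/// True when every row of `id` carries the score its leg ranks imply.
fn is_consistent(rows: &[DetailRow], id: &str, rrf_k: f64) -> bool {
    let expected = reconstruct(rows, id, rrf_k);
    rows.iter()
        .filter(|r| r.id == id)
        .all(|r| (r.score - expected).abs() < 1e-9)
}

/// One human-readable line per contributing leg.
fn explain(rows: &[DetailRow], id: &str, rrf_k: f64) -> Vec<String> {
    rows.iter()
        .filter(|r| r.id == id)
        .map(|r| {
            format!(
                "{}: rank {} (raw {}) adds {:.6}",
                r.list_id,
                r.leg_rank,
                r.leg_score,
                1.0 / (rrf_k + r.leg_rank as f64)
            )
        })
        .collect()
}
```

For example, with rrf_k = 60 an item ranked 1st in the vector leg and 3rd in the keyword leg has a fused score of 1/61 + 1/63.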

Without MMR the row limit widens to limit × leg-count so the top limit items always arrive with all their legs; with MMR the detail is joined onto MMR's selection (head: [id, rank, score, list_id, leg_rank, leg_score]).

The same option exists on the raw fixed rule — ReciprocalRankFusion(combined[lid, item, score], k: 60, detailed: true) — and in the Python binding (detailed=True in the hybrid_search dict).

This is the substrate for "why was this retrieved" product surfaces: every fused ranking decomposes into auditable per-signal contributions.

The keyword leg uses BM25 (0.8.3)

The FTS leg scores with Okapi BM25 by default (term-frequency saturation + document-length normalization, with OR-disjunction summing per-term contributions). This is what lifted fused recall@10 from ~0.75 to 0.954 in the benchmark. It is the engine-wide ::fts default — see Proximity searches for score_kind tuning (k1, b) and how to fall back to tf_idf/tf.
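
For readers unfamiliar with the two effects named above, here is a sketch of the textbook Okapi BM25 term score. It is an illustration only: the IDF variant and the parameter values k1 = 1.2 and b = 0.75 used in the usage note are common textbook choices, not confirmed mnestic defaults, and the function names are hypothetical.

```rust
/// Textbook Okapi BM25 contribution of one query term to one document.
/// `tf` is the term's frequency in the document, `doc_freq` the number of
/// documents containing the term, `n_docs` the collection size.
/// (Illustrative only; mnestic's exact formula may differ.)
fn bm25_term(
    tf: f64,
    doc_len: f64,
    avg_doc_len: f64,
    n_docs: f64,
    doc_freq: f64,
    k1: f64,
    b: f64,
) -> f64 {
    // Smoothed, non-negative inverse document frequency (one common variant).
    let idf = (1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Document-length normalization: b = 0 ignores length, b = 1 applies it fully.
    let norm = 1.0 - b + b * doc_len / avg_doc_len;
    // Term-frequency saturation: the fraction approaches (k1 + 1) as tf grows.
    idf * tf * (k1 + 1.0) / (tf + k1 * norm)
}

/// OR-disjunction: sum per-term contributions. Each entry is (tf, doc_freq);
/// a term absent from the document (tf = 0) contributes nothing.
fn bm25_or(
    terms: &[(f64, f64)],
    doc_len: f64,
    avg_doc_len: f64,
    n_docs: f64,
    k1: f64,
    b: f64,
) -> f64 {
    terms
        .iter()
        .map(|&(tf, df)| bm25_term(tf, doc_len, avg_doc_len, n_docs, df, k1, b))
        .sum()
}
```

With k1 = 1.2 and b = 0.75, going from one to two occurrences of a term adds far more score than going from nine to ten (saturation), and the same term frequency scores lower in a longer-than-average document (length normalization).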

The primitives, in Datalog

If you want full control, use the fixed rules directly. Both are also available as the aliases RRF and MMR.

ReciprocalRankFusion

Fuses several ranked result lists into one ranking via Σ 1/(k + rank_in_list). Input is a single relation [list_id, item, score]; rows are grouped by list_id, ranked within each list by score, and the reciprocal-rank contributions are summed per item.

  • Options: k (default 60), descending (default true).
  • Output: [item, fused_score], composable in further Datalog.
candidates[list_id, item, score] <- [
    ['vec', 'a', 0.91], ['vec', 'b', 0.74],
    ['fts', 'b', 7.2],  ['fts', 'c', 5.1],
]
 
?[item, fused] <~ ReciprocalRankFusion(candidates[], k: 60)
:order -fused
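
To show what the fixed rule computes on those rows, here is a minimal std-only sketch of the same fusion: group by list, rank within each list, and sum 1/(k + rank) per item. It is a reference model, not mnestic's code; the function name, the in-memory row type, and the tie-break by item id are assumptions, and it expects finite scores.

```rust
use std::collections::HashMap;

/// Fuse rows of (list_id, item, score) into (item, fused_score), best first.
/// (Hypothetical reference model of the rule described above.)
fn reciprocal_rank_fusion(
    rows: &[(&str, &str, f64)],
    k: f64,
    descending: bool,
) -> Vec<(String, f64)> {
    // Group by list_id, keeping each item's best score within a list.
    let mut lists: HashMap<&str, HashMap<&str, f64>> = HashMap::new();
    for &(list_id, item, score) in rows {
        let best = lists.entry(list_id).or_default().entry(item).or_insert(score);
        let better = if descending { score > *best } else { score < *best };
        if better {
            *best = score;
        }
    }

    let mut fused: HashMap<&str, f64> = HashMap::new();
    for items in lists.values() {
        // Rank within the list; ties fall back to item id (an assumption).
        let mut ranked: Vec<(&str, f64)> = items.iter().map(|(i, s)| (*i, *s)).collect();
        ranked.sort_by(|a, b| {
            let ord = a.1.partial_cmp(&b.1).unwrap();
            let ord = if descending { ord.reverse() } else { ord };
            ord.then_with(|| a.0.cmp(b.0))
        });
        // Ranks are 1-based: the top item of a list contributes 1 / (k + 1).
        for (idx, (item, _)) in ranked.iter().enumerate() {
            *fused.entry(*item).or_insert(0.0) += 1.0 / (k + (idx + 1) as f64);
        }
    }

    let mut out: Vec<(String, f64)> =
        fused.into_iter().map(|(i, s)| (i.to_string(), s)).collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    out
}
```

On the four rows in the example above with k = 60, item b appears in both lists (rank 2 in vec, rank 1 in fts) and scores 1/62 + 1/61, ahead of a at 1/61 and c at 1/62. The raw scores never mix across lists, which is why a cosine of 0.91 and an FTS score of 7.2 can be fused without normalization.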

Note

Datalog can already sum reciprocal contributions, but it cannot assign a rank position within a group — that intra-list ranking is the missing primitive RRF supplies.

MaximalMarginalRelevance

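Re-ranks a candidate set to balance relevance against diversity, avoiding near-duplicate recalls. At each step it greedily selects the remaining candidate that maximizes λ·relevance − (1−λ)·max_sim, where max_sim is the highest cosine similarity between that candidate and any already-selected item.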

  • Input: [item, relevance, vector].
  • Options: lambda (default 0.5, clamped to [0, 1]), k (default 0 = all).
  • Output: [item, rank] in selection order.
?[item, rank] <~ MaximalMarginalRelevance(candidates[], lambda: 0.5, k: 12)
:order rank

Both rules reject non-finite (NaN/inf) scores; MMR also rejects inconsistent vector dimensions rather than panicking, and uses the true maximum cosine similarity so anti-correlated candidates are rewarded with diversity credit.
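
The greedy selection can be modelled in a few lines of std-only Rust. This is a sketch under stated assumptions, not mnestic's implementation: the function names are hypothetical, ranks are emitted 1-based, the first pick carries no similarity penalty, and inputs are assumed finite with equal vector dimensions (the real rule rejects violations instead).

```rust
/// Cosine similarity; returns 0 for a zero-length vector.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy MMR over (item, relevance, vector); returns (item, rank) in
/// selection order. `k == 0` selects every candidate.
/// (Hypothetical reference model; 1-based ranks are an assumption.)
fn mmr(candidates: &[(&str, f64, Vec<f64>)], lambda: f64, k: usize) -> Vec<(String, usize)> {
    let lambda = lambda.clamp(0.0, 1.0);
    let limit = if k == 0 { candidates.len() } else { k.min(candidates.len()) };
    let mut remaining: Vec<usize> = (0..candidates.len()).collect();
    let mut selected: Vec<usize> = Vec::new();

    while selected.len() < limit {
        let mut best: Option<(usize, f64)> = None; // (position in `remaining`, score)
        for (pos, &c) in remaining.iter().enumerate() {
            // True maximum similarity to the selected set; it may be negative,
            // in which case the candidate gains score for being different.
            let max_sim = selected
                .iter()
                .map(|&s| cosine(&candidates[c].2, &candidates[s].2))
                .fold(f64::NEG_INFINITY, f64::max);
            let penalty = if selected.is_empty() { 0.0 } else { max_sim };
            let score = lambda * candidates[c].1 - (1.0 - lambda) * penalty;
            if best.map_or(true, |(_, b)| score > b) {
                best = Some((pos, score));
            }
        }
        // `remaining` is non-empty here because limit <= candidates.len().
        let (pos, _) = best.unwrap();
        selected.push(remaining.remove(pos));
    }

    selected
        .iter()
        .enumerate()
        .map(|(rank, &i)| (candidates[i].0.to_string(), rank + 1))
        .collect()
}
```

Given a (relevance 0.9), a near-duplicate b (0.85), and an orthogonal c (0.6), lambda 0.5 with k 2 selects a and then c, because b's similarity to a costs it more than its relevance earns. With lambda 1.0 the penalty vanishes and the selection is a, then b.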

Why one call

Without hybrid_search, composing this pipeline takes roughly seven hand-assembled Datalog rules: run the HNSW leg, run the FTS leg, tag and union them into ranked lists, fuse, then rerank. hybrid_search collapses that into a single typed call while still letting you inspect the generated script.