Stored relations & transactions

Data on disk lives in stored relations. This page covers everything about writing them: the mutation ops (:create, :put, :rm, and friends), how column specs work, how to run several queries in one atomic transaction, secondary indices, and triggers. To read stored relations in queries you use *relation[...] or *relation{...} atoms, described in Queries.

A schema for an agent's episodic memory takes one op:

:create memory {
    id: String
    =>
    kind: String,
    text: String,
    importance: Float,
    at: Float,
    v: <F32; 4>,
}

Columns before => are the key, columns after it are values. Writing rows is :put, and reading them back is a query:

?[id, kind, text, importance, at, v] <- [
  ['m2', 'decision', 'Chose RocksDB over sled for the write path',            0.9, 1751414400.0, [0.9, 0.1, 0.2, 0.1]],
  ['m3', 'note',     'Nightly compaction stalls search-service around 03:00', 0.7, 1751500800.0, [0.6, 0.1, 0.8, 0.1]],
  ['m4', 'insight',  'Compaction stalls correlate with oversized SST files',   0.8, 1751587200.0, [0.7, 0.1, 0.7, 0.1]],
]
:put memory { id => kind, text, importance, at, v }

?[id, text, importance] := *memory{id, text, importance}, importance >= 0.8

id    text                                                    importance
m2    Chose RocksDB over sled for the write path              0.9
m4    Compaction stalls correlate with oversized SST files    0.8

The rest of this page builds on this example. Besides more memory rows, the examples assume three companion relations: entity {id => name, kind} for the people and tools the agent knows about, mentions {memory, entity} linking memories to entities (an all-key relation: with no =>, every column is part of the key), and recalls {from, to => strength}, weighted association edges between memories.

Stored relations

A mutation is a query option: a query produces a result relation, and the op says what to do with it. Successful mutations return a single status row (OK) unless :returning is given.

`:create <NAME> <SPEC>`

Create a stored relation with the given name and spec. No stored relation with the same name can exist beforehand. If a query is specified, data from the resulting relation is put into the newly created stored relation. This is the only stored-relation op for which the query may be omitted.

`:replace <NAME> <SPEC>`

Similar to :create, except that if the named stored relation exists beforehand, it is completely replaced. The schema of the replaced relation need not match the new one. You cannot omit the query for :replace.

Three constraints apply:

A relation with secondary indices cannot be :replaced (error eval::replace_rel_with_indices); drop the indices first.
The old relation's on replace triggers run first, and its on put/on rm triggers apply to the rows written by this :replace statement — which can fail if the schema changed under them — but the relation afterwards has no triggers attached. Re-attach them with ::set_triggers if you need them to survive. (See Triggers.)
:replace cannot run inside a trigger body. Attaching such a trigger succeeds; the first mutation that fires it aborts with eval::replace_in_trigger.

`:put <NAME> <SPEC>`

Put rows from the resulting relation into the named stored relation. If keys from the data exist beforehand, the corresponding rows are replaced with new ones (an upsert).

`:rm <NAME> <SPEC>`

Remove rows from the named stored relation. Only keys are used; removing a non-existent key is not an error and does nothing.

`:insert <NAME> <SPEC>`

Like :put, but if a key from the data exists beforehand, an error is raised. Re-inserting the existing entity e_sam fails with:

Assertion failure for ["e_sam", "Sam A.", "person"] of entity: key exists in database

`:update <NAME> <SPEC>`

Update rows in the named stored relation. Specify all keys and only the non-keys you want to change; the other non-keys keep their old values. Updating a non-existent key is an error.

?[id, importance] <- [['m1', 0.7]]
:update memory { id => importance }

?[text, importance] := *memory{id: 'm1', text, importance}

text                                          importance
Maya prefers pull requests under 400 lines    0.7

`:delete <NAME> <SPEC>`

Like :rm, removes whole rows by key, but deleting a non-existent key raises an error:

Assertion failure for ["e_ghost"] of entity: key does not exists in database

`:ensure <NAME> <SPEC>`

Assert without writing: every row in the result must already exist in the stored relation — the key must exist and the stored values must match — and no other process may have written to those rows when the enclosing transaction commits. Useful for read-write consistency across a chained transaction. A value mismatch aborts with:

Assertion failure for ["m3", "m4", 0.5] of recalls: key exists in database, but value does not match

`:ensure_not <NAME> <SPEC>`

The negation: ensure that the rows (keys) do not exist in the database, and that no other process has written to them when the enclosing transaction commits.

`:returning`

Combined with :put, :rm, :insert, :update or :delete, the mutated rows are returned instead of the status row. The returned schema follows the stored relation, with a _kind field prepended:

?[id, name, kind] <- [['e_ci', 'CI pipeline', 'tool'], ['e_sam', 'Sam Alvarez', 'person']]
:returning
:put entity { id => name, kind }

_kind       id      name           kind
inserted    e_ci    CI pipeline    tool
inserted    e_sam   Sam Alvarez    person
replaced    e_sam   Sam            person

For upserting ops _kind is inserted (the new rows) or replaced; a replaced row reports the old values that were overwritten, so a put over an existing key yields both. For removals it is requested (every key asked for, non-keys filled with null) or deleted (rows that actually existed):

?[id] <- [['e_ci'], ['e_ghost']]
:returning
:rm entity { id }

_kind        id         name           kind
requested    e_ci       null           null
requested    e_ghost    null           null
deleted      e_ci       CI pipeline    tool

You can rename and remove stored relations with the system ops ::rename and ::remove, described in System ops.

mnestic

Relations with a TxTime column write differently (mnestic 0.10.0, bitemporality). The engine stamps the column at commit; supplying it yourself is an error (column tt is engine-assigned at commit and cannot be supplied). :put appends a new belief instead of overwriting, :rm appends a retraction instead of physically deleting, the existence checks of :insert, :delete, :ensure and :ensure_not evaluate against the current belief, and :replace is rejected outright (replace would silently destroy its history). A further op, :reconcile <NAME> <SPEC>, performs recompute-based belief revision on such relations. See Time travel and the fork overview for the temporal model.

Create and replace

The format of <SPEC> is identical for all ops, but the semantics differ slightly. For :create and :replace, the spec defines the schema: columns enclosed in curly braces and separated by commas, with => separating key columns (a composite key) from value columns, as in the memory example at the top of the page. If all columns are keys, => may be omitted. The order of columns matters: rows are stored sorted by key, so the declared key order determines which prefix lookups are cheap.
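As a rough illustration of what key order means in practice, take recalls {from, to => strength}, whose key is sorted by from and then by to. The two queries below are separate scripts and only a sketch; they assume no index on recalls exists.

```
# The leading key column is bound, so this can be answered as a prefix lookup.
?[to, strength] := *recalls{from: 'm5', to, strength}

# Only the second key column is bound: there is no key prefix to seek on,
# so this is expected to scan the relation.
?[from, strength] := *recalls{from, to: 'm9', strength}
```

If the second shape matters, declaring the key in the other order, or adding an index as described under Indices, would be the usual remedy.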

Each column may declare a type (see Types). On a type mismatch, the system first tries to coerce the given value, and aborts the query with an error if coercion fails. Omitting the type means Any?: all values are acceptable.

By default a spec's column takes the identically named binding from the query. You can specify the correspondence explicitly with =, and you must do so when the query head contains aggregations, since count(memory) is not a valid column name:

?[entity, count(memory)] := *mentions{memory, entity}
:create entity_stats {
    entity: String,
    =>
    mention_count = count(memory),
}

?[entity, mention_count] := *entity_stats{entity, mention_count}

entity       mention_count
e_maya       1
e_pg         3
e_rocksdb    3
e_sam        1
e_search     1

A column with both a type and a correspondence is written mention_count: Int = count(memory).
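Because entity_stats is a snapshot, it goes stale as mentions changes. One way to refresh it, sketched here on the assumption that entity_stats has no secondary indices or triggers attached, is to rerun the aggregation with :replace, which requires a query and swaps out the old contents wholesale:

```
?[entity, count(memory)] := *mentions{memory, entity}
:replace entity_stats {
    entity: String,
    =>
    mention_count: Int = count(memory),
}
```

The spec here uses the typed form with an explicit correspondence; the untyped form from the :create above should work the same way.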

Instead of a binding, a column can carry a default expression, evaluated anew for each row that omits it:

:create session { id: String default rand_ulid() => topic: String }

?[topic] <- [['compaction stalls'], ['connector timeouts']]
:put session { topic }

?[id, topic] := *session{id, topic}

id                            topic
01KX9F9M5RH80BGZKH0T0PMX1Z    connector timeouts
01KX9F9M5RK8FMN0C3C3ZS0W2R    compaction stalls

(Your ids — and, since both ULIDs share a millisecond, the row order — will differ.) Because the generator runs per row, each row gets a fresh id.

mnestic

rand_ulid() is a mnestic 0.8.0 addition: ULIDs embed a millisecond timestamp in their high-order bits, so key order is creation order — useful for "most recent first" scans with no separate timestamp column. See ULID identifiers.

Caution

A Validity default must not be built on now() directly. now(), parse_timestamp(), round(), floor() and ceil() return float seconds, while a validity stores integer microseconds. Every release before mnestic 0.12.2 coerced the float silently, so Validity default [floor(now()), true] stamped every row it wrote at 1970. mnestic 0.12.2 rejects the float on write instead (eval::float_validity). Write default [to_int(now() * 1000000), true], or default 'ASSERT'; see Time travel.
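A minimal sketch of where such a default goes. The relation reading and its columns are hypothetical and not part of the running example; only the default expression is taken from the caution above. The two stanzas are separate scripts.

```
# Hypothetical relation: the Validity column is the last key column,
# and its default converts float seconds to integer microseconds.
:create reading {
    sensor: String,
    at: Validity default [to_int(now() * 1000000), true],
    =>
    value: Float,
}

# The :put omits at, so the default generator fills it in for each row.
?[sensor, value] <- [['s1', 21.5]]
:put reading { sensor => value }
```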

Put, update, remove, ensure and ensure-not

For :put, :rm, :ensure and :ensure_not, a column may be omitted from the spec only if it has a default generator in the relation's schema (as session.id above). A merely nullable column must still be supplied; omitting it raises required column <name> not provided by input. Bind it to null explicitly if that is what you mean. Writing default clauses in the spec of these ops has no effect: defaults belong to the schema declared at :create/:replace time.
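To make the nullable case concrete, here is a small sketch. The relation contact is hypothetical and not part of the running example; email is nullable but has no default, so the :put must still bind it. The two stanzas are separate scripts.

```
# Hypothetical relation with a nullable value column and no default.
:create contact { id: String => name: String, email: String? }

# email is supplied explicitly as null; leaving it out of the spec
# would raise the "required column" error quoted above.
?[id, name, email] <- [['c1', 'Sam Alvarez', null]]
:put contact { id => name, email }
```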

For :put and :ensure, the spec must generate all keys and all values, apart from columns covered by a schema default. For :rm and :ensure_not, it only needs to generate the keys. For :update, specify all keys plus exactly the columns you want to change.
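The following sketch puts the spec shapes side by side on recalls {from, to => strength}. Each stanza is a separate script, and the edge from m4 to m2 is a made-up example that is assumed not to exist beforehand (:update is shown under its own heading above).

```
# :put needs every key and every value.
?[from, to, strength] <- [['m4', 'm2', 0.3]]
:put recalls { from, to => strength }

# :rm needs only the keys.
?[from, to] <- [['m4', 'm2']]
:rm recalls { from, to }

# :ensure_not also needs only the keys; it should pass now that the edge is gone.
?[from, to] <- [['m4', 'm2']]
:ensure_not recalls { from, to }
```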

Chaining queries

Each script sent to the engine executes in a single transaction. To make several operations atomic, put multiple queries in one script, each wrapped in curly braces {}. Each query has its own query options, execution proceeds serially, and the first error aborts the whole transaction. The returned relation is that of the last query.

This adds a fresh insight, links it into the association graph, asserts an invariant with :ensure, and reads back the new neighbourhood — all atomically:

{
    ?[id, kind, text, importance, at, v] <- [
        ['m9', 'insight', 'The 03:00 stall vanished after capping SST size',
         0.8, 1752019200.0, [0.7, 0.1, 0.6, 0.1]]
    ]
    :put memory { id => kind, text, importance, at, v }
}
{
    ?[from, to, strength] <- [['m5', 'm9', 0.8]]
    :put recalls { from, to => strength }
}
{
    ?[from, to, strength] <- [['m3', 'm4', 0.9]]
    :ensure recalls { from, to => strength }
}
{
    ?[to, strength] := *recalls{from: 'm5', to, strength}
}

to    strength
m9    0.8

If any part fails — including the :ensure finding that another process changed the m3-to-m4 edge — nothing is committed. :assert none, :assert some, :ensure and :ensure_not are the tools for expressing transaction-level constraints.
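As an illustrative sketch of :assert none, the chain below writes an edge and then checks an invariant that is an assumption of this example rather than something the schema enforces: every recalls edge must point at an existing memory. If the second query returns any row, the assertion fails and the :put is rolled back with it.

```
{
    ?[from, to, strength] <- [['m5', 'm9', 0.8]]
    :put recalls { from, to => strength }
}
{
    # Any row produced here is an edge whose target memory does not exist.
    ?[from, to] := *recalls{from, to}, not *memory{id: to}
    :assert none
}
```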

When a transaction starts, it reads from a snapshot: only already-committed data, plus writes made within the same transaction, are visible. At the end, changes commit only if there are no conflicts and no errors. If a mutation activates triggers, the triggers run inside the same transaction.

There is a mini-language hidden behind query chains. The chains above are simple query expressions; the other constructs are:

%if <cond> %then ... (%else ...) %end for conditional execution, with a negated form starting %if_not. <cond> is a query expression or an ephemeral relation; either way the condition is a relation, and a relation is falsy if it contains no rows and truthy otherwise.
%loop ... %end for looping, with %break and %continue. Prefix the loop with %mark <marker> and use %break <marker> / %continue <marker> to jump several levels.
%return <query expression, ephemeral relation, or empty> for early termination.
%debug <ephemeral relation> prints the relation to standard output.
%ignore_error <query expression> runs the expression, swallowing any error it raises.
%swap <ephemeral relation> <another ephemeral relation> swaps the contents of two ephemeral relations.
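A small sketch of conditional execution, using the relations from the running example: the edge is written only if memory m9 exists, and otherwise the script returns early. The condition is an ordinary query, truthy when it yields at least one row.

```
%if { ?[text] := *memory{id: 'm9', text} }
    %then {
        ?[from, to, strength] <- [['m5', 'm9', 0.8]]
        :put recalls { from, to => strength }
    }
    %else %return
%end
```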

An ephemeral relation is visible only inside its transaction and disappears when the transaction ends. It is created and used exactly like a stored relation, but its name starts with an underscore _. Think of ephemeral relations as the variables of the mini-language.

Caution

Because an ephemeral relation dies with its transaction, :create _draft { id } as a standalone script is a silent no-op: it succeeds, and the relation is gone before the next script runs. Querying *_draft[id] afterwards fails with Cannot find requested stored relation '_draft'. Ephemeral relations are only useful inside a chained script.
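By contrast, the same kind of relation is useful when it is created and read within one chained script. In this sketch, _hot is a hypothetical name; the first query materializes the ids of important memories and the second joins them back to memory, all in a single transaction.

```
{
    ?[id] := *memory{id, importance}, importance >= 0.8
    :create _hot { id }
}
{
    ?[id, text] := *_hot[id], *memory{id, text}
}
```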

Loop until an ephemeral relation has three rows:

{:create _seen {id}}
 
%loop
    %if { len[count(id)] := *_seen[id]; ?[z] := len[z], z >= 3 }
        %then %return _seen
    %end
    { ?[id] := id = rand_uuid_v1(); :put _seen {id} }
%end

id
8e9ce19a-7d6a-11f1-b3ef-3d65da68eb9f
8e9cf3ec-7d6a-11f1-9ea9-74893410b98f
8e9d02f6-7d6a-11f1-85dc-d1e8aa2c521d

(Your UUIDs will differ.)

Two things in this example repay attention. First, the condition produces a row only when the count passes the threshold. A condition is truthy whenever it has rows, so a condition that always yields a row is always truthy: writing it as ?[x] := len[z], x = z >= 3, which binds the comparison as a boolean column, returns immediately with an empty _seen (a single false row is still a row). Second, the fresh value must come from an inline rule (:=), not a constant rule: the body of a constant rule (<-) is evaluated to a constant once, so a constant-rule rand_uuid_v1() would put the same row forever and the loop would never terminate.

%debug shows a loop draining a relation one row per iteration:

{?[a] <- [[1], [2], [3]]; :replace _test {a}}
 
%loop
    { ?[a] := *_test[a]; :limit 1; :rm _test {a} }
    %debug _test
 
    %if_not _test
    %then %break
    %end
%end
 
%return _test

The return relation is empty; on standard output, %debug printed:

_test: NamedRows { headers: ["a"], rows: [[2], [3]], next: None }
_test: NamedRows { headers: ["a"], rows: [[3]], next: None }
_test: NamedRows { headers: ["a"], rows: [], next: None }

And %swap exchanges two ephemeral relations wholesale. Here the result is empty because _test was swapped with the empty _test2:

{?[a] <- [[1], [2], [3]]; :replace _test {a}}
{?[a] <- []; :replace _test2 {a}}
%swap _test _test2
%return _test

Any query in a script can be postfixed with as <name> (the name starting with an underscore) to store its result in an ephemeral relation, as if by :replace:

{ ?[id] := *memory{id, kind}, kind == 'decision' } as _decisions
%return _decisions

id
m2
m5

The basic query language is already Turing-complete, so the mini-language adds no expressive power — but iterative algorithms are far more direct as chained queries. PageRank as a single Datalog query is a tangle of recursive aggregations; as a chained loop it is a few lines.

Multi-statement transaction

You can also hold a transaction open across multiple round trips from the hosting environment: request a transaction, run queries and mutations against it, then commit or abort. mnestic exposes this in the Rust API (DbInstance::multi_transaction), the Python binding, and the standalone server's /transact HTTP endpoints. It is more flexible than the chaining mini-language, but the surface is specific to each host environment — see Beyond CozoScript.

mnestic

Since mnestic 0.11.0, multi_transaction runs on a dedicated thread instead of parking a rayon worker for the transaction's whole lifetime — a long-lived open transaction can no longer starve the pool and deadlock the process.

Indices

Indices on stored relations are reorderings of the original columns. Consider finding every decision the agent has recorded:

?[id] := *memory{id, kind: 'decision'}

memory is keyed on id alone, so this is a full scan. ::explain (op and ref columns shown) confirms it: load_stored reads all of :memory, and the kind constraint is applied by a materialized join after the fact:

op                 ref
unify              *5
load_stored        :memory
stored_mat_join
out

Create an index whose key starts with kind:

::index create memory:by_kind {kind, id}

You do not specify functional dependencies when creating an index (here there are none anyway). Rows already in the relation are indexed immediately; there is no separate backfill step. An index is a read-only stored relation that you can query directly:

?[id] := *memory:by_kind{kind: 'decision', id}

id
m2
m5

The original query now compiles to the same plan automatically, as a keyed prefix lookup on the index:

op                    ref
unify                 *5
load_stored           :memory:by_kind
stored_prefix_join
out

The engine is deliberately conservative about choosing indices: it has no cost-based optimizer, and an index is only chosen when a prefix of the index's key is bound, the case where it avoids a full scan. You never have to fight the planner away from a bad index choice; when in doubt, query the index explicitly.

mnestic

Index matching is prefix-only, and it composes with the mnestic 0.8.0 equality pushdown: an equality post-filter such as *memory{id, kind}, kind == 'decision' is rewritten into a bound prefix before index selection runs, so both query shapes reach the prefix join above. Upstream compiled the post-filter form to a full scan. See Equality pushdown.

You need not specify all columns when creating an index; the database completes them to form a key. Creating ::index create mentions:by_entity {entity} on mentions {memory, entity} actually builds the index {entity, memory}, as ::columns mentions:by_entity shows:

column    is_key    index    type      has_default    default_expr
entity    true      0        String    false          null
memory    true      1        String    false          null

To drop an index:

::index drop mentions:by_entity

Indices can be used as inputs to fixed rules, and are eligible in time-travel queries as long as their last key column is of type Validity.

B-tree indices are one of four index families. See Proximity searches for the other three: HNSW vector indices (::hnsw), full-text search (::fts) and MinHash-LSH (::lsh).

Triggers

Triggers attach to a stored relation with the system op ::set_triggers:

::set_triggers <REL_NAME>
 
on put { <QUERY> }
on rm { <QUERY> }
on replace { <QUERY> }
on put { <QUERY> } # as many triggers as you need

<QUERY> is any valid query, with one exception: it may not contain a :replace op. ::set_triggers itself accepts such a body; the mutation that fires the trigger is what aborts, with eval::replace_in_trigger.

on put triggers run when rows are inserted or upserted: :put, :insert and :update all activate them, as does the data written by a :replace on the relation. Inside the trigger, the implicitly defined rules _new[] and _old[] hold the added rows and the overwritten rows respectively, each with the relation's full column list as bindings. For fresh inserts _old[] is empty.
on rm triggers run when rows are deleted: :rm and :delete activate them. _new[] holds the keys requested for deletion — key columns only, and a key appears even if no such row existed — while _old[] holds the rows actually deleted, with both keys and non-keys.
on replace triggers are activated by a :replace op and run before any on put triggers.

All triggers for a relation are specified together in one ::set_triggers op; running it again replaces the whole set, and running it with an empty body removes all triggers. ::show_triggers <REL_NAME> lists what is attached.
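For a relation named recalls, as in the example that follows, inspecting and clearing the trigger set might look like this. The two stanzas are separate scripts, and the output of ::show_triggers is not reproduced here.

```
# List whatever is currently attached to recalls.
::show_triggers recalls

# An empty body replaces the whole set with nothing, removing every trigger.
::set_triggers recalls
```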

Association edges are worth querying in both directions, so keep a manually maintained reverse-edge relation in sync with recalls:

:create recalled_by { to: String, from: String }

::set_triggers recalls
 
on put {
    ?[to, from] := _new[from, to, strength]
    :put recalled_by { to, from }
}
on rm {
    ?[to, from] := _old[from, to, strength]
    :rm recalled_by { to, from }
}

Triggers only see writes made after they are attached, so backfill the existing rows once, manually:

?[to, from] := *recalls{from, to}
:put recalled_by { to, from }

From here on the pair stays in sync: a new edge into m2 appears in the reverse relation in the same transaction.

?[from, to, strength] <- [['m9', 'm2', 0.4]]
:put recalls { from, to => strength }

?[from] := *recalled_by{to: 'm2', from}

from
m1
m9

Removing the edge removes its mirror image the same way. (For a pure column reordering like this, ::index create does the same job with the backfill and both triggers for free; write the triggers yourself when the maintained data is not a reordering.)

Cascading cleanup is such a case. Deleting a memory should also delete its mentions rows, something no index can express:

::set_triggers memory
 
on rm {
    ?[memory, entity] := _new[memory], *mentions{memory, entity}
    :rm mentions { memory, entity }
}

?[id] <- [['m7']]
:rm memory { id }

Before, ?[memory] := *mentions{memory, entity: 'e_pg'} returned m6, m7, m8; after the single :rm, it returns:

memory
m6
m8

Caution

Triggers do not propagate: if a trigger's own mutation hits a relation that also has triggers, those do not run. In the cascade above, if mentions had an on rm trigger, deleting m7 from memory would not fire it. (Early CozoDB versions propagated triggers; this was changed upstream because propagation created more problems than it solved.) Chains of derived writes need to be expressed in one trigger.
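As a sketch of what that means for the running example: suppose deleting a memory should also delete its outgoing recalls edges. The on rm trigger on recalls that maintains recalled_by will not fire for removals made from inside a memory trigger, so the memory triggers have to clean up recalled_by themselves. The version below is an assumption-laden illustration: it handles outgoing edges only, and because ::set_triggers replaces the whole set, it restates the mentions cascade too.

```
::set_triggers memory

on rm {
    ?[memory, entity] := _new[memory], *mentions{memory, entity}
    :rm mentions { memory, entity }
}
on rm {
    # Mirror rows are removed here because the recalls trigger will not run.
    ?[to, from] := _new[from], *recalled_by{to, from}
    :rm recalled_by { to, from }
}
on rm {
    ?[from, to] := _new[from], *recalls{from, to}
    :rm recalls { from, to }
}
```

The recalled_by cleanup reads from recalled_by rather than from recalls, so it does not depend on the order in which the triggers run.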

Caution

:replace does not keep triggers attached. The old triggers fire one last time for the rows the :replace itself writes, but afterwards ::show_triggers on the relation shows nothing. Re-attach the triggers with ::set_triggers after a :replace.

Triggers are also explicitly not run by the bulk ingestion APIs (import_relations, import_from_backup); if imports must activate them, use parameterized queries instead.

mnestic

The bulk import paths maintain B-tree indices but not HNSW, FTS or LSH indices: bulk-imported rows stay invisible to vector, text and similarity search until those indices are rebuilt with ::reindex (mnestic 0.12.1). Both paths now warn loudly instead of corrupting silently: import_relations since mnestic 0.10.5, and import_from_backup since 0.12.1. Before 0.12.1, import_from_backup was entirely silent, so a restored backup returned nothing from hybrid retrieval with no signal anywhere. See import_relations.

Storing large values

There is a limit to how much data fits in a single value or row, and it depends on the storage engine. For the in-memory engine the limit is RAM. For the SQLite engine, the keys as a whole and the values as a whole are each stored as a single BLOB field, subject to SQLite's limits. For the RocksDB engine, the keys as a whole form a RocksDB key, which has a hard limit of 8MB and should be kept much smaller for performance. Values have no comparable hard limit. The stock build does not enable RocksDB's BlobDB mode; if you store many large values, you can supply a RocksDB options file (a file named options inside the database directory) that enables it and tunes value storage.

Performance-wise, any large value in a row touched by a query is read into memory whole. If you store large payloads, keep them in a dedicated key-value relation and the metadata in a separate one: search, filter and join on the metadata relation first, and join the large-value relation last, when the row set is already small.
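A minimal sketch of that split, with hypothetical relation and column names (doc_meta, doc_body and their columns are not part of the running example). The first script creates the two relations in one transaction; the second filters on the small metadata relation and only then joins in the large payload.

```
{ :create doc_meta { id: String => title: String, kind: String, at: Float } }
{ :create doc_body { id: String => body: String } }

# Filter on metadata first; the body is joined last, by key,
# for the few rows that survive the filter.
?[id, title, body] := *doc_meta{id, title, kind: 'report'},
                      *doc_body{id, body}
```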

Adapted from the CozoDB documentation by Ziyang Hu and the Cozo Project Authors, used under CC‑BY‑SA‑4.0. Adaptations for mnestic are released under the same license. mnestic is an independent fork and is not affiliated with or endorsed by the original authors.