mnestic

Queries

CozoScript, a Datalog dialect, is the query language of the engine.

A CozoScript query consists of one or many named rules. Each named rule represents a relation, i.e. a collection of data divided into rows and columns. The rule named ? is the entry to the query, and the relation it represents is the result of the query. Each named rule has a rule head, which corresponds to the columns of the relation, and a rule body, which specifies the content of the relation, or how the content should be computed.

Relations (stored or otherwise) abide by set semantics. Thus even if a rule computes a row multiple times, the resulting relation only contains a single copy.

There are two types of named rules in CozoScript:

  • Inline rules, distinguished by using := to connect the head and the body. The logic used to compute the resulting relation is defined inline.
  • Fixed rules, distinguished by using <~ to connect the head and the body. The logic is fixed according to which algorithm or utility is requested.

The constant rules which use <- to connect the head and the body are syntax sugar. For example:

const_rule[a, b, c] <- [[1, 2, 3], [4, 5, 6]]

is identical to:

const_rule[a, b, c] <~ Constant(data: [[1, 2, 3], [4, 5, 6]])

Inline rules

An example of an inline rule is:

hc_rule[a, e] := rule_a['constant_string', b], rule_b[b, d, a, e]

The rule body of an inline rule consists of multiple atoms joined by commas, and is interpreted as representing the conjunction of these atoms.

Atoms

Atoms come in various flavours. In the example above, rule_a['constant_string', b] is an atom representing a rule application: a rule named rule_a must exist in the same query and have the correct arity (2 here). Each row in the named rule is then unified with the bindings given as parameters in the square bracket: here the first column is unified with a constant string, and unification succeeds only when the string completely matches what is given; the second column is unified with the variable b, and as the variable is fresh at this point (because this is its first appearance), the unification will always succeed. For subsequent atoms, the variable becomes bound: it takes on the value of whatever it was unified with in the named relation. When a bound variable is unified again, for example b in rule_b[b, d, a, e], this unification will only succeed when the unified value is the same as the current value. Thus, repeated use of the same variable in named rules corresponds to inner joins in relational algebra.

Atoms representing applications of stored relations are written with an asterisk before the name:

*stored_relation[bind1, bind2]

Written this way using square brackets, as many bindings as the arity of the stored relation must be given. If some columns do not need to be bound, use the special underscore variable _: it does not take part in any unifications.

You can also bind columns by name:

*stored_relation{col1: bind1, col2: bind2}

In this form, any number of columns may be omitted, and columns may come in any order. If the name you want to give the binding is the same as the name of the column, you can write *stored_relation{col1}, which is the same as *stored_relation{col1: col1}.

Expressions are also atoms, such as a > b + 1. Here a and b must be bound somewhere else in the rule. Expression atoms must evaluate to booleans, and act as filters: only rows where the expression evaluates to true are kept.

Unification atoms unify explicitly:

a = b + c + d

Whatever appears on the left-hand side must be a single variable and is unified with the result of the right-hand side.

Note

This is different from the equality operator ==, where the left-hand side is a completely bound expression. When the left-hand side is a single bound variable, the equality and the unification operators are equivalent.

Unification atoms can also unify with multiple values in a list:

a in [x, y, z]

If the right-hand side does not evaluate to a list, an error is raised.

As explained above, atoms correspond to either relations, projections or filters in relational algebra. Linked by commas, they therefore represent a joined relation, with columns either constants or variables. The head of the rule, which in the simplest case is just a list of variables, then defines the columns to keep in the output relation and their order.

Each variable in the head must be bound in the body (the safety rule). Not all variables appearing in the body need to appear in the head.

Multiple definitions and disjunction

For inline rules only, multiple rule definitions may share the same name, with the requirement that the arity of the head in each definition must match. The returned relation is then formed by the disjunction of the multiple definitions (a union of rows).

You may also use the explicit disjunction operator or in a single rule definition:

rule1[a, b] := rule2[a] or rule3[a], rule4[a, b]

There is also an and operator, semantically identical to the comma , but with higher operator precedence than or (the comma has the lowest precedence).

Negation

Atoms in inline rules may be negated by putting not in front of them, e.g. not rule1[a, b]. When negating rule applications and stored relations, at least one binding must be bound somewhere else in the rule in a non-negated context (another safety rule). The unbound bindings in negated rules remain unbound: negation cannot introduce new bindings to be used in the head.

Negated expressions act as negative filters, which is semantically equivalent to putting ! in front of the expression. Explicit unification cannot be negated unless the left-hand side is bound, in which case it is treated as an expression atom and then negated.

Recursion

The body of an inline rule may contain rule applications of itself, and multiple inline rules may apply each other recursively. The only exception is the entry rule ?, which cannot be referred to by other rules including itself.

Recursion cannot occur in negated positions (safety rule): r[a] := not r[a] is not allowed.

Caution

As CozoScript allows explicit unification, queries that produce infinite relations may be accepted by the compiler. One of the simplest examples is:

r[a] := a = 0
r[a] := r[b], a = b + 1
?[a] := r[a]

It is not even in principle possible to rule out all infinite queries without wrongly rejecting valid ones. If you accidentally submit one, refer to System ops for how to terminate queries. Alternatively, give a timeout for the query when you submit.

Aggregation

In CozoScript, aggregations are specified for inline rules by applying aggregation operators to variables in the rule head:

?[department, count(employee)] := *personnel{department, employee}

Here we use the familiar count operator. Any variables in the head without aggregation operators are treated as grouping variables, and aggregation is applied using them as keys. If you do not specify any grouping variables, the resulting relation contains exactly one row.

Aggregation operators are applied to the rows computed by the body of the rule using bag semantics. The reason for this complication is that if aggregations are applied with set semantics, then ?[count(employee)] := *personnel{employee} does not do what you expect: it either returns a single value 1 if there are any matching rows, or 0 if the stored relation is empty.

If a rule has several definitions, they must have identical aggregations applied in the same positions.

Aggregations are allowed for self-recursion for a limited subset of operators, the so-called semi-lattice aggregations (see Aggregations):

shortest_distance[destination, min(distance)] :=
    route{source: 'A', destination, distance}
 
shortest_distance[destination, min(distance)] :=
    shortest_distance[existing_node, prev_distance], # recursion
    route{source: existing_node, distance: route_distance},
    distance = prev_distance + route_distance
 
?[destination, min_distance] :=
    shortest_distance[destination, min_distance]

Here self-recursion of shortest_distance contains the min aggregation.

For a rule head to be considered semi-lattice-aggregate, the aggregations must come at the end of the rule head. If you write the head as shortest_distance[min(distance), destination], the engine will complain about unsafe recursion through aggregation, since written this way min is considered an ordinary aggregation.

Fixed rules

The body of a fixed rule starts with the name of the utility or algorithm being applied, then takes a specified number of named or stored relations as its input relations, followed by options that you provide. For example:

?[] <~ PageRank(*route[], theta: 0.5)

Here the relation *route is the single input relation expected. Input relations may be stored relations or relations resulting from rules. Each utility/algorithm expects specific shapes for their input relations; consult the Utilities & algorithms reference for each one's API.

In fixed rules, bindings for input relations are usually omitted, but sometimes if they are provided they are interpreted and used in algorithm-specific ways (for example in the DFS algorithm).

In the example above, theta is an option of the algorithm, which is required to be an expression evaluating to a constant. Each utility/algorithm expects specific types for the options; some options have default values and may be omitted.

Each fixed rule has a determinate output arity. The bindings in the rule head can be omitted, but if provided, you must abide by the arity.

Query options

Each query can have options associated with it:

?[name] := *personnel{name}
 
:limit 10
:offset 20

All query options start with a single colon :. Query options can appear before or after rules, or even sandwiched between rules. Several query options deal with transactions; those are discussed in Stored relations & transactions. The rest are explained below.

:limit <N>

Limit output relation to at most <N> rows. If possible, execution will stop as soon as this number of output rows is collected (early stopping).

:offset <N>

Skip the first <N> rows of the returned relation.

:timeout <N>

Abort if the query does not complete within <N> seconds. Seconds may be specified as an expression so that random timeouts are possible. Defaults to 300 seconds. To disable the timeout, set it to 0.

:sleep <N>

If specified, the query will wait for <N> seconds after completion, before committing or proceeding to the next query. Useful for deliberately interleaving concurrent queries to test complex logic.

:sort <SORT_ARG> (, <SORT_ARG>)*

Sort the output relation. If :limit or :offset are specified, they are applied after :sort. Specify each <SORT_ARG> as it appears in the rule head of the entry, separated by commas. You can optionally specify the sort direction by prefixing with + or - (minus denotes descending order), e.g. :sort -count(employee), dept_name sorts by employee count in reverse order first, then breaks ties with department name in ascending alphabetical order.

Caution

Aggregations must be done in inline rules, not in output sorting. In the example above, the entry rule head must contain count(employee); employee alone is not acceptable.

:order <SORT_ARG> (, <SORT_ARG>)*

Alias for :sort.

:assert none

The query returns nothing if the output relation is empty, otherwise execution aborts with an error. Useful for transactions and triggers.

:assert some

The query returns nothing if the output relation contains at least one row, otherwise execution aborts with an error. Useful for transactions and triggers. Consider adding :limit 1 to ensure early termination if you do not need to check all return tuples.

Adapted from the CozoDB documentation by Ziyang Hu and the Cozo Project Authors, used under CC‑BY‑SA‑4.0. Adaptations for mnestic are released under the same license. mnestic is an independent fork and is not affiliated with or endorsed by the original authors.