Queries
CozoScript, a Datalog dialect, is the query language of the engine.
A CozoScript query consists of one or many named rules. Each named rule
represents a relation, i.e. a collection of data divided into rows and
columns. The rule named ? is the entry to the query, and the relation it
represents is the result of the query. Each named rule has a rule head, which
corresponds to the columns of the relation, and a rule body, which specifies the
content of the relation, or how the content should be computed.
Relations (stored or otherwise) abide by set semantics. Thus even if a rule computes a row multiple times, the resulting relation only contains a single copy.
There are two types of named rules in CozoScript:
- Inline rules, distinguished by using
:=to connect the head and the body. The logic used to compute the resulting relation is defined inline. - Fixed rules, distinguished by using
<~to connect the head and the body. The logic is fixed according to which algorithm or utility is requested.
The constant rules which use <- to connect the head and the body are syntax
sugar. For example:
const_rule[a, b, c] <- [[1, 2, 3], [4, 5, 6]]is identical to:
const_rule[a, b, c] <~ Constant(data: [[1, 2, 3], [4, 5, 6]])Inline rules
An example of an inline rule is:
hc_rule[a, e] := rule_a['constant_string', b], rule_b[b, d, a, e]The rule body of an inline rule consists of multiple atoms joined by commas, and is interpreted as representing the conjunction of these atoms.
Atoms
Atoms come in various flavours. In the example above, rule_a['constant_string', b]
is an atom representing a rule application: a rule named rule_a must exist in
the same query and have the correct arity (2 here). Each row in the named rule is
then unified with the bindings given as parameters in the square bracket: here
the first column is unified with a constant string, and unification succeeds only
when the string completely matches what is given; the second column is unified
with the variable b, and as the variable is fresh at this point (because this
is its first appearance), the unification will always succeed. For subsequent
atoms, the variable becomes bound: it takes on the value of whatever it was
unified with in the named relation. When a bound variable is unified again, for
example b in rule_b[b, d, a, e], this unification will only succeed when the
unified value is the same as the current value. Thus, repeated use of the same
variable in named rules corresponds to inner joins in relational algebra.
Atoms representing applications of stored relations are written with an asterisk before the name:
*stored_relation[bind1, bind2]Written this way using square brackets, as many bindings as the arity of the
stored relation must be given. If some columns do not need to be bound, use the
special underscore variable _: it does not take part in any unifications.
You can also bind columns by name:
*stored_relation{col1: bind1, col2: bind2}In this form, any number of columns may be omitted, and columns may come in any
order. If the name you want to give the binding is the same as the name of the
column, you can write *stored_relation{col1}, which is the same as
*stored_relation{col1: col1}.
Expressions are also atoms, such as a > b + 1. Here a and b must be bound
somewhere else in the rule. Expression atoms must evaluate to booleans, and act
as filters: only rows where the expression evaluates to true are kept.
Unification atoms unify explicitly:
a = b + c + dWhatever appears on the left-hand side must be a single variable and is unified with the result of the right-hand side.
Note
This is different from the equality operator ==, where the left-hand side is
a completely bound expression. When the left-hand side is a single bound
variable, the equality and the unification operators are equivalent.
Unification atoms can also unify with multiple values in a list:
a in [x, y, z]If the right-hand side does not evaluate to a list, an error is raised.
Head
As explained above, atoms correspond to either relations, projections or filters in relational algebra. Linked by commas, they therefore represent a joined relation, with columns either constants or variables. The head of the rule, which in the simplest case is just a list of variables, then defines the columns to keep in the output relation and their order.
Each variable in the head must be bound in the body (the safety rule). Not all variables appearing in the body need to appear in the head.
Multiple definitions and disjunction
For inline rules only, multiple rule definitions may share the same name, with the requirement that the arity of the head in each definition must match. The returned relation is then formed by the disjunction of the multiple definitions (a union of rows).
You may also use the explicit disjunction operator or in a single rule
definition:
rule1[a, b] := rule2[a] or rule3[a], rule4[a, b]There is also an and operator, semantically identical to the comma , but
with higher operator precedence than or (the comma has the lowest precedence).
Negation
Atoms in inline rules may be negated by putting not in front of them, e.g.
not rule1[a, b]. When negating rule applications and stored relations, at least
one binding must be bound somewhere else in the rule in a non-negated context
(another safety rule). The unbound bindings in negated rules remain unbound:
negation cannot introduce new bindings to be used in the head.
Negated expressions act as negative filters, which is semantically equivalent to
putting ! in front of the expression. Explicit unification cannot be negated
unless the left-hand side is bound, in which case it is treated as an expression
atom and then negated.
Recursion
The body of an inline rule may contain rule applications of itself, and multiple
inline rules may apply each other recursively. The only exception is the entry
rule ?, which cannot be referred to by other rules including itself.
Recursion cannot occur in negated positions (safety rule): r[a] := not r[a]
is not allowed.
Caution
As CozoScript allows explicit unification, queries that produce infinite relations may be accepted by the compiler. One of the simplest examples is:
r[a] := a = 0
r[a] := r[b], a = b + 1
?[a] := r[a]It is not even in principle possible to rule out all infinite queries without wrongly rejecting valid ones. If you accidentally submit one, refer to System ops for how to terminate queries. Alternatively, give a timeout for the query when you submit.
Aggregation
In CozoScript, aggregations are specified for inline rules by applying aggregation operators to variables in the rule head:
?[department, count(employee)] := *personnel{department, employee}Here we use the familiar count operator. Any variables in the head without
aggregation operators are treated as grouping variables, and aggregation is
applied using them as keys. If you do not specify any grouping variables, the
resulting relation contains exactly one row.
Aggregation operators are applied to the rows computed by the body of the rule
using bag semantics. The reason for this complication is that if aggregations are
applied with set semantics, then ?[count(employee)] := *personnel{employee}
does not do what you expect: it either returns a single value 1 if there are
any matching rows, or 0 if the stored relation is empty.
If a rule has several definitions, they must have identical aggregations applied in the same positions.
Aggregations are allowed for self-recursion for a limited subset of operators, the so-called semi-lattice aggregations (see Aggregations):
shortest_distance[destination, min(distance)] :=
route{source: 'A', destination, distance}
shortest_distance[destination, min(distance)] :=
shortest_distance[existing_node, prev_distance], # recursion
route{source: existing_node, distance: route_distance},
distance = prev_distance + route_distance
?[destination, min_distance] :=
shortest_distance[destination, min_distance]Here self-recursion of shortest_distance contains the min aggregation.
For a rule head to be considered semi-lattice-aggregate, the aggregations must
come at the end of the rule head. If you write the head as
shortest_distance[min(distance), destination], the engine will complain about
unsafe recursion through aggregation, since written this way min is considered
an ordinary aggregation.
Fixed rules
The body of a fixed rule starts with the name of the utility or algorithm being applied, then takes a specified number of named or stored relations as its input relations, followed by options that you provide. For example:
?[] <~ PageRank(*route[], theta: 0.5)Here the relation *route is the single input relation expected. Input relations
may be stored relations or relations resulting from rules. Each utility/algorithm
expects specific shapes for their input relations; consult the
Utilities & algorithms reference for each one's API.
In fixed rules, bindings for input relations are usually omitted, but sometimes if they are provided they are interpreted and used in algorithm-specific ways (for example in the DFS algorithm).
In the example above, theta is an option of the algorithm, which is required to
be an expression evaluating to a constant. Each utility/algorithm expects specific
types for the options; some options have default values and may be omitted.
Each fixed rule has a determinate output arity. The bindings in the rule head can be omitted, but if provided, you must abide by the arity.
Query options
Each query can have options associated with it:
?[name] := *personnel{name}
:limit 10
:offset 20All query options start with a single colon :. Query options can appear before
or after rules, or even sandwiched between rules. Several query options deal with
transactions; those are discussed in
Stored relations & transactions. The rest are explained
below.
:limit <N>
Limit output relation to at most <N> rows. If possible, execution will stop as
soon as this number of output rows is collected (early stopping).
:offset <N>
Skip the first <N> rows of the returned relation.
:timeout <N>
Abort if the query does not complete within <N> seconds. Seconds may be
specified as an expression so that random timeouts are possible. Defaults to 300
seconds. To disable the timeout, set it to 0.
:sleep <N>
If specified, the query will wait for <N> seconds after completion, before
committing or proceeding to the next query. Useful for deliberately interleaving
concurrent queries to test complex logic.
:sort <SORT_ARG> (, <SORT_ARG>)*
Sort the output relation. If :limit or :offset are specified, they are applied
after :sort. Specify each <SORT_ARG> as it appears in the rule head of the
entry, separated by commas. You can optionally specify the sort direction by
prefixing with + or - (minus denotes descending order), e.g.
:sort -count(employee), dept_name sorts by employee count in reverse order
first, then breaks ties with department name in ascending alphabetical order.
Caution
Aggregations must be done in inline rules, not in output sorting. In the
example above, the entry rule head must contain count(employee); employee
alone is not acceptable.
:order <SORT_ARG> (, <SORT_ARG>)*
Alias for :sort.
:assert none
The query returns nothing if the output relation is empty, otherwise execution aborts with an error. Useful for transactions and triggers.
:assert some
The query returns nothing if the output relation contains at least one row,
otherwise execution aborts with an error. Useful for transactions and triggers.
Consider adding :limit 1 to ensure early termination if you do not need to
check all return tuples.
Adapted from the CozoDB documentation by Ziyang Hu and the Cozo Project Authors, used under CC‑BY‑SA‑4.0. Adaptations for mnestic are released under the same license. mnestic is an independent fork and is not affiliated with or endorsed by the original authors.