Tips for writing queries
Dealing with nulls
Cozo is strict about types. A simple query such as:
?[a] := *rel[a, b], b > 0
will throw if any b is null, because comparisons can only be made between values of the same type. One solution is to treat null values as equivalent to some default value:
?[a] := *rel[a, b], (b ~ -1) > 0
Here ~ is the coalesce operator. The parentheses are not necessary, but the query reads better with them.
You can also check for null explicitly:
?[a] := *rel[a, b], if(is_null(b), false, b > 0)
cond is also helpful in this case.
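The three variants above can be imitated outside Cozo. The sketch below is plain Python, not CozoScript, so it only models the behaviour described here: Python 3 also refuses to compare None with an integer, which stands in for Cozo's type error. The rows in rel and the function names are made up for illustration.

```python
# Hypothetical stored relation: (a, b) rows where b may be null (None).
rel = [("x", 3), ("y", None), ("z", -2), ("w", 7)]


def naive(rows):
    # Models `b > 0`: raises TypeError once a None is compared with an int.
    return [a for a, b in rows if b > 0]


def with_default(rows, default=-1):
    # Models `(b ~ -1) > 0`: substitute a default for null, then compare.
    return [a for a, b in rows if (default if b is None else b) > 0]


def with_null_check(rows):
    # Models `if(is_null(b), false, b > 0)`: test for null before comparing.
    return [a for a, b in rows if (False if b is None else b > 0)]


try:
    naive(rel)
    naive_failed = False
except TypeError:
    naive_failed = True

print(naive_failed)          # True
print(with_default(rel))     # ['x', 'w']
print(with_null_check(rel))  # ['x', 'w']
```

Both safe variants drop the row with the null b. Which default to coalesce to depends on whether a null should count as passing or failing the filter.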
How to join relations
Suppose we have the following relation:
:create friends {fr, to}
Let's say we want to find Alice's friends' friends' friends' friends' friends. One way to write this is:
?[who] := *friends{fr: 'Alice', to: f1},
*friends{fr: f1, to: f2},
*friends{fr: f2, to: f3},
*friends{fr: f3, to: f4},
*friends{fr: f4, to: who}
Another way is:
f1[who] := *friends{fr: 'Alice', to: who}
f2[who] := f1[fr], *friends{fr, to: who}
f3[who] := f2[fr], *friends{fr, to: who}
f4[who] := f3[fr], *friends{fr, to: who}
?[who] := f4[fr], *friends{fr, to: who}
These two queries yield identical values. But on real networks, where loops abound, the second version executes exponentially faster than the first. Why? Because relations have set semantics, the second version deduplicates at every step, whereas the first builds up every path to the final layer of friends. In fact, even if there are no duplicates, the second version may still be faster, because in Cozo rules run in parallel whenever semantics and available resources allow.
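The gap between the two versions can be seen on a toy graph. The following is a rough Python model, not Cozo's actual execution strategy: step_paths extends every partial path the way a single five-way join would, while step_layer keeps only the set of people reached, as the chained rules do. The graph and all names are invented for the example.

```python
# Hypothetical friends relation with loops; every person has two friends.
friends = {
    ("Alice", "Bob"), ("Alice", "Carol"),
    ("Bob", "Carol"), ("Bob", "Dave"),
    ("Carol", "Bob"), ("Carol", "Dave"),
    ("Dave", "Bob"), ("Dave", "Carol"),
}


def step_paths(paths):
    # One big join: each partial path is extended separately, duplicates kept.
    return [p + (to,) for p in paths for fr, to in friends if fr == p[-1]]


def step_layer(layer):
    # One rule per hop: set semantics collapses paths that end at the same person.
    return {to for fr, to in friends if fr in layer}


paths = [("Alice",)]
layer = {"Alice"}
path_counts, layer_sizes = [], []
for _ in range(5):
    paths = step_paths(paths)
    layer = step_layer(layer)
    path_counts.append(len(paths))
    layer_sizes.append(len(layer))

print(path_counts)                       # [2, 4, 8, 16, 32]
print(layer_sizes)                       # [2, 3, 3, 3, 3]
print({p[-1] for p in paths} == layer)   # True
```

The final answers agree, but the path list doubles at every hop while the deduplicated layer never grows beyond the number of people in the graph.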
The moral of the story is, always prefer to break your query into smaller rules. It usually reads better, and unlike in some other databases, it almost always executes faster in Cozo as well. But for this particular case, in which the query is largely recursive, prefer to make it a recursive relation:
f_n[who, min(layer)] := *friends{fr: 'Alice', to: who}, layer = 1
f_n[who, min(layer)] := f_n[fr, last_layer], *friends{fr, to: who}, layer = last_layer + 1, layer <= 5
?[who] := f_n[who, 5]
The condition layer <= 5 is necessary to ensure termination.
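As a way to read the recursive rule, here is a small Python model of it. It is an illustration under assumptions, not a description of Cozo's engine: f_n keeps the smallest layer seen for each person, repeatedly applies the second rule to the entries that changed, and uses max_layer in place of the layer <= 5 guard. The graph and names are hypothetical.

```python
# Hypothetical friends relation: a chain from Alice with a loop Carol -> Bob.
friends = {
    ("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"),
    ("Dave", "Erin"), ("Erin", "Frank"), ("Frank", "Grace"),
    ("Carol", "Bob"),
}


def f_n(start, max_layer=5):
    # First rule: direct friends of `start` are at layer 1.
    frontier = {to: 1 for fr, to in friends if fr == start}
    best = dict(frontier)  # who -> min(layer) seen so far
    while frontier:
        new_frontier = {}
        for fr, last_layer in frontier.items():
            layer = last_layer + 1
            if layer > max_layer:  # models the `layer <= 5` guard
                continue
            for f, to in friends:
                # Keep a derived layer only if it improves on the stored minimum.
                if f == fr and layer < best.get(to, max_layer + 1):
                    best[to] = layer
                    new_frontier[to] = layer
        frontier = new_frontier
    return best


layers = f_n("Alice")
answer = {who for who, layer in layers.items() if layer == 5}

print(sorted(layers.items()))
# [('Bob', 1), ('Carol', 2), ('Dave', 3), ('Erin', 4), ('Frank', 5)]
print(answer)  # {'Frank'}
```

In this model each person is stored once, with the smallest layer at which they were reached, so the final lookup returns the people first reached at layer 5. Grace, six hops away, is cut off by the guard.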
Are there any situations where the first way of writing is acceptable? Yes:
?[who] := *friends{fr: 'Alice', to: f1},
*friends{fr: f1, to: f2},
*friends{fr: f2, to: f3},
*friends{fr: f3, to: f4},
*friends{fr: f4, to: who}
:limit 1
In this case, we stop at the first path, and this way of writing avoids the overhead of multiple rules and is perhaps very slightly faster.
Also, if you want to count the different paths, you must write:
?[count(who)] := *friends{fr: 'Alice', to: f1},
*friends{fr: f1, to: f2},
*friends{fr: f2, to: f3},
*friends{fr: f3, to: f4},
*friends{fr: f4, to: who}
The multiple-rules way of writing gives wrong results due to set semantics. Because of the aggregation count, this query keeps only a single path in memory at any instant, so it won't blow up your memory even on web-scale data.
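The difference between counting paths and counting through deduplicated layers can be sketched in Python. This is only a model of the idea, with an invented graph and invented function names: endpoints is a generator that walks one path at a time, so nothing but the current path (on the call stack) is held while counting, whereas layered_count deduplicates after every hop and therefore loses track of how many paths led to each person.

```python
# Hypothetical friends relation with loops; every person has two friends.
friends = {
    ("Alice", "Bob"), ("Alice", "Carol"),
    ("Bob", "Carol"), ("Bob", "Dave"),
    ("Carol", "Bob"), ("Carol", "Dave"),
    ("Dave", "Bob"), ("Dave", "Carol"),
}


def endpoints(fr, hops):
    # Yield the endpoint of every path of exactly `hops` steps, one at a time.
    if hops == 0:
        yield fr
        return
    for f, to in friends:
        if f == fr:
            yield from endpoints(to, hops - 1)


def layered_count(start, hops):
    # Models the multiple-rules version: deduplicate after each of the first hops.
    layer = {start}
    for _ in range(hops - 1):
        layer = {to for fr, to in friends if fr in layer}
    # Last hop: count the (fr, to) bindings that remain.
    return sum(1 for fr, to in friends if fr in layer)


path_count = sum(1 for _ in endpoints("Alice", 5))
print(path_count)                  # 32
print(layered_count("Alice", 5))   # 6
```

The streaming count sees all 32 paths without ever materialising them, while the layered count undercounts because earlier duplicates were already merged.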
Adapted from the CozoDB documentation by Ziyang Hu and the Cozo Project Authors, used under CC‑BY‑SA‑4.0. Adaptations for mnestic are released under the same license. mnestic is an independent fork and is not affiliated with or endorsed by the original authors.