# Query Language Reference

# Conceptual Overview

The Modality Trace Query Language is a tool for interacting with system trace data acquired with Modality. It can be used to identify portions of the trace which match a certain pattern, to compute aggregations over part of the trace, or to check that a property holds over the collected trace data.

It can also be used as a specification or executable requirement mechanism; using the modality check command described below will evaluate such specifications over actually observed system behavior. When appropriate, Modality can call out counterexamples to your specification or help you narrow down which parts of a non-passing specification are not fulfilled by the system.

# What's in a trace?

In Modality, a system trace consists of a collection of related events. These events are logged using the modality-probe library, and collected from many probes into a session. The session data is stored by Modality, which can execute trace queries over it.

Causality is central to Modality's tracing system - it's how Modality can tell if one event is before or after another. Unlike from other systems you may be used to, event ordering in Modality is not derived from a common time source. Instead, Modality uses the snapshot data passed between modality-probe instances to establish before-and-after relationships between events on different probes. This means that event ordering in Modality is partial: Given two events, one may be before or after the other, or they may have no known ordering at all (that is, they are logically concurrent).

This causal ordering is the backbone of the Modality Trace Query Language.

A diagram of Modality's scope capabailities, defining areas of interest in the complex flow of system events. Modality scope is observing the trace of all events, including several Modality mutations.

# What's a trace query?

Modality stores all the collected trace data for a session in a causal graph, where each node is an event, and each edge represents a causal (happened-before, happened-after) relationship between the events. You can inspect this graph directly using the modality log command.

Trace queries give you a way to use that graph to learn or check things about your system. You might use a trace query to:

  • Find occurrences of an error event
  • Check that all values logged at a particular point in your software are within the desired range
  • Check that all the faults of a certain type are followed by an automatic remediation.
  • Specify that some error conditions should never happen
  • A complex, distributed initialization sequence always occurs as specified.

# A basic example

Here's an example of a trace query:

MATCH radar_power_on@control AS radar_on ->
      (radar_reading@radar AND (payload < 0.2 OR payload > 0.8)) AS invalid_reading

This will look for a sequence of two events: radar_on on the control probe, and invalid_reading on the radar probe. The invalid_reading event is defined as any event named radar_reading whose payload is outside the range [0.2, 0.8]. These matches will be limited to pairs of events where invalid_reading is causally after the radar_on event.

There may be many pairs of events which satisfy the constraints set down in this query. When running the query with the modality query command, each of these pairs of events comprises a single result. modality query will show you many different pairs of events that satisfy the constraints.

# A more sophisticated example

This trace query matches a non-deterministic, distributed initialization sequence. It's written in a specification style, so it can be used to check that the specified sequence is always followed.

FOR ALL
  begin_startup@controller AS start
MATCH
  start
  -> broadcast_init@controller AS bcast WITHIN 1ms
  -> finish_startup@controller AS done WITHIN 1s
INTERSECT
  bcast
  -> poll_completion@controller REPEAT 1..1000
  !-> poll_completion@controller
  -> done
INTERSECT
  bcast
  -> recv_init@radar
  -> send_completion@radar WITHIN 500ms
  -> recv_radar_init_complete@controller
  -> done
INTERSECT bcast
  -> recv_init@imu
  -> send_completion@imu WITHIN 250ms
  -> recv_imu_init_complete@controller
  -> done

This describes four intersecting chains of events, starting with the begin_startup event at the controller probe, and interacting with the radar and imu probes. It specifies time limits using the local timekeeping available at each probe.

When used with the modality query command, this query will find all the named events for the matching initialization sequences. When used with modality check, it will verify that all matching start events are followed by the initialization sequence; if there are any which don't, it will tell you which ones.

# Matching events (MATCH clause)

# Basic structure

The MATCH clause is used to specify local properties of the events you're looking for. An event predicate is used to match against the nodes in the graph, and a label is used to refer to the matched event throughout the rest of the query.

MATCH (id = 12) AS ev1 -> (name = "MY_EVENT") AS ev2

In this example, the id = 12 event predicate is used to identify all nodes in the causal graph where an event with id 12 was logged. Similarly, the name = "MY_MATCH" predicate is used to identify nodes in the causal graph where the logged event has the specified name. These matched nodes are associated with their respective labels, ev1 and ev2, which are available in further query clauses.

Many events may be matched in the MATCH clause. If more than one is present, their relationships must be specified in the where clause.

# Shortcut syntax

# Implicit labels

If you're writing a query that just matches a single event, you don't need to label it:

MATCH name="radar_on" AND probe="ecu"

A label will be automatically generated based on the content of the event predicate.

# event@probe

It is very common to write predicates which match on event name and probe name. There is a shortcut syntax for this purpose:

MATCH some_event_name@some_probe_name

Event and probe IDs can also be used in this way, if you don't have the names available:

MATCH 12@1

This can be combined with constraints as part of a larger event predicate:

MATCH supply_voltage_measurement@ecu AND payload > 12.0

You can also use wildcards here, in either position:

MATCH *@ecu -> another_event@*

# Event Predicates

An event predicate is an expression that is evaluated against a candidate event in the causal graph to determine if it should be included as a query result. It should evaluate to a boolean value; true means the event is included, while false means it is not.

# Operators

Event predicates use typical operators:

  • Boolean operators: AND, OR, NOT
  • Comparison: =, !=, <, >, <=, >=
  • Arithmetic: +, -, *, /

The equality operator "=" is also used for glob-matching: all strings can be compared with a glob pattern that use the "*" character as a wildcard, such as "prefix*suffix".

When a set of values is available, such as the tag, scope, and scope_instance fields, equality tests for membership in that set. Sets of strings support glob matching as well.

# Event Fields

The following named fields may be used in event predicates:

  • id: The numerical event id
  • name: The event's name, as a string
  • payload: The payload provided when the event was logged.
  • probe: The probe where the event was logged. This may be compared with either the numeric probe id or the probe's name.
  • coordinates: The event's coordinates, which identify it by position in the causal graph. This value is found in query results, and in the output of modality log.
  • component: The name of the component in which the event was defined.
  • component_id: The UUID of the component in which the event was defined, formatted as a string.
  • tag: The events tags. To match a tag, use string equality: tag = "debug" will match any event with the debug tag. Glob wildcards may also be used here.
  • scope: The names of all scopes active at the time the event was logged.
  • scope_instance: The numerical ids of all scope instances active at the time the event was logged.
  • file: The path to the file in which the event was defined.
  • mutation_epoch: The numerical id assigned by modality to a section of causal history, used to track the effects of injected mutations.
  • severity: The numerical severity of an event, valued 1-10, derived from the presence of SEVERITY_N tags in the event definition.
  • expectation: The name given to an expectation event. For expectations logged from a probe, this is the same as the event name. For expectations logged with the modality expect command, this is the name given at the command line.
  • is_expectation: A boolean indicating whether an event is an expectation event.
  • outcome: If the event is an expectation, this gives its outcome. This evaluates to the special values PASS or FAIL.

Some of these fields are only available on some events. If an event predicate that uses an unavailable field is evaluated against an event that does not have it, the event is simply skipped.

# ANY

If you need a predicate that can match any event, use the special ANY keyword.

MATCH first@p -> ANY -> second@p

# Grouping

A long or complex path may benefit from some structure. You can define groups in a MATCH clause by parentheses and apply a name:

MATCH (power_on@p AS a -> init@p AS b -> finished_start@p AS c) AS startup -> interesting_error@p

When using grouping, the results are accessed using group.item. For example, the payload of a above could be accessed in a subsequent FILTER clause with startup.a.payload.

# Repetition

The repetition of a single event or a pattern of events can be described using REPEAT ... This works a lot like repetition does in a regular expression:

MATCH (loop_start@p AS start -> loop_end AS end) AS loop REPEAT 2..3

Each repetition is treated like a group for the purpose of results labeling, except the repetition number is addressed as an index. For example, results in the second match of the above repetition named loop can be addressed as loop[1].start or loop[1].end.

# Event relationships (Paths in MATCH)

The MATCH clause can specify multiple events, constraining the matches using structural constraints betwen them. All relationship constraints are based on these basic causal relationships:

  • FOLLOWED BY (or ->)
  • PRECEDED BY (or <-)

These operate on labeled events to constrain the query results to those where the events occur in the given causal order. For example:

MATCH (ANY AS a) FOLLOWED BY (ANY as b)

The parentheses are optional; we use them here for clarity. You could equivalently write:

MATCH ANY AS a -> ANY as b

# Time-limited relationships

If you have added wall-clock time information to your events, you can use that in a relationship constraint:

MATCH ANY AS a -> ANY AS b WITHIN 1 ms

The supported time units are MINUTES, SECONDS, MILLISECONDS, MICROSECONDS, and NANOSECONDS, or their abbreviations m, s, ms, us, and ns.

Note that this will only work for events which share a common time domain, which is specified when you initialize the probe. This should always be the case for events on the same probe. It can work across probes if you have made special accomodations for time synchronization between them (in a way that meets your system requirements) and have appropriately initialized the relevant probes with the same time domain id.

# Step Limits

You can limit the distance between matches in the causal graph using WITHIN <n> STEPS and AFTER <n> STEPS. For example, this will match any two causally connected events within three steps of each other:

MATCH ANY AS a -> ANY AS b WITHIN 3 STEPS

Note While step limits are useful for narrowing down a query during debugging, they are inherently somewhat fragile. When you add additional instrumentation to a SUT, you could easily invalidate the structural assumptions made by some query using step limits. As such, you should avoid using step limits when writing specifications or test cases.

# Negative lookahead

A negative relationship with nothing else after it behaves like a lookahead operator. For example, suppose you want to identify errors which are not followed by a remediation event inside a suitable time window:

MATCH tag='error' NOT FOLLOWED BY tag='remediation' WITHIN 5 ms

The negative relationship operators are:

  • NOT FOLLOWED BY (or !->)
  • NOT PRECEDED BY (or <-!)

Note that the event matched by the right-hand side of the relationship may not be labeled; there's actually nothing there to find, since this is specifying what should /not/ be present.

# Negative path constraints

You can similarly specify a negative constraint to be applied on the path between two nodes. For example, you might want to look for pairs of startup and shutdown events that don't have an error signaled on the path between them:

MATCH startup@p AS a !-> tag='error' -> shutdown@p AS b

The syntax is similar to negative lookahead, but the subsequent -> means that it will apply to any nodes found between the events a and b. There don't have to be any events there: this query will match even if a is directly followed by b, with nothing in between.

# CUT

By default, each relationship matches only the first (nearest) event in the specified direction. There may be times when you want more matches, or even to find out all the matches. The trace query language uses the CUT relationship operator to describe this; this is meant to be reminiscent of 'cut' in Prolog, as it stops exploring a particular branch of potential results once some have been found.

The default behavior is to limit to just one result per relationship. So, these two queries are equivalent.

MATCH ANY AS a -> ANY AS b
MATCH ANY AS a -> ANY AS b CUT 1

If you want to see up to 3 bs for every a, you can say:

MATCH ANY AS a -> ANY AS b CUT 3

If you want to see ALL the possible bs for every a, use NO CUT:

MATCH ANY AS a -> ANY AS b NO CUT

# Multiple paths (INTERSECT)

A plain MATCH clause is sufficent to match trace-like patterns, which consist of a single path through the causal graph. If you want to match more complicated patterns, you can do so by describing them as multiple paths which INTERSECT with each other. For example:

MATCH source_1@p1 -> meet@center AS meet
INTERSECT source_2@p2 -> meet

Note that the meet label is used in both paths: this means that the two nodes are unified. meet has to be the same node in both paths; graphically it's a "V" pattern. Modality will search for ways to choose three nodes such that ALL the given relationships are satisfied.

One common misinterpretation of this is to view them as 'shortcuts', to match multiple events using the same condition. This is not correct. Each time you reuse an event label in an INTERSECT clause, you're applying a new constraint to the same event. So each time you talk about event a, it's always the same a within a single query result.

# Filtering over multiple events (FILTER clause)

The filter clause consists of a single boolean expression that is nearly identical to the expressions used for event predicates. The only difference is that all event fields must be label-qualified, using label.field syntax.

Thus gives the filter expression access to multiple events at the same time. This is the primary use case for FILTER. For example, you might want to find events with the same payload as each other:

MATCH ANY AS a -> ANY AS b
FILTER a.payload = b.payload

# FILTER with CUT relationships

When using FILTER, you should be aware of the fact that relationships only give you the first answer for any relationship by default. You can use this fact to express some unique queries. This is best explained by example. Suppose you wanted to find ground_contact events where the most recent accelerometer reading is above a certain threshold:

MATCH ground_contact@control AS a <- acceleration@control AS b
FILTER b.payload > 12.0

Note that putting the payload constraint in the event predicate for b would have an entirely different meaning; it would find the nearest event to a whose payload is greater than 12.

# Aggregation

Aggregation in the Modality Trace Query Language works very similarly to aggregation in other query languages. You can specify aggregations over previous query results, do some computations with them, and give them names.

MATCH ANY AS a
AGGREGATE stddev(a.payload) AS payload_stddev

Will compute the standard deviation of the payload value for all events, naming it payload_stddev in the aggregation result. Any events that do not have a payload will be skipped, when computing the aggregation.

The parameter to each aggregate function is an expression with access to all matched events, just like the ones used in FILTER.

MATCH ANY AS a, ANY AS b
AGGREGATE stddev(a.payload - b.payload) AS payload_diff_stddev

You can also do computations with the results of multiple aggregations:

MATCH ANY AS a, ANY AS b
AGGREGATE (min(a.payload) - min(b.payload)) / 2 AS min_payload_avg

# Aggregate functions

  • count(): Count the number of results. Unlike the other aggregates, this one doesn't have an expression argument.
  • min(...), max(...): Compute the minimum or maximum value of the expression.
  • sum(...), mean(...): Compute the sum or mean of the expression.
  • distinct(...): Return the set of distinct values taken on by the expression.
  • count_distinct(...): Return the number of distinct values taken on by the expression.
  • variance(...), stddev(...): Calculate the variance or standard deviation of the expression.

# Extended query structures

The basic MATCH/FILTER/AGGREGATE query form is the core of all trace queries. However, there are some extensions to this syntax available.

# NEVER MATCH queries

Regular modality queries are great for specifying what should happen. But what if you want to specify what shouldn't happen? NEVER MATCH queries do the job.

NEVER MATCH supply_sample_voltage@expensive_sensor AND payload > 1.25

You can prepend NEVER to any query that doesn't use aggregation to invert its meaning.

  • When you execute a NEVER MATCH query with the modality check command, it does what you'd expect: if modality can't find the thing that shouldn't exist, the check succeeds. Otherwise, it fails.

  • When you execute a NEVER MATCH query with the modality query, command, a single synthetic query result is returned, where the success label is bound to the value true or false.

# FOR ALL queries

It is common to think of system specification in terms of conditionals: "if this thing occurs, then that other thing should respond in this way." You can verify these behaviors with Modality by using the FOR ALL query structure.

FOR ALL this_thing@over_here AS antecedent
MATCH antecedent -> that_thing@over_there AS consequent

This will start by looking for all the different antecedent events. For each of those, it will check that there's a consequent event which follows it. If there isn't, the query failures.

There are two logical sections to these queries, the FOR ALL part, and the MATCH part. You can put a whole query in each of them (except for AGGREGATE), allowing for very sophisticated antecedents and consequents. Any event labels declared in the MATCH section of the antecedent may be used in the consequent.

In addition to FOR ALL / MATCH, you can also do FOR ALL / NEVER MATCH.

Execution of FOR ALL queries works the same as NEVER queries, described above.

# Mutation preconditions

When injecting mutations into a SUT using the modality mutate command, or in an automated fashion using an objective you can specify when then mutation should be injected as a 'mutation preconditon' query. This is a special form of a trace query that has limited features, as it is executed dynamically by the Modality Probe as the events are logged.

  • Each event predicate must specify both an event and a probe, either by id or by name.

  • Each event predicate may specify up to two constraints on the event's payload, as a comparision with a constant value. These must be ANDed together.

  • No other event predicate structures are allowed.

  • Only FOLLOWED BY relationships are supported, without any extra constraints.

  • Relationships which cross from one probe to another are allowed, but they will only work if you have fully integrated modality-probe into the SUT and are sending snapshots across the probe boundary. The connection can be indirect, via an intermdiate probe.

  • FILTER and AGGREGATE are not allowed, nor are the FOR ALL or NEVER forms.

  • The entire SUT has a limit on the number of event predicates which can be under consideration at once. This applies across the preconditions for all active mutations.

# Trace query structure

# Query clauses

MATCH is present in all trace queries, but there are some additional clauses available as well.

MATCH <event path>
[INTERSECT <event path>]*
[FILTER <filter predicate>]
[AGGREGATE <aggregate definitions>]

The MATCH clause is used to match events in the trace based on event-local predicates. They can use properties such as the event name or id, the probe where the event was logged, the event's payload, and so on.

A 'path' of related events may be specified in the MATCH clause, using the event relationship syntax. For example, you can limit matches to only cases where one event is causally followed by another. The match may be additionally constrained by time elapsed, and by various constraints on the path between the events.

The optional INTERSECT clause may be added to match patterns of events that do not lie along a single causal path. You can add as many of these as you need.

The optional FILTER clause is used to provide an additional level of filtering that applies after everything in the MATCH and INTERSECT clauses is taken into account. It may apply conditions across multiple matched events: for example, you can use a FILTER to look for two events that have the same payload.

The optional AGGREGATE clause is used to compute aggregations over the results matched by the trace query preceding it. You might use this to compute the standard deviation of a certain event payload, or to check that an important startup event happens exactly once.

# Evaluation model

A clear understanding of how each query clause is evaluated is very helpful for writing good trace queries. It is helpful to break it up into three stages.

# Evaluating Events and Relationships (MATCH/INTERSECT)

Modality starts by identifying all the ways that events can be assigned to labels that are consistent with the predicates and relationships in the MATCH and INTERSECT clauses. Each way this assignment can happen is a distinct query result.

For example, consider this query:

MATCH ANY AS a -> ANY AS b

Each result from this query is a map which binds the labels in the MATCH clause: a = some trace event, b = some trace event. The events will be chosen so that b is always causally following a.

There are of course many different ways to choose a and b to make this work. By Modality will try show you the single nearest b for every a. (see CUT for ways to control this). The result of query evaluation is a list of ways to bind its events that match the event predicates and relationship constraints.

Using INTERSECT doesn't change this model; it just adds additional constraints to the matches.

# Filtering Results (FILTER)

From an evaluation perspective, the FILTER clause is applied to each query result after it comes out of the MATCH/INTERSECT section. Its result is used to either keep each result, or throw it out.

The FILTER clause can be used in a way that could be equivalently represented in a MATCH predicate. But much of its unique power lies in the fact that it is evaluated after event relationships are taken into account.

# Aggregation (AGGREGATE)

Aggregation accumulates data from the output of the filtering stage.

# Special syntax rules

  • Keywords are not case sensitive. For example, MATCH and match are treated identically. We use the upper-case form in the documentation for clarity.

  • Spaces, tabs, and newlines may be used freely; all whitespace is equivalent, and indentation is not syntactically significant.

# Query evaluation context

There are three main ways to use trace queries with modality, the query command, the check command, and as a mutation precondition.

# query command

The query command is used when you are interested in the events matched by the query, or the values computed by an aggregation.

# check command

The check command evaluates the query in a pass/fail fashion. It behaves slightly differently depending on the content of the query:

  • For a regular MATCH/INTERSECT/FILTER query, check will succeed if the query has any results.
  • If the query has an AGGREGATE clause, check will succeed if all the boolean-valued aggregates equal true.
  • Some of the extended query behaviors behave differently in a check context.

# Mutation preconditions

A subset of the Modality Trace Query Language may be used to specify preconditions to mutation injection; it is described in detail in the mutation preconditions section.