Data Collection with Modality
One of Modality's core features is its ability to collect data in whatever form your system can produce it. To that end, it provides a flexible set of tools for building a data pipeline that fits the constraints your system or infrastructure operates under.
# Data Translation
Data translation is the process of transforming the trace data emitted by your system into the Modality data model. The data model is simple and generic: it consists of timelines, events, and attributes. By mapping all trace data from your system into a common representation, regardless of the source, Modality makes it much easier to reason about your system both locally and as a whole.
The actual translation process is performed by modality-reflector. There is a different plugin for each source platform or protocol; for example, there is an LTTng plugin for collecting traces from embedded Linux systems, and a TraceRecorder plugin for collecting traces from MCUs running an RTOS. Each plugin emits data that has been translated to Modality's data model.
Plugins are configured and managed by the reflector. Each plugin exposes its own options for customizing the data translation step via the reflector configuration file. For specific configuration details, see the plugin reference documentation and the modality-reflector configuration file documentation.
Modality provides several platform integrations that automatically handle data translation for some common platforms. We also provide an SDK for writing custom collection and import plugins, or to use the Modality ingest protocol directly.
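As a rough illustration, enabling and configuring a collector plugin happens in the reflector's TOML configuration file. The section and key names below are a hypothetical sketch, not exact syntax; consult the modality-reflector configuration reference for the real schema of the plugin you're using:

```toml
# Hypothetical sketch of a reflector config enabling one collector plugin.
# Section layout and option names vary by plugin; see the
# modality-reflector configuration file documentation for specifics.

[plugins.ingest.collectors.lttng]
# Plugin-specific options customize the data translation step,
# e.g. where the plugin should find its trace source.
url = "net://localhost"
```

Each plugin's options live in its own section, so a single reflector can run several collectors side by side, each translating a different source into the common data model.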
# Data Enrichment
In the Modality data model, both events and timelines may be labeled with attributes: arbitrary key-value pairs of metadata.
When using a reflector, you can add new attributes, or override existing ones, on the timelines that pass through it. This is a powerful capability that can be used, for example, to segregate data by locality.
For example, you might have one fleet of robots in garage A and another in garage B, all belonging to site 12. By running a reflector in each garage, you can attach the appropriate garage and site metadata to every timeline flowing through it.
This makes for a data labeling system that cleanly separates the concerns of locality or test environment from the implementation itself. In the example above, the components inside each robot don't need to know which garage they are in; they simply report telemetry as normal, and the infrastructure at that location applies the appropriate metadata. Having this information at hand lets you produce views or segments of the data at any depth or breadth needed.
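As an illustrative sketch of the garage scenario, the reflector deployed in garage A might carry configuration along these lines. The key names follow the pattern used by modality-reflector's ingest configuration, but treat them as assumptions and verify them against the configuration file documentation:

```toml
# Hypothetical example: label every timeline passing through this
# reflector with its locality, without the robots knowing anything
# about where they are deployed.
[ingest]
# Added to each timeline if the source didn't already set them.
additional-timeline-attributes = [
    "garage = 'A'",
    "site = 12",
]
# Applied even if the source already set a value of its own.
override-timeline-attributes = [
    "environment = 'field-test'",
]
```

The reflector in garage B would be identical except for `garage = 'B'`, so the same robot software produces correctly labeled data in either location.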
# Proxying Through Infrastructure
In the world of embedded systems-of-systems, strange and bespoke infrastructure is the norm. In order to develop and troubleshoot your system, you might have a test rig that you connect to using JTAG, RS-485, and Ethernet all at once. Any practical data collection mechanism for such systems must take this into account.
Even conventional IT systems may have network topologies that make it impossible to connect directly to a system you're interested in observing.
Modality solves this problem by allowing multiple reflectors to be chained together in a tree-like structure. All network connections are client-initiated, so each reflector only needs to know about its direct parent and be able to reach it over the network.
The reflector binary is very lightweight, so in many cases you'll be able to deploy it very close to the system being observed. For example, it's perfectly reasonable to deploy the reflector on an APU inside your system that has some spare capacity. It could be configured with plugins that require direct access to the internals of the system, and would then proxy events back out to Modality itself or to an upstream reflector.
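A sketch of that chained deployment, for the reflector running on the APU: it collects locally and forwards everything to its parent, which may be Modality itself or another reflector one hop closer. The key and URL scheme below are assumptions for illustration; check the configuration reference for the exact form:

```toml
# Hypothetical config for a reflector deployed inside the system.
[ingest]
# The only upstream knowledge this reflector needs: its direct parent.
# The parent can be Modality itself or an intermediate reflector.
protocol-parent-url = "modality-ingest://upstream-reflector.example"

[plugins.ingest.collectors.lttng]
# Collectors that need direct access to the system's internals
# run here, next to the system being observed.
url = "net://localhost"
```

Because the connection is client-initiated, this works even when the observed system sits behind NAT or a test-rig network that upstream infrastructure cannot reach into.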