Querying the aggregator

The aggregator listens for incoming connections on a unix-domain stream socket. It communicates using the JSON-RPC protocol with line framing: each message is a single line, so the JSON must not be formatted across multiple lines.
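
The line framing described above could be handled on the frontend side roughly as follows. This is only a sketch: the socket path and the function names are assumptions made for illustration, not part of the protocol.

```python
import json
import socket

# Hypothetical socket path; the real one depends on the aggregator's setup.
SOCKET_PATH = "/tmp/aggregator.sock"


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message onto a single newline-terminated line."""
    # Without `indent`, json.dumps emits no raw newlines; newlines inside
    # strings are escaped, so the output always fits on one line.
    return json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"


def decode_lines(buffer: bytes) -> tuple[list[dict], bytes]:
    """Split received bytes into complete messages and the unfinished tail."""
    *complete, tail = buffer.split(b"\n")
    return [json.loads(line) for line in complete if line.strip()], tail


def connect(path: str = SOCKET_PATH) -> socket.socket:
    """Open a unix-domain stream connection to the aggregator."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock
```

The tail returned by decode_lines is meant to be prepended to the next chunk read from the socket, since a stream socket may deliver a message in several pieces.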

Generally, the protocol is meant to be machine-friendly, and that has certain consequences on the design.

1 Conventions

The JSON examples in this document are pretty-printed to aid the human reader. However, as mentioned above, real messages are not: each one must be formatted on a single line.

Each message in the examples is prefixed by either A (the aggregator) or F (the frontend), denoting the sender of the message. The terms server and client are not used here, since they already have a meaning in the JSON-RPC protocol and, in our usage, each party can act as both the client and the server (in certain situations).

2 Units

Unless specified otherwise, the protocol uses these units of measurement:

Duration: The duration of something is measured in milliseconds and carries only the integral part (that is, no fractions).
Time: An absolute time is denoted as the number of milliseconds since the unix epoch (1 January 1970). This differs from the unix timestamp, which is in seconds, so our numbers are simply 1000 times larger. It contains no fractions.

Some places allow specifying the time relative to the current instant, or to an instant of observation. In such a case it is in milliseconds from now. As the aggregator is interested only in the past, this makes it easy to distinguish between absolute and relative times: absolute times are positive, while relative ones are negative. Zero is considered positive in this case (and therefore a time of 0 means the epoch).
Size: Sizes are in bytes. They usually denote the whole size of eg. a packet, including the IP headers (but without the ethernet frame headers, since those are replaced on every physical hop and may change, while the IP sizes are stable end-to-end).
Counts: Counts of something (eg. packets) are without any units or multipliers.
Speeds: Speeds are in bytes per second, rounded to the nearest integer.
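
As an illustration of the time units, the following sketch converts between Python datetimes and the protocol's numbers. The helper names are made up for this example and are not part of the protocol.

```python
from datetime import datetime, timedelta, timezone


def to_protocol_time(moment: datetime) -> int:
    """Convert an aware datetime to integral milliseconds since the unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a time zone")
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Dividing timedeltas with // stays in integers, so no float rounding.
    return (moment - epoch) // timedelta(milliseconds=1)


def ago(**kwargs) -> int:
    """Relative time: a negative number of milliseconds counted from now."""
    return -(timedelta(**kwargs) // timedelta(milliseconds=1))


def is_relative(value: int) -> bool:
    """Negative values are relative; zero and above are absolute."""
    return value < 0
```

For example, ago(minutes=30) gives -1800000. Note that a zero offset cannot be expressed as a relative time, because 0 always means the epoch.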

3 Terminology

Every (successful) query result contains zero or more buckets of communication. The buckets are connections (or, more correctly, network flow slices) grouped by certain criteria. Each connection can be present in at most one bucket.

The bucket lists two kinds of information. One is headers, or columns. These are the things that are constant over the life of one connection ‒ like the IP addresses of endpoints, protocols used, etc. It is possible to aggregate (group) the connections together using these columns, or filter interesting connections by them.

The other kind is statistics. These are the bits of information that change over time, like sizes of transferred data or speeds. These can be shown as a grand total over the whole interval of the query for the whole bucket, or split into short intervals to form a kind of graph. The statistics are provided separately for each direction.

4 Introduction by the `version` notification

Upon connecting, the aggregator sends a version notification. It contains the version number of the API (not of the software) and a list of optional features.

A: {
    "jsonrpc": "2.0",
    "method": "version",
    "params": {
        "major": 0,
        "minor": 2,
        "features": []
    }
}

The frontend is free to ignore the notification if it doesn't care.

The version number is a semantic version in a sense: if only the minor version increases, the protocol stays backwards-compatible. Breaking changes must be advertised by an increase in the major version. However, prior to 1.0, any changes are possible at any time.

The features field lists a set of features that the client may check. This is in addition to the minor version number. For example, when another column is added, the minor version number is increased (since it isn't a breaking change) and a feature is added as well, stating that the column is available. However, features are going to be available post 1.0 and each major version may reset the available features (eg. they would be no longer optional and become part of the base protocol).
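
A frontend that does care about the version might check the notification along these lines. This is a sketch under the versioning rules stated above; the function name and the feature name used in the usage note are hypothetical.

```python
def check_version(message: dict, major: int, min_minor: int = 0,
                  needed_features: tuple = ()) -> bool:
    """Return True if the advertised API version satisfies the frontend."""
    if message.get("method") != "version":
        raise ValueError("not a version notification")
    params = message["params"]
    # A different major version signals breaking changes.
    if params["major"] != major:
        return False
    # Within one major version, a higher minor is backwards-compatible.
    if params["minor"] < min_minor:
        return False
    # Every feature the frontend relies on must be advertised.
    return set(needed_features) <= set(params.get("features", []))
```

With the notification shown above, check_version(message, 0, 2) passes, while asking for a made-up feature such as "hypothetical-feature" fails. Since any change is possible before 1.0, a check against major version 0 is only approximate.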

5 The `query` method

The basic querying can be done with the query method. The query can be used to list various criteria of past communications, as well as statistics of the communication.

An example query might look like this:

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {}
}

This query would provide the total sizes and speeds of all the communication held in the aggregator. The sections below explain why that is so.

The parameters described below can be combined in arbitrary ways. However, due to the nature of the aggregator, it is possible that some queries can't be answered because the aggregator doesn't keep all the details for all the time.

Note that the id is important, for two reasons. First, JSON-RPC distinguishes between methods and notifications and if there's no id, the message is considered to be a notification (and no response is sent to notifications). The other reason is if multiple queries are submitted at once, the results may arrive out of order and the id can be used to pair them to the queries.

Also, the order (of parameters or values in the parameters) doesn't matter. Similarly, the results are ordered arbitrarily (with the exception of timelines, see below).
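
Pairing out-of-order responses with their queries by id could be sketched as follows. The class and method names are illustrative assumptions, not anything the aggregator prescribes.

```python
import itertools


class PendingQueries:
    """Pairs out-of-order responses with the requests that caused them."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    def request(self, method: str, params: dict) -> dict:
        """Build a request with a fresh id and remember it as outstanding."""
        request_id = next(self._ids)
        message = {"jsonrpc": "2.0", "id": request_id,
                   "method": method, "params": params}
        self._pending[request_id] = message
        return message

    def resolve(self, message: dict):
        """Return (request, response) for a response, or None otherwise."""
        # Messages carrying a "method" are requests or notifications sent
        # by the aggregator (such as "version"), not answers to our queries.
        if "method" in message or message.get("id") not in self._pending:
            return None
        return self._pending.pop(message["id"]), message
```

Generating the ids from a counter guarantees that every request carries one, so none of them is mistaken for a notification.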

5.1 The time interval

There are two parameters, start and end, that specify the interval of the history to examine. If a parameter is present, it specifies a one-sided bound on the interval. If it is not, the query is unbounded on that side and is limited only by the available data.

Therefore, to query for the whole history kept inside the aggregator, none of these parameters are present.

A query for the whole of January 2017 (with the boundaries at midnight in the UTC+1 time zone) would look like this:

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "start": 1483225200000,
        "end": 1485903600000
    }
}

A query for the last 30 minutes of data (note the negative number and missing end parameter):

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "start": -1800000
    }
}

As the aggregator does not store the data continuously but in intervals (and these intervals are coarser further back in history), the resulting interval will be rounded to the nearest available interval boundary.

5.2 Filtering of the communication

By default all communication in the time interval is considered and processed. It is possible to list criteria the communication must fulfill to be included, by the filter parameter.

The filter parameter is an object (in the JSON terminology) of conditions. Each condition is a tuple of a column and a set of allowed values. All conditions must pass for the communication to match (technically, the filter parameter is in CNF) and specifying no filter parameter is equivalent to providing an empty object of conditions.

See below for the list of all available columns and their set syntax.

To query only for the communication of the local computer with the MAC address 11:22:33:44:55:55:

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "filter": {
            "local-mac":["11:22:33:44:55:55"]
        }
    }
}

To further limit the query to services with remote DNS name of example.com or example.org and only TCP communication, the query would get extended to:

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "filter": {
            "local-mac":["11:22:33:44:55:55"],
            "remote-name-any":["example.org","example.com"],
            "ip-proto":["TCP"]
        }
    }
}

A value of null may be present in each of the sets. It allows the flows that don't have a value for that criterion (eg. there may be flows without a port if they are neither TCP nor UDP, or flows without a domain name).
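
The filter semantics can be modelled in a few lines. This is a simplified model for illustration only, not the aggregator's implementation: a flow is assumed to be a plain dict holding one value per column.

```python
def flow_matches(flow: dict, conditions: dict) -> bool:
    """Model of the filter semantics: every condition must allow the flow."""
    for column, allowed in conditions.items():
        # A column missing from the flow behaves as the value null (None),
        # so it passes only if null is listed among the allowed values.
        if flow.get(column) not in allowed:
            return False
    return True
```

An empty object of conditions matches every flow, which mirrors the behaviour of omitting the filter parameter altogether.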

5.3 Asking for specific columns

Only the requested columns are returned. Therefore, to get any, they need to be listed.

The requested columns are passed as an array of column identifiers in the parameter columns. See below for the list of identifiers.

To list the remote domain names used in communications, the frontend would send the following query:

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "columns": [
            "remote-name-primary"
        ]
    }
}

Each bucket provides a set of all values in that bucket for each column requested.

5.4 Specifying aggregation

Unless some aggregation is specified, all the communication falls into a single bucket. However, if some columns are selected for aggregation, each unique tuple of the columns' values gets its own bucket (a missing value of a column is considered to be a separate value).

Therefore, if the frontend is interested in communication for each local computer separately, it would ask for aggregation by a local MAC address (because one computer usually has multiple IP addresses, but only a single MAC address).

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "aggregate": [
            "local-mac"
        ]
    }
}

To further split the communication by the remote endpoints, one would add the remote IP addresses:

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "aggregate": [
            "local-mac",
            "remote-ip"
        ]
    }
}

Columns listed in aggregation are also returned, so they don't need to be listed again in columns.
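
The grouping into buckets can be pictured with the following sketch. It is a simplified model rather than the aggregator's code, and again assumes flows are plain dicts with one value per column.

```python
from collections import defaultdict


def aggregate(flows: list, columns: list) -> dict:
    """Group flows into buckets keyed by their values in `columns`."""
    buckets = defaultdict(list)
    for flow in flows:
        # A missing column yields None, which forms a bucket of its own.
        key = tuple(flow.get(column) for column in columns)
        buckets[key].append(flow)
    return dict(buckets)
```

With an empty list of columns every flow gets the same empty key, so everything lands in a single bucket, as described above.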

5.5 More details

By default, each bucket provides only a single statistics snapshot, accounting for the whole interval of the query. If the parameter details is set to true, each bucket provides a series of statistics, corresponding to consecutive time intervals. The intervals are the same for all buckets (therefore some intervals might contain empty statistics, since there was no communication in the bucket at the time).

However, the intervals don't have to be of the same length. In general, older data are stored in coarser form, so the older intervals are longer.

5.6 Putting it all together

The whole power of querying comes when these things get combined together. First, the filtering and time interval restrictions are applied. Then the matching communication is aggregated. Last, the columns and statistics are computed for each bucket.

So, this could be a session of queries the frontend might issue:

Listing all the local MAC addresses active over TCP in the last hour, so the list of computers is found out:

F: {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "query",
    "params": {
        "start": -3600000,
        "filter": {
            "ip-proto":["TCP"]
        },
        "columns": [
            "local-mac"
        ]
    }
}

After getting the list of MAC addresses, the user could pick one computer and have its communication listed. This would provide data of this one computer, with communication to each service separately. For each service it would provide the IP addresses and ports used. Also, detailed statistics (that can be shown in a graph) are requested.

F: {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "query",
    "params": {
        "start": -3600000,
        "filter": {
            "ip-proto":["TCP"],
            "local-mac":["11:22:33:44:55:66"]
        },
        "aggregate": [
            "remote-name-primary"
        ],
        "columns": [
            "remote-ip",
            "remote-port"
        ],
        "details": true
    }
}
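
A frontend issuing such sessions might assemble its requests with a small helper like the one below. The helper itself is hypothetical; only the parameter names come from this document.

```python
# The parameters of the query method described in the sections above.
QUERY_PARAMETERS = ("start", "end", "filter", "aggregate", "columns", "details")


def build_query(request_id: int, **params) -> dict:
    """Assemble a `query` request, leaving out every parameter not given."""
    unknown = sorted(set(params) - set(QUERY_PARAMETERS))
    if unknown:
        raise TypeError(f"unknown query parameters: {unknown}")
    # None means "not specified": the parameter is omitted from the request.
    present = {name: value for name, value in params.items()
               if value is not None}
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "query", "params": present}
```

For instance, build_query(1, start=-3600000, filter={"ip-proto": ["TCP"]}, columns=["local-mac"]) produces the first query of the session above.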

5.7 The query response

Each successful query returns a response that looks something like this (including the query as well):

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "query",
    "params": {
        "details": true,
        "columns": [
            "remote-ip"
        ],
        "aggregate": [
            "local-ip"
        ]
    }
}
A: {
    "jsonrpc":"2.0",
    "id":42,
    "result":{
        "buckets": [
            {
                "headers": {
                    "remote-ip": [
                        "10.67.22.1",
                        "34.210.7.70"
                    ],
                    "local-ip": [
                        "10.67.22.8"
                    ]
                },
                "stats": [
                    {
                        "in": {
                            "avg-speed": 755,
                            "max-speed": 943,
                            "packets": 316,
                            "size": 45307,
                            "flows": 2,
                            "start": 1501079439522,
                            "end": 1501079479522
                        },
                        "out": {
                            "avg-speed": 54,
                            "max-speed": 286,
                            "packets": 32,
                            "size": 3218,
                            "flows": 2,
                            "start": 1501079439522,
                            "end": 1501079479522
                        }
                    },
                    {
                        "in": {
                            "avg-speed": 307,
                            "max-speed": 4379,
                            "packets": 43,
                            "size": 18364,
                            "flows": 2,
                            "start": 1501079479522,
                            "end": 1501079539521
                        },
                        "out": {
                            "avg-speed": 264,
                            "max-speed": 2460,
                            "packets": 49,
                            "size": 15865,
                            "flows": 2,
                            "start": 1501079479522,
                            "end": 1501079539521
                        }
                    },
                    { }
                ]
            },
            {
                "headers": {
                    "remote-ip": [
                        "2606:2800:134:1a0d:1429:742:782:b6",
                        "2606:2800:220:13d:2176:94a:948:148e",
                        "2a02:2b88:2:1::10b3:1",
                        "fe80::da58:d7ff:fe00:34"
                    ],
                    "local-ip": [
                        "2001:470:58d0:0:bd89:e2c:59e3:517d"
                    ]
                },
                "stats": [
                    {
                        "in": {
                            "avg-speed": 130,
                            "max-speed": 335,
                            "packets": 71,
                            "size": 7840,
                            "flows": 1,
                            "start": 1501079429522,
                            "end": 1501079479510
                        },
                        "out": {
                            "avg-speed": 191,
                            "max-speed": 315,
                            "packets": 124,
                            "size": 11458,
                            "flows": 1,
                            "start": 1501079429522,
                            "end": 1501079479510
                        }
                    },
                    { },
                    { }
                ]
            }
        ],
        "timeline": [
            {
                "end": 1501079479522
            },
            {
                "end": 1501079539522,
                "start": 1501079479522
            },
            {
                "start": 1501079539522
            }
        ]
    }
}

We can see several things here. First, there are two buckets, aggregated by the local IP address. Therefore, communication is split into these buckets. Each bucket lists the local IP address and all the remote IP addresses that local address communicated with. This is in the headers field, which lists the separate columns.

As with filters, each header may contain a null value, which denotes the absence of that value for some flows in the bucket. A header with only [null] may be omitted altogether.

Also, each bucket has the stats field. That one is an array of time intervals and each interval contains the statistics of the communication. If one direction or the whole statistic is empty, it means no communication happened during that interval.

In addition to the buckets with communication, we see the timeline field. The timeline describes the intervals into which the query time is split for all the buckets. The first and last interval are bounded only from one end. There's always the same number of elements in the timeline array and in each stats field of a bucket.

If details is not set to true, the timeline is not present. In such a case, the query time is not split into intervals (there's only a single interval that is unbounded from both sides). The stats arrays of buckets contain a single element each, with grand totals across the whole query time.
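
Reading one statistic out of a result, interval by interval, could look like the sketch below. The function name is an assumption, and treating a missing statistic as 0 is a choice made for this example (reasonable for sizes and counts) rather than something the protocol mandates.

```python
def series(result: dict, bucket_index: int, direction: str, stat: str) -> list:
    """Return (start, end, value) per timeline interval for one bucket."""
    stats = result["buckets"][bucket_index]["stats"]
    # Without details there is no timeline: one interval, unbounded both ways.
    timeline = result.get("timeline", [{}])
    if len(timeline) != len(stats):
        raise ValueError("timeline and stats lengths differ")
    points = []
    for interval, entry in zip(timeline, stats):
        # An empty entry or a missing direction means no communication.
        value = entry.get(direction, {}).get(stat, 0)
        points.append((interval.get("start"), interval.get("end"), value))
    return points
```

A bound reported as None stands for an interval that is open on that side, as is the case for the first and last elements of the timeline.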

6 Information in statistics

The statistics contain the following information:

packets: Number of packets transferred.
size: Amount of data transferred, in bytes.
avg-speed: The average speed of transfer (summed across all the flows in the bucket).
max-speed: The maximal speed of transfer.
flows: Number of different flows inside the bucket at that slice.
start: The time of the first activity.
end: The time of the last activity.

7 Supported columns

Each column is denoted by a unique name. The names are case sensitive. Further columns may be added in the future, but the frontend never gets columns it didn't ask for.

7.1 Endpoint identifiers

For each such identifier, there are two columns, one named local-something and the other remote-something. They describe the communication endpoint in the LAN or on the wide internet respectively.

local-mac, remote-mac: The MAC (or hardware) address of the endpoint. Note that it isn't always known (and is null in such case), or meaningful (in the case of remote machines).
local-ip, remote-ip: The IP address (either IPv4 or IPv6) of the endpoint.
local-name-primary, remote-name-primary: The primary DNS name of the endpoint. If a DNS name for the communication is known, this one is considered to be the primary one (in case there are multiple). This is based on heuristics and doesn't have to be exactly what the user entered, but this generally prefers things like facebook.com over edge-star-mini-shv-01-atl3.facebook.com.
local-name-set, remote-name-set: The whole set of names of the endpoint is considered a single value ‒ therefore, while the above would fold two facebook servers together, this one would have them separated if their „ugly“ name is different.
local-mac-name, remote-mac-name: Some (usually local) MAC addresses may have a name associated with them (for example from DHCP).
local-port, remote-port: The ports of communication. Not all protocols have ports, but those that do always have them (eg. no TCP flow is missing a port).

7.2 Bidirectional columns

These are the columns that exist in just one instance on a flow, without separate local and remote variants.

ip-proto: Currently either UDP, TCP or ?. Others might be available in the future (and addition of more types wouldn't be considered a breaking change).
ip-proto-raw: The raw numerical value of the protocol in the IP headers.
direction: The direction in which the communication was initiated, either IN or OUT.

7.3 Presence of columns

Only a few of these columns are guaranteed to be present on every flow:

local-ip, remote-ip
ip-proto, ip-proto-raw
direction

If a column is not present on a flow, it is possible to match the flow by the value of null.

8 The `repeated` method

The query method runs a query once and returns an answer. The repeated query runs the query every time the aggregator closes a batch. The purpose is to keep a live display of activity.

The parameters are:

id: A string, denoting the repeated query instance. It is possible to have multiple of them active.
query: This has the same format and meaning as the whole params of the query method.

If the id is the same as that of an already existing repeated query, the original query is replaced. If the query parameter is not present, the previous query with that id is deactivated.

The query is run once and the result is directly provided, as with the query method. This confirms the activation of the query. Further results are sent inside the repeated-result notification. If the first query results in error, the query is not activated.

In case the query parameter is missing (that is, the call is used to deactivate a previous query), it returns an empty result.

F: {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "repeated",
    "params": {
        "query": {
            "start": -1800000,
            "columns": ["remote-name-primary"]
        },
        "id": "active-remotes"
    }
}
A: {
    "jsonrpc": "2.0",
    "id": 42,
    "result": {
        "buckets": [
            {
                "headers": {
                    "remote-name-primary": ["example.com"]
                },
                "stats": [{
                    "in": {
                        "avg-speed": 130,
                        "max-speed": 335,
                        "packets": 71,
                        "size": 7840,
                        "flows": 1,
                        "start": 1501079429522,
                        "end": 1501079479510
                    },
                    "out": {
                        "avg-speed": 191,
                        "max-speed": 315,
                        "packets": 124,
                        "size": 11458,
                        "flows": 1,
                        "start": 1501079429522,
                        "end": 1501079479510
                    }
                }]
            }
        ]
    }
}
...
A: {
    "jsonrpc": "2.0",
    "method": "repeated-result",
    "params": {
        "id": "active-remotes",
        "result": {
            "buckets": [
                {
                    "headers": {
                        "remote-name-primary": ["example.com", "example.org"]
                    },
                    "stats": [{
                        "in": {
                            "avg-speed": 230,
                            "max-speed": 735,
                            "packets": 91,
                            "size": 9840,
                            "flows": 1,
                            "start": 1501079429522,
                            "end": 1501079493510
                        },
                        "out": {
                            "avg-speed": 191,
                            "max-speed": 315,
                            "packets": 124,
                            "size": 11458,
                            "flows": 1,
                            "start": 1501079429522,
                            "end": 1501079493510
                        }
                    }]
                }
            ]
        }
    }
}
F: {
    "jsonrpc": "2.0",
    "id": 43,
    "method": "repeated",
    "params": {
        "id": "active-remotes"
    }
}
A: {
    "jsonrpc": "2.0",
    "id": 43,
    "result": {}
}

8.1 Notes

If the query is being replaced and the new query fails, there's no guarantee whether the old query stays active or is deactivated.

If relative times are used in start and end, they are relative to each time the query is run. Therefore, the above example is always querying the latest half an hour.
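
On the frontend side, keeping track of repeated queries could be sketched like this. The class is hypothetical; only the message shapes are taken from the examples above.

```python
class RepeatedQueries:
    """Keeps the latest result of each repeated query, keyed by its id."""

    def __init__(self):
        self.latest = {}

    @staticmethod
    def activate(request_id: int, name: str, query: dict) -> dict:
        """Request that activates (or replaces) the repeated query `name`."""
        return {"jsonrpc": "2.0", "id": request_id, "method": "repeated",
                "params": {"id": name, "query": query}}

    @staticmethod
    def deactivate(request_id: int, name: str) -> dict:
        """Request without the query parameter, which deactivates `name`."""
        return {"jsonrpc": "2.0", "id": request_id, "method": "repeated",
                "params": {"id": name}}

    def handle(self, message: dict) -> bool:
        """Store a repeated-result notification; return False for others."""
        if message.get("method") != "repeated-result":
            return False
        params = message["params"]
        self.latest[params["id"]] = params["result"]
        return True
```

The first result of an activated query arrives as an ordinary response to the repeated request, so it is not seen by handle and has to be stored by whatever code pairs responses with requests.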