Querying the aggregator
The aggregator listens for incoming connections on a unix-domain stream socket. It communicates using the JSON-RPC 2.0 protocol with line framing: each message is a single line, and the JSON must not be formatted across multiple lines.
Generally, the protocol is meant to be machine-friendly, and that has certain consequences for the design.
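The line framing can be sketched in Python as follows. This is a minimal illustration. It assumes messages are UTF-8 encoded and terminated by a single newline (`\n`), which this document does not state explicitly. The helper names `encode_message` and `decode_messages` are hypothetical and not part of the protocol.

```python
import json
from typing import Any, Iterable, Iterator


def encode_message(message: dict[str, Any]) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps without `indent` never emits raw newlines; newlines inside
    # strings are escaped as "\n", so the output is guaranteed to be one line.
    line = json.dumps(message, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


def decode_messages(lines: Iterable[bytes]) -> Iterator[dict[str, Any]]:
    """Parse a stream of newline-terminated lines into JSON-RPC messages."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            # Skip blank lines defensively (an assumption, not a protocol rule).
            continue
        yield json.loads(raw.decode("utf-8"))
```

On a connected socket, `sock.makefile("rb")` yields exactly such lines, so `decode_messages(sock.makefile("rb"))` iterates over the incoming messages.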
1 Conventions
The JSON examples in this document are pretty-printed to aid the human reader. However, as mentioned above, the real communication is not: each message must be formatted on a single line.
Each message in the examples is prefixed by either A (the aggregator) or F (the frontend), denoting the sender of the message. The terms server and client are not used here, since they already have a meaning in the JSON-RPC protocol, and in our usage each party can act as both the client and the server (in certain situations).
2 Units
Unless specified otherwise, the protocol uses these units of measurement:
Duration: The duration of something is measured in milliseconds and carries only the integral part (i.e. no fractions).
Time: An absolute time is denoted as the number of milliseconds since the unix epoch (1 January 1970, 00:00 UTC). This differs from the unix timestamp, which is in seconds; therefore our numbers are simply 1000 times larger. It contains no fractions.
Some places allow specifying the time relative to the current instant, or to an instant of observation. In such cases, the value is the number of milliseconds from that instant. As the aggregator is interested only in the past, this makes it easy to distinguish between absolute and relative times: absolute times are non-negative, while relative ones are negative. Zero counts as absolute here (so a time of 0 means the epoch).
Size: Sizes are in bytes. They usually denote the whole size of e.g. a packet, including the IP headers, but without the Ethernet frame headers (those are exchanged on every physical hop and may change, while the IP sizes are stable end to end).
Counts: Counts of something (e.g. packets) are plain numbers, without any units or multipliers.
Speeds: Speeds are in bytes per second, rounded to the nearest integer.
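A frontend that displays or compares time values has to tell absolute and relative times apart. The following is a minimal Python sketch of the sign rule described above. `resolve_time` is a hypothetical helper. Using the frontend's own clock as "now" is an assumption, and the instant the aggregator uses may differ slightly.

```python
import time
from typing import Optional


def resolve_time(value: int, now_ms: Optional[int] = None) -> int:
    """Turn a protocol time value into absolute milliseconds since the epoch.

    Non-negative values (including 0) are absolute; negative values are
    relative to `now_ms`, which defaults to the local wall-clock time.
    """
    if value >= 0:
        return value
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # seconds -> whole milliseconds
    return now_ms + value
```

For example, `resolve_time(-1800000, now_ms=t)` gives the instant half an hour before `t`.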
3 Terminology
Every (successful) query result contains zero or more buckets of communication. The buckets are connections (or, more precisely, network flow slices) grouped by certain criteria. Each connection can be present in at most one bucket.
Each bucket lists two kinds of information. The first is headers, or columns. These are the things that stay constant over the life of one connection, like the IP addresses of the endpoints, the protocols used, etc. Connections can be aggregated (grouped) together by these columns, or filtered by them.
The other kind is statistics. These are the bits of information that change over time, like the amount of transferred data or speeds. They can be shown either as a grand total for the whole bucket over the whole query interval, or split into short intervals to form a kind of graph. The statistics are provided separately for each direction.
4 Introduction by the version notification
Upon connecting, the aggregator sends a version notification. It contains the version number of the API (not of the software) and a list of optional features.
A: { "jsonrpc": "2.0", "method": "version", "params": { "major": 0, "minor": 2, "features": [] } }
The frontend is free to ignore the notification if it doesn't care.
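The following Python sketch shows how a frontend might connect and read this first message. The socket path is not specified in this document, so `path` is a placeholder. `open_connection` and `read_version` are hypothetical helpers.

```python
import json
import socket
from typing import Any, BinaryIO


def open_connection(path: str) -> tuple[socket.socket, BinaryIO]:
    """Connect to the aggregator's unix-domain stream socket at `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    # A buffered binary reader lets us consume the stream line by line.
    return sock, sock.makefile("rb")


def read_version(reader: BinaryIO) -> dict[str, Any]:
    """Read the first message and return the params of the version notification."""
    line = reader.readline()
    if not line:
        raise ConnectionError("aggregator closed the connection")
    message = json.loads(line)
    # A notification has a "method" but no "id".
    if message.get("method") != "version" or "id" in message:
        raise ValueError(f"expected a version notification, got {message!r}")
    return message["params"]
```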
The version number is a semantic version, in a sense: an increase in the minor version means the protocol stays backwards-compatible. Breaking changes must be advertised by an increase in the major version. However, prior to 1.0, any changes are possible at any time.
The features field lists a set of features that the client may check. This is in addition to the minor version number. For example, when another column is added, the minor version number is increased (since it isn't a breaking change) and a feature is added, stating that the column is available. However, features will only be available from 1.0 onwards, and each major version may reset the available features (e.g. they would no longer be optional and would become part of the base protocol).
5 The query method
Basic querying is done with the query method. It can be used to list various properties (columns) of past communication, as well as statistics of that communication.
An example query might look like this:
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": {} }
This query would provide the total sizes and speeds of all the communication held in the aggregator. See below for an explanation of why.
The parameters described below can be combined in arbitrary ways. However, due to the nature of the aggregator, some queries may not be answerable, because the aggregator doesn't keep all the details for the whole history.
Note that the id is important, for two reasons. First, JSON-RPC distinguishes between requests and notifications: if there's no id, the message is considered to be a notification, and no response is sent to notifications. Second, if multiple queries are submitted at once, the results may arrive out of order, and the id can be used to pair them with the queries.
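The following is a minimal Python sketch of pairing replies with requests by id. `Dispatcher` is a hypothetical class. It allocates increasing integer ids and treats any message with a `method` but no `id` as a notification. Requests sent by the aggregator to the frontend (a `method` together with an `id`) are not handled here.

```python
import itertools
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class Dispatcher:
    """Pairs JSON-RPC responses with the requests that caused them, by id."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: dict[int, Handler] = {}

    def request(self, method: str, params: dict[str, Any],
                on_reply: Handler) -> dict[str, Any]:
        """Build a request with a fresh id and remember who wants the reply."""
        request_id = next(self._ids)
        self._pending[request_id] = on_reply
        return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}

    def dispatch(self, message: dict[str, Any], on_notification: Handler) -> None:
        """Route one incoming message, whatever order the replies arrive in."""
        if "method" in message and "id" not in message:
            on_notification(message)  # e.g. "version" or "repeated-result"
        elif "id" in message and ("result" in message or "error" in message):
            handler = self._pending.pop(message["id"], None)
            if handler is not None:
                handler(message)
```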
Also, the order (of parameters or values in the parameters) doesn't matter. Similarly, the results are ordered arbitrarily (with the exception of timelines, see below).
5.1 The time interval
There are two parameters, start and end, that specify the interval of the history to examine. If a parameter is present, it specifies a one-sided bound on the interval. If it is not, the query is unbounded on that side and limited only by the available data.
Therefore, to query the whole history kept inside the aggregator, neither of these parameters is given.
A query for the whole of January 2017 (midnight to midnight in the CET time zone, UTC+1) would look like this:
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": { "start": 1483225200000, "end": 1485903600000 } }
A query for the last 30 minutes of data (note the negative number and the missing end parameter):
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": { "start": -1800000 } }
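The January 2017 bounds above correspond to midnight in a UTC+1 time zone (CET). The following Python sketch computes them, together with the relative bound of the second example. `to_protocol_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def to_protocol_time(moment: datetime) -> int:
    """Convert an aware datetime into milliseconds since the unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("use an aware datetime so the time zone is explicit")
    # Integer arithmetic on the timedelta avoids floating-point rounding.
    delta = moment - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return delta // timedelta(milliseconds=1)


CET = timezone(timedelta(hours=1))  # fixed UTC+1 offset, no daylight saving
january_2017 = {
    "start": to_protocol_time(datetime(2017, 1, 1, tzinfo=CET)),
    "end": to_protocol_time(datetime(2017, 2, 1, tzinfo=CET)),
}
last_30_minutes = {"start": -30 * 60 * 1000}  # relative: negative milliseconds
```

Here `january_2017` evaluates to `{"start": 1483225200000, "end": 1485903600000}`, matching the query above.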
As the data is not stored continuously in the aggregator, but in intervals (which become coarser further back in history), the resulting interval will be rounded to the nearest available interval boundary.
5.2 Filtering of the communication
By default, all communication in the time interval is considered and processed. The filter parameter lists criteria that the communication must fulfill to be included.
The filter parameter is an object (in the JSON terminology) of conditions. Each condition is a pair of a column and a set of allowed values. All conditions must pass for the communication to match (technically, the filter is in CNF, conjunctive normal form), and specifying no filter parameter is equivalent to providing an empty object of conditions.
See below for the list of all available columns and their set syntax.
Query only for the communication of the local computer with the MAC address 11:22:33:44:55:66:
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": { "filter": { "local-mac":["11:22:33:44:55:66"] } } }
To further limit the query to services with the remote DNS name example.com or example.org, and to TCP communication only, the query would be extended to:
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": { "filter": { "local-mac":["11:22:33:44:55:66"], "remote-name-any":["example.org","example.com"], "ip-proto":["TCP"] } } }
A value of null may be present in any of the sets. It matches the flows that don't have a value for that criterion (e.g. flows that are neither TCP nor UDP have no port, and some flows have no domain name).
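The aggregator evaluates filters itself. The following Python sketch only mirrors the semantics described above, to make the CNF and null rules concrete. It assumes a flow is a flat mapping from column name to a single value, and that every set is a JSON array of exact values. `matches` is a hypothetical helper.

```python
from typing import Any, Mapping, Optional, Sequence

Flow = Mapping[str, Optional[Any]]
Filter = Mapping[str, Sequence[Optional[Any]]]


def matches(flow: Flow, filter_: Filter) -> bool:
    """Evaluate a filter object against one flow.

    Conditions are ANDed together; within one condition the allowed values
    are ORed. A column missing from the flow is treated as None, which only
    a null (None) entry in the set can match.
    """
    for column, allowed in filter_.items():
        value = flow.get(column)  # absent column -> None
        if value not in allowed:
            return False
    return True  # an empty filter matches every flow
```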
5.3 Asking for specific columns
Only the requested columns are returned. Therefore, to get any columns, they need to be listed explicitly.
The requested columns are passed as an array of column identifiers in the columns parameter. See below for the list of identifiers.
To list the remote domain names used in communications, the frontend would send the following query:
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": { "columns": [ "remote-name-primary" ] } }
For each requested column, each bucket provides the set of all values of that column within the bucket.
5.4 Specifying aggregation
Unless aggregation is specified, all the communication falls into a single bucket. However, if some columns are selected for aggregation, each unique tuple of the columns' values gets its own bucket (a missing value in a column is considered to be a separate value).
Therefore, if the frontend is interested in the communication of each local computer separately, it would ask for aggregation by the local MAC address (because one computer usually has multiple IP addresses, but only a single MAC address).
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": { "aggregate": [ "local-mac" ] } }
To further split the communication by the remote endpoints, one would add the remote IP addresses:
F: { "jsonrpc": "2.0", "id": 42, "method": "query", "params": { "aggregate": [ "local-mac", "remote-ip" ] } }
Columns listed in aggregation are also returned, so they don't need to be listed again in columns.
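To make the bucket semantics concrete, the following Python sketch reproduces them on flows held in memory. It computes the headers only and leaves statistics out. The flow representation (a flat mapping from column to value) and the `aggregate` helper are assumptions. This document does not describe how the aggregator implements grouping internally.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping, Optional, Sequence

Flow = Mapping[str, Optional[Any]]


def aggregate(flows: Iterable[Flow], by: Sequence[str],
              columns: Sequence[str]) -> list[dict[str, list[Any]]]:
    """Group flows into buckets and collect the header values of each bucket.

    Each distinct tuple of values of the `by` columns gets its own bucket;
    a missing value (None) is just another distinct value. Columns used for
    aggregation are reported too, as the protocol does.
    """
    wanted = list(dict.fromkeys([*by, *columns]))  # dedupe, keep order
    buckets: dict[tuple, dict[str, list[Any]]] = defaultdict(
        lambda: {c: [] for c in wanted})
    for flow in flows:
        key = tuple(flow.get(c) for c in by)  # () when not aggregating -> one bucket
        headers = buckets[key]
        for c in wanted:
            value = flow.get(c)
            if value not in headers[c]:
                headers[c].append(value)  # a set of values, in first-seen order
    return list(buckets.values())
```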
5.5 More details
By default, each bucket provides only a single statistics snapshot, accounting for the whole interval of the query. If the details parameter is set to true, each bucket provides a series of statistics, corresponding to consecutive time intervals. The intervals are the same for all buckets (therefore some intervals might contain empty statistics, since there was no communication in the bucket at the time).
However, the intervals don't have to be of the same length. In general, older data are stored in coarser form, so the older intervals are longer.
5.6 Putting it all together
The real power of querying comes from combining these things. First, the filtering and time interval restrictions are applied. Then the matching communication is aggregated. Last, the columns and statistics are computed for each bucket.
So, this could be a session of queries the frontend might issue:
List all the local MAC addresses active over TCP in the last hour, to find out the list of computers:
F: { "jsonrpc": "2.0", "id": 1, "method": "query", "params": { "start": -3600000, "filter": { "ip-proto":["TCP"] }, "columns": [ "local-mac" ] } }
After getting the list of MAC addresses, the user could pick one computer and have its communication listed. This would provide data for this one computer, with the communication to each service kept separately. For each service, it would provide the IP addresses and ports used. Detailed statistics (that can be shown in a graph) are requested as well.
F: { "jsonrpc": "2.0", "id": 2, "method": "query", "params": { "start": -3600000, "filter": { "ip-proto":["TCP"], "local-mac":["11:22:33:44:55:66"] }, "aggregate": [ "remote-name-primary" ], "columns": [ "remote-ip", "remote-port" ], "details": true } }
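Such requests can be assembled programmatically. The following Python sketch builds a query request and omits every parameter that keeps its default value. It relies on the rules above: a missing filter equals an empty one, and with no aggregation everything falls into one bucket. `build_query` is a hypothetical helper. The example call reproduces the second query of the session.

```python
from typing import Any, Optional, Sequence


def build_query(request_id: int, *, start: Optional[int] = None,
                end: Optional[int] = None,
                filter_: Optional[dict[str, list[Any]]] = None,
                aggregate: Sequence[str] = (), columns: Sequence[str] = (),
                details: bool = False) -> dict[str, Any]:
    """Assemble a `query` request, leaving out parameters that keep defaults."""
    params: dict[str, Any] = {}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    if filter_:
        params["filter"] = filter_  # omitted filter == empty filter
    if aggregate:
        params["aggregate"] = list(aggregate)
    if columns:
        params["columns"] = list(columns)
    if details:
        params["details"] = True
    return {"jsonrpc": "2.0", "id": request_id, "method": "query", "params": params}


# The second query of the session above.
second = build_query(
    2, start=-3600000,
    filter_={"ip-proto": ["TCP"], "local-mac": ["11:22:33:44:55:66"]},
    aggregate=["remote-name-primary"], columns=["remote-ip", "remote-port"],
    details=True,
)
```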
5.7 The query response
Each successful query returns a response that looks something like this (the query is included as well):
F: {
  "jsonrpc": "2.0",
  "id": 42,
  "method": "query",
  "params": {
    "details": true,
    "columns": [ "remote-ip" ],
    "aggregate": [ "local-ip" ]
  }
}
A: {
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "buckets": [
      {
        "headers": {
          "remote-ip": [ "10.67.22.1", "34.210.7.70" ],
          "local-ip": [ "10.67.22.8" ]
        },
        "stats": [
          {
            "in": { "avg-speed": 755, "max-speed": 943, "packets": 316, "size": 45307, "flows": 2, "start": 1501079439522, "end": 1501079479522 },
            "out": { "avg-speed": 54, "max-speed": 286, "packets": 32, "size": 3218, "flows": 2, "start": 1501079439522, "end": 1501079479522 }
          },
          {
            "in": { "avg-speed": 307, "max-speed": 4379, "packets": 43, "size": 18364, "flows": 2, "start": 1501079479522, "end": 1501079539521 },
            "out": { "avg-speed": 264, "max-speed": 2460, "packets": 49, "size": 15865, "flows": 2, "start": 1501079479522, "end": 1501079539521 }
          },
          { }
        ]
      },
      {
        "headers": {
          "remote-ip": [
            "2606:2800:134:1a0d:1429:742:782:b6",
            "2606:2800:220:13d:2176:94a:948:148e",
            "2a02:2b88:2:1::10b3:1",
            "fe80::da58:d7ff:fe00:34"
          ],
          "local-ip": [ "2001:470:58d0:0:bd89:e2c:59e3:517d" ]
        },
        "stats": [
          {
            "in": { "avg-speed": 130, "max-speed": 335, "packets": 71, "size": 7840, "flows": 1, "start": 1501079429522, "end": 1501079479510 },
            "out": { "avg-speed": 191, "max-speed": 315, "packets": 124, "size": 11458, "flows": 1, "start": 1501079429522, "end": 1501079479510 }
          },
          { },
          { }
        ]
      }
    ],
    "timeline": [
      { "end": 1501079479522 },
      { "start": 1501079479522, "end": 1501079539522 },
      { "start": 1501079539522 }
    ]
  }
}
We can see several things here. First, there are two buckets, aggregated by the local IP address, so the communication is split between them. Each bucket lists the local IP address and all the remote IP addresses that the local address communicated with. This is in the headers field, which lists the separate columns.
As with filters, each header may contain a null value, which denotes the absence of that value for some flows in the bucket. A header containing only [null] may be omitted altogether.
Also, each bucket has the stats field. It is an array of time intervals, and each interval contains the statistics of the communication. If one direction or the whole statistic is empty, no communication happened during that interval.
In addition to the buckets with communication, we see the timeline field. The timeline describes the intervals into which the query time is split, shared by all the buckets. The first and last intervals are bounded from one end only. The timeline array always has the same number of elements as the stats field of each bucket.
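Because the timeline and every stats array line up element by element, a frontend can pair them directly, for example to plot a graph. The following Python sketch does this. `bucket_series` is a hypothetical helper. It represents an unbounded side of an interval as `None`, and it treats a missing timeline (no details) as a single interval unbounded on both sides.

```python
from typing import Any, Iterator, Optional

Interval = tuple[Optional[int], Optional[int]]  # (start, end); None = unbounded
Series = list[tuple[Interval, dict[str, Any]]]


def bucket_series(result: dict[str, Any]) -> Iterator[tuple[dict[str, Any], Series]]:
    """Yield (headers, [(interval, stats), ...]) for every bucket of a result."""
    timeline = result.get("timeline", [{}])  # no details -> one unbounded interval
    intervals = [(t.get("start"), t.get("end")) for t in timeline]
    for bucket in result.get("buckets", []):
        stats = bucket["stats"]
        if len(stats) != len(intervals):
            raise ValueError("stats and timeline lengths differ")
        # An empty {} (or a missing direction) means no communication then.
        yield bucket.get("headers", {}), list(zip(intervals, stats))
```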
If details is not set to true, the timeline is not present. In such a case, the query time is not split into intervals (there's only a single interval that is unbounded on both sides). The stats arrays of the buckets contain a single element each, with grand totals across the whole query time.
6 Information in statistics
The statistics contain the following information:
packets: Number of packets transferred.
size: Amount of data transferred, in bytes.
avg-speed: The average speed of transfer (summed across all the flows in the bucket).
max-speed: The maximal speed of transfer.
flows: Number of different flows inside the bucket in that slice.
start: The time of the first activity.
end: The time of the last activity.
7 Supported columns
Each column is denoted by a unique name. The names are case-sensitive. Further columns may be added in the future, but the frontend never gets columns it didn't ask for.
7.1 Endpoint identifiers
For each such identifier, there are two columns, one named local-something and the other remote-something. They describe the communication endpoint in the LAN or on the wide internet, respectively.
local-mac, remote-mac: The MAC (or hardware) address of the endpoint. Note that it isn't always known (it is null in such cases), nor always meaningful (in the case of remote machines).
local-ip, remote-ip: The IP address (either IPv4 or IPv6) of the endpoint.
local-name-primary, remote-name-primary: The primary DNS name of the endpoint. If DNS names for the communication are known, one of them is considered to be the primary one (in case there are multiple). The choice is based on heuristics and doesn't have to be exactly what the user entered, but it generally prefers names like facebook.com over edge-star-mini-shv-01-atl3.facebook.com.
local-name-set, remote-name-set: The whole set of names of the endpoint is considered a single value. Therefore, while the above would fold two facebook servers together, this one would keep them separate if their „ugly“ names differ.
local-mac-name, remote-mac-name: Some (usually local) MAC addresses may have a name associated with them (for example from DHCP).
local-port, remote-port: The ports of the communication. Not all protocols have ports, but those that do always have them (e.g. no TCP flow is missing a port).
7.2 Bidirectional columns
These columns exist in just a single instance per flow, rather than as a local and remote pair.
ip-proto: Currently either UDP, TCP or ?. Others might be available in the future (and the addition of more types wouldn't be considered a breaking change).
ip-proto-raw: The raw numerical value of the protocol in the IP header.
direction: The direction in which the communication was initiated, either IN or OUT.
7.3 Presence of columns
Only a few of these columns are guaranteed to be present:
local-ip, remote-ip
ip-proto, ip-proto-raw
direction
If a column is not present on a flow, the flow can be matched by the value null.
8 The repeated method
The query method runs a query once and returns an answer. The repeated method runs the query every time the aggregator closes a batch. The purpose is to keep a live display of activity.
The parameters are:
id: A string denoting the repeated query instance (distinct from the JSON-RPC id of the request). Multiple repeated queries can be active at the same time.
query: This has the same format and meaning as the whole params of the query method.
If the id is the same as that of an already existing repeated query, the original query is replaced. If the query parameter is not present, the repeated query with that id is deactivated.
The query is run immediately and its result is returned directly, as with the query method. This confirms the activation of the query. Further results are sent inside the repeated-result notification. If the first run of the query results in an error, the query is not activated.
If the query parameter is missing (i.e. the call is used to deactivate a previous query), an empty result is returned.
F: {
  "jsonrpc": "2.0",
  "id": 42,
  "method": "repeated",
  "params": {
    "query": {
      "start": -1800000,
      "columns": [ "remote-name-primary" ]
    },
    "id": "active-remotes"
  }
}
A: {
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "buckets": [
      {
        "headers": { "remote-name-primary": [ "example.com" ] },
        "stats": [
          {
            "in": { "avg-speed": 130, "max-speed": 335, "packets": 71, "size": 7840, "flows": 1, "start": 1501079429522, "end": 1501079479510 },
            "out": { "avg-speed": 191, "max-speed": 315, "packets": 124, "size": 11458, "flows": 1, "start": 1501079429522, "end": 1501079479510 }
          }
        ]
      }
    ]
  }
}
...
A: {
  "jsonrpc": "2.0",
  "method": "repeated-result",
  "params": {
    "id": "active-remotes",
    "result": {
      "buckets": [
        {
          "headers": { "remote-name-primary": [ "example.com", "example.org" ] },
          "stats": [
            {
              "in": { "avg-speed": 230, "max-speed": 735, "packets": 91, "size": 9840, "flows": 1, "start": 1501079429522, "end": 1501079493510 },
              "out": { "avg-speed": 191, "max-speed": 315, "packets": 124, "size": 11458, "flows": 1, "start": 1501079429522, "end": 1501079493510 }
            }
          ]
        }
      ]
    }
  }
}
F: { "jsonrpc": "2.0", "id": 43, "method": "repeated", "params": { "id": "active-remotes" } }
A: { "jsonrpc": "2.0", "id": 43, "result": {} }
8.1 Notes
If the query is being replaced and the new query fails, there's no guarantee whether the old query stays active or is deactivated.
If relative times are used in start and end, they are relative to each time the query is run. Therefore, the above example always queries the latest half hour.
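On the frontend side, repeated queries need bookkeeping keyed by their string id. The following Python sketch builds the activation and deactivation requests and routes repeated-result notifications to per-query callbacks. `RepeatedQueries` is a hypothetical class. It registers the callback optimistically. A real client should drop the callback again if the response to the activation request is an error, because the query is then not activated (or, when replacing, may be in an unknown state). The first result arrives in the ordinary response and has to be handled through the request id.

```python
from typing import Any, Callable

ResultHandler = Callable[[dict[str, Any]], None]


class RepeatedQueries:
    """Tracks active repeated queries and routes their periodic results."""

    def __init__(self) -> None:
        self._handlers: dict[str, ResultHandler] = {}

    def activate(self, request_id: int, name: str, query: dict[str, Any],
                 on_result: ResultHandler) -> dict[str, Any]:
        """Build a `repeated` request that starts (or replaces) query `name`."""
        self._handlers[name] = on_result
        return {"jsonrpc": "2.0", "id": request_id, "method": "repeated",
                "params": {"id": name, "query": query}}

    def deactivate(self, request_id: int, name: str) -> dict[str, Any]:
        """Build a `repeated` request without `query`, which stops `name`."""
        self._handlers.pop(name, None)
        return {"jsonrpc": "2.0", "id": request_id, "method": "repeated",
                "params": {"id": name}}

    def on_notification(self, message: dict[str, Any]) -> bool:
        """Deliver a `repeated-result` notification; return True if it was one."""
        if message.get("method") != "repeated-result":
            return False
        handler = self._handlers.get(message["params"]["id"])
        if handler is not None:  # late results for a stopped query are dropped
            handler(message["params"]["result"])
        return True
```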