Skip to content

Latest commit

 

History

History
560 lines (382 loc) · 33 KB

README.md

File metadata and controls

560 lines (382 loc) · 33 KB

ADS-B Massive Visualizer

This website (technology demo) allows you to aggregate and visualize massive amounts of air traffic data. The data is hosted in a ClickHouse database and queried on the fly. You can tune the visualizations with custom SQL queries and drill-down from 50 billion records to individual data records.

Examples

The visualization is insanely beautiful.

Helicopters over Manhattan

Denver airport

If we zoom into a single airport, we can see where the planes are parked and even color them by a manufacturer or an airline:

Denver airport

You can select only military airplanes and find military bases and airfields.

Military training in Texas

Langley

You can select only Helicopters and find hospitals and police stations.

VUMC

London helicopters

Las Vegas helicopters

Gliders are also interesting.

Gliders in Netherlands near Utrecht

Gliders in Netherlands near Utrecht

Small airports are beautiful.

Bay Area small airports

Dallas small airports

F-16 military training:

F-16 air bases in the US

F-16 training in Langley

F-16 training in Belgium

It is interesting to explore strange gaps in the map.

Kaliningrad

Popocatepetl

Canary

Balloons and dirigibles are nice.

Indianapolis Balloons

Project Loon

Balloons over Alps

Zeppelin

Zeppelin in LA

In Dubai Airport, the green hairball is a hangar of Emirates Engineering where Airbuses are maintained: Emirates Engineering

In Israel, there are strange patterns made by patrolling drones: Patroling in Israel

In Namibia, there is a nice air club: Namibia air club

Near Toulouse, France, Airbus A-380 does its test flights:

A-380 in Toulouse

Cognac Air Base

Area 51

By editing the SQL query you can color every airline with its own color. Airlines all over Europe

Airlines in Denver airport

Analyzing a single airline we can see how the war affected the air traffic. War

There are interesting examples of single aircrafts: Police

An-124 is a beautiful cargo plane, and you can find many of them in Leipzig: An-124

But DC-3 is unanimously the best:

DC-3

Data Source

ADS-B (Automatic Dependent Surveillance–Broadcast) is a radio protocol that is used to broadcast the data such as coordinates, altitude and velocity, and plenty of other interesting data. It is broadcast by "transponders" installed on airplanes (and not only planes). This protocol is unencrypted and there are no restrictions for collecting, using, or redistributing this data. Most passenger airplanes are obligated to broadcast this data, and in certain countries, even gliders, drones, and airport ground vehicles. Military and private light aircrafts might broadcast or not broadcast the data.

It is possible to collect this data out of thin air using your own radio receiver (e.g., in the form of SDR), although your receiver will see the data only in a certain range of your location. There are platforms for sharing and exchange of this data. Some of these platforms invite participants to share the data but restrict its redistribution by providing commercial access to it. While the source data, broadcast by airplanes, is essentially public domain, the companies may produce and license derivative works from this data.

We use the data from three sources: ADSB.lol (full historical data is provided without restrictions: 30..50 million records per day, available since 2023), Airplanes.live (full historical data since 2023, 30..70 million records per day, and live real-time feed - they provide this data to us with the following restrictions: - no reselling the data; - no commercial use (except for within this application) without an agreement; - attribution is required) and ADSB-Exchange (only provides samples of data from first day of each month: around 1.2 billion records per day with better coverage).

The data from ADSB.lol, is made available under the Open Database License (ODbL).

Implementation Details

The website is implemented as a single HTML page. It does not use JavaScript frameworks and the source code is not minified, so you can easily read it.

Rendering the Map

It uses a Leaflet library to display the map. It adds two layers to the map. The background layer uses tiles from OpenStreetMap for a usual geographic map. The main layer overlays the visualization on top of the background map.

The visualization layer uses a GridLayer with a custom callback function createTile which generates Canvas elements on the fly:

L.GridLayer.ClickHouse = L.GridLayer.extend({
    createTile: function(coords, done) {
        let tile = L.DomUtil.create('canvas', 'leaflet-tile');
        tile.width = 1024;
        tile.height = 1024;
        render(this.options.table, this.options.priority, coords, tile).then(err => done(err, tile));
        return tile;
    }
});

const layer_options = {
    tileSize: 1024,
    minZoom: 2,
    maxZoom: 19,
    minNativeZoom: 2,
    maxNativeZoom: 16,
    attribution: '© Alexey Milovidov, ClickHouse, Inc. (data: adsb.lol, airplanes.live, adsbexchange.com)'
};

It uses tiles of 1024x1024 size for high resolution and to lower the number of requests to the database.

The rendering function performs a request to ClickHouse using its HTTP API with the JavaScript's fetch function:

const query_id = `${uuid}-${query_sequence_num}-${table}-${coords.z - 2}-${coords.x}-${coords.y}`;
const hosts = getHosts(key);
const url = host => `${host}/?user=website&default_format=RowBinary` +
    `&query_id=${query_id}&replace_running_query=1` +
    `&param_table=${table}&param_sampling=${[0, 100, 10, 1][priority]}` +
    `&param_z=${coords.z - 2}&param_x=${coords.x}&param_y=${coords.y}`;

progress_update_period = 1;
const response = await Promise.race(hosts.map(host => fetch(url(host), { method: 'POST', body: sql })));

The SQL query can be edited by a user on the fly using a form on the page to adjust the visualization. It is a parameterized query, accepting tile coordinates (x, y) and zoom level as parameters.

The query returns RGBA values of each pixel of the image in the RowBinary format (1024x1024 pixels, 1048576 rows, 4 bytes each, 4 MiB in total for each tile). It uses zstd compression in HTTP response as long as the browser supports it. It was a nice observation that zstd compression over raw pixels bitmap works better than PNG! (not surprising, though).

Most of the time the image data compresses several times. But anyway, hundreds of megabytes have to be transferred over the network. That's why the service can feel slow on bad Internet connections.

let ctx = tile.getContext('2d');
let image = ctx.createImageData(1024, 1024, {colorSpace: 'display-p3'});
let arr = new Uint8ClampedArray(buf);

for (let i = 0; i < 1024 * 1024 * 4; ++i) { image.data[i] = arr[i]; }

ctx.putImageData(image, 0, 0, 0, 0, 1024, 1024);

The data is put on the canvas and returned. We use the "Display P3" color space to have a wider gamut in supporting browsers.

We use three different tables with different levels of detail: planes_mercator contains 100% of the data, planes_mercator_sample10 contains 10% of the data, and planes_mercator_sample100 contains 1% of the data. The loading starts with a 1% sample to provide instant response even while rendering the whole world. After loading the first level of detail, it continues to the next level of 10%, and then it continues with 100% of the data. This gives a nice effect of progressive loading.

The image data is also cached on the client simply in a JavaScript object:

if (!cached_tiles[key]) cached_tiles[key] = [];

/// If there is a higer-detail tile, skip rendering of this level of detal.
if (cached_tiles[key][priority + 1]) return;

buf = cached_tiles[key][priority];

The downside is that after browsing for a certain time, the page will eat too much memory.

Database and Queries

The database is pretty small by ClickHouse standards. It has 44.47 billion rows in the planes_mercator table as of Mar 29th, 2024, and is continuously updated with new records. It takes 1.6 TB of disk space.

The table schema is as follows (you can read it in the setup.sql source):

CREATE TABLE planes_mercator
(
    mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((lon + 180) / 360),
    mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * (1/2 - log(tan((lat + 90) / 360 * pi())) / 2 / pi()),

    INDEX idx_x (mercator_x) TYPE minmax,
    INDEX idx_y (mercator_y) TYPE minmax,

    time DateTime64(3),
    date Date,
    icao String,
    r String,
    t LowCardinality(String),
    dbFlags Int32,
    noRegData Bool,
    ownOp LowCardinality(String),
    year UInt16,
    desc LowCardinality(String),
    lat Float64,
    lon Float64,
    altitude Int32,
    ground_speed Float32,
    track_degrees Float32,
    flags UInt32,
    vertical_rate Int32,
    aircraft_alert Int64,
    aircraft_alt_geom Int64,
    aircraft_gva Int64,
    aircraft_nac_p Int64,
    aircraft_nac_v Int64,
    aircraft_nic Int64,
    aircraft_nic_baro Int64,
    aircraft_rc Int64,
    aircraft_sda Int64,
    aircraft_sil Int64,
    aircraft_sil_type LowCardinality(String),
    aircraft_spi Int64,
    aircraft_track Float64,
    aircraft_type LowCardinality(String),
    aircraft_version Int64,
    aircraft_category Enum8(
        'A0', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7',
        'B0', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7',
        'C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7',
        'D0', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7',
        ''),
    aircraft_emergency Enum8('', 'none', 'general', 'downed', 'lifeguard', 'minfuel', 'nordo', 'unlawful', 'reserved'),
    aircraft_flight LowCardinality(String),
    aircraft_squawk String,
    aircraft_baro_rate Int64,
    aircraft_nav_altitude_fms Int64,
    aircraft_nav_altitude_mcp Int64,
    aircraft_nav_modes Array(Enum8('althold', 'approach', 'autopilot', 'lnav', 'tcas', 'vnav')),
    aircraft_nav_qnh Float64,
    aircraft_geom_rate Int64,
    aircraft_ias Int64,
    aircraft_mach Float64,
    aircraft_mag_heading Float64,
    aircraft_oat Int64,
    aircraft_roll Float64,
    aircraft_tas Int64,
    aircraft_tat Int64,
    aircraft_true_heading Float64,
    aircraft_wd Int64,
    aircraft_ws Int64,
    aircraft_track_rate Float64,
    aircraft_nav_heading Float64,
    source LowCardinality(String),
    geometric_altitude Int32,
    geometric_vertical_rate Int32,
    indicated_airspeed Int32,
    roll_angle Float32
) ENGINE = MergeTree ORDER BY (mortonEncode(mercator_x, mercator_y), time);

It contains lat and lon columns with coordinates, and we use MATERIALIZED columns to automatically convert them to coordinates in the Web-Mercator projection, which is used by the Leaflet software and most of the maps on the Internet. The Mercator coordinates are stored in UInt32 which makes it easy to do arithmetics with tile coordinates and zoom levels in a SQL query.

The table is sorted by a Morton Curve of Web Mercator coordinates, and we defined a minmax index on them - this is how queries for certain tiles will read only the requested data.

We create level-of-detail tables with Materialized Views, so they are calculated automatically:

CREATE TABLE planes_mercator_sample10 AS planes_mercator;
CREATE TABLE planes_mercator_sample100 AS planes_mercator;

CREATE MATERIALIZED VIEW view_sample10 TO planes_mercator_sample10 AS SELECT * FROM planes_mercator WHERE rand() % 10 = 0;
CREATE MATERIALIZED VIEW view_sample100 TO planes_mercator_sample100 AS SELECT * FROM planes_mercator WHERE rand() % 100 = 0;

We use a service on our staging environment in ClickHouse Cloud. The staging environment is used to test new ClickHouse versions and new types of infrastructure that we implement. For example, we can try different types of machines, and different scales of the service, or we can test new features, such as distributed cache, that are in development. Also, the staging environment uses fault injection: we interrupt network connections with a certain probability to ensure that the service operates normally. It uses chaos engineering: we terminate various machines of clickhouse-server and clickhouse-keeper at random, and also randomly scale the service back and forth to a different number of machines. This is how this project facilitates the development and testing of our service.

We also created a backup service and we are making requests to both of them concurrently. Whichever service returns first, will be used. This is how we can avoid downtime while still using our staging environment.

Let's take a look at an SQL query for the "Boeing vs. Airbus" visualization:

WITH
    bitShiftLeft(1::UInt64, {z:UInt8}) AS zoom_factor,
    bitShiftLeft(1::UInt64, 32 - {z:UInt8}) AS tile_size,

    tile_size * {x:UInt16} AS tile_x_begin,
    tile_size * ({x:UInt16} + 1) AS tile_x_end,

    tile_size * {y:UInt16} AS tile_y_begin,
    tile_size * ({y:UInt16} + 1) AS tile_y_end,

    mercator_x >= tile_x_begin AND mercator_x < tile_x_end
    AND mercator_y >= tile_y_begin AND mercator_y < tile_y_end AS in_tile,

    bitShiftRight(mercator_x - tile_x_begin, 32 - 10 - {z:UInt8}) AS x,
    bitShiftRight(mercator_y - tile_y_begin, 32 - 10 - {z:UInt8}) AS y,

    y * 1024 + x AS pos,

    count() AS total,
    sum(desc LIKE 'BOEING%') AS boeing,
    sum(desc LIKE 'AIRBUS%') AS airbus,
    sum(NOT (desc LIKE 'BOEING%' OR desc LIKE 'AIRBUS%')) AS other,

    greatest(1000000 DIV {sampling:UInt32} DIV zoom_factor, total) AS max_total,
    greatest(1000000 DIV {sampling:UInt32} DIV zoom_factor, boeing) AS max_boeing,
    greatest(1000000 DIV {sampling:UInt32} DIV zoom_factor, airbus) AS max_airbus,
    greatest(1000000 DIV {sampling:UInt32} DIV zoom_factor, other) AS max_other,

    pow(total / max_total, 1/5) AS transparency,

    255 * (1 + transparency) / 2 AS alpha,
    pow(boeing, 1/5) * 256 DIV (1 + pow(max_boeing, 1/5)) AS red,
    pow(airbus, 1/5) * 256 DIV (1 + pow(max_airbus, 1/5)) AS green,
    pow(other, 1/5) * 256 DIV (1 + pow(max_other, 1/5)) AS blue

SELECT round(red)::UInt8, round(green)::UInt8, round(blue)::UInt8, round(alpha)::UInt8
FROM {table:Identifier}
WHERE in_tile
GROUP BY pos ORDER BY pos WITH FILL FROM 0 TO 1024*1024

It uses the WITH clause to define aliases for various expressions. It uses type-safe parameters (x, y, z, sampling, table) for parameterized queries.

The first part of the query calculates the condition in_tile which is used in the WHERE section to filter the data in the requested tile. Then it calculates the colors: alpha, red, green, and blue. They are adjusted by the pow function for better uniformity, clamped to the 0..255 range, and converted to UInt8. The sampling parameter is used for adjustment in a way so that queries with a lower level of detail will return pictures with mostly the same relative colors. We group by the pixel coordinate pos and also use the WITH FILL modifier in the ORDER BY to fill zeros in the pixel positions that have no data. As a result, we will get an RGBA bitmap with the exact 1024x1024 size.

Reports

If you select an area with the right mouse button or use a selection tool, it will generate a report from the database for the selection area. This is entirely straightforward. For example, here is a query for the top aircraft types:

const sql_types = `
    WITH mercator_x >= {left:UInt32} AND mercator_x < {right:UInt32}
        AND mercator_y >= {top:UInt32} AND mercator_y < {bottom:UInt32} AS in_tile
    SELECT t, anyIf(desc, desc != '') AS desc, count() AS c
    FROM {table:Identifier}
    WHERE t != '' AND ${condition}
    GROUP BY t
    ORDER BY c DESC
    LIMIT 100`;

The report is calculated for flight numbers, aircraft types, registration (tail numbers), and owners. You can click on any item and it will apply a filter to the main SQL query. For example, click on A388 and it will show you a visualization for Airbus 380-800.

As a bonus, if you move the cursor over an aircraft type, it will go to Wikipedia API and try to find a picture of this aircraft. It often finds something else on Wikipedia, though.

Saved Queries

You can edit a query and then share a link. The query is converted to a 128-bit hash and saved in the same ClickHouse database:

async function saveQuery(text) {
    const sql = `INSERT INTO saved_queries (text) FORMAT RawBLOB`;
    const hosts = getHosts(null);
    const url = host => `${host}/?user=website_saved_queries&query=${encodeURIComponent(sql)}`;
    const response = await Promise.all(hosts.map(host => fetch(url(host), { method: 'POST', body: text })));
}

async function loadQuery(hash) {
    const sql = `SELECT text FROM saved_queries WHERE hash = unhex({hash:String}) LIMIT 1`;
    const hosts = getHosts(null);
    const url = host => `${host}/?user=website_saved_queries&default_format=JSON&param_hash=${hash}`;
    const response = await Promise.race(hosts.map(host => fetch(url(host), { method: 'POST', body: sql })));
    const data = await response.json();
    return data.data[0].text;
}

We use a different user website_saved_queries with different access control and quotas for these queries.

Progress Bar

It is nice to display a progress bar with the amount of data processed in rows and bytes.

const sql = `SELECT
            sum(read_rows) AS r,
            sum(total_rows_approx) AS t,
            sum(read_bytes) AS b,
            r / max(elapsed) AS rps,
            b / max(elapsed) AS bps,
            formatReadableQuantity(r) AS formatted_rows,
            formatReadableSize(b) AS formatted_bytes,
            formatReadableQuantity(rps) AS formatted_rps,
            formatReadableSize(bps) AS formatted_bps
        FROM clusterAllReplicas(default, system.processes)
        WHERE user = 'website' AND startsWith(query_id, {uuid:String})`;

    const hosts = getHosts(uuid);
    const url = host => `${host}/?user=website_progress&default_format=JSON&param_uuid=${uuid}`;

    let responses = await Promise.all(hosts.map(host => fetch(url(host), { method: 'POST', body: sql })));
    let datas = await Promise.all(responses.map(response => response.json()));

We just select from system.processes across all servers of the cluster. Actually, it displays not the precise progress, because there are many tiles requested in parallel and many queries, with some of them finished and some in progress. The query will see only in-progress queries, so the number of total processed records will be lower than the actual.

We also color the progress bar differently when we are loading the first level of detail, the second level of detail, etc.

Cache Locality

The service in ClickHouse Cloud can use multiple replicas and by default, the requests are routed to an arbitrary healthy replica. If a query is going to process a large amount of data, it will be parallelized across many replicas, but if it is a small query it can use just a single replica. The data is stored in AWS S3 and each replica pod also has a locally attached SSD which is used for the cache, and consequently, the page cache in memory also makes an effect.

To achieve better performance, we use a "sticky connections" feature of ClickHouse Cloud:

let prefix = '';
if (stick) {
    const hash = sipHash128(JSON.stringify(stick)).substr(0, 3);
    prefix = `${hash}.sticky.`;
}
const cloud_host = `https://${prefix}kvzqttvc2n.eu-west-1.aws.clickhouse-staging.com`;

It maps a query to a pseudorandom hostname, e.g. xyz.sticky.kvzqttvc2n.eu-west-1.aws.clickhouse-staging.com so that the same query will be mapped to the same hostname. The load balancer maps this hostname to a replica using a consistent hashing algorithm.

References

Similar Projects

Similar projects track back to 2007. Analysis and visualization of the air traffic and geospatial data are easy and accessible (assuming you use ClickHouse).

Flight Patterns by Aaron Koblin (2007)

Airspace Rebooted by ItoWorld and Hal Bertram (2010)

Also, read the blog post about rebuilding this heatmap (2017).

Highways In The Skies by Topi Tjukanov (2018)

Flight Data Visualization by Stephen (2018)

Carto.com: BigQuery Tiler (2021)

Due to the limitations of the underlying technology (BigQuery instead of ClickHouse), the service works slowly and each query comes with a big cost.

One Day of Global Air Traffic by Carlos Bergillos (2022)

This project is very similar to mine, but I found it late. It uses pre-generated tiles instead of generating them on the fly. The author probably didn't know about ClickHouse. This limits it to just a single visualization.

Heavy.AI: Ships Tracking (2022)

This is a demo from another database vendor. They use a smaller dataset (AIS data for ship tracking) but the demo is slow.

Global Flight Tracking by Mark Litwintschik (2023)

This blog post shows how to analyze a single day of data using a Python script. The technology stack used in the article is not powerful enough to analyze the full dataset on the fly, which is only possible with ClickHouse.

List Of Exchanges

I want to help you with your database infrastructure. Please contact me here.