SOAR-EU (Scalable Open Automatable Reproducible — European Urban) is a pedestrian-scale urban data model for the EU-funded TWIN2EXPAND project. It produces standardised, multi-scale spatial metrics at street-segment level for 626 urban centres across EU-27 + Norway, Liechtenstein, and Switzerland.
uv sync
Project configuration is managed using pyproject.toml. uv is used for package management: uv sync installs all dependencies into a .venv folder.
All scripts read the data root from the T2E_DATA_DIR environment variable. Set it in your .env file or export it in your shell:
# Option 1: add to .env (recommended)
T2E_DATA_DIR=/path/to/your/data
# Option 2: export in your shell
export T2E_DATA_DIR=/path/to/your/data
Copy .env.example to .env and fill in T2E_DATA_DIR and any Zenodo credentials you need:
cp .env.example .env
The pipeline requires several external datasets to be downloaded before processing. Each step below produces a GeoPackage that feeds into the next. All commands should be run from the repository root.
All scripts resolve data paths from T2E_DATA_DIR automatically (loaded from .env). Paths can also be passed as positional arguments to override the defaults.
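The precedence described above (an explicit path overrides T2E_DATA_DIR) can be illustrated with a minimal sketch. The function name `resolve_data_dir` is hypothetical and is not taken from the repository, and loading of the .env file is left out; the sketch only shows the override order.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(cli_path: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> Path:
    """Return the data root: an explicit path wins, else T2E_DATA_DIR.

    Hypothetical helper for illustration; the repository's scripts may
    resolve paths differently.
    """
    raw = cli_path or env.get("T2E_DATA_DIR")
    if not raw:
        raise RuntimeError("Set T2E_DATA_DIR in .env or pass a path explicitly")
    return Path(raw).expanduser()


# Environment variable only: the value of T2E_DATA_DIR is used.
print(resolve_data_dir(None, {"T2E_DATA_DIR": "/path/to/your/data"}))
# Positional override: the explicit path takes precedence.
print(resolve_data_dir("/tmp/other-data", {"T2E_DATA_DIR": "/path/to/your/data"}))
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real shell environment.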
Boundaries are extracted from the GHS Urban Centre Database (GHS-UCDB) R2024A produced by the European Commission Joint Research Centre. Urban centres are defined using the Degree of Urbanisation (DEGURBA) methodology: contiguous 1 km^2 cells with at least 1,500 residents per km^2 and a cumulative population of at least 50,000. The dataset is available under the European Commission reuse policy (Decision 2011/833/EU).
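To make the two thresholds concrete, here is a toy sketch of the urban-centre rule on a small population grid. It is not the JRC implementation and is not part of this repository: it assumes that "contiguous" means cells sharing an edge, and it ignores any further processing the official workflow may apply. The function name `urban_centres` and the sample grid are illustrative only.

```python
from collections import deque

MIN_DENSITY = 1_500      # residents per 1 km^2 cell
MIN_POPULATION = 50_000  # cumulative residents per cluster


def urban_centres(grid):
    """Return clusters of (row, col) cells that meet both thresholds.

    `grid` is a list of rows; each value is the population of one 1 km^2
    cell, so the value doubles as the density per km^2.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    centres = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or grid[r][c] < MIN_DENSITY:
                continue
            # Breadth-first flood fill over edge-sharing dense cells
            # (assumption: diagonal neighbours do not count as contiguous).
            cluster, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] >= MIN_DENSITY):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            # Keep the cluster only if its total population is large enough.
            if sum(grid[y][x] for y, x in cluster) >= MIN_POPULATION:
                centres.append(sorted(cluster))
    return centres


toy = [
    [20_000, 18_000, 100, 2_000],
    [15_000, 1_000, 100, 3_000],
    [200, 300, 100, 1_600],
]
# The left cluster totals 53,000 residents and qualifies; the right-hand
# column is dense enough but totals only 6,600, so it is dropped.
print(urban_centres(toy))  # → [[(0, 0), (0, 1), (1, 0)]]
```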
Download the GHS-UCDB GeoPackage from the above link, then run:
python -m src.data.generate_boundary_polys
Urban Atlas 2021 (~34 GB FlatGeobuf vectors, DOI). Download via the Copernicus Data Space Ecosystem S3 endpoint (see CDSE download instructions below).
python -m src.data.load_urban_atlas_blocks
Street Tree Layer 2021 (~4 GB FlatGeobuf vectors). Download via CDSE S3 alongside Urban Atlas.
python -m src.data.load_urban_atlas_trees
Digital Height Model (~1 GB raster).
python -m src.data.load_bldg_hts_raster
This step downloads and clips Overture layers (buildings, street edges/nodes, POI places, infrastructure) per city boundary. Each city is saved as a separate GeoPackage.
python -m src.data.load_overture --parallel_workers 6 --zip
The Overture POI schema is based on overture_categories.csv.
Eurostat Census Grid 2021 — population and demographic statistics aggregated to 1 km^2 cells. Download the Version 2021 ZIP dataset.
Compute all street-segment metrics:
python -m src.processing.generate_metrics --zip
Both Copernicus vector datasets (Urban Atlas and Street Tree Layer) are distributed as FlatGeobuf files via the Copernicus Data Space Ecosystem S3 endpoint.
- Create an account at https://dataspace.copernicus.eu/
- Generate S3 credentials from your account dashboard (save the secret key immediately)
- Configure the AWS CLI:
aws configure
# Access Key ID: <your CDSE access key>
# Secret Access Key: <your CDSE secret key>
# Default region: (leave blank)
# Default output format: json
export AWS_ENDPOINT_URL=https://eodata.dataspace.copernicus.eu/
The CDSE S3 endpoint does not return files inside subdirectories in a flat listing, so aws s3 cp --recursive alone downloads nothing. Iterate over city directories:
# Urban Atlas 2021 (~34 GB)
S3_BASE="s3://EODATA/CLMS/land_cover_use_in_priority_areas/urban_atlas/clms_ua_land-cover-land-use_europe_V025ha_3yearly_v1/2021/01/01"
DEST="$T2E_DATA_DIR/UA_2021_3035_eu"
aws s3 ls "$S3_BASE/" | awk '{print $2}' | while read dir; do
aws s3 cp "$S3_BASE/$dir" "$DEST/$dir" --recursive
done
# Street Tree Layer 2021 (~4 GB)
S3_BASE="s3://EODATA/CLMS/land_cover_use_in_priority_areas/urban_atlas/clms_ua_street-tree-layer_europe_V005ha_3yearly_v1/2021/01/01"
DEST="$T2E_DATA_DIR/STL_2021_3035_eu"
aws s3 ls "$S3_BASE/" | awk '{print $2}' | while read dir; do
aws s3 cp "$S3_BASE/$dir" "$DEST/$dir" --recursive
done
Reference: https://documentation.dataspace.copernicus.eu/APIs/S3.html
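The per-directory loop above can also be sketched in Python, which makes it easier to inspect the commands before running them. This is an illustrative sketch, not code from the repository: the helper names `list_prefixes` and `copy_commands` are hypothetical, the sample listing uses made-up directory names, and it assumes that `aws s3 ls` marks subdirectories with a leading `PRE` field. Like the awk version, it does not handle directory names containing spaces.

```python
import shlex


def list_prefixes(ls_output: str) -> list[str]:
    """Extract subdirectory names from the text printed by `aws s3 ls`.

    Only lines whose first field is PRE are kept, so plain object rows
    (date, time, size, key) are skipped.
    """
    prefixes = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "PRE":
            prefixes.append(fields[1])
    return prefixes


def copy_commands(s3_base: str, dest: str, prefixes: list[str]) -> list[list[str]]:
    """Build one `aws s3 cp --recursive` command per subdirectory."""
    return [
        ["aws", "s3", "cp", f"{s3_base}/{p}", f"{dest}/{p}", "--recursive"]
        for p in prefixes
    ]


# Made-up listing for illustration; real directory names will differ.
sample_listing = """\
                           PRE CITY_A/
                           PRE CITY_B/
"""
for cmd in copy_commands("s3://EXAMPLE/base", "/path/to/your/data/UA_2021_3035_eu",
                         list_prefixes(sample_listing)):
    print(shlex.join(cmd))
```

To perform the download, each command list could be passed to `subprocess.run(cmd, check=True)` with AWS_ENDPOINT_URL set as shown earlier.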
The processed dataset can be uploaded to Zenodo using paper_data/zenodo_upload.py. The script bundles per-city GeoPackages by country (to stay within Zenodo's 100-file limit), sets deposit metadata, and supports resumable uploads.
Ensure ZENODO_TOKEN and ZENODO_RECORD_ID are set in your .env file, then:
# Preview what will be uploaded
uv run python paper_data/zenodo_upload.py --dry-run --bundle
# Bundle by country and upload (resumable)
uv run python paper_data/zenodo_upload.py --bundle --resume
# Update metadata only
uv run python paper_data/zenodo_upload.py --metadata-only
Bundles are saved to $T2E_DATA_DIR/zenodo_bundles/ by default (override with --bundle-dir).
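The idea behind bundling by country can be shown with a short sketch. It does not reproduce paper_data/zenodo_upload.py: the function name `bundle_by_country` is hypothetical, and the file naming (a two-letter country code before the first underscore) is an assumption made only for this example.

```python
from collections import defaultdict
from pathlib import Path

MAX_FILES = 100  # per-deposit file limit mentioned above


def bundle_by_country(paths):
    """Group per-city GeoPackage names by a country-code filename prefix.

    Assumes names like "at_vienna.gpkg"; the real naming scheme used by
    the pipeline may differ.
    """
    bundles = defaultdict(list)
    for p in map(Path, paths):
        country = p.name.split("_", 1)[0].upper()
        bundles[country].append(p.name)
    if len(bundles) > MAX_FILES:
        raise ValueError(f"{len(bundles)} bundles exceed the {MAX_FILES}-file limit")
    # Sort countries and file names so the result is deterministic.
    return {c: sorted(names) for c, names in sorted(bundles.items())}


# Made-up file names for illustration.
files = ["at_vienna.gpkg", "be_ghent.gpkg", "at_graz.gpkg"]
print(bundle_by_country(files))
# → {'AT': ['at_graz.gpkg', 'at_vienna.gpkg'], 'BE': ['be_ghent.gpkg']}
```

Grouping several hundred per-city files into one archive per country keeps the number of uploaded files well under the limit.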
| Source | Content | Licence |
|---|---|---|
| GHS-UCDB R2024A | Urban centre boundary polygons | EC reuse policy (Decision 2011/833/EU) |
| Overture Maps (Transportation, Buildings) | Street networks, building footprints | ODbL |
| Overture Maps (Places) | POI places | CDLA-Permissive-2.0 |
| Overture Maps (Infrastructure) | Transit stops, street furniture, parking | ODbL |
| Copernicus Urban Atlas 2021 | Land-cover/land-use blocks | EEA reuse policy (Directive 2003/98/EC) |
| Copernicus Street Tree Layer 2021 | Tree canopy polygons | EEA reuse policy (Directive 2003/98/EC) |
| Copernicus Digital Height Model 2012 | Building height raster (10 m, EPSG:3035) | EEA reuse policy (Directive 2003/98/EC) |
| Eurostat Census Grid 2021 | Population/demographic cells (1 km^2) | EC reuse policy (Decision 2011/833/EU) |
This repository depends on copyleft open-source packages licensed under AGPLv3 and therefore adopts the same licence for the code. The dataset published on Zenodo is licensed under the Open Database License (ODbL 1.0) to comply with the share-alike requirements of the Overture Maps layers.
- Data paper — SOAR-EU dataset description and POI validation (Data in Brief)
- Atlas paper — Morphological typology of European cities (CEUS)
If you use this dataset or code, please cite:
Simons, G. (2026). SOAR-EU: Scalable Open Automatable Reproducible pedestrian-scale urban metrics for 626 European urban centres. Available at: https://github.com/UCL/t2e-soar-eu