scripts for downloading and processing ABS boundaries

alanboth/abs_boundaries

ABS-derived boundary files

The purpose of this repository is to provide a convenient method for downloading the various boundary files from the Australian Bureau of Statistics (ABS) and converting them into a consistent format. Some benefits of this code include:

  • A single place to get all the ABS boundary files across 2011, 2016 and 2021

  • Automatically downloads all necessary files

  • Consistently formatted column headings across the years (SA1_MAIN11, SA1_MAINCODE_2016 and SA1_CODE_2021 are all now sa1_code)

  • Includes population-weighted centroids for each boundary type that are guaranteed to lie within the original region’s geometry (i.e., no centroids in the ocean)

  • Automatically converts the population and dwelling counts in the ABS Excel spreadsheets into clean CSVs, replacing an error-prone manual extraction step

  • Some suburbs lie in multiple LGAs; this code stores the largest two LGAs for each suburb

  • Produces a single sqlite file for each census year, which can be used across projects

  • Includes code that can filter the boundaries down to just the cities or states required, optionally including a user-specified buffer

Output files are available here.

Quick setup

This assumes that R is already installed.

# install any missing R packages
Rscript installPackages.R
# download the raw boundary files from the ABS
Rscript downloadData.R --year='2021'
# convert the data into a consistent output format
Rscript convertData.R --year='2021'

downloadData.R

Synopsis

Rscript downloadData.R [--year]

Description

downloadData.R downloads the data for a given year from the ABS website. It also converts the meshblock count spreadsheets into plain CSV files and extracts the geopackage files from their zip archives. A file that has already been downloaded is not downloaded again.

--year

Specify the year desired. Currently 2011, 2016 and 2021 are supported.

Example

Rscript downloadData.R --year='2011'
Rscript downloadData.R --year='2016'
Rscript downloadData.R --year='2021'

convertData.R

Synopsis

Rscript convertData.R [--year]

Prerequisites

Description

convertData.R converts the downloaded data into a single sqlite file. The following tables are created:

  • australia
  • state
  • cities
  • lga (local government areas)
  • ssc (state suburbs/suburbs and localities)
  • poa (postcodes)
  • sa4 (statistical area level 4)
  • sa3 (statistical area level 3)
  • sa2 (statistical area level 2)
  • sa1 (statistical area level 1)
  • mb (meshblocks)
  • state_centroids
  • cities_centroids
  • lga_centroids
  • ssc_centroids
  • poa_centroids
  • sa4_centroids
  • sa3_centroids
  • sa2_centroids
  • sa1_centroids
  • mb_centroids

--year

Specify the year desired. Currently 2011, 2016 and 2021 are supported. Note: 2011 lacks an Australia shapefile, so the one from 2016 is used instead. downloadData.R must therefore be run for both 2011 and 2016 before 2011 can be converted.

Example

Rscript convertData.R --year='2011'
Rscript convertData.R --year='2016'
Rscript convertData.R --year='2021'
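
Because converting 2011 depends on the 2016 download, one way to process every supported year is to finish all downloads before starting any conversion. The following is a minimal sketch rather than part of the repository: the function name run_all_years and the RUNNER variable are assumptions made for illustration. Setting RUNNER to echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# RUNNER is an assumption of this sketch: leave it empty to run Rscript,
# or set RUNNER=echo to print each command instead of executing it.
RUNNER="${RUNNER:-}"

run_all_years() {
  # Download every year first, so the 2016 Australia boundary is
  # already on disk by the time 2011 is converted.
  for year in 2011 2016 2021; do
    $RUNNER Rscript downloadData.R --year="$year" || return 1
  done
  # Convert only after all downloads have succeeded.
  for year in 2011 2016 2021; do
    $RUNNER Rscript convertData.R --year="$year" || return 1
  done
}
```

Calling run_all_years from the repository root would then issue three download commands followed by three convert commands, stopping at the first failure.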

filterToStudyRegion.R

Synopsis

Rscript filterToStudyRegion.R [--year] [--cities] [--states] [--buffer] [--epsg] [--format] [--filename]

Prerequisites

Description

filterToStudyRegion.R filters the converted data down to the desired study region.

--year

Specify the year desired. Currently 2011, 2016 and 2021 are supported.

--cities

Comma-separated list of cities to filter to; a single city is also accepted. Optional, but cannot be combined with --states.

--states

Comma-separated list of states to filter to; a single state is also accepted. Optional, but cannot be combined with --cities.

--buffer

Specify the buffer distance in metres around the desired cities or states to be included. Not mandatory.

--epsg

Specify the EPSG code of the coordinate reference system to transform the data into. Optional; if nothing is specified, the source dataset's coordinate reference system is used.

--format

Specify the file format for your output dataset. Defaults to 'sqlite', but geopackage ('gpkg') is also supported.

--filename

Specify the filename for your output dataset.

Example

# A Victoria study region in sqlite format
Rscript filterToStudyRegion.R \
  --year='2016' \
  --states='Victoria' \
  --epsg='7899' \
  --format='sqlite' \
  --filename='Victoria'

# Melbourne and Sydney plus a 10km buffer in geopackage format
Rscript filterToStudyRegion.R \
  --year='2021' \
  --cities='Greater Melbourne, Greater Sydney' \
  --buffer='10000' \
  --epsg='7845' \
  --format='gpkg' \
  --filename='MelbourneAndSydney'
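
If a separate output file per city is preferred over one combined file, the script can be called once per city. The sketch below is an illustration only: the function name filter_each_city, the RUNNER variable and the rule of naming each file after the city with spaces removed are all assumptions, and only flags documented above are used.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# RUNNER is an assumption of this sketch: set RUNNER=echo to print the
# commands instead of running them.
RUNNER="${RUNNER:-}"

# filter_each_city YEAR CITY...
# Writes one geopackage per city, each with a 10 km buffer in EPSG 7845.
filter_each_city() {
  year="$1"
  shift
  for city in "$@"; do
    # Assumed naming rule: "Greater Melbourne" becomes "GreaterMelbourne".
    name=$(printf '%s' "$city" | tr -d ' ')
    $RUNNER Rscript filterToStudyRegion.R \
      --year="$year" \
      --cities="$city" \
      --buffer='10000' \
      --epsg='7845' \
      --format='gpkg' \
      --filename="$name" || return 1
  done
}
```

For example, filter_each_city 2021 'Greater Melbourne' 'Greater Sydney' would run the script twice, once for each city.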

Useful Coordinate Reference Systems

Geographic coordinate reference systems use latitude and longitude and measure distance in degrees, whereas projected ones use eastings and northings and measure distance in metres. Any geospatial analysis that involves distances will generally use a projected coordinate reference system.
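
To see why degrees make a poor distance unit, consider how the ground length of one degree of longitude changes with latitude. The sketch below uses the common spherical approximation of 111.32 km per degree at the equator, scaled by the cosine of the latitude; the function name and the city latitudes are illustrative assumptions, not values taken from this repository.

```shell
#!/bin/sh
# km_per_degree_lon LATITUDE
# Approximate east-west length of one degree of longitude, in kilometres,
# on a spherical Earth (111.32 km per degree at the equator).
km_per_degree_lon() {
  awk -v lat="$1" 'BEGIN {
    pi = atan2(0, -1)                       # awk has no built-in pi constant
    printf "%.1f\n", 111.32 * cos(lat * pi / 180)
  }'
}

km_per_degree_lon -12.46   # approximate latitude of Darwin
km_per_degree_lon -42.88   # approximate latitude of Hobart
```

Under this approximation a degree of longitude spans roughly 109 km near Darwin but only about 82 km near Hobart, so the same buffer expressed in degrees would cover different ground distances in each city. A projected CRS in metres avoids this.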

EPSG  Name                   Extent     Notes
7845  GDA2020 / GA LCC       Australia  Best projected CRS for Australia-wide distance calculations
4326  WGS 84                 World      Default geographic CRS for Australia. Useful for web maps
7844  GDA2020                Australia  Latest geographic CRS for Australia
7899  GDA2020 / Vicgrid      Victoria   Latest projected CRS for Victoria
8058  GDA2020 / NSW Lambert  NSW        Latest projected CRS for NSW
8059  GDA2020 / SA Lambert   SA         Latest projected CRS for SA

MGA Zones:

MGA zones are more accurate than the state or country-wide projections, but cover a much smaller extent. There are seven MGA zones covering mainland Australia, and they are typically used for projects where the region of interest is city-scale. The newer (GDA2020) MGA zones are generally recommended.

MGA Zone  Older EPSG  Older Name           New EPSG  New Name
50        28350       GDA94 / MGA zone 50  7850      GDA2020 / MGA zone 50
51        28351       GDA94 / MGA zone 51  7851      GDA2020 / MGA zone 51
52        28352       GDA94 / MGA zone 52  7852      GDA2020 / MGA zone 52
53        28353       GDA94 / MGA zone 53  7853      GDA2020 / MGA zone 53
54        28354       GDA94 / MGA zone 54  7854      GDA2020 / MGA zone 54
55        28355       GDA94 / MGA zone 55  7855      GDA2020 / MGA zone 55
56        28356       GDA94 / MGA zone 56  7856      GDA2020 / MGA zone 56
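
The EPSG codes in the table above follow a regular pattern: 7800 plus the zone number for GDA2020 and 28300 plus the zone number for GDA94. The sketch below encodes that pattern, and also derives a zone from a longitude using the standard UTM rule of six-degree zones, which is an assumption brought in from outside this README. The function names mga_zone and mga_epsg are hypothetical.

```shell
#!/bin/sh
# mga_zone LONGITUDE
# Zone number from a longitude in decimal degrees east, using the standard
# UTM rule floor((lon + 180) / 6) + 1. int() truncates, which equals floor
# here because lon + 180 is positive for Australian longitudes.
mga_zone() {
  awk -v lon="$1" 'BEGIN { print int((lon + 180) / 6) + 1 }'
}

# mga_epsg ZONE [DATUM]
# EPSG code for an MGA zone listed in the table (50 to 56).
# DATUM is GDA2020 (default) or GDA94.
mga_epsg() {
  zone="$1"
  datum="${2:-GDA2020}"
  if [ "$zone" -lt 50 ] || [ "$zone" -gt 56 ]; then
    echo "zone $zone is not listed in the table" >&2
    return 1
  fi
  case "$datum" in
    GDA2020) echo $((7800 + zone)) ;;
    GDA94)   echo $((28300 + zone)) ;;
    *)       echo "unknown datum: $datum" >&2; return 1 ;;
  esac
}
```

For example, Melbourne sits at roughly 144.96 degrees east, so mga_zone 144.96 gives zone 55, and mga_epsg 55 gives 7855, matching the table.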

Cities and towns in each MGA Zone:

Capital cities are marked with an asterisk (*).

Zone 50: Albany, Bunbury, Busselton, Geraldton, Greater Perth*, Karratha, Port Hedland

Zone 51: Broome, Esperance, Kalgoorlie - Boulder

Zone 52: Greater Darwin*

Zone 53: Alice Springs, Port Augusta, Port Lincoln, Port Pirie, Whyalla

Zone 54: Ballarat, Broken Hill, Colac, Greater Adelaide*, Horsham, Mildura - Buronga, Mount Gambier, Mount Isa, Murray Bridge, Portland, Swan Hill, Victor Harbor - Goolwa, Warrnambool

Zone 55: Airlie Beach - Cannonvale, Albury - Wodonga, Australian Capital Territory*, Bairnsdale, Bathurst, Bendigo, Burnie - Somerset, Cairns, Castlemaine, Devonport, Dubbo, Echuca - Moama, Emerald, Geelong, Goulburn, Greater Hobart*, Greater Melbourne*, Griffith, Launceston, Mackay, Moe - Newborough, Mudgee, Orange, Sale, Shepparton - Mooroopna, Townsville, Traralgon - Morwell, Ulverstone, Wagga Wagga, Wangaratta, Warragul - Drouin

Zone 56: Armidale, Ballina, Batemans Bay, Bowral - Mittagong, Bundaberg, Byron Bay, Camden Haven, Coffs Harbour, Forster - Tuncurry, Gladstone, Gold Coast - Tweed Heads, Grafton, Greater Brisbane*, Greater Sydney*, Gympie, Hervey Bay, Kempsey, Kingaroy, Lismore, Lithgow, Maryborough, Medowie, Morisset - Cooranbong, Muswellbrook, Nelson Bay, Newcastle - Maitland, Nowra - Bomaderry, Port Macquarie, Rockhampton, Singleton, St Georges Basin - Sanctuary Point, Sunshine Coast, Tamworth, Taree, Toowoomba, Ulladulla, Warwick, Wollongong, Yeppoon

Older Coordinate Reference Systems

These are generally no longer used, but are included for completeness.

EPSG  Name                                  Extent     Notes
3112  GDA94 / Geoscience Australia Lambert  Australia  Older projected CRS that's useful for Australia-wide distance calculations
3111  GDA94 / Vicgrid                       Victoria   Older projected CRS for Victoria
3308  GDA94 / NSW Lambert                   NSW        Older projected CRS for NSW
3107  GDA94 / SA Lambert                    SA         Older projected CRS for SA
