The purpose of this repository is to provide a convenient method for downloading the various boundary files from the Australian Bureau of Statistics (ABS) and converting them into a consistent format. Some benefits of this code include:
- A single place to get all the ABS boundary files for 2011, 2016 and 2021
- Automatically downloads all necessary files
- Consistently formatted column headings across the years (`SA1_MAIN11`, `SA1_MAINCODE_2016` and `SA1_CODE_2021` are now all `sa1_code`)
- Includes population-weighted centroids for each boundary type that are guaranteed to lie within the original region’s geometry (i.e., no centroids in the ocean)
- Extracting the population and dwelling counts from the ABS Excel spreadsheets by hand is usually error-prone. This code automatically converts them into clean CSVs
- Some suburbs can lie in multiple LGAs. This code stores the largest two for each suburb
- Produces a single SQLite file for each census year, which can be used across projects
- Includes code that can filter the boundaries down to just the cities or states required, optionally including a user-specified buffer

Output files are available here.
This assumes you have R installed.

```shell
# install any missing R packages
Rscript installPackages.R
# download the raw boundary files from the ABS
Rscript downloadData.R --year='2021'
# convert the data into a consistent output format
Rscript convertData.R --year='2021'
```
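The quick start covers 2021 only. As a minimal sketch (not part of this repository), the hypothetical `build_year` function below runs the download and conversion for any supported year. Because converting 2011 relies on the 2016 Australia boundary, it downloads 2016 first in that case. The `run` helper and the `DRY_RUN` variable are also hypothetical: when `DRY_RUN` is non-empty, commands are printed instead of executed.

```shell
# Hypothetical dry-run helper (not part of this repository): print the command
# when DRY_RUN is non-empty, otherwise execute it with its arguments intact.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: download and convert a single census year.
# Converting 2011 borrows the 2016 Australia boundary, so 2016 is fetched first.
# Each step only runs if the previous one succeeded.
build_year() {
  if [ "$1" = "2011" ]; then
    run Rscript downloadData.R --year='2016' || return 1
  fi
  run Rscript downloadData.R --year="$1" &&
    run Rscript convertData.R --year="$1"
}
```

After sourcing the snippet, `DRY_RUN=1 build_year 2011` prints the three commands without running them, while `build_year 2021` runs the same two R scripts as the quick start.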
```shell
Rscript downloadData.R [--year]
```

`downloadData.R` downloads the data for a given year from the ABS website. It also converts the meshblock count spreadsheets into plain CSV files and extracts the GeoPackage files from their zip archives. Files that have already been downloaded are not downloaded again.

The `--year` option specifies the year desired. Currently 2011, 2016 and 2021 are supported.

```shell
Rscript downloadData.R --year='2011'
Rscript downloadData.R --year='2016'
Rscript downloadData.R --year='2021'
```
```shell
Rscript convertData.R [--year]
```

`convertData.R` converts the downloaded data into a single SQLite file. The following tables are created:
- australia
- state
- cities
- lga (local government areas)
- ssc (state suburbs/suburbs and localities)
- poa (postcodes)
- sa4 (statistical area level 4)
- sa3 (statistical area level 3)
- sa2 (statistical area level 2)
- sa1 (statistical area level 1)
- mb (meshblocks)
- state_centroids
- cities_centroids
- lga_centroids
- ssc_centroids
- poa_centroids
- sa4_centroids
- sa3_centroids
- sa2_centroids
- sa1_centroids
- mb_centroids

The `--year` option specifies the year desired. Currently 2011, 2016 and 2021 are supported. Note: 2011 lacks an Australia boundary file, so the 2016 one is used instead. This means `downloadData.R` must be run for both 2011 and 2016 before 2011 can be converted.

```shell
Rscript convertData.R --year='2011'
Rscript convertData.R --year='2016'
Rscript convertData.R --year='2021'
```
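To check which of the tables listed above a converted file actually contains, the SQLite command-line shell (`sqlite3`) can query the file's `sqlite_master` catalogue. This is a minimal sketch assuming `sqlite3` is installed; `list_tables` and `list_centroid_tables` are hypothetical helper names, and the file path is passed as an argument because its location is not stated here. Depending on how the file was written, metadata tables such as `geometry_columns` or `spatial_ref_sys` may also be listed.

```shell
# Hypothetical helper: list every table in a SQLite file, one name per line.
list_tables() {
  sqlite3 "$1" "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
}

# Hypothetical helper: keep only the centroid tables (names ending in _centroids).
list_centroid_tables() {
  list_tables "$1" | grep '_centroids$'
}
```

For example, `list_centroid_tables path/to/output.sqlite` (a placeholder path) should print one line per `*_centroids` table.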
```shell
Rscript filterToStudyRegion.R [--year] [--cities] [--states] [--buffer] [--epsg] [--format] [--filename]
```

`filterToStudyRegion.R` filters the converted data to the desired study region. It accepts the following options:

- `--year`: Specify the year desired. Currently 2011, 2016 and 2021 are supported.
- `--cities`: Comma-separated list of cities to filter to. Can list a single city. Not mandatory, but cannot be used with `--states`.
- `--states`: Comma-separated list of states to filter to. Can list a single state. Not mandatory, but cannot be used with `--cities`.
- `--buffer`: Specify the buffer distance, in metres, around the desired cities or states to be included. Not mandatory.
- `--epsg`: Specify the EPSG code of the coordinate reference system to transform the data into. Not mandatory; if nothing is specified, the source dataset's EPSG is used.
- `--format`: Specify the file format for your output dataset. Defaults to `sqlite`, but GeoPackage (`gpkg`) is also possible.
- `--filename`: Specify the filename for your output dataset.
```shell
# A Victoria study region in sqlite format
Rscript filterToStudyRegion.R \
  --year='2016' \
  --states='Victoria' \
  --epsg='7899' \
  --format='sqlite' \
  --filename='Victoria'

# Melbourne and Sydney plus a 10km buffer in geopackage format
Rscript filterToStudyRegion.R \
  --year='2021' \
  --cities='Greater Melbourne, Greater Sydney' \
  --buffer='10000' \
  --epsg='7845' \
  --format='gpkg' \
  --filename='MelbourneAndSydney'
```
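To produce a separate study region for each city, each in its own MGA zone, a small wrapper can call the script once per city. This is a sketch, not part of the repository: `filter_city`, `run` and `DRY_RUN` are hypothetical, and deriving the output filename by removing spaces from the city name is an arbitrary choice. Quoting `"$city"` keeps a multi-word name such as `Greater Perth` as a single `--cities` argument.

```shell
# Hypothetical dry-run helper: print the command when DRY_RUN is non-empty,
# otherwise execute it with its arguments intact.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: one GeoPackage per city, reprojected to the given EPSG
# with a 10 km buffer. The filename is the city name with spaces removed.
filter_city() {
  city=$1
  epsg=$2
  name=$(printf '%s' "$city" | tr -d ' ')
  # Quoting "$city" keeps a multi-word name such as "Greater Perth" as one argument.
  run Rscript filterToStudyRegion.R \
    --year='2021' \
    --cities="$city" \
    --buffer='10000' \
    --epsg="$epsg" \
    --format='gpkg' \
    --filename="$name"
}
```

For example, `filter_city 'Greater Perth' 7850` and `filter_city 'Greater Brisbane' 7856` would request two GeoPackages in GDA2020 / MGA zones 50 and 56 respectively.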
Geographic coordinate reference systems use latitude and longitude and measure distance in degrees, whereas projected ones use eastings and northings and measure distance in metres. Any geospatial analysis that involves distances should generally use a projected coordinate reference system.
EPSG | Name | Extent | Notes |
---|---|---|---|
7845 | GDA2020 / GA LCC | Australia | Best projected CRS for Australia-wide distance calculations |
4326 | WGS 84 | World | Widely used global geographic CRS. Useful for web maps |
7844 | GDA2020 | Australia | Latest geographic CRS for Australia |
7899 | GDA2020 / Vicgrid | Victoria | Latest projected CRS for Victoria |
8058 | GDA2020 / NSW Lambert | NSW | Latest projected CRS for NSW |
8059 | GDA2020 / SA Lambert | SA | Latest projected CRS for SA |
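The difference between degrees and metres matters because a degree of longitude does not correspond to a fixed distance. On a spherical approximation of the Earth it spans about 111.32 km × cos(latitude). The hypothetical helper below computes this approximation with `awk`, which illustrates why a distance such as a 10 km buffer cannot be expressed as a fixed number of degrees.

```shell
# Hypothetical helper: approximate east-west length (km) of one degree of
# longitude at a given latitude, treating the Earth as a sphere
# (about 111.32 km per degree at the equator).
lon_degree_km() {
  awk -v lat="$1" 'BEGIN {
    pi = atan2(0, -1)
    printf "%.1f\n", 111.32 * cos(lat * pi / 180)
  }'
}
```

At Melbourne's latitude (about 37.8°S), `lon_degree_km -37.8` gives roughly 88 km, compared with about 111 km at the equator.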
MGA zones are more accurate than the state or country-wide projections, but each covers a much smaller extent. There are seven MGA zones covering mainland Australia, and they are typically used for projects where the region of interest is city-scale. It is generally recommended to use the newer GDA2020 MGA zones.
MGA Zone | Older EPSG | Older Name | New EPSG | New Name |
---|---|---|---|---|
50 | 28350 | GDA94 / MGA zone 50 | 7850 | GDA2020 / MGA zone 50 |
51 | 28351 | GDA94 / MGA zone 51 | 7851 | GDA2020 / MGA zone 51 |
52 | 28352 | GDA94 / MGA zone 52 | 7852 | GDA2020 / MGA zone 52 |
53 | 28353 | GDA94 / MGA zone 53 | 7853 | GDA2020 / MGA zone 53 |
54 | 28354 | GDA94 / MGA zone 54 | 7854 | GDA2020 / MGA zone 54 |
55 | 28355 | GDA94 / MGA zone 55 | 7855 | GDA2020 / MGA zone 55 |
56 | 28356 | GDA94 / MGA zone 56 | 7856 | GDA2020 / MGA zone 56 |
Capital cities are in bold.

| Zone 50 | Zone 51 | Zone 52 | Zone 53 | Zone 54 | Zone 55 | Zone 56 |
|---|---|---|---|---|---|---|
| Albany | Broome | **Greater Darwin** | Alice Springs | Ballarat | Airlie Beach - Cannonvale | Armidale |
| Bunbury | Esperance | | Port Augusta | Broken Hill | Albury - Wodonga | Ballina |
| Busselton | Kalgoorlie - Boulder | | Port Lincoln | Colac | **Australian Capital Territory** | Batemans Bay |
| Geraldton | | | Port Pirie | **Greater Adelaide** | Bairnsdale | Bowral - Mittagong |
| **Greater Perth** | | | Whyalla | Horsham | Bathurst | Bundaberg |
| Karratha | | | | Mildura - Buronga | Bendigo | Byron Bay |
| Port Hedland | | | | Mount Gambier | Burnie - Somerset | Camden Haven |
| | | | | Mount Isa | Cairns | Coffs Harbour |
| | | | | Murray Bridge | Castlemaine | Forster - Tuncurry |
| | | | | Portland | Devonport | Gladstone |
| | | | | Swan Hill | Dubbo | Gold Coast - Tweed Heads |
| | | | | Victor Harbor - Goolwa | Echuca - Moama | Grafton |
| | | | | Warrnambool | Emerald | **Greater Brisbane** |
| | | | | | Geelong | **Greater Sydney** |
| | | | | | Goulburn | Gympie |
| | | | | | **Greater Hobart** | Hervey Bay |
| | | | | | **Greater Melbourne** | Kempsey |
| | | | | | Griffith | Kingaroy |
| | | | | | Launceston | Lismore |
| | | | | | Mackay | Lithgow |
| | | | | | Moe - Newborough | Maryborough |
| | | | | | Mudgee | Medowie |
| | | | | | Orange | Morisset - Cooranbong |
| | | | | | Sale | Muswellbrook |
| | | | | | Shepparton - Mooroopna | Nelson Bay |
| | | | | | Townsville | Newcastle - Maitland |
| | | | | | Traralgon - Morwell | Nowra - Bomaderry |
| | | | | | Ulverstone | Port Macquarie |
| | | | | | Wagga Wagga | Rockhampton |
| | | | | | Wangaratta | Singleton |
| | | | | | Warragul - Drouin | St Georges Basin - Sanctuary Point |
| | | | | | | Sunshine Coast |
| | | | | | | Tamworth |
| | | | | | | Taree |
| | | | | | | Toowoomba |
| | | | | | | Ulladulla |
| | | | | | | Warwick |
| | | | | | | Wollongong |
| | | | | | | Yeppoon |
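For a city not listed above, the zone can be estimated from its longitude. MGA zones share the UTM zone grid: zones are 6° wide and numbered eastwards from 180°W, so zone = floor((longitude + 180) / 6) + 1. The EPSG codes then follow the pattern in the zone table (GDA2020 / MGA zone N is 7800 + N; GDA94 / MGA zone N is 28300 + N). The helper names below are hypothetical, and places close to a zone boundary may warrant a judgement call.

```shell
# Hypothetical helpers: estimate the MGA zone from a longitude in degrees east.
# MGA uses the UTM zone grid: 6-degree zones numbered eastwards from 180 degrees W.
mga_zone() {
  awk -v lon="$1" 'BEGIN { print int((lon + 180) / 6) + 1 }'
}

# GDA2020 / MGA zone N is EPSG 7800 + N (e.g. zone 55 -> 7855).
mga_epsg_gda2020() {
  echo $((7800 + $(mga_zone "$1")))
}

# The older GDA94 / MGA zone N is EPSG 28300 + N (e.g. zone 55 -> 28355).
mga_epsg_gda94() {
  echo $((28300 + $(mga_zone "$1")))
}
```

For example, Melbourne's CBD is at about 144.96°E, so `mga_epsg_gda2020 144.96` gives 7855, matching the table.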
The older GDA94 projections below are generally no longer used, but are included for completeness.
EPSG | Name | Extent | Notes |
---|---|---|---|
3112 | GDA94 / Geoscience Australia Lambert | Australia | Older projected CRS that's useful for Australia-wide distance calculations |
3111 | GDA94 / Vicgrid | Victoria | Older projected CRS for Victoria |
3308 | GDA94 / NSW Lambert | NSW | Older projected CRS for NSW |
3107 | GDA94 / SA Lambert | SA | Older projected CRS for SA |