This repository has been archived by the owner on Jan 2, 2021. It is now read-only.

The project was closed. See alternatives.

COVID-19 data

The goal of the project is to provide a simple, unified HTTP interface to the latest COVID-19 datasets. The shared datasets are updated daily.

Currently the project includes only the data from J.Hopkins but may be extended in the future. How it works: (1) download data from the source repositories, (2) combine the data tables and tidy them, (3) save the result in different formats on GitHub Pages.

Motivation

COVID-19 data analysis and visualization have had a great impact all over the world in 2020. We developed this repository to support data analysis, model development, and online tools by simplifying access to the data.

Usage

R

downloading JSON

# install.packages('httr')

response <- httr::GET('https://insysbio.github.io/covid-19-data/hopkins/json/_combined.json')
response_json <- httr::content(response, as = 'parsed', type = 'application/json')

downloading CSV

# install.packages('httr')

response <- httr::GET('https://insysbio.github.io/covid-19-data/hopkins/csv/_combined.csv')
response_csv <- httr::content(response, as = 'parsed', type = 'text/csv')

Julia

downloading JSON

# ] add HTTP JSON
using HTTP, JSON

response = HTTP.get("https://insysbio.github.io/covid-19-data/hopkins/json/_combined.json")
response_json = JSON.parse(String(response.body))

downloading CSV

# ] add HTTP CSV
using HTTP, CSV

response = HTTP.get("https://insysbio.github.io/covid-19-data/hopkins/csv/_combined.csv")
# recent CSV.jl releases require a sink argument for CSV.read; CSV.File parses the bytes directly
response_csv = CSV.File(response.body)

Shell

Download all data in JSON and CSV formats as local files using the bash shell

curl 'https://insysbio.github.io/covid-19-data/hopkins/json/_combined.json' --compressed > _combined.json
curl 'https://insysbio.github.io/covid-19-data/hopkins/csv/_combined.csv' --compressed > _combined.csv
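For comparison with the R, Julia, and shell snippets, the same downloads could be sketched in Python using only the standard library. The helper names (`dataset_url`, `fetch_text`, `fetch_combined_json`) are hypothetical, and UTF-8 encoding of the files is an assumption; only the two URLs come from the examples above.

```python
import json
import urllib.request

BASE_URL = "https://insysbio.github.io/covid-19-data/hopkins"


def dataset_url(fmt: str) -> str:
    """Build the URL of the combined file for the given format ('json' or 'csv')."""
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/{fmt}/_combined.{fmt}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and return its body, assuming UTF-8 encoding."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_combined_json() -> dict:
    """Download and parse the combined JSON dataset (needs network access)."""
    return json.loads(fetch_text(dataset_url("json")))
```

Calling `fetch_combined_json()` performs the actual request; `fetch_text(dataset_url("csv"))` returns the CSV file as a string.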

Git

To clone the latest datasets to the directory covid-19

git clone -b docs --single-branch https://github.com/insysbio/covid-19-data.git covid-19

To update the previously cloned repository

cd covid-19
git fetch
git pull

Data structure

All daily data follow the same structure which is similar to J.Hopkins' tables with minor modifications. Currently files are stored in two formats: CSV and JSON.

CSV formatted

See also World dataset, US dataset, Russian dataset list

Available fields:

Field           Description
Admin2          (if set) Place such as a city or town
Province.State  Territory name from the original dataset
Country.Region  Country name from the original dataset
Lat, Long       Latitude and longitude from the original dataset
confirmed       Cumulative confirmed cases
recovered       Cumulative recovered cases
deaths          Cumulative deaths
date            Date in the format YYYY-mm-dd
confirmed_new   Confirmed cases for the date (calculated as today - yesterday)
recovered_new   Recovered cases for the date
deaths_new      Deaths for the date
hasErrors       If true, there are missing data or an inconsistency between yesterday and today
country_code    Two-letter country code based on the ISO 3166 standard*
country_code3   Three-letter country code based on the ISO 3166 standard*
territory_code  Territory code or two-letter country code based on the ISO 3166 standard*
hasParent       If TRUE, the data refer to a region of some "parent" country
group           Unique id of the group: "territory_code" or "Territory_code-City_name" if hasParent==TRUE, and "country_code" otherwise

* To read more about the country code standard: https://www.iso.org/iso-3166-country-codes.html
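As an illustration of how these fields might be consumed, the following Python sketch parses CSV text into typed rows and selects the latest row for one country. The sample rows are made up for illustration and contain only a subset of the columns; the TRUE/FALSE spelling of booleans and the empty or `NA` encoding of missing values are assumptions about the export, not documented behaviour.

```python
import csv
import io

# Illustrative sample only: a subset of columns with made-up values.
SAMPLE_CSV = """\
Province.State,Country.Region,confirmed,recovered,deaths,date,confirmed_new,recovered_new,deaths_new,hasErrors,country_code,group
,Andorra,0,0,0,2020-01-22,0,0,0,FALSE,AD,AD
,Andorra,1,0,0,2020-03-02,1,0,0,FALSE,AD,AD
"""

COUNT_FIELDS = ("confirmed", "recovered", "deaths",
                "confirmed_new", "recovered_new", "deaths_new")


def parse_rows(text):
    """Parse CSV text into dicts with integer counts and a boolean hasErrors."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        for field in COUNT_FIELDS:
            # Assumption: missing values appear as an empty cell or "NA".
            row[field] = None if raw[field] in ("", "NA") else int(raw[field])
        # Assumption: booleans are written as TRUE/FALSE (any letter case).
        row["hasErrors"] = raw["hasErrors"].strip().lower() == "true"
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_CSV)
andorra = [r for r in rows if r["country_code"] == "AD"]
# Dates in YYYY-mm-dd format sort correctly as plain strings.
latest = max(andorra, key=lambda r: r["date"])
print(latest["date"], latest["confirmed"])
```

Using `csv.DictReader` keeps the code independent of the column order, which is not specified here.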

JSON formatted

See also World dataset, US dataset, Russian dataset list

Available fields:

Field           Description
Admin2          (if set) Place such as a city or town
Province.State  Territory name from the original dataset
Country.Region  Country name from the original dataset
Lat, Long       Latitude and longitude from the original dataset
hasErrors       If true, at least one data point of the time series has errors
country_code    Two-letter country code based on the ISO 3166 standard*
country_code3   Three-letter country code based on the ISO 3166 standard*
territory_code  Territory code or two-letter country code based on the ISO 3166 standard*
hasParent       If TRUE, the data refer to a region of some "parent" country
group           Unique id of the group: "territory_code" or "Territory_code-City_name" if hasParent==TRUE, and "country_code" otherwise
timeseries      Array of time series data, see below

* To read more about the country code standard: https://www.iso.org/iso-3166-country-codes.html
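The rule for the group field can be read as a small function. This is only a sketch of the description above, not the project's actual code: the function name `group_id` is hypothetical, it assumes the city name is the Admin2 value, and the codes used in the usage lines are placeholders rather than real values from the dataset.

```python
from typing import Optional


def group_id(country_code: str,
             has_parent: bool,
             territory_code: Optional[str] = None,
             city_name: Optional[str] = None) -> str:
    """Derive a group id following the description of the `group` field."""
    if not has_parent:
        # Top-level countries are identified by their country code.
        return country_code
    if not territory_code:
        raise ValueError("territory_code is required when has_parent is True")
    if city_name:
        # Assumption: the city name is taken from Admin2.
        return f"{territory_code}-{city_name}"
    return territory_code


# Placeholder codes, for illustration only.
print(group_id("AD", False))
print(group_id("XX", True, "XX-YY"))
print(group_id("XX", True, "XX-YY", "Sometown"))
```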

Time series fields:

Field           Description
date            Date in the format YYYY-mm-dd
confirmed       Cumulative confirmed cases
recovered       Cumulative recovered cases
deaths          Cumulative deaths
confirmed_new   Confirmed cases for the date (calculated as today - yesterday)
recovered_new   Recovered cases for the date
deaths_new      Deaths for the date
hasErrors       If true, there are missing data or an inconsistency between yesterday and today
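The "today - yesterday" calculation can be sketched as follows. The exact rule behind hasErrors is not spelled out here, so this sketch makes two assumptions: a missing value on either day counts as an error, and a decrease of a cumulative count counts as an inconsistency. It also assumes zero cases before the first day. The function name `daily_increments` is hypothetical.

```python
from typing import List, Optional, Tuple


def daily_increments(
    cumulative: List[Optional[int]],
) -> List[Tuple[Optional[int], bool]]:
    """Return (new_cases, has_errors) for each day of a cumulative series."""
    result = []
    previous: Optional[int] = 0  # assumption: zero cases before the first day
    for today in cumulative:
        if today is None or previous is None:
            # Missing data on either day: the increment cannot be computed.
            result.append((None, True))
        else:
            new_cases = today - previous
            # Assumption: a falling cumulative count is an inconsistency.
            result.append((new_cases, new_cases < 0))
        previous = today
    return result


print(daily_increments([0, 2, 5, 4, None, 7]))
```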

Example

{
  "AD": {
    "hasErrors": false,
    "Province.State": "",
    "Country.Region": "Andorra",
    "Lat": 42.5063,
    "Long": 1.5218,
    "isTerritory": false,
    "country_code": "AD",
    "group": "AD",
    "timeseries": [
      {
        "date": "2020-01-22",
        "confirmed": 0,
        "recovered": 0,
        "deaths": 0,
        "confirmed_new": 0,
        "recovered_new": 0,
        "deaths_new": 0,
        "hasErrors": false
      },
      {
        "date": "2020-01-23",
        "confirmed": 0,
        "recovered": 0,
        "deaths": 0,
        "confirmed_new": 0,
        "recovered_new": 0,
        "deaths_new": 0,
        "hasErrors": false
      },
      ...
    ]
  },
  ...
}
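A short Python sketch of how this nested structure might be traversed. The sample mirrors the two data points of the example above; the helper names `series` and `flatten` are hypothetical.

```python
import json

SAMPLE_JSON = """
{
  "AD": {
    "hasErrors": false,
    "Province.State": "",
    "Country.Region": "Andorra",
    "Lat": 42.5063,
    "Long": 1.5218,
    "isTerritory": false,
    "country_code": "AD",
    "group": "AD",
    "timeseries": [
      {"date": "2020-01-22", "confirmed": 0, "recovered": 0, "deaths": 0,
       "confirmed_new": 0, "recovered_new": 0, "deaths_new": 0, "hasErrors": false},
      {"date": "2020-01-23", "confirmed": 0, "recovered": 0, "deaths": 0,
       "confirmed_new": 0, "recovered_new": 0, "deaths_new": 0, "hasErrors": false}
    ]
  }
}
"""


def series(data, group, field):
    """Return (date, value) pairs of one field for one group."""
    return [(point["date"], point[field]) for point in data[group]["timeseries"]]


def flatten(data):
    """Yield one flat record per (group, date), similar to the CSV layout."""
    for entry in data.values():
        meta = {key: value for key, value in entry.items() if key != "timeseries"}
        for point in entry["timeseries"]:
            # The per-day hasErrors replaces the group-level flag in the flat record.
            yield {**meta, **point}


data = json.loads(SAMPLE_JSON)
print(series(data, "AD", "confirmed"))
flat = list(flatten(data))
print(len(flat), flat[0]["Country.Region"], flat[0]["date"])
```

Flattening is convenient when the records should be loaded into a table; keeping the nested form is convenient when plotting one group at a time.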

J.Hopkins' dataset

This is the most popular COVID-19 dataset, supported by the Johns Hopkins University Applied Physics Lab (JHU APL). The sources are located in a GitHub repository and are updated daily.

Currently the data are split into two datasets: World and US.

World data

The current interface performs some transformation and shares data from files:

  • csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
  • csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv
  • csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv

The original dataset includes the following data:

  • Confirmed
  • Recovered
  • Death

Full list of exported files can be found here: World dataset.
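The kind of transformation involved can be illustrated with a wide-to-long reshape. The J.Hopkins time series files store one column per date (headers such as 1/22/20), while the exported data hold one record per date. The sketch below is not the project's actual code: the function name `wide_to_long` and the sample row are illustrative, and the renaming of Province/State to Province.State is inferred from the field names listed under Data structure.

```python
import csv
import io
from datetime import datetime

ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")

# Illustrative sample in the layout of the source files.
SAMPLE_WIDE = """\
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Andorra,42.5063,1.5218,0,0
"""


def to_float(text):
    """Convert a coordinate, keeping empty cells as None."""
    return float(text) if text else None


def wide_to_long(text, value_name):
    """Turn one wide time series table into one record per region and date."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Source headers use m/d/yy; the export uses YYYY-mm-dd.
            date = datetime.strptime(column, "%m/%d/%y").strftime("%Y-%m-%d")
            records.append({
                "Province.State": row["Province/State"],
                "Country.Region": row["Country/Region"],
                "Lat": to_float(row["Lat"]),
                "Long": to_float(row["Long"]),
                "date": date,
                value_name: int(value),
            })
    return records


records = wide_to_long(SAMPLE_WIDE, "confirmed")
print(records[0]["date"], records[0]["confirmed"])
```

Applying the same function to the recovered and deaths files and joining the results on region and date would give the combined layout described under Data structure.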

US data

The current interface performs some transformation and shares data from files:

  • csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
  • csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv

The original dataset includes the following data:

  • Confirmed
  • Death

Full list of exported files can be found here: US dataset.

Untransformed files

The original files can be downloaded from this repository:

Russian dataset

The data were taken from the COVID-19_plus_Russia repository, where the J.Hopkins data are supplemented with data from the Yandex COVID map. The sources are located in a GitHub repository and are updated daily.

The current interface performs some transformation and shares data from files:

  • csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_RU.csv
  • csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_RU.csv

The original dataset includes the following data:

  • Confirmed
  • Death

Full list of exported files can be found here: Russian dataset.

Untransformed files

The original files can be downloaded from this repository:

Contributing

  • Use the issues page to share your ideas and report bugs.
  • Let us know if you use the data for your study or application.

Authors

License

This repository is distributed under the MIT license.

© InSysBio LLC, 2020