API: coronavirus

Access a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic through the coronavirus API.

Using the SKEMA Quantum Studio framework (Warin 2019), this course teaches you how to use the coronavirus package.

Database description

The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data are pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

A CSV version of the package dataset is available here.

A summary dashboard is available here.

Functions

This package gives access to a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The function below loads the data.

The function is detailed in this course and some examples are provided.

data("coronavirus")

This is a basic example which shows you how to get the data:


library(coronavirus)

data("coronavirus")

This coronavirus dataset has the following fields:


head(coronavirus) 

# A tibble: 6 x 7
  Province.State Country.Region   Lat  Long date       cases type     
  <chr>          <chr>          <dbl> <dbl> <date>     <int> <chr>    
1 ""             Japan           35.7  140. 2020-01-22     2 confirmed
2 ""             South Korea     37.6  127. 2020-01-22     1 confirmed
3 ""             Thailand        13.8  101. 2020-01-22     2 confirmed
4 "Anhui"        Mainland China  31.8  117. 2020-01-22     1 confirmed
5 "Beijing"      Mainland China  40.2  116. 2020-01-22    14 confirmed
6 "Chongqing"    Mainland China  30.1  108. 2020-01-22     6 confirmed

tail(coronavirus)

# A tibble: 6 x 7
  Province.State Country.Region   Lat  Long date       cases type     
  <chr>          <chr>          <dbl> <dbl> <date>     <int> <chr>    
1 Shanghai       Mainland China  31.2 121.  2020-02-16    16 recovered
2 Shanxi         Mainland China  37.6 112.  2020-02-16     4 recovered
3 Sichuan        Mainland China  30.6 103.  2020-02-16    12 recovered
4 Tianjin        Mainland China  39.3 117.  2020-02-16     8 recovered
5 Xinjiang       Mainland China  41.1  85.2 2020-02-16     2 recovered
6 Zhejiang       Mainland China  29.2 120.  2020-02-16    28 recovered

Here is an example of a summary of total cases by region and type (top 20):


library(dplyr)

summary_df <- coronavirus %>% group_by(Country.Region, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20)

# A tibble: 20 x 3
# Groups:   Country.Region [15]
   Country.Region type      total_cases
   <chr>          <chr>           <int>
 1 Mainland China confirmed       70446
 2 Mainland China recovered       10748
 3 Mainland China death            1765
 4 Others         confirmed         355
 5 Singapore      confirmed          75
 6 Japan          confirmed          59
 7 Hong Kong      confirmed          57
 8 Thailand       confirmed          34
 9 South Korea    confirmed          29
10 Malaysia       confirmed          22
11 Taiwan         confirmed          20
12 Singapore      recovered          18
13 Germany        confirmed          16
14 Vietnam        confirmed          16
15 Australia      confirmed          15
16 US             confirmed          15
17 Thailand       recovered          14
18 France         confirmed          12
19 Japan          recovered          12
20 Macau          confirmed          10
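
To see what the grouping step does on a small scale, the sketch below retypes by hand the six rows printed by head(coronavirus) and applies the same group_by/summarise pattern. The names sample_df and sample_summary are illustrative and not part of the package, the Lat and Long columns are left out for brevity, and ungroup() and desc() are a stylistic choice made here rather than something the original code uses.

```r
library(dplyr)

# The six rows printed by head(coronavirus), retyped by hand.
# This is not live package data; Lat and Long are omitted.
sample_df <- tibble(
  Province.State = c("", "", "", "Anhui", "Beijing", "Chongqing"),
  Country.Region = c("Japan", "South Korea", "Thailand",
                     "Mainland China", "Mainland China", "Mainland China"),
  date  = as.Date("2020-01-22"),
  cases = c(2L, 1L, 2L, 1L, 14L, 6L),
  type  = "confirmed"
)

sample_summary <- sample_df %>%
  group_by(Country.Region, type) %>%    # one group per country and case type
  summarise(total_cases = sum(cases)) %>%
  ungroup() %>%                         # drop the remaining Country.Region grouping
  arrange(desc(total_cases))            # same ordering as arrange(-total_cases)

sample_summary
```

On these six rows, the three Mainland China provinces add up to 21 confirmed cases (1 + 14 + 6), so Mainland China sorts first.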

Summary of new cases on the most recent date in the dataset, by country and type (2020-02-16 in the output shown here):


library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country = Country.Region, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)

# A tibble: 12 x 4
# Groups:   country [12]
   country              confirmed recovered death
   <chr>                    <int>     <int> <int>
 1 Mainland China            2099      1454   103
 2 Others                      70        NA    NA
 3 Japan                       16        NA    NA
 4 Singapore                    3        NA    NA
 5 Taiwan                       2        NA     1
 6 Hong Kong                    1         1    NA
 7 South Korea                  1        NA    NA
 8 Thailand                     1         2    NA
 9 United Arab Emirates         1         1    NA
10 India                       NA         3    NA
11 Macau                       NA         2    NA
12 UK                          NA         7    NA
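
The NA entries in the table above appear where a country has no row for a given type on the latest date, so pivot_wider() has nothing to put in that cell. If zeros read better than NA, one option is the values_fill argument of pivot_wider(). The sketch below uses a few rows copied by hand from the table above; the names latest and wide are illustrative, and treating a missing row as zero cases is an assumption, not something the package states.

```r
library(dplyr)
library(tidyr)

# Long-format rows copied by hand from the table above (not live package data)
latest <- tibble(
  country     = c("Taiwan", "Taiwan", "Hong Kong", "Hong Kong", "India"),
  type        = c("confirmed", "death", "confirmed", "recovered", "recovered"),
  total_cases = c(2L, 1L, 1L, 1L, 3L)
)

wide <- latest %>%
  pivot_wider(names_from = type,
              values_from = total_cases,
              # assumption: a missing country/type pair means zero new cases
              values_fill = list(total_cases = 0L)) %>%
  arrange(desc(confirmed))

wide
```

With the fill in place, India shows 0 confirmed cases instead of NA and sorts last rather than being placed by its missing value.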

tl;dr


library(coronavirus)

data("coronavirus")

head(coronavirus) 
tail(coronavirus)

library(dplyr)

summary_df <- coronavirus %>% group_by(Country.Region, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20)

library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country = Country.Region, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)

Code learned this week

Command              Detail
data("coronavirus")  Get data for all coronavirus cases

References

This course uses the coronavirus package, created by Rami Krispin.


Warin, Thierry. 2019. “SKEMA Quantum Studio: A Technological Framework for Data Science in Higher Education.” https://doi.org/10.6084/m9.figshare.8204195.v2.

Citation

For attribution, please cite this work as

Warin (2020, April 2). Virtual Campus: API: coronavirus. Retrieved from https://virtualcampus.skemagloballab.io/posts/api-coronavirus/

BibTeX citation

@misc{warin2020api:,
  author = {Warin, Thierry},
  title = {Virtual Campus: API: coronavirus},
  url = {https://virtualcampus.skemagloballab.io/posts/api-coronavirus/},
  year = {2020}
}