Rows: 370
Columns: 26
$ Range <chr> "1997-1999", "1997-1999", "1997-1999", "1997-1999", "…
$ Month <chr> "1/1/1997", "2/1/1997", "3/1/1997", "4/1/1997", "5/1/…
$ Vietnam <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Turkey <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Thailand <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Iraq <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Indonesia <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Egypt <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Djibouti <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ China <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 4, 13, 0, 0, 0, 0, 0, 0…
$ Cambodia <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Nigeria <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Azerbaijan <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Pakistan <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Myanmar <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Laos <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Bangladesh <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Canada <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ India <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Nepal <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `United Kingdom` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Spain <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `United States` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Ecuador <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Chile <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Australia <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
ETC5513: Collaborative and Reproducible Practices
Tutorial 2
🎯 Objectives
- Working on a reproducible RStudio Project
- Working on an HTML report and exploring different YAML themes
- Practice Markdown syntax
- Practice R
- Practice R chunk options
- Gain experience with data wrangling using the `tidyverse` suite of packages
- Producing exploratory data analysis figures using the package `ggplot2`
- Learn how to add figure captions
- Create HTML tables and learn how to add captions
Exercise 1: Hands-on practice with Avian Influenza Data
- The data for the tutorial is inside a folder called `data`, which is bundled with the RStudio Project you made in this week’s workshop. Find that folder in the lower right pane, where all your files are listed.
- Create a new section heading in your `qmd` document to read the data, with the title “Reading Avian Influenza Data”.
Hint: Use `#`
- Inside this new section, create an R code chunk with options `echo: true`, `warning: false`, `message: false` called “Reading data” and insert the following code:
dat <- read_csv("data/avian_influenza_numbers.csv")
- Insert a new R chunk and find out what information you can get from the command `head(dat)`.
- Modify the `head` command to display 10 rows.
- Create another two R chunks and use the R functions `glimpse()` and `str()`, one in each chunk. What information can you get from those commands?
Hint: For more information on R functions, type `?glimpse()` in the R console.
- Using an R inline command, write the dimension of the dataset in a sentence.
Hint: Have a look at `ncol` and `nrow`.
- Add a new subsection heading (`###`) with “Why is it important to know the dimension of your dataset?” and write a brief sentence with the explanation.
- Add a new subsection heading (`###`) with “What are the variable names in the dataset?” and display the names of the dataset variables using R.
Hint: `?names()` in the R console
- Select two variables and use a markdown list to briefly explain what each of the variables is measuring.
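The inspection steps above can be sketched on a small stand-in tibble. The object `toy` and its values are invented for illustration and are not the tutorial's data; in your own document you would run the same calls on `dat`.

```r
library(tibble)  # tibble()
library(dplyr)   # glimpse()

# Hypothetical stand-in for `dat`: invented values in the same column style
toy <- tibble(
  Range = c("1997-1999", "1997-1999", "1997-1999"),
  Month = c("1/1/1997", "2/1/1997", "3/1/1997"),
  China = c(0, 0, 1)
)

head(toy, n = 2)  # first two rows; the `n` argument controls how many are shown
glimpse(toy)      # one line per column: name, type, first few values
str(toy)          # base R view of the same structure

n_rows <- nrow(toy)  # number of rows (observations)
n_cols <- ncol(toy)  # number of columns (variables)
names(toy)           # column (variable) names
```

In the text of a `qmd` file, inline R code is written between backticks and starts with the letter r, for example `r nrow(dat)`, so the number is filled in when the document renders.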
Exercise 4: Data Wrangling
- Using the R package `dplyr` (which is loaded with `tidyverse`), and using the pipe (`|>`), create a new dataset called `data_cleaned` that only contains the following variables: `Month`, `Australia`, `Egypt`, `United States`
- Inspect `data_cleaned` and describe, using a markdown list, the type of each variable in this new dataset. Write the names of the variables in bold. Do you think the variable attributes are correct?
- Convert the variable `Month` into a date vector using `lubridate::mdy`. What do you notice?
- Remove cases for which the data is aggregated or doesn’t have a valid month.
- What is the dimension of this new dataset? Compare it with the dimension of `data_cleaned`. How many cases have we lost?
- Provide a table summary of the three countries using the `kable()` function from the `kableExtra` package. Give it the caption “Summary of number of cases of Avian Influenza”.
- Visualize the case counts using a histogram and explain what information a histogram conveys. In addition, change the x label in the plot to Age and remove the y axis label.
Hint: As a first step, do this for just one country. To do multiple countries at once, you will need to `pivot_longer` your dataset.
- Extension: Change this plot to a time series plot, with one bar per month. As an extra challenge, split this out into three separate plots - one per country.
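The wrangling steps above can be sketched as follows on a small stand-in tibble. Everything about `toy` is an assumption made for illustration: the values are invented, and the `"Total"` entry is a made-up example of a row without a valid month (the real file may mark aggregated rows differently). Treat it as a starting point under those assumptions, not as the worked solution.

```r
library(dplyr)      # select(), mutate(), filter()
library(tidyr)      # pivot_longer()
library(lubridate)  # mdy()
library(ggplot2)    # ggplot(), geom_histogram()

# Hypothetical stand-in for `dat`: invented values; the last row mimics
# an aggregated row whose Month is not a valid month/day/year string
toy <- tibble::tibble(
  Range = c("1997-1999", "1997-1999", "1997-1999"),
  Month = c("1/1/1997", "2/1/1997", "Total"),
  Australia = c(0, 1, 1),
  Egypt = c(2, 0, 2),
  `United States` = c(0, 0, 0)
)

# Keep only the requested columns (backticks are needed for names with spaces)
toy_cleaned <- toy |>
  select(Month, Australia, Egypt, `United States`)

# mdy() returns NA for strings it cannot parse; without suppressWarnings()
# it also emits a "failed to parse" warning
toy_dated <- toy_cleaned |>
  mutate(Month = suppressWarnings(mdy(Month)))

# Drop rows whose Month did not parse into a date
toy_valid <- toy_dated |>
  filter(!is.na(Month))

rows_lost <- nrow(toy_cleaned) - nrow(toy_valid)

# Long format: one row per month and country, so all countries can share one plot
toy_long <- toy_valid |>
  pivot_longer(-Month, names_to = "country", values_to = "cases")

# Histogram of case counts, one panel per country; the x label follows the
# exercise text and y = NULL removes the y axis label
p <- ggplot(toy_long, aes(x = cases)) +
  geom_histogram(bins = 5) +
  facet_wrap(~country) +
  labs(x = "Age", y = NULL)
```

Evaluating `p` in a chunk draws the plot. The names `country` and `cases` are arbitrary choices for the long-format columns.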
Important
At the end of this tutorial, you should have a full QMD file that renders, including your code and the outputs from it. This means you can read it from top to bottom and remember what you did.