ETC5513: Collaborative and Reproducible Practices

Assignment 1

Author

Michael Lydeamore

Published

21 May 2024

🎯 Objectives

  • Work on a reproducible RStudio project
  • Produce an HTML report using a qmd file and explore YAML themes to define your preferred template
  • Practice markdown syntax
  • Practice R coding
  • Explore R chunk options to customize your report template
  • Create HTML tables, add table captions and refer to them in the report text as described in Week 4
  • Create simple figures to visualize the data, add figure captions and refer to the figures in the report text as you learned in Week 4
  • Demonstrate that you are able to clone a GitHub repository locally and synchronize the changes between your local and remote repositories.
  • Show that you can create meaningful Git commits so that the changes and the history of the project can be recorded and tracked.
Important

This is an individual assignment.

The final PDF report cannot be longer than 5 pages (not including the appendix).

You cannot use a dataset that has been used for another assignment.

Part A: Setting up version control

In this assignment, you need to create an RStudio project that will be able to produce a reproducible HTML report. For the report, please select a dataset that interests you from this webpage: Our World in Data.

The size of the dataset is not important, but your data must contain at least 3 numerical variables and a character or factor variable, and it must not be larger than 50MB. Your reproducible report should be created using a Quarto file in RStudio and needs to be knitted into an HTML file. The report must knit into HTML without any errors and must have all the code and code outputs displayed (unless otherwise specified in the instructions below).

You will be working with a GitHub repository for this assignment. We will use GitHub Classroom, where I have set up some of the structure for you. Please join the classroom here. Make sure to choose your Monash username from the list so we can find your assignment.

Repository setup (6 points)

Clone the repository created by GitHub Classroom onto your computer as we have done in lectures and tutorials (1 point).

Once you have cloned the repository to your computer, make sure to add all the files in the upcoming section, and use relevant, clear commit messages (5 points).

Caution

The GitHub repo will appear in the GitHub Classroom space. Do not work in or create a different repository for this assignment.

You can delete the example files from Data/ and Images/.

Part B: Creating a reproducible report

There is a pre-made RStudio project in the git repository. Open that with RStudio and we will work inside that project.

Quarto setup (1 point)

Change the YAML options for this Quarto file to render both an HTML and a PDF report.

R code chunk for loading libraries (2 points)

Create a new Quarto file using RStudio. This file is where we will create the report for the rest of this assignment, so make sure to give it a sensible name (1 point).

Load all the libraries you will use in the report in an R chunk located at the beginning of your Quarto file. Make sure you set the options so that you do not display any R code, messages, or warnings in the rendered HTML document (for this chunk) (1 point).
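
As a minimal sketch of such a chunk, the example below uses Quarto's `#|` chunk options to hide the code, messages and warnings for this chunk only. The label and the two packages are placeholders rather than a required list, and in a .qmd file the chunk is opened with ```{r} instead of the plain fence shown here.

```r
#| label: load-libraries
#| echo: false
#| message: false
#| warning: false

# Hypothetical package list: load whichever packages your own report uses.
library(ggplot2)
library(knitr)
```

Setting the options inside the chunk, rather than globally in the YAML header, keeps the code in every other chunk visible.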

Introduction section (5 points)

Using markdown, write a motivation of at most 4 sentences explaining what you are going to research and why. Make sure it is relevant to your dataset.

Research Question section (5 points)

Using markdown, discuss in at most 3 sentences the specific question that you are going to investigate or answer in this report using your selected data.

Dataset Introduction section (5 points)

In this section, briefly describe your data (i.e. what the data is measuring or recording) in five sentences maximum using markdown. You must provide a link to the location of the data inserted in the text using markdown.

Create a table using the kable() function from the knitr R package to report the variable names. Add a table caption that briefly describes the table in no more than 2 sentences. Include a cross-reference to the table in your text.
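
One possible shape for this table chunk is sketched below. It assumes the built-in mtcars data as a stand-in for your own dataset, and the label, caption text and object name are hypothetical. In Quarto, a label starting with tbl- together with the tbl-cap option makes the table available for cross-referencing.

```r
#| label: tbl-variables
#| tbl-cap: "Variables in the stand-in dataset and their types."

library(knitr)

# mtcars is only a placeholder here; replace it with your own data frame.
variable_table <- data.frame(
  Variable = names(mtcars),
  Type = unname(vapply(mtcars, function(column) class(column)[1], character(1)))
)

kable(variable_table)
```

In the report text, writing @tbl-variables then produces the cross-reference to this table.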

Dataset Description subsection (5 points)

Create a subsection that reports details about your data, including the size of the dataset (the number of observations and variables) and the variable types. You must include the following:

  • A sentence that includes inline R code describing the number of variables and observations in your dataset.
  • A screenshot of your inline R code, saved as a png file. Upload this file inside the Images/ folder.
  • Include that picture in your subsection. You can either use an R code chunk (with knitr::include_graphics) or markdown image syntax (![]()).
  • Using the function head(), display the first two rows of data to show the types of variables that are in the dataset (i.e. numeric, character/factor etc).
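
The inline code and head() steps above can be sketched as follows, assuming the built-in iris data as a stand-in for your own dataset. The object names n_obs and n_vars are illustrative only.

```r
# iris is only a placeholder here; replace it with your own data frame.
n_obs <- nrow(iris)
n_vars <- ncol(iris)

# In the markdown text of the .qmd file, a sentence with inline R code
# could then read:
#   The dataset contains `r n_obs` observations and `r n_vars` variables.

# Display the first two rows of the data.
head(iris, 2)
```

Computing the counts in a chunk and referring to them inline means the sentence updates automatically if the data changes.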

Results section (5 points)

Using visualisations of the data, discuss the answer to your research question. You must:

  • Create 2 figures maximum of your data that will help you answer your research question. Each figure must have a caption using the options inside the R code chunk. Create the figures using the ggplot2 package.
  • Using markdown, add a list with two bullet points, written in italic font, describing what you see in each of the figures and how that can help you answer your research question. Make sure you cross-reference the figures in the text.
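
A figure chunk with a caption might look like the sketch below. It again assumes the built-in iris data as a stand-in, and the label, caption and axis titles are hypothetical. In Quarto, a label starting with fig- together with the fig-cap option makes the figure available for cross-referencing.

```r
#| label: fig-petal
#| fig-cap: "Petal width against petal length for each species in the stand-in dataset."

library(ggplot2)

# iris is only a placeholder here; replace it with your own data frame.
petal_plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point() +
  labs(x = "Petal length (cm)", y = "Petal width (cm)")

petal_plot
```

In the report text, writing @fig-petal then produces the cross-reference, and wrapping a bullet point in single asterisks sets it in italic.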

Marking Rubric

In addition to the points as described above, you will be graded on your:

  • Report template (5 points). The default Quarto template will be granted 1 point. More sophisticated templates will add more points into this component, up to 5. Remember, styling that takes away from the report is not beneficial.
  • Issues with spelling and grammar (up to -5 points).
  • R code style (5 points)
  • Report quality: Sections in the report are connected and aligned with the research question in a coherent way (6 points).

Maximum grade: 50 points.

The usage of AI

You may use Generative AI (such as ChatGPT) to correct your English or to help with your R code (for example to find bugs or ask for enhancements to your existing code). However, if you use ChatGPT you must declare it by adding a section in your Quarto report called Appendix, and display screenshots of your ChatGPT queries and interactions related to this assignment.

You cannot use ChatGPT to generate content for this assignment from scratch, including code.

Monash University supports the responsible and ethical use of generative AI. For more info please refer to Monash Policy and practice guidance around acceptable and responsible use of AI technologies.

Tip

Remember you can be better than ChatGPT. If you just use ChatGPT to create content for you, where is your value and why are your skills special?

Assignment Submission

The report must be rendered to HTML and PDF, and all the code and code outputs must be visible. The PDF cannot have a length of more than 5 pages (excluding the appendix).

You do not need to upload anything to Moodle. All marking will occur directly from your GitHub repositories.

Plagiarism

Monash University is committed to honesty and academic integrity. There are serious consequences for plagiarism and collusion. If plagiarism and/or collusion is detected, further action will be taken according to Monash University policy and procedures. More info here:

https://www.monash.edu/students/admin/policies/academic-integrity

You cannot re-use assignments that have been submitted or used in other units.