ETC5513: Collaborative and Reproducible Practices
Assignment 1
🎯 Objectives
- Working on a reproducible project
- Produce an HTML report from a qmd file and explore YAML themes to define your preferred template
- Practice markdown syntax
- Practice R coding
- Explore R chunk options to customize your report template
- Create html tables, add table captions and refer to them in the report text as described in Week 4
- Create simple figures to visualize the data, add figure captions and refer to the figures in the report text as you learned in Week 4
- Demonstrate that you are able to clone a GitHub repository locally and synchronize the changes between your local and remote repositories.
- Show that you can create meaningful Git commits so that the changes and the history of the project can be recorded and tracked.
This is an individual assignment.
The final PDF report cannot be longer than 5 pages (not including the appendix).
Background
You are the Head of Marketing for an online retailer. Over the last quarter we ran a personalised marketing campaign that varied how much we spent contacting individual customers (ads, emails, coupons). I want you to treat the data as the canonical record of that campaign and answer one question: did higher marketing spend per customer lead to higher purchase amounts?
Part A: Setting up version control
In this assignment, you need to create a project that will be able to produce a reproducible HTML report. For the report, you will each have your own unique dataset to work on. Your dataset will be in the data/ folder on the GitHub repository.
You can find your dataset on Moodle, in the Assignment 1 section, under Feedback.
Your reproducible report should be created using a Quarto file and needs to be rendered into both an HTML and a PDF file. The report must render without any errors.
You will be working with a GitHub repository for this assignment. We will use GitHub Classroom, where I have set up some of the structure for you. Please join the classroom here. Make sure to choose your Monash username from the list so we can find your assignment.
Repository setup (6 points)
Clone the repository created by GitHub Classroom onto your computer, as we have done in lectures and tutorials (1 point).
Once you have cloned the repository to your computer, make sure to add all the files in the upcoming section, and use relevant, clear commit messages (5 points).
The GitHub repo will appear in the GitHub Classroom space. Do not work in or create a different repository for this assignment.
Part B: Creating a reproducible report
There is some pre-made structure in the repository. Open the repository folder in Positron, as we have done in class, and work on it there.
R code chunk for loading libraries (3 points)
Create a new Quarto file. This file is where we will create the report for the rest of this assignment, so make sure to give it a sensible name (1 point).
Load all the libraries you will use in the report in an R chunk located at the beginning of your Quarto file. Make sure you set the options so that you do not display any R code, messages, or warnings in the rendered HTML document (for this chunk) (1 point).
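As a minimal sketch of such a chunk, the example below uses Quarto's `#|` chunk options `echo`, `message`, and `warning` to hide the code and its console output. The chunk label and the two packages are assumptions chosen for illustration; load whichever packages your own report needs. In the Quarto file, this code would sit inside an R code chunk.

```r
#| label: load-libraries
#| echo: false
#| message: false
#| warning: false

# Packages assumed for illustration only; replace with the ones you use.
library(ggplot2)
library(knitr)
```

Because the options are set on the chunk itself, they apply only to this chunk and leave the rest of the report unaffected.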
Set the YAML so that the report is output as both an HTML and a PDF file.
Introduction section (2 points)
Using markdown, write a motivation of at most 4 sentences explaining what you are going to research and why. Make sure it is relevant to your dataset.
Research Question section (2 points)
Using markdown, discuss in at most 3 sentences the specific question that you are going to investigate or answer in this report using your selected data.
Dataset Introduction section (3 points)
In this section, briefly describe your data (i.e. what the data is measuring or recording) in five sentences maximum using markdown.
To do this, you will need to clean your dataset. Include a description of this cleaning process (not the code) in the report. The code to clean the dataset must be included in the Quarto file, but not in the report.
Dataset Description subsection (5 points)
Create a subsection that reports details about your data. You must include the size of the dataset (the number of observations and variables) and the variable types. You must include a sentence that uses inline R code to report the number of variables and observations in your dataset.
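One possible way to prepare values for inline R code is sketched below. The `campaign` data frame and its column names are hypothetical stand-ins, not your actual dataset; the idea is only that the counts are computed once in a chunk and then referenced in the text.

```r
# Hypothetical stand-in for the cleaned dataset; column names are assumptions.
campaign <- data.frame(
  customer_id = 1:4,
  marketing_spend = c(10.5, 22.0, 5.0, 31.2),
  purchase_amount = c(80, 150, 40, 210)
)

n_obs <- nrow(campaign)   # number of observations (rows)
n_vars <- ncol(campaign)  # number of variables (columns)

# First class of each column, as a named character vector.
var_types <- vapply(campaign, function(x) class(x)[1], character(1))

# In the report text, the values can then be embedded inline, for example:
# The dataset has `r n_obs` observations and `r n_vars` variables.
```

Computing the numbers from the data, rather than typing them by hand, keeps the sentence correct if the cleaning steps change.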
Create two other summaries of your data, one as a plot and one as a table. Examples include counts of certain variables in groups, or summary statistics of numeric variables. Make sure these summaries are providing a broader context to the report and are not just arbitrary commentaries.
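A summary table with a caption could look roughly like the sketch below. The data, the label `tbl-spend-summary`, and the caption text are all assumptions for illustration. In Quarto, a label starting with `tbl-` together with the `tbl-cap` option makes the table referable in the text as `@tbl-spend-summary`.

```r
#| label: tbl-spend-summary
#| tbl-cap: "Summary statistics of marketing spend (illustrative data)."

# Hypothetical stand-in for the cleaned dataset; column names are assumptions.
campaign <- data.frame(
  marketing_spend = c(10.5, 22.0, 5.0, 31.2),
  purchase_amount = c(80, 150, 40, 210)
)

# Collect a few summary statistics into a small data frame.
spend_summary <- data.frame(
  statistic = c("Minimum", "Median", "Mean", "Maximum"),
  value = c(
    min(campaign$marketing_spend),
    median(campaign$marketing_spend),
    mean(campaign$marketing_spend),
    max(campaign$marketing_spend)
  )
)

# Render the summary as a formatted table in the report.
knitr::kable(spend_summary, digits = 2)
```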
Results section (5 points)
Using visualisations of the data, discuss the answer to your research question. You must:
- Create 2 figures maximum of your data that will help you answer your research question. Each figure must have a caption using the options inside the R code chunk. Create the figures using the ggplot2 package.
- The figures must have captions, and must be referred to in the text of the results section.
As well as the figures, you must also include a paragraph describing these results and how they link back to your research question. Remember, you can use inline code to extract certain numbers automatically.
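The figure requirements above can be sketched as follows. This is an illustration under stated assumptions: the data frame, column names, label, and caption are hypothetical, and a scatter plot is only one of many suitable plot types. A label starting with `fig-` plus the `fig-cap` chunk option lets the text refer to the figure as `@fig-spend-vs-purchase`.

```r
#| label: fig-spend-vs-purchase
#| fig-cap: "Purchase amount against marketing spend per customer (illustrative data)."

library(ggplot2)

# Hypothetical stand-in for the cleaned dataset; column names are assumptions.
campaign <- data.frame(
  marketing_spend = c(10.5, 22.0, 5.0, 31.2),
  purchase_amount = c(80, 150, 40, 210)
)

# Scatter plot of spend against purchase amount with readable axis labels.
spend_plot <- ggplot(campaign, aes(x = marketing_spend, y = purchase_amount)) +
  geom_point() +
  labs(x = "Marketing spend per customer", y = "Purchase amount")

spend_plot
```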
Marking Rubric
In addition to the points described above, you will be graded on:
- Spelling and grammar (up to -3 points for issues).
- R code style (i.e. spacing, variable names) (2 points)
- Report quality: Sections in the report are connected and aligned with the research question in a coherent way (2 points).
Maximum grade: 30 points.
The usage of AI
You may use Generative AI (such as ChatGPT) to correct your English or to help with your R code (for example to find bugs or ask for enhancements to your existing code). However, you must declare your use of AI in a separate qmd file called appendix.qmd, and display screenshots of your AI queries and interactions related to this assignment. The appendix does not count towards your page limit.
Monash University supports the responsible and ethical use of generative AI. For more info please refer to Monash Policy and practice guidance around acceptable and responsible use of AI technologies.
Assignment Submission
The report must be rendered to HTML and PDF. The PDF cannot have a length of more than 5 pages (excluding the appendix).
You do not need to upload anything to Moodle. All marking will occur directly from your GitHub repositories.
Plagiarism
Monash University is committed to honesty and academic integrity. There are serious consequences for plagiarism and collusion. If plagiarism and/or collusion is detected, further action will be taken according to Monash University policy and procedures. More info here:
https://www.monash.edu/students/admin/policies/academic-integrity
You cannot re-use assignments that have been submitted or used in other units.