Main tools
- R
- RStudio
- Command Line Interface
- git
- GitHub
- VSCode
During this semester, these tools will be essential for building reproducible and collaborative research practices.
Lecturer: Michael Lydeamore
Department of Econometrics and Business Statistics
Aim
Dr. Michael Lydeamore
Lecturer & Chief Examiner
Naveen Kaushik
Tutor
David Wu
Tutor
Contacting the teaching team
Most of the material in this course was developed by Dr. Patricia Menendez.
Patricia is a strong believer and trailblazer in reproducible research.
Learning objectives
All of this combined with learning statistical concepts!
Tip
Please participate in the lectures and tutorials. The success of the unit depends not only on the teaching team but also on you, as part of this unit's team.
We will start with individual projects
We will continue with a class group project
Finally, you will work on your own projects
The lectures will be a combination of presentations with interactive exercises.
Each lecture will begin with an open frame (5 minutes), where you can talk about your learning and share comments, issues and resources with the rest of the class.
That time can also be used for questions (as can any other time in the lecture).
The tutorials will be entirely based on computer practicals and you will be working individually as well as in groups.
Go over the material before the tutorial
The goal is to practice the ideas covered in lectures by working through activities and exercises individually and in groups.
Tip
Unit website
Note
Materials are designed to develop your hard and soft skills.
Please see Moodle for Zoom details
✅ Consultation hours: We are here to help you!
✅ Moodle discussion forum
Get used to using the forum - helping your peers is a fantastic way to learn.
Question
What is the problem with this approach?
What if one parameter or one number in your data changes?
GAME OVER
We start all over again 😭😭😭
Maybe we copy and paste into a new script
After a week, a month, a year… it gets very hard to remember all the steps!
Definitions from the U.S. National Academies of Sciences, Engineering, and Medicine:
Reproducibility (“computational reproducibility”) means obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis.
Replicability means obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
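To make the reproducibility definition concrete, here is a minimal sketch in Python (the course itself uses R, where set.seed() plays the same role). The function run_analysis and the data values are hypothetical: a small bootstrap estimate of a mean. Because the input data, the code, and the random seed (part of the conditions of analysis) are all fixed, running it twice gives exactly the same number; change any of them and the result may change.

```python
import random
import statistics


def run_analysis(data, seed):
    """Hypothetical analysis: bootstrap estimate of the mean of `data`."""
    rng = random.Random(seed)  # local generator: the same seed gives the same draws
    boot_means = []
    for _ in range(1000):
        resample = [rng.choice(data) for _ in data]  # resample with replacement
        boot_means.append(statistics.mean(resample))
    return statistics.mean(boot_means)


data = [2.1, 3.4, 1.9, 5.6, 4.2]  # hypothetical input data

first = run_analysis(data, seed=5513)
second = run_analysis(data, seed=5513)
print(first == second)  # True: same data, code and seed give the same result
```

Leaving out the seed would make each run draw different resamples, which is exactly the kind of hidden "condition of analysis" that breaks computational reproducibility.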
Literate programming
Literate programming is an approach to writing reports using software that weaves together the source code and text at the time of creation.
Donald Knuth coined the term literate programming in 1984 to refer to a source file that could be both run by a computer and "woven" into a formatted presentation document.
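A toy weaver, sketched in Python, shows the idea: the source below mixes prose with a code chunk, and weave() runs each chunk and places the code and its printed output back into the text. The chunk delimiters <<>>= and @ are borrowed from the .Rnw (Sweave/knitr) format, and the ## output prefix mimics knitr's default; the weave function itself is hypothetical and ignores almost everything real tools such as knitr and quarto handle (chunk options, figures, errors, output formats).

```python
import contextlib
import io

# Hypothetical literate source: prose lines with one code chunk delimited
# by "<<>>=" and "@" (the chunk markers used in .Rnw files).
SOURCE = """The mean of our sample is computed below.
<<>>=
values = [2, 4, 6]
print(sum(values) / len(values))
@
Because the number is computed, it updates whenever the data change."""


def weave(source):
    """Run every code chunk and weave its code and printed output into the prose."""
    woven, chunk, in_chunk = [], [], False
    namespace = {}  # shared by all chunks, so later chunks see earlier variables
    for line in source.splitlines():
        if line == "<<>>=":
            in_chunk, chunk = True, []
        elif line == "@" and in_chunk:
            in_chunk = False
            captured = io.StringIO()
            with contextlib.redirect_stdout(captured):
                exec("\n".join(chunk), namespace)  # run the chunk, capturing print output
            woven.extend("    " + code for code in chunk)  # echo the code
            woven.extend("    ## " + out for out in captured.getvalue().splitlines())  # then its output
        elif in_chunk:
            chunk.append(line)
        else:
            woven.append(line)
    return "\n".join(woven)


print(weave(SOURCE))
```

If the values in the chunk change, re-weaving updates the reported mean automatically, with no copying and pasting of results.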
Reproducibility is a way of thinking about and approaching projects.
Some projects require a single tool (be that R, Python, MATLAB or many others) and may only involve one person.
Others might involve different teams and require many different tools.
Using tools for reproducible research and reporting
Definition: Dynamic Documents
A dynamic document combines the code used for data analysis with the text of the report
Together, they produce your report, paper or presentation
All in a sequential and dynamic way!
Code?
They are related but they are not the same. Why?
knitr and quarto allow us to connect R-based analyses to presentations, papers and reports created with markup languages such as LaTeX and Markdown.
R by itself can gather and analyse data; with a little help from knitr and quarto, and some markup languages, it can present results in a highly reproducible way.
RStudio is an integrated development environment (IDE)
We don’t need RStudio, but it lets us do things more easily.
It integrates with git (version control)
It has a cloud counterpart called RStudio Cloud
It’s RStudio, in the cloud.
Why?
Definition: Version Control
A system that records changes to a file or a set of files over time, so that you can recall specific versions later.
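The definition can be illustrated with a deliberately tiny, in-memory version store, sketched in Python. The ToyHistory class and its commit/checkout/log methods are hypothetical names chosen to echo git's vocabulary; they do not reflect how git actually stores data. Each commit appends a snapshot with a message, so earlier versions are kept and can be recalled later instead of being overwritten.

```python
class ToyHistory:
    """Hypothetical toy version store for a single file (not how git works)."""

    def __init__(self):
        self.versions = []  # (message, content) pairs, oldest first

    def commit(self, content, message):
        """Record a snapshot of the file and return its version number."""
        self.versions.append((message, content))
        return len(self.versions) - 1

    def checkout(self, version):
        """Recall the file exactly as it was at a given version."""
        return self.versions[version][1]

    def log(self):
        """List the recorded versions with their messages."""
        return [f"{number}: {message}" for number, (message, _) in enumerate(self.versions)]


history = ToyHistory()
first = history.commit("alpha <- 0.05\n", "Set significance level")
second = history.commit("alpha <- 0.01\n", "Use a stricter significance level")
print("\n".join(history.log()))
print(history.checkout(first))  # the earlier version is still available
```

Real version-control systems add much more (multiple files, branches, merging, collaboration), but the core promise is the same: every recorded state can be retrieved.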
git
Definition: git
Git is a distributed version-control system for tracking changes in source code during software development. It is designed for coordinating work among programmers, but it can be used to track changes in any set of files. Its goals include speed, data integrity, and support for distributed, non-linear workflows.
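One concrete mechanism behind git's data integrity is that every stored object is named by a cryptographic hash of its contents. For file contents (a "blob"), git hashes a header made of the word blob, a space, the size in bytes and a NUL byte, followed by the raw bytes. The Python function git_blob_hash below is a sketch of that rule for repositories using the default SHA-1 object format (repositories created with SHA-256 use different ids); you can compare its result with git hash-object on the same file. Because any change to the bytes changes the id, altered or corrupted content is detected.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the object id git gives these file contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"  # type, size in bytes, NUL
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"alpha <- 0.05\n")
edited = git_blob_hash(b"alpha <- 0.01\n")
print(original)
print(original == edited)  # False: a one-character change gives a different id
```

For example, an empty file always gets the id e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 in any SHA-1 repository.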
git repositories
git and GitHub
We will learn general practical tips for reproducible workflows
There is no one-size-fits-all approach!
This week the tutorial will focus on providing an introduction to different resources.
Summary
Resources
git manual
ETC5513 Week 1