Lecturer: Michael Lydeamore
Department of Econometrics and Business Statistics
git records project history
renv records package versions
targets records the analysis workflow
Today is about judgement: using the right layer at the right time.
Aim
targets pipeline from scratch
You have inherited a project from a previous analyst.
The team needs a final report.
The README says:
That is a claim. Today we test it.
Instructor notes: week12/guided-notes.qmd
From the project root:
If it fails, that is useful evidence.
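One way to treat a failure as evidence is to capture it instead of letting it stop the session. The helper below is a minimal sketch in base R; `check_runs` is a hypothetical name, not part of the inherited project.

```r
# Hypothetical helper: run one check and record whether it succeeded.
check_runs <- function(label, code) {
  tryCatch(
    {
      force(code)  # the expression is evaluated here, inside tryCatch
      data.frame(check = label, ok = TRUE, message = "")
    },
    error = function(e) {
      # Keep the error message: it is the evidence we are collecting.
      data.frame(check = label, ok = FALSE, message = conditionMessage(e))
    }
  )
}

# Illustrative failing step (the message is made up for the example).
evidence <- check_runs("example step", stop("example failure"))
print(evidence)
```

In the inherited project the same helper could wrap a real step, for example `check_runs("script runs", source("analysis.R"))`.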
The README also says:
Before editing, read the script and ask: what assumptions does it make?
The project does not have one clear answer to:
Warning
How do I rebuild the final report from the raw data?
That is the reproducibility question underneath all the symptoms.
Start by finding the shape of the project.
Then read:
The README is part of the software.
| Question | Evidence |
|---|---|
| Can the report render? | quarto render report.qmd |
| Can the script run? | source("analysis.R") |
| Are paths project-relative? | Look for absolute paths and mismatched folders |
| Are raw and derived files separate? | Inspect data/, output/, outputs/ |
| Are packages recorded? | Look for renv.lock, DESCRIPTION, README |
| Is the workflow explicit? | Look for _targets.R or one rebuild command |
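Some of these checks can be scripted. The sketch below uses base R only; `find_absolute_paths` and `audit_project` are hypothetical helpers, and the regular expression is a rough heuristic that may miss some paths or flag harmless strings.

```r
# Hypothetical helper: flag lines of a script that contain a quoted absolute path.
find_absolute_paths <- function(lines) {
  # A quote followed by "/", "~/", or a Windows drive letter such as "C:/".
  pattern <- "[\"'](/|~/|[A-Za-z]:[/\\\\])"
  hits <- grepl(pattern, lines)
  data.frame(line = which(hits), text = lines[hits], stringsAsFactors = FALSE)
}

# Hypothetical helper: check which reproducibility files exist in a project folder.
audit_project <- function(root = ".") {
  c(
    has_readme    = length(list.files(root, pattern = "^README", ignore.case = TRUE)) > 0,
    has_renv_lock = file.exists(file.path(root, "renv.lock")),
    has_targets   = file.exists(file.path(root, "_targets.R"))
  )
}

# Example on made-up script lines.
example_lines <- c(
  'raw <- read.csv("/Users/someone/project/data.csv")',
  'raw <- read.csv("data/raw.csv")',
  'out <- "C:/temp/summary.csv"'
)
flagged <- find_absolute_paths(example_lines)
print(flagged)
```

On a real script, `find_absolute_paths(readLines("analysis.R"))` would list the lines worth reading first.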
Small wins first; structure second.
First make the project honest.
The inherited script mixes several jobs:
Those are natural pipeline targets.
Move the work into small functions:
Functions are easier to test, reuse, and connect.
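As an illustration of the idea, the sketch below splits a read, clean, and summarise sequence into three small functions. The function names and the `group` and `value` columns are assumptions for the example, not the inherited script's actual contents.

```r
# Read the raw data without modifying it.
read_raw_data <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Keep only rows with no missing values.
clean_data <- function(raw) {
  raw[stats::complete.cases(raw), , drop = FALSE]
}

# Mean of `value` within each `group` (hypothetical column names).
summarise_by_group <- function(clean) {
  aggregate(value ~ group, data = clean, FUN = mean)
}

# Each function can now be tried on a tiny made-up data set.
toy <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 3, 5, NA))
toy_summary <- summarise_by_group(clean_data(toy))
print(toy_summary)
```

Because each step takes an input and returns an output, the same functions can later be wired together as pipeline targets.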
Before:
After:
The report needs:
If the report depends on these files, the pipeline should create these files.
_targets.R
Start with an empty file.
No half-built pipeline. We write the contract ourselves.
File targets tell targets to watch the file itself.
Each target is a named promise: this object can be rebuilt.
The order comes from dependencies, not from memory.
Generated files can also be targets.
The report should not render until its inputs exist.
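Put together, a `_targets.R` file along these lines might look like the sketch below. It is an illustration under stated assumptions, not the case-study solution: the paths `data/raw.csv` and `output/summary.csv`, the helper functions, and the HTML output name are all hypothetical, and it assumes the `targets` and `quarto` R packages are installed.

```r
# _targets.R (sketch; paths, helpers, and output format are assumptions)
library(targets)

read_raw_data <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

clean_data <- function(raw) {
  raw[stats::complete.cases(raw), , drop = FALSE]
}

# Writes a derived file and returns its path, which is what a file target expects.
write_summary <- function(clean, path) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  write.csv(clean, path, row.names = FALSE)
  path
}

# Renders the report only once its input files exist, then returns the output path.
render_report <- function(source_file, inputs) {
  stopifnot(all(file.exists(inputs)))
  quarto::quarto_render(source_file)
  sub("\\.qmd$", ".html", source_file)  # assumes an HTML report
}

list(
  # File targets: targets watches the files themselves for changes.
  tar_target(raw_file, "data/raw.csv", format = "file"),
  tar_target(report_source, "report.qmd", format = "file"),
  # Object targets: each one is rebuilt when something upstream changes.
  tar_target(raw_data, read_raw_data(raw_file)),
  tar_target(clean, clean_data(raw_data)),
  # Generated file as a target.
  tar_target(summary_file, write_summary(clean, "output/summary.csv"), format = "file"),
  # Naming summary_file in the command makes the report depend on it.
  tar_target(report, render_report(report_source, inputs = summary_file), format = "file")
)
```

The order of the list does not matter: `targets` works out the build order from which target names appear in each command.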
From the repaired project folder:
Then run it again:
The second run should skip work that is already up to date.
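The skipping behaviour can be seen on a toy pipeline without touching the real project. The sketch below assumes the `targets` package is installed; the two targets are made up for the demonstration, and everything runs in a temporary folder.

```r
library(targets)

# Work in a throwaway folder so the demonstration leaves no files behind in the project.
demo_dir <- tempfile("targets_demo_")
dir.create(demo_dir)
old_dir <- setwd(demo_dir)

# Write a toy two-step pipeline to _targets.R (hypothetical targets).
tar_script(
  list(
    tar_target(numbers, c(1, 2, 3)),
    tar_target(total, sum(numbers))
  ),
  ask = FALSE
)

tar_make()  # first run: builds both targets
tar_make()  # second run: nothing has changed, so both targets are skipped

# Names of targets that would be rebuilt by another run.
outdated <- tar_outdated()
print(outdated)

setwd(old_dir)
```

In the repaired project the equivalent is calling `targets::tar_make()` twice from the project folder and reading which targets are reported as skipped.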
A good project can explain how it rebuilds itself.
The README should say:
And it should explain:
| Layer | In this case |
|---|---|
| Quarto | Final report |
| Git | History of the repair |
| GitHub Issues | Work that could be assigned |
| renv | Package versions, if this were being handed to a new machine |
| targets | Rebuild order and stale output detection |
| Docker | Only needed if R/system environment becomes the problem |
Before:
The report worked on the previous analyst’s laptop.
After:
From the project folder, one command rebuilds the derived files and final report from the raw data.
That is a stronger and more testable claim.
renv add value?
week12/case-study/broken
week12/case-study/solution
ETC5513 Week 12