# A tibble: 3 × 2
species mean_flipper
<fct> <dbl>
1 Adelie 190.
2 Chinstrap 196.
3 Gentoo 217.
Lecturer: Michael Lydeamore
Department of Econometrics and Business Statistics
renv handles package environments
targets handles workflow pipelines
What is renv?
renv stands for “Reproducible ENVironments”.
We first have to understand what a library is and what a repository is.
A library is a directory which contains all of your installed packages. For us, so far, we have one library shared across all of our projects.
You can see your current libraries by running .libPaths()
A repository is a source of packages, such as CRAN.
Other repositories include: Bioconductor, Posit Public Package Manager, and R Universe.
You can see your available repositories with getOption('repos')
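As a quick check of both ideas, this base R sketch prints the library paths and the configured repositories for the current session. The output depends on your machine, so none is shown here.

```r
# Libraries: directories where installed packages live.
# The first entry is where install.packages() installs by default.
libs <- .libPaths()
print(libs)

# Repositories: named URLs that install.packages() downloads from.
repos <- getOption("repos")
print(repos)
```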
Running renv::init() in a project sets up a project-specific library, which isolates the packages for each project.
How renv works
renv creates a lockfile (renv.lock).
renv.lock: lists exact package versions
renv/ folder: stores the project library
.Rprofile: ensures renv is activated when you open the project
Make sure you check these files in to version control!
Commit the renv.lock file to your repository.
Here is an example lockfile:
{
  "R": {
    "Version": "4.4.3",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      }
    ]
  },
  "Packages": {
    "markdown": {
      "Package": "markdown",
      "Version": "1.0",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "4584a57f565dd7987d59dda3a02cfb41"
    },
    "mime": {
      "Package": "mime",
      "Version": "0.12.1",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "yihui",
      "RemoteRepo": "mime",
      "RemoteRef": "main",
      "RemoteSha": "1763e0dcb72fb58d97bab97bb834fc71f1e012bc",
      "Requirements": [
        "tools"
      ],
      "Hash": "c2772b6269924dad6784aaa1d99dbb86"
    }
  }
}
There are two main things in here: R and Packages.
It is Packages that has everything needed to reinstall an exact version of a package.
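To make the structure concrete, here is a minimal sketch that reads a cut-down copy of the lockfile above and lists what the Packages section records. It assumes the jsonlite package is installed. The inline JSON string stands in for a real renv.lock file.

```r
library(jsonlite)

# A shortened copy of the example lockfile, inlined so the sketch is self-contained.
lock_text <- '{
  "R": { "Version": "4.4.3" },
  "Packages": {
    "markdown": { "Package": "markdown", "Version": "1.0", "Source": "Repository" },
    "mime": { "Package": "mime", "Version": "0.12.1", "Source": "GitHub" }
  }
}'

# Keep the nested list structure rather than simplifying to data frames.
lock <- fromJSON(lock_text, simplifyVector = FALSE)

# Pull the version and source out of each package entry.
versions <- vapply(lock$Packages, function(p) p$Version, character(1))
sources <- vapply(lock$Packages, function(p) p$Source, character(1))

print(data.frame(
  package = names(versions),
  version = versions,
  source = sources,
  row.names = NULL
))
```

For a real project you would replace the inline string with the path to renv.lock.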
Often we use the same packages across most of our projects. Conveniently, renv reuses packages across our projects by maintaining a cache.
You’ll sometimes see a message that says:
Linked from Cache
when installing. This is the package being re-used!
renv doesn’t solve everything for you:
When someone else clones your project:
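A minimal sketch of what a collaborator might run from the project root after cloning. It assumes the repository contains a renv.lock file, and it does nothing otherwise.

```r
# Run from the root of the freshly cloned project.
has_lockfile <- file.exists("renv.lock")

if (has_lockfile) {
  # Install renv itself if this machine does not have it yet.
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  # Reinstall the package versions recorded in renv.lock
  # into the project library.
  renv::restore()
} else {
  message("No renv.lock found; is this the project root?")
}
```

If the project's .Rprofile was committed, opening the project in a fresh R session will usually activate renv automatically, leaving only the renv::restore() call to run by hand.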
renv with GitHub Actions (Advanced)
renv::restore() is run as part of the CI workflow.
Add this to .github/workflows/ci.yaml:
name: R-CMD-check
on: [push, pull_request]
jobs:
  R-CMD-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up R
        uses: r-lib/actions/setup-r@v2
      - name: Install system dependencies
        run: sudo apt-get update && sudo apt-get install -y libcurl4-openssl-dev libssl-dev libxml2-dev
      - name: Restore packages with renv
        run: Rscript -e 'install.packages("renv"); renv::restore()'
      - name: Run your script or tests
        run: Rscript your_script.R
Your GitHub Action will now use the same package versions as your local setup.
Often, we end up with something like this:
01-data.R
02-model.R
03-plots.R
And then we source these in order each time.
This works OK for small projects, but scales very poorly.
What is targets?
tar_target() defines a step
targets watches for changes
Pipeline:
read_csv() → data ─┐
└──> lm() → model
If the file changes, targets knows to rerun read_csv() and everything downstream.
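The diagram above could be written as a _targets.R file along the following lines. This is a sketch under stated assumptions: the file path data/penguins.csv, the target names, and the model formula are all hypothetical and not taken from the slides.

```r
# _targets.R (sketch): the file path, target names and model formula are assumptions.
library(targets)

# Packages that targets should load before running each step.
tar_option_set(packages = c("readr"))

list(
  # Track the file itself, so that editing it invalidates downstream targets.
  tar_target(data_file, "data/penguins.csv", format = "file"),
  # Reruns only when data_file (or this command) changes.
  tar_target(penguins_data, read_csv(data_file)),
  # Reruns only when penguins_data (or this command) changes.
  tar_target(model, lm(body_mass_g ~ flipper_length_mm, data = penguins_data))
)
```

With this file in the project root, calling tar_make() in the console builds the pipeline. A second call skips any target whose inputs have not changed.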
your-project/
├── _targets.R # pipeline definition
├── data/
├── R/ # helper functions
├── renv.lock
├── renv/
└── analysis.qmd
When running a function, the package hashes the function.
If the function doesn’t change, the hash will stay the same.
If the function has changed, then so too will the hash.
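The principle can be illustrated in base R. This sketch is not how targets computes its hashes internally. The helper hash_function() is hypothetical and simply takes an MD5 checksum of a function's deparsed code.

```r
# Illustrative only: hash a function's deparsed code with base R tools.
# targets uses its own hashing internally; this just shows the principle.
hash_function <- function(f) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(deparse(f), tmp)
  unname(tools::md5sum(tmp))
}

summarise_v1 <- function(x) mean(x, na.rm = TRUE)
summarise_v1_copy <- function(x) mean(x, na.rm = TRUE)  # same code
summarise_v2 <- function(x) median(x, na.rm = TRUE)     # edited code

h1 <- hash_function(summarise_v1)
h2 <- hash_function(summarise_v1_copy)
h3 <- hash_function(summarise_v2)

print(identical(h1, h2))  # compares hashes of identical code
print(identical(h1, h3))  # compares hashes of different code
```

Comparing a stored hash with a fresh one is a cheap way to decide whether a step needs to run again.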
Results are stored on-disk in a compressed format.
Targets can be loaded using tar_load or tar_read (learn the keybinds!).
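targets with plots
_targets.R
library(targets)
# Assumption: penguins comes from the palmerpenguins package.
tar_option_set(packages = c("ggplot2", "palmerpenguins"))
list(
  tar_target(
    penguins_plot,
    ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + geom_point()
  )
)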
In your console:
tar_read prints the object inside it
tar_load loads the object into your workspace (like using <-)
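A minimal console sketch of the difference. It assumes the working directory contains a _targets.R that defines a penguins_plot target, and it does nothing otherwise.

```r
library(targets)

# Only run inside a project that defines a pipeline.
has_pipeline <- file.exists("_targets.R")

if (has_pipeline) {
  # Build or update any outdated targets first.
  tar_make()

  # tar_read() returns the stored object; assign it to any name you like,
  # or call it on its own at the console to print the result.
  p <- tar_read(penguins_plot)
  print(p)

  # tar_load() creates a variable with the same name as the target.
  tar_load(penguins_plot)
  print(class(penguins_plot))
}
```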
tarchetypes
tarchetypes contains useful functions for working with targets.
Useful for:
Often we have a dataset where we want to run a pipeline on the groups. In standard R, we do this as follows:
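A minimal sketch of the standard approach, assuming the dplyr and palmerpenguins packages are installed and that the grouping variable is species:

```r
library(dplyr)
library(palmerpenguins)  # assumed source of the penguins data

# One row per species, with the mean flipper length in each group.
mean_flipper <- penguins |>
  group_by(species) |>
  summarise(mean_flipper = mean(flipper_length_mm, na.rm = TRUE))

print(mean_flipper)
```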
In targets-land, we do it like this instead:
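One way this might look is with tarchetypes::tar_group_by() and dynamic branching. This is a sketch under assumptions, not necessarily the pipeline used in the lecture: the target name penguins_by_species is hypothetical, and the data is assumed to come from the palmerpenguins package.

```r
# _targets.R (sketch): target names and data source are assumptions.
library(targets)
library(tarchetypes)

tar_option_set(packages = c("dplyr", "palmerpenguins"))

list(
  # Split the data into one group per species.
  tar_group_by(penguins_by_species, palmerpenguins::penguins, species),
  # Run the summary once per group; each branch returns a one-row tibble.
  tar_target(
    mean_flipper,
    summarise(
      penguins_by_species,
      mean_flipper = mean(flipper_length_mm, na.rm = TRUE),
      species = first(species)
    ),
    pattern = map(penguins_by_species)
  )
)
```

Each species becomes its own branch, so a change that affects only one group only reruns that branch.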
This results in
> tar_read(mean_flipper)
# A tibble: 3 × 2
  mean_flipper species
         <dbl> <fct>
1         190. Adelie
2         196. Chinstrap
3         217. Gentoo
Note: tarchetypes will try and row bind these outputs. If your output is not a vector or data frame, then you will need iteration = "list" as an argument to the target. We will see this in the workshop!
You can also use targets in your Quarto documents with tarchetypes::tar_quarto().
This will scan your qmd file for tar_read and tar_load calls, and load those targets in the way that you would expect.
tar_visnetwork() will show you outdated, up-to-date, and not yet run targets.
Remember, targets only runs what it thinks it needs to!
You should check in your _targets.R file, but you typically ignore the cache.
To do that, add _targets to your .gitignore file.
targetting
We’ve scratched the surface of this package. You can also:
pins for re-use elsewhere
It also forces you to write your code in a “functional” way, which leads to easier code maintenance and readability down the track.
renv:
targets:
Next week:
Docker!
ETC5513 Week 10