ETC5513: Reproducible and Collaborative Practices

Referencing, large files, and GitHub Issues

Lecturer: Michael Lydeamore

Department of Econometrics and Business Statistics



Recap

  • Start learning about LaTeX
  • LaTeX integration in QMD files
  • Understand commits and SHA-1 hashes
  • Moving back to past commits
  • Reverting commits

Today’s plan

Aim

  • Learn how to add references and bibliography
  • Dealing with large files
  • Tags
  • GitHub issues

Parts of a LaTeX file

\documentclass{article} % Preamble/header: load packages, set options
\usepackage{amsmath}
\author{M. J. Lydeamore}

\begin{document} % Body starts here, and goes until corresponding 'end'

\section{Introduction}

\end{document} % "After body" but before end

The include-in-header, include-before-body and include-after-body options in Quarto insert content into each of these pieces of the LaTeX code.

Different ways to include LaTeX

  1. Manually typing out the code
  2. Using an input .tex file
  3. Customised template file

Referencing

There’s a system for it: never do it manually!

The bibliography file

The first thing we need is a place to store information about our references.

The standard format is the BibTeX bibliography database file, which ends in .bib.

  • To cite a paper in the text, we use the key from the bib file
  • These files are plain text, so you can open them in RStudio or VSCode

To get bib entries:

  • Google Scholar
  • Reference manager such as Zotero or Mendeley

The bibliography file: Example

@Manual{R-base,
  title = {R: A Language and Environment for Statistical
           Computing}, 
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2019},
  url = {https://www.R-project.org},
}

To cite this in your Quarto file, use @R-base.
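As a minimal sketch of how the pieces connect, the commands below write the example entry to a bib file and create a small Quarto document that cites it. The file names references.bib and report.qmd, the temporary project directory, and the document title are assumptions made for illustration.

```shell
# Work in a throwaway project directory (hypothetical location).
proj=$(mktemp -d)
cd "$proj" || exit 1

# The bibliography database: one entry, with the key R-base.
cat > references.bib <<'EOF'
@Manual{R-base,
  title = {R: A Language and Environment for Statistical Computing},
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2019},
  url = {https://www.R-project.org},
}
EOF

# A Quarto document that points at the bib file in its YAML header.
cat > report.qmd <<'EOF'
---
title: "Example report"
format: pdf
bibliography: references.bib
---

All analysis was done in R [@R-base].

@R-base is cited here without brackets, as part of the sentence.
EOF

# Show every place the key is used.
grep -n '@R-base' report.qmd
```

The bracketed form [@R-base] gives a parenthetical citation, while the bare @R-base form gives an in-text citation. The reference list is generated from the bib file when the document is rendered.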

Citing R packages

You can get the citations for R packages using toBibtex(citation("tidyverse"))

Demo & Practice

Large files in git and GitHub

What happens when you commit a big file on GitHub?

GitHub warns you about files larger than 50 MB, and blocks files larger than 100 MB.

Instead, we deal with these using Git Large File Storage (LFS).

Please see info for installation here and make sure you install this extension.

GitHub and Bitbucket both have LFS support, and handle changes to these files much more sensibly.
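Before committing, it can help to check whether a project contains any files near the size limit. The sketch below uses find for this. The function name list_large_files is hypothetical, and the default threshold of 50 MB is an assumption that matches GitHub's warning level.

```shell
# list_large_files DIR [SIZE]
# Print files under DIR that are bigger than SIZE, skipping git's own
# .git directory. SIZE uses the syntax of find's -size test.
list_large_files() {
    dir=${1:-.}
    size=${2:-+50M}
    find "$dir" -name .git -prune -o -type f -size "$size" -print
}

# Check the current directory against the 50 MB threshold.
list_large_files . +50M
```

Any file this prints is a candidate for tracking with Git LFS rather than committing directly.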

Setup for git LFS

To set up large file storage:

  • Navigate to the repo
  • In the CLI, type git lfs install

Now, large files can be tracked using the relevant command. For example:

git lfs track '*.nc'
git lfs track '*.csv'
git lfs track '*.pdf'

Note the quotes! They stop your shell from expanding the pattern before git sees it.

This will create a .gitattributes file. Make sure to add this to your repo.

Then, use your standard workflow of add, commit, push.

Summary

  1. Navigate to the local repo and run git lfs install
  2. git lfs track "*.csv"
  3. git add .gitattributes
  4. git add data.csv as per normal, then commit and push

It is essential you run git lfs install before committing and pushing, otherwise you will get an error message.
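The summary steps can be tried end to end in a throwaway repository. This is a sketch under a few assumptions: the repository lives in a temporary directory, data.csv is a tiny stand-in for a genuinely large file, the user name and email are placeholders, and --local is used so that only this repository's configuration is changed. There is no remote here, so the push step is left out.

```shell
if git lfs version >/dev/null 2>&1; then
    repo=$(mktemp -d)
    cd "$repo" || exit 1
    git init -q
    git config user.name "Example Student"
    git config user.email "student@example.com"

    git lfs install --local >/dev/null       # set up LFS for this repo only
    git lfs track "*.csv" >/dev/null         # writes the pattern to .gitattributes
    printf 'id,value\n1,3.14\n' > data.csv   # stand-in for a large file
    git add .gitattributes data.csv
    git commit -q -m "Add data file through Git LFS"

    cat .gitattributes    # shows the tracking rule for *.csv
    git lfs ls-files      # lists files stored through LFS
else
    echo "git-lfs is not installed"
fi
```

Printing .gitattributes is a quick way to confirm which patterns are handled by LFS before you add the data files themselves.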

What if I already have a large file?

If you have already committed a large file, your push was probably rejected with an error message that you now have to fix.

Recall how to undo the last commit while keeping your changes:

git reset --soft HEAD~1

Then:

git lfs install
git lfs track "*.csv"
git add .gitattributes file.csv
git commit -m "Adding data files through Git LFS"
git push origin main

Demo

More on committing

So far, we’ve used a one-line commit (although we could do with more practice on those)

We can add more text into a commit, and many times this is sensible.

Detailed commit structure:

First line: a short summary of the change

Blank line

Rest of the text: what changed and why
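One way to write this structure without opening an editor is to pass -m twice: git puts each -m in its own paragraph, which supplies the blank line. The sketch below runs in a temporary repository. The file name, user details and commit text are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

echo "draft" > report.qmd
git add report.qmd

# First -m is the summary line, second -m is the body.
git commit -q -m "Add first draft of report" \
    -m "Start the report skeleton so the team can add sections.
References and figures will follow in later commits."

git log -1 --format=%s   # the summary line only
git log -1 --format=%b   # the body only
```

Running git commit with no -m at all opens your text editor instead, where you can type the same three-part structure by hand.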

Demo

Git tags

Tags are custom labels or references that point to specific points in your git history.

Generally used to capture a specific point in the repo history, like a release or a report milestone.

Tip

You can think of a tag like a branch that doesn’t change

Unlike branches, tags don’t move forward when new commits are added.

Great tutorial on tags here

Git tags example

Types of tags

There are two types of tags: lightweight and annotated. The difference is the metadata they come with.

Best practice is lightweight tags for personal use, and annotated tags as marks for version releases.

Annotated tags store extra metadata like the tagger name, email, date and a message.

Lightweight tags are only a pointer to a commit.
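The difference can be seen by asking git what kind of object each tag name refers to. This is a small sketch in a temporary repository. The tag name draft, the file name and the user details are assumptions for illustration, and -m is used so that no editor opens.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

echo "v1" > report.qmd
git add report.qmd
git commit -q -m "First version of report"

git tag draft                      # lightweight: just a name for the commit
git tag -a v1.0 -m "Version 1.0"   # annotated: a separate tag object

git cat-file -t draft   # prints "commit"
git cat-file -t v1.0    # prints "tag"
```

The annotated tag is its own object holding the tagger, date and message, while the lightweight tag resolves directly to the commit.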

Creating and sharing tags

To create an annotated tag:

git tag -a v1.0

or

git tag -a v1.0 SHA1

This will create a new annotated tag with label v1.0. The command will open your text editor so you can write the tag message.

Annotated tag example

Lightweight tags

git tag v1.0 will create a lightweight tag

Can you spot the difference?

Listing tags

git tag

git tag -n will also show us the associated message, or you can do git show tagname

Tagging old commits

By default, git tag creates a tag on HEAD

If you want to tag an old commit:

git log --oneline
git tag -a v1.2 15027957951b64cf874c3557a0f
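Here is a sketch of the same idea in a temporary repository, where the hash of the older commit is looked up rather than copied from the log. The file name, commit messages, user details and tag message are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "First commit"
echo "two" >> notes.txt
git commit -q -am "Second commit"

git log --oneline                  # find the commit you want to tag
old=$(git rev-parse --short HEAD~1)
git tag -a v1.2 -m "Milestone at first commit" "$old"

git tag -n                         # tag name plus first line of its message
git rev-parse 'v1.2^{commit}'      # the commit that v1.2 points to
```

The tag lands on the older commit even though HEAD is one commit ahead.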

Sharing tags

Sharing tags is similar to pushing branches.

By default, git push does not push tags

git push origin tagname will push the tag.
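To see this behaviour without touching GitHub, the sketch below uses a bare repository in a temporary directory as a stand-in for the remote. All paths, names and messages are assumptions for illustration.

```shell
# Throwaway "remote": a bare repository in a temporary directory.
remote=$(mktemp -d)
git init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "v1" > report.qmd
git add report.qmd
git commit -q -m "First version of report"
git remote add origin "$remote"

git tag -a v1.0 -m "Version 1.0"
branch=$(git rev-parse --abbrev-ref HEAD)

git push -q origin "$branch"             # pushes the commit, not the tag
before=$(git ls-remote --tags origin)    # empty: no tags on the remote yet

git push -q origin v1.0                  # push the tag by name
after=$(git ls-remote --tags origin)
echo "$after"                            # now includes refs/tags/v1.0
```

If you have several tags to share, git push origin --tags pushes all of them at once.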

Viewing your tags on GitHub

Checking tags

You can check out tags: git checkout v1.0

This will put you in a detached HEAD state, so any new commits won’t belong to a branch and are easy to lose

Unless??
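One way to keep work that starts from a tag, which may be what this slide is hinting at, is to create a branch at that point before committing. The sketch below runs in a temporary repository. The branch name hotfix-v1.0, the file name and the commit messages are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "v1" > report.qmd
git add report.qmd
git commit -q -m "First version of report"
git tag v1.0

git checkout -q v1.0
git rev-parse --abbrev-ref HEAD   # prints "HEAD": we are detached

git checkout -q -b hotfix-v1.0    # create a branch at the tagged commit
echo "fix" >> report.qmd
git commit -q -am "Fix typo in v1.0 report"
git rev-parse --abbrev-ref HEAD   # prints "hotfix-v1.0"
```

The new commit now sits on a named branch, while the tag v1.0 still points at the original commit.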

Deleting a tag

  • git tag: Lists tags
  • git tag -d v1: Deletes tag v1
  • git push origin --delete v1: Deletes it from the remote
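The three commands can be tried together against a stand-in remote. As a sketch, the example below uses a bare repository in a temporary directory instead of GitHub. Paths, user details and messages are assumptions for illustration.

```shell
remote=$(mktemp -d)
git init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "v1" > report.qmd
git add report.qmd
git commit -q -m "First version of report"
git remote add origin "$remote"

git tag v1
git push -q origin v1             # share the tag

git tag                           # lists v1
git tag -d v1                     # delete the local tag
git push -q origin --delete v1    # delete the tag on the remote

git tag                           # prints nothing
git ls-remote --tags origin       # prints nothing
```

Deleting the local tag and deleting the remote tag are separate steps, so a tag removed only locally will still be visible to collaborators.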

Why use tags?

We could just use a branch (that never gets merged) instead of a tag.

For storing points in history, this is fine (although not the intention of a branch)

A branch is supposed to be a “variation” on the main repository, not a point-in-time record

Tags are designed to fill that void.

GitHub Issues

“I have a problem”

You can create a GitHub Issue on the web:

Your collaborators can see the issue:

More on issues

More info about issues here

Practice time

Week 9 Lesson

Important

  • Learn how to add references and bibliography
  • Dealing with large files
  • Tags
  • GitHub issues