ETC5513: Reproducible and Collaborative Practices

Reproducible reporting and version control systems

Lecturer: Michael Lydeamore

Department of Econometrics and Business Statistics



Open Frame

Recap

  1. More Git/GitHub tools
  2. Pull requests: a tool to collaborate with others via GitHub
  3. VSCode

Today’s plan

Aim

  • Learn about git rebase
  • Learn about git fetch and git merge
  • Learn about staging and unstaging files
  • Learn how to undo some changes

Rebase

In git, there are two main ways to integrate changes from one branch into another:

  1. git merge
  2. git rebase

Merging branches using git merge

If we use git merge to combine diverging branches, we will have a non-linear history.

Question

In which situation can we merge branches and have a linear history?

Rebase

Rebasing is the process of moving or combining a sequence of commits from a branch onto another branch.

  • Rebasing is most useful and easily visualised in the context of a feature branching workflow
  • Rebasing changes the base of your branch from one commit to another, making it appear as though you’d created your branch from a different commit.

Important

Rebasing moves an entire branch to another location in the repository

It can create a cleaner history if you don’t want merge lines everywhere.

Rebase in practice

Assume the following history exists and the current branch is feature:

          E<---F<---G feature
         /
    A<---B<---C<---D main

If we want to rebase the commits in the feature branch into the main branch, we need to do the following:

  • git checkout feature and
  • git rebase main

or

  • git rebase main feature
        feature    E'<--F'<--G' 
                  /
 A<---B<---C<---D main

which results in:

   A<---B<---C<---D<---E'<--F'<--G' main

Rebasing can be dangerous

Git rebase and merge

  • Merging is a non-destructive operation. The existing branches are not changed in any way, and this avoids all the potential problems of rebasing.

  • Rebasing moves the entire feature branch to begin on the tip of the main branch, incorporating all of the new commits into main.

  • Rebasing rewrites the projecth istory by creating brand nwe commits for each commit in the original branch, giving a cleaner history

  • However, this creates problems with safety and traceability

Golden rule for rebase

Never use it on public branches (such as main) in collaborative projects.

More on branching

Imagine that you are working on your local repository and a collaborator has created a new branch in your remote repo. You are currently working on your local repo and want to have a look at the new branch. That means that the local repo and your remote repo have diverged. That is, both the local and remote repositories are not currently synchronized.

  • To synchronize your work: git fetch origin
  • git fetch origin looks where origin is and fetches any data from it that you don’t yet have.
  • It also updates your local database repo, moving your origin/main pointer (HEAD) to its new, more up-to-date position. However, it does not move the HEAD of your local repository

About remotes

Note: If a git repo contains more than one remote, then git fetch will fetch all the changes from all remotes.

To fetch only one, use git fetch origin (or whatever remote you are after).

Why use fetch?

origin/main is the remote tracking branch, which provides information about where the main branch is in origin

How does fetch work?

git fetch downloaded the new B commit however our local working directory is not updated, and the head of our main branch is still pointing to commit A.

How do we merge those branches?

We need to combined main branch with the remote tracking origin/main branch. How?

By merging!

First, we need to move in the main branch and then merge origin/main

git checkout main
git merge origin/main

If the branches have not diverged, we can merge without conflict via a ‘fast merge’.

If the branches have diverged, we will need to resolve conflicts.

You can check the status of the local and remote branches by using git branch -vv

Very useful commands for fetching

  • git remote lets you create, view, and delete connections to remote repositories.
  • git branch -vv allows you to check the status of your local and remote branches in relation to each other.
  • git fetch origin fetches the changes from remote origin
  • git branch -a lists all the branches available in the local repository + all the branches fetched from the remote.

Tip

The branches fetched from the remote origin would be preceded by remotes/origin/

Undoing an error

Not everything we do on a project will be worth keeping. We’ve seen already one way to undo some work (git stash). But, that relies on not having already committed the changes.

What if we’ve made a commit, and then realise we don’t want that commit anymore?

We can reset

Or revert

Undoing an error

First, we should discuss checkout. We’ve talked alot about HEAD - that’s the current pointer of the repository.

We’ve seen git checkout before too: When swapping between branches!

Fundamentally, branches are just commits with a slightly different pointer. That means, we can checkout to a specific commit.

One option to go backwards on the git tree and make a new set of changes is to checkout to a commit hash, make the new changes, and then push and manage the merge conflicts.

Undoing an error

Here’s our git tree:

(a) --- (b) --- (c) --- (d) <- HEAD, main

When we checkout b, our git tree changes to:

(a) --- (b) --- (c) --- (d) <- main
          \ 
          main

Note that doing this will create diverging histories and so is generally to be avoided. You can overwrite history using git push -f, but this is really discouraged.

Reset or Revert

There are two ways to change commit history of a repository: reset and revert.

Tip

Generally, reset is for when the commit isn’t public, and revert is when you’ve already made a public commit.

This is because reset changes the commit history, and revert does not.

Week 6 Lesson

Recap

  • Learn about git rebase
  • Learn about git fetch and git merge
  • Learn about staging and unstaging files
  • Learn how to undo some changes

Trivia

  • 9 questions 5min

  • 1 team challenges 10min

  • 5 questions + 2 Drawing challenges 10min

A speed quiz!

Challenge: 10 minutes

As a team get organize to do the following:

  1. Create a public repository that contains a README file.

  2. Create a pull request where one of the team members suggest a modification of the README file.

  3. Successfully incorporate the pull request.

Questions and drawing challenge: 5 minutes

  1. What is the Git command to merge the current branch with the branch called “Mybranch”?

  2. What is the Git command to delete the branch “Mybranch”?

  3. What is the Git command to get all the change history in the remote repository “origin”?

  4. Git pull is a combination of?

  5. Draw a situation where a conflict between different branches will arise?

  6. Draw a situation where a conflict between your local and your remote repository will arise?

Challenge: 10 minutes

As a team get organised to do the following:

  1. One of the members creates a public repository using the standard Rstudio Quarto template. After committing those changes to the remote repository, please invite the other members.
  2. Each team member creates an individual branch in which they will need to add a file name as studentname.R.
  3. Merge all the branches into the main branch.
  4. The main branch should contain the README file and all the files that each of the team member has created.