ETC5513: Collaborative and Reproducible Practices

Tutorial 7

Author

Michael Lydeamore

Published

8 May 2025

Cleaning Up Your Commit History

With git rm and .gitignore

🧭 Goal

Learn how to remove files from Git tracking with git rm, and prevent them from being re-added with a .gitignore file.


1️⃣ Create and Clone a Repo

On GitHub

  1. Create a new repository:

    git-ignore-cleanup
  2. ✅ Check “Add a README file”


In RStudio

  1. Go to File → New Project → Version Control → Git
  2. Paste the repo URL (SSH or HTTPS)
  3. Choose a location and click Create Project

✅ You’re now working in a Git-tracked project.


2️⃣ Add and Commit a File

  1. Create a new qmd in RStudio:

    • Go to File → New File → Quarto Document
  2. Save the file as notes.qmd

  3. Add some content like:

    summary(mtcars)
  4. Stage and commit the file:

    "Add analysis script"

3️⃣ Accidentally Add a Data File

  1. Download the data from Week 2 on Moodle, and save it into your project as data.csv

  2. Stage and commit:

    git add data.csv
    git commit -m "Add raw data"

4️⃣ Remove the File from Git, But Keep It Locally

Realise you don’t want this in version control, but you still need it for local use.

In the terminal:

git rm --cached data.csv
git commit -m "Stop tracking data.csv"

data.csv is still on your computer, but Git will no longer track it.


5️⃣ Add It to .gitignore

To prevent it from being accidentally added again:

  1. Open (or create) a .gitignore file in your repo root

  2. Add:

    data.csv
  3. Save and stage the .gitignore file

  4. Commit:

    "Ignore data.csv"

🔍 Check It Worked

  1. Run:
git status

✅ You should not see data.csv listed anywhere — Git is now ignoring it.


3️⃣ Squash Commits with Interactive Rebase

Let’s now try squashing a few commits into one clean one.


Step 1: Make a Messy Commit History

  1. Edit your .qmd file and make 3 separate commits:

    • Add a new section or chunk → commit: "Add section"
    • Fix a typo → commit: "Fix typo"
    • Add a final comment → commit: "Add footnote"

✅ Commit after each change using the Git pane or terminal.


Step 2: Check Your Commit History

Run in the Terminal:

git log --oneline

You should see something like:

c3d4e5f Add footnote
b2c3d4e Fix typo
a1b2c3d Add section
...

Step 3: Start an Interactive Rebase

git rebase -i HEAD~3

You’ll see:

pick a1b2c3d Add section
pick b2c3d4e Fix typo
pick c3d4e5f Add footnote

Step 4: Squash the Commits

Change it to:

pick a1b2c3d Add section
squash b2c3d4e
squash c3d4e5f

Save and write a new combined commit message like:

Add section with typo fix and footnote

Save again to finish the rebase.


Step 5: Confirm It Worked

Run:

git log --oneline

✅ You should now see one clean commit where there were three.


🧠 Reflect

  • Why is --amend useful when working on a single file?
  • When is it good practice to squash commits?
  • What would happen if you did this after pushing?

✅ Summary

Action Command
Fix your last commit git commit --amend
Combine multiple commits git rebase -i HEAD~N
Keep your history clean Use these before pushing

🎉 You’ve just learned to write cleaner, more professional commit histories!


🔁 Extension Activity: Merge vs Rebase

🧭 Goal

Understand the difference between git merge and git rebase by applying both to the same branches and comparing the result.


1️⃣ Setup: Create a Feature Branch

In your GitHub-connected RStudio project:

  1. Create a file: experiment.R

  2. Add one line:

    Main branch version
  3. Save, stage, and commit:

    "Add base file on main"
  4. Create a new branch called feature:

git switch -c feature

2️⃣ Add Work on the Feature Branch

  1. Edit experiment.R again:

    Feature branch addition
  2. Save and commit:

    "Add feature content"

✅ You now have two commits on separate branches.


3️⃣ Add a Change to main

  1. Switch back to main:
git switch main
  1. Add to the file again:

    Main branch additional note
  2. Save and commit:

    "Add note on main branch"

📊 At This Point…

Your Git history looks like this:

          A---B  (feature)
         /
    ---O---C  (main)
  • O = Original commit
  • A = Feature commit
  • C = Main branch commit
  • B = We’ll merge or rebase next

4️⃣ Option A: Merge the Feature Branch

git merge feature

You’ll get a merge commit, like this:

          A---B  (feature)
         /     \
    ---O---C-----M  (main)

✅ History shows a clear branching path and merge point.


5️⃣ Option B: Try It Again with Rebase

This will recreate the same setup and use rebase instead of merge.

  1. Reset the last merge:
git reset --hard HEAD~1
  1. Switch to the feature branch:
git switch feature
  1. Rebase it onto main:
git rebase main
  1. Now go back to main and fast-forward:
git switch main
git merge feature

📊 After Rebase

Your Git history now looks like:

    ---O---C---A'  (main, feature)
  • A' is a new version of A, replayed on top of C
  • No merge commit needed — linear history

🧠 Reflect

  • What’s the key difference between merge and rebase?
  • Which history is easier to read?
  • When is a merge preferred?
  • Why must you be careful rebasing pushed commits?

✅ Summary

Action Result
git merge feature Preserves both branches + merge commit
git rebase main (on feature) Rewrites feature history as linear
Use merge after pushing ✅ Safe for shared work
Use rebase before pushing ✅ Keeps history clean

🎉 You’ve now seen both strategies in action — use the right one for the right job!