Basics of using Git from the Command Line

Author(s): Helena Rasche

Overview
Questions:

How can I start tracking my changes with git?

How do I commit changes?

How can I undo a mistake?

Objectives:

Create a repository

Commit a file

Make some changes

Use the log to view the diff

Undo a bad change

Requirements:

Foundations of Data Science

CLI basics: tutorial hands-on

Time estimation: 30 minutes

Last modification: Oct 19, 2022

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.

Comment: Source

This tutorial contains text from this tutorial by Robert Adolf (@rdadolf), which is licensed CC-BY.

Version control is a way of tracking the change history of a project. Even if you have never used a version control tool, you’ve probably already done it manually: copying and renaming project folders (“paper-1.doc”, “paper-2.doc”, etc.) is a form of version control. Within bioinformatics (from research, to development, to sysadmin) a lot of us are using git as our primary method of source control for everything we do: notes, slides, tutorials, code, notebooks, ansible, system configuration, and more.

Git is a tool that automates and enhances a lot of the tasks that arise when dealing with larger, longer-living, and collaborative projects. It’s also become the common underpinning to many popular online code repositories, GitHub being the most popular.

While it can be used collaboratively, this tutorial focuses on a single-user git repository for the most basic operations.

Agenda

In this tutorial, you will learn how to create a git repo, and begin working with it.

Why should you use version control?

Pre-requisites

Setting up a Repository

Adding Files

Logs

Branching

Undo! Revert!

Further Reading

Why should you use version control?

If you ask 10 people, you’ll get 10 different answers, but one of the commonalities is that most people don’t realize how integral it is to their development process until they’ve started using it. Still, for the sake of argument, here are some highlights:

You can undo anything: Git provides a complete history of every change that has ever been made to your project, timestamped, commented, and attributed. If something breaks, you always have the choice of going back to a previous state.
You won’t need to keep undoing things: One of the advantages of using git properly is that, by keeping new changes separate from a stable base, you tend to avoid the massive rollbacks associated with constantly tinkering with a single copy of the code.
You can identify exactly when and where changes were made (and by whom!): Git allows you to pinpoint when a particular piece of code was changed, so finding what other pieces of code a bug might affect or figuring out why a certain expression was added is easy.
Git forces teams to face conflicts directly: On a team-based project, many people are often working with the same code. By having a tool which understands when and where files were changed, it’s easy to see when changes might conflict with each other. While it might seem troublesome sometimes to have to deal with conflicts, the alternative—not knowing there’s a conflict—is much more insidious.

Pre-requisites

You will need to install git, if you have not done so already.
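
A quick way to confirm the installation is sketched below, assuming git is already on your PATH. The name and e-mail in the commented lines are placeholders rather than values from this tutorial; git uses these two settings to attribute your commits.

```shell
# Print the installed version; this fails if git is not installed.
git_version="$(git --version)"
echo "$git_version"

# Show the identity git will record in commits (prints a notice if unset).
git config --get user.name  || echo "user.name is not set"
git config --get user.email || echo "user.email is not set"

# To set them once for your account (placeholder values, use your own):
# git config --global user.name  "Your Name"
# git config --global user.email "you@example.org"
```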

Setting up a Repository

Let’s create a new repository.

Hands-on: Create a Repository
Make a new directory where you will store your files, and navigate into it.
Input: Bash
mkdir git-tutorial; cd git-tutorial;
Create or “initialise” the git repository with the git init command.
Input: Bash
git init
Output
Initialized empty Git repository in /tmp/project/.git/

This has created a .git folder in your project directory. This is where git stores all of the data it needs to track repository changes over time. It’s not terribly interesting yet, though!
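
If you are curious what git init created, the following sketch peeks inside the .git folder. It works in a throwaway directory made with mktemp, which is an assumption of this example and not part of the tutorial's project.

```shell
# Use a throwaway directory so no existing project is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# List what git created: HEAD names the current branch,
# and objects/ is where the history will be stored.
ls .git
head_ref="$(cat .git/HEAD)"
echo "$head_ref"
```

You should never need to edit these files by hand; git commands manage them for you.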

Hands-on: What's the status
You can always check the status of a repository with git status
Input: Bash
git status
Output
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Adding Files

Let’s add our first file, often a (pretty empty) readme file.

Hands-on: Add and commit a file
Create a new file, readme.md with some basic content
Input: Bash
echo "My Project" > readme.md
Add a file with git add. This adds it to git’s staging area to be committed.
Input: Bash
git add readme.md
Commit the file! This will add it to git’s log.

What makes a good commit message depends a lot on the community: some enforce specific style guides and some don’t. In general:

Keep the summary line short (under 72 characters) and descriptive.

If you need to, provide a longer description as well, explaining your changes. (Use git commit without the -m flag to write it in your text editor.) A lot has been written about good commit messages; search the internet for ideas about what you think makes a good commit message.

And beware of the trap we all fall into sometimes: unhelpful commit messages. Even your author is very, very guilty of this, but you can do better!
Input: Bash
git commit -m "Add readme"
Output
[main (root-commit) f5ec14f] Add readme
 1 file changed, 1 insertion(+)
 create mode 100644 readme.md
Question: Is there anything left to do? Check the status

Check git status to see if there’s anything else left to resolve.
$ git status
On branch main
nothing to commit, working tree clean
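
The path a file takes from untracked, to staged, to committed can be followed with the compact git status --short output. This is a minimal sketch in a throwaway repository; the user name and e-mail are placeholders set only for that repository.

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
untracked="$(git status --short)"   # "??" marks a file git is not tracking yet
echo "$untracked"

git add readme.md
staged="$(git status --short)"      # "A " marks a new file waiting in the staging area
echo "$staged"

git commit -q -m "Add readme"
clean="$(git status --short)"       # no output: nothing is left to commit
echo "after commit: [$clean]"
```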

Congratulations! You’ve made your first commit. The output of the commit command lists everything you’ve just done:

[main (root-commit) f5ec14f] Add readme
 1 file changed, 1 insertion(+)
 create mode 100644 readme.md

f5ec14f is the (abbreviated) commit ID: every commit you make is given a hash which uniquely refers to that specific commit. Next we see our commit message, Add readme, followed by a brief mention of how many files we’ve changed and how many insertions or deletions we’ve made to the text, and lastly which files we’ve added.
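
The short ID printed by git commit is a prefix of the full hash, and git accepts either form. The sketch below shows this in a throwaway repository with a placeholder identity; the hashes you see will differ from the ones in this tutorial.

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"

full_id="$(git rev-parse HEAD)"            # the complete hash
short_id="$(git rev-parse --short HEAD)"   # the abbreviated form
echo "$full_id"
echo "$short_id"

# The abbreviated ID refers to the same commit: print its message.
git show -s --format=%s "$short_id"
```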

Exercise: Make some more commits

Hands-on: Make some more commits
Add your name to the readme.md and commit your changes.
Input: Bash
echo "Author: hexylena" >> readme.md
git add readme.md
git commit -m 'Add author name'
Make up a project description, add it to the readme, and commit.
Input: Bash
# printf interprets \n in every shell; bash's echo would print it literally
printf 'This project enables stakeholders to experience synergistic effects and increase link up opportunities to improve quarterly and YOY ROI.\n\n' >> readme.md
git add readme.md
git commit -m 'Add project description'
Pick a license for your project, and mention it in the readme.md, and commit.
Input: Bash
printf '# License\nAGPL-3.0\n' >> readme.md
git add readme.md
git commit -m 'Add project license'

After this step you should have four commits to work with: the initial commit plus the three you just made.
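
For changes that deserve a longer explanation, one option besides the text editor is to pass -m more than once: the first becomes the summary line and each further one becomes its own paragraph. The sketch below uses a throwaway repository, a placeholder identity, and a made-up description.

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md

# First -m: short summary. Second -m: longer description (illustrative text).
git commit -q -m "Add readme" -m "Start the project with a one-line readme so the repository is not empty."

subject="$(git log -1 --format=%s)"   # summary line only
body="$(git log -1 --format=%b)"      # everything after the summary
echo "$subject"
echo "$body"
```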

Logs

One of the most helpful things about git is that, if you have written good commit messages, you can tell what you did and when!

Hands-on: Check the Receipts

Check the log with git log. Notice that you can see each commit in reverse chronological order (newest at top), who made the commit, when, and what the commit message was.

Input: Bash
git log

Output

commit 5d05eb3ec22fd49282b585c60ef8f983d68c2fd7
Author: Helena Rasche <hxr@hx42.org>
Date:   Mon Jun 13 12:13:21 2022 +0200

    Add project license

commit 62f974ec5f538232f65b016cf073815349364efa
Author: Helena Rasche <hxr@hx42.org>
Date:   Mon Jun 13 12:13:16 2022 +0200

    Add project description

commit 10355c019c04052c15a95a817de04f9ea0ec336c
Author: Helena Rasche <hxr@hx42.org>
Date:   Mon Jun 13 12:13:11 2022 +0200

    Add author name

commit f5ec14f05384d76812fc0576df5e4af79336f4e6
Author: Helena Rasche <hxr@hx42.org>
Date:   Mon Jun 13 11:59:23 2022 +0200

    Add readme

The output of git log is a great way to help you remember what you were doing.
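
When the full log gets long, a condensed view can be easier to scan. The sketch below builds a two-commit throwaway repository (with a placeholder identity) and prints one line per commit.

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"
echo "Author: hexylena" >> readme.md
git add readme.md
git commit -q -m "Add author name"

git log --oneline                        # abbreviated ID and summary, newest first
newest="$(git log -1 --format=%s)"       # summary of the most recent commit
count="$(git rev-list --count HEAD)"     # number of commits reachable from HEAD
echo "$count commits, newest: $newest"
```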

Hands-on: `git log -p`

Use git log -p to see the log, along with which lines were changed in each commit.

But currently this log is pretty boring, so let’s replace a line and see how that looks.

Hands-on: Replace a line
Update your project description in readme.md; you’ve been told you need to support completely different features.
Input: Bash
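# GNU sed syntax; on macOS/BSD sed use: sed -i '' 's/enables.*ROI/creates baking recipes/' readme.md
sed -i 's/enables.*ROI/creates baking recipes/' readme.md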
git add readme.md
git commit -m 'Update project description'
Check what happened with git log -p:
Output
$ git log -p
commit 416a121dfcda14de0c2cb181f298b2c08950475f (HEAD -> main)
Author: Helena Rasche <hxr@hx42.org>
Date:   Mon Jun 13 12:18:00 2022 +0200

    Update project description

diff --git a/readme.md b/readme.md
index befc0c9..3b8899e 100644
--- a/readme.md
+++ b/readme.md
@@ -1,6 +1,6 @@
 My Project
 Author: hexylena
-This project enables stakeholders to experience synergistic effects and increase link up opportunities to improve quarterly and YOY ROI.
+This project creates baking recipes.

 # License
 AGPL-3.0
This is a diff, a comparison between two versions of a file.
If you haven’t worked with diffs before, the format can look unfamiliar.

Say we have a grocery list in two versions, stored in two files. We’ll call them ‘old’ and ‘new’.
Old
$ cat old
🍎
🍐
🍊
🍋
🍒
🥑
New
$ cat new
🍎
🍐
🍊
🍋
🍍
🥑
We can see that they have some different entries. We’ve removed 🍒 because they’re awful, and replaced them with a 🍍.

The diff command lets us compare these files:
$ diff old new
5c5
< 🍒
---
> 🍍
Here 5c5 means that line 5 of old was changed into line 5 of new: 🍒 is only in old, and 🍍 is only in new. Otherwise the files are identical.

There are a couple of different diff formats; one is the ‘unified diff’:
$ diff -U2 old new
--- old	2022-02-16 14:06:19.697132568 +0100
+++ new	2022-02-16 14:06:36.340962616 +0100
@@ -3,4 +3,4 @@
 🍊
 🍋
-🍒
+🍍
 🥑
This is basically what you see in the training materials which gives you a lot of context about the changes:

--- old is the ‘old’ file in our view

+++ new is the ‘new’ file

@@ lines tell us where the change occurs: in the example above, -3,4 means the section shown covers 4 lines starting at line 3 of the old file, and +3,4 says the same for the new file.

Lines starting with a - are removed from our ‘new’ file

Lines with a + have been added.

So when you go to apply these diffs to your files in the training:

Ignore the header

Remove lines starting with - from your file

Add lines starting with + to your file

The other lines (🍊/🍋 and 🥑) above just provide “context”: they help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find the line with a 🍒 and replace it with a 🍍.

Added & Removed Lines

Removals are easy to spot: we just have removed lines.
--- old	2022-02-16 14:06:19.697132568 +0100
+++ new	2022-02-16 14:10:14.370722802 +0100
@@ -4,3 +4,2 @@
 🍋
 🍒
-🥑
Additions are likewise easy: just add the new line between the other lines in your file.
--- old	2022-02-16 14:06:19.697132568 +0100
+++ new	2022-02-16 14:11:11.422135393 +0100
@@ -1,3 +1,4 @@
 🍎
+🍍
 🍐
 🍊
Completely new files

Completely new files look a bit different: there the “old” file is /dev/null, the empty file on a Linux machine.
$ diff -U2 /dev/null old
--- /dev/null	2022-02-15 11:47:16.100000270 +0100
+++ old	2022-02-16 14:06:19.697132568 +0100
@@ -0,0 +1,6 @@
+🍎
+🍐
+🍊
+🍋
+🍒
+🥑
And removed files are similar, except with the new file being /dev/null
--- old	2022-02-16 14:06:19.697132568 +0100
+++ /dev/null	2022-02-15 11:47:16.100000270 +0100
@@ -1,6 +0,0 @@
-🍎
-🍐
-🍊
-🍋
-🍒
-🥑
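
The same unified format appears when you ask git what has changed before committing. The sketch below, in a throwaway repository with a placeholder identity, contrasts git diff (changes not yet staged) with git diff --staged (changes staged for the next commit).

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"

echo "Author: hexylena" >> readme.md
unstaged="$(git diff)"           # working directory compared with the staging area
echo "$unstaged"

git add readme.md
after_add="$(git diff)"          # empty now: the change has moved to the staging area
staged="$(git diff --staged)"    # staging area compared with the last commit
echo "$staged"
```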

Who did that? `git blame` to the rescue

If you want to know who changed a specific line of a file, you can use git blame to find out it was probably your fault (as most of us experience when we check the logs.)

Input: Bash
git blame readme.md

Output

^f5ec14f (Helena Rasche 2022-06-13 11:59:23 +0200 1) My Project
10355c01 (Helena Rasche 2022-06-13 12:13:11 +0200 2) Author: hexylena
416a121d (Helena Rasche 2022-06-13 12:18:00 +0200 3) This project creates baking recipes.
62f974ec (Helena Rasche 2022-06-13 12:13:16 +0200 4)
5d05eb3e (Helena Rasche 2022-06-13 12:13:21 +0200 5) # License
5d05eb3e (Helena Rasche 2022-06-13 12:13:21 +0200 6) AGPL-3.0

Here we can see, for every line, which commit last affected it, who made that commit, and when.
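
On a long file you usually care about only a few lines. As a small sketch, the -L option restricts git blame to a line range; the repository and the identity below are throwaway placeholders.

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"
echo "Author: hexylena" >> readme.md
git add readme.md
git commit -q -m "Add author name"

# -L 2,2 limits the output to line 2 only.
line2="$(git blame -L 2,2 readme.md)"
echo "$line2"
```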

Branching

Git has the concept of branches which are most often used to manage development over time, before it’s considered final. Until now you’ve seen main in your commits and commit logs (or maybe master if your git installation is a bit older.)

Oftentimes you’ll see this pattern:

1. There is a main branch with a lot of history.
2. You want to test out a new option, a new configuration, or a new script you’re working on.
3. So you make a branch.
4. You work on that branch.
5. You merge it back into the main branch once it’s done.

This is especially relevant for any project that is shared with others, has a public view, or a deployed version of the code. There you don’t want to affect anyone else using the project, or you don’t want to affect the production deployment, until you’re done making your changes.

Hands-on: Create a new branch
git switch -c <branch> is the command used to create a new branch and switch to it.
Input: Bash
git switch -c test
Output
Switched to a new branch 'test'

If you look around, you’ll notice everything looks exactly the same! But in fact we are now on a different branch:

Hands-on: See available branches
git branch lists our available branches, and puts an asterisk next to the one we’re currently on.
Input: Bash
git branch
Output
  main
* test

We’re now on the test branch, so let’s make a commit.

Hands-on: Add a new file
Add a new file, let’s call it docs.md. Write something into it, it doesn’t matter much what.
Input: Bash
echo "# Project Documentation" > docs.md
Add it, commit it.
Input: Bash
git add docs.md
git commit -m "Added documentation"

This file now only exists on the test branch.

Hands-on: Try Switching Branches
Try switching back and forth between the main and test branches, and check what’s available on each!
Input: Bash
git branch
Input: Bash
git switch main
ls
Output
readme.md
Input: Bash
git switch test
ls
Output
docs.md   readme.md

Each branch has a different view of the repository, and might have different changes on it. Branches are really useful for keeping track of work in progress until it’s done. In a single-user environment most people don’t use them much, but once you’re collaborating with others they’re incredibly important!
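
You can also inspect a branch without switching to it. The sketch below recreates the two branches in a throwaway repository (placeholder identity; the initial branch is explicitly named main so the example does not depend on your git defaults) and lists the files each branch contains.

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main on any git version
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"

git switch -q -c test
echo "# Project Documentation" > docs.md
git add docs.md
git commit -q -m "Added documentation"

current="$(git branch --show-current)"       # the branch we are on
on_main="$(git ls-tree --name-only main)"    # files on main, without switching to it
on_test="$(git ls-tree --name-only test)"    # files on test
echo "on $current; main has: $on_main"
echo "test has:"
echo "$on_test"
```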

Merging

Once you’re done with a branch, you can merge it into the main branch. This will take all of the work you did on that branch, and make it part of the main branch.

First, let’s compare the two branches, to see what changed.

Hands-on: Compare the branches
Compare your current branch against the main branch with git diff main
Input: Bash
git diff main
Output
diff --git a/docs.md b/docs.md
new file mode 100644
index 0000000..384aaaa
--- /dev/null
+++ b/docs.md
@@ -0,0 +1 @@
+# Project Documentation

We can see the output shows all of our changes compared to the main branch, and it looks like what we want, so let’s merge it in.

Hands-on: Merge the `test` branch into `main`
Switch to the main branch
Input: Bash
git switch main
Merge in the test branch
Input: Bash
git merge test
Output
Updating 416a121..9a3387d
Fast-forward
 docs.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 docs.md

This has merged all of the changes you made on the test branch into the main branch.
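
The merge output says "Fast-forward" because main had no new commits of its own: git only had to move main forward to where test already was. The sketch below checks this in a throwaway repository (placeholder identity, initial branch explicitly named main).

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch main on any git version
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "My Project" > readme.md
git add readme.md
git commit -q -m "Add readme"

git switch -q -c test
echo "# Project Documentation" > docs.md
git add docs.md
git commit -q -m "Added documentation"

# If main is an ancestor of test, merging test only moves main forward.
if git merge-base --is-ancestor main test; then
    echo "main can be fast-forwarded to test"
fi

git switch -q main
git merge -q test

main_id="$(git rev-parse main)"
test_id="$(git rev-parse test)"
echo "main: $main_id"
echo "test: $test_id"
```

After a fast-forward merge both branch names point at the same commit, so the two IDs printed at the end are identical.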

Hands-on: Check the history

Check git log -p again to see the history.

Undo! Revert!

Oh no, you’ve decided you liked your original project description better. Let’s find that commit and revert it.

Hands-on: Find and revert the bad commit
Find the commit you want to revert, e.g. with git log, find the one named “Update project description” (or similar.)
Input: Bash
git log
We can use the git revert command to undo this commit.
Input: Bash
git revert 416a121dfcda14de0c2cb181f298b2c08950475f

This generates a new commit, which reverts the older commit (and probably puts you in a text editor to edit the commit message). This is not the only way to undo mistakes, but probably the easiest.

If you check your git log you’ll see the change was undone by a second commit that reverts the first. So if you just look at the current files, it appears the description was never changed, but the log shows both the change and the undo step.
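
The whole revert cycle can be tried safely in a throwaway repository. In this sketch the identity and the first project description are placeholders, and --no-edit accepts git's default revert message instead of opening an editor.

```shell
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.org"

echo "This project enables synergy." > readme.md     # illustrative description
git add readme.md
git commit -q -m "Add project description"

echo "This project creates baking recipes." > readme.md
git add readme.md
git commit -q -m "Update project description"

bad="$(git rev-parse HEAD)"                # the commit we want to undo
git revert --no-edit "$bad" > /dev/null    # adds a new commit that reverses it

restored="$(cat readme.md)"
subject="$(git log -1 --format=%s)"
count="$(git rev-list --count HEAD)"
echo "$restored"
echo "$count commits, newest: $subject"
```

Nothing is deleted from the history: the original commit, the unwanted change, and the revert are all still in the log.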

With that you’ve got enough skills to track your own data/code/etc with git!


Citing this Tutorial

Helena Rasche, 2022 Basics of using Git from the Command Line (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/data-science/tutorials/git-cli/tutorial.html Online; accessed TODAY
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{data-science-git-cli,
author = "Helena Rasche",
title = "Basics of using Git from the Command Line (Galaxy Training Materials)",
year = "2022",
month = "10",
day = "19",
url = "\url{https://training.galaxyproject.org/training-material/topics/data-science/tutorials/git-cli/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                   

Congratulations on successfully completing this tutorial!
