Introduction to (source code) version control
Marek Šuppa
Ondrej Jariabka
Adrián Matejov
git for Data Science?
Helps avoid "versioning hell" (you know, files like essay.doc, essay_v2.doc, essay_final.doc)
Gives you the ability to "jump in time"
Helps you make your work "reproducible"
Makes it a bit more straightforward to work on common (larger) projects with others
The Global Information Tracker(TM)
Linus actually claims it does not mean anything...
A distributed version control system
Will keep track of the changes you make to the files it tracks
If you screw things up (e.g. accidentally remove some file), you can get back to a previous state
Allows for these changes to be easily transferred to others
Originally designed as a source code version control system for the Linux kernel
Free and open-source software distributed under the GNU GPLv2 license
Currently the standard for source code versioning
Git stores its metadata (along with "snapshots") in a special .git folder
A folder which contains this .git folder is called a "repository"
A new repository can be initialized using the git init command
$ mkdir repo
$ cd repo
$ git init
Initialized empty Git repository in /tmp/repo/.git/
$ ls -alh
total 84K
drwxrwxr-x   3 mrshu mrshu 4.0K Nov 28 14:39 .
drwxrwxrwt 108 root  root   72K Nov 28 14:39 ..
drwxrwxr-x   7 mrshu mrshu 4.0K Nov 28 14:39 .git
git status
$ git status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
$ echo "My Analysis" > README
$ git status
On branch master
No commits yet
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        README
nothing added to commit but untracked files present (use "git add" to track)
As git status said, we can use git add to track a file:
$ git add README
$ git status
On branch master
No commits yet
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   README
git commit
$ git commit
[master (root-commit) b3d8a54] Add README
 1 file changed, 1 insertion(+)
 create mode 100644 README
Note: git commit opens up your default editor (most likely vim) so that you can write the commit message.
The whole process once again, visualized in a pretty picture
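The status, add, commit cycle can also be sketched as a toy model in Python. This is only an illustration of the three places a file can live (working directory, staging area, committed snapshot); the class `ToyRepo` and its fields are hypothetical names made up for this sketch and say nothing about how Git is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model of the status -> add -> commit cycle (not Git's real internals)."""

    working_dir: dict = field(default_factory=dict)  # file name -> content on disk
    staged: dict = field(default_factory=dict)       # file name -> content in the staging area
    commits: list = field(default_factory=list)      # list of (message, snapshot) pairs

    def last_snapshot(self) -> dict:
        """Return the files recorded by the most recent commit (empty if none)."""
        return self.commits[-1][1] if self.commits else {}

    def status(self) -> dict:
        """Group files roughly the way `git status` reports them."""
        committed = self.last_snapshot()
        tracked = {**committed, **self.staged}
        return {
            "untracked": sorted(f for f in self.working_dir if f not in tracked),
            "staged": sorted(f for f, c in self.staged.items() if committed.get(f) != c),
            "modified": sorted(
                f for f, c in self.working_dir.items() if f in tracked and tracked[f] != c
            ),
        }

    def add(self, name: str) -> None:
        """Copy the current content of a file into the staging area."""
        self.staged[name] = self.working_dir[name]

    def commit(self, message: str) -> None:
        """Record a full snapshot: the previous snapshot plus everything staged."""
        snapshot = {**self.last_snapshot(), **self.staged}
        self.commits.append((message, snapshot))
        self.staged.clear()


repo = ToyRepo()
repo.working_dir["README"] = "My Analysis\n"
status_untracked = repo.status()   # README shows up as untracked
repo.add("README")
status_staged = repo.status()      # README is now staged
repo.commit("Add README")
status_clean = repo.status()       # nothing left to report
print(status_untracked, status_staged, status_clean, sep="\n")
```

Note that `commit` stores a complete snapshot rather than a list of changes, which mirrors the way the slides describe Git's storage.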
Let's use the same process to add one more file.
$ echo "Licensed under the terms of the CC-0 license." > LICENSE
The git commands are essentially the same:
$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        LICENSE
nothing added to commit but untracked files present (use "git add" to track)
$ git add LICENSE
$ git commit
[master 98bef79] Add LICENSE
 1 file changed, 1 insertion(+)
 create mode 100644 LICENSE
Suppose we change one of the files we track:
$ echo -e "\nThis repo contains the analysis of git usage." >> README
Running git status shows what has happened:
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README
no changes added to commit (use "git add" and/or "git commit -a")
Using the git diff command, we can see what has changed:
$ git diff
diff --git a/README b/README
index dd0c36f..302b24f 100644
--- a/README
+++ b/README
@@ -1 +1,3 @@
 My Analysis
+
+This repo contains the analysis of git usage.
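To get a feel for the unified diff format shown above, here is a small sketch that produces a similar listing with Python's standard `difflib` module. It is not what Git runs internally; the variable names are made up for the illustration, and the `a/README` and `b/README` labels are passed in by hand to mimic Git's header.

```python
import difflib

# README before and after the `echo -e ... >> README` step
old = "My Analysis\n"
new = "My Analysis\n\nThis repo contains the analysis of git usage.\n"

diff_lines = list(
    difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/README",
        tofile="b/README",
    )
)

# Lines starting with "+" were added, lines starting with " " are unchanged context
print("".join(diff_lines), end="")
```

The `diff --git` and `index` header lines are specific to Git and are not produced by `difflib`.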
And we can again add the change in, just like before:
$ git add README
$ git commit
[master cd18d6e] Add some more description to README
 1 file changed, 2 insertions(+)
The history of commits can be viewed with git log:
$ git log
commit 96cee8d998f7306527fa360cb2dda6edb1dffc2f (HEAD -> master)
Author: mrshu <mr@shu.io>
Date:   Mon Nov 30 13:30:45 2020 +0000
    Add some more description to README

commit 98bef799dad1374e9a6bdd3cb0e31ab98d90f028
Author: mrshu <mr@shu.io>
Date:   Sat Nov 28 21:28:28 2020 +0000
    Add LICENSE

commit b3d8a54c03255fa93355edc78c3494e4b4c4ef4a
Author: mrshu <mr@shu.io>
Date:   Sat Nov 28 20:54:02 2020 +0000
    Add README
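git init (initializes a new repository)
git status (shows the current state of the repository)
git diff (shows what has changed)
git add [file] (adds [file] to the staging area, so that it can be committed)
git commit (records the staged changes as a new commit)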
A brief introduction into Git internals
Git stores snapshots, not diffs between commits
Everything in Git is referenced by (and validated via) cryptographic hashes
Everything has a SHA-1 hash
Among other things, this ensures the integrity of the data
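As an illustration of this content addressing, the sketch below computes the hash Git assigns to a file's content (a "blob"): the SHA-1 of a short header, `blob <size>` followed by a NUL byte, and then the content itself. The function name `git_blob_hash` is made up for this example, and it covers only blobs, not commits or trees.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git uses to identify a file with this content."""
    # Git hashes a header ("blob <size in bytes>" plus a NUL byte) followed by the content
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty_hash = git_blob_hash(b"")
print(empty_hash)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the hash of an empty file
print(git_blob_hash(b"My Analysis\n"))
```

Because the identifier is derived from the content, any change to a file (even a single byte) yields a different hash, which is how corruption or tampering becomes detectable.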
Each commit links to its "parent" (if it has one).
In its simplest form, a repository is a set of linked commits.
But how does Git know which commit we are currently on?
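The idea of a repository as a set of linked commits can be sketched as a simple linked structure. The `Commit` class and the `history` helper below are hypothetical names for this illustration; real commits carry more information (author, date, a pointer to the snapshot) and are identified by full hashes.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: a short hash, a message and a link to the parent commit."""

    short_hash: str
    message: str
    parent: Optional["Commit"] = None  # None for the very first (root) commit


def history(tip: Optional[Commit]) -> Iterator[Commit]:
    """Walk from a commit back to the root by following parent links."""
    current = tip
    while current is not None:
        yield current
        current = current.parent


# The three commits made so far in the walkthrough
first = Commit("b3d8a54", "Add README")
second = Commit("98bef79", "Add LICENSE", parent=first)
third = Commit("96cee8d", "Add some more description to README", parent=second)

for commit in history(third):
    print(commit.short_hash, commit.message)
```

Walking the parent links from the newest commit yields the commits newest first, the same order in which git log lists them.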
HEAD is a special file which says which commit the repository is currently pointing to.
$ cat .git/HEAD
ref: refs/heads/master
Just as above, HEAD can point to "references" -- other files with actual hashes.
$ cat .git/refs/heads/master
96cee8d998f7306527fa360cb2dda6edb1dffc2f
These references are also called branches (or tags).
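The two `cat` steps above can be combined into a small sketch that resolves HEAD to a commit hash. The function `resolve_head` is a hypothetical helper written for this illustration; it assumes references are stored as individual files (real repositories may also keep them in a `packed-refs` file, which this sketch ignores). The demo builds a fake `.git` folder in a temporary directory so that it does not depend on an existing repository.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to the commit hash it ultimately refers to."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a reference file (e.g. refs/heads/master) that holds the hash
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    # Otherwise HEAD holds a commit hash directly
    return head


# Demo on a fake .git folder that mimics the files shown above
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(
        "96cee8d998f7306527fa360cb2dda6edb1dffc2f\n"
    )
    resolved = resolve_head(git_dir)

print(resolved)
```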
A quick look at (probably) the most famous Git feature.
Creating a new branch is easy -- just run git branch [branchname].
Internally, Git creates a new pointer with the name of your branch.
It will point to the same commit as HEAD did at the time.
$ git branch testing
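The "branch is just a pointer" idea can be modeled in a few lines. The `ToyRefs` class below is a made-up illustration, not Git's implementation: branches are entries in a dictionary mapping a name to a commit hash, and HEAD is the name of the current branch.

```python
class ToyRefs:
    """Toy model of branches as named pointers to commits."""

    def __init__(self, branch: str, commit: str) -> None:
        self.branches = {branch: commit}  # branch name -> commit hash
        self.head = branch                # name of the branch HEAD points to

    def current_commit(self) -> str:
        return self.branches[self.head]

    def branch(self, name: str) -> None:
        """Create a new pointer to the commit HEAD currently resolves to."""
        self.branches[name] = self.current_commit()

    def checkout(self, name: str) -> None:
        """Point HEAD at another existing branch."""
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name

    def commit(self, new_hash: str) -> None:
        """Only the current branch pointer moves when a commit is made."""
        self.branches[self.head] = new_hash


refs = ToyRefs("master", "96cee8d")
refs.branch("testing")
print(refs.head, refs.branches)  # still on master, both branches point to 96cee8d
```

Creating the branch copies no files and does not move HEAD; it only adds one more name for an existing commit.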
As the git log below shows, we are still on the master branch:
$ git log --oneline --decorate
96cee8d (HEAD -> master, testing) Add some more description to README
98bef79 Add LICENSE
b3d8a54 Add README
But we can easily switch with git checkout:
$ git checkout testing
Switched to branch 'testing'
$ git log --oneline --decorate
96cee8d (HEAD -> testing, master) Add some more description to README
98bef79 Add LICENSE
b3d8a54 Add README
And check which branch we are on with git branch:
$ git branch
* testing
  master
Here is what the situation looks like, visually.
Before:
After (running git checkout testing):
git checkout can also be used on anything else that resolves to a Git commit (like tags, HEAD and others)
Let's suppose we make some changes in the current (testing) branch
$ echo "print('Analysis is done in here')" > analysis.py
And commit them to the repository.
$ git add analysis.py
$ git commit
[testing f0aa1ae] Add analysis.py
 1 file changed, 1 insertion(+)
 create mode 100644 analysis.py
Visually, the situation will look as follows:
But what if we'd like to go back and make licensing clearer in README?
Not a big deal. We'll check out master and add the changes there.
$ git checkout master
Switched to branch 'master'
$ echo -e "\n\nThis project is released to the public domain." >> README
$ git diff
diff --git a/README b/README
index 302b24f..82b5c99 100644
--- a/README
+++ b/README
@@ -1,3 +1,6 @@
 My Analysis
 
 This repo contains the analysis of git usage.
+
+
+This project is released to the public domain.
$ git add README
$ git commit
[master f9aa801] Update README to mention licensing
 Date: Mon Nov 30 14:24:18 2020 +0000
 1 file changed, 3 insertions(+)
By making changes in both the master and the testing branch we have created a so-called "divergent history".
$ git log --oneline --decorate --graph --all
* f9aa801 (HEAD -> master) Update README to mention licensing
| * f0aa1ae (testing) Add analysis.py
|/
* 96cee8d Add some more description to README
* 98bef79 Add LICENSE
* b3d8a54 Add README
To get out of the divergent history state, we can merge the histories together.
$ git merge testing
Merge made by the 'recursive' strategy.
 analysis.py | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 analysis.py
And here is what the history looks like now:
$ git log --oneline --decorate --graph --all
*   0c1dc34 (HEAD -> master) Merge branch 'testing'
|\
| * f0aa1ae (testing) Add analysis.py
* | f9aa801 Update README to mention licensing
|/
* 96cee8d Add some more description to README
* 98bef79 Add LICENSE
* b3d8a54 Add README
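The graph above can be written down as a mapping from each commit to its parents, which makes two things visible: the merge commit is simply a commit with two parents, and the two branches share a common ancestor from which they diverged. The sketch below is a simplified illustration; `ancestors` and `merge_base` are hypothetical helpers, and Git's own search for a merge base handles more cases than this breadth-first walk.

```python
from collections import deque

# Commit -> list of parents, taken from the `git log --graph` output above
parents = {
    "b3d8a54": [],
    "98bef79": ["b3d8a54"],
    "96cee8d": ["98bef79"],
    "f9aa801": ["96cee8d"],             # master: Update README to mention licensing
    "f0aa1ae": ["96cee8d"],             # testing: Add analysis.py
    "0c1dc34": ["f9aa801", "f0aa1ae"],  # merge commit with two parents
}


def ancestors(commit, parents):
    """Return every commit reachable from `commit`, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(a, b, parents):
    """Return the first commit found walking back from `a` that `b` can also reach."""
    reachable_from_b = ancestors(b, parents)
    queue = deque([a])
    visited = set()
    while queue:
        current = queue.popleft()
        if current in reachable_from_b:
            return current
        if current not in visited:
            visited.add(current)
            queue.extend(parents[current])
    return None


base = merge_base("f9aa801", "f0aa1ae", parents)
print(base)  # the commit where master and testing diverged
print(sorted(ancestors("0c1dc34", parents)))  # the merge commit reaches both branches
```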
Once we are done with it, we can also delete the testing branch:
$ git branch -d testing
Deleted branch testing (was f0aa1ae).
Here is what the situation looked like before:
And after:
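git branch (lists branches and marks the current one)
git branch [branchname] (creates a new branch pointing to the same commit as HEAD)
git checkout [branchname] (switches to the [branchname] branch)
git log --oneline --decorate --graph --all (shows the history of all branches as a graph)
git merge [branchname] (merges [branchname] into the current branch)
git branch -d [branchname] (deletes the [branchname] branch)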
Using Git to collaborate with others
$ git clone https://gitlab.com/vidriduch/davos-hall-of-fame.git
Cloning into 'davos-hall-of-fame'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 7 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), 606 bytes | 101.00 KiB/s, done.
This creates a new directory called davos-hall-of-fame, with a repository.
$ cd davos-hall-of-fame/
There we can create a new branch for our changes:
$ git checkout -b mrshu/hall-of-fame
Switched to a new branch 'mrshu/hall-of-fame'
(git checkout -b creates a new branch and switches to it right away)
Let's add my name to hall_of_fame.md:
$ echo "mrshu" >> hall_of_fame.md
$ git diff
diff --git a/hall_of_fame.md b/hall_of_fame.md
index e69de29..01dd831 100644
--- a/hall_of_fame.md
+++ b/hall_of_fame.md
@@ -0,0 +1 @@
+mrshu
And let's commit it.
$ git add hall_of_fame.md
$ git commit
[mrshu/hall-of-fame 22573c5] Add mrshu to hall_of_fame.md
 1 file changed, 1 insertion(+)
And push it to the remote repository:
$ git push --set-upstream origin mrshu/hall-of-fame
Git as a technology is completely independent from the "web frontends", such as GitHub and GitLab
By learning to use Git you learn the fundamentals that power all of them
GitHub and/or GitLab are businesses