Repositories, a Git Tutorial
This page will be completely useless to those who know how to use a repository. Students need to understand how repositories work, since almost no job will let you work completely alone. The repository we’re going to use is git, since it is a de facto standard for distributed repositories. There are lots of good tutorials for understanding git, but none are that simple. Here I want to let you see how you will work with others, as simply as possible. In this tutorial we will use a graphical interface instead of the command line: Atlassian’s SourceTree, which is one of the best I’ve seen around. It is available for both MacOS X and Windows.
The main tenets of distributed repositories are simple:
- Thou shalt work on your own computer, and your computer only
- Thou shalt commit often, thou shalt commit early
So, every action you take is bound to your own computer. No exchange is done with your collaborators. We will learn how to work with other people from the simplest point of view: we are the masters of our repository, and we will get in touch with others only when we want.
Committing, i.e., storing changes to your repository along with a useful message that describes what you’ve done, is not only encouraged, but let me say this: it is mandatory.
Usually you will create a repository in a remote location, for instance Atlassian’s BitBucket. Why? For two reasons:
- Sharing your work with others
- Disaster recovery
The last one is easy to understand: if you work on your computer, and only there, what happens if your hard drive dies? So, having a third party keep a copy of your repository is good.
The first one is the bridge between you and others. If you work on your computer, how will others access your changes? We have two options: opening ports on your computer and creating accounts, or using third-party hosting. It isn’t good to open your computer to others; moreover, what would happen if you close your laptop? Others wouldn’t be able to access your changes.
We’re left with one option: going external. So let’s clone the repository.
What have you technically done? You have copied all the changes since the creation of the repository. So you have a complete clone on your computer. The remote location will be retained (remember you’ve cloned it), and it is properly called “origin” (in the window under “Remotes”). Your local repository is called “working copy”, since it is the location where you will work.
Now we have our local repository, and we’re ready to work. Let’s add two text files, just as an example, called a.txt and b.txt. After adding them, you will see that, if you open the repository in SourceTree, there is something in your working copy.
This repository management software, git, works simply with changes. Anything may change: you may add a file (a change), you may delete a file (a change), or you may modify a file (a change, too).
So, once you have taken a look at all your changes, you need to add them to the list of changes that you deem good to go (you may see the “Add” button). This is called “staging”. Once you stage them, they will move to the upper part of the window, as in the picture. Then we will commit, and you must supply a message for your commit.
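To make the stage-then-commit flow concrete, here is a minimal sketch in Python. It is a toy model and not how git is implemented: the `ToyRepo` class and its fields are hypothetical names invented for this illustration, and a commit is stored simply as a message plus a snapshot of the files.

```python
# Toy model of the working copy / staging area / commit flow (not real git).
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    working: dict = field(default_factory=dict)  # files as they are on disk
    staged: dict = field(default_factory=dict)   # changes marked "good to go"
    commits: list = field(default_factory=list)  # (message, snapshot) pairs

    def stage(self, name):
        """Copy one file from the working copy into the staging area."""
        self.staged[name] = self.working[name]

    def commit(self, message):
        """Record the staged changes on top of the previous snapshot."""
        if not message.strip():
            raise ValueError("a commit needs a message")
        previous = self.commits[-1][1] if self.commits else {}
        snapshot = {**previous, **self.staged}
        self.commits.append((message, snapshot))
        self.staged = {}


repo = ToyRepo()
repo.working["a.txt"] = "first file"
repo.working["b.txt"] = "second file"
repo.stage("a.txt")                     # only a.txt is staged
repo.commit("Add a.txt")
print(sorted(repo.commits[-1][1]))      # ['a.txt']
```

Note that b.txt exists in the working copy but is not part of the commit: only what was staged gets recorded.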
A commit is a point in your timeline where changes are stable. You can picture this in the following diagram.
But what happened really? Taking a look at the repository window, we will see that on the right, under “Branches”, we have an item called “master”. This is a branch.
A repository is like a tree. It starts from the root, when the repository is created, and that creates the main branch, which is called “master”. Our local repository has a branch too, and since it is a clone, the branch has the same name as on the origin: master.
Note that this cloning effectively creates a new local branch, and since we just cloned, we have no connection whatsoever with the origin. We can inspect our commit in the window, noting all changes that we’ve made.
Again, there is no connection with the origin: if we open the origin in the “Remotes” item, we won’t see anything. Nothing happened remotely: we’ve just created a repository there, and no changes were applied. Remember: everything we do is just local. So how could we possibly share work with others?
We are now ready to connect our local repository to the origin, or in git’s terms, we are going to track the origin’s master branch with our local master branch. We will do this by pushing our changes to the origin.
A push is the action of sending all your commits to the remote location. This ensures that our work won’t die with our hard drive; later on it will also allow us to work with other people. So let’s hit the “Push” button. We are prompted with only one choice: which branch we’d like to push, and whether we want to track it.
Of course we’d like to push our local master branch to the origin’s master branch, and we’d like to track it. Tracking means that changes that happen remotely will be visible to us, and we will be alerted of them. This will be handy when working with other people, as we will see.
So now all changes will be tracked (to and from the origin), and every commit that happened will appear in the origin (you can check this out by going to the web interface). Also, in your window under “Branches”, i.e., your local repository, the last commit will have not only the “master” tag, but also the “origin/master” one.
As a side note, all commits have a unique identification, which is actually a SHA hash.
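For the curious, here is a small sketch of where such an identifier comes from. By default git names every object it stores (files, directory trees, commits) with the SHA-1 of a short header followed by the object’s content. The function name `git_object_id` is my own; the example hashes a file (“blob”) because its content is easy to write down, while a commit is hashed the same way over its own text (tree, parent, author, message).

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <size>\\0<body>', the way git names its objects."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# The empty file always gets the same well-known id.
print(git_object_id("blob", b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Since the identifier depends only on the content, two people who create the same object on different computers end up with the same id.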
So now we’re ready to work with other people. But how can we ensure that our changes won’t conflict? Before continuing we need to create a new branch so that we won’t work on the master. A rule of thumb is the following:
You shall never use the master, you shall always commit to a separate branch.
When you hit the “Branch” button you will be asked for the name of the new branch, and whether you want to check it out, which in git jargon means switching your working copy to the new branch.
The main window for our repository will then make this branch appear. A small icon will show which branch we’ve currently checked out.
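A branch may sound like a heavy thing, but it helps to picture it as a name that points at a commit. The following is a toy sketch under that assumption; the commit ids `c1` and `c2` and the helper functions are hypothetical, made up for this illustration only.

```python
# Toy model: a branch is only a name that points at a commit (not real git).
parents = {"c1": None}          # commit id -> parent commit id
branches = {"master": "c1"}     # branch name -> commit it points at
current = "master"              # the checked-out branch


def create_branch(name, checkout=True):
    """Create a branch at the current commit and optionally switch to it."""
    global current
    branches[name] = branches[current]
    if checkout:
        current = name


def commit(commit_id):
    """Add a commit and move only the checked-out branch forward."""
    parents[commit_id] = branches[current]
    branches[current] = commit_id


create_branch("foo")            # "foo" starts where "master" is
commit("c2")                    # work happens on "foo"
print(branches)                 # {'master': 'c1', 'foo': 'c2'}
```

The master branch stays where it was; only the branch we have checked out moves when we commit.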
We have now created an alternate timeline of changes, so we will work on this branch:
Just for the sake of trying, we will now change a file, commit, and push our changes to the origin, tracking the branch as seen earlier. Needless to say, on the remote location we will now have two branches: “master” and “foo”, as you can see from the “Remotes” section by opening “origin”; you can also see the tracked branch in your “Branches” section (in your “foo” branch, of course).
Note that you could push every branch you have with SourceTree. However, it is advisable to push only what you intend to: your working branch “foo”. The timeline will be as follows, reflecting all changes that happened locally.
Now we are ready to go multiuser. Things will get a bit trickier, but they stay simple.
Let’s now assume that there is another user, and that we’ve granted them access to our repository, the “origin”. This is done by adding the user via your repository provider; in our case, we will use BitBucket’s web interface.
Now our colleague will work on the repository. In order to do so, he will follow the same steps as above: cloning it, branching, and pushing commits. In our example, we will know that our friend created a new branch called “bar”. If we check our repository window under “Remotes”, we will see that our origin now has three branches: master, foo, and bar.
What we want now is to get his branch on our local system. This operation is called “checking out”. The term means, as we saw earlier, that we switch to a specific branch: we could now switch to our friend’s “bar” branch by hitting the “Checkout” button, and choosing “Checkout New Branch”.
We have the choice of tracking the remote branch, so that any update on it will be notified to us. This is recommended, and it is turned on by default on SourceTree. So now we are on a new local branch called “bar” that tracks a remote branch with the same name; of course, on your computer the remote branch will be named “origin/bar”, to distinguish it from the local one.
As said earlier we shouldn’t work on the same branch with colleagues, in order to avoid conflicts (what happens when we both change the same file?), so we will promptly switch back to our “foo” branch, or in the git jargon: we checkout “foo”.
So now we’re back on our “foo” branch, and we suppose that our colleague worked a little, committing his changes and pushing them to the origin. We can then update our local repository by hitting “Fetch”; since we tracked the “bar” branch, SourceTree will tell us that our local “bar” branch is two commits behind.
We can therefore proceed in updating our copy of “bar”.
Since we tracked and fetched “bar”, we can now switch to it and merge all the changes that our friend made to his branch. After checking out the branch, we pull to actually apply the changes: so far we have just fetched them, without applying them. In simpler terms, we didn’t merge the changes: we just cached them.
Why do it this way? It is safer. A fetch only downloads the latest changes; a pull, on the other hand, fetches them (if necessary) and then merges them into the current branch.
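The difference between the two can be sketched with a toy model. This is not real git: histories are plain lists, the commit ids are hypothetical, and I assume the simple case where our local “bar” has no commits of its own, so updating it just means catching up with the remote.

```python
# Toy model of fetch versus pull (not real git; histories are plain lists).
remote = {"bar": ["c1", "c2", "c3"]}            # branches on the origin
local = {"bar": ["c1"], "origin/bar": ["c1"]}   # our repository


def fetch(branch):
    """Update only our record of the remote branch."""
    local[f"origin/{branch}"] = list(remote[branch])


def behind(branch):
    """Number of fetched commits the local branch does not have yet."""
    return len(local[f"origin/{branch}"]) - len(local[branch])


def pull(branch):
    """Fetch, then bring the local branch up to date."""
    fetch(branch)
    local[branch] = list(local[f"origin/{branch}"])


fetch("bar")
print(behind("bar"))    # 2
pull("bar")
print(behind("bar"))    # 0
```

After the fetch, our local “bar” is untouched and we only know that it is two commits behind; the pull is what moves it.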
Now we have a complete copy of all the changes made by our colleague on his branch. We could check all of his work, but again, it is advisable that we don’t use his branch. So now the problem is simple: how can we take a look at the project as a whole? If this were a source code project, can we see if our friend’s changes break anything in our source, i.e., in our branch?
We have now updated all branches and merged all changes into our local repository. We are ready to jump into the riskiest of all operations: merging branches. This operation merges all changes in a smart way, and usually does not produce errors.
Just hit the “Merge” button and select a branch. Remember that we are working on our branch “foo”, and we will merge our colleague’s “bar” branch into ours. This will not, obviously, involve any checkout: we will remain on “foo”, we will just import all changes.
Many choices are available. We can merge from a point in the log, selecting a particular branch, for instance bar, or origin/bar; alternatively, if we already fetched a branch, we could merge from it. We choose to merge from our local “bar” branch.
Everything should go without a fuss, and you could build your project and commit again, pushing changes to the origin.
It is wishful thinking that every merge will be painless. Sometimes people modify the same files concurrently, and sometimes their changes conflict. When you merge conflicting branches you will be warned, and you will have to resolve the conflicts manually.
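How does a merge decide what is a conflict? A rough sketch of the idea follows, under a strong simplification: it compares whole files, while git compares regions of lines inside each file. The function name `merge_file` is hypothetical, and “base” stands for the version of the file in the last commit that both branches have in common.

```python
def merge_file(base, ours, theirs):
    """Return (content, conflict) for one file, compared as whole strings."""
    if ours == theirs:
        return ours, False      # both sides agree
    if ours == base:
        return theirs, False    # only they changed it
    if theirs == base:
        return ours, False      # only we changed it
    return None, True           # both changed it differently


print(merge_file("v1", "v1", "v2"))   # ('v2', False)
print(merge_file("v1", "v2", "v3"))   # (None, True)
```

Only the last case needs a human: both sides changed the same thing in different ways, and no tool can know which version is the right one.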
If you right click on the file, you can choose from the “Resolve Conflicts” submenu the item “Launch external tool”. This will open the MacOS X application named FileMerge. Here you can easily see all changes in the two files, clearly named local and remote.
The “Actions” menu will let you choose all available options for each conflict, clearly marked in red. In our case, only the second change is conflicting, while the first is just a line addition; a black arrow indicates a successful merge, a red one a conflict.
Selecting the conflicting change, we can select as an action “Both (left first)”, or in other words: both changes are good, but first keep the left version, and next apply the changes in the right. The bottom of the FileMerge window will show you a preview of the changes.
Once you save and quit FileMerge, you can go back to SourceTree and see that, in your working copy, there is already a staged change: our manual change on the conflicting file. There is, however, an intruder: a spurious a.txt.orig file. This file contains the merged conflicted file as it was before resolving all conflicts. You can safely delete it.
If you commit, you will see that a message is already present:
Merge branch ‘bar’ into foo
We can now commit, and push our changes to the origin.
This isn’t a real tutorial on git. I have simplified a lot, and for example, some operations may not be needed. The purpose of this is to provide you with a simple and visual representation of what happens when you use git, with the simplest and most used operations.
For instance, we may choose not to fetch and then pull the changes of our friend’s branch: we may simply pull the remote branch into our working copy. This is a shortcut for what we’ve done before: as stated above, a pull is simply a fetch and a merge. It is quite useful to know what a pull operation does before pulling right away, and I hope you’ve now got a grasp of it. I haven’t explained what the HEAD is, along with countless other details that many won’t even bother to understand: it just works. I also left out the “distributed” part: git may have many remote repositories, but this is material for other documents that you can already find online.
There are several awesome tutorials online, far more comprehensive than this one, so please go visit git’s homepage and read the documentation.