While Centralized Version Control Systems (VCSs) are watching the sunset, Distributed Version Control Systems (DVCSs) are still enjoying the morning sun. Subversion and GIT are, respectively, two of the most widely used, libre, and powerful implementations of these concepts. Of course, "powerful" is relative, and DVCSs are, by their very design, more powerful than VCSs.

It's not that Subversion and its relatives are bad products; it's just that the methodology they propose is deprecated. So it's time to import from __future__. That's why, in this series of articles, I'll try to show you how to migrate completely from Subversion to GIT so that you can enjoy all this power.

To complete the migration, you need to go through the following steps (assuming you got here already willing to migrate and learn):

  1. Understand the conceptual differences between VCSs and DVCSs, and between Subversion and GIT.
  2. Move your code from a Subversion repository to a pure GIT repository.
  3. Learn how to achieve with GIT the same results you were getting with Subversion and, in most cases, improve on them.
  4. Never stop learning.

Let's start right away with the first step; I'll try to be as objective as I can. Keep in mind that VCSs and DVCSs are so different that you'll find the new concepts easier to understand if you don't try to map them onto the old ones.

Repositories

The greatest difference between VCSs and DVCSs is where the repositories are kept. VCSs have a single main repository holding the whole history. Every time someone checks out from this repository, they get only a working copy of a single revision (usually the latest), and the main repository remains the only repository in existence. DVCSs, on the other hand, know nothing about main repositories; instead, they treat every repository the same way. That's right: DVCSs allow multiple repositories per project. Every time someone wants to get the code from a DVCS repository, they get not only the latest revision but also the entire history of commits, branches, tags, etc. (the details depend on the implementation, but this is true for GIT at least). That's why you clone a DVCS repository instead of checking out from it. Crazy as it may sound, this is the main reason why DVCSs are so powerful.

Because the whole history lives in every repository, tasks such as reviewing the log, inspecting the changes introduced by a previous commit, and even tasks that write new data, such as committing or merging branches, need no remote interaction: you could disable your network connection and they would still work. You only need to connect to the rest of the world from time to time, when you decide to share your improvements with someone else or vice versa. Also, since everything happens locally, operations on DVCS repositories tend to be faster than on VCSs (and in GIT's case, much faster). Oh, and in case you are wondering: no, keeping the entire history usually doesn't make a GIT repository much larger than two or three VCS checkouts. That's thanks to GIT's object model, which stores identical content only once, and to its packing heuristics, which compress objects into pack files.

A repository can be cloned multiple times, and each of these clones can be cloned again and again. However, whether it is the first repository ever created or the result of cloning over and over, every repository has the same privileges: repository A can share changes with repository B, and repository C can get those changes directly from B, together with some other changes from D, and send them all back to A without asking anyone else. The result is a decentralized network where you don't need to rely on anyone in particular and everyone can play together nicely. You know, that low coupling, high cohesion thing.
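To make the A/B/C/D scenario above concrete, here is a toy model in Python. The Repo class and its methods are hypothetical illustrations, not GIT's storage format or API: a repository is reduced to a mapping from commit id to parent ids, every clone carries the whole mapping, and any repository can fetch from any other.

```python
# Toy model only: a "repository" is a dict mapping each commit id to the ids
# of its parents. Real GIT stores compressed objects, not a Python dict.
from typing import Dict, Optional, Tuple

Commits = Dict[str, Tuple[str, ...]]


class Repo:
    def __init__(self, name: str, commits: Optional[Commits] = None) -> None:
        self.name = name
        self.commits: Commits = dict(commits or {})

    def clone(self, name: str) -> "Repo":
        # A clone receives the entire history, not just the latest revision.
        return Repo(name, self.commits)

    def commit(self, commit_id: str, *parents: str) -> None:
        # Committing is purely local: no other repository is contacted.
        self.commits[commit_id] = parents

    def fetch_from(self, other: "Repo") -> None:
        # No repository is privileged; any one can take history from any
        # other. Commits never change once created, so combining two
        # histories is a plain union.
        self.commits.update(other.commits)


a = Repo("A")
a.commit("c1")
b = a.clone("B")
d = a.clone("D")
b.commit("c2", "c1")   # new work in B
d.commit("c3", "c1")   # independent work in D
c = b.clone("C")       # C gets c1 and c2 directly from B...
c.fetch_from(d)        # ...plus c3 from D...
a.fetch_from(c)        # ...and A gets it all from C (like C pushing to A)
print(sorted(a.commits))  # ['c1', 'c2', 'c3']
```

Nothing in the model marks A as special: swapping the roles of any two repositories would work just as well.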

Sharing repositories is particularly easy with GIT: you can both send and receive data over the network using SSH, HTTP, HTTPS, or GIT's own transfer protocol, or locally, using file system paths or simply copying around the directory where the repository lives. You can also use the authentication methods these protocols provide if you need them. This covers pretty much every scenario.

Commits

In DVCSs (just like in VCSs), committing means recording a group of changes to the repository contents. Later we can refer to that state of the repository (or any earlier one) by a unique identifier (called a revision number in Subversion, or a commit id in GIT). Each commit (or revision, or version) represents the state of the entire repository at some point in history. However, there are subtle differences in how Subversion and GIT store these commits.
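Unlike Subversion's sequential revision numbers, GIT's ids are content hashes: every object's id is the SHA-1 of a short header (its type and size) followed by its content. The sketch below (the helper names are my own) computes the id GIT gives to a file's contents, or blob; commits are hashed the same way, with "commit" as the type and the commit's text (tree id, parent ids, author, message) as the content. For a repository using the default SHA-1 object format, the first result should match the output of echo 'hello' | git hash-object --stdin.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 id GIT assigns to an object of the given type.

    The hashed bytes are the header "<type> <size in bytes>" plus a NUL
    byte, followed by the raw content.
    """
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is GIT's object for the contents of a single file.
    return git_object_id("blob", content)


print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id is derived from the content, identical files get identical ids no matter which repository they live in, which is part of why clones can exchange history without a central authority handing out numbers.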

When you commit to a Subversion repository, the changeset between your working copy and the parent revision (the revision you are applying your changes on top of), plus some metadata, is saved and given a unique identifier. This is an interesting approach that fits the linear development model of Subversion and other VCSs quite well. However, it's the linear development model itself that, as of today, doesn't quite fit real-world needs.

GIT is usually used with a non-linear development model, which is well served by snapshot-based commits: each commit holds all the information about how the entire repository looked at a certain point in history (a tree, in GIT lingo), along with some extra data such as the author's name and the ids of its parent commits. Conceptually, no deltas against previous commits are stored (GIT does delta-compress objects inside its pack files, but that is a storage optimization invisible to the commit model). This makes tasks involving more than one parent commit, such as merging branches or reapplying the same changes on top of a different commit, more efficient and, in fact, more effective.
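Here is a simplified Python model of snapshot-based commits. Commit and diff are hypothetical names, and real GIT stores nested trees of blobs rather than a flat dict; the point is only that each commit stores a full snapshot plus metadata and parent ids, so the changes between any two commits, not just a commit and its parent, can be computed on demand by comparing snapshots.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # path -> file content: a complete snapshot


@dataclass
class Commit:
    tree: Tree                 # the whole repository at this point
    parents: Tuple[str, ...]   # ids of parent commits (2+ for a merge)
    author: str


def diff(old: Tree, new: Tree) -> List[Tuple[str, str]]:
    """Compare two snapshots; nothing here relies on stored deltas."""
    changes: List[Tuple[str, str]] = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes.append(("deleted", path))
        elif path not in old:
            changes.append(("added", path))
        elif old[path] != new[path]:
            changes.append(("modified", path))
    return changes


history: Dict[str, Commit] = {
    "c1": Commit({"README": "hi\n"}, (), "alice"),
    "c2": Commit({"README": "hi\n", "main.py": "print(1)\n"}, ("c1",), "bob"),
    "c3": Commit({"main.py": "print(2)\n"}, ("c2",), "alice"),
}

# c1 and c3 are two commits apart, yet no intermediate delta is replayed.
print(diff(history["c1"].tree, history["c3"].tree))
# [('deleted', 'README'), ('added', 'main.py')]
```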

Committing in a centralized VCS involves sending your changes to a remote repository. In GIT and other DVCSs, however, you only deal directly with your local repository; that is, your commits stay local until you decide they are worth sharing. This has some additional benefits: you are free to create and work on multiple branches locally without polluting a "central repository". And haven't you ever wished you could get your history right from the very first commit? Well, GIT lets you rewrite your history (ideally before you share it) to achieve just that.

Tags, Branches & Heads

While in Subversion (and probably other VCSs) branches and tags are just an idea, in GIT (and probably other DVCSs, too) they are first-class citizens. For example, to create a branch in Subversion you would usually copy your project's trunk folder (or whichever folder you want to branch from) to a new folder inside one named, by convention, branches, and that's all there is to it. Subversion does not understand what you are trying to do: it just tracks directory contents. The same is true for tags (even though tagged content isn't supposed to evolve); you just copy into a folder named tags instead. That's all Subversion knows about branches and tags: nothing. No wonder branching and merging between branches is seen as serious shit in Subversion.

Please note that branches have nothing to do with tags; the only reason I'm putting them together here is that Subversion treats them the same way, even though it shouldn't. Chances are you have seen tags or branches misused, so it's worth explaining the purpose of each one and the role it plays in your Everyday GIT.

Tags are probably easier to understand than branches, so let's start with them. In GIT, a tag attaches some metadata, such as a "friendly name", to a commit. Tags are usually used to mark a commit as a new release of your software, so you can later refer to that specific revision by its tag name (e.g., v1.2) instead of its SHA-1 id in most GIT commands. They also provide some nice features, such as adding your PGP signature so that others can verify the tag really comes from you.
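The idea that a tag is just a friendly name for a commit can be sketched as a tiny name resolver. Everything here is hypothetical and much simpler than GIT's real rules (documented in gitrevisions): the ids are made up and shortened, and only tags, branch heads, and unique id prefixes are handled. In practice you would create the tag with git tag -a v1.2, or git tag -s v1.2 for a signed one, and then pass v1.2 to commands such as git show or git diff.

```python
from typing import Dict, List

# Made-up, shortened commit ids; real SHA-1 ids are 40 hex digits.
commits: List[str] = ["a1b2c3d4e5", "a1b2f0e9d8", "9e8d7c6b5a"]
tags: Dict[str, str] = {"v1.2": "9e8d7c6b5a"}
branches: Dict[str, str] = {"master": "a1b2f0e9d8"}


def resolve(name: str) -> str:
    """Turn a tag name, branch name, or unique id prefix into a commit id."""
    if name in tags:
        return tags[name]
    if name in branches:
        return branches[name]
    matches = [c for c in commits if c.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"unknown revision: {name}")
    raise KeyError(f"ambiguous revision: {name}")


print(resolve("v1.2"))   # 9e8d7c6b5a
print(resolve("a1b2c"))  # a1b2c3d4e5 (unique prefix)
# resolve("a1b2") raises KeyError: two ids share that prefix.
```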

Ideas constantly evolve, and code evolves with them. Branches allow a developer to follow several ideas at the same time and merge them whenever they materialize. No, seriously. You use branches because you need to get your current task done without worrying too much about what the developer next to you is doing, and, of course, since you are a good citizen, you don't want to distract others either.

You usually start a branch when you want to do things like refactoring some pieces of your code, adding some big feature, or just reorganizing your files: some serious shit, if you know what I mean. At least, that's how I've seen branches used in Subversion, and the reason is that both branching and merging are expensive, time-consuming, and error-prone operations there. In GIT, however, these operations are cheap, effective, and efficient, so you'll find yourself branching for anything you know will take more than one commit to accomplish: it helps you keep your mind organized while keeping your commit history as beautiful as it should be.

In GIT, a commit can be referenced by something called a head, which is just a named pointer to a commit. Every branch has a head that references the latest commit in that branch's history, and, since every commit references its parent commits, the whole history of a branch can be reached from its head. Most of the time when working with GIT, you will find yourself treating branches just like commits, because under the hood that's essentially what they are. However, if you prefer to picture a branch as the more human-friendly concept of "all the commits that are only reachable from some branch head", that's fine too: GIT will understand what you mean.
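The notion of a head as a named pointer, and of a branch as the commits reachable only from its head, fits in a few lines of Python. The history and names below are made up for illustration; only_on(b, a) computes the same set of commits that git log a..b would list (commits reachable from b but not from a).

```python
from typing import Dict, Set, Tuple

# Made-up history: each commit id maps to its parent ids.
# "feature" was branched off "master" at c2.
parents: Dict[str, Tuple[str, ...]] = {
    "c1": (),
    "c2": ("c1",),
    "c3": ("c2",),  # committed on master
    "f1": ("c2",),  # committed on feature
    "f2": ("f1",),
}
# A head is nothing more than a name pointing at a commit id.
heads: Dict[str, str] = {"master": "c3", "feature": "f2"}


def reachable(commit: str) -> Set[str]:
    """Every commit reachable from `commit` by following parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def only_on(branch: str, other: str) -> Set[str]:
    """Commits reachable from `branch`'s head but not from `other`'s."""
    return reachable(heads[branch]) - reachable(heads[other])


print(sorted(reachable(heads["feature"])))  # ['c1', 'c2', 'f1', 'f2']
print(sorted(only_on("feature", "master")))  # ['f1', 'f2']
```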

The last topic in this section is merging. In Subversion's linear development model, merging means applying the differences between the branches being merged. In GIT, merging branches simply means creating a new commit whose two or more parents are the heads of those branches. (GIT still has to compute the merged content, typically with a three-way merge against the branches' common ancestor, but the result is recorded as a single snapshot with several parents.) Merging is what makes GIT shine. But of course, you already knew that.
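A merge in this picture is just one more commit with several parents. The sketch below reuses a made-up history (all names hypothetical), finds the branches' best common ancestor, the merge base a three-way merge would compare against, and records a merge commit whose parents are both heads. It leaves out computing the merged file contents, and it also ignores fast-forwards: when one head is already an ancestor of the other, GIT by default just moves the head instead of creating a merge commit.

```python
from typing import Dict, Set, Tuple

parents: Dict[str, Tuple[str, ...]] = {
    "c1": (), "c2": ("c1",), "c3": ("c2",),  # master
    "f1": ("c2",), "f2": ("f1",),            # feature, forked at c2
}
heads: Dict[str, str] = {"master": "c3", "feature": "f2"}


def ancestors(commit: str) -> Set[str]:
    """`commit` itself plus everything reachable through its parents."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not behind another common one."""
    common = ancestors(a) & ancestors(b)
    return {
        c for c in common
        if not any(c in ancestors(o) for o in common if o != c)
    }


def merge(into: str, other: str, merge_id: str) -> None:
    """Record a merge commit whose parents are both heads, then advance
    `into`'s head. Computing the merged file contents is omitted."""
    parents[merge_id] = (heads[into], heads[other])
    heads[into] = merge_id


print(merge_bases(heads["master"], heads["feature"]))  # {'c2'}
merge("master", "feature", "m1")
print(parents["m1"])                       # ('c3', 'f2')
print(sorted(ancestors(heads["master"])))  # ['c1', 'c2', 'c3', 'f1', 'f2', 'm1']
```

After the merge, master's history contains both lines of work, while feature's head still points at f2.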

http://farm4.static.flickr.com/3646/3456495261_11267ed337_o.gif

History of a GIT repository deliberately showing a lot of branching and merging. Every blue dot represents a commit, the leftmost being the oldest and the rightmost the newest. Consecutive lines of the same color represent a branch. You're unlikely to find a history as contrived as this in real life, though.

Where to go from here

Well, now that you know the basics, you can either wait for the second part of this series (shame on you) or start experimenting with GIT right now, so that you'll be ready to play with your converted Subversion repository once it comes out of the oven. Most of the links in this article are there for you to follow and learn something new.

The last thing to mention (and probably the most important) is the GIT community. It's amazing how much work, documentation, and love you can find around GIT. So go STFW, use the mailing list, or join the IRC channel, because you have no excuses anymore.

Posted by k0001 under Programming. Created on Apr 19, 2009 21:51 Last modified on Jun 7, 2010 20:14

Tags: free-software, git, scm, subversion