When I started working on my project, I needed to set up an scm (source control management). One of the goals of the project was to look at new technologies and trends and see how well they can solve some issues I have encountered throughout my career. I am familiar with 2 popular scms: cvs and svn.
At LinkedIn, we started with cvs. Although widely popular, it has several shortcomings, mainly non-transactional commits and very heavy branching/tagging operations. Transactional commits are quite important as they ensure that whenever you check out code, you always get a consistent view and not a partial commit that somebody else is currently checking in (or even worse, if two people are checking in at the same time). "Consistent" is obviously from the point of view of the scm, as a developer may still check in an inconsistent set of files, for example by forgetting to check in one of them. cvs fails in that regard as no consistency is guaranteed. Branching and tagging in cvs are painful because they happen at the file level, so the bigger the project is, the longer they take.
With a rapidly growing project and team, cvs was becoming unmanageable. We then moved to svn, which solved the two main issues I was talking about. With svn, commits are transactional: every time you commit, you get an ever-increasing commit number which refers to the entire commit. Either the entire commit goes through or none of it does, and another developer will never see an inconsistent view. Branching and tagging are very lightweight operations in svn as they do not copy the entire tree.
After using svn for a while, although it is an improvement over cvs, I am still pretty unhappy with it as there are several big issues, ranging from losing the history of commits (mild problem) to losing changes when files are moved around on a different branch (really bad). Also, although branching is much easier and faster than in cvs, the syntax to merge is quite complicated and error prone, as you constantly have to figure out the correct commit numbers to use. To be fair, some of those issues are being addressed in more recent versions (although it is still not there yet).
In my quest for something better, I started hunting around, and clearly the new trend is distributed scm. Open source projects are the perfect example of highly distributed development environments, and a centralized scm (like cvs or svn) is showing its limitations there. Hence the need for an scm that can handle this kind of project. I believe that the most prominent distributed scms today are git and mercurial. Both projects were started around the same time as a free/open source response to BitKeeper, which was not going to be free anymore. git was created by Linus Torvalds for handling the Linux kernel development. mercurial was created by Matt Mackall.
On paper, despite some minor differences, the concepts are essentially the same. Not having any preconceived ideas about either of them, I decided to use git for my own project, mainly because of the IDE support: it has very good support in IntelliJ IDEA. I am by no means an expert in git, but I can share my experience after using it for several months now.
Compared with svn, creating branches, switching between them, merging... is all a breeze in git. Nonetheless, there is something that takes time to get used to: the 'staging' area, which is definitely not very intuitive at first, especially coming from a different model. Let's take an example: you start modifying a file, then you want to commit the changes. You must first move the file into the staging area by issuing 'git add <file>'. If you modify the file again before committing, you must add it to the staging area again. In the IDE, you don't have to worry about any of this, which makes it very transparent. On the command line, you just need to be a little bit more careful, and it has become a habit for me to always issue a 'git status' command, which tells you the state of your changes and which ones need to be moved into the staging area.
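To make the staging behaviour concrete, here is a minimal sketch in a throwaway repository. The file name, commit message and identity settings are made up for the illustration; only 'git add', 'git status' and 'git commit' come from the workflow described above.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "first draft" > notes.txt
git add notes.txt                  # stage a snapshot containing "first draft"
echo "second draft" >> notes.txt   # edit again after staging

# Short status: first column is the staged state, second the working tree.
# "A" = staged as a new file, "M" = modified again since it was staged.
status=$(git status --short)
echo "$status"                     # prints "AM notes.txt"

git add notes.txt                  # re-stage so the commit includes the later edit
git commit -q -m "Add notes"
status_after=$(git status --short) # empty: nothing left to stage or commit
```

Had the second 'git add' been skipped, the commit would have contained only "first draft" and the later edit would still show up as a pending change.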
With svn, for example, I would have to somehow connect to my desktop to be able to continue working... But I don't have this issue with git at all: I simply cloned my repository on my laptop before leaving, which you can do over ssh very easily as it is built in: 'git clone user@machine:/path/to/repo'. I can then work on my laptop during the entire time I am away, creating branches and committing as many times as I want. When I come back, I will simply issue a 'git push' command (yes, that is it... I don't even have to tell it where to push to!!) to move all the changes I have made back onto my desktop with full history! Really neat! By the way, since it is all local, it is also perfect for plane work!
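The round trip can be sketched as follows, under a few assumptions: local directories stand in for the ssh URL, all file and directory names are invented, and the "desktop" side is a bare repository because current git versions refuse by default a push into a branch that is checked out on the receiving side.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Seed repository with one commit (hypothetical content).
git init -q "$work/seed"
cd "$work/seed"
git config user.email "dev@example.com"
git config user.name "Dev"
echo "v1" > readme.txt
git add readme.txt
git commit -q -m "Initial commit"

# Stand-in for the desktop: a bare copy of the seed repository.
git clone -q --bare "$work/seed" "$work/desktop.git"

# Before leaving: clone onto the "laptop". Over ssh this would be
# 'git clone user@machine:/path/to/repo' instead of a local path.
git clone -q "$work/desktop.git" "$work/laptop"
cd "$work/laptop"
git config user.email "dev@example.com"
git config user.name "Dev"

# While away: commit locally, no connection needed.
echo "v2" >> readme.txt
git commit -q -a -m "Work done while away"

# Back home: the clone remembers where it came from, so no arguments.
git push -q

desktop_commits=$(git -C "$work/desktop.git" rev-list --count HEAD)
echo "$desktop_commits"
```

The clone records its source as the remote named origin and sets the checked-out branch to track it, which is why a plain 'git push' knows where to go.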
... git to facilitate the creation of open source projects. By lowering the barriers to be able to contribute to open source projects, I think it has great potential to become quite a nice platform.
I have not tried mercurial at all, so I do not have anything of my own to share. I have heard people complaining that mercurial does not allow you to rewrite history whereas git does, and depending on which camp you are in, that may be good or bad. I don't mind that git allows you to do that: if you don't like it, you just don't use the feature.
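As one small illustration of what rewriting history means in git, the sketch below amends the most recent commit to fix its message. This is only one of several history-rewriting features, and the file name and messages are made up for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greting"            # typo in the commit message
before=$(git rev-parse HEAD)

# Replace the last commit with a corrected one.
git commit -q --amend -m "Add greeting"
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)        # still a single commit in the history
subject=$(git log -1 --format=%s)
echo "$count $subject"                    # prints "1 Add greeting"
```

The commit id changes because the amended commit is a new object that replaces the old one, which is why rewriting commits that other people have already pulled is usually where the controversy starts.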
As I was mentioning previously, distributed scm is the new trend. After trying it for myself for a while, I have become very excited about it and I really don't believe it is just a fad. My personal opinion is that it is the future. Even for non-distributed projects (like mine at the moment), the benefits are obvious. I do not regret the decision I have made, as it allowed me to see the potential of this emerging technology! Distributed scm is a brilliant idea and I invite you to try it out: once you make the switch, you will not want to go back.