Against Centralized Source Control
Linus Torvalds gave an hour-long talk about the Git source control system.
The talk was hosted by Google.
The most important points seem to be:
- git users are all peers with a complete copy of the tree. There is no central repository and no commit access.
- git focuses on hashing and merging sets of files really, really fast
- git can't scale above 1 million files, but Linus believes that projects with that many files should be split into multiple projects for other reasons.
Speaking as someone who has a lot of experience using one particular centralized system, Perforce, I can say that when our central server went down, we couldn't get any work done. Git users, on the other hand, can continue to work on patch sets even when the network is down or the main server is down, and even merge two patch sets when the network is unavailable.
git provides some security against malicious attempts to corrupt the central server. Because git checksums all files using the cryptographic hash SHA-1, people who are syncing up (or "pulling") a git branch should notice if that branch has been tampered with by people going outside the source control system.
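To make the tamper check a little more concrete, here is a minimal Python sketch of how git names a file by the SHA-1 of its contents. The "blob" header format is git's own, but the function names git_blob_id and is_untampered are hypothetical ones I made up for illustration. Real git also hashes directory trees and commits on top of this, so treat it as a sketch of the file-level idea only, not of git's full verification.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID git assigns to a file (blob) with this content.

    git hashes a short header, "blob <size>\\0", followed by the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_untampered(content: bytes, expected_id: str) -> bool:
    """Hypothetical helper: does the content still hash to the recorded ID?"""
    return git_blob_id(content) == expected_id


# The ID recorded when the file was committed.
original = b"hello world\n"
recorded_id = git_blob_id(original)

# Someone edits the stored file behind the source control system's back.
tampered = b"hello w0rld\n"

print(recorded_id)
print(is_untampered(original, recorded_id))   # content matches its recorded ID
print(is_untampered(tampered, recorded_id))   # content no longer matches
```

The ID printed here should be the same one that `git hash-object` reports for a file containing the same bytes, which is why a changed file cannot keep its old name without someone noticing.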
One thing that is overlooked in this talk is that most corporations are also interested in securing their closed-source code against attempts to steal it. This is not an issue for Linux, and in fact, Linus wants as many people to mirror the kernel as possible, so that if any of the main server disks go down, he can copy it right back from them. "I don't do backups," Linus states flatly.
Linus also makes some jabs at Perforce that really hit home. He complains that all clients in a centralized source control system have to share the same namespace, so they often have to be named funny things in order to keep them from conflicting. Rather than naming them purely by what changes they contain, people must put some unique identifier in the client name in case anyone has a similar client. This is not a big deal at a small company. But try scaling that up to a really large organization with multiple sites, or an open source project, and it becomes difficult.
Interestingly enough, Linus states that his parting with the BitKeeper people was amicable. He seems to respect their product, and states that using BitKeeper really showed him what a source control system should be. I guess a lot of people in the open source community were upset that Linus was using a tool that was not itself open source to maintain his open source kernel. I haven't really looked into the debate much, but it's still interesting to get Linus' angle.
Linus states that git reduces political pressures because he doesn't have to decide who to give commit access to. I suppose what he really means is that it makes it easier for him to have degrees of trust. Rather than giving him a binary decision to make (give this person commit access, or not?), he can just pull patchsets from people with different levels of scrutiny depending on how much he trusts them. Ultimately, of course, Linus must decide what goes into head-of-line, aka "Linus' Tree," and that will always involve some amount of politics.
I will have to try out git the next time I get a chance. It just seems like one of those tools that will make an impact in the way people work, even if they don't end up using it directly. Obviously, most commercial projects are not the Linux kernel, where you have thousands of functionally anonymous contributors submitting bug fixes and patches. But a lot of big companies have multiple sites, and if those sites are going to develop software together effectively, they're going to need tools like git.