Tal Galili
2010-Oct-26 16:16 UTC
[R] Which version control system to learn for managing R projects?
Hello all,

I wish to learn a version control system for managing my R (data analysis) projects.

I know of SVN and github, and wonder if there is any reason for which I should prefer the one over the other (or any other platform). An example for a reason could be if it will make it easier for me to later work with R-forge or CRAN or any other platform for R code distribution.

Thanks,
Tal
Duncan Murdoch
2010-Oct-26 16:22 UTC
[R] Which version control system to learn for managing R projects?
On 26/10/2010 12:16 PM, Tal Galili wrote:
> Hello all,
>
> I wish to learn a version control system for managing my R (data analysis)
> projects.
>
> I know of SVN and github, and wonder if there is any reason for which I
> should prefer the one over the other (or any other platform). An example for
> a reason could be if it will make it easier for me to later work with
> R-forge or CRAN or any other platform for R code distribution.

R mainly uses svn so that might be an advantage. On the other hand, git is newer and has some nice features.

Duncan Murdoch
Gabor Grothendieck
2010-Oct-26 22:55 UTC
[R] Which version control system to learn for managing R projects?
On Tue, Oct 26, 2010 at 12:16 PM, Tal Galili <tal.galili at gmail.com> wrote:
> Hello all,
>
> I wish to learn a version control system for managing my R (data analysis)
> projects.
>
> I know of SVN and github, and wonder if there is any reason for which I
> should prefer the one over the other (or any other platform). An example for
> a reason could be if it will make it easier for me to later work with
> R-forge or CRAN or any other platform for R code distribution.
>
>
> Thanks,
> Tal

There are several considerations:

1. What is everyone else using? The network effect is important since you want people to be able to access your repository and you want to leverage your knowledge of the version control system for other projects' repositories. To that extent Subversion is the clear choice since it's used on R-Forge, by R itself and on Google code (Google code also supports Mercurial).

2. Features. Git, Mercurial, Bazaar and a few others are distributed version control systems which represent the next generation after centralized ones like Subversion. Git claims to be the most popular and is the fastest of the three distributed systems, being written in C (the others are written in Python). Git is an ugly combination of C, shell scripts and other languages. The underlying basic design is attractive despite a somewhat messy implementation. At the time I evaluated them Mercurial was the only one with decent cross-platform support, and it is the one used by Google, but the situation regarding cross-platform support could have changed since then. Bazaar has the advantage of being pre-installed on many systems.

3. Repositories. The repository you use (R-Forge, Google code, Github) is a key associated decision. R-Forge uses Subversion and has the advantage that it does automatic builds, but it has an annoying delay of about 20 minutes before any change you make to your home page, etc. appears. Google code supports Subversion and Mercurial.
Google code is easier to use and, in particular, it uses http rather than R-Forge's ssh. http is more convenient, particularly for Windows users. Google code also has no delay and it also has integrated issue tracking, a wiki and a download area for your project. It is possible to host your project on Google code but still have R-Forge front-end it, and at least one R developer has posted that he does precisely that. Google code is more restrictive regarding the licenses that it accepts, although it does accept most of the popular free ones such as GPL, Apache, etc. On the plus side, Google code does have specific support for separate code and documentation licenses, which may be an advantage. R-Forge does not restrict the license, and Github allows non-free projects but you have to pay for using it in that case.

4. Editor/IDE. You don't really need integration of your editor and version control system, yet it may be convenient if it's available. Subversion is integrated with Microsoft Common Source Code Control Interface (MSSCCI) so any editor or IDE that supports that standard integrates with Subversion. There are Subversion plugins for vim, Emacs, Eclipse and likely many other editors and IDEs. There may be plugins for some of the other version control systems too.

5. Windows Explorer. There are Windows Explorer extensions for Subversion, Mercurial, Git and Bazaar. The Subversion and Mercurial "Tortoise" extensions are reasonably mature. I believe the Git and Bazaar Tortoise extensions are newer.

6. Client vs. Server and Installation. With Subversion you typically use a Subversion repository on a remote system to host your software and you only install client software. You can use the standard command-line client or one of a number of alternative command-line or GUI clients. With the distributed ones every installation is both a client and a repository (and potentially a server).
It's harder to install a Subversion repository but that does not really matter since normally you only install the client. The distributed systems are generally easy to install, although if you need to install them from source on a non-supported or old system, Mercurial was the easiest to get going in my experience of having to do exactly that. Bazaar is pre-installed on some distributions, in which case there is zero installation.

Although I have focused on the most popular, there are other version control systems too. darcs is a powerful distributed system but may not be able to handle projects as large as the ones discussed here can. fossil was developed by the SQLite developer and features particularly streamlined installation. Both of these have their enthusiastic adherents.

I personally mostly use Subversion with the TortoiseSVN client, and when I want a distributed system I use Mercurial, usually from the command line or to a lesser extent via TortoiseHg. I have contributed an extension to the Mercurial project (not related to R) which might bias me slightly toward it, so caveat emptor.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
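
The client/server distinction in point 6 can be made concrete with a toy model. The Python sketch below is only an illustration under stated assumptions: the class and method names (CentralServer, CentralizedClient, DistributedRepo, commit, clone, push) are hypothetical and do not correspond to any real Subversion, Mercurial or Git API, and histories are reduced to plain lists of commit messages.

```python
"""Toy model of centralized vs. distributed version control (illustration only)."""


class CentralServer:
    """Holds the single shared history, as a remote Subversion repository would."""

    def __init__(self):
        self.history = []


class CentralizedClient:
    """A client with no history of its own; every commit needs the server."""

    def __init__(self, server):
        self.server = server

    def commit(self, message):
        # The commit is recorded directly in the remote repository.
        self.server.history.append(message)


class DistributedRepo:
    """Every installation is both a client and a full repository."""

    def __init__(self):
        self.history = []

    def clone(self):
        # A clone carries its own complete copy of the history.
        copy = DistributedRepo()
        copy.history = list(self.history)
        return copy

    def commit(self, message):
        # The commit is recorded locally; no other repository is contacted.
        self.history.append(message)

    def push(self, other):
        # Send the commits the other repository does not have yet.
        # Simplification: assumes other's history is a prefix of ours.
        other.history.extend(self.history[len(other.history):])


# Centralized: the commit lands on the server immediately.
server = CentralServer()
CentralizedClient(server).commit("fix plot labels")

# Distributed: commits accumulate locally until they are pushed.
hosted = DistributedRepo()
local = hosted.clone()
local.commit("add regression tests")
local.commit("tidy documentation")
local_only = len(local.history) - len(hosted.history)  # commits not yet shared
local.push(hosted)
```

In the centralized half the shared history changes as soon as commit is called, while in the distributed half the hosted repository only sees the two commits after the explicit push. Real systems track changesets and handle diverging histories, which this sketch deliberately leaves out.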