I have been trying to determine the size of the R user base, and was asked to share my findings with this mailing list. Although I still don't have any definite estimate of this number, I do have some interesting and indicative information:

1. It appears that there are about 100,000 S-PLUS users.

Rationale: According to Insightful's 2002 Annual Report, over 100,000 people use Insightful software; since license revenues from S-PLUS and add-on modules accounted for nearly all of their license revenues in 2002, and their other products are much more costly than S-PLUS, it seems that the great majority of users of Insightful software are S-PLUS users.

Conclusion: S-PLUS costs $3500 (Windows) or $4500 (Linux/Unix) for an individual copy; R is free. This suggests that there may be more R users than S-PLUS users, which suggests > 100,000 R users.

Does anyone have any other information that would give some notion as to the RELATIVE numbers of R and S-PLUS users?

2. At least one R book has achieved sales of just over 5,000 copies. (I could not find sales figures for other R books, as it appears that publishers are closed-mouthed about such figures. And no, I can't reveal which particular book this was, so don't ask.)

Conclusion: Very few books sell to more than 12% of the population of potential buyers, and most books have a far lower penetration -- 1% or less is not uncommon. A 12% penetration for the book in question implies 42,000 R users; a more reasonable 5% penetration implies 100,000 users. A low 1% penetration implies 500,000 users.

3. There are a total of 3225 unique subscribers to the three R mailing lists.
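The arithmetic behind point 2 can be sketched as follows. This is only an illustration in Python, assuming the simple model "copies sold = penetration x number of users"; the function name `implied_users` is made up for the example, and "just over 5,000" is rounded down to 5,000.

```python
def implied_users(copies_sold: int, penetration: float) -> int:
    """Estimate the user population, assuming copies_sold = penetration * users."""
    if not 0 < penetration <= 1:
        raise ValueError("penetration must be in (0, 1]")
    return round(copies_sold / penetration)


# "Just over 5,000 copies" in the post; rounded down here (an assumption).
COPIES_SOLD = 5_000

for rate in (0.12, 0.05, 0.01):
    users = implied_users(COPIES_SOLD, rate)
    print(f"{rate:.0%} penetration -> about {users:,} users")
```

The 12% case comes out slightly under 42,000, which matches the rounded figure quoted in the post. The estimate is very sensitive to the assumed penetration, which is the only free parameter.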
A very intriguing commentary!
Some comments to "modulate" these estimates.

On 19-Apr-04 Kevin S. Van Horn wrote:
> 1. It appears that there are about 100,000 S-PLUS users.
>
> Rationale: According to Insightful's 2002 Annual Report, over 100,000
> people use Insightful software; since license revenues from S-PLUS and
> add-on modules accounted for nearly all of their license revenues in
> 2002, and their other products are much more costly than S-PLUS, it
> seems that the great majority of users of Insightful software are
> S-PLUS users.
>
> Conclusion: S-PLUS costs $3500 (Windows) or $4500 (Linux/Unix) for an
> individual copy; R is free. This suggests that there may be more R
> users than S-PLUS users, which suggests > 100,000 R users.
>
> Does anyone have any other information that would give some notion as to
> the RELATIVE numbers of R and S-PLUS users?

There is one major factor in here. The number of Windows users in the world is much higher than the number of Unix/Linux users, especially in the corporate sector. Organisations whose work needs R/S-PLUS and whose IT is Windows based will (I believe) mostly go for S-PLUS (I could expand on my reasons for believing this). Therefore I suspect that in the 2-way table

              Windows   Unix/Linux
    S-PLUS      N11        N12
    R           N21        N22

you are likely to find that N11/N21 >> N12/N22.
Certainly N11+N21 > N12+N22. This tends to imply N11+N12 > N21+N22.
The relative cost of S-PLUS vs R is not likely to be a factor in the choice, for most corporate users. Therefore I would lower your estimate, here, of R usage quite a bit (though I can't guess by how much).

> 2. At least one R book has achieved sales of just over 5,000 copies.
> (I could not find sales figures for other R books, as it appears that
> publishers are closed-mouthed about such figures. And no, I can't
> reveal which particular book this was, so don't ask.)
>
> Conclusion: Very few books sell to more than 12% of the population of
> potential buyers, and most books have a far lower penetration -- 1% or
> less is not uncommon. A 12% penetration for the book in question
> implies 42,000 R users; a more reasonable 5% penetration implies
> 100,000 users. A low 1% penetration implies 500,000 users.

Comment: More R users are likely to buy a book on R than S-PLUS users are likely to buy a book on S-PLUS. S-PLUS users who do buy a book may in fact buy a book on R rather than S-PLUS, if that book is well known to be good. (I'm assuming that the "R book" you refer to is R-specific rather than written for both R and S-PLUS or for "S-PLUS with R variations"; otherwise you have to take off the S-PLUS-only purchasers)

> 3. There are a total of 3225 unique subscribers to the three R mailing
> lists.

I think this may be the most directly informative piece of data (though still on the soft side). People who use R are likely to become aware of the mailing lists, and to subscribe. So I suspect that this number exceeds say 20-40% of R users (you can't be precise with this sort of intuitive guess). This would suggest 7000-16000 R users.
You might perhaps double or triple this to allow for groups where one member of the group subscribes as the "spokesman" for the rest. Maybe also inflate a bit to allow for R users who don't think they need to consult mailing lists (who are they??).

Hmmm!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 19-Apr-04                                       Time: 21:22:51
------------------------------ XFMail ------------------------------
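The mailing-list estimate in the message above can be worked through the same way. The sketch below is Python and purely illustrative: the function name `users_from_subscribers`, the use of integer percentages, and the `group_factor` parameter (for the "double or triple" adjustment) are assumptions made for the example, not anything proposed in the thread.

```python
SUBSCRIBERS = 3225  # unique subscribers to the three R mailing lists


def users_from_subscribers(subscribers: int, share_percent: int,
                           group_factor: int = 1) -> int:
    """Users implied if share_percent% of users subscribe.

    group_factor scales the result for groups in which one member
    subscribes on behalf of the rest. Integer arithmetic, rounded down.
    """
    if not 0 < share_percent <= 100:
        raise ValueError("share_percent must be in 1..100")
    return subscribers * 100 * group_factor // share_percent


low = users_from_subscribers(SUBSCRIBERS, 40)    # 40% of users subscribe
high = users_from_subscribers(SUBSCRIBERS, 20)   # 20% of users subscribe
print(low, high)

# Doubling the low end and tripling the high end for group "spokesmen"
print(users_from_subscribers(SUBSCRIBERS, 40, group_factor=2),
      users_from_subscribers(SUBSCRIBERS, 20, group_factor=3))
```

With shares of 40% and 20% the division gives roughly 8,000 and 16,000, close to the 7000-16000 range quoted above; the group adjustment then stretches the range to roughly 16,000-48,000.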
> From: Ted.Harding at nessie.mcc.ac.uk
>
> [...]
>
> > 3. There are a total of 3225 unique subscribers to the three R mailing
> > lists.
>
> I think this may be the most directly informative piece of data
> (though still on the soft side). People who use R are likely to
> become aware of the mailing lists, and to subscribe. So I suspect
> that this number exceeds say 20-40% of R users (you can't be precise
> with this sort of intuitive guess). This would suggest 7000-16000 R
> users.
> You might perhaps double or triple this to allow for groups where
> one member of the group subscribes as the "spokesman" for the rest.
> Maybe also inflate a bit to allow for R users who don't think
> they need to consult mailing lists (who are they??).

How about those poor students who don't know how lucky they are to have instructors forcing R upon them for a course? I'd bet they are very unlikely to subscribe to the list(s). Although I don't know if one would want to include them as `R users'...

Best,
Andy

> Hmmm!
> Ted.
On Mon, 19 Apr 2004, Kevin S. Van Horn wrote:

> 2. At least one R book has achieved sales of just over 5,000 copies. (I
> could not find sales figures for other R books, as it appears that
> publishers are closed-mouthed about such figures. And no, I can't
> reveal which particular book this was, so don't ask.)

Some of us know quite accurately, though.

> Conclusion: Very few books sell to more than 12% of the population of
> potential buyers, and most books have a far lower penetration -- 1% or

Where did you get that 12% from?

> less is not uncommon. A 12% penetration for the book in question
> implies 42,000 R users; a more reasonable 5% penetration implies 100,000
> users. A low 1% penetration implies 500,000 users.

One S book has sold half your number of S-PLUS users, although some sales are known to be to R users.

I have big problems with the definition. What is an `R user'? Someone who has ever used R, even for a one-hour practical class? Someone who has used R in the last 3 months?

Even given a definition, I would not be able to give you an accurate answer for our site, for either S-PLUS or R. (There are machines with each installed that I strongly suspect are unused.)

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On Mon, 19 Apr 2004 12:12:47 -0600 "Kevin S. Van Horn" <kvanhorn at ksvanhorn.com> wrote:

> I have been trying to determine the size of the R user base

Hey, that one is easy: We are legion .... :-)

detlef

--
Detlef Steuer --- http://fawn.unibw-hamburg.de/steuer.html
***** Encrypted mail preferred *****
I wonder if there is a count of the number of downloads of each version of the R installations (for example rw1090.exe)?

Bendix Carstensen
----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
This question of the R users is of course very interesting, but not from the point of view of the absolute number of users (which is only of limited interest). Here is what I find interesting:

1) What fraction of stat software users use R (in particular R versus S-PLUS, as that was the initial question)?

2) How does this fraction fluctuate over time?

3) How does this fraction change according to the platform (Windows, Linux/Unix, MacOS)?

And even more interesting, but even more difficult to assess:

4) Does R have an impact on the number of stat software users (i.e., do more people use "serious" stat systems rather than Excel, for instance)? An example: I teach biostats in a Belgian university. Before me, students had to use Excel... a big mistake, of course. Now, they learn R... and some of them become true R users (whatever definition you give to that).

5) Does R have an impact on the quality of the statistical analyses done (better use of methods, and use of less common methods that are appropriate for a study)?

All these questions need an estimate of the number of R users, of course. In addition, (4) and (5) are subjective, and difficult to evaluate at a large scale. However, it is perhaps possible to do so at the scale of a company, or of a university. If someone has some experience with this kind of evaluation or can point me to the right (not specialized, please!) documentation, I am interested.
A last comment/question: would it be possible to add some code in R that does the following:

1) it is triggered only if the software has been used at least, let's say, 10 or 20 times on the computer where it was installed,

2) then it checks whether an update of R is available (just by looking whether a given link on a centralized web site -CRAN?- exists),

3) when it finds that link, it just warns the user of an update in a non-annoying way, for instance like this:

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.

An update is available at http://cran.r-project.org.

>

and, 4) it deactivates itself once the link is found.

The main role of this code would be, of course, to warn users that an update is available. One side effect would be that it should be possible to monitor access to the link on the centralized web site and to know:

1) How many people installed and used R at least 10 (or 20) times on their computer?

2) On which platform?

3) Perhaps some more info, like the location of the machines?

Of course, this will only work with computers connected to the internet,... but at least, it could be one way to evaluate the number of R users. Would that be an infringement of Open Source, or any other rule of freedom? I don't know, but it does seem to be quite widespread (at least for commercial software). So, why would Open Source software not be able to monitor the number of users?

Best,

Philippe Grosjean

Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons-Hainaut University, Pentagone
8, Av. du Champ de Mars, 7000 Mons, Belgium
phone: + 32.65.37.34.97, fax: + 32.65.37.33.12
email: Philippe.Grosjean at umh.ac.be
SciViews project coordinator (http://www.sciviews.org)
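To make steps 1) to 4) of the proposal above concrete, here is a rough Python sketch of the control flow only. Everything in it is hypothetical: the class `UpdateNotifier`, its fields, and the `link_exists` callable (a stand-in for probing a link on a central site) are invented for illustration and are not part of R. No network access is performed, so the sketch says nothing about what a server could or could not log.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UpdateNotifier:
    """Hypothetical sketch of the four-step proposal; not part of R."""
    link_exists: Callable[[], bool]  # stand-in for checking a link on a central site
    threshold: int = 10              # launches required before any check (step 1)
    launches: int = 0
    active: bool = True

    def on_startup(self) -> Optional[str]:
        """Call once per launch; return a notice to print, or None."""
        self.launches += 1
        if not self.active or self.launches < self.threshold:
            return None              # step 1: not used often enough, or already done
        if self.link_exists():       # step 2: is an update link present?
            self.active = False      # step 4: deactivate once the link is found
            return "An update is available at http://cran.r-project.org."  # step 3
        return None


# Simulate 12 launches with a link that is always present.
notifier = UpdateNotifier(link_exists=lambda: True, threshold=10)
notices = [notifier.on_startup() for _ in range(12)]
print(notices.index("An update is available at http://cran.r-project.org.") + 1)
```

In this simulation the notice appears once, on the tenth launch, and never again. A real version would have to persist the launch counter and the active flag between sessions, which the sketch leaves out.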
Hello,

> -----Original Message-----
> From: Philippe Grosjean [SMTP:phgrosjean at sciviews.org]
> Sent: Tuesday, April 20, 2004 10:47 AM
> To: r-help at stat.math.ethz.ch
> Subject: RE: [R] Size of R user base
>
> Of course, this will only work with computers connected to the
> internet,... but at least, it could be one way to evaluate the number
> of R users. Would that be an infringement of Open Source, or any other
> rule of freedom? I don't know, but it does seem to be quite widespread
> (at least for commercial software). so, why an Open Source software
> would not be able to monitor the number of users?

I don't know if it would violate any part of the GPL under which R is licensed. But I think it is against the spirit of free software (free as in "free speech", not "free beer"...) to try to control the users (in German: legitim vs. legal).

But more importantly: the GPL grants the right to change the source code. So what happens if someone removes this part of the code before compiling? Then there would not be any chance of tracing those R users, right? And, as far as I understand, it is also allowed to distribute those modified source-code versions. I assume that there would be widespread interest in such a derivative work of R where this feedback code would have been left out. So again, there is a problem of getting to know the size of the R user base.

Just some thoughts,
Roland
Tom Mulholland
2004-Apr-20 11:50 UTC
[R] Size of R user base. It's not what you've got it's what you do with it
While reading all of these comments I started thinking about how you might collect the data. As pointed out, in many ways, there are immediate problems with the validity of what's collected. That is, what is the concept that we are trying to measure?

So, that was always going to make it hard, but assuming that you got past that point and you could ask a question, who would you ask it of? Here in Western Australia there's a small and mostly unconnected fraternity of people who use R. In a place where even strangers seem somehow kind of familiar (you've been sitting on the same bus for the last 20 years and still don't know who they are), my gut feeling is that it's not a major penetration. Trying to do a random sample would be problematic to say the least. If you started to try and stratify the sample, where would you start? If you go to places where you know R is being used you have problems with bias. That might rule out universities as a stratified sample because we're not really clear about what we are asking.

So what other sources are there? Well, there's been some comment about the mailing list and all the problems that might be involved in that and in counting downloads. Guess that's out. Then I thought about the vibrancy of the R email list. For a topic such as a statistical programming language it's unusual to see such activity (at least in my experience.)

So my mind turned to trying to measure some smaller but representative process, in much the same way that criminologists use the homicide rate as the pointed end of the violence spectrum (let's not go there, find another list to have that debate).

So I started thinking about what questions you might ask the typical SPSS or SAS or ... user that would help. That's when it struck me. Lots of us, R Users that is, also have the luck (some might say misfortune) to also work with other languages. Why do we pick R?
The one thing that I have noticed that separates R Users and S-PLUS users (and probably users of some of the other products that I don't use or know about) from the mainstream users of SPSS and SAS (at least here in WA) is that the mainstream users are not pushing the envelope.

So the question is not "How many users are there?", it's "What are people doing with it?" I put the formal citation in each publication I produce and while they are mainly in-house productions (a problem with applied research) maybe they'll eventually start to get into the citation charts. I hear some academics love browsing these. (That is aimed at no-one who's on this list)

Tom Mulholland
Tom Mulholland Associates

Footnote: When 5 out of 6 paragraphs start with the word "So" it's time to get a life. So I'm off to get a life.
Hello,

On Monday, 19 April 2004 19:12, Kevin S. Van Horn wrote:

> 1. It appears that there are about 100,000 S-PLUS users.

Let me add one thought that I haven't seen in the discussion yet: To estimate the user base, one should perhaps not only look at S-PLUS for comparison. R can be used for many things other than statistics. The users I know (physicists and geoscientists) use it for quite different purposes (e.g. numerically solving ODEs) for which they might have used Matlab instead, or they use it for plotting (as a replacement for IDL), whereas they may only need the most basic statistical features. I could imagine that uses outside "generic" statistics will become more common as more packages are added and R becomes more widely known.

Perhaps these users are also less likely to buy books on R (which are probably more geared towards statistics), and perhaps also less likely to subscribe to the mailing list. However, I have no idea how to estimate the number of these users....

It seems to me that the number of downloads is the most direct measure. What about the number of package updates? A package update (from within R) requires a working installation, and also a user who actually starts the update. And I suspect that even users who get R on CD would do an internet package update every now and again. That might be better than counting the number of downloads of the program itself.

Cheers
Stephan

"Philippe Grosjean" <phgrosjean at sciviews.org> wrote:

> A last comment/question: would it be possible to add some code in R
> that does the following:
> ["calls home" to say that it is being used/asks for updates/&c]

There are all sorts of things the R developers might like to know about how it is used. There are also all sorts of reasons why they shouldn't do anything like that. Any habitual reader of comp.risks can think of more reasons than I care to spend typing up. I'll mention just one: a number of Microsoft users got hit with unexpectedly large phone bills a while back. Their software was "calling home" *without* asking the user's permission or even telling the user, and Microsoft's normal lines were out of service, so normal full cost calls were made.

As far as informing the user that there is an update,

    An update is available at http://cran.r-project.org.

the only *really* useful information here is the URL, and that can be displayed without calling home. If one's R installation is more than a couple of months old there is almost certainly an update. It would suffice to say

    You can check for updates by visiting http://cran.r-project.org
    or by using the check.CRAN.for.updates function.

Another reason for not calling home, of course, is that R already takes quite long enough to start up, thank you very much. (And that doesn't count opening a graphics window, just time to first prompt.)

> Of course, this will only work with computers connected to the
> internet,... but at least, it could be one way to evaluate the number
> of R users. Would that be an infringment of Open Source, or any other
> rule of freedom? I don't know, but it does seem to be quite widespread
> (at least for commercial software).

Yes, and it's an unwarranted invasion of privacy there.
The fact that some be****ed program is sending who knows what information about me to who knows where without my say-so is one big reason why I avoid commercial software (read: Windows software; none of the commercial software I use on my Solaris box does this).

> so, why an Open Source software would not be able to monitor the
> number of users?

Because even if R *did* do the unwise and unforgivable, we STILL could not know the number of users!

You would, to start with, only know about copies of R on machines that were connected to the internet and allowed this kind of traffic through their firewall. Now I have R on two old Macs at home, and you'd never hear about those.

Worse, here at work I have accounts on a G3 Mac, a G4 Mac, three different UltraSPARCs, three Alphas, and a couple of Linux boxes. That's about 10 different accounts. (How do I keep track of 10 different passwords? Easy: every so often I ask our sysadmin to give me new passwords on the machines I use less often because I've forgotten them.) How is your monitoring site to know that these 10 users are really the same person?

And when I fire up R on a student's Linux box to demonstrate a point (to a student who _isn't_ an R user), how is the monitoring site to know that it's really me, not the student, so that the number of "users" should not be incremented?

In fact, the more I think about it, the more it seems to me that "the number of users" is not a well defined concept. For a commercial system, you can count the number of licences sold, and that means something pretty clear, because each licence is money in your pocket. For a system like R, the amount of traffic on the mailing list is reasonably well defined and of interest because it's stuff that the maintainers have to at least glance at, so it directly affects their lives.
If you are thinking about popularity contests, bear in mind that a Microsoft staffer wrote an article "Evangelism is WAR" in which he explicitly stated that other software producers are the "enemy" and users are "pawns"; do you really want to get into that kind of contest?

If you're concerned about mind-share rather than market-share, I have talked a data-mining student into at least looking at R. She has tried it. She's doing a literature survey first. Is she an "R user" yet? If she uses it for a month, and drops it for a year, is she still an "R user"? I use R in bursts myself; intensely for a couple of days, then stop and think about things and do other work for a week or so, then come back. When, precisely, am I an "R user", and when would I stop being one?

The first rule of measurement is "Don't bother with a measurement if you don't know what you're going to do with the answer". If you knew the number of "R users", however defined, how would that actually help you? Why do bad things to make a measurement that's ill-defined, arguably impossible to measure meaningfully, and not that much use when you have it?
One statistic that might be worth collecting is the number of unique IP addresses that request update.packages( ) or install.packages( ) on CRAN or one of its mirrors.

Frank Harrell

---
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
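A count like this could be taken from a mirror's web server log. The Python sketch below is only an illustration under several assumptions: that the log is in Common Log Format, that a request for a file named PACKAGES (or PACKAGES.gz) is a reasonable marker for a package install or update run from within R, and that the sample lines, paths and addresses are made up (the addresses come from ranges reserved for documentation). The function name `unique_index_clients` is likewise invented.

```python
import re

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)[^"]*" \d{3} \S+'
)

# Assumption: fetching one of these index files marks a package install/update.
INDEX_FILES = ("PACKAGES", "PACKAGES.gz")


def unique_index_clients(lines):
    """Return the set of client addresses that requested a package index file."""
    clients = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).rsplit("/", 1)[-1] in INDEX_FILES:
            clients.add(match.group(1))
    return clients


# Made-up sample log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [20/Apr/2004:10:00:00 +0000] "GET /src/contrib/PACKAGES HTTP/1.0" 200 51234',
    '192.0.2.10 - - [21/Apr/2004:09:30:00 +0000] "GET /src/contrib/PACKAGES HTTP/1.0" 200 51234',
    '198.51.100.7 - - [21/Apr/2004:11:15:00 +0000] "GET /bin/windows/contrib/1.9/PACKAGES HTTP/1.0" 200 40321',
    '203.0.113.5 - - [21/Apr/2004:12:00:00 +0000] "GET /bin/windows/base/rw1090.exe HTTP/1.0" 200 22000000',
]

clients = unique_index_clients(SAMPLE_LOG)
print(len(clients))
```

Repeat requests from one address are counted once, and the plain installer download is ignored. The caveats raised earlier in the thread still apply: one person on many machines shows up as many addresses, and machines behind a shared proxy or firewall may show up as one or not at all.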