I would like to get some idea of which R-packages are popular, and what R is used for in general. Are there any statistics available on which R packages are downloaded often, or is there something like a package-survey? Something similar to http://popcon.debian.org/ maybe? Any tips are welcome!

-----
Jeroen Ooms * Dept. of Methodology and Statistics * Utrecht University

Visit http://www.jeroenooms.com to explore some of my current projects.
--
View this message in context: http://www.nabble.com/popular-R-packages-tp22391260p22391260.html
Sent from the R help mailing list archive at Nabble.com.
This function will show which other packages depend on a particular package:

> dep <- function(pkg, AP = available.packages()) {
+   # match pkg as a whole word within the dependency fields
+   pkg <- paste("\\b", pkg, "\\b", sep = "")
+   cat("Depends:", rownames(AP)[grep(pkg, AP[, "Depends"])], "\n")
+   cat("Suggests:", rownames(AP)[grep(pkg, AP[, "Suggests"])], "\n")
+ }
> dep("zoo")
Depends: AER BootPR FinTS PerformanceAnalytics RBloomberg StreamMetabolism TSfame TShistQuote VhayuR dyn dynlm fda fxregime lmtest meboot party quantmod sandwich sde strucchange tripEstimation tseries xts
Suggests: TSMySQL TSPostgreSQL TSSQLite TSdbi TSodbc UsingR Zelig gsubfn playwith pscl tframePlus

On Sat, Mar 7, 2009 at 2:57 PM, Jeroen Ooms <j.c.l.ooms at uu.nl> wrote:
> I would like to get some idea of which R-packages are popular, and what R is
> used for in general. Are there any statistics available on which R packages
> are downloaded often, or is there something like a package-survey? Something
> similar to http://popcon.debian.org/ maybe? Any tips are welcome!
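The dep() function above answers the question for one package at a time. The same idea can be turned around to rank every package by how many others list it. What follows is only a rough sketch, not anything posted in the thread: the helper names parse_deps() and reverse_dep_counts() are made up for illustration, and the parsing assumes dependency fields are comma-separated names with optional version requirements in parentheses.

```r
# Turn one dependency field into bare package names,
# e.g. "R (>= 2.10), zoo (>= 1.5), stats" -> c("zoo", "stats").
# (Hypothetical helper, not part of base R.)
parse_deps <- function(field) {
  if (is.na(field) || !nzchar(field)) return(character(0))
  deps <- strsplit(field, ",", fixed = TRUE)[[1]]
  deps <- gsub("\\([^)]*\\)", "", deps)   # drop version requirements
  deps <- trimws(deps)
  deps[nzchar(deps) & deps != "R"]        # "R" itself is not a package
}

# For each package named in the chosen fields, count how many packages
# in AP list it. A package naming the same dependency in two fields is
# counted once. (Hypothetical helper.)
reverse_dep_counts <- function(AP, fields = c("Depends", "Suggests")) {
  fields <- intersect(fields, colnames(AP))
  per_pkg <- lapply(seq_len(nrow(AP)), function(i) {
    unique(unlist(lapply(AP[i, fields], parse_deps), use.names = FALSE))
  })
  named <- as.character(unlist(per_pkg, use.names = FALSE))
  sort(table(named), decreasing = TRUE)
}

# Usage (needs access to a CRAN mirror):
# counts <- reverse_dep_counts(available.packages())
# head(counts, 20)
```

Like dep(), this measures how often a package is built upon, which is at best a proxy for how often it is used.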
When the question "How many R users are there?" arises, the consensus seems to be that there is no valid method to address it. The 2004 thread "R-business case" can be found here:
https://stat.ethz.ch/pipermail/r-help/2004-March/047606.html

I did not see any material revision to that conclusion during the recent discussion of the New York Times article on the R challenge to SAS.

Gmane tracks the volume of r-help activity (I realize that is not what you asked for):
http://www.gmane.org/info.php?group=gmane.comp.lang.r.general

The distribution of R packages is, well ... distributed:
http://cran.r-project.org/mirrors.html

At least one of the participants in the 2004 thread suggested that it would be a "good thing" to track the number of downloads by package. I have not heard of any such system being installed in the mirror software, and I see nothing that suggests data gathering in the CRAN Mirror How-to:
http://cran.r-project.org/mirror-howto.html

On the other hand, I am not part of R-core, so you must await a more authoritative opinion, since a 5-year-old thread and amateur speculation are not much of a leg to stand on.

There are lexicographic packages for R. One approach to a de novo analysis would be to do some sort of natural language analysis of the r-help archives, counting either package names that are not English words, or occurrences of the words "library" or "package" in close proximity to package names that overlap the 30,000 common English words. That would have the danger of inflating counts of the packages with the least adequate documentation or a paucity of good worked examples, but there are many readers of this list who suspect that new users don't look at the documentation, so who knows?

--
David Winsemius

On Mar 7, 2009, at 2:57 PM, Jeroen Ooms wrote:
> I would like to get some idea of which R-packages are popular, and what R is
> used for in general. Are there any statistics available on which R packages
> are downloaded often, or is there something like a package-survey? Something
> similar to http://popcon.debian.org/ maybe? Any tips are welcome!

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
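The archive-mining idea in this message can be approximated without any natural language machinery by looking only for explicit library() or require() calls in message text. The sketch below is an illustration under stated assumptions, not a tested analysis of the real archives: count_library_calls() is a made-up name, the messages are assumed to be already loaded as a character vector (one element per message), and each package is counted at most once per message.

```r
# Count how many messages contain library(pkg) or require(pkg) for each
# package in pkgs. (Hypothetical helper; messages is a character vector
# with one element per archived message.)
count_library_calls <- function(messages, pkgs) {
  # library( or require(, optional quote, then a package-like name
  pattern <- "\\b(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  per_message <- lapply(messages, function(txt) {
    m <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
    unique(sub(pattern, "\\1", m, perl = TRUE))  # keep only the name
  })
  found <- as.character(unlist(per_message, use.names = FALSE))
  # Names not in pkgs become NA and are dropped by table()
  counts <- table(factor(found, levels = pkgs))
  sort(counts, decreasing = TRUE)
}

# Usage, assuming `archive` holds the message bodies:
# count_library_calls(archive, rownames(available.packages()))
```

Quoted replies repeat earlier messages, so counts from a real archive would be inflated unless quoted lines are removed first, and the caveat about poorly documented packages attracting more questions still applies.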
Hi Spencer,

XLSolutions is currently analyzing archived r-help questions to rank packages for the upcoming R-PLUS 3.3 Professional version, and we will be happy to share the outcome with interested parties. Please email dan at xlsolutions-corp.com

Regards -
Sue Turner
Senior Account Manager
XLSolutions Corporation
North American Division
1700 7th Ave Suite 2100
Seattle, WA 98101
Phone: 206-686-1578
Email: sue at xlsolutions-corp.com
web: www.xlsolutions-corp.com

--- On Sat, 3/7/09, Spencer Graves <spencer.graves at prodsyse.com> wrote:
> From: Spencer Graves <spencer.graves at prodsyse.com>
> Subject: Re: [R] popular R packages
> To: "Wacek Kusnierczyk" <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
> Cc: r-help at r-project.org, "Jeroen Ooms" <j.c.l.ooms at uu.nl>, "Thomas Adams" <Thomas.Adams at noaa.gov>
> Date: Saturday, March 7, 2009, 5:22 PM
>
> I just did RSiteSearch("library(xxx)") with xxx the names of 6 packages
> familiar to me, with the following numbers of hits:
>
>   hits  package
>    169  lme4
>    165  nlme
>      6  fda
>      4  maps
>      2  FinTS
>      2  DierckxSpline
>
> Software could be written to (1) extract the names of current packages
> from CRAN, then (2) perform queries similar to this on all such packages
> and summarize the results. I don't have the time now to write code for
> this, but I've written similar code before for step (1); it can be found
> in "scripts/TsayFiles.R" in the "FinTS" package on CRAN. For step (2),
> Sundar Dorai-Raj wrote code that is included in the preliminary
> "RSiteSearch" package available from R-Forge via
> install.packages("RSiteSearch", repos="http://r-forge.r-project.org").
>
> Code to do this could probably be written (a) in a matter of seconds by
> many of those in the R Core team or (b) in a matter of hours by virtually
> any reader of this list using the examples I just cited. And it could
> provide numbers without a need to convince others to keep download
> statistics and make them available later.
>
> Hope this helps.
> Spencer Graves
>
> Wacek Kusnierczyk wrote:
> > i have kept r installed on more than ten computers during the past few
> > years, some of them running win + more than one linux distro, all of
> > them having r, most often installed from a separate download.
> >
> > i know of many cases where students download r for the purpose of a
> > course in statistics -- often an introductory course for students who
> > otherwise have little to do with stats. some of them do it more than
> > once during the semester, and many of them never use r again.
> >
> > taking into account that basic statistics courses are taught to most
> > university students and that r is surely the most popular free
> > statistical computing environment, download-based usage estimates may be
> > a bit optimistic, unless 'usage' is taken to include 'learn-pass-forget'.
> >
> > vQ
> >
> > Tal Galili wrote:
> >> I agree with Thomas; over the years I have installed R on at least 5
> >> computers.
> >>
> >> BTW: does anyone know how the website statistics of r-project are
> >> being analyzed?
> >> Since I can't see any "google analytics" or other tracking code in the
> >> main website, I am guessing someone might be running some log-file
> >> analyzer - but I'd rather hear that than assume.
> >>
> >> On Sun, Mar 8, 2009 at 12:45 AM, Thomas Adams <Thomas.Adams at noaa.gov> wrote:
> >>> I don't think "At least one of the participants in the 2004 thread
> >>> suggested that it would be a "good thing" to track the numbers of
> >>> downloads by package." is reasonable because I download R packages
> >>> for 2 home computers (laptop & desktop) and 2 at work (1 Linux & 1
> >>> Mac). There must be many such cases?
> >>>
> >>> Tom
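Spencer Graves's two-step suggestion quoted above (list the packages, then run a "library(xxx)" query for each and summarize) might be organized along the following lines. This is only a sketch: rank_by_hits() is a made-up name, and count_hits stands for whatever search backend is available. It is a hypothetical function supplied by the caller that takes a query string and returns a hit count; RSiteSearch() in base R opens results in a browser rather than returning a number, so it cannot be plugged in directly.

```r
# Build one "library(pkg)" query per package, ask the supplied backend
# for a hit count, and return packages ranked by hits.
# count_hits is a caller-supplied (hypothetical) function: query -> number.
rank_by_hits <- function(pkgs, count_hits) {
  queries <- sprintf("library(%s)", pkgs)
  hits <- vapply(queries, count_hits, numeric(1), USE.NAMES = FALSE)
  out <- data.frame(hits = hits, package = pkgs, stringsAsFactors = FALSE)
  out <- out[order(-out$hits, out$package), ]   # most hits first
  rownames(out) <- NULL
  out
}

# Usage, assuming my_search() is some function returning a hit count:
# rank_by_hits(rownames(available.packages()), my_search)
```

Keeping the search backend as an argument means the ranking logic can be checked against known numbers, such as the six counts reported in the quoted message, before it is pointed at a live search service.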
Hi all,

I'm kind of amazed at the answers suggested for the relatively simple question, "How many times has each R package been downloaded?". Some have veered off in another direction, like working out how many packages a package depends upon, or whether someone downloads more than one copy. The response about ranking packages by the number of questions asked about them may be interesting, but may not relate very well at all to popularity in terms of downloads. If people were constantly asking questions about one of the packages I maintain, I would be working on the help pages to improve them, not basking in the inferred glory of having a popular package.

There is one way that the download count would be very useful for package maintainers, if no one else. Take as an example the package concord, which has not been maintained for a year or more since the content was merged into the irr package. If I knew that no one downloaded concord any more, I would surely petition those in charge of the archive to remove it or at least transfer it to the package museum. No point in having ever more packages on CRAN if they are never downloaded.

Jim
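If a mirror administrator were willing to share web server logs, per-package request counts could be tallied with a few lines of R. The sketch below rests on several assumptions that nothing in this thread confirms: that the logs are in an Apache-style format containing the request line ("GET /path HTTP/1.1"), that package files are named name_version.tar.gz, .tgz or .zip, and that count_downloads() (a made-up name) is given the log as a character vector of lines.

```r
# Tally requests for package files in web server log lines.
# (Hypothetical helper; assumes request lines like
#  "GET /src/contrib/zoo_1.5-5.tar.gz HTTP/1.1" appear in each entry.)
count_downloads <- function(log_lines) {
  pattern <- "GET \\S*/([A-Za-z][A-Za-z0-9.]*)_[0-9][0-9.-]*\\.(?:tar\\.gz|tgz|zip) "
  # regmatches() with regexpr() keeps only the lines that match
  m <- regmatches(log_lines, regexpr(pattern, log_lines, perl = TRUE))
  pkgs <- sub(pattern, "\\1", m, perl = TRUE)   # keep only the package name
  sort(table(pkgs), decreasing = TRUE)
}

# Usage, assuming access.log is a log file from one mirror:
# count_downloads(readLines("access.log"))
```

Such a tally counts requests rather than users, so the reservations raised earlier in the thread about repeat installs on several machines still apply, and a single mirror sees only its own share of the traffic.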
Rolf Turner wrote:
> On 9/03/2009, at 4:14 AM, Duncan Murdoch wrote:
>
>> ... analyzing bad data will just give bad conclusions.
>
> Fortune?

looking for fortunes? got one for you:

"A key reason that R is a good thing is because it is a language"

who/where is left as an (easy) exercise.

vQ
Dear Rolf,

Tukey put it nicely: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." Inasmuch as there are no current fortunes from Tukey, I nominate this one.

Regards,
John

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rolf Turner
> Sent: March-08-09 4:06 PM
> To: R help
> Cc: Duncan Murdoch
> Subject: Re: [R] popular R packages
>
> On 9/03/2009, at 4:14 AM, Duncan Murdoch wrote:
>
> > ... analyzing bad data will just give bad conclusions.
>
> Fortune?
>
> cheers,
>
> Rolf Turner
On 9/03/2009, at 10:23 AM, John Fox wrote:
> Dear Rolf,
>
> Tukey put it nicely: "The combination of some data and an aching desire for
> an answer does not ensure that a reasonable answer can be extracted from a
> given body of data." Inasmuch as there are no current fortunes from Tukey, I
> nominate this one.

Indeed. That is one of my favourites. I second the nomination.

cheers,

Rolf
On 08-Mar-09 20:06:21, Rolf Turner wrote:
> On 9/03/2009, at 4:14 AM, Duncan Murdoch wrote:
>
>> ... analyzing bad data will just give bad conclusions.
>
> Fortune?
>
> cheers,
>
> Rolf Turner

Maybe ... !

(I have sometimes got very good answers from bad data, precisely by analysing how they were bad -- including ascertaining a change of lab technician from platykurtosis and, once, identifying potential occasions of theft from delivery lorries from anomalies in their cargo docs).

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09 Time: 21:48:43
------------------------------ XFMail ------------------------------
Given we are talking about statistical software, one bibliometric measure of relative package popularity is scientific citations. Web of Science is not too useful where the citation has been to a website or computer package, but Google Scholar for "lme4: Linear mixed-effects models using S4 classes" gives us 108 journal citations; "mgcv: GAMs and generalized ridge regression for R" 80, etc.

Cheers, David Duffy.
--
David Duffy (MBBS PhD)
email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217  fax: -0101
Epidemiology Unit, Queensland Institute of Medical Research
300 Herston Rd, Brisbane, Queensland 4029, Australia
GPG 4D0B994A
On 10-Mar-09 01:07:54, David Duffy wrote:
> Given we are talking about statistical software, one bibliometric
> measure of relative package popularity is scientific citations.
> Web of Science is not too useful where the citation has been to a
> website or computer package, but Google Scholar for "lme4: Linear
> mixed-effects models using S4 classes" gives us 108 journal
> citations; "mgcv: GAMs and generalized ridge regression for R" 80 etc
>
> Cheers, David Duffy.

A good point. But such numbers must be considered in the context of the prevalence of the kind of study for which the respective methods would be used.

A great number of epidemiological studies would be suitable for application of glm(). Fewer would involve GAMs. "Popularity" of a package by citation frequency would (other things being equal) be proportional to the frequency of the kind of study for which it could be used.

So one should either evaluate, among the studies in which an R package *could* be used, the proportion in which it *was* used; or compare the number of citations of an R package with the number of citations of an equivalent package/module/proc in other software.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Mar-09 Time: 02:03:22
------------------------------ XFMail ------------------------------