thr3ads.net - R devel - [Rd] CRAN Server download statistics (Was: R Usage Statistics) [Nov 2009]

Fellows, Ian

2009-Nov-23 00:18 UTC

[Rd] CRAN Server download statistics (Was: R Usage Statistics)

Hi All,

It seems that the question of how many people use (or download) R and its
packages is one that comes up on a fairly regular basis in a variety of forums
(there was also a recent thread on the subject on Stack Overflow). A couple of
students at UCLA (including myself) wanted to address the issue, so we set up a
system to get and parse the cran.stat.ucla.edu Apache logs every night and
display some basic statistics. Right now, we have a working sketch of a site
based on one week of observations.

http://neolab.stat.ucla.edu/cranstats/

We would very much like to incorporate data from all CRAN mirrors, including
cran.r-project.org. We would also like to set this up in a way that is minimally
invasive for the site administrators. Internally, our administrator has set up a
protected directory with the last couple of days of CRAN activity. We then pull
that down using curl.
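
For readers who want a concrete picture of the nightly parsing step, here is a minimal sketch in Python. It is not the code behind the site above. It assumes the logs are in Apache's common/combined log format and that package files are named `<name>_<version>` with a `.zip`, `.tgz` or `.tar.gz` extension; the sample lines, paths and version numbers are made up for illustration.

```python
import re
from collections import Counter

# Apache common/combined log line: host ident user [time] "request" status size ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumed package file naming: <name>_<version>.(tar.gz|tgz|zip)
PACKAGE_FILE = re.compile(
    r'/(?P<name>[A-Za-z][A-Za-z0-9.]*)_(?P<version>[0-9][0-9.\-]*)'
    r'\.(?:tar\.gz|tgz|zip)$'
)


def count_package_downloads(lines):
    """Count successful GET requests per package name."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not a parsable log line
        if m.group("method") != "GET" or m.group("status") != "200":
            continue  # only count completed downloads
        pkg = PACKAGE_FILE.search(m.group("path"))
        if pkg is not None:
            counts[pkg.group("name")] += 1
    return counts


# Made-up sample lines (documentation-range IP addresses, hypothetical versions).
sample_lines = [
    '192.0.2.1 - - [22/Nov/2009:16:18:11 -0800] '
    '"GET /bin/windows/contrib/2.10/ggplot2_0.8.3.zip HTTP/1.1" 200 2048',
    '192.0.2.2 - - [22/Nov/2009:16:19:02 -0800] '
    '"GET /src/contrib/ggplot2_0.8.3.tar.gz HTTP/1.1" 200 4096',
    '192.0.2.3 - - [22/Nov/2009:16:20:40 -0800] '
    '"GET /bin/macosx/contrib/2.10/lattice_0.17-26.tgz HTTP/1.1" 404 312',
    '192.0.2.1 - - [22/Nov/2009:16:21:00 -0800] "GET /index.html HTTP/1.1" 200 900',
]

counts = count_package_downloads(sample_lines)
print(counts)
```

Only requests with status 200 are counted, so the failed request for lattice and the index page are ignored.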

What would be the best and easiest way for the CRAN mirrors to share their data?
Is the contact information for the administrators available anywhere?


Thank you,
Ian Fellows



________________________________________
From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On
Behalf Of Steven McKinney [smckinney at bccrc.ca]
Sent: Thursday, November 19, 2009 2:21 PM
To: Kevin R. Coombes; r-devel at r-project.org
Subject: Re: [Rd] R Usage Statistics

Hi Kevin,

What a surprising comment from a reviewer for BMC Bioinformatics.

I just did a PubMed search for "limma" and "aroma.affymetrix",
just two methods for which I use R software regularly.
"limma" yields 28 hits, several of which are published
in BMC Bioinformatics.  Bengtsson's aroma.affymetrix paper
"Estimation and assessment of raw copy numbers at the single locus level."
is already cited by 6 others.

It almost seems too easy to work up lists of usage of R packages.

Spotfire is an application built around S-Plus that has widespread use
in the biopharmaceutical industry at a minimum.  Vivek Ranadive's
TIBCO company just purchased Insightful, the S-Plus company.
(They bought Spotfire previously.)
Mr. Ranadive does not spend money on environments that are
not appropriate for deploying applications.

You could easily cull a list of corporation names from the
various R email listservs as well.

Press back with the reviewer.  Reviewers can learn new things
and will respond to arguments with good evidence behind them.
Good luck!


Steven McKinney


________________________________________
From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On
Behalf Of Kevin R. Coombes [krcoombes at mdacc.tmc.edu]
Sent: November 19, 2009 10:47 AM
To: r-devel at r-project.org
Subject: [Rd] R Usage Statistics

Hi,

I got the following comment from the reviewer of a paper (describing an
algorithm implemented in R) that I submitted to BMC Bioinformatics:

"Finally, which useful for exploratory work and some prototyping,
neither R nor S-Plus are appropriate environments for deploying user
applications that would receive much use."

I can certainly respond by pointing out that CRAN contains more than
2000 packages and Bioconductor contains more than 350. However, does
anyone have statistics on how often R (and possibly some R packages) are
downloaded, or on how many people actually use R?

Thanks,
    Kevin

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Detlef Steuer

2009-Nov-23 07:48 UTC

Hi!

Nice work!
But keep in mind that, for example, the openSUSE packages are no longer
kept up to date on CRAN, but in openSUSE's Build Service. So the stats
are biased towards Windows and Mac.

It seems you only count binary downloads of contributed packages?
That introduces some nice bias, too.

Nevertheless, a nice starting point. Good luck!

Detlef


hadley wickham

2009-Nov-23 14:12 UTC

Hi Ian,

I've spoken with Stefan Theussl (CRAN maintainer) about this, and he's
concerned about the privacy implications of making the Apache access
logs public.  A compromise that he mentioned was having a script run
on each CRAN mirror that processes the log files and outputs summary
statistics.  Then a central process could aggregate these and produce
a single overall summary.
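
As a rough illustration of that compromise (not an existing CRAN script), the two halves might look like the Python sketch below. The function names `summarize_mirror` and `aggregate` are hypothetical, and the summary is assumed to be nothing more than a per-package count, so no IP addresses or timestamps leave the mirror.

```python
from collections import Counter


def summarize_mirror(package_names):
    """Runs on a mirror: reduce per-request package names to counts only."""
    return dict(Counter(package_names))


def aggregate(summaries):
    """Runs centrally: add up the per-mirror count dictionaries."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts from a mapping
    return total


# Hypothetical input: package names already extracted from each mirror's log.
mirror_a = summarize_mirror(["ggplot2", "ggplot2", "lattice"])
mirror_b = summarize_mirror(["lattice", "zoo"])

overall = aggregate([mirror_a, mirror_b])
print(overall)
```

Because only the count dictionaries are exchanged, the central process never sees the raw log lines.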

A few comments on your current site:

 * Are you just including packages downloaded interactively from within R?

 * I don't think the continent from which the package was downloaded is
of much interest.  There's definitely no need to include it on the
main page.

 * I'd be far more interested in changes over time.  Sparklines of the
last month's worth of data would be a neat addition to the main page.

 * More vertical whitespace or subtle zebra striping would make it
much easier to read across rows.

 * I'm also not sure about displaying the number of unique IPs. R is
used a lot in the university setting and until IPv6 comes along, many
university downloads will appear to be coming from a single IP
address.

 * It's not very useful to sort by % Windows because the variance
increases as the sample size decreases, so the packages with the
highest and lowest % Windows are just the packages that aren't
downloaded very often.  Maybe a shrunken estimate?

 * Have you thought at all about how to take package dependencies into account?
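
One possible reading of the "shrunken estimate" suggested for % Windows is to pull each package's raw proportion towards the overall proportion, with less pull the more downloads the package has. The Python sketch below does this with a simple pseudo-count formula; the `prior_strength` value and the download counts are assumptions chosen for illustration, not numbers from the site.

```python
def shrunken_share(windows, total, overall_share, prior_strength=50.0):
    """Shrink a package's Windows share towards the overall share.

    prior_strength acts like a number of pseudo-downloads at the overall
    share, so rarely downloaded packages are pulled strongly towards it.
    """
    return (windows + prior_strength * overall_share) / (total + prior_strength)


# Made-up counts: (Windows downloads, all downloads) per package.
downloads = {"rare": (3, 3), "popular": (700, 1000)}

all_windows = sum(w for w, _ in downloads.values())
all_total = sum(n for _, n in downloads.values())
overall_share = all_windows / all_total

raw = {name: w / n for name, (w, n) in downloads.items()}
shrunk = {
    name: shrunken_share(w, n, overall_share)
    for name, (w, n) in downloads.items()
}

for name in downloads:
    print(name, round(raw[name], 3), round(shrunk[name], 3))
```

With these made-up counts, the rarely downloaded package no longer sorts to the top at 100% Windows, while the estimate for the popular package barely moves.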

Hadley



-- 
http://had.co.nz/

Jeff Ryan

2009-Nov-23 16:17 UTC

While I think download statistics are potentially interesting for
developers, reporting them incorrectly could very likely damage the
community. This is a basic data reporting problem, with all of the
caveats attached.

This information has also been readily available from the main CRAN
mirror for years:

http://www.r-project.org/awstats/awstats.cran.r-project.org.html
http://cran.r-project.org/report_cran.html

Best,
Jeff



-- 
Jeffrey Ryan
jeffrey.ryan at insightalgo.com

ia: insight algorithmics
www.insightalgo.com

Gabor Grothendieck

2009-Nov-23 16:34 UTC

On Mon, Nov 23, 2009 at 11:11 AM, Friedrich Leisch
<friedrich.leisch at stat.uni-muenchen.de> wrote:
>>>>>> On ,
>>>>>> Anonymous () wrote:
>  > Knowing what percentage of different OSes are being used is of
>  > interest to package developers and would be obscured by the proposal
>  > to massage the data.  I prefer to see the raw figure as is.
>
>  > Also the number of IPs are important and should not be removed in my
>  > opinion since (1) it is a measure of clustering.  If a package is
>  > mainly used by the courses of a few universities where the students
>  > really have no choice then that seems a lot different than if it's used
>  > by a variety of people around the world.  Only the IPs would give any
>  > clue to that.  (2) it helps to diagnose intentional distortion of the
>  > figures by repeat downloads to the same machine.
>
> As Hadley already pointed out we cannot make CRAN logs publicly
> available for privacy reasons. That would be a violation of national
> laws.

I think that's unlikely.  There is no info given out identifying
users.  There are lots of web stats on the net.

Gabor Grothendieck

2009-Nov-23 17:25 UTC

On Mon, Nov 23, 2009 at 12:15 PM, Friedrich Leisch
<friedrich.leisch at stat.uni-muenchen.de> wrote:
> IP address plus time will always allow sysadmins to recover
> identities. For static addresses or in combination with mail headers
> etc it is also not exactly rocket science for others.

I had not suggested that identifying information be posted.  In fact,
quite the contrary -- I pointed out that it would be easy to code them
to preserve privacy.

Gabor Grothendieck

2009-Nov-23 17:55 UTC

On Mon, Nov 23, 2009 at 12:37 PM, Friedrich Leisch
<friedrich.leisch at stat.uni-muenchen.de> wrote:
>
>  > On Mon, Nov 23, 2009 at 12:15 PM, Friedrich Leisch
>  > <friedrich.leisch at stat.uni-muenchen.de> wrote:
>  >> IP address plus time will always allow sysadmins to recover
>  >> identities. For static addresses or in combination with mail headers
>  >> etc it is also not exactly rocket science for others.
>
>  > I had not suggested that identifying information be posted.  In fact,
>  > quite the contrary -- I pointed out that it would be easy to code them
>  > to preserve privacy.
>
> Deliberately only citing parts of my email? I answered (in parallel
> with Hadley) to your original email and there you requested
>
>  "... Only the IPs would give any clue to that."

You are quoting this out of context.  I was referring to the fact that
the web page has no IP addresses.  I was not referring to the log.
This all started as supposed reasons that the web page in question
could not be distributed (because it depended on the log which had
identifying information) and I was pointing out that the web page
itself did not have identifying information and the identifying
information in the log could easily be coded.
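
To make "coded" concrete, one way it could be done is to replace each IP address with a keyed hash before any statistics are computed. The Python sketch below is an assumption about what such coding might look like, not something proposed in detail in this thread; the function name `code_ip`, the key and the 16-character truncation are all hypothetical. Whether this is enough to address the legal concerns raised above is not something the sketch settles.

```python
import hashlib
import hmac


def code_ip(ip, secret_key):
    """Replace an IP address with a keyed, truncated SHA-256 code.

    The same IP always maps to the same code under one key, so unique
    downloaders can still be counted without publishing the address.
    """
    digest = hmac.new(secret_key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:16]


# Hypothetical key; in practice it would stay private to the mirror.
key = b"example-key-kept-on-the-mirror"

# Documentation-range addresses used as stand-ins for real downloaders.
ips = ["192.0.2.1", "192.0.2.2", "192.0.2.1"]
coded = [code_ip(ip, key) for ip in ips]

print(coded)
print("unique downloaders:", len(set(coded)))
```

A keyed hash is used rather than a plain one because the IPv4 address space is small enough that unkeyed hashes could be reversed by trying every address.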
