Marcelo Perlin
2016-Jun-09 14:24 UTC
[R] About identification of CRAN CHECK machines in logs
Hi,

I recently released two packages (RndTexExams and GetTDData) on CRAN and I'm trying to track the number of downloads and the location of users. I wrote a simple script to download and analyze the log files at http://cran-logs.rstudio.com.

I realized, however, that whenever a new version of a package is released there is a spike in the number of downloads. I believe that the CRAN checks are included in the number of package installations in the log files.

The log files have a column "ip_id", which assigns a daily unique id to each new IP. My question is: can CRAN set the ip_id of the CRAN machines to a fixed value, so that we can filter the data down to "real" users? Can anyone see any other way around it?

Thanks.

--
Marcelo Perlin
Professor Adjunto | Escola de Administração
Universidade Federal do Rio Grande do Sul
Rua Washington Luiz, 855 | 90010-460 | Porto Alegre RS | Brasil
Tel.: (51) 3308-3303 | www.ea.ufrgs.br
http://lattes.cnpq.br/3262699324398819
https://sites.google.com/site/marceloperlin/
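A minimal sketch of one way to use the "ip_id" column, assuming a log data frame with at least the columns date, package and ip_id (as in the daily CSV files at http://cran-logs.rstudio.com): compare the raw number of download rows per day with the number of distinct ip_id values on that day. A single machine that installs a package over and over inflates the first count but not the second. This does not identify CRAN check machines as such, and the toy data below is made up for illustration.

```r
# Toy log data (made up). Assumed columns: date, package, ip_id.
# ip_id is only unique within a single day, so counts are always per day.
logs <- data.frame(
  date    = as.Date(c("2016-06-01", "2016-06-01", "2016-06-01",
                      "2016-06-02", "2016-06-02")),
  package = "GetTDData",
  ip_id   = c(1L, 1L, 2L, 1L, 3L)
)

# Every log row counts as one download.
raw <- aggregate(ip_id ~ date + package, data = logs, FUN = length)
names(raw)[3] <- "downloads"

# Distinct ip_id values per package and day.
uniq <- aggregate(ip_id ~ date + package, data = logs,
                  FUN = function(x) length(unique(x)))
names(uniq)[3] <- "unique_ips"

# One row per package and day with both counts side by side.
summary_df <- merge(raw, uniq, by = c("date", "package"))
print(summary_df)
```

On real logs, days where downloads is much larger than unique_ips point to repeated installs from the same address, which is one (rough) place to look for automated traffic.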
Hadley Wickham
2016-Jun-09 21:18 UTC
[R] About identification of CRAN CHECK machines in logs
On Thu, Jun 9, 2016 at 9:24 AM, Marcelo Perlin <marceloperlin at gmail.com> wrote:
> Hi,
>
> I recently released two packages (RndTexExams and GetTDData) in CRAN and
> I'm trying to track the number of downloads and location of users.
>
> I wrote a simple script to download and analyze the log files in
> http://cran-logs.rstudio.com.
> I realized, however, that during the release of a new version of the
> packages there is a spike in the number of downloads. I believe that the
> CRAN checks are included in the number of installations of the package in
> the log files.

I don't think that's true. Why would CRAN be installing the package from a mirror?

Hadley

--
http://hadley.nz
Marcelo Perlin
2016-Jun-10 13:27 UTC
[R] About identification of CRAN CHECK machines in logs
I don't know, Hadley. But you can see evidence of "something" systematically installing the packages in the log data. For my two CRAN packages, I noticed a high correlation between their daily download counts.

Try the following script, which picks 5 random packages from CRAN and calculates the correlation matrix of their differenced daily downloads. To avoid spurious correlations, I removed the weekends (since we can expect some weekly seasonality) and also the zero entries. It's crude, I know, but it does show some positive associations between the numbers of installations of the packages.

If not CRAN, who or what is downloading these packages, and how can I set it apart from the actual user installations?

Many thanks!
____
# get the list of packages available on the current CRAN mirror
df <- as.data.frame(available.packages())
# choose 5 at random
idx <- sample(nrow(df), 5)
df <- df[idx, ]
my.pkgs <- as.character(df$Package)
#my.pkgs <- c('RndTexExams','GetTDData')
# daily download counts in long format (columns: date, count, package)
dl.df <- cranlogs::cran_downloads(my.pkgs, from = '2015-01-01',
                                  to = Sys.Date())
# remove zero entries (treat them as missing)
dl.df$count[dl.df$count == 0] <- NA
# remove weekends (POSIXlt wday: 0 = Sunday, 6 = Saturday)
dl.df$sat.sun <- as.POSIXlt(dl.df$date)$wday
dl.df <- dplyr::filter(dl.df, sat.sun != 0, sat.sun != 6)
# long to wide: one column per package (needed for cor)
dl.df <- tidyr::spread(dl.df, key = package, value = count)
# remove rows with any NA
dl.df <- dl.df[complete.cases(dl.df), ]
# correlate day-to-day changes; columns 1-2 are date and sat.sun
diff.mat <- diff(as.matrix(dl.df[, 3:ncol(dl.df)]))
cor(diff.mat)
___
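The same steps can be sketched offline with base R only, on made-up Poisson counts for two hypothetical packages (pkgA, pkgB) in the long date/package/count shape the script works with. Here stats::reshape stands in for tidyr::spread and base subsetting for dplyr::filter; this only illustrates the procedure and says nothing about real download data.

```r
# Made-up data: 4 weeks of daily counts for two hypothetical packages.
set.seed(1)
dates <- seq(as.Date("2016-01-04"), by = "day", length.out = 28)  # starts on a Monday
long <- data.frame(
  date    = rep(dates, times = 2),
  package = rep(c("pkgA", "pkgB"), each = length(dates)),
  count   = rpois(2 * length(dates), lambda = 20)
)

# Treat zero counts as missing.
long$count[long$count == 0] <- NA

# Drop Sundays (0) and Saturdays (6).
wday <- as.POSIXlt(long$date)$wday
long <- long[!(wday %in% c(0, 6)), ]

# Long to wide: columns date, count.pkgA, count.pkgB.
wide <- reshape(long, idvar = "date", timevar = "package", direction = "wide")
wide <- wide[complete.cases(wide), ]
wide <- wide[order(wide$date), ]

# Correlate day-to-day changes rather than levels.
diff.mat <- diff(as.matrix(wide[, -1]))
print(cor(diff.mat))
```

Because the two series are generated independently, any off-diagonal correlation in this toy run is noise; the point of the original script is that real packages appear to show a systematic positive value.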