Hi Dirk,
Dirk Eddelbuettel <edd at debian.org> writes:
> Is there a way, maybe using Duncan TL's RCurl, to efficiently test whether
> an URL such as
>
> http://$CRAN/src/contrib/
>
> has changed? I.e. one way is via a test of a page in that directory as per
> (sorry about the long line, and this would be on Linux with links and awk
> installed)
>
> > strptime(system("links -width 160 -dump http://cran.r-project.org/src/contrib/ | awk '/PACKAGES.html/ {print $3,$4}\'", intern=TRUE), "%d-%b-%Y %H:%M")
> [1] "2007-07-12 18:16:00"
> >
>
> and one can then compare the POSIXt with a cached value --- but requesting
> the header would presumably be more efficient.
>
> Is there a way to request the 'has changed' part of the http 1.1 spec
> directly in R?
Here's a way to use RCurl to obtain HTTP headers:
h <- basicTextGatherer()
# nobody=TRUE asks for the headers only, without the body
junk <- getURI(url, writeheader=h$update, header=TRUE, nobody=TRUE)
h <- h$value()  # the collected header text
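To get from the header text to a "has it changed?" answer, one option is to pull out the Last-Modified field and compare it with a cached time. The following is only a rough sketch: parse_last_modified is a hypothetical helper (not part of RCurl), and the sample header and cached time are made up so the snippet runs without network access. It assumes the server actually sends a Last-Modified header in the usual HTTP date format.

```r
# Hypothetical helper (not an RCurl function): extract Last-Modified
# from raw HTTP header text and return it as a POSIXct in GMT.
parse_last_modified <- function(header_text) {
  lines <- unlist(strsplit(header_text, "\r?\n"))
  hit <- grep("^Last-Modified:", lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(as.POSIXct(NA))
  # Expected shape: "Last-Modified: Thu, 12 Jul 2007 18:16:00 GMT"
  m <- regmatches(hit[1],
                  regexec("([0-9]{1,2}) ([A-Za-z]{3}) ([0-9]{4}) ([0-9:]{8})",
                          hit[1]))[[1]]
  if (length(m) == 0) return(as.POSIXct(NA))
  # month.abb is always English, so this does not depend on the locale
  month <- match(m[3], month.abb)
  if (is.na(month)) return(as.POSIXct(NA))
  iso <- sprintf("%s-%02d-%02d %s", m[4], month, as.integer(m[2]), m[5])
  as.POSIXct(iso, tz = "GMT")
}

# Made-up header text standing in for h$value(), and a made-up cached time.
sample_header <- paste0("HTTP/1.1 200 OK\r\n",
                        "Last-Modified: Thu, 12 Jul 2007 18:16:00 GMT\r\n",
                        "Content-Length: 1234\r\n\r\n")
cached <- as.POSIXct("2007-07-01 00:00:00", tz = "GMT")
changed <- parse_last_modified(sample_header) > cached
```

In real use you would pass the header text gathered by the request in place of sample_header; if the header is missing the helper returns NA, so the comparison is NA as well and needs to be handled by the caller.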
If you want to check many URLs, I think you will find the following
much faster than looping over the getURI call above:
h <- multiTextGatherer(urls)
junk <- getURIAsynchronous(urls, write=h, header=TRUE, nobody=TRUE)
yourInfo <- sapply(h, function(x) something(x$value()))
I've used this in the pkgDepTools package to retrieve package download
sizes.
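As an illustration of what could stand in for `something` when the goal is download sizes, here is a minimal sketch that reads the Content-Length field from each header text. The name content_length and the sample headers are hypothetical and made up for this example; this is not necessarily how pkgDepTools does it, and it assumes the server reports a Content-Length at all.

```r
# Hypothetical stand-in for `something`: return Content-Length in bytes,
# or NA if the header text does not contain that field.
content_length <- function(header_text) {
  lines <- unlist(strsplit(header_text, "\r?\n"))
  hit <- grep("^Content-Length:", lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  as.numeric(sub("^[^:]*:[[:space:]]*", "", hit[1]))
}

# Made-up header texts standing in for each x$value(); no network needed.
fake_headers <- c(
  a = "HTTP/1.1 200 OK\r\nContent-Length: 1234\r\n\r\n",
  b = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
)
sizes <- sapply(fake_headers, content_length)
```

With the gatherers from the asynchronous call, the equivalent would be something along the lines of sapply(h, function(x) content_length(x$value())).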
Cheers,
+ seth
--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org