Greg Hirson
2009-Aug-13 18:01 UTC
[R] multiple downloads of data when evaluating plot() vs. xyplot()
I have noticed an interesting behavior when comparing how the base plot() function deals with a data argument that downloads data from the internet vs. how xyplot() in lattice performs the same task.

The goal is to plot hourly temperature data. The data is downloaded and formatted for R using the function cimishourly() in the package cimis. There is a line within the function that outputs the name of the file being downloaded using cat().

When using plot() to plot the data, the following is written to the console:

library(cimis)
plot(air_temp ~ datetime, data = cimishourly("006"))
Downloading: ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
Downloading: ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv

When using xyplot() to perform the same plot, the data is only downloaded once:

library(lattice)
xyplot(air_temp ~ datetime, data = cimishourly("006"))
Downloading: ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv

Is this caused by a difference in how the two functions evaluate the data argument?

Even more interesting, when adding a type = "l" argument to plot, the data is downloaded 3 times.

Thank you for your time,

Greg

--
Greg Hirson
ghirson at ucdavis.edu
Graduate Student
Agricultural and Environmental Chemistry
1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616
Uwe Ligges
2009-Aug-23 18:18 UTC
[R] multiple downloads of data when evaluating plot() vs. xyplot()
Greg Hirson wrote:
> I have noticed an interesting behavior when comparing how the base
> plot() function deals with a data argument that downloads data from the
> internet vs. how xyplot() in lattice performs the same task.
>
> The goal is to plot hourly temperature data. The data is downloaded and
> formatted for R using the function cimishourly() in the package cimis.
> There is a line within the function that outputs the name of the file
> being downloaded using cat().
>
> When using plot() to plot the data, the following is written to the
> console:
>
> library(cimis)
> plot(air_temp ~ datetime, data = cimishourly("006"))
> Downloading: ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
> Downloading: ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
>
> When using xyplot() to perform the same plot, the data is only
> downloaded once:
>
> library(lattice)
> xyplot(air_temp ~ datetime, data = cimishourly("006"))
> Downloading: ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
>
> Is this caused by a difference in how the two functions evaluate the
> data argument?

Looks like nobody answered so far:

Yes, there are several differences. I think you should not encapsulate downloading-functions into others anyway and download the data once before anything else and then start to work on it.

It is evaluated in plot.formula at two positions:

if (is.matrix(eval(m$data, parent.frame())))
mf <- eval(m, parent.frame())

Generally this is not a big issue but for your function it shows quite some performance penalty that can easily be avoided by downloading in advance.

Best,
Uwe Ligges

> Even more interesting, when adding a type = "l" argument to plot, the
> data is downloaded 3 times.
>
> Thank you for your time,
>
> Greg
>
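
The advice above (download once, then plot the stored result) can be sketched as follows. This is only an illustration under stated assumptions: fake_download() is a hypothetical stand-in for cimishourly() that counts its calls instead of touching the network, and the temperature values are made up. Because plot.formula works on the unevaluated call, each eval() of the data expression runs the function again; how many times that happens may depend on the R version, so the sketch records the count for the direct call without asserting a particular number.

```r
# Hypothetical stand-in for cimishourly(): counts calls instead of downloading.
calls <- 0
fake_download <- function() {
  calls <<- calls + 1
  # Made-up values, for illustration only.
  data.frame(datetime = 1:5, air_temp = c(10, 12, 15, 14, 11))
}

pdf(NULL)  # open a graphics device that writes no file

# Direct use: the data expression is re-evaluated inside plot.formula.
plot(air_temp ~ datetime, data = fake_download())
direct_calls <- calls  # observed count; may vary between R versions

# Recommended pattern: evaluate once, store, then plot as often as needed.
calls <- 0
temps <- fake_download()
plot(air_temp ~ datetime, data = temps)
plot(air_temp ~ datetime, data = temps, type = "l")

invisible(dev.off())

cat("calls with data = fake_download():", direct_calls, "\n")
cat("calls with stored data frame:     ", calls, "\n")
```

With the real package, the same pattern would presumably be temps <- cimishourly("006") followed by plot(air_temp ~ datetime, data = temps), so the file is fetched once no matter how plot() or xyplot() evaluates its data argument.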