Hi,

A major attraction of R and S-PLUS is the graphics. (Up to now my experience is with Stata and SAS.) Most of the graphical examples that I have seen in the documentation are for relatively small data sets. I am working with a moderately large data set -- on the order of 180,000 observations by 50 variables. There seem to be standard problems that I keep bumping into in the graphics: e.g., the graphics work hard to accommodate outliers, leaving the main action area a thick cloud; very slow operations by R; etc. I have been doing some obvious things to deal with these issues, e.g., trimming, restricting attention to data subsamples, etc. But these must be pretty standard issues, and I would like to take advantage of what is already known.

What should I be reading that explains how best to do graphics with a somewhat larger data set? (A pointer to an appropriate FAQ would be great, since I have looked but not managed to find one.)

Thanks in advance for any advice.

Murray

Murray Z. Frank
Faculty of Commerce
University of British Columbia
Vancouver, B.C.
Canada V6T 1Z2

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject!)
To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
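[For concreteness, the subsampling described above looks roughly like the following sketch; the data frame and column names here are invented for illustration and stand in for the real 180,000 x 50 data set.]

```r
## Stand-in for a large data set (~180,000 rows); the real data
## and column names differ -- this is purely illustrative.
set.seed(1)
n <- 180000
d <- data.frame(x = rnorm(n), y = rnorm(n))

## Plot a random subsample of rows rather than all of them,
## which is much faster and thins the over-plotted cloud.
i <- sample(nrow(d), 5000)
plot(d$x[i], d$y[i], pch = ".")
```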
Michael A. Miller
2001-Dec-12 15:27 UTC
[R] Graphics with moderately large amounts of data
>>>>> "Frank," == Frank, Murray <murray.frank at commerce.ubc.ca> writes:

> I am working with a moderately large data set -- the order
> of magnitude is 180,000 observations by 50 variables. There
> seem to be standard problems that I keep bumping into in
> the graphics: eg. the graphics work hard to accommodate
> outliers leaving the main action area a thick cloud, very
> slow operations by R, etc. I have been doing some obvious
> things to deal with these issues, eg. trimming, restricting
> attention to data subsamples, etc.

I don't know of an appropriate FAQ to refer you to. The issue of outliers can be dealt with by setting the plotting limits with xlim and ylim; this is likely to be faster than trimming the data set itself.

As far as graphics speed goes, a non-R possibility for faster plotting of large data sets is ROOT (http://root.cern.ch). (Caveat: I've used ROOT for large data sets (~10^6 measurements of many parameters), but I have not used R for data sets larger than several thousand measurements of a dozen or so parameters, so I have not made a direct comparison on large data sets -- YMMV.)

Mike
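[A minimal sketch of the xlim/ylim suggestion above, on simulated heavy-tailed data; clipping the axes at, say, the 1st and 99th percentiles is one common way to choose the limits -- the outliers simply fall outside the plot region while remaining in the data.]

```r
## Simulated heavy-tailed data with outliers (illustrative only).
set.seed(1)
n <- 180000
d <- data.frame(x = rt(n, df = 3), y = rt(n, df = 3))

## Axis limits from the 1st/99th percentiles, instead of trimming rows.
xr <- quantile(d$x, c(0.01, 0.99))
yr <- quantile(d$y, c(0.01, 0.99))

## Combine with subsampling for speed; pch = "." keeps the cloud light.
i <- sample(n, 5000)
plot(d$x[i], d$y[i], xlim = xr, ylim = yr, pch = ".",
     xlab = "x", ylab = "y")
```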