Hello,

I'm fairly new to R, so please excuse me if I am asking something obvious. I have looked in the FAQ, the Introduction, and the help pages, and searched the archives, but I don't know much about graphics yet.

I'm running Red Hat Linux 2.14.18 on a machine blessed with dual 1.5 GHz Xeon processors and 3.7 GB of RAM. I have a very large dataset with 27 variables, and in exploring the data I want to take snapshots using pairs(). The lower matrix and the diagonal are filled with other graphics. (Please don't suggest that I cut down the number of variables! This is in fact the trimmed-down, must-have set.)

Of course, even with all that memory, I get a crash about two-thirds of the way through. This is one of those cases where it's hard to troubleshoot, since everything works fine for small datasets. It is tantalizing because the process takes over two hours to display most of the figure before the freeze happens.

However, it seems to me that the crash has more to do with the kind of graphics device I'm using and the size of the device. For instance, with X11 it takes longer to crash than with png, and right now I'm trying bitmap to produce a PNG file (it hasn't crashed after half an hour, but there's always time for that later). The plot also gets further along if I set a small area for the device, but then the plots are ridiculously tiny and hard to interpret. I have 729 little plots, and I'd be satisfied if each were at least 0.75 inches on a side, about 21 inches square altogether.

What can I do to increase the chances that I'll be able to produce a viewable, printable image? And supposing that bitmap works, can I raise the resolution above 72 dpi without fear?

Thanks,
Jean
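A minimal sketch of the kind of call described above, assuming a data frame my_data holding the 27 variables; the file name, the device dimensions, and the histogram diagonal panel are illustrative stand-ins rather than the code actually used:

## 21 x 21 inch device for a 27 x 27 pairs() display (~0.75 in per panel).
## my_data, the file name, and the panel function are assumptions.
bitmap("pairs_snapshot.png", type = "png256",
       width = 21, height = 21, res = 72)

panel.hist <- function(x, ...) {           # histogram on the diagonal
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))           # rescale the y-range for the bars
  h <- hist(x, plot = FALSE)
  rect(h$breaks[-length(h$breaks)], 0,
       h$breaks[-1], h$counts / max(h$counts))
}

pairs(my_data, diag.panel = panel.hist)
dev.off()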
Sounds like the problem is in your X server and not in R. I've seen this with XFree86 (and I don't use that myself on Linux).

1) I suggest you try a postscript() device and convert later if you need to. Expect a very large file size.

2) Don't plot all the points. You say you have a "very large dataset". In statistics, we give numbers, not vague descriptions. However, with what that means to me (many millions of rows), a scatterplot of a very large dataset is going to be mainly black, at least in places. (We've experienced that with 1.4 million points, for example.) That's not a good way to display the data. Either use a density plot or, if you are interested in outliers, thin the centre. We did this by estimating a density phat, then randomly selecting points with probability min(1, const/phat(x)) for a suitable const.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
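To make the thinning idea above concrete, here is a minimal sketch for a single pair of variables, assuming MASS::kde2d() for the two-dimensional density estimate; the function name thin_points, the grid size, and the value of const are illustrative choices rather than code from this thread.

## Sketch of density-based thinning: estimate a density phat, then keep
## each point with probability min(1, const/phat).  Names and defaults
## here are illustrative assumptions.
library(MASS)                              # kde2d() for a 2-D kernel density

thin_points <- function(x, y, const = 0.05, n_grid = 50) {
  dens <- kde2d(x, y, n = n_grid)          # density on an n_grid x n_grid grid
  ix <- findInterval(x, dens$x)            # grid cell of each observation
  iy <- findInterval(y, dens$y)
  phat <- dens$z[cbind(ix, iy)]            # density estimate at each point
  keep <- runif(length(x)) < pmin(1, const / phat)
  cbind(x = x, y = y)[keep, , drop = FALSE]
}

## toy usage: the dense core is heavily subsampled, the tails are kept
set.seed(1)
x <- rnorm(2e5); y <- x + rnorm(2e5)
plot(thin_points(x, y), pch = 21, cex = 0.35)

Larger values of const keep more of the centre; smaller values thin it more aggressively while leaving low-density outliers untouched, since their keep probability stays at 1.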
> 1) I suggest you try a postscript() device and convert later if you need
> to. Expect a very large file size.

Dear Dr. Ripley,

Thank you! postscript() was able to finish the job (bitmap killed itself). The file sizes are indeed large, 1.4 GB, and gv needs over two hours to display one, but the result is ultimately viewable. I'm new to manipulating PostScript files, but hopefully I can find a fast way to convert them to a smaller format.

I found an archived message of yours suggesting not to use pch="." as the symbol when graphing large datasets. On experimenting, I found that pch=21 seemed to produce the smallest files for some sets of test data compared with some other symbols. Using pch=21, cex=0.35 gave a fairly small point on the page while consuming much less space on disk than pch=".". Is this the best way to produce plot symbols that take up little room both on the plot and on the hard drive?

> Sounds like the problem is in your X server and not in R. I've seen this
> with XFree86 (and I don't use that myself on Linux).

It's possible... however, I wouldn't know how to fix it from that end, either.

> 2) Don't plot all the points. You say you have a "very large dataset". In
> statistics, we give numbers, not vague descriptions. However, with what
> that means to me (many millions of rows), a scatterplot of a very large
> dataset is going to be mainly black, at least in places. (We've experienced
> that with 1.4 million points, for example.) That's not a good way to
> display the data. Either use a density plot or, if you are interested in
> outliers, thin the centre. We did this by estimating a density phat, then
> randomly selecting points with probability min(1, const/phat(x)) for a
> suitable const.

I have a set of text files, each containing a 450,000 x 41 matrix (about 18.45 million data points) and each roughly 300 MB in size. Indeed, the scatterplots are overprinted, but I am interested in getting a "feel" for the data before charging ahead. The data (measurements on artificial phylogenetic trees) were produced by simulation, and although I have been running checks all along, I wanted to make sure that my simulations weren't producing any strange outliers or oddly shaped distributions. On the other hand, I had no real guess as to what the data would look like or even which variables would show strong correlations. Since many of these data points are from repeats, I was in fact able to discern a lot of pattern, rather than getting all-black plots.

Using both a density plot and a thinned plot may be the way to go if I don't find a way to shrink the graphs down. I had hoped that pairs() would be a fast, one-line way to take in all my data at once, but of course nothing has been that easy with all this data.

Jean
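As a concrete sketch of the device settings discussed in this exchange; my_data, the file names, and the ghostscript invocation are illustrative placeholders, not the commands actually run:

## 21 x 21 inch PostScript output with small symbols
postscript("pairs_snapshot.ps", width = 21, height = 21,
           horizontal = FALSE, paper = "special")
pairs(my_data, pch = 21, cex = 0.35)
dev.off()

## One way to rasterize afterwards at more than 72 dpi, outside R,
## using ghostscript:
##   gs -dBATCH -dNOPAUSE -sDEVICE=png256 -r150 \
##      -sOutputFile=pairs_snapshot.png pairs_snapshot.ps

With paper = "special" the requested width and height are honoured exactly, and raising the -r resolution only increases the pixel dimensions of the PNG without changing the layout of the plot.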