Hello,

I'm fairly new to R, so please excuse me if I am asking something obvious. I have looked in the FAQ, the Introduction, and the help pages, and searched the archives, but I don't know much about graphics yet.

I'm running Red Hat Linux 2.14.18 on a machine blessed with dual 1.5 GHz Xeon processors and 3.7 GB of RAM. I have a very large dataset with 27 variables, and in exploring the data I want to take snapshots using pairs(). The lower matrix and the diagonal are filled with other graphics. (Please don't suggest that I cut down the number of variables! This is in fact the trimmed-down, must-have set.)

Of course, even with all that memory, I get a crash about two-thirds of the way through. This is one of those cases where it's hard to troubleshoot, since everything works fine for small datasets. It is tantalizing because the process takes over two hours to display most of the figure before the freeze happens.

However, it seems to me that the crash has more to do with the kind of graphics device I'm using and the size of the device. For instance, with X11 it takes longer to crash than with png, and right now I'm trying bitmap to produce a PNG file (it hasn't crashed after half an hour, but there's always time for that later). The plot also gets further along if I set a small area for the device, but then the plots are ridiculously tiny and hard to interpret. I have 729 little plots, and I'd be satisfied if each were at least 0.75 inches on a side, about 21 inches square altogether.

What can I do to increase the chances that I'll be able to produce a viewable, printable image? And supposing that bitmap works, can I raise the resolution above 72 dpi without fear?

Thanks,
Jean
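A minimal sketch of the kind of call described above, assuming a data frame my_data holding the 27 variables; the file name, the device dimensions, and the histogram diagonal panel are illustrative stand-ins rather than the code actually used:

## 21 x 21 inch device for a 27 x 27 pairs() display (~0.75 in per panel).
## my_data, the file name, and the panel function are assumptions.
bitmap("pairs_snapshot.png", type = "png256",
       width = 21, height = 21, res = 72)

panel.hist <- function(x, ...) {           # histogram on the diagonal
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))           # rescale the y-range for the bars
  h <- hist(x, plot = FALSE)
  rect(h$breaks[-length(h$breaks)], 0,
       h$breaks[-1], h$counts / max(h$counts))
}

pairs(my_data, diag.panel = panel.hist)
dev.off()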
Sounds like the problem is in your X server and not in R. I've seen this with XFree86 (and I don't use that myself on Linux).

1) I suggest you try a postscript() device and convert later if you need to. Expect a very large file size.

2) Don't plot all the points. You say you have a "very large dataset". In statistics, we give numbers, not vague descriptions. However, with what that means to me (many millions of rows), a scatterplot of a very large dataset is going to be mainly black, at least in places. (We've experienced that with 1.4 million points, for example.) That's not a good way to display the data. Either use a density plot or, if you are interested in outliers, thin the centre. We did this by estimating a density phat, then randomly selecting points with probability min(1, const/phat(x)) for a suitable const.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
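To make the thinning idea above concrete, here is a minimal sketch for a single pair of variables, assuming MASS::kde2d() for the two-dimensional density estimate; the function name thin_points, the grid size, and the value of const are illustrative choices rather than code from this thread.

## Sketch of density-based thinning: estimate a density phat, then keep
## each point with probability min(1, const/phat).  Names and defaults
## here are illustrative assumptions.
library(MASS)                              # kde2d() for a 2-D kernel density

thin_points <- function(x, y, const = 0.05, n_grid = 50) {
  dens <- kde2d(x, y, n = n_grid)          # density on an n_grid x n_grid grid
  ix <- findInterval(x, dens$x)            # grid cell of each observation
  iy <- findInterval(y, dens$y)
  phat <- dens$z[cbind(ix, iy)]            # density estimate at each point
  keep <- runif(length(x)) < pmin(1, const / phat)
  cbind(x = x, y = y)[keep, , drop = FALSE]
}

## toy usage: the dense core is heavily subsampled, the tails are kept
set.seed(1)
x <- rnorm(2e5); y <- x + rnorm(2e5)
plot(thin_points(x, y), pch = 21, cex = 0.35)

Larger values of const keep more of the centre; smaller values thin it more aggressively while leaving low-density outliers untouched, since their keep probability stays at 1.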
> 1) I suggest you try a postscript() device and convert later if you need
> to. Expect a very large file size.

Dear Dr. Ripley,

Thank you! postscript() was able to finish the job (bitmap killed itself). The file sizes are indeed large, 1.4 GB, and gv needs over two hours to display one, but the result is ultimately viewable. I'm new to manipulating PostScript files, but hopefully I can find a fast way to convert them to a smaller format.

I found an archived message of yours suggesting not to use pch="." as the symbol when graphing large datasets. On experimenting, I found that pch=21 seemed to produce the smallest files for some sets of test data compared with some other symbols. Using pch=21, cex=0.35 gave a fairly small point on the page while consuming much less space on disk than pch=".". Is this the best way to produce plot symbols that take up little room both on the plot and on the hard drive?

> Sounds like the problem is in your X server and not in R. I've seen this
> with XFree86 (and I don't use that myself on Linux).

It's possible... however, I wouldn't know how to fix it from that end, either.

> 2) Don't plot all the points. You say you have a "very large dataset". In
> statistics, we give numbers, not vague descriptions. However, with what
> that means to me (many millions of rows), a scatterplot of a very large
> dataset is going to be mainly black, at least in places. (We've experienced
> that with 1.4 million points, for example.) That's not a good way to
> display the data. Either use a density plot or, if you are interested in
> outliers, thin the centre. We did this by estimating a density phat, then
> randomly selecting points with probability min(1, const/phat(x)) for a
> suitable const.

I have a set of text files, each containing a 450,000 x 41 matrix (about 18.45 million data points) and each roughly 300 MB in size. Indeed, the scatterplots are overprinted, but I am interested in getting a "feel" for the data before charging ahead. The data (measurements on artificial phylogenetic trees) were produced by simulation, and although I have been running checks all along, I wanted to make sure that my simulations weren't producing any strange outliers or oddly shaped distributions. On the other hand, I had no real guess as to what the data would look like or even which variables would show strong correlations. Since many of these data points are from repeats, I was in fact able to discern a lot of pattern, rather than getting all-black plots.

Using both a density plot and a thinned plot may be the way to go if I don't find a way to shrink the graphs down. I had hoped that pairs() would be a fast, one-line way to take in all my data at once, but of course nothing has been that easy with all this data.

Jean
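As a concrete sketch of the device settings discussed in this exchange; my_data, the file names, and the ghostscript invocation are illustrative placeholders, not the commands actually run:

## 21 x 21 inch PostScript output with small symbols
postscript("pairs_snapshot.ps", width = 21, height = 21,
           horizontal = FALSE, paper = "special")
pairs(my_data, pch = 21, cex = 0.35)
dev.off()

## One way to rasterize afterwards at more than 72 dpi, outside R,
## using ghostscript:
##   gs -dBATCH -dNOPAUSE -sDEVICE=png256 -r150 \
##      -sOutputFile=pairs_snapshot.png pairs_snapshot.ps

With paper = "special" the requested width and height are honoured exactly, and raising the -r resolution only increases the pixel dimensions of the PNG without changing the layout of the plot.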