thr3ads.net - R help - [R] Scatterplot Showing All Points [Dec 2007]


Wayne Aldo Gavioli

2007-Dec-18 01:14 UTC

[R] Scatterplot Showing All Points

Hello all,


I'm trying to graph a scatterplot of a large dataset (5,000 x,y coordinates),
with the caveat that many of the data points overlap with each other (share the
same x AND y coordinates).  In using the usual "plot" command,

> plot(education, xlab="etc", ylab="etc")

it seems that the overlap of points is not shown in the graph.  Namely, there
are 5,000 points that should be plotted, as I mentioned above, but because so
many of the points overlap with each other exactly, only about 50-60 distinct
points actually show up on the graph.  Thus, there's no indication that Point A
shares its coordinates with 200 other pieces of data and is therefore very
common, while Point B doesn't share its coordinates with any other pieces of
data and therefore isn't common at all.  Is there any way to indicate the
frequency of such points on such a graph?  Should I be using a different
command than "plot"?


Thanks,


Wayne

jim holtman

2007-Dec-18 01:49 UTC


Use the 'hexbin' package from Bioconductor to show how many points fall in
each cell of a hexagonal grid laid over the graph.
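The idea behind hexbin (count the points that fall in each cell of a grid, then display the counts instead of the raw points) can be sketched in base R. This is a minimal sketch under stated assumptions: the data in the hypothetical vectors x and y are made up, and it swaps hexbin's hexagonal cells for rectangular ones built with cut() and table(), so it is not the hexbin implementation.

```r
set.seed(1)
# Hypothetical data: 5000 (x, y) pairs on a coarse integer grid, so many coincide
x <- sample(1:10, 5000, replace = TRUE)
y <- x + sample(-2:2, 5000, replace = TRUE)

# Count the points falling in each cell of a 10 x 10 rectangular grid
# (hexbin uses hexagonal cells; rectangles keep this sketch in base R)
xb <- cut(x, breaks = 10)
yb <- cut(y, breaks = 10)
counts <- table(xb, yb)

# Shade each cell by its count: darker cells hold more points
image(seq_len(nrow(counts)), seq_len(ncol(counts)), unclass(counts),
      col = gray.colors(12, start = 0.95, end = 0.1),
      xlab = "x bin", ylab = "y bin")
```

With hexbin installed, the hexagonal version would be along the lines of plot(hexbin(x, y)).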



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Duncan Murdoch

2007-Dec-18 03:43 UTC


The jitter() function can add a bit of noise to your data, so that 
repeated points show up as groupings instead of isolated points.
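A small sketch of what jitter() does to exact ties, assuming made-up data in hypothetical vectors x and y: before jittering, 5000 points occupy at most 49 distinct positions; after adding uniform noise of at most 0.3 in each direction (the amount is an arbitrary choice here), each point gets its own position but stays near its original grid location.

```r
set.seed(42)
# Hypothetical data with heavy overlap: 5000 draws onto a 7 x 7 grid
x <- sample(1:7, 5000, replace = TRUE)
y <- sample(1:7, 5000, replace = TRUE)
sum(!duplicated(cbind(x, y)))   # distinct positions before jittering (at most 49)

# Add uniform noise in [-0.3, 0.3] so points stay near their grid location
xj <- jitter(x, amount = 0.3)
yj <- jitter(y, amount = 0.3)
sum(!duplicated(cbind(xj, yj))) # distinct positions after jittering
plot(xj, yj, pch = ".", xlab = "x (jittered)", ylab = "y (jittered)")
```

Keeping the amount below half the grid spacing (0.5 here) means the clouds around neighbouring grid points do not merge.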

Duncan Murdoch

Johannes Hüsing

2007-Dec-18 03:58 UTC


Wayne Aldo Gavioli <wgavioli at fas.harvard.edu> [Tue, Dec 18, 2007 at 02:14:23AM CET]:
> Is there any way to indicate the frequency of such points
> on such a graph?  Should I be using a different command than "plot"?
?sunflowerplot
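A minimal sketch of sunflowerplot() from base R, assuming made-up data in hypothetical vectors x and y. xyTable() (in grDevices) is used only to show the counts that the plot encodes: each distinct (x, y) location is drawn once, and a location with more than one observation gets as many "petals" as it has observations.

```r
set.seed(7)
# Hypothetical data: 500 points on a 5 x 5 grid, about 20 per location
x <- sample(1:5, 500, replace = TRUE)
y <- sample(1:5, 500, replace = TRUE)

# xyTable() collapses duplicates to one entry per distinct (x, y) with its count
tab <- xyTable(x, y)
head(cbind(x = tab$x, y = tab$y, n = tab$number))

# One "sunflower" per distinct location; petal count = number of observations
sunflowerplot(x, y, xlab = "x", ylab = "y")
```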

-- 
Johannes Hüsing               There is something fascinating about science.
                              One gets such wholesale returns of conjecture
mailto:johannes at huesing.name  from such a trifling investment of fact.
http://derwisch.wikidot.com         (Mark Twain, "Life on the Mississippi")

Jim Porzak

2007-Dec-18 04:26 UTC


Wayne,

I am fond of the bagplot (think 2D box plot) as a replacement for scatter
plots with large N. See
http://www.wiwi.uni-bielefeld.de/~wolf/software/aplpack/ and the aplpack
package on CRAN.

-- 
HTH,
Jim Porzak
Responsys, Inc.
San Francisco, CA
http://www.linkedin.com/in/jimporzak


Jim Lemon

2007-Dec-18 08:10 UTC


Hi Wayne,
While this is not a really pretty picture, you can get a viewable plot
with count.overplot (from the plotrix package) if the first two elements of
"education" are named "x" and "y" and they are the coordinates you want to
plot. Otherwise, pass the x and y coordinates separately.

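library(plotrix)
# tol: distances (x, y) within which points are treated as overplotted;
# here one tenth of the range of each coordinate
count.overplot(education,
  tol=c(diff(range(education$x))/10,
  diff(range(education$y))/10))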
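For readers without plotrix, the same display idea (print the number of observations at each location instead of a symbol) can be approximated in base R. This is a sketch under stated assumptions: the data are made up, and unlike count.overplot() it only merges exact duplicates, with no tolerance, so it is not plotrix's method.

```r
set.seed(3)
# Hypothetical data: rounding to integers makes many of the 5000 points coincide
x <- round(rnorm(5000))
y <- round(x + rnorm(5000))

# One entry per distinct (x, y) pair, with the number of observations there
tab <- xyTable(x, y)

# Empty frame, then print each location's count at its coordinates
plot(tab$x, tab$y, type = "n", xlab = "x", ylab = "y")
text(tab$x, tab$y, labels = tab$number, cex = 0.7)
```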

Jim

Jari Oksanen

2007-Dec-18 08:22 UTC


One suggestion seems to be still missing: 'sunflowerplot' of base R. It may
look taggy, though, if you have 200 "petals".

Actually, the documentation of sunflowerplot is wrong in a botanical sense.
Sunflowers have composite flowers in capitula, and the things called 'petals'
in the documentation are ligulate, sterile ray florets (each with vestigial
petals, which are not easily visible in the sunflower, but in some other
species you may see three (occasionally two) teeth).

cheers, jari oksanen

Antony Unwin

2007-Dec-18 12:31 UTC


Wayne,

Try the iplot command in iPlots.  You can then vary both the  
pointsize and the transparency of your scatterplot interactively and  
decide which scatterplot conveys the information best.  Sometimes  
it's helpful to use more than one scatterplot when presenting your  
results.

(I must admit to being very surprised that jittering and sunflower  
plots have been suggested for a dataset of 5000 points.  Do those who  
mentioned these methods have examples on that scale where they are  
effective?)
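iPlots lets you adjust point size and transparency interactively; a static base-R analogue of the same two settings is sketched below. It assumes made-up data, and the values alpha = 0.05 and cex = 2 are arbitrary choices, not recommendations. Each semi-transparent point darkens its location a little, so stacks of exact duplicates show up as darker spots; very dense locations saturate to near-black, so alpha usually needs tuning. The graphics device must support semi-transparency, as pdf() does.

```r
set.seed(11)
# Hypothetical data with heavy exact overlap
x <- round(rnorm(5000))
y <- round(x + rnorm(5000))

# Nearly transparent black: overlapping points build up darker spots
col_alpha <- rgb(0, 0, 0, alpha = 0.05)

# Larger filled symbols (cex) make each stack of duplicates easier to see
plot(x, y, pch = 16, cex = 2, col = col_alpha)
```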

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
University of Augsburg,
Germany

S Ellison

2007-Dec-18 13:29 UTC


Antony Unwin <unwin at math.uni-augsburg.de> wrote:
> (I must admit to being very surprised that jittering and sunflower
> plots have been suggested for a dataset of 5000 points.  Do those who
> mentioned these methods have examples on that scale where they are
> effective?)
You have a point. haha. 
But check the microarray literature: scatterplots have often been used to
display microarray data with 10000 observations at a time. And in their
defence, even on screen, a 600x600 pixel plot window holds 360000 pixels,
and 5000 points cover only about 1.4% of that. Jittering has visible effects
on data at that resolution. Compare the two plots in

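library(MASS)                                       # for mvrnorm()
Sigma <- matrix(c(10,4,4,2),2,2)                    # covariance of the two variables
xy <- round(mvrnorm(n=5000, rep(0, 2), Sigma), 1)   # rounding creates exact ties
par(mfrow=c(1,2))                                   # show both plots side by side
plot(xy, pch=".")                                   # ties overplot each other
plot(jitter(xy, factor=2), pch=".")                 # same data, jittered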

But you're of course right to question how sensible this is. The best
you can get is a visual impression of the 'shape' of the data with a
greater perceived density at multiple observations which otherwise
overlapped. 

S.

bogdan romocea

2007-Dec-18 18:21 UTC


Another approach which I'm pleased with but was not suggested so far
is jitter + kde2d from MASS:

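library(MASS)                       # provides kde2d()
plot(jitter(x), jitter(y))          # x, y: your coordinate vectors
kdesamp <- 20000                    # max points passed to kde2d; depends on your RAM
forkde <- if (kdesamp < length(x)) sample(seq_along(x), kdesamp) else seq_along(x)
d <- kde2d(x[forkde], y[forkde])    # 2-D kernel density estimate on a grid
contour(d, add=TRUE)                # overlay density contours on the scatterplot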

