thr3ads.net - R help - [R] Comparing distributions [Jun 2010]


Ralf B

2010-Jun-23 19:33 UTC

[R] Comparing distributions

I am trying to do something in R and would appreciate a push in the
right direction. I hope some of you experts can help.

I have two distributions obtained from about 10000 data points each.
They are non-normal and multimodal (when eyeballing the densities), but
other than that I know little about them. When plotting the two
distributions together I can see that the two densities are alike, with
a certain distance between them (e.g. 50 units on the x axis). I tried
to draw a simplified picture of the density plot below:




|
|                                                         *
|                                                      *     *
|                                                   *    +   *
|                                              *     +     +  *
|                     *        +           *   +            +  *
|                 *        +*     +   *  +                   + *
|              *       +       *     +                           +*
|           *       +                                               +*
|        *       +                                                    +*
|     *      +                                                          + *
|  *      +                                                               + *
|___________________________________________________________________


What I would like to do is to formally test their similarity, or
otherwise measure it more reliably than just showing and discussing a
plot. Is there a general approach other than a Mann-Whitney test,
which is very strict and seems to assume a perfect match? Is there a
test that allows for a certain 'band' (e.g. 50, 100, 150 units on X), or
are there other similarity measures that could give me a statistic
about how close these two distributions are to each other? All I can
say from eyeballing is that they seem to follow each other and that
one distribution appears to be shifted by some amount from the other.
Any ideas?

Ralf
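One way to get at the 'band' idea, sketched here and not proposed anywhere in this thread: assume the two samples differ by a pure location shift, estimate that shift with the Hodges-Lehmann estimator that wilcox.test(..., conf.int = TRUE) reports, and call the distributions equivalent within the band if the 90% confidence interval for the shift lies entirely inside [-delta, delta]. That interval check corresponds to two one-sided tests at the 5% level. The simulated bimodal samples and the band delta = 75 are hypothetical stand-ins for the real data.

```r
# Hypothetical data: two bimodal samples of 10000 points each, y shifted by +50
set.seed(1)
n <- 5000
x <- c(rnorm(n, 200, 30), rnorm(n, 400, 40))
y <- c(rnorm(n, 250, 30), rnorm(n, 450, 40))

delta <- 75  # hypothetical tolerance band, in x-axis units

# Hodges-Lehmann estimate of the shift (y minus x) with a 90% interval;
# a 90% interval corresponds to two one-sided tests at the 5% level
wt <- wilcox.test(y, x, conf.int = TRUE, conf.level = 0.90)
shift <- unname(wt$estimate)
ci <- wt$conf.int

# Equivalent within the band if the whole interval lies inside [-delta, delta]
within_band <- ci[1] > -delta && ci[2] < delta
cat(sprintf("shift = %.1f, 90%% CI = [%.1f, %.1f], within +/-%g: %s\n",
            shift, ci[1], ci[2], delta, within_band))
```

This only addresses location; whether the shapes agree once the shift is removed is a separate question.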

Bert Gunter

2010-Jun-23 20:08 UTC


[R] Comparing distributions

?qqplot
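For example, with two hypothetical bimodal samples where one is shifted by 50 units, a Q-Q plot of one sample against the other puts the points along a line of slope 1 that is offset from the diagonal. Taking the median vertical offset as a rough estimate of the shift is an extra step added here for illustration, not something qqplot() does itself.

```r
# Hypothetical data: same bimodal shape, y shifted right by 50 units
set.seed(2)
x <- c(rnorm(5000, 200, 30), rnorm(5000, 400, 40))
y <- c(rnorm(5000, 250, 30), rnorm(5000, 450, 40))

# qqplot() draws sorted y against sorted x and invisibly returns both
qq <- qqplot(x, y, xlab = "quantiles of x", ylab = "quantiles of y")
abline(0, 1, lty = 2)              # y = x: what identical distributions give
offset <- median(qq$y - qq$x)      # rough estimate of the shift
abline(offset, 1, col = "red")     # y = x + offset: a pure location shift
offset
```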

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Joris Meys

2010-Jun-23 21:38 UTC


[R] Comparing distributions

A Q-Q plot would indeed help. See ?ks.test for more formal testing, but be
aware: you should also think about what you mean by similar
distributions. See the following example:

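set.seed(12345)
# Three bimodal samples of 250 points each; x3 has the same shape as x2,
# shifted 1 unit to the right
x1 <- c(rnorm(100), rnorm(150, 3.3, 0.7))
x2 <- c(rnorm(140, 1, 1.2), rnorm(110, 3.3, 0.6))
x3 <- c(rnorm(140, 2, 1.2), rnorm(110, 4.3, 0.6))
d1 <- density(x1)
d2 <- density(x2)
d3 <- density(x3)

# Common axis limits so the three densities are comparable
xlim <- 1.2 * range(x1, x2, x3)
ylim <- c(0, 1.2 * max(d1$y, d2$y, d3$y))

op <- par(mfrow = c(1, 3))
plot(d1, xlim = xlim, ylim = ylim)
lines(d2, col = "red")
lines(d3, col = "blue")
qqplot(x1, x2)
qqplot(x2, x3)
par(op)

# formal testing
ks.test(x1, x2)
ks.test(x2, x3)

# relocate x3 by the difference in means
# (p-values below are approximate: the shift is estimated from the same data)
x3b <- x3 - (mean(x3) - mean(x2))
x3c <- x3 - (mean(x3) - mean(x1))

# formal testing
ks.test(x2, x3b)
ks.test(x1, x3c)

# test location: the samples are independent, so use two-sample (Welch)
# t-tests rather than t-tests on differences of arbitrarily paired values
t.test(x2, x1)
t.test(x3, x2)
t.test(x3, x1)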
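The same idea can be put into numbers by comparing sample quantiles directly: under a pure location shift, the difference between matching quantiles is roughly constant across probabilities, while differently shaped distributions give differences that vary with the probability. A minimal sketch that re-creates the simulated samples above; the helper shift_by_quantile and the decile grid are illustrative choices, and with only 250 points per sample the differences are noisy.

```r
set.seed(12345)
# Same simulated samples as in the example above
x1 <- c(rnorm(100), rnorm(150, 3.3, 0.7))
x2 <- c(rnorm(140, 1, 1.2), rnorm(110, 3.3, 0.6))
x3 <- c(rnorm(140, 2, 1.2), rnorm(110, 4.3, 0.6))

# Hypothetical helper: quantiles of b minus quantiles of a at probabilities p
shift_by_quantile <- function(a, b, p = seq(0.1, 0.9, by = 0.1)) {
  quantile(b, p) - quantile(a, p)
}

d23 <- shift_by_quantile(x2, x3)  # x3 is x2's shape moved by 1: roughly constant
d12 <- shift_by_quantile(x1, x2)  # different shapes: varies with p
round(rbind(x2_to_x3 = d23, x1_to_x2 = d12), 2)
```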

Cheers
Joris



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Tommy Chheng

2010-Jun-23 23:00 UTC


[R] Comparing distributions

Check out the Kullback-Leibler (KL) divergence, a measure (rather than a
test) of how much one distribution differs from another:
http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
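Applying KL divergence to raw samples requires density estimates first. A minimal sketch, assuming kernel density estimates from density() evaluated on a shared grid are adequate; the function name kl_divergence, the grid size and the eps floor are hypothetical choices, and the samples are simulated.

```r
# Hypothetical data: y is x's bimodal shape shifted right by 50 units
set.seed(3)
x <- c(rnorm(5000, 200, 30), rnorm(5000, 400, 40))
y <- c(rnorm(5000, 250, 30), rnorm(5000, 450, 40))

# Hypothetical helper: estimate KL(a || b) from two samples
kl_divergence <- function(a, b, n_grid = 1024, eps = 1e-10) {
  lo <- min(a, b)
  hi <- max(a, b)
  # Kernel density estimates of both samples on one shared grid
  da <- density(a, from = lo, to = hi, n = n_grid)
  db <- density(b, from = lo, to = hi, n = n_grid)
  dx <- da$x[2] - da$x[1]
  # Floor at eps to avoid log(0), then renormalise to unit area on the grid
  p <- pmax(da$y, eps)
  p <- p / sum(p * dx)
  q <- pmax(db$y, eps)
  q <- q / sum(q * dx)
  # Riemann-sum approximation of the integral of p * log(p / q)
  sum(p * log(p / q) * dx)
}

kl_xy <- kl_divergence(x, y)
kl_yx <- kl_divergence(y, x)  # generally differs from kl_xy: KL is asymmetric
kl_xx <- kl_divergence(x, x)  # exactly 0 for identical inputs
c(kl_xy = kl_xy, kl_yx = kl_yx, kl_xx = kl_xx)
```

Because the two directions generally differ, averaging them is one way to get a symmetric number.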

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com



Robert A LaBudde

2010-Jun-24 02:07 UTC


[R] Comparing distributions

Your "*" curve apparently dominates your "+" curve.

If each curve comes from the same total number of data points, as you
say, they cannot both sum to the same value (e.g., N = 10000, or 1.000
for normalized densities).

So there is something going on that you aren't mentioning.
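Robert's point can be checked directly: a kernel density estimate integrates to about 1 for each sample, so two curves drawn on the same scale cannot have one lying everywhere above the other. A minimal sketch on simulated data; the helper area is hypothetical and uses a simple Riemann sum over the grid returned by density().

```r
# Hypothetical samples of equal size
set.seed(5)
x <- c(rnorm(5000, 200, 30), rnorm(5000, 400, 40))
y <- c(rnorm(5000, 250, 30), rnorm(5000, 450, 40))

# Hypothetical helper: area under a density() curve via a Riemann sum
area <- function(d) sum(d$y) * (d$x[2] - d$x[1])

ax <- area(density(x))
ay <- area(density(y))
c(ax, ay)  # both close to 1
```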

Try comparing CDFs instead of pdfs.
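Comparing CDFs could look like this: overlay the two empirical CDFs from ecdf(); the largest vertical gap between them is the two-sample Kolmogorov-Smirnov statistic D that ks.test() reports. A sketch on hypothetical simulated samples.

```r
# Hypothetical data: y is x's bimodal shape shifted right by 50 units
set.seed(4)
x <- c(rnorm(5000, 200, 30), rnorm(5000, 400, 40))
y <- c(rnorm(5000, 250, 30), rnorm(5000, 450, 40))

Fx <- ecdf(x)
Fy <- ecdf(y)

# Overlay the two empirical CDFs on a common x range
plot(Fx, do.points = FALSE, verticals = TRUE, xlim = range(x, y),
     main = "Empirical CDFs")
plot(Fy, do.points = FALSE, verticals = TRUE, col = "red", add = TRUE)

# Largest vertical gap between the ECDFs over all pooled data points;
# this equals the two-sample Kolmogorov-Smirnov statistic D
z <- c(x, y)
D <- max(abs(Fx(z) - Fy(z)))
D_ks <- unname(ks.test(x, y)$statistic)
c(D = D, D_ks = D_ks)
```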

===============================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

Seemingly Similar Threads

R help - Jun 2010 - Comparing distributions

[R] Comparing distributions