I am trying to do something in R and would appreciate a push in the right direction. I hope some of you experts can help.

I have two distributions, each obtained from about 10000 data points. Both are non-normal with a multimodal shape (judging by eye from the densities), but other than that I know little about them. When plotting the two distributions together I can see that the two densities are alike, with a certain distance between them (e.g. 50 units on the X axis). I tried to draw a simplified picture of the density plot below:

[ASCII sketch: two overlaid multimodal density curves of similar shape, one drawn with "*" and one with "+"; the "+" curve lies inside the "*" curve]

What I would like to do is formally test their similarity, or otherwise measure it more reliably than by just showing and discussing a plot. Is there a general approach other than the Mann-Whitney test, which is very strict and seems to assume a perfect match? Is there a test that takes a certain 'band' into account (e.g. 50, 100, 150 units on X), or are there other similarity measures that could give me a statistic for how close these two distributions are to each other? All I can say from eyeballing is that they seem to follow each other, and it appears that one distribution is shifted by some amount from the other.

Any ideas?

Ralf
?qqplot

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ralf B
Sent: Wednesday, June 23, 2010 12:34 PM
To: r-help at r-project.org
Subject: [R] Comparing distributions

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
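As a rough illustration of what the suggested Q-Q plot can reveal, here is a minimal sketch on simulated data. The mixture parameters, the shift of 50, and the names x, y, qq, shift_hat and fit are assumptions made up for this example; they are not taken from the original post. If one sample is essentially a shifted copy of the other, the quantile pairs should fall on a line parallel to y = x, and the vertical offset gives a rough estimate of the shift.

```r
set.seed(1)

# Hypothetical bimodal sample standing in for the real data
x <- c(rnorm(4000, mean = 200, sd = 30), rnorm(6000, mean = 400, sd = 40))
# Second sample from the same mixture, moved by an assumed shift of 50 units
y <- c(rnorm(4000, mean = 200, sd = 30), rnorm(6000, mean = 400, sd = 40)) + 50

# Sorted quantile pairs, computed without drawing anything yet
qq <- qqplot(x, y, plot.it = FALSE)

# Rough shift estimate: median vertical distance between matching quantiles
shift_hat <- median(qq$y - qq$x)

plot(qq, xlab = "x quantiles", ylab = "y quantiles")
abline(0, 1, lty = 2)                 # identity line: identical distributions
abline(shift_hat, 1, col = "red")     # parallel line: pure location shift

# A slope close to 1 suggests a shift only, with no rescaling
fit <- lm(qq$y ~ qq$x)
```

A slope clearly different from 1, or curvature in the plot, would indicate that the two distributions differ in more than location.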
A qqplot would indeed help. See ?ks.test for more formal testing, but be aware: you should also think about what you call similar distributions. See the following example:

set.seed(12345)
# three bimodal samples of 250 points each
x1 <- c(rnorm(100), rnorm(150, 3.3, 0.7))
x2 <- c(rnorm(140, 1, 1.2), rnorm(110, 3.3, 0.6))
x3 <- c(rnorm(140, 2, 1.2), rnorm(110, 4.3, 0.6))   # same shape as x2, shifted by 1

d1 <- density(x1)
d2 <- density(x2)
d3 <- density(x3)

xlim <- 1.2 * c(min(x1, x2, x3), max(x1, x2, x3))
ylim <- 1.2 * c(0, max(d1$y, d2$y, d3$y))

# densities and Q-Q plots side by side
op <- par(mfrow = c(1, 3))
plot(d1, xlim = xlim, ylim = ylim)
lines(d2, col = "red")
lines(d3, col = "blue")
qqplot(x1, x2)
qqplot(x2, x3)
par(op)

# formal testing
ks.test(x1, x2)
ks.test(x2, x3)

# relocate x3 by the mean difference
x3b <- x3 - mean(x3 - x2)
x3c <- x3 - mean(x3 - x1)

# formal testing after relocation
ks.test(x2, x3b)
ks.test(x1, x3c)

# test location: one-sample t-tests on the elementwise differences
# (this works here because all samples have the same length)
t.test(x2 - x1)
t.test(x3 - x2)
t.test(x3 - x1)

Cheers
Joris

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
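The relocation idea above can be combined with the 'band' asked about in the original question. The following is only a sketch under stated assumptions: the simulated samples, the band of 0 to 100 units, and the names wt, shift_hat, band, within_band and ks_shape are hypothetical. It swaps in the Hodges-Lehmann shift estimate and its confidence interval from wilcox.test(conf.int = TRUE) in place of the mean difference used in the post, then checks whether the interval lies inside the band and compares the shapes after removing the estimated shift.

```r
set.seed(2)

# Hypothetical stand-ins for the two samples; y is drawn like x, then shifted by 50
x <- c(rnorm(800, 200, 30), rnorm(1200, 400, 40))
y <- c(rnorm(800, 200, 30), rnorm(1200, 400, 40)) + 50

# Hodges-Lehmann estimate of the shift of y relative to x, with a 90% interval
wt <- wilcox.test(y, x, conf.int = TRUE, conf.level = 0.90)
shift_hat <- unname(wt$estimate)
ci <- wt$conf.int

# Assumed tolerance band for the shift (purely illustrative values)
band <- c(0, 100)
within_band <- ci[1] >= band[1] && ci[2] <= band[2]

# Remove the estimated shift, then compare what is left of the shapes.
# The p-value is only approximate, because the shift was estimated from the same data.
ks_shape <- ks.test(x, y - shift_hat)
```

An interval that sits entirely inside the band supports "shifted by an acceptable amount", while a large KS statistic after relocation would point to a difference in shape rather than location.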
Check out the KL divergence, a (non-symmetric) measure of how far one distribution is from another:
http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

@tommychheng
Programmer and UC Irvine Graduate Student
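For two samples, the KL divergence has to be estimated, for example by plugging in kernel density estimates. The sketch below shows one simple way to do that; kl_divergence is a hypothetical helper written for this example (not a function from base R or any package), and the simulated data, grid size and eps floor are assumptions. The result depends on the bandwidth and the grid, so it is better read as a descriptive measure than as a test statistic.

```r
set.seed(3)

# Hypothetical samples: same bimodal mixture, second one shifted by 50
x <- c(rnorm(4000, 200, 30), rnorm(6000, 400, 40))
y <- c(rnorm(4000, 200, 30), rnorm(6000, 400, 40)) + 50

# Hypothetical helper: plug-in estimate of KL(P || Q) from kernel density estimates
kl_divergence <- function(a, b, n_grid = 1024, eps = 1e-12) {
  rng <- range(a, b)
  pad <- 0.1 * diff(rng)
  # Evaluate both density estimates on the same grid
  p <- density(a, from = rng[1] - pad, to = rng[2] + pad, n = n_grid)$y
  q <- density(b, from = rng[1] - pad, to = rng[2] + pad, n = n_grid)$y
  # Floor at eps to avoid log(0), then renormalise to grid-cell probabilities
  p <- pmax(p, eps); p <- p / sum(p)
  q <- pmax(q, eps); q <- q / sum(q)
  sum(p * log(p / q))
}

kl_xy <- kl_divergence(x, y)             # KL(x || y)
kl_yx <- kl_divergence(y, x)             # generally differs from kl_xy
kl_aligned <- kl_divergence(x, y - 50)   # after removing the assumed shift
```

Comparing kl_xy with kl_aligned gives a sense of how much of the divergence is due to the shift alone.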
Your "*" curve apparently dominates your "+" curve. If they each contain the same total number of data points, as you say, they cannot both sum to the same value (e.g., N = 10000 or 1.000) while one dominates the other. So there is something going on that you aren't mentioning.

Try comparing CDFs instead of pdfs.

===============================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
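A minimal sketch of the CDF comparison, again on simulated stand-in data (the samples and the names Fx, Fy, D and h are assumptions for illustration). The largest vertical gap between the two empirical CDFs is the two-sample Kolmogorov-Smirnov statistic, while the horizontal gap between matching quantiles shows the shift in the units of the X axis.

```r
set.seed(4)

# Hypothetical samples: same bimodal mixture, second one shifted by 50
x <- c(rnorm(4000, 200, 30), rnorm(6000, 400, 40))
y <- c(rnorm(4000, 200, 30), rnorm(6000, 400, 40)) + 50

Fx <- ecdf(x)
Fy <- ecdf(y)

# Overlay the two empirical CDFs
plot(Fx, do.points = FALSE, xlim = range(x, y), main = "Empirical CDFs")
plot(Fy, do.points = FALSE, add = TRUE, col = "red")

# Largest vertical gap, evaluated at all pooled data points
pooled <- sort(c(x, y))
D <- max(abs(Fx(pooled) - Fy(pooled)))

# Horizontal gap at a range of probabilities; roughly constant for a pure shift
probs <- seq(0.05, 0.95, by = 0.05)
h <- quantile(y, probs) - quantile(x, probs)
```

Quantile gaps near the dip between the two modes are noisy because few observations fall there, so a summary such as median(h) is more stable than any single value of h.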