volker.franz@tuebingen.mpg.de
2003-Feb-26 21:40 UTC
[Rd] [Package car/data.ellipse]: confidence intervals off by factor sqrt(2)??? (PR#2584)
Full_Name: Volker Franz
Version: Version 1.6.2 (2003-01-10)
OS: Debian
Submission from: (NULL) (192.124.28.104)

Hi there,

it seems to me that data.ellipse of package "car" (Version 1.0-1) produces confidence intervals which are too big. To see this, do:

    library(car)
    plot(c(-2,2),c(-2,2),pch=0)
    data.ellipse(rnorm(10000),rnorm(10000),levels=0.68,plot.points=F)
    abline(v=+1)
    abline(v=-1)
    abline(h=+1)
    abline(h=-1)

To my knowledge, this should result in a circle with radius 1. However, the circle is larger. It seems that the problem is due to an erroneous specification of the degrees of freedom and can be fixed with the following patch:

=====================================================================
--- Ellipse.R Wed Feb 26 17:49:43 2003
+++ Ellipse.orig Thu Sep 19 18:20:41 2002
@@ -34,7 +34,7 @@
     stop("x and y must be vectors of the same length")
   if (plot.points & !add) plot(x, y, xlab=xlab, ylab=ylab, col=col, pch=pch, las=las, ...)
   if (plot.points & add) points(x, y, col=col, pch=pch, ...)
-  dfn<-1
+  dfn<-2
   dfd<-length(x)-1
   if (robust) {
     require(MASS)
=====================================================================

Or --- am I totally on the wrong track here?

Best
Volker
John Fox
2003-Feb-26 22:31 UTC
[Rd] [Package car/data.ellipse]: confidence intervals off by factor sqrt(2)??? (PR#2584)
Dear Volker,

If the data ellipse (or, in this case, circle) is scaled so that its shadows (projections) on the axes each include 68% of the data (that is, of the marginal distribution of each variable), then the ellipse will include less than 68% of the data (i.e., of the joint distribution of the two variables). Conversely, to include 68% of the data in the ellipse, the shadows of the ellipse have to be larger.

Did I understand your point correctly?

John

____________________________
John Fox
Department of Sociology
McMaster University
email: jfox@mcmaster.ca
web: http://www.socsci.mcmaster.ca/jfox
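The distinction between joint and marginal coverage can be checked empirically. The following is a minimal sketch in base R, not code from the car package: it simulates two independent standard normal variables and counts how many points fall inside a circle whose shadows are [-1, 1], and inside a circle sized to hold 68% of the joint distribution. The seed, sample size, and variable names are assumptions chosen for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 1e5                         # assumed sample size
x <- rnorm(n)
y <- rnorm(n)
d2 <- x^2 + y^2                  # squared distance from the origin

# Circle of radius 1: its shadow on each axis is [-1, 1]
frac_unit <- mean(d2 <= 1)

# Circle sized to hold 68% of the joint distribution:
# for independent N(0,1) variables, x^2 + y^2 is chi-square with 2 df
r68 <- sqrt(qchisq(0.68, df = 2))
frac_r68 <- mean(d2 <= r68^2)

# Marginal coverage of [-1, 1], for comparison
frac_marginal <- mean(abs(x) <= 1)

print(c(unit_circle = frac_unit, joint_68 = frac_r68,
        marginal = frac_marginal, radius = r68))
```

Under these assumptions the unit circle should hold roughly 39% of the points, while the radius needed for 68% joint coverage comes out at about 1.51.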
Volker Franz
2003-Feb-26 23:21 UTC
[Rd] [Package car/data.ellipse]: confidence intervals off by factor sqrt(2)??? (PR#2584)
Hi John,

>>>>> "JF" == John Fox <jfox@mcmaster.ca> writes:

JF> Dear Volker, If the data ellipse (or, in this case, circle) is
JF> scaled so that its shadows (projections) on the axes each
JF> includes 68% of the data (that is of the marginal distribution
JF> of each variable), then the ellipse will include less than 68%
JF> of the data (i.e., of the joint distribution of the two
JF> variables). Conversely, to include 68% of the data in the
JF> ellipse, the shadows of the ellipse have to be larger.

JF> Did I understand your point correctly?

I am not sure. I will try to rephrase my initial request:

Let X be a one-dimensional random variable (standard normal distribution; mean=0; std=1). The 68% confidence interval of X will be approximately [-1,1]. Now, if I combine X with a stochastically independent second random variable Y, the marginal distribution of X should not change. Therefore, the projection of the error ellipse on the X-axis should still be [-1,1].

If I do this with the function data.ellipse:

    data.ellipse(rnorm(10000),rnorm(10000),levels=0.68,plot.points=F)

I get a projection on the X-axis which is larger than [-1,1]. In fact, it is a little bit larger than [-sqrt(2),+sqrt(2)].

My interpretation is that this is due to the construction of the radius in data.ellipse:

    dfn<-2
    radius <- sqrt ( dfn * qf(level, dfn, dfd ))

I would expect a dfn<-1 here (such that the radius would correspond to the t-distribution).

Does this make sense?

Volker

--
___________________________________________________________
Dr. Volker Franz
Max-Planck-Institute for Biological Cybernetics
Tuebingen, Germany
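The effect of the numerator degrees of freedom on the radius can be computed directly from the formula quoted in this message. This is a sketch using only base R; `ellipse_radius` is a hypothetical helper name, not a function from the car package, and `dfd = n - 1` follows the snippet in the original report.

```r
# Hypothetical helper reproducing the radius formula quoted in the thread
ellipse_radius <- function(level, dfn, dfd) {
  sqrt(dfn * qf(level, dfn, dfd))
}

dfd <- 10000 - 1                               # n - 1 for the 10000-point example
r_dfn2 <- ellipse_radius(0.68, dfn = 2, dfd)   # joint (bivariate) scaling
r_dfn1 <- ellipse_radius(0.68, dfn = 1, dfd)   # marginal (univariate) scaling

# With dfn = 1 the radius equals the two-sided t quantile
r_t <- qt(1 - (1 - 0.68) / 2, df = dfd)

print(c(dfn2 = r_dfn2, dfn1 = r_dfn1, t_quantile = r_t))
```

With these inputs, dfn = 2 should give a radius of about 1.51 (a little above sqrt(2)), and dfn = 1 a radius of about 0.99, which matches the t-based interval. The two values answer different questions: joint coverage versus marginal coverage.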
Deepayan Sarkar
2003-Feb-26 23:41 UTC
[Rd] [Package car/data.ellipse]: confidence intervals off by factor sqrt(2)??? (PR#2584)
On Wednesday 26 February 2003 04:23 pm, Volker Franz wrote:
> I am not sure. I will try to rephrase my initial request:
>
> Let X be a one-dimensional random variable (standard normal
> distribution; mean=0; std=1). The 68% confidence interval of X will
> approximately be: [-1,1]. Now, if I combine X with a stochastically
> independent second random variable Y, the marginal distribution of X
> should not change. Therefore, the projections of the error ellipse on
> the X-axis should still be: [-1,1].

Why so?

Let Y be an independent copy of X (i.e., Y ~ N(0,1) too, independent of X). Then P(Y is in [-Inf, Inf]) = 1.

Now, think of the 2-D confidence region [-1, 1] x [-Inf, Inf]. This will have (by independence of X and Y) probability 0.68.

Now, how can you expect an ellipse that has the same X-range, and that is a strict subset of this region, to still have joint probability 0.68?

Hope that helps,

Deepayan
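The strip argument can also be made quantitative. As a sketch, assuming X and Y are independent standard normal variables (as in the example), X^2 + Y^2 follows a chi-square distribution with 2 degrees of freedom, so the probability content of a centred circle of radius r has a closed form:

```latex
\begin{align*}
P\bigl(X^2 + Y^2 \le r^2\bigr) &= 1 - e^{-r^2/2}, \\
P\bigl(X^2 + Y^2 \le 1\bigr) &= 1 - e^{-1/2} \approx 0.393, \\
1 - e^{-r^2/2} = 0.68 \;&\Longrightarrow\; r = \sqrt{-2 \ln 0.32} \approx 1.51 .
\end{align*}
```

So the unit circle, a strict subset of the strip [-1, 1] x [-Inf, Inf], holds about 39% of the joint distribution rather than 68%, and a circle with 68% joint content needs a radius of roughly 1.51.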
John Fox
2003-Feb-26 23:48 UTC
[Rd] [Package car/data.ellipse]: confidence intervals off by factor sqrt(2)??? (PR#2584)
Dear Volker,

At 11:23 PM 2/26/2003 +0100, Volker Franz wrote:
>If I do this with the function data.ellipse:
>
> data.ellipse(rnorm(10000),rnorm(10000),levels=0.68,plot.points=F)
>
>I get a projection on the X-axis which is larger than [-1,1]. In fact,
>it is a little bit larger than [-sqrt(2),+sqrt(2)].
>
>My interpretation is that this is due to the construction of the
>radius in data.ellipse:
>
> dfn<-2
> radius <- sqrt ( dfn * qf(level, dfn, dfd ))
>
>I would expect a dfn<-1 here (such that the radius would correspond to
>the t-distribution).
>
>Does this make sense?

This is a data ellipse, not a confidence ellipse, but the same point arises in both cases: For the ellipse to enclose 68 percent of the joint distribution of the two variables, its projections on the axes must include more than 68% of each marginal distribution. Just think about projecting the individual points onto the axes -- there are points outside of the ellipse that are inside its shadow on an individual axis.

I hope that this helps,
John

____________________________
John Fox
Department of Sociology
McMaster University
email: jfox@mcmaster.ca
web: http://www.socsci.mcmaster.ca/jfox
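How much more than 68% of each marginal distribution the shadow covers can be estimated in a few lines. This is a rough sketch in base R under stated assumptions: independent standard normal variables, and the large-sample chi-square quantile used in place of the F quantile from data.ellipse. The variable names are illustrative only.

```r
level <- 0.68
r <- sqrt(qchisq(level, df = 2))     # radius of the 68% circle (large-sample limit)

# Probability that a point's x-coordinate falls inside the shadow [-r, r]
shadow_cov <- 2 * pnorm(r) - 1

# Points inside the x-shadow but outside the circle itself
# (valid because the circle lies entirely within the strip |x| <= r)
outside_but_in_shadow <- shadow_cov - level

print(c(radius = r, shadow = shadow_cov, gap = outside_but_in_shadow))
```

Under these assumptions the shadow covers roughly 87% of each marginal distribution, so close to 19% of the points lie outside the circle while still projecting into its shadow on a given axis.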
Volker Franz
2003-Feb-27 00:09 UTC
[Rd] [Package car/data.ellipse]: confidence intervals off by factor sqrt(2)??? (PR#2584)
Hi John and Deepayan,

OK, I got your points and agree. You are right --- and I am sorry for being too fast in sending this report.

Thank you for the help!!!

Volker