Leeds, Mark (IED)
2006-Dec-05 20:42 UTC
[R] stat question - not R question so ignore if not interested
If I make a scatterplot of data (x and y) and there are two clouds of points, one cloud is in the bottom-left corner of the plot and the other is in the upper right. If I fit a regression line to this data (or, equivalently, calculate a correlation), then obviously it is going to seem like x and y are related, because the line has to connect the two clouds. But there must be a regression assumption that is violated here, because if the regressions are done separately on each cloud, then there really isn't a relationship between x and y. I was just wondering 1) what assumption in regression is being violated in the first case, or 2) possibly whether the regression is valid and the results just have some different interpretation? Thanks. Mark
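The situation described in this question can be reproduced with simulated data. The following is a minimal sketch, assuming two independent Gaussian clouds; the cloud centres, sample sizes, seed, and variable names are illustrative assumptions, not taken from the post.

```r
# Simulated stand-in for the two clouds; every number here is an assumption.
set.seed(1)
n <- 300
cloud <- rep(c("lower", "upper"), each = n)
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))
y <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))  # independent of x within each cloud

# Pooled correlation: driven by the gap between the two cloud centres.
r_pooled <- cor(x, y)

# Correlation computed separately inside each cloud.
r_within <- sapply(split(seq_along(x), cloud),
                   function(i) cor(x[i], y[i]))

print(round(c(pooled = r_pooled, r_within), 2))
```

Under these assumptions the pooled correlation should come out large while both within-cloud correlations stay near zero, which is the pattern the question describes.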
Richard M. Heiberger
2006-Dec-05 21:34 UTC
[R] stat question - not R question so ignore if not interested
The missing piece is why there are two clusters. There is most likely a two-level factor distinguishing the groups that was not included in the model. It might not even have been measured and now you need to find it. Rich
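One way to see the effect of the omitted two-level factor is to fit the model with and without it. This is only a sketch on simulated data: the factor `g`, its level names, and the cloud centres are hypothetical, and in real data the grouping variable would first have to be identified or measured.

```r
# Simulated data with a known two-level factor g (hypothetical).
set.seed(1)
n <- 300
d <- data.frame(
  g = factor(rep(c("lower", "upper"), each = n)),
  x = c(rnorm(n, mean = 0), rnorm(n, mean = 5)),
  y = c(rnorm(n, mean = 0), rnorm(n, mean = 5))
)

# Model that omits the factor: the slope on x picks up the group difference.
fit_pooled <- lm(y ~ x, data = d)

# Model that includes the factor: the shift between clouds goes to g.
fit_group <- lm(y ~ x + g, data = d)

print(coef(fit_pooled))
print(coef(fit_group))
```

In the second fit the slope on `x` should shrink towards zero while the coefficient on `g` absorbs the shift between the clouds.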
Jonathan Baron
2006-Dec-05 21:44 UTC
[R] stat question - not R question so ignore if not interested
A classic example used by my colleague Paul Rozin (when he teaches Psych 1) is to compute the correlation between height and number of shoes owned, in the class. Shorter students own more shoes. But ...

On 12/05/06 16:34, Richard M. Heiberger wrote:
> The missing piece is why there are two clusters. There is
> most likely a two-level factor distinguishing the groups
> that was not included in the model. It might not even have
> been measured and now you need to find it.
>
> Rich

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)
Michael Kubovy
2006-Dec-05 22:21 UTC
[R] stat question - not R question so ignore if not interested
On Dec 5, 2006, at 3:42 PM, Leeds, Mark (IED) wrote:
> If I make a scatterplot of data (x and y) and there are two clouds of
> points, one cloud is in the bottom-left corner of the plot and the
> other is in the upper right.
>
> If I fit a regression line to this data (or, equivalently, calculate a
> correlation), then obviously it is going to seem like x and y are
> related, because the line has to connect the two clouds. But there
> must be a regression assumption that is violated here, because if the
> regressions are done separately on each cloud, then there really isn't
> a relationship between x and y. I was just wondering 1) what
> assumption in regression is being violated in the first case, or
> 2) possibly whether the regression is valid and the results just have
> some different interpretation?

One needs only to look at the diagnostic plots. Suppose

    set.seed(2)
    # two clouds: one centred at (0, 0), the other at (5, 5)
    xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                     x = c(rnorm(300), rnorm(300, 5)))
    # show the four standard lm diagnostic plots on one page
    op <- par(mfrow = c(2, 2))
    plot(lm(y ~ x, xy))
    par(op)

The model does not fit well because the residuals aren't flat as a function of the fitted values and because homoscedasticity is violated. When this happens we might try a different approach:

    library(sm)  # nonparametric smoothing package from CRAN
    xy.sm <- sm.regression(xy$x, xy$y)

Whenever there's a big discrepancy between an OLS fit and a nonparametric smooth such as this one, we should not pursue the OLS fit without reinterpretation, which others have discussed in their replies.

_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS: P.O.Box 400400 Charlottesville, VA 22904-4400
Parcels: Room 102 Gilmer Hall, McCormick Road, Charlottesville, VA 22903
Office: B011 +1-434-982-4729
Lab: B019 +1-434-982-4751
Fax: +1-434-982-4766
WWW: http://www.people.virginia.edu/~mk9y/
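The claim that the residuals are not flat as a function of the fitted values can also be checked numerically. The sketch below reuses the simulated data from this reply; averaging residuals within quartile bins of the fitted values is an assumed, crude stand-in for the residuals-vs-fitted panel, not something proposed in the thread.

```r
# Same simulated data as in the reply above.
set.seed(2)
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))
fit <- lm(y ~ x, data = xy)

# Split the fitted values into four equal-count bins (quartiles).
breaks <- quantile(fitted(fit), probs = seq(0, 1, by = 0.25))
bins <- cut(fitted(fit), breaks = breaks, include.lowest = TRUE)

# Mean residual per bin; a well-specified linear model gives values near 0.
bin_means <- tapply(resid(fit), bins, mean)
print(round(bin_means, 2))
```

If the straight line were adequate, all four bin means would be close to zero. Here they should alternate in sign with clearly non-zero magnitude, reflecting the step between the two clouds that a single line cannot follow.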