Dear all,

I have one dependent variable y and two independent variables, x1 and x2, which I would like to use to explain y. x1 and x2 are design factors in an experiment and are not correlated with each other. For example, assume that:

x1 <- rbind(1,1,1,2,2,2,3,3,3)
x2 <- rbind(1,2,3,1,2,3,1,2,3)
cor(x1, x2)   # zero: the design is balanced

The problem is that I want to analyze not only the effects of x1 and x2 on y but also the effect of their interaction x1*x2. Evidently this interaction term has a substantial correlation with both x1 and x2:

x3 <- x1*x2
cor(x1, x3)
cor(x2, x3)

I therefore expect that a simple regression of y on x1, x2 and x1*x2 will lead to biased results due to multicollinearity. For example, even when y is completely random and unrelated to x1 and x2, I obtain a substantial R2 for a simple linear model that includes all three variables. This evidently does not make sense:

y <- rnorm(9)
model <- lm(y ~ x1 + x2 + x1*x2)   # x1*x2 already expands to x1 + x2 + x1:x2
summary(model)

Is there some function within R, or in some separate library, that allows me to estimate such a regression without obtaining inconsistent results?

Thanks for your help in advance,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
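P.S. One remedy I have come across is centering x1 and x2 before forming the product. If I understand correctly, in a balanced design like this one that removes the correlation between the interaction and the main effects entirely, though I am not sure it resolves my underlying concern:

x1c <- x1 - mean(x1)     # center each predictor at its mean
x2c <- x2 - mean(x2)
cor(x1c, x1c * x2c)      # 0 in this balanced 3 x 3 design
cor(x2c, x1c * x2c)      # 0 as well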
Are x1 and x2 factors (dummy variables)? cor() does not make sense in that case.

Nikhil Kaza
Asst. Professor, City and Regional Planning
University of North Carolina
nikhil.list at gmail.com
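P.S. If they really are categorical design factors, one option (a sketch, reusing the y from your example) is to fit them as factors, so the interaction is a set of contrasts rather than a product of numeric codes:

model <- lm(y ~ factor(x1) * factor(x2))
summary(model)   # note: with one observation per cell this model is
                 # saturated (zero residual df), so you would need
                 # replicates to actually test the interaction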
I think you are attributing to "collinearity" a problem that is due to your small sample size. You are predicting 9 points with 3 predictor terms, and incorrectly concluding that there is some "inconsistency" because you get an R^2 that is above some number you deem surprising. (I got values between 0.2 and 0.4 on several runs.) Try:

x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1*x2   # not needed below: x1*x2 in the formula already gives the product term
y <- rnorm(100)
model <- lm(y ~ x1 + x2 + x1*x2)
summary(model)
# Multiple R-squared: 0.04269

-- 
David Winsemius, MD
West Hartford, CT
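P.S. The R^2 you should expect from pure noise here follows directly from the dimensions: with p = 3 predictor terms and n = 9 observations, the expected R^2 under the null is p/(n-1) = 3/8 = 0.375. A quick simulation sketch, reusing the 3 x 3 design from the original post:

x1 <- rep(1:3, each = 3)    # 1,1,1,2,2,2,3,3,3
x2 <- rep(1:3, times = 3)   # 1,2,3,1,2,3,1,2,3
r2 <- replicate(1000, summary(lm(rnorm(9) ~ x1 * x2))$r.squared)
mean(r2)                    # close to 0.375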