This is really more of a statistics problem: I have a dataset with two dummy variables coding a three-level factor. The problem is that one level has very few observations compared with the other two (a couple of data points versus 1000+ points on each of the other levels). When I run the regression, the results are bad: the standard errors and VIFs are unbalanced. Is this also a kind of "near singularity" problem? Does it make any difference if I code the level that lacks data as (0,0) instead of (0,1)?

Thanks a lot!

--
View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18214377.html
Sent from the R help mailing list archive at Nabble.com.
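To see why the choice of (0,0) level can matter for the VIFs, here is a minimal sketch in Python (standard library only) rather than R. The group sizes 1000, 1000 and 2 are hypothetical, chosen to mimic the question, and the helper names are illustrative. With an intercept plus exactly two dummy regressors, each dummy's VIF is 1/(1 - r^2), where r is the correlation between the two dummies.

```python
# Hypothetical example: two large levels (A, B) and one sparse level (C).
levels = ["A"] * 1000 + ["B"] * 1000 + ["C"] * 2


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def treatment_dummies(levels, reference):
    """0/1 indicator for each non-reference level; the reference is coded (0,0)."""
    others = sorted(set(levels) - {reference})
    return [[1.0 if lv == o else 0.0 for lv in levels] for o in others]


def vif_for_reference(levels, reference):
    """VIF of each dummy in a model with an intercept and exactly two dummies.

    With only two regressors besides the intercept, R_j^2 is the squared
    correlation between them, so both dummies share VIF = 1 / (1 - r^2).
    """
    d1, d2 = treatment_dummies(levels, reference)
    r = pearson(d1, d2)
    return 1.0 / (1.0 - r * r)


for ref in ["C", "A"]:
    print(f"reference level {ref}: VIF = {vif_for_reference(levels, ref):.1f}")
```

Under these assumptions, coding the sparse level as (0,0) gives VIFs of about 250, because the two remaining dummies add up to the intercept column on all but two rows; coding a large level as (0,0) gives VIFs of about 1.0. The reference choice is only a reparameterization: the fitted values are the same, and the standard error of any particular contrast, such as the sparse level versus a large one, does not change, because only a couple of observations inform it either way.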
Do you have a reason to treat all 3 levels together and not have a separate regression for each level?

--- On Tue, 1/7/08, rlearner309 <unixunix99 at gmail.com> wrote:
> From: rlearner309 <unixunix99 at gmail.com>
> Subject: [R] A regression problem using dummy variables
> To: r-help at r-project.org
> Received: Tuesday, 1 July, 2008, 11:38 PM
Yes, because the slopes are supposed to be the same; only the level shifts need to be modeled.

Moshe Olshansky-2 wrote:
> Do you have a reason to treat all 3 levels together and not have a
> separate regression for each level?
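The model described here (one common slope, with the dummies only shifting the intercept) can be sketched as follows. This is a minimal illustration in Python with only the standard library, not R; the data, the group sizes, and the helper names `design`, `ols` and `solve` are hypothetical. It builds treatment-coded dummies for a three-level factor and fits y = b0 + b1*x + gB*dB + gC*dC by least squares on noise-free data, so the recovered coefficients can be checked against the values used to generate it. The normal equations are adequate for a toy example like this but are less numerically stable than the QR decomposition that R's lm() uses.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def design(x, levels, reference):
    """Columns: intercept, shared slope on x, then one 0/1 dummy per non-reference level."""
    others = sorted(set(levels) - {reference})
    X = [[1.0, xi] + [1.0 if lv == o else 0.0 for o in others]
         for xi, lv in zip(x, levels)]
    return X, others


# Hypothetical noise-free data: common slope 0.5, level shifts B = +3, C = -1 relative to A.
shift = {"A": 0.0, "B": 3.0, "C": -1.0}
levels = ["A"] * 50 + ["B"] * 50 + ["C"] * 2
x = [float(i % 10) for i in range(len(levels))]
y = [2.0 + 0.5 * xi + shift[lv] for xi, lv in zip(x, levels)]

X, names = design(x, levels, reference="A")
beta = ols(X, y)
for name, b in zip(["intercept", "slope"] + names, beta):
    print(f"{name:>9}: {b:.4f}")
```

Because every row shares the one slope column, the slope is estimated from the pooled data, while the sparse level C contributes only its two rows to its own shift gC. Switching the reference level (for example to C) changes the intercept and the shift coefficients but, in this sketch, leaves the slope and the fitted values unchanged.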
I think the covariance between dummy variables, or between a dummy variable and the intercept, should always be zero. Doesn't that mean there is no singularity problem?
On Wed, 2 Jul 2008, rlearner309 wrote:
> I think the covariance between dummy variables or between dummy variables and
> intercept should always be zero. meaning: no sigularity problem??

No. You can easily check that this is not true using the cov() function. Indicator variables for mutually exclusive groups are negatively correlated.

-thomas

Thomas Lumley                  Assoc. Professor, Biostatistics
tlumley at u.washington.edu    University of Washington, Seattle
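The point about mutually exclusive indicators can be checked numerically. The sketch below is in Python rather than R, with a hand-rolled `sample_cov` (a hypothetical helper, not a library function) that uses the same n - 1 denominator as R's cov(); the group sizes are hypothetical. Because the groups are mutually exclusive, d_g * d_h is 0 on every row, so the sample covariance reduces to -n/(n-1) * p_g * p_h, where p_g is the proportion of rows in group g.

```python
def sample_cov(u, v):
    """Sample covariance with denominator n - 1 (the same convention as R's cov())."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


# Hypothetical group sizes: two large levels and one sparse level.
levels = ["A"] * 1000 + ["B"] * 1000 + ["C"] * 2
d = {g: [1.0 if lv == g else 0.0 for lv in levels] for g in "ABC"}

# At most one indicator is 1 on any row, so every product d_g * d_h is 0 and the
# covariance reduces to -n/(n-1) * p_g * p_h, which is negative.
for g, h in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(f"cov(d{g}, d{h}) = {sample_cov(d[g], d[h]):.6f}")

# The intercept is a constant column, so its covariance with any dummy is exactly 0;
# that alone does not rule out near-collinearity among the dummies and the intercept.
const = [1.0] * len(levels)
print(f"cov(1, dA) = {sample_cov(const, d['A']):.6f}")
```

The covariance involving the sparse dummy is small (about -0.0005 here) but not zero, and the two large-level dummies are strongly negatively correlated (r = -1000/1002 with these counts).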
I think it is zero, because there are lots of zeros there. It is not like continuous variables.