thr3ads.net - R help - [R] A regression problem using dummy variables [Jul 2008]

rlearner309

2008-Jul-01 13:38 UTC

[R] A regression problem using dummy variables

This is really more of a statistics question:
I have a dataset with two dummy variables coding three levels.  The
problem is that one level has very few observations compared with the other
two (a couple of data points compared with 1000+ points on the other
levels).  When I run the regression, the result is bad: the standard errors
and VIFs are unbalanced.  Is this also a "near singularity" problem?  Does
it make any difference if I code the level that lacks data as (0,0) instead
of (0,1)?

Thanks a lot!
-- 
View this message in context:
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18214377.html
Sent from the R help mailing list archive at Nabble.com.
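
The coding question above can be checked numerically. The sketch below is an
illustration in Python, not something posted in the thread: the group sizes
(1000, 1000, 3) and the level names A, B, C are assumptions, and the
continuous regressor in the poster's model is left out, so the VIFs in the
real model would differ somewhat. With only two dummies as regressors, each
dummy's VIF is 1 / (1 - r^2), where r is the correlation between them.

```python
# Hypothetical group sizes standing in for "1000+ points" and "a couple of points".
COUNTS = {"A": 1000, "B": 1000, "C": 3}  # C is the sparse level


def indicator(level, counts):
    """0/1 dummy column for `level`, one entry per observation."""
    column = []
    for name, n in counts.items():
        column.extend([1.0 if name == level else 0.0] * n)
    return column


def correlation(x, y):
    """Pearson correlation of two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_dummies(level_1, level_2, counts):
    """VIF shared by two dummies when they are the only regressors."""
    r = correlation(indicator(level_1, counts), indicator(level_2, counts))
    return 1.0 / (1.0 - r * r)


# Sparse level C as the reference (0,0): the dummies are for A and B.
vif_sparse_reference = vif_two_dummies("A", "B", COUNTS)
# Large level A as the reference (0,0): the dummies are for B and C.
vif_large_reference = vif_two_dummies("B", "C", COUNTS)

print(f"VIF with sparse level as reference: {vif_sparse_reference:.1f}")
print(f"VIF with large level as reference:  {vif_large_reference:.4f}")
```

Under these assumed sizes the VIF comes out near 167 when the sparse level is
the reference and close to 1 when a large level is the reference. The fitted
values are identical under either coding; what changes is what each
coefficient measures, and therefore its standard error.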

Moshe Olshansky

2008-Jul-02 00:23 UTC

Do you have a reason to treat all 3 levels together and not have a separate
regression for each level?


rlearner309

2008-Jul-02 05:01 UTC

Yes, because the slopes are supposed to be the same.
The level shifts need to be modeled.


-- 
View this message in context:
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18230346.html
Sent from the R help mailing list archive at Nabble.com.
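
A common slope with level shifts, as described in this reply, corresponds to
a model of the form y = b0 + b1*x + g1*D1 + g2*D2. The sketch below is an
illustration with made-up, noise-free data (the coefficients, group sizes and
level names are all assumptions) and a small hand-written least-squares
solver, in Python rather than R. It fits the model under two choices of
reference level.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up, noise-free data: common slope 0.5, shifts defined relative to level A.
SLOPE, BASE = 0.5, 2.0
SHIFT = {"A": 0.0, "B": 1.0, "C": -3.0}  # C is the sparse level
SIZES = (("A", 50), ("B", 50), ("C", 3))
data = [(lvl, float(x)) for lvl, n in SIZES for x in range(n)]
y = [BASE + SLOPE * x + SHIFT[lvl] for lvl, x in data]

# Coding 1: A is the reference (0,0); columns are intercept, x, D_B, D_C.
rows_a_ref = [[1.0, x, float(lvl == "B"), float(lvl == "C")] for lvl, x in data]
fit_a_ref = ols(rows_a_ref, y)

# Coding 2: sparse level C is the reference; columns are intercept, x, D_A, D_B.
rows_c_ref = [[1.0, x, float(lvl == "A"), float(lvl == "B")] for lvl, x in data]
fit_c_ref = ols(rows_c_ref, y)

print("A as reference:", [round(v, 6) for v in fit_a_ref])
print("C as reference:", [round(v, 6) for v in fit_c_ref])
```

Both fits recover the same slope and describe the same three parallel lines;
only the intercept and the shift coefficients are re-expressed relative to a
different reference level.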

rlearner309

2008-Jul-02 13:38 UTC

I think the covariance between dummy variables, or between a dummy variable
and the intercept, should always be zero.  Does that mean there is no
singularity problem?



-- 
View this message in context:
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
Sent from the R help mailing list archive at Nabble.com.

Thomas Lumley

2008-Jul-02 14:47 UTC

On Wed, 2 Jul 2008, rlearner309 wrote:
>
> I think the covariance between dummy variables or between dummy variables
> and intercept should always be zero.  meaning: no singularity problem??
>
No.  You can easily check that this is not true using the cov() function.
Indicator variables for mutually exclusive groups are negatively correlated.

     -thomas


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
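
The check suggested in this reply can be reproduced by hand. The thread
points to R's cov(); the sketch below is a Python stand-in written for
illustration, with assumed group sizes (1000, 1000, 3). Because two
indicators for mutually exclusive groups are never 1 on the same row, their
cross-product sum is zero and the sample covariance reduces to
-n1*n2 / (n*(n-1)).

```python
def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator, as R's cov() uses."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Hypothetical sizes: two large levels and one sparse level.
n1, n2, n3 = 1000, 1000, 3
n = n1 + n2 + n3

# Indicator columns for levels 1 and 2; level 3 is the (0,0) reference.
d1 = [1.0] * n1 + [0.0] * n2 + [0.0] * n3
d2 = [0.0] * n1 + [1.0] * n2 + [0.0] * n3

cov_d1_d2 = sample_cov(d1, d2)
# Closed form: d1*d2 is 0 on every row, so only the means contribute.
closed_form = -(n1 * n2) / (n * (n - 1))

# Rows where the two dummies do not add up to the intercept column of ones.
rows_off_intercept = sum(1 for a, b in zip(d1, d2) if a + b != 1.0)

print(f"sample covariance: {cov_d1_d2:.5f}")
print(f"closed form:       {closed_form:.5f}")
print(f"rows where d1 + d2 != 1: {rows_off_intercept} of {n}")
```

The covariance of a dummy with the intercept column is zero only because a
constant column has no variance; that says nothing about collinearity. Here
d1 + d2 equals the intercept column on all but 3 of the 2003 rows, which is
the near-linear dependence the original question asks about.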

rlearner309

2008-Jul-02 22:24 UTC

I think it is zero, because there are lots of zeros there.  It is not like
continuous variables.



-- 
View this message in context:
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18248187.html
Sent from the R help mailing list archive at Nabble.com.
