thr3ads.net - R help - [R] How to intepret a factor response model? [May 2005]

Maciej Bliziński

2005-May-04 07:23 UTC

[R] How to intepret a factor response model?

Hello,

I'd like to create a model with a factor-type response variable. This is
an example:
> mydata <- data.frame(factor_var = as.factor(c(rep('one', 100),
+     rep('two', 100), rep('three', 100))), real_var = c(rnorm(150),
+     rnorm(150) + 5))
> summary(mydata)
 factor_var     real_var        
 one  :100   Min.   :-2.742877  
 three:100   1st Qu.:-0.009493  
 two  :100   Median : 2.361669  
             Mean   : 2.490411  
             3rd Qu.: 4.822394  
             Max.   : 6.924588  
> mymodel = glm(factor_var ~ real_var, family = 'binomial', data = mydata)
> summary(mymodel)
Call:
glm(formula = factor_var ~ real_var, family = "binomial", data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7442  -0.6774   0.1849   0.3133   2.1187  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.6798     0.1882  -3.613 0.000303 ***
real_var      0.8971     0.1066   8.417  < 2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 381.91  on 299  degrees of freedom
Residual deviance: 213.31  on 298  degrees of freedom
AIC: 217.31

Number of Fisher Scoring iterations: 6

---------------------------------------------------------------------

For models with a real-valued response variable it's easy to work out
the model's equation for the response. But here, how do I interpret
the model?

-- 
God made the world in six days, and was arrested on the seventh.

(Ted Harding)

2005-May-04 08:21 UTC

On 04-May-05 Maciej Bliziński wrote:
> Hello,
> 
> I'd like to create a model with a factor-type response variable.
> This is an example:
> 
>> mydata <- data.frame(factor_var = as.factor(c(rep('one', 100),
>> rep('two', 100), rep('three', 100))), real_var = c(rnorm(150),
>> rnorm(150) + 5))
>> summary(mydata)
>  factor_var     real_var        
>  one  :100   Min.   :-2.742877  
>  three:100   1st Qu.:-0.009493  
>  two  :100   Median : 2.361669  
>              Mean   : 2.490411  
>              3rd Qu.: 4.822394  
>              Max.   : 6.924588  
>> mymodel = glm(factor_var ~ real_var, family = 'binomial', data =
>> mydata)
>> summary(mymodel)
> 
> Call:
> glm(formula = factor_var ~ real_var, family = "binomial", data =
> mydata)
> 
> Deviance Residuals: 
>     Min       1Q   Median       3Q      Max  
> -1.7442  -0.6774   0.1849   0.3133   2.1187  
> 
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)  -0.6798     0.1882  -3.613 0.000303 ***
> real_var      0.8971     0.1066   8.417  < 2e-16 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 381.91  on 299  degrees of freedom
> Residual deviance: 213.31  on 298  degrees of freedom
> AIC: 217.31
> 
> Number of Fisher Scoring iterations: 6

Have you noticed that you get identical results with

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
   rep('two',100), rep('three', 100))),
   real.var = c(rnorm(150), rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)

and

set.seed(214354)
mydata <- data.frame(factor.var = as.factor(c(rep('one', 100),
   rep('two',200))),real.var = c(rnorm(150),rnorm(150) + 5))

mymodel <- glm(factor.var ~ real.var, family='binomial', data=mydata)
summary(mymodel)

(I've left out the "summary(mydata)" since these do naturally
differ, and I've replaced "factor_var" with "factor.var" and
"real_var" with "real.var" because of potential complications
with "_"; also "mymodel =" to "mymodel <-").

So I think the interpretation of the results from your first
model is that, because of the "family='binomial'", glm is
treating "factor.var='one'" as binomial response "0", say,
and "factor.var='two'" or "factor.var='three'" as binomial
response "1".
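
A minimal sketch of a check on this reading, assuming the same simulated
data as above: recode the response by hand as 0 for 'one' (the first
factor level) and 1 otherwise, then compare the two fits. The column name
not.one and the object names fit.factor and fit.binary are introduced here
for illustration only.

```r
set.seed(214354)
mydata <- data.frame(
  factor.var = as.factor(c(rep("one", 100), rep("two", 100), rep("three", 100))),
  real.var   = c(rnorm(150), rnorm(150) + 5)
)

# Fit with the three-level factor as the response, as in the original post.
fit.factor <- glm(factor.var ~ real.var, family = binomial, data = mydata)

# Hand-made 0/1 response: 0 for the first level ("one"), 1 for any other level.
first.level <- levels(mydata$factor.var)[1]
mydata$not.one <- as.integer(mydata$factor.var != first.level)
fit.binary <- glm(not.one ~ real.var, family = binomial, data = mydata)

# Compare the estimated coefficients of the two fits.
all.equal(coef(fit.factor), coef(fit.binary))
```

If the reading is right, the comparison should report that the two
coefficient vectors agree.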

You're trying to fit a multinomial response, but you've
specified a binomial family to 'glm'. 'glm' does not have
a multinomial response family.

You could try 'multinom' from package 'nnet' which fits
loglinear models to factor responses with more than 2 levels.

E.g.

  library(nnet)
  mymodel <- multinom(factor.var ~ real.var,data=mydata)
   ### weights:  9 (4 variable)
   ##  initial  value 329.583687 
   ##  iter  10 value 209.780666
   ##  final  value 209.779951 
   ##  converged
  summary(mymodel)
   ## Re-fitting to get Hessian
   ## Call:
   ## multinom(formula = factor.var ~ real.var, data = mydata)
   ##  Coefficients:
   ##        (Intercept)  real.var
   ##  three  -3.4262565 1.3838231
   ##  two    -0.6754253 0.7116955
   ##
   ## Std. Errors:
   ##   (Intercept)  real.var
   ## three   0.5028541 0.1480138
   ## two     0.1846827 0.1068821
   ##
   ## Residual Deviance: 419.5599 
   ## AIC: 427.5599 
   ##
   ## Correlation of Coefficients:
   ##             three:(Intercept) three:real.var two:(Intercept)
   ## three:real.var  -0.7286258                                      
   ## two:(Intercept)  0.1986995        -0.1261034                    
   ## two:real.var    -0.1411377         0.7012481     -0.3285741

This output does suggest a fairly clear interpretation!
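
One way to spell that interpretation out, as a sketch under the assumption
that 'one' (the first factor level) is the baseline: each coefficient row
gives the log-odds of that level against 'one', so the class probabilities
are a softmax of the linear predictors. The point x0 = 2.5 and the names
eta, p.manual and p.pred are arbitrary illustrations, not part of the
original post.

```r
library(nnet)

set.seed(214354)
mydata <- data.frame(
  factor.var = as.factor(c(rep("one", 100), rep("two", 100), rep("three", 100))),
  real.var   = c(rnorm(150), rnorm(150) + 5)
)
fit <- multinom(factor.var ~ real.var, data = mydata, trace = FALSE)

b  <- coef(fit)  # one row per non-baseline level: "three" and "two"
x0 <- 2.5        # illustrative value of real.var

# Linear predictors: the baseline level "one" is fixed at 0.
eta <- c(one = 0, b[, "(Intercept)"] + b[, "real.var"] * x0)

# Softmax turns the linear predictors into class probabilities.
p.manual <- exp(eta) / sum(exp(eta))

# The same probabilities as computed by predict().
p.pred <- predict(fit, newdata = data.frame(real.var = x0), type = "probs")

rbind(manual = p.manual, predict = p.pred)
```

Each row of the printed matrix should sum to 1, and the two rows are
expected to agree.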

Hoping this helps,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-May-05                                       Time: 09:18:03
------------------------------ XFMail ------------------------------

Prof Brian Ripley

2005-May-04 08:37 UTC

On Wed, 4 May 2005, Maciej Bliziński wrote:
> I'd like to create a model with a factor-type response variable. This is
> an example:

What you have done here is to fit a logistic regression.  The 
interpretation of that is covered in many good books: for example there 
are plots of the predicted values in MASS4.

I do wonder if that is what you intended, though.  You have fitted a model 
of 'two or three' vs 'one'.  You may have intended a multinomial logistic
model: again MASS4 has details of such models.
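
To make the fitted equation concrete, here is a small sketch that uses the
coefficients printed in the summary from the original post (intercept
-0.6798, slope 0.8971). The function name p.not.one is illustrative only;
with the fitted object itself, predict(mymodel, type = "response") should
give the same quantity for the observed data.

```r
# Fitted model on the logit scale:
#   log(p / (1 - p)) = -0.6798 + 0.8971 * real_var
# where p is the probability that factor_var is NOT 'one'.
p.not.one <- function(real_var) {
  plogis(-0.6798 + 0.8971 * real_var)  # inverse logit of the linear predictor
}

# Predicted probability of "two or three" at a few values of real_var.
p.not.one(c(-2, 0, 2.5, 5))
```

Because the slope is positive, the predicted probability of 'two or three'
rises with real_var.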
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Peter Flom

2005-May-04 12:08 UTC

>>> Maciej Bliziński <m.blizinski at wsisiz.edu.pl> 5/4/2005 6:02:14 AM >>>
<<<
I'm trying to analyze a survey. Most of the variables are of factor
type, with values for example {"no_at_all", "a_little", "mostly",
"a_lot"}.
>>>
In that case, you probably want to look at ordinal logistic regression.  This is
covered in numerous texts, one good one which uses R is Harrell's Regression
Modeling Strategies (an excellent book in other regards, as well).

Another book which might be useful (although not R specific) is Long's
Regression Models for Categorical and Limited Dependent Variables.
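
As a rough sketch of what an ordinal logistic fit can look like in R, the
example below uses polr from package MASS; this is one possible choice and
is not named in the post above. The data are simulated and the names
survey, answer and score are hypothetical. The answer levels are treated
as ordered categories, so no numeric scores such as {1, 2, 3, 4} or
{1, 2, 4, 8} have to be chosen.

```r
library(MASS)

set.seed(1)
n      <- 300
score  <- rnorm(n)                  # hypothetical numeric predictor
latent <- 1.5 * score + rlogis(n)   # latent scale with logistic noise

# Cut the latent scale into four ordered answer categories.
answer <- cut(latent,
              breaks = c(-Inf, -1, 0.5, 2, Inf),
              labels = c("no_at_all", "a_little", "mostly", "a_lot"),
              ordered_result = TRUE)
survey <- data.frame(answer = answer, score = score)

# Proportional-odds (ordinal logistic) model; Hess = TRUE stores the
# Hessian so that summary() can report standard errors.
fit <- polr(answer ~ score, data = survey, Hess = TRUE)
summary(fit)

# Predicted probability of each answer category at a few predictor values.
probs <- predict(fit, newdata = data.frame(score = c(-1, 0, 1)), type = "probs")
probs
```

The single slope describes how the odds of giving a higher answer change
with the predictor, while the intercepts mark the cut points between
adjacent categories.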

<<<
I thought about mapping those answers to numbers, but I didn't know what
numbers should I assign them to: {1, 2, 3, 4} (linear) or maybe
{1, 2, 4, 8} (exponential)? So I rather tried to analyze the original
factor survey data.

Multinomial factor response wasn't covered in the lectures in my school
so I'm trying to use my intuition and trial/error technique (please
forgive me :-) ).
>>>
Using your intuition and trial and error seems to me to be a way to guarantee
lots of trials and lots of errors, but not necessarily to guarantee success. 
You might want to consult a statistician before proceeding; you certainly want
to consult a text.

HTH

Peter


Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)
