thr3ads.net - R help - [R] Getting vastly different results when running GLMs [Aug 2011]

If this information is useful, please help other people find it:
Share via:

Luke Duncan

2011-Aug-17 15:43 UTC

[R] Getting vastly different results when running GLMs

Dear R gurus

I am analysing data from a study of behaviour and shade utilization of
chimpanzees. I am using GLMs in R (version 2.13.0) to test whether shade/sun
utilization is predicted by behaviour observed. I am thus interested in
whether an interaction of behaviour (as a predictor) and presence in the
sun/shade (also predictor) predicts the counts I have for the respective
categories.

I have my data organised as such:

 behaviour location specific total  Travel Sun 131 303  Travel Shade 172 303
Foraging Sun 248 651  Foraging Shade 403 651  Vigilance Sun 97 224
Vigilance Shade 127 224  Rest Sun 502 1143  Rest Shade 641 1143  Abnormal
Sun 33 58  Abnormal Shade 25 58  Play Sun 58 173  Play Shade 115 173
SelfGrooming Sun 183 595  SelfGrooming Shade 412 595  SocialGrooming Sun 59
358  SocialGrooming Shade 299 358  Other Sun 4 39  Other Shade 35 39  Hidden
Sun 120 656  Hidden Shade 536 656

I have coded the response variable as a specific count of the times
individuals were in the sun or shade, for each behaviour, out of a total
number of times a specific behaviour was observed (regardless of location
[sun/shade]). These are represented by the columns 'specific' and
'total'
respectively. I had originally coded these values as a proportion variable,
but had similar mismatch problems between R and Statistica (as described
below). The GLM I am running is a binomial one (as my count response
variables are divided dichotomously by the sun/shade predictor variable)
with a logit link function. My problem is this: I originally ran the data
through another stats program (Statistica) and got significant effects for
all first- and second-order effects. When I examined the raw data, the
patterns seen in the raw data suggested that these outcomes (of the GLM)
conformed to the raw data (i.e. confirmed the GLM results). I then ran the *
same* data through R using the following code:
> behdata<-read.csv("behaviourshade.csv",header=TRUE)
> behdata #Just to check that everything is there and working...
> behav<-behdata$behaviour
> loc<-behdata$location
> prop<-behdata$proportion
> spec<-behdata$specific
> total<-behdata$total
> model<-glm((cbind(spec,total))~behav*loc,family=binomial,data=behdata)
> summary(model)Call:
glm(formula = (cbind(spec, total)) ~ behav * loc, family = binomial,
    data = behdata)
Deviance Residuals:
 [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -0.8416     0.2393  -3.517 0.000436 ***
behavForagingfeeding          0.3620     0.2475   1.463 0.143585
behavHidden                   0.6395     0.2462   2.597 0.009396 **
behavOther                    0.7334     0.3338   2.197 0.028044 *
behavPlay                     0.4332     0.2678   1.618 0.105739
behavRest                     0.2632     0.2443   1.077 0.281320
behavSelfGrooming             0.4740     0.2477   1.914 0.055644 .
behavSocialGrooming           0.6615     0.2518   2.627 0.008602 **
behavTravel                   0.2753     0.2576   1.069 0.285142
behavVigilance                0.2741     0.2638   1.039 0.298732
locSun                        0.2776     0.3237   0.858 0.391077
behavForagingfeeding:locSun  -0.7631     0.3382  -2.257 0.024036 *
behavHidden:locSun           -1.7743     0.3436  -5.164 2.41e-07 ***
behavOther:locSun            -2.4467     0.6593  -3.711 0.000206 ***
behavPlay:locSun             -0.9621     0.3772  -2.551 0.010752 *
behavRest:locSun             -0.5221     0.3318  -1.573 0.115615
behavSelfGrooming:locSun     -1.0892     0.3406  -3.197 0.001387 **
behavSocialGrooming:locSun   -1.9005     0.3615  -5.258 1.46e-07 ***
behavTravel:locSun           -0.5499     0.3533  -1.556 0.119597
behavVigilance:locSun        -0.5471     0.3632  -1.506 0.131952
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance:  4.5553e+02  on 19  degrees of freedom
Residual deviance: -1.3700e-13  on  0  degrees of freedom
  (19 observations deleted due to missingness)
AIC: 165.12
Number of Fisher Scoring iterations: 3
When I ran it through R I got VERY different results. The R GLM suggested
that behaviour and the behaviour*location interactions were significant
predictors of the counts but the location (sun/shade) was not. In addition,
within each factor, where differences lay were very different between tests.
While I understand that it is entirely possible to have significant 2nd
order interactions when 1st order effects may not be significant, these
patterns described seem to defy apparently obvious patterns in the raw data.

To further complicate things, I was reading through Crawley's R book where
he describes the relationships between orthogonal and non-orthogonal studies
and the order of factor entry; how order can complicate GLM type analyses
for non-orthogonal studies. My data are orthogonal, yet the order in which I
enter variables into the model influences which factors are significant and
which are not. Why is this the case? Am I doing something wrong here with my
data structure or coding? For example, if I switch the order from
'behav*loc' to 'loc*behav' I get yet another set of results that
match
neither the first R GLM results, nor the original results from the other
program.

I have checked the model for overdispersion and found that it is not
overdispersed. What am I doing wrong here? How can the same dataset be
generating such vastly different outcomes? I suspect that it may lie in the
way in which the model is fitted (R does iteratively reweighted least
squares whereas, Statistica may use something entirely different; what
exactly, I have no clue...) but I am none the wiser in this regard, so I
really don't know.

Regards, in despiration...

Luke

*PhD Candidate*
*School of Animal, Plant and Environmental Sciences*
*University of the Witwatersrand*
*Johannesburg, South Africa*

	[[alternative HTML version deleted]]

Mark Difford

2011-Aug-17 20:23 UTC

head link

[R] Getting vastly different results when running GLMs

On Aug 17, 2011; 5:43pm Luke Duncan wrote:

Hi Luke,

The differences you are seeing are almost certainly due to different
contrast codings: Statistica probably uses sum-to-zero contrasts whereas R
uses treatment (Dunnett) contrasts by default. You would be well advised to
consult a local statistician for a deeper understanding.

For some immediate insight do the following:

## Fits your model with different contrasts + a few other things.
##
library(car)
?contrast
?contr.treatment
model1 <- glm((cbind(spec,total)) ~ behav * loc, family=binomial,
data=behdata, contrasts=list(behav="contr.treatment",
loc="contr.treatment"))
model2 <- glm((cbind(spec,total)) ~ behav * loc, family=binomial,
data=behdata, contrasts=list(behav="contr.sum",
loc="contr.sum"))

summary(model1)
summary(model2)
anova(model1, model2)      ## see: models seem different but are identical

## Type I SS
anova(model1)
anova(model2)

## Type II SS
library(car)
Anova(model1, type="II")
Anova(model2, type="II")

Regards, Mark.

-----
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa
--
View this message in context:
http://r.789695.n4.nabble.com/Getting-vastly-different-results-when-running-GLMs-tp3750496p3751115.html
Sent from the R help mailing list archive at Nabble.com.

Michael Dewey

2011-Aug-18 15:58 UTC

head link

[R] Getting vastly different results when running GLMs

At 16:43 17/08/2011, Luke Duncan wrote:>Dear R gurus
Response in line below
>I am analysing data from a study of behaviour and shade utilization of
>chimpanzees. I am using GLMs in R (version 2.13.0) to test whether shade/sun
>utilization is predicted by behaviour observed. I am thus interested in
>whether an interaction of behaviour (as a predictor) and presence in the
>sun/shade (also predictor) predicts the counts I have for the respective
>categories.
>
>I have my data organised as such:
>
>  behaviour location specific total  Travel Sun 131 303  Travel Shade 172
303
>Foraging Sun 248 651  Foraging Shade 403 651  Vigilance Sun 97 224
>Vigilance Shade 127 224  Rest Sun 502 1143  Rest Shade 641 1143  Abnormal
>Sun 33 58  Abnormal Shade 25 58  Play Sun 58 173  Play Shade 115 173
>SelfGrooming Sun 183 595  SelfGrooming Shade 412 595  SocialGrooming Sun 59
>358  SocialGrooming Shade 299 358  Other Sun 4 39  Other Shade 35 39  Hidden
>Sun 120 656  Hidden Shade 536 656
>
>I have coded the response variable as a specific count of the times
>individuals were in the sun or shade, for each behaviour, out of a total
>number of times a specific behaviour was observed (regardless of location
>[sun/shade]). These are represented by the columns 'specific' and
'total'
>respectively. I had originally coded these values as a proportion variable,
>but had similar mismatch problems between R and Statistica (as described
>below). The GLM I am running is a binomial one (as my count response
>variables are divided dichotomously by the sun/shade predictor variable)
>with a logit link function. My problem is this: I originally ran the data
>through another stats program (Statistica) and got significant effects for
>all first- and second-order effects. When I examined the raw data, the
>patterns seen in the raw data suggested that these outcomes (of the GLM)
>conformed to the raw data (i.e. confirmed the GLM results). I then ran the *
>same* data through R using the following code:
>
> > behdata<-read.csv("behaviourshade.csv",header=TRUE)
> > behdata #Just to check that everything is there and working...
> > behav<-behdata$behaviour
> > loc<-behdata$location
> > prop<-behdata$proportion
> > spec<-behdata$specific
> > total<-behdata$total
> >
model<-glm((cbind(spec,total))~behav*loc,family=binomial,data=behdata)
If you have extracted your variables from the data.frame you do not 
need the data> > summary(model)
>Call:
>glm(formula = (cbind(spec, total)) ~ behav * loc, family = binomial,
>     data = behdata)
>Deviance Residuals:
>  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>Coefficients:
>                             Estimate Std. Error z value Pr(>|z|)
>(Intercept)                  -0.8416     0.2393  -3.517 0.000436 ***
>behavForagingfeeding          0.3620     0.2475   1.463 0.143585
>behavHidden                   0.6395     0.2462   2.597 0.009396 **
>behavOther                    0.7334     0.3338   2.197 0.028044 *
>behavPlay                     0.4332     0.2678   1.618 0.105739
>behavRest                     0.2632     0.2443   1.077 0.281320
>behavSelfGrooming             0.4740     0.2477   1.914 0.055644 .
>behavSocialGrooming           0.6615     0.2518   2.627 0.008602 **
>behavTravel                   0.2753     0.2576   1.069 0.285142
>behavVigilance                0.2741     0.2638   1.039 0.298732
>locSun                        0.2776     0.3237   0.858 0.391077
>behavForagingfeeding:locSun  -0.7631     0.3382  -2.257 0.024036 *
>behavHidden:locSun           -1.7743     0.3436  -5.164 2.41e-07 ***
>behavOther:locSun            -2.4467     0.6593  -3.711 0.000206 ***
>behavPlay:locSun             -0.9621     0.3772  -2.551 0.010752 *
>behavRest:locSun             -0.5221     0.3318  -1.573 0.115615
>behavSelfGrooming:locSun     -1.0892     0.3406  -3.197 0.001387 **
>behavSocialGrooming:locSun   -1.9005     0.3615  -5.258 1.46e-07 ***
>behavTravel:locSun           -0.5499     0.3533  -1.556 0.119597
>behavVigilance:locSun        -0.5471     0.3632  -1.506 0.131952
>---
>Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
>(Dispersion parameter for binomial family taken to be 1)
>     Null deviance:  4.5553e+02  on 19  degrees of freedom
>Residual deviance: -1.3700e-13  on  0  degrees of freedom
>   (19 observations deleted due to missingness)
Why did it delete 19 observations?
>AIC: 165.12
>Number of Fisher Scoring iterations: 3
>When I ran it through R I got VERY different results. The R GLM suggested
>that behaviour and the behaviour*location interactions were significant
>predictors of the counts but the location (sun/shade) was not.
In general if you have an interaction you need to be cautious about 
making statement about the underlying main effects. You have found 
that the effect of sun differs for different behaviours so making an 
overall statement about sun may be problematic.
>  In addition,
>within each factor, where differences lay were very different between tests.
>While I understand that it is entirely possible to have significant 2nd
>order interactions when 1st order effects may not be significant, these
>patterns described seem to defy apparently obvious patterns in the raw data.
>
>To further complicate things, I was reading through Crawley's R book
where
>he describes the relationships between orthogonal and non-orthogonal studies
>and the order of factor entry; how order can complicate GLM type analyses
>for non-orthogonal studies. My data are orthogonal, yet the order in which I
>enter variables into the model influences which factors are significant and
>which are not. Why is this the case? Am I doing something wrong here with my
>data structure or coding? For example, if I switch the order from
>'behav*loc' to 'loc*behav' I get yet another set of results
that match
>neither the first R GLM results, nor the original results from the other
>program.
>
>I have checked the model for overdispersion and found that it is not
>overdispersed. What am I doing wrong here? How can the same dataset be
>generating such vastly different outcomes? I suspect that it may lie in the
>way in which the model is fitted (R does iteratively reweighted least
>squares whereas, Statistica may use something entirely different; what
>exactly, I have no clue...) but I am none the wiser in this regard, so I
>really don't know.
>
>Regards, in despiration...
>
>Luke
>
>*PhD Candidate*
>*School of Animal, Plant and Environmental Sciences*
>*University of the Witwatersrand*
>*Johannesburg, South Africa*
>
>         [[alternative HTML version deleted]]
Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

Maybe Matching Threads

Search for more maybe matching threads

R help - Aug 2011 - Getting vastly different results when running GLMs

[R] Getting vastly different results when running GLMs

[R] Getting vastly different results when running GLMs

[R] Getting vastly different results when running GLMs

Maybe Matching Threads