Without going deeply into your analysis, 2 comments:
1) Use the anova command to test two nested models using:
anova(model1, model2, test="Chisq")
2) glm's are non-trivial models (at least to me), be sure to google for
some tutorials in order to understand what you are looking at...
Cheers,
Tal
----------------Contact
Details:-------------------------------------------------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------
On Fri, May 4, 2012 at 6:38 PM, lincoln <miseno77@hotmail.com> wrote:
> Hi,
>
> I have a data set with 999 observations, for each of them I have data on
> four variables:
> site, colony, gender (quite a few NA values), and cohort.
>
> This is how the data set looks like:
> > str(dispersal)
> 'data.frame': 999 obs. of 4 variables:
> $ site : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1
2 2 ...
> $ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1
2 2 NA ...
> $ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2
2 2 2 2 2 ...
> $ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8
8 8 8 8 8 8 6 ...
>
> Now, I want to estimate if sites 1 and site 2 differ on some of the other
> variables. For instance there are relatively more males in site 1 with
> respect to site 2, more individuals of the main colony in site 2 with
> respect to site 1 and this sort of things.
>
> I thought I might do a binomial GLM considering as response variable the
> site, I tried to run the more general model to have a look to
> overdispersion
> but I believe there is something wrong even before worrying about
> overdispersion. I know (I did a chisq.test) that cohort2004 is very
> diversly
> represented between the two sites but it is not reflected in the results of
> GLM. Here there are the results of chisq.test and the GLM:
>
> 1)
> />
>
>
age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1)))
> > dimnames(age_cohort)<-list(site=c("M","D"),
> +
> cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996))
> > age_cohort
> cohorts
> site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
> M 142 95 46 33 14 59 18 12 7 1 0
> D 258 144 54 70 20 11 6 8 2 3 1
> > (Xsqagec <- chisq.test(age_cohort)) # Prints test summary
>
> Pearson's Chi-squared test
>
> data: age_cohort
> X-squared = 82.6016, df = 10, p-value = 1.549e-13
>
> Mensajes de aviso perdidos
> In chisq.test(age_cohort) : Chi-squared approximation may be incorrect
> > Xsqagec$observed # observed counts
> cohorts
> site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
> M 142 95 46 33 14 59 18 12 7 1 0
> D 258 144 54 70 20 11 6 8 2 3 1
> > Xsqagec$expected # expected counts under the null
> cohorts
> site 2010 2009 2008 2007 2006 2004 2003
> M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717
> D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283
> cohorts
> site 2002 2001 2000 1996
> M 8.505976 3.827689 1.701195 0.4252988
> D 11.494024 5.172311 2.298805 0.5747012
> > Xsqagec$residuals # Pearson residuals
> cohorts
> site 2010 2009 2008 2007 2006
> M -2.1559111 -0.6592367 0.5321050 -1.6326395 -0.1210101
> D 1.8546283 0.5671101 -0.4577448 1.4044825 0.1040993
> cohorts
> site 2004 2003 2002 2001 2000
> M 5.3569686 2.4391720 1.1980192 1.6214643 -0.5376032
> D -4.6083465 -2.0983042 -1.0305993 -1.3948690 0.4624746
> cohorts
> site 1996
> M -0.6521494
> D 0.5610132
> > Xsqagec$stdres # standardized residuals
> cohorts
> site 2010 2009 2008 2007 2006
> M -3.6665549 -0.9962228 0.7397057 -2.2733888 -0.1623984
> D 3.6665549 0.9962228 -0.7397057 2.2733888 0.1623984
> cohorts
> site 2004 2003 2002 2001 2000
> M 7.3264142 3.2566808 1.5962909 2.1485311 -0.7105713
> D -7.3264142 -3.2566808 -1.5962909 -2.1485311 0.7105713
> cohorts
> site 1996
> M -0.8606814
> D 0.8606814
> /
>
>
> 2)
> /> model1<-glm(site~gender*colony*cohort,binomial)
> > summary(model1)
>
> Call:
> glm(formula = site ~ gender * colony * cohort, family = binomial)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -1.84648 -0.96954 -0.00036 1.11269 2.03933
>
> Coefficients: (12 not defined because of singularities)
> Estimate Std. Error z value
> (Intercept) -1.657e+01 2.400e+03 -0.007
> gender1 -2.231e-01 9.220e-01 -0.242
> colonyother 9.531e-02 8.006e-01 0.119
> cohort2002 1.717e-08 3.393e+03 0.000
> cohort2003 1.766e+01 2.400e+03 0.007
> cohort2004 1.807e+01 2.400e+03 0.008
> cohort2006 1.697e+01 2.400e+03 0.007
> cohort2007 1.726e+01 2.400e+03 0.007
> cohort2008 1.606e+01 2.400e+03 0.007
> cohort2009 1.657e+01 2.400e+03 0.007
> cohort2010 1.587e+01 2.400e+03 0.007
> gender1:colonyother 9.163e-01 1.087e+00 0.843
> gender1:cohort2002 1.719e+01 2.400e+03 0.007
> gender1:cohort2003 -1.823e-01 1.713e+00 -0.106
> gender1:cohort2004 2.231e-01 1.329e+00 0.168
> gender1:cohort2006 -5.878e-01 1.586e+00 -0.371
> gender1:cohort2007 -6.454e-02 1.784e+00 -0.036
> gender1:cohort2008 8.881e-01 1.156e+00 0.768
> gender1:cohort2009 -2.817e-02 1.199e+00 -0.023
> gender1:cohort2010 NA NA NA
> colonyother:cohort2002 NA NA NA
> colonyother:cohort2003 NA NA NA
> colonyother:cohort2004 1.497e+01 1.697e+03 0.009
> colonyother:cohort2006 -1.707e+01 2.400e+03 -0.007
> colonyother:cohort2007 -7.885e-01 1.772e+00 -0.445
> colonyother:cohort2008 NA NA NA
> colonyother:cohort2009 NA NA NA
> colonyother:cohort2010 NA NA NA
> gender1:colonyother:cohort2002 NA NA NA
> gender1:colonyother:cohort2003 NA NA NA
> gender1:colonyother:cohort2004 -9.163e-01 2.400e+03 0.000
> gender1:colonyother:cohort2006 NA NA NA
> gender1:colonyother:cohort2007 -2.575e+00 2.379e+00 -1.082
> gender1:colonyother:cohort2008 NA NA NA
> gender1:colonyother:cohort2009 NA NA NA
> gender1:colonyother:cohort2010 NA NA NA
> Pr(>|z|)
> (Intercept) 0.994
> gender1 0.809
> colonyother 0.905
> cohort2002 1.000
> cohort2003 0.994
> cohort2004 0.994
> cohort2006 0.994
> cohort2007 0.994
> cohort2008 0.995
> cohort2009 0.994
> cohort2010 0.995
> gender1:colonyother 0.399
> gender1:cohort2002 0.994
> gender1:cohort2003 0.915
> gender1:cohort2004 0.867
> gender1:cohort2006 0.711
> gender1:cohort2007 0.971
> gender1:cohort2008 0.442
> gender1:cohort2009 0.981
> gender1:cohort2010 NA
> colonyother:cohort2002 NA
> colonyother:cohort2003 NA
> colonyother:cohort2004 0.993
> colonyother:cohort2006 0.994
> colonyother:cohort2007 0.656
> colonyother:cohort2008 NA
> colonyother:cohort2009 NA
> colonyother:cohort2010 NA
> gender1:colonyother:cohort2002 NA
> gender1:colonyother:cohort2003 NA
> gender1:colonyother:cohort2004 1.000
> gender1:colonyother:cohort2006 NA
> gender1:colonyother:cohort2007 0.279
> gender1:colonyother:cohort2008 NA
> gender1:colonyother:cohort2009 NA
> gender1:colonyother:cohort2010 NA
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 311.91 on 224 degrees of freedom
> Residual deviance: 271.61 on 201 degrees of freedom
> (774 observations deleted due to missingness)
> AIC: 319.61
>
> Number of Fisher Scoring iterations: 15
> /
>
> I thought that perhaps keeping the gender as explanatory variable was
> reducing a lot the sample size and it was the matter (I removed it):
>
>
>
> /> model1<-glm(site~colony*cohort,binomial)
> > summary(model1)
>
> Call:
> glm(formula = site ~ colony * cohort, family = binomial)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -1.8683 -0.9712 -0.8757 1.3468 1.7470
>
> Coefficients: (6 not defined because of singularities)
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -15.5661 1455.3976 -0.011 0.991
> colonyother 0.2544 0.2167 1.174 0.240
> cohort2000 14.4675 1455.3981 0.010 0.992
> cohort2001 16.8188 1455.3978 0.012 0.991
> cohort2002 15.9715 1455.3977 0.011 0.991
> cohort2003 16.6647 1455.3977 0.011 0.991
> cohort2004 17.1194 1455.3976 0.012 0.991
> cohort2006 15.2607 1455.3976 0.010 0.992
> cohort2007 14.9470 1455.3976 0.010 0.992
> cohort2008 15.3837 1455.3976 0.011 0.992
> cohort2009 15.1762 1455.3976 0.010 0.992
> cohort2010 14.8053 1455.3976 0.010 0.992
> colonyother:cohort2000 NA NA NA NA
> colonyother:cohort2001 NA NA NA NA
> colonyother:cohort2002 NA NA NA NA
> colonyother:cohort2003 NA NA NA NA
> colonyother:cohort2004 13.7583 550.0887 0.025 0.980
> colonyother:cohort2006 -15.5151 1455.3976 -0.011 0.991
> colonyother:cohort2007 -0.9163 0.5979 -1.533 0.125
> colonyother:cohort2008 NA NA NA NA
> colonyother:cohort2009 -0.6912 0.5214 -1.326 0.185
> colonyother:cohort2010 NA NA NA NA
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 1361.4 on 998 degrees of freedom
> Residual deviance: 1267.9 on 983 degrees of freedom
> AIC: 1299.9
>
> Number of Fisher Scoring iterations: 14
> /
>
> Any comment/suggestion on this?
> Thanks for any help
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]