Hi guys, I'm very new to R and have been teaching myself over the past few months - it's a great tool and I'm hoping to use it to analyse my PhD data.As I'm a bit of a newb, I'd really appreciate any feedback and/or guidance with regards to the following questions that relate to generalized linearmixed modelling (or, at least, I think they do!)(if there is a 'better', more appropriate way that I could attempt to answer my questions, please let me know). I've spent a lot of time researching this approach on the internet, but can'tseem to find any directly applicable examples. Thanks in advance, and, if you need any further information, please let me know. # My experiment:# I have 1 site on 3 different rivers (independent)(sites 1,2 and 3). # I visit each site 2 times (time 1 and 2). # On each visit, I take 5x replicate insect samples and calculate total abundance for each replicate.# Site 1 is in an area called "yellow" and sites 2 and 3 are in an area called "blue". # My data frame: data=data.frame(site=c(rep(1,10),rep(2,10),rep(3,10)),replicate=c(rep(1:5,6)),time=c(rep(1,5),rep(2,5),rep(1,5),rep(2,5),rep(1,5),rep(2,5)),abundance=c(1,2,1,2,1,2,1,2,1,2,30,32,30,32,30,32,30,32,30,32,30,31,33,32,31,31,33,32,31,32),sitetype=c(rep("yellow",10),rep("blue",20))) data$site=factor(data$site)data$replicate=factor(data$replicate)data$time=factor(data$time) data # Initial remarks: # As each replicate (1-5) was taken from within each site (1-3) on both sampling times (1-2),# I figure that 'replicate' should be treated as nested within 'site' and that both should be treated as random factors? # First question: Is there is difference in abundance between sites?# Second question: Is there is difference in abundance between sitetypes (blue or yellow)? #If my 'initial remarks' statement is correct (please tell me if not), then I think a generalized linear mixed model is appropriate and would be something along these lines: # Fitting the model: require(lme4) glmm1=glmer(abundance~time+sitetype+(1|site/replicate),family="poisson",data=data) #I chose to use poisson as abundance is count data... not sure if that's a good reason... summary(glmm1) #Output: ################################################################Generalized linear mixed model fit by the Laplace approximation Formula: abundance ~ time + sitetype + (1 | site/replicate) Data: data AIC BIC logLik deviance 12.31 19.31 -1.153 2.306Random effects: Groups Name Variance Std.Dev. replicate:site (Intercept) 0 0 site (Intercept) 0 0 Number of obs: 30, groups: replicate:site, 15; site, 3 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.43579 0.05641 60.91 <2e-16 ***time2 0.01560 0.07900 0.20 0.843 sitetypeyellow -3.03815 0.26127 -11.63 <2e-16 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Correlation of Fixed Effects: (Intr) time2 time2 -0.706 sitetypyllw -0.108 0.000################################################################ # Inferences: #I'm unsure how to assess the variance and std dev scores for site... some guidance here would be appreciated....i.e. how do I answer my original question: Is there is difference in abundance between sites? #There is no statistically significant difference between the two time periods (P=>0.05) #Using the above output, the model suggests that there is a statistically significant difference between site types (p<0.05) # Further questions: #1 Are the above inferences correct? #2 I have read about overdispersion.... how would I test for this in this example? #3 How could I build an interaction term into the model and answer the following: "Is there a statistically significant site*time interaction?" #4 Finally, are there any obvious steps or things I should be doing in order to get a 'robust' or 'correct' answer from this problem? i.e. further tests... alternative models and comparisons... [[alternative HTML version deleted]]
You are more likely to get a helpful (or any) response on mixed models issues by posting to the r-sig-mixed-models list, not here. -- Bert On Thu, Dec 6, 2012 at 11:12 AM, Ben Gillespie <nebstah@hotmail.com> wrote:> Hi guys, > I'm very new to R and have been teaching myself over the past few months - > it's a great tool and I'm hoping to use it to analyse my PhD data.As I'm a > bit of a newb, I'd really appreciate any feedback and/or guidance with > regards to the following questions that relate to generalized linearmixed > modelling (or, at least, I think they do!)(if there is a 'better', more > appropriate way that I could attempt to answer my questions, please let me > know). > I've spent a lot of time researching this approach on the internet, but > can'tseem to find any directly applicable examples. > Thanks in advance, and, if you need any further information, please let me > know. > # My experiment:# I have 1 site on 3 different rivers (independent)(sites > 1,2 and 3). # I visit each site 2 times (time 1 and 2). # On each visit, I > take 5x replicate insect samples and calculate total abundance for each > replicate.# Site 1 is in an area called "yellow" and sites 2 and 3 are in > an area called "blue". > # My data frame: > > > data=data.frame(site=c(rep(1,10),rep(2,10),rep(3,10)),replicate=c(rep(1:5,6)),time=c(rep(1,5),rep(2,5),rep(1,5),rep(2,5),rep(1,5),rep(2,5)),abundance=c(1,2,1,2,1,2,1,2,1,2,30,32,30,32,30,32,30,32,30,32,30,31,33,32,31,31,33,32,31,32),sitetype=c(rep("yellow",10),rep("blue",20))) > > data$site=factor(data$site)data$replicate=factor(data$replicate)data$time=factor(data$time) > data > > # Initial remarks: # As each replicate (1-5) was taken from within each > site (1-3) on both sampling times (1-2),# I figure that 'replicate' should > be treated as nested within 'site' and that both should be treated as > random factors? > # First question: Is there is difference in abundance between sites?# > Second question: Is there is difference in abundance between sitetypes > (blue or yellow)? > #If my 'initial remarks' statement is correct (please tell me if > not), then I think a generalized linear mixed model is appropriate and > would be something along these lines: > # Fitting the model: > require(lme4) > glmm1=glmer(abundance~time+sitetype+(1|site/replicate),family="poisson",data=data) > #I chose to use poisson as abundance is count data... not sure if > that's a good reason... summary(glmm1) > #Output: > ################################################################Generalized > linear mixed model fit by the Laplace approximation Formula: abundance ~ > time + sitetype + (1 | site/replicate) Data: data AIC BIC logLik > deviance 12.31 19.31 -1.153 2.306Random effects: Groups Name > Variance Std.Dev. replicate:site (Intercept) 0 0 site > (Intercept) 0 0 Number of obs: 30, groups: > replicate:site, 15; site, 3 > Fixed effects: Estimate Std. Error z value Pr(>|z|) > (Intercept) 3.43579 0.05641 60.91 <2e-16 ***time2 > 0.01560 0.07900 0.20 0.843 sitetypeyellow -3.03815 0.26127 > -11.63 <2e-16 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ > 0.1 ‘ ’ 1 > Correlation of Fixed Effects: (Intr) time2 time2 -0.706 > sitetypyllw -0.108 > 0.000################################################################ > # Inferences: > #I'm unsure how to assess the variance and std dev scores > for site... some guidance here would be appreciated....i.e. how do I answer > my original question: Is there is difference in abundance between sites? > #There is no statistically significant difference between the two > time periods (P=>0.05) #Using the above output, the model > suggests that there is a statistically significant difference between site > types (p<0.05) > # Further questions: > #1 Are the above inferences correct? #2 I have > read about overdispersion.... how would I test for this in this example? > #3 How could I build an interaction term into the model and > answer the following: "Is there a statistically significant site*time > interaction?" #4 Finally, are there any obvious steps or things I > should be doing in order to get a 'robust' or 'correct' answer from this > problem? i.e. further tests... alternative models and comparisons... > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm [[alternative HTML version deleted]]
Thanks Bert - I was unaware of that list. I have since posted my questions there (in plain text). ________________________________> Date: Thu, 6 Dec 2012 12:50:05 -0800 > Subject: Re: [R] lme4 glmer general help wanted - code included > From: gunter.berton at gene.com > To: nebstah at hotmail.com > > Incidentally, you are explicitly asked NOT to post in HTML. > > -- Bert > > On Thu, Dec 6, 2012 at 12:26 PM, Bert Gunter > <bgunter at gene.com<mailto:bgunter at gene.com>> wrote: > You are more likely to get a helpful (or any) response on mixed models > issues by posting to the r-sig-mixed-models list, not here. > > -- Bert > > On Thu, Dec 6, 2012 at 11:12 AM, Ben Gillespie > <nebstah at hotmail.com<mailto:nebstah at hotmail.com>> wrote: > Hi guys, > I'm very new to R and have been teaching myself over the past few > months - it's a great tool and I'm hoping to use it to analyse my PhD > data.As I'm a bit of a newb, I'd really appreciate any feedback and/or > guidance with regards to the following questions that relate to > generalized linearmixed modelling (or, at least, I think they do!)(if > there is a 'better', more appropriate way that I could attempt to > answer my questions, please let me know). > I've spent a lot of time researching this approach on the internet, but > can'tseem to find any directly applicable examples. > Thanks in advance, and, if you need any further information, please let > me know. > # My experiment:# I have 1 site on 3 different rivers > (independent)(sites 1,2 and 3). # I visit each site 2 times (time 1 and > 2). # On each visit, I take 5x replicate insect samples and calculate > total abundance for each replicate.# Site 1 is in an area called > "yellow" and sites 2 and 3 are in an area called "blue". > # My data frame: > > data=data.frame(site=c(rep(1,10),rep(2,10),rep(3,10)),replicate=c(rep(1:5,6)),time=c(rep(1,5),rep(2,5),rep(1,5),rep(2,5),rep(1,5),rep(2,5)),abundance=c(1,2,1,2,1,2,1,2,1,2,30,32,30,32,30,32,30,32,30,32,30,31,33,32,31,31,33,32,31,32),sitetype=c(rep("yellow",10),rep("blue",20))) > data$site=factor(data$site)data$replicate=factor(data$replicate)data$time=factor(data$time) > data > > # Initial remarks: # As each replicate (1-5) was taken from within each > site (1-3) on both sampling times (1-2),# I figure that 'replicate' > should be treated as nested within 'site' and that both should be > treated as random factors? > # First question: Is there is difference in abundance between sites?# > Second question: Is there is difference in abundance between sitetypes > (blue or yellow)? > #If my 'initial remarks' statement is correct (please tell me > if not), then I think a generalized linear mixed model is appropriate > and would be something along these lines: > # Fitting the model: > require(lme4) > glmm1=glmer(abundance~time+sitetype+(1|site/replicate),family="poisson",data=data) > #I chose to use poisson as abundance is count data... not sure if > that's a good reason... summary(glmm1) > #Output: > ################################################################Generalized > linear mixed model fit by the Laplace approximation Formula: abundance > ~ time + sitetype + (1 | site/replicate) Data: data AIC BIC > logLik deviance 12.31 19.31 -1.153 2.306Random effects: Groups > Name Variance Std.Dev. replicate:site (Intercept) 0 0 > site (Intercept) 0 0 Number of obs: 30, > groups: replicate:site, 15; site, 3 > Fixed effects: Estimate Std. Error z value Pr(>|z|) > (Intercept) 3.43579 0.05641 60.91 <2e-16 ***time2 > 0.01560 0.07900 0.20 0.843 sitetypeyellow -3.03815 > 0.26127 -11.63 <2e-16 ***---Signif. codes: 0 ?***? 0.001 ?**? 0.01 > ?*? 0.05 ?.? 0.1 ? ? 1 > Correlation of Fixed Effects: (Intr) time2 time2 > -0.706 sitetypyllw -0.108 > 0.000################################################################ > # Inferences: > #I'm unsure how to assess the variance and std dev > scores for site... some guidance here would be appreciated....i.e. how > do I answer my original question: Is there is difference in abundance > between sites? #There is no statistically significant > difference between the two time periods (P=>0.05) #Using > the above output, the model suggests that there is a statistically > significant difference between site types (p<0.05) > # Further questions: > #1 Are the above inferences correct? #2 I > have read about overdispersion.... how would I test for this in this > example? #3 How could I build an interaction term into the > model and answer the following: "Is there a statistically significant > site*time interaction?" #4 Finally, are there any obvious steps > or things I should be doing in order to get a 'robust' or 'correct' > answer from this problem? i.e. further tests... alternative models and > comparisons... > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org<mailto:R-help at r-project.org> mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > > > > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > >
Seemingly Similar Threads
- randomly select another observation with same grouping factor and year value, do for every record in dataframe
- F-Tests in generalized linear mixed models (GLMM)
- predict function problem for glmmPQL
- Some questions to GLMM
- Independent variable dependent on offset in GLMM