Jo,
It looks like farm is your level of replication, so you don't need to
specify farm as a random factor. A generalized linear model ('glm') with
binomial errors (a.k.a. logistic regression) is enough. You only need to
specify different
error strata if, say, you had sampled each farm several times. Is that what
you mean by 'sampling cluster'?
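
As a rough sketch of what I mean, assuming one row per farm: the data
below are simulated purely for illustration, and the column names are
borrowed from your example, so substitute your own data frame.

```r
# Hypothetical data: one row per farm (simulated, not real results).
set.seed(1)
farms <- data.frame(
  farmstatus   = rbinom(40, 1, 0.4),  # 1 = positive, 0 = negative
  cattlenumber = factor(sample(c("A", "B", "C", "D"), 40, replace = TRUE))
)

# Logistic regression: a GLM with binomial errors and the default logit link.
model <- glm(farmstatus ~ cattlenumber, family = binomial, data = farms)
summary(model)
```

No random term is involved here, because each farm contributes a single
observation.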
BUT, there is very likely some spatial dependence among farms, so you will
also need to model this.
If you want to constrain the analysis, check out 'subset'.
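
Something along these lines should work. The data frame and the 'month'
column are made up for illustration, and droplevels() needs a reasonably
recent version of R, so treat this as a sketch only.

```r
# Hypothetical data: one row per farm (simulated for illustration).
set.seed(2)
farms <- data.frame(
  farmstatus   = rbinom(60, 1, 0.4),
  cattlenumber = factor(sample(c("A", "B", "C", "D"), 60, replace = TRUE)),
  month        = sample(month.name, 60, replace = TRUE)
)

# Keep classes A to C only; droplevels() removes the now-empty level D.
abc <- droplevels(subset(farms, cattlenumber %in% c("A", "B", "C")))

# Or keep only the months May to September.
summer <- subset(farms, month %in% month.name[5:9])

# glm() also accepts a 'subset' argument, which avoids making a copy.
model_abc <- glm(farmstatus ~ cattlenumber, family = binomial,
                 data = farms, subset = cattlenumber != "D")
```

Either route gives a model that only sees levels A, B and C.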
Missing values: you have to remove farms with missing values from the
analysis. Look up 'na.omit'.
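
For example (made-up numbers, and 'herdsize' is a hypothetical column
name), dropping incomplete farms before fitting might look like this:

```r
# Hypothetical data with one missing response and one missing predictor.
farms <- data.frame(
  farmstatus = c(1, 0, 0, 1, NA, 0, 1, 0),
  herdsize   = c(120, 45, NA, 50, 80, 160, 150, 30)
)

# na.omit() drops every row that contains at least one NA.
complete <- na.omit(farms)
nrow(complete)

# Fit on the complete cases only.
model <- glm(farmstatus ~ herdsize, family = binomial, data = complete)
```

With the default setting of options("na.action"), which is na.omit, glm
drops incomplete rows itself, but doing it explicitly makes it obvious
how many farms you have lost.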
I think perhaps you need to consult a statistician at the Edinburgh stats
department to get info on the appropriate analyses, as the R-help list is
usually restricted to R-specific questions.
There is a massive amount of literature on agricultural epidemiology (esp.
following foot & mouth), so read up to see what has been done before.
Cheers,
Dan Bebber
Department of Plant Sciences
University of Oxford
South Parks Road
Oxford OX1 3RB
>
> Message: 4
> Date: Mon, 28 Mar 2005 12:06:25 +0100
> From: JEB Halliday <s0454869 at sms.ed.ac.uk>
> Subject: [R] glmmPQL questions
> To: r-help at stat.math.ethz.ch
> Message-ID: <1112007985.4247e531657c5 at sms.ed.ac.uk>
> Content-Type: text/plain; charset=ISO-8859-15
>
>
> I am looking at risk factors for disease in cattle and am interested in
> modelling farm and sampling cluster as random effects (my outcome is
> positive or negative at the level of the farm). I am using R version
> 2.0.1 on a Mac and have identified glmmPQL as hopefully the correct
> function to use. I have run a couple of models using this but was hoping
> that you might be able to answer a few questions.
>
> e.g. model <- glmmPQL(farmstatus ~ cattlenumber, random = ~ 1 | farm, family = binomial)
>
> I am pretty new to both R and stats, so if these questions are very
> simple and I am just missing something, suggestions about good texts on
> GLMM in R would be great.
>
> First up, what is the best way to constrain the model to look only at
> certain levels of a multi-level factor? For example, I have a
> categorisation of cattle number where all points of high influence
>
> (as determined using: summary(influence.measures(model)) )
>
> are confined to the largest class (D), and I want to run a model which
> looks only at levels A, B and C (or only at months May-September).
>
> I was also wondering about the best way to force specified variables to
> remain in the model when running e.g. stepwise selection of interaction
> terms?
>
> Finally, is there a recognised method for dealing with missing values in
> these models?
> And as a minor point, the models do not run unless I specify the data=
> part of the syntax. As this is apparently an optional piece of
> information, I was wondering why it is required when all of my variables
> are in the same data frame (and even when this data frame is attached)?
>
> Any help would be greatly appreciated
>
> Jo Halliday
> MSc student
> University of Edinburgh
> s0454869 at sms.ed.ac.uk
>