Dear R Wizards,
After much frustration and days of confusion I have finally broken down and
am asking for help, which I don’t like doing, but I just can’t figure this
one out on my own. I’ve conducted a laboratory experiment testing the
effects of temperature and salinity on whether or not a biological event
will occur (Go or NoGo). I’ve coded the factors temperature and salinity
as factors for the binomial glm, and I haven’t had any trouble fitting the
model and checking assumptions.
I am however having trouble with the predict.glm function. I want to
create a graph using my data that is similar to the one produced by the
budworm example at the bottom of the predict.glm R documentation. In my
case I want temperature on the x axis, probability on the y axis, and the
lines on the graph to represent the probability of the event occurring at
the different salinities tested at the different temperatures.
I created a smaller version of my data and have included it and the R code
I used below. I get two main problems that I hope you’re willing to help
with.
1. When I input the text argument the output in the graph gives salinity
values that are superimposed on one another. And, the values don’t seem to
make sense – they are returned as probabilities of either 1.0 or 0.
2. When I input the lines argument I get the following error messages:
Error: variable 'fTemp' was fitted with type "factor" but type
"numeric"
was supplied
In addition: Warning message:
In model.frame.default(Terms, newdata, na.action = na.action, xlev
object$xlevels) : variable 'fTemp' is not a factor
grrrrrrrrrr
Pleasehelp<-read.table("Rhelp.txt",h=T)
attach(Pleasehelp)
fix(Pleasehelp)
Temp Sal Go Total
5 34 1 1
5 34 1 1
5 34 1 1
5 21 1 1
5 21 1 1
5 21 0 1
10 34 1 1
10 34 0 1
10 34 1 1
10 21 1 1
10 21 0 1
10 21 0 1
15 34 0 1
15 34 0 1
15 34 0 1
15 21 0 1
15 21 0 1
15 21 0 1
fTemp<-factor(Temp)
fSal<-factor(Sal)
Go<-Go
NoGo<-Total-Go
Went<-cbind(Go,NoGo)
DF<-data.frame(fTemp,fSal,Went,Total)
glm<-glm(Went~fTemp+fSal+fTemp*fSal,family="binomial",data=DF)
require(graphics)
plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability
of going")
text(Temp,Go/Total,as.character(Sal))
ld<-(seq(5,15,1))
lines(ld,predict(glm,data.frame(fTemp=ld,fSal=factor(rep("34",length(ld)),levels=levels(fSal))),type="response"))
Thank you very much for your time and expertise!
Kindly,
Chad
[[alternative HTML version deleted]]
ONKELINX, Thierry
2012-Dec-12 16:17 UTC
[R] help with predict.glm, and charting with factors
Dear Chad,
Did you post your entire dataset? If so:
1) your model is too complex for the amount of data you have. See the quotes
below...
2) There is complete separation, leading to large parameter estimates and fits
very close to 0 and 1 (in terms of probabilities)
3) You fit temperature as a factor, thus removing all qualitative aspects and
ordering in that variable. Though this makes sense given your experiment, it
removes the possibility to make predictions for other temperatures.
4) Learn not to use attach(). It will bite you if you don't use it with
extreme care.
5) Try to get advise from a local statistician.
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than
asking him to perform a post-mortem examination: he may be able to say what the
experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure
that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
Namens Chad Widmer
Verzonden: woensdag 12 december 2012 13:50
Aan: r-help at r-project.org
Onderwerp: [R] help with predict.glm, and charting with factors
Dear R Wizards,
After much frustration and days of confusion I have finally broken down and am
asking for help, which I don't like doing, but I just can't figure this
one out on my own. I've conducted a laboratory experiment testing the
effects of temperature and salinity on whether or not a biological event will
occur (Go or NoGo). I've coded the factors temperature and salinity as
factors for the binomial glm, and I haven't had any trouble fitting the
model and checking assumptions.
I am however having trouble with the predict.glm function. I want to create a
graph using my data that is similar to the one produced by the budworm example
at the bottom of the predict.glm R documentation. In my case I want temperature
on the x axis, probability on the y axis, and the lines on the graph to
represent the probability of the event occurring at the different salinities
tested at the different temperatures.
I created a smaller version of my data and have included it and the R code I
used below. I get two main problems that I hope you're willing to help
with.
1. When I input the text argument the output in the graph gives salinity values
that are superimposed on one another. And, the values don't seem to make
sense - they are returned as probabilities of either 1.0 or 0.
2. When I input the lines argument I get the following error messages:
Error: variable 'fTemp' was fitted with type "factor" but type
"numeric"
was supplied
In addition: Warning message:
In model.frame.default(Terms, newdata, na.action = na.action, xlev
object$xlevels) : variable 'fTemp' is not a factor
grrrrrrrrrr
Pleasehelp<-read.table("Rhelp.txt",h=T)
attach(Pleasehelp)
fix(Pleasehelp)
Temp Sal Go Total
5 34 1 1
5 34 1 1
5 34 1 1
5 21 1 1
5 21 1 1
5 21 0 1
10 34 1 1
10 34 0 1
10 34 1 1
10 21 1 1
10 21 0 1
10 21 0 1
15 34 0 1
15 34 0 1
15 34 0 1
15 21 0 1
15 21 0 1
15 21 0 1
fTemp<-factor(Temp)
fSal<-factor(Sal)
Go<-Go
NoGo<-Total-Go
Went<-cbind(Go,NoGo)
DF<-data.frame(fTemp,fSal,Went,Total)
glm<-glm(Went~fTemp+fSal+fTemp*fSal,family="binomial",data=DF)
require(graphics)
plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability
of going")
text(Temp,Go/Total,as.character(Sal))
ld<-(seq(5,15,1))
lines(ld,predict(glm,data.frame(fTemp=ld,fSal=factor(rep("34",length(ld)),levels=levels(fSal))),type="response"))
Thank you very much for your time and expertise!
Kindly,
Chad
[[alternative HTML version deleted]]
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en
binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the writer
and may not be regarded as stating an official position of INBO, as long as the
message is not confirmed by a duly signed document.
Chad-
The plot you hope to create is meaningful when the variable temperature is
treated as a continuous variable. Below is some code that treats
temperature as a continuous variable and creates the plot. Note that
temperature enters the model "linearly", and you may want to explore
other
possibilities for temperature beyond this one.
-tgs
Z <- textConnection("
Temp Sal Go Total
5 34 1 1
5 34 1 1
5 34 1 1
5 21 1 1
5 21 1 1
5 21 0 1
10 34 1 1
10 34 0 1
10 34 1 1
10 21 1 1
10 21 0 1
10 21 0 1
15 34 0 1
15 34 0 1
15 34 0 1
15 21 0 1
15 21 0 1
15 21 0 1")
ddd <- read.table(Z,header=TRUE)
close(Z)
#Fit model
glm1 <-
glm(cbind(Go,Total-Go)~Temp*factor(Sal),data=ddd,family="binomial")
#Calculate predicted values
newdata1 <- data.frame(Temp = seq(5,15,length=50), Sal = rep(34,50))
pred34 <- predict(glm1,newdata=newdata1,type="response")
newdata2 <- data.frame(Temp = seq(5,15,length=50), Sal = rep(21,50))
pred21 <- predict(glm1,newdata=newdata2, type="response")
#Plot model predicted curves
plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability
of going")
lines(newdata1$Temp,pred34,lwd=3,col="red")
lines(newdata2$Temp,pred21,lwd=3,col="blue")
#This next bit of code calculates the "observed" probabilities. I
admit,
it isn't the clearest code.
ttt <- as.data.frame(xtabs(~Temp+Sal+Go,data=ddd)[,,2] /
xtabs(~Temp+Sal+Total,data=ddd)[,,1])
ttt$X <- as.numeric(rownames(ttt))
#Plot observed probabilities
points(ttt$X,ttt$"21",cex=4,lwd=2,pch=16,col="white")
points(ttt$X,ttt$"34",cex=4,lwd=2,pch=16,col="white")
points(ttt$X,ttt$"21",cex=4,lwd=2,col="blue")
points(ttt$X,ttt$"34",cex=4,lwd=2,col="red")
text(ttt$X,ttt$"21","21",col="blue")
text(ttt$X,ttt$"34","34",col="red")
On Wed, Dec 12, 2012 at 7:50 AM, Chad Widmer <cw647@st-andrews.ac.uk>
wrote:
> Dear R Wizards,
>
> After much frustration and days of confusion I have finally broken down and
> am asking for help, which I don’t like doing, but I just can’t figure this
> one out on my own. I’ve conducted a laboratory experiment testing the
> effects of temperature and salinity on whether or not a biological event
> will occur (Go or NoGo). I’ve coded the factors temperature and salinity
> as factors for the binomial glm, and I haven’t had any trouble fitting the
> model and checking assumptions.
>
> I am however having trouble with the predict.glm function. I want to
> create a graph using my data that is similar to the one produced by the
> budworm example at the bottom of the predict.glm R documentation. In my
> case I want temperature on the x axis, probability on the y axis, and the
> lines on the graph to represent the probability of the event occurring at
> the different salinities tested at the different temperatures.
>
> I created a smaller version of my data and have included it and the R code
> I used below. I get two main problems that I hope you’re willing to help
> with.
>
> 1. When I input the text argument the output in the graph gives salinity
> values that are superimposed on one another. And, the values don’t seem to
> make sense – they are returned as probabilities of either 1.0 or 0.
>
> 2. When I input the lines argument I get the following error messages:
>
> Error: variable 'fTemp' was fitted with type "factor" but
type "numeric"
> was supplied
>
> In addition: Warning message:
>
> In model.frame.default(Terms, newdata, na.action = na.action, xlev >
object$xlevels) : variable 'fTemp' is not a factor
>
> grrrrrrrrrr
>
> Pleasehelp<-read.table("Rhelp.txt",h=T)
>
> attach(Pleasehelp)
>
> fix(Pleasehelp)
>
> Temp Sal Go Total
>
> 5 34 1 1
>
> 5 34 1 1
>
> 5 34 1 1
>
> 5 21 1 1
>
> 5 21 1 1
>
> 5 21 0 1
>
> 10 34 1 1
>
> 10 34 0 1
>
> 10 34 1 1
>
> 10 21 1 1
>
> 10 21 0 1
>
> 10 21 0 1
>
> 15 34 0 1
>
> 15 34 0 1
>
> 15 34 0 1
>
> 15 21 0 1
>
> 15 21 0 1
>
> 15 21 0 1
>
> fTemp<-factor(Temp)
>
> fSal<-factor(Sal)
>
> Go<-Go
>
> NoGo<-Total-Go
>
> Went<-cbind(Go,NoGo)
>
> DF<-data.frame(fTemp,fSal,Went,Total)
>
> glm<-glm(Went~fTemp+fSal+fTemp*fSal,family="binomial",data=DF)
>
> require(graphics)
>
>
plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability
of
> going")
>
> text(Temp,Go/Total,as.character(Sal))
>
> ld<-(seq(5,15,1))
>
>
>
lines(ld,predict(glm,data.frame(fTemp=ld,fSal=factor(rep("34",length(ld)),levels=levels(fSal))),type="response"))
>
>
>
> Thank you very much for your time and expertise!
>
> Kindly,
>
> Chad
>
> [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
[[alternative HTML version deleted]]