Dear R Wizards, After much frustration and days of confusion I have finally broken down and am asking for help, which I don’t like doing, but I just can’t figure this one out on my own. I’ve conducted a laboratory experiment testing the effects of temperature and salinity on whether or not a biological event will occur (Go or NoGo). I’ve coded the factors temperature and salinity as factors for the binomial glm, and I haven’t had any trouble fitting the model and checking assumptions. I am however having trouble with the predict.glm function. I want to create a graph using my data that is similar to the one produced by the budworm example at the bottom of the predict.glm R documentation. In my case I want temperature on the x axis, probability on the y axis, and the lines on the graph to represent the probability of the event occurring at the different salinities tested at the different temperatures. I created a smaller version of my data and have included it and the R code I used below. I get two main problems that I hope you’re willing to help with. 1. When I input the text argument the output in the graph gives salinity values that are superimposed on one another. And, the values don’t seem to make sense – they are returned as probabilities of either 1.0 or 0. 2. When I input the lines argument I get the following error messages: Error: variable 'fTemp' was fitted with type "factor" but type "numeric" was supplied In addition: Warning message: In model.frame.default(Terms, newdata, na.action = na.action, xlev object$xlevels) : variable 'fTemp' is not a factor grrrrrrrrrr Pleasehelp<-read.table("Rhelp.txt",h=T) attach(Pleasehelp) fix(Pleasehelp) Temp Sal Go Total 5 34 1 1 5 34 1 1 5 34 1 1 5 21 1 1 5 21 1 1 5 21 0 1 10 34 1 1 10 34 0 1 10 34 1 1 10 21 1 1 10 21 0 1 10 21 0 1 15 34 0 1 15 34 0 1 15 34 0 1 15 21 0 1 15 21 0 1 15 21 0 1 fTemp<-factor(Temp) fSal<-factor(Sal) Go<-Go NoGo<-Total-Go Went<-cbind(Go,NoGo) DF<-data.frame(fTemp,fSal,Went,Total) glm<-glm(Went~fTemp+fSal+fTemp*fSal,family="binomial",data=DF) require(graphics) plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability of going") text(Temp,Go/Total,as.character(Sal)) ld<-(seq(5,15,1)) lines(ld,predict(glm,data.frame(fTemp=ld,fSal=factor(rep("34",length(ld)),levels=levels(fSal))),type="response")) Thank you very much for your time and expertise! Kindly, Chad [[alternative HTML version deleted]]
ONKELINX, Thierry
2012-Dec-12 16:17 UTC
[R] help with predict.glm, and charting with factors
Dear Chad, Did you post your entire dataset? If so: 1) your model is too complex for the amount of data you have. See the quotes below... 2) There is complete separation, leading to large parameter estimates and fits very close to 0 and 1 (in terms of probabilities) 3) You fit temperature as a factor, thus removing all qualitative aspects and ordering in that variable. Though this makes sense given your experiment, it removes the possibility to make predictions for other temperatures. 4) Learn not to use attach(). It will bite you if you don't use it with extreme care. 5) Try to get advise from a local statistician. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Namens Chad Widmer Verzonden: woensdag 12 december 2012 13:50 Aan: r-help at r-project.org Onderwerp: [R] help with predict.glm, and charting with factors Dear R Wizards, After much frustration and days of confusion I have finally broken down and am asking for help, which I don't like doing, but I just can't figure this one out on my own. I've conducted a laboratory experiment testing the effects of temperature and salinity on whether or not a biological event will occur (Go or NoGo). I've coded the factors temperature and salinity as factors for the binomial glm, and I haven't had any trouble fitting the model and checking assumptions. I am however having trouble with the predict.glm function. I want to create a graph using my data that is similar to the one produced by the budworm example at the bottom of the predict.glm R documentation. In my case I want temperature on the x axis, probability on the y axis, and the lines on the graph to represent the probability of the event occurring at the different salinities tested at the different temperatures. I created a smaller version of my data and have included it and the R code I used below. I get two main problems that I hope you're willing to help with. 1. When I input the text argument the output in the graph gives salinity values that are superimposed on one another. And, the values don't seem to make sense - they are returned as probabilities of either 1.0 or 0. 2. When I input the lines argument I get the following error messages: Error: variable 'fTemp' was fitted with type "factor" but type "numeric" was supplied In addition: Warning message: In model.frame.default(Terms, newdata, na.action = na.action, xlev object$xlevels) : variable 'fTemp' is not a factor grrrrrrrrrr Pleasehelp<-read.table("Rhelp.txt",h=T) attach(Pleasehelp) fix(Pleasehelp) Temp Sal Go Total 5 34 1 1 5 34 1 1 5 34 1 1 5 21 1 1 5 21 1 1 5 21 0 1 10 34 1 1 10 34 0 1 10 34 1 1 10 21 1 1 10 21 0 1 10 21 0 1 15 34 0 1 15 34 0 1 15 34 0 1 15 21 0 1 15 21 0 1 15 21 0 1 fTemp<-factor(Temp) fSal<-factor(Sal) Go<-Go NoGo<-Total-Go Went<-cbind(Go,NoGo) DF<-data.frame(fTemp,fSal,Went,Total) glm<-glm(Went~fTemp+fSal+fTemp*fSal,family="binomial",data=DF) require(graphics) plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability of going") text(Temp,Go/Total,as.character(Sal)) ld<-(seq(5,15,1)) lines(ld,predict(glm,data.frame(fTemp=ld,fSal=factor(rep("34",length(ld)),levels=levels(fSal))),type="response")) Thank you very much for your time and expertise! Kindly, Chad [[alternative HTML version deleted]] * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * * Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Chad- The plot you hope to create is meaningful when the variable temperature is treated as a continuous variable. Below is some code that treats temperature as a continuous variable and creates the plot. Note that temperature enters the model "linearly", and you may want to explore other possibilities for temperature beyond this one. -tgs Z <- textConnection(" Temp Sal Go Total 5 34 1 1 5 34 1 1 5 34 1 1 5 21 1 1 5 21 1 1 5 21 0 1 10 34 1 1 10 34 0 1 10 34 1 1 10 21 1 1 10 21 0 1 10 21 0 1 15 34 0 1 15 34 0 1 15 34 0 1 15 21 0 1 15 21 0 1 15 21 0 1") ddd <- read.table(Z,header=TRUE) close(Z) #Fit model glm1 <- glm(cbind(Go,Total-Go)~Temp*factor(Sal),data=ddd,family="binomial") #Calculate predicted values newdata1 <- data.frame(Temp = seq(5,15,length=50), Sal = rep(34,50)) pred34 <- predict(glm1,newdata=newdata1,type="response") newdata2 <- data.frame(Temp = seq(5,15,length=50), Sal = rep(21,50)) pred21 <- predict(glm1,newdata=newdata2, type="response") #Plot model predicted curves plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability of going") lines(newdata1$Temp,pred34,lwd=3,col="red") lines(newdata2$Temp,pred21,lwd=3,col="blue") #This next bit of code calculates the "observed" probabilities. I admit, it isn't the clearest code. ttt <- as.data.frame(xtabs(~Temp+Sal+Go,data=ddd)[,,2] / xtabs(~Temp+Sal+Total,data=ddd)[,,1]) ttt$X <- as.numeric(rownames(ttt)) #Plot observed probabilities points(ttt$X,ttt$"21",cex=4,lwd=2,pch=16,col="white") points(ttt$X,ttt$"34",cex=4,lwd=2,pch=16,col="white") points(ttt$X,ttt$"21",cex=4,lwd=2,col="blue") points(ttt$X,ttt$"34",cex=4,lwd=2,col="red") text(ttt$X,ttt$"21","21",col="blue") text(ttt$X,ttt$"34","34",col="red") On Wed, Dec 12, 2012 at 7:50 AM, Chad Widmer <cw647@st-andrews.ac.uk> wrote:> Dear R Wizards, > > After much frustration and days of confusion I have finally broken down and > am asking for help, which I don’t like doing, but I just can’t figure this > one out on my own. I’ve conducted a laboratory experiment testing the > effects of temperature and salinity on whether or not a biological event > will occur (Go or NoGo). I’ve coded the factors temperature and salinity > as factors for the binomial glm, and I haven’t had any trouble fitting the > model and checking assumptions. > > I am however having trouble with the predict.glm function. I want to > create a graph using my data that is similar to the one produced by the > budworm example at the bottom of the predict.glm R documentation. In my > case I want temperature on the x axis, probability on the y axis, and the > lines on the graph to represent the probability of the event occurring at > the different salinities tested at the different temperatures. > > I created a smaller version of my data and have included it and the R code > I used below. I get two main problems that I hope you’re willing to help > with. > > 1. When I input the text argument the output in the graph gives salinity > values that are superimposed on one another. And, the values don’t seem to > make sense – they are returned as probabilities of either 1.0 or 0. > > 2. When I input the lines argument I get the following error messages: > > Error: variable 'fTemp' was fitted with type "factor" but type "numeric" > was supplied > > In addition: Warning message: > > In model.frame.default(Terms, newdata, na.action = na.action, xlev > object$xlevels) : variable 'fTemp' is not a factor > > grrrrrrrrrr > > Pleasehelp<-read.table("Rhelp.txt",h=T) > > attach(Pleasehelp) > > fix(Pleasehelp) > > Temp Sal Go Total > > 5 34 1 1 > > 5 34 1 1 > > 5 34 1 1 > > 5 21 1 1 > > 5 21 1 1 > > 5 21 0 1 > > 10 34 1 1 > > 10 34 0 1 > > 10 34 1 1 > > 10 21 1 1 > > 10 21 0 1 > > 10 21 0 1 > > 15 34 0 1 > > 15 34 0 1 > > 15 34 0 1 > > 15 21 0 1 > > 15 21 0 1 > > 15 21 0 1 > > fTemp<-factor(Temp) > > fSal<-factor(Sal) > > Go<-Go > > NoGo<-Total-Go > > Went<-cbind(Go,NoGo) > > DF<-data.frame(fTemp,fSal,Went,Total) > > glm<-glm(Went~fTemp+fSal+fTemp*fSal,family="binomial",data=DF) > > require(graphics) > > plot(c(5,15),c(0,1),type="n",xlab="Temperature",ylab="Probability of > going") > > text(Temp,Go/Total,as.character(Sal)) > > ld<-(seq(5,15,1)) > > > lines(ld,predict(glm,data.frame(fTemp=ld,fSal=factor(rep("34",length(ld)),levels=levels(fSal))),type="response")) > > > > Thank you very much for your time and expertise! > > Kindly, > > Chad > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >[[alternative HTML version deleted]]