Heystek, A, Me <15418693@sun.ac.za>
2011-Sep-21  07:55 UTC
[R] Problem with predict and lines in plotting binomial glm
Problems with predict and lines in plotting binomial glm Dear R-helpers I have found quite a lot of tips on how to work with glm through this mailing list, but still have a problem that I can't solve. I have got a data set of which the x-variable is count data and the y-variable is proportional data, and I want to know what the relationship between the variables are. The data was overdispersed (the residual deviance is much larger than the residual degrees of freedom) therefore I am using the quasibinomial family, thus the y-variable is a matrix of successes and failures (20 trials for every sample, thus each y-variable row counts up to 20). x <- c(1200, 1200, 1200, 1200, 1200, 1200, 1200, 1200, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 3000, 3000, 3600, 3600, 3600, 3600, 4200, 4200, 4800, 4800, 5400, 6600, 6600, 7200, 7800, 7800, 8400, 8400, 8400, 9000, 9600, 10200, 13200, 18000, 20400, 24000, 25200, 36600) successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, 7, 7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13, 6, 16, 11, 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6) failures <- c(14, 4, 9, 6, 9, 4, 7, 7, 6, 4, 8, 8, 9, 5, 8, 11, 13, 13, 3, 5, 7, 11, 11, 8, 6, 12, 11, 4, 13, 11, 6, 9, 12, 12, 7, 14, 4, 9, 11, 13, 11, 12, 16, 6, 13, 17, 17, 11, 8, 12, 16, 14) y <- cbind(successes, failures) data <- data.frame(y, x) glm1 <- glm(y ~ x, family= quasibinomial, data= data) glm2 <- glm(y ~ log(x), family=quasibinomial, data= data) # residual deviance is lower with log transformed x-value plot(x, successes) lines(x, predict(glm1, type= "response"), lwd=2) Firstly, because of the skewed distribution of the x variable I am not sure whether it should be log transformed or not. When I do log transform it, the residual deviance and the p-value for the slope is lower. Either way, the lines command does not plot any line and neither does it give any error messages. On some of my other data it plots a line way below all the data points. From what I can gather, the predict function as it is now uses the fitted values because no newdata argument is specified. I want the line to be predicted from the same x-values. I tried two ways of adding the newdata argument: ## a data.frame using the original x-values lines(x, predict(glm2, type= "response", newdata= as.data.frame(x))) ## or a data.frame with values (the same length as y) from the range of x values newdf <- data.frame(seq(min(x), max(x), length=52)) lines(x, predict(glm2, type="response", newdata= newdf)) Only the second option plotted a line once, but then I could never get it to do the same again on a new plot even though I used the same variables and same code. Thank you very much for your time and patient Anina Heystek BSc Honours student Department of Botany and Zoology University of Stellenbosch, Stellenbosch, South Africa 15418693@sun.ac.za [[alternative HTML version deleted]]
Eik Vettorazzi
2011-Sep-21  09:17 UTC
[R] Problem with predict and lines in plotting binomial glm
Hi Anina, predict.glm returns predicted probabilities, when used with type="response", so you have to either scale the probs to the number of trials for any x or you plot probs from start: par(mfcol=c(1,2)) plot(x, successes) lines(x, (successes+failures)*predict(glm1, type= "response"), lwd=2) plot(x, successes/(successes+failures)) lines(x, predict(glm1, type= "response"), lwd=2) If you use predict with new data, please name the rows accordingly, when you actually want to use them: newdf <- data.frame(seq(min(x), max(x), length=20)) #you would expect 20 predictions, but *surprise* you get length(predict(glm2, type="response", newdata= newdf)) #52! newdf <- data.frame(x=seq(min(x), max(x), length=20)) length(predict(glm2, type="response", newdata= newdf)) #ok And you should use the appropriate x in "lines" as well: plot(x,successes/(successes+failures)) lines(newdf$x, predict(glm2, type="response", newdata= newdf),col="red") Cheers Am 21.09.2011 09:55, schrieb Heystek, A, Me <15418693 at sun.ac.za>:> Problems with predict and lines in plotting binomial glm > Dear R-helpers > > I have found quite a lot of tips on how to work with glm through this mailing list, but still have a problem that I can't solve. > I have got a data set of which the x-variable is count data and the y-variable is proportional data, and I want to know what the relationship between the variables are. > The data was overdispersed (the residual deviance is much larger than the residual degrees of freedom) therefore I am using the quasibinomial family, thus the y-variable is a matrix of successes and failures (20 trials for every sample, thus each y-variable row counts up to 20). > > x <- c(1200, 1200, 1200, 1200, 1200, 1200, 1200, 1200, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 3000, 3000, 3600, 3600, 3600, 3600, 4200, 4200, 4800, 4800, 5400, 6600, 6600, 7200, 7800, 7800, 8400, 8400, 8400, 9000, 9600, 10200, 13200, 18000, 20400, 24000, 25200, 36600) > successes <- c(6, 16, 11, 14, 11, 16, 13, 13, 14, 16, 12, 12, 11, 15, 12, 9, 7, 7, 17, 15, 13, 9, 9, 12, 14, 8, 9, 16, 7, 9, 14, 11, 8, 8, 13, 6, 16, 11, 9, 7, 9, 8, 4, 14, 7, 3, 3, 9, 12, 8, 4, 6) > failures <- c(14, 4, 9, 6, 9, 4, 7, 7, 6, 4, 8, 8, 9, 5, 8, 11, 13, 13, 3, 5, 7, 11, 11, 8, 6, 12, 11, 4, 13, 11, 6, 9, 12, 12, 7, 14, 4, 9, 11, 13, 11, 12, 16, 6, 13, 17, 17, 11, 8, 12, 16, 14) > y <- cbind(successes, failures) > data <- data.frame(y, x) > > glm1 <- glm(y ~ x, family= quasibinomial, data= data) > glm2 <- glm(y ~ log(x), family=quasibinomial, data= data) # residual deviance is lower with log transformed x-value > plot(x, successes) > lines(x, predict(glm1, type= "response"), lwd=2) > > > > > Firstly, because of the skewed distribution of the x variable I am not sure whether it should be log transformed or not. When I do log transform it, the residual deviance and the p-value for the slope is lower. > Either way, the lines command does not plot any line and neither does it give any error messages. On some of my other data it plots a line way below all the data points. From what I can gather, the predict function as it is now uses the fitted values because no newdata argument is specified. I want the line to be predicted from the same x-values. I tried two ways of adding the newdata argument: > > ## a data.frame using the original x-values > lines(x, predict(glm2, type= "response", newdata= as.data.frame(x))) > ## or a data.frame with values (the same length as y) from the range of x values > newdf <- data.frame(seq(min(x), max(x), length=52)) > lines(x, predict(glm2, type="response", newdata= newdf)) > > Only the second option plotted a line once, but then I could never get it to do the same again on a new plot even though I used the same variables and same code. > > > Thank you very much for your time and patient > Anina Heystek > > BSc Honours student > Department of Botany and Zoology > University of Stellenbosch, Stellenbosch, South Africa > 15418693 at sun.ac.za > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Eik Vettorazzi Department of Medical Biometry and Epidemiology University Medical Center Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 -- Pflichtangaben gem?? Gesetz ?ber elektronische Handelsregister und Genossenschaftsregister sowie das Unternehmensregister (EHUG): Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg Vorstandsmitglieder: Prof. Dr. J?rg F. Debatin (Vorsitzender), Dr. Alexander Kirstein, Joachim Pr?l?, Prof. Dr. Dr. Uwe Koch-Gromus