Hello to all, I'm a biologist trying to tackle a "fish" (Poisson regression) which is just too big for my modest understanding of stats!!!

Here goes...

I want to find good literature, or the proper mathematical procedure, to calculate a confidence interval for an inverse prediction from a Poisson regression using R.

I'm currently trying to analyse a "dose-response" experiment. I want to calculate the dose (X) giving 50% inhibition of a biological response (Y). My response is count data that fits a Poisson distribution very well.

I could make my life easy and calculate "dose response / control response" = % of total response... and then use logistic regression, but somehow that doesn't sound right. Should I just stick to logistic regression and get on with my life? Can I be cured of this paranoia? ;-)

I thought a Poisson regression would be more appropriate, but I don't know how to "properly" calculate the dose equivalent to 50% inhibition, i.e. confidence intervals, etc., on the X = dose. Basically an "inverse" prediction problem.

By the way, my data is "graphically" linear for log(Y) vs log(X), where Y is counts and X is dose.

I use a Poisson regression to fit my dose-response experiment by EXCLUDING the response for dose = 0, because of log(0). Under R:

glm.dose <- glm(response[-1] ~ log(dose[-1]), family=poisson())

(That's why you see the "[-1]" term: the first dose in the dose vector is 0.)

This is really a nice fit. I can obtain a nice slope (B) and intercept (A):

log(Y) = B log(X) + A

I do have a biological value for dose = 0 from my "control", i.e. Ymax = some number, again with a Poisson error.

So, what I want is EC50x:

Y/Ymax = 0.5 = exp(B log(EC50x) + A) / Ymax

EC50x = exp((log(0.5) + log(Ymax) - A) / B)

That's all fine, except I don't have a clue how to calculate the confidence interval for EC50x, or even whether I can model this inverse prediction with a Poisson regression at all. In OLS linear regression, fitting X based on Y is not a good idea because of the way OLS calculates the slope and intercept. Is the same problem found in GLM/Poisson regression? Moreover, I also have a Poisson error on Ymax that I would have to consider, right?

Help!!!!

--
Vincent Philion, M.Sc. agr.
Phytopathologiste
Institut de Recherche et de Développement en Agroenvironnement (IRDA)
3300 Sicotte, St-Hyacinthe
Québec J2S 7B8

téléphone: 450-778-6522 poste 233
courriel: vincent.philion at irda.qc.ca
Site internet: www.irda.qc.ca
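A minimal sketch of the EC50 calculation described above, using the first data set that gets posted later in the thread; it gives the point estimate only, which is exactly where the confidence-interval question starts:

## toy run of the calculation described above (no confidence interval yet)
dose     <- c(0, 0.03, 0.1, 0.3, 1, 3, 10)
response <- c(28, 21, 11, 15, 5, 4, 1)

glm.dose <- glm(response[-1] ~ log(dose[-1]), family = poisson())

A    <- unname(coef(glm.dose)[1])   # intercept on the log-mean scale
B    <- unname(coef(glm.dose)[2])   # slope for log(dose)
Ymax <- response[1]                 # observed control count (itself Poisson)

## dose at which the fitted mean equals 0.5 * Ymax
EC50 <- exp((log(0.5 * Ymax) - A) / B)
EC50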
1. If you provide a toy data set with, e.g., 5 observations, to accompany your example, it would be much easier for people to try out ideas and then give you a more solid response.

2. Have you tried something like log(dose+0.5) or I(log(dose+0.5)) in your model statement, in conjunction with "predict" or "predict.glm" on the output from "glm"?

hope this helps.  spencer graves
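A sketch of what that suggestion might look like in practice; the shift of 0.5 and the dose grid are arbitrary illustrative choices (not from Spencer's message), and the data are again the first data set posted later in the thread:

## keep dose = 0 in the fit by shifting the dose, then get fitted means
## with standard errors from predict.glm on the link (log) scale
dose     <- c(0, 0.03, 0.1, 0.3, 1, 3, 10)
response <- c(28, 21, 11, 15, 5, 4, 1)

fit <- glm(response ~ I(log(dose + 0.5)), family = poisson())

nd <- data.frame(dose = c(0, 0.05, 0.1, 0.5, 1, 5))
pr <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)

## approximate 95% limits back-transformed to the count scale
cbind(nd,
      fit   = exp(pr$fit),
      lower = exp(pr$fit - 2 * pr$se.fit),
      upper = exp(pr$fit + 2 * pr$se.fit))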
Dear Prof. Ripley & M. Philion:

First some commentary, then questions for Prof. Ripley and M. Philion.

COMMENTARY

Prof. Ripley said, "to fit a curve of mean response vs dose, and find the dose at which the mean response is half of that at dose 0. That one is easy." Unfortunately, it is not obvious to me at the moment. From "www.r-project.org" -> search -> "R site search" -> "LD50", I found "dose.p", described on p. 193, sec. 7.2, of Venables and Ripley (2002) Modern Applied Statistics with S, 4th ed. (Springer).

Then I cut the data set down to a size that I could easily play with, and fit a Poisson regression:

Phytopath <- data.frame(x=c(0, 0.03, 0.1), y=c(28, 21, 11))
(fitP100 <- glm(y ~ log(x+0.015), data=Phytopath[rep(1:3, 100),], family="poisson"))

Call:  glm(formula = y ~ log(x + 0.015), family = "poisson",
           data = Phytopath[rep(1:3, 100), ])

Coefficients:
   (Intercept)  log(x + 0.015)
        1.6088         -0.4203

(LD50P100 <- dose.p(fitP100, p=14))
             Dose         SE
p = 14: -2.451018 0.04858572

To get a 95% confidence interval from this, in S-Plus 6.1, I did:

exp(LD50 + c(-2, 0, 2)*LD50@SE) - 0.015
[1] 0.01762321 0.07120579 0.21279601

R 1.7.1 seemed to require a different syntax, which I couldn't parse in my present insomniac state (3:20 AM in California).

QUESTIONS:

PROF. RIPLEY: Is this what you said was easy?

M. PHILION: Does this provide sufficient information for you to now solve your problem?

hope this helps.  spencer graves

Prof Brian Ripley wrote:
> On Fri, 25 Jul 2003, Vincent Philion wrote:
>
>> Hello and thank you for your interest in this problem.
>>
>> "Real life data" would look like this:
>>
>> x      y
>> 0      28
>> 0.03   21
>> 0.1    11
>> 0.3    15
>> 1       5
>> 3       4
>> 10      1
>> 30      0
>> 100     0
>>
>> x        y
>> 0        30
>> 0.0025   30
>> 0.02     25
>> 0.16     25
>> 1.28     10
>> 10.24     0
>> 81.92     0
>>
>> X         Y
>> 0         35
>> 0.00025   23
>> 0.002     14
>> 0.016      6
>> 0.128      5
>> 1.024      3
>> 8.192      2
>>
>> X         Y
>> 0         43
>> 0.00025   35
>> 0.002     20
>> 0.016     16
>> 0.128     11
>> 1.024      6
>> 8.192      0
>>
>> Where X is dose and Y is response.
>> The relation is linear for log(response) = b log(dose) + intercept
>
> Is that log(*mean* response), that is a log link and exponential decay
> with dose?
>
>> Response for dose 0 is a "control" = Ymax. So, what I want is the dose
>> for 50% response. For instance, in example 1:
>>
>> Ymax = 28 (this is also an observation with Poisson error)
>
> Once you observe Ymax, Y is no longer Poisson.
>
>> So I want dose for response = 14 = approx. 0.3
>
> What exactly is Ymax? Is it the response at dose 0? The mean response at
> dose 0? The largest response? About the only thing I can actually
> interpret is that you want to fit a curve of mean response vs dose, and
> find the dose at which the mean response is half of that at dose 0.
> That one is easy.
>
> I think you are confusing response with mean response, and we can't
> disentangle them for you.
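For the R syntax Spencer couldn't recall: in R, the standard error from MASS's dose.p() is stored as an attribute of the result, so one way to write the same interval is sketched below (Spencer later posts an equivalent approach via print(); the code here just restates his fit):

library(MASS)                      # for dose.p
Phytopath <- data.frame(x = c(0, 0.03, 0.1), y = c(28, 21, 11))
fitP100  <- glm(y ~ log(x + 0.015), data = Phytopath[rep(1:3, 100), ],
                family = "poisson")
LD50P100 <- dose.p(fitP100, p = 14)

## the SE is an attribute of the dose.p result in R (S-PLUS used the @ slot)
exp(as.vector(LD50P100) + c(-2, 0, 2) * drop(attr(LD50P100, "SE"))) - 0.015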
Vincent:

Here is a simple solution using Prof. Bates' non-linear least squares algorithm:

> Phytopath <- data.frame(x=c(0, 0.03, 0.1), y=c(28, 21, 11))
> Phyto.nls <- nls(y ~ Ymax/(1 + x/x50), data=Phytopath,
+                  start=list(Ymax=20.0, x50=0.01), trace=T)
404.3058 :  20.00  0.01
15.76932 :  27.96313636  0.04960484
2.043625 :  28.2145584  0.0694645
1.851401 :  28.33886844  0.07198951
1.851231 :  28.34892493  0.07185953
1.851230 :  28.34843670  0.07186804
1.851230 :  28.3484688  0.0718675
> summary(Phyto.nls)

Formula: y ~ Ymax/(1 + x/x50)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Ymax 28.34847    1.31522  21.554   0.0295 *
x50   0.07187    0.01348   5.332   0.1180
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 1.361 on 1 degrees of freedom

Correlation of Parameter Estimates:
       Ymax
x50 -0.6001

Best,
Ravi.

----- Original Message -----
From: Vincent Philion <vincent.philion at irda.qc.ca>
Date: Friday, July 25, 2003 9:25 am
Subject: Re: [R] inverse prediction and Poisson regression

> Hi, ... and good morning!
>
> ;-)
>
> On 2003-07-25 08:43:35 -0400 Spencer Graves <spencer.graves at PDF.COM> wrote:
>
> > The Poisson assumption means that Y is a number of independent events
> > from a theoretically infinite population occurring in a specific time
> > or place. The function "glm" with 'family="poisson"' and the default
> > link = "log" assumes that the logarithm of the mean of Y is a linear
> > model in the explanatory variable.
>
> OK, I think my data can fit that description.
>
> > How is Y measured?
>
> Y is the number of line intercepts which encounter mycelial growth, i.e.
> if the mycelium intercepts the line twice, 2 is reported. This follows a
> Poisson.
>
> > If it is the number out of N, with N approximately 500 (and you know N),
> > then you have a logistic regression situation.
>
> No, 500 spores can grow, but there is no "real" limit on the amount of
> growth possible, and so no limit on the number of intercepts. So this is
> why I adopted Poisson, not knowing how complicated my life would become!!!
> ;-)
>
> > In that case, section 7.2 in Venables and Ripley (2002) should do what
> > you want. If Y is a percentage increase ...
>
> But you may be right, that I'm making this just too complicated and that
> I should simply look at percentages... Any comments on that?
>
> > When dose = 0, log(dose) = -Inf. Since 0 is a legitimate dose,
> > log(dose) is not acceptable in a model like this. You need a model like
> > Peter suggested.
>
> OK, I see I will need stronger coffee to tackle this, but I will read
> this in depth today.
>
> > Depending on your purpose, log(dose + 0.015) might be sufficiently
> > close to a model like what Peter suggested to answer your question.
> > If not, perhaps this solution will help you find a better solution.
>
> In other words, "cheat" and model Y_0 with a "small" value = log(0.015)?
> How would this affect the LD50 value calculated and the confidence
> intervals? I guess I could try several methods, but how would I go about
> choosing the right one? Criteria?
>
> > I previously was able to get dose.p to work in R, and I just now was
> > able to compute from its output. The following worked in both S-Plus
> > 6.1 and R 1.7.1:
> >
> > > LD50P100p <- print(LD50P100)
> >              Dose         SE
> > p = 14: -2.451018 0.04858572
> > > exp(LD50P100p[1,1]+c(-2,0,2)*LD50P100p[1,2])-0.015
> > [1] 0.06322317 0.07120579 0.08000303
>
> OK, I will need to try this (later today). I don't see "dose.p" in this?
> again, many thanks,
>
> --
> Vincent Philion
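One way to go from Ravi's nls fit to a rough interval for x50 is sketched here (this is not from the thread; the Wald-type interval below is very approximate with only one residual degree of freedom, and a profile-likelihood interval such as confint(Phyto.nls) from MASS, or simply more dose levels, would be preferable):

## rough Wald-type 95% interval for x50 from the nls standard error
Phytopath <- data.frame(x = c(0, 0.03, 0.1), y = c(28, 21, 11))
Phyto.nls <- nls(y ~ Ymax/(1 + x/x50), data = Phytopath,
                 start = list(Ymax = 20, x50 = 0.01))
est <- summary(Phyto.nls)$coefficients
est["x50", "Estimate"] + c(-2, 0, 2) * est["x50", "Std. Error"]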
Thanks Peter, for the wonderful illustration of various model-fitting options for non-linear models.

Thinking about your comment that the "variance stabilizing" transform does better than the weighted non-linear least-squares: I am interpreting this to mean that the residual sum of squares is smaller, 0.16 versus 0.64. A possible explanation may be that in the former case the responses are actually smaller because of the square-rooting (range about 3-5), so the residuals are smaller, whereas in the "gnls" case the response is the original variable (range about 10-30). Does this seem reasonable?

Ravi.

----- Original Message -----
From: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
Date: Friday, July 25, 2003 3:48 pm
Subject: Re: [R] inverse prediction and Poisson regression

> Spencer Graves <spencer.graves at pdf.com> writes:
>
> > The problem with nls is that it is NOT maximum likelihood for the
> > Poisson distribution. For the Poisson, the standard deviation is the
> > square root of the mean, while nls assumes constant standard
> > deviation. That's why I stayed with glm. The answers should be
> > comparable, but I would prefer the glm answers. spencer graves
>
> That's why I was suggesting either a variance stabilizing
> transformation or using gnls() with a weights function. In both cases
> you need to watch out for scaling of SE's stemming from the fact that
> the variance is supposed to be known in the Poisson distribution, but
> the fitting routines estimate a sigma from the residuals anyway.
>
> I.e. for instance:
>
> > Phyto.nls2 <- nls(sqrt(y+.5) ~ sqrt(Ymax/(1 + x/x50)+.5),
> +                   data=Phytopath, start=list(Ymax=20.0, x50=0.01), trace=T)
> 9.400602 :  20.00  0.01
> 0.9064723 :  27.21511020  0.03593643
> 0.06338235 :  28.21654101  0.05995647
> 0.02616589 :  28.50221759  0.06815302
> 0.02608587 :  28.54871243  0.06835046
> 0.02608586 :  28.54960242  0.06834083
> 0.02608586 :  28.5495605  0.0683413
> > summary(Phyto.nls2)
>
> Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5)
>
> Parameters:
>      Estimate Std. Error t value Pr(>|t|)
> Ymax 28.54956    1.65113  17.291   0.0368 *
> x50   0.06834    0.01264   5.407   0.1164
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> Residual standard error: 0.1615 on 1 degrees of freedom
>
> Correlation of Parameter Estimates:
>        Ymax
> x50 -0.6833
>
> but here you need to know that the theoretical sd is 0.5, so the Std.
> errors all need multiplication by 0.5/0.1615.
>
> Using gnls() we'd get (if I got the calling sequence right...)
>
> > Phyto.gnls <- gnls(y ~ Ymax/(1 + x/x50), data=Phytopath,
> +                    weights=varPower(fixed=.5), start=list(Ymax=20.0, x50=0.01))
> > summary(Phyto.gnls)
> Generalized nonlinear least squares fit
>   Model: y ~ Ymax/(1 + x/x50)
>   Data: Phytopath
>        AIC      BIC    logLik
>   13.71292 11.00875 -3.856458
>
> Variance function:
>  Structure: Power of variance covariate
>  Formula: ~fitted(.)
> Parameter estimates:
> power
>   0.5
>
> Coefficients:
>          Value Std.Error   t-value p-value
> Ymax 29.393796 2.4828723 11.838626  0.0536
> x50   0.063028 0.0125244  5.032376  0.1249
>
>  Correlation:
>      Ymax
> x50 -0.84
>
> Standardized residuals:
> [1] -0.4894266  0.7621749 -0.4237346
> attr(,"std")
> [1] 2.8478142 1.4239071 0.8586483
> attr(,"label")
> [1] "Standardized residuals"
>
> Residual standard error: 0.6367906
> Degrees of freedom: 3 total; 1 residual
>
> and again, it is necessary to adjust the SE's by multiplying with
> 1/0.6368
>
> It is in fact rather easy to do direct MLE for this kind of model:
>
> > with(Phytopath,
> +      optim(c(28,.7), fn=function(p){Ymax<-p[1]; x50<-p[2];
> +            -sum(dpois(y, lambda=Ymax/(1 + x/x50), log=TRUE))},
> +            method="BFGS", hessian=T))
> $par
> [1] 28.55963487  0.06833524
>
> $value
> [1] 7.21251
>
> $counts
> function gradient
>       42       13
>
> $convergence
> [1] 0
>
> $message
> NULL
>
> $hessian
>            [,1]        [,2]
> [1,] 0.07356072    6.631868
> [2,] 6.63186764 1294.792539
>
> Warning message:
> NaNs produced in: dpois(x, lambda, log)
>
> and we can get SE's from the inverse hessian with
>
> > sqrt(diag(solve(with(Phytopath,
> +   optim(c(28,.7), fn=function(p){Ymax<-p[1]; x50<-p[2];
> +         -sum(dpois(y, lambda=Ymax/(1 + x/x50), log=TRUE))},
> +         method="BFGS", hessian=T))$hessian)))
> [1] 5.02565893 0.03788052
>
> Notice that the variance stabilizing transform seems to do rather
> better than gnls() compared to true MLE. I'm slightly puzzled by this,
> but presumably it has to do with the fact that MLE for a Gaussian
> model with a mean/variance relationship is *not* identical to the
> iteratively reweighted least squares that glm() et al. are doing. (I
> wouldn't quite rule out that I've simply made a mistake somewhere,
> though...)
>
> --
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
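Connecting Peter's direct MLE back to the original question: in this parameterization x50 is itself the EC50, since the mean equals Ymax/2 exactly when x = x50, so an approximate interval can be read off the inverse Hessian. A sketch (not from the thread; the object name fit.mle is just a label used here, the Wald interval is crude with so few observations and can even dip below zero, and the NaN warning Peter saw is harmless):

## Wald-type 95% interval for x50 (= EC50) from the observed information
Phytopath <- data.frame(x = c(0, 0.03, 0.1), y = c(28, 21, 11))
fit.mle <- with(Phytopath,
                optim(c(28, .7),
                      fn = function(p) {
                        Ymax <- p[1]; x50 <- p[2]
                        -sum(dpois(y, lambda = Ymax/(1 + x/x50), log = TRUE))
                      },
                      method = "BFGS", hessian = TRUE))
se <- sqrt(diag(solve(fit.mle$hessian)))       # SEs from the inverse hessian
fit.mle$par[2] + c(-1.96, 0, 1.96) * se[2]     # rough 95% interval for x50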
Vincent,

As long as you are going to digest all those suggestions, let me offer one more:

If you want to avoid making an assumption about the functional form of the dose-response relationship (you don't even need to assume monotonicity!), and also avoid having to trust the asymptotic distributions of MLEs, a completely different approach is described in:

Stepwise Confidence Intervals without Multiplicity Adjustment for Dose Response and Toxicity Studies
Jason C. Hsu and Roger L. Berger
June 1999 issue of JASA

In your case, this approach would involve a pairwise test of each positive-dose population against the zero-dose population (but without multiplicity adjustment, even though you are doing multiple tests!). So if you can pick a good test for comparing the means of two Poisson distributions (I'm sure there are plenty, though I'm not sure what I would recommend; one option is sketched after this message), then you can apply this method very easily. You lose the power of "pooling" that comes with the assumption of a functional relationship, but you may make up for this by getting more exact (not asymptotic) confidence bounds (and it's always comforting to make fewer assumptions).

Just one more thing for you to stew on...

Happy thinking,
Jim Rogers

> Hello to all: first and foremost: thank you for all this input. I only
> discovered "R" last week (!) and I think I will dump my SAS license!!!
>
> ;-)
>
> This is a very dynamic listserve!
> You "R" all great! Thank you!
>
> I just hope some day I can help out a student the way you did today.
>
> I will spend part of the weekend studying the different suggestions in
> detail. Again, I'm not a stats person, so I will need some time and good
> coffee to digest all this correctly. (I'm most worried about understanding
> the nonlin suggestion.) Early next week, I will post a "summary" of the
> suggestions and the path I chose to follow. (With proper syntax, Professor
> Ripley, I promise.)
>
> ;-)
>
> Have a nice weekend
>
> Best regards,
>
> Vincent Philion

James A. Rogers, Ph.D. <rogers at cantatapharm.com>
Statistical Scientist
Cantata Pharmaceuticals
300 Technology Square, 5th floor
Cambridge, MA 02139
617.225.9009 x312
Fax 617.225.9010
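One simple exact test for comparing a positive-dose count with the zero-dose control, sketched here only as an illustration of the pairwise building block Jim mentions (it is not the full Hsu-Berger stepwise procedure, and the counts are just taken from the first data set posted earlier in the thread). With equal exposure, and conditional on the total, the control count is Binomial(total, 1/2) when the two Poisson means are equal:

y0 <- 28    # count at dose 0 (control), from the first data set above
y1 <- 11    # count at dose 0.1
## exact conditional test of equal Poisson means:
## under the null, y0 ~ Binomial(y0 + y1, 1/2)
binom.test(y0, y0 + y1, p = 0.5, alternative = "greater")
## In more recent versions of R, poisson.test(c(y0, y1)) gives an
## equivalent exact comparison of the two rates.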