gheine at blm.gov
2011-Feb-04 21:33 UTC
[R] Quadratic regression: estimating the maximizing value
A bioligist colleague sent me the following data. x Y 3 1 7 5 14 8 24 0 (Yes, only four data points.) I don't know much about the application, but apparently there are good empirical reasons to use a quadratic model. The goal is to find the X value which maximizes the response Y, and to find a confidence interval for this X value. Finding the maximizing X value is pretty straightforward:>Ldat <- data.frame("X"=c(3,7,14,24), "Y"=c(1,5,8,0)) >(LM<-lm(formula = Y ~ X + I(X^2), data = Ldat))Call: lm(formula = Y ~ X + I(X^2), data = Ldat) Coefficients: (Intercept) X I(X^2) -3.86978 1.77762 -0.06729> DZ<-function(B,C) { (-B)/(2*C) } # Solve d/dx(A + Bx + Cx^2) = 0 > DZ(LM$coefficients[2],LM$coefficients[3])X 13.20961 To find a confidence interval, I used "confint()". Default confidence level of 95% was not useful; used 80% instead, and then computed DZ with the extreme X and I(X^2) coefficients:>(CI80<-confint(LM,level=0.8))10 % 90 % (Intercept) -5.6147948 -2.12476138 X 1.4476460 2.10759306 I(X^2) -0.0790312 -0.05553898> DZ(CI80[2,1],CI80[3,1])[1] 9.1587> DZ(CI80[2,2],CI80[3,2])[1] 18.97400 Conclusion: the 80% confidence level for the maximizing X value is included in the range (9.158, 18.974) ################# Questions: 1) Is this the right procedure, or totally off base? 2) The coefficient of the "Intercept" is irrelevant to calculating the maximizing value of X. Is there a way to reduce the size of the confidence interval by doing a computation that leaves out this parameter? 3) I believe that confint() indicates the axes of an ellipsoid, rather than the corners of a box, in parameter space; so that the above procedure is (slightly) too conservative. 4) Where are the calculations for confint() documented ? Thanks, George Heine (Embedded image moved to file: pic23206.gif)Please consider the environment before printing this page <>=<>=<>=<>=<>=<> =<>=<>=<>=<>=<> George Heine, PhD Mathematical Analyst National Operations Center Bureau of Land Management voice (303) 236-0099 fax (303) 236-1974 cell (303) 905-5296 <>=<>=<>=<>=<>=<> =<>=<>=<>=<>=<>
No, your approach is not correct. For one you have not taken the covariance between B and C into account, another thing is that the ratio of 2 normal random variables is not necessarily normal, in some cases it can even follow a Cauchy distribution. Also note that with only 1 degree of freedom the Central Limit Theorem is not justified for using normal theory for non normal distributions. So the normal based confidence intervals on the coefficients are only reasonable if you are certain that the true error distribution is normal or nearly normal. Some possibilities that may work for you: Use the nls function with the curve parameterized to use the vertex point as one of the parameters (still need normality). Do bootstrapping (with only 4 points you are not going to get much with non-parametric bootstrap, but parametric with some assumptions may work). Use a Bayesian approach (this may be best if you can find some meaningful prior information). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of gheine at blm.gov > Sent: Friday, February 04, 2011 2:34 PM > To: r-help at r-project.org > Subject: [R] Quadratic regression: estimating the maximizing value > > > A bioligist colleague sent me the following data. > > x Y > 3 1 > 7 5 > 14 8 > 24 0 > > (Yes, only four data points.) I don't know much about the > application, but apparently there are good empirical > reasons to use a quadratic model. > > The goal is to find the X value which maximizes the > response Y, and to find a confidence interval for this X > value. > > Finding the maximizing X value is pretty straightforward: > > >Ldat <- data.frame("X"=c(3,7,14,24), "Y"=c(1,5,8,0)) > >(LM<-lm(formula = Y ~ X + I(X^2), data = Ldat)) > Call: > lm(formula = Y ~ X + I(X^2), data = Ldat) > > Coefficients: > (Intercept) X I(X^2) > -3.86978 1.77762 -0.06729 > > > DZ<-function(B,C) { (-B)/(2*C) } # Solve d/dx(A + Bx + Cx^2) = 0 > > DZ(LM$coefficients[2],LM$coefficients[3]) > X > 13.20961 > > > To find a confidence interval, I used "confint()". > Default confidence level of 95% was not useful; used 80% instead, > and then computed DZ with the extreme X and I(X^2) coefficients: > > >(CI80<-confint(LM,level=0.8)) > > 10 % 90 % > (Intercept) -5.6147948 -2.12476138 > X 1.4476460 2.10759306 > I(X^2) -0.0790312 -0.05553898 > > > DZ(CI80[2,1],CI80[3,1]) > [1] 9.1587 > > DZ(CI80[2,2],CI80[3,2]) > [1] 18.97400 > > Conclusion: the 80% confidence level for the maximizing X value is > included in the range (9.158, 18.974) > > ################# > Questions: > > 1) Is this the right procedure, or totally off base? > > 2) The coefficient of the "Intercept" is irrelevant to calculating > the maximizing value of X. Is there a way to reduce the size of > the confidence interval by doing a computation that leaves out this > parameter? > > 3) I believe that confint() indicates the axes of an ellipsoid, > rather than the corners of a box, in parameter space; > so that the above procedure is (slightly) too conservative. > > 4) Where are the calculations for confint() documented ? > > Thanks, > George Heine > > (Embedded image moved to file: pic23206.gif)Please consider the > environment > before printing this page > > > > <>=<>=<>=<>=<>=<> > =<>=<>=<>=<>=<> > > George Heine, PhD > > Mathematical Analyst > > National Operations > Center > > Bureau of Land Management > > voice (303) 236-0099 > > fax (303) 236-1974 > > cell (303) 905-5296 > > <>=<>=<>=<>=<>=<> > =<>=<>=<>=<>=<> >