Kevin J Emerson
2009-Mar-24 03:31 UTC
[R] confidence interval or error of x intercept of a linear regression
Hello all,

This is something that I am sure has a really suave solution in R, but I can't quite figure out the best (or even a basic) way to do it.

I have a simple linear regression fit with lm() for which I would like to estimate the x-intercept with some measure of error around it (a confidence interval). In biology, there is the concept of a developmental zero - a temperature below which development will not happen. It is often estimated by extrapolating a curve of developmental rate as a function of temperature, and it is generally reported without error. I intend to change this! There has to be some way to assign error to this term; I just have yet to figure it out.

Now, it is simple enough to calculate the x-intercept itself ( - intercept / slope ), but it is a whole separate process to generate a confidence interval for it. I can't figure out how to propagate the error of the slope and intercept into the ratio of the two. The other option I have tried is to use the predict function to look for where the confidence intervals cross the axis, but this hasn't been too fruitful either.

I would greatly appreciate any insight you may be able to share.

Cheers,
Kevin

Here is a small representative sample of some of my data where Dev.Rate ~ Temperature.

t <- structure(list(Temperature = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 9,
    10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5, 10.5,
    10.5, 10.5, 10.5, 10.5,
    12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
    14, 14, 14, 14,
    16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
    18, 18, 18, 18, 18,
    20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 22, 22),
    Dev.Rate = c(0.007518797, 0.007518797, 0.007518797,
    0.007194245, 0.007194245, 0.007194245, 0.007194245, 0.007194245, 0.007194245,
    0.006896552, 0.006896552, 0.012820513, 0.012820513,
    0.012195122, 0.012195122, 0.012195122, 0.012195122,
    0.011363636, 0.011363636, 0.011363636, 0.011363636, 0.011363636,
    0.011363636, 0.011363636,
    0.010869565, 0.00952381, 0.00952381, 0.015151515, 0.015151515,
    0.022727273, 0.022727273, 0.022727273, 0.022727273, 0.022727273, 0.022727273,
    0.022727273, 0.022727273, 0.022727273, 0.022727273, 0.022727273, 0.022727273,
    0.020833333, 0.020833333, 0.020833333, 0.034482759,
    0.029411765, 0.029411765, 0.029411765, 0.029411765, 0.029411765,
    0.029411765, 0.029411765, 0.029411765, 0.029411765, 0.029411765,
    0.029411765, 0.029411765, 0.029411765, 0.029411765, 0.029411765,
    0.027027027, 0.025, 0.038461538, 0.03030303, 0.03030303, 0.03030303,
    0.052631579, 0.052631579,
    0.045454545, 0.045454545, 0.045454545, 0.045454545, 0.045454545,
    0.045454545, 0.045454545, 0.045454545, 0.045454545,
    0.038461538, 0.038461538, 0.038461538, 0.038461538,
    0.038461538, 0.038461538, 0.038461538, 0.038461538,
    0.047619048, 0.047619048, 0.047619048, 0.047619048, 0.047619048,
    0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625,
    0.052631579, 0.052631579, 0.052631579, 0.052631579, 0.052631579,
    0.076923077, 0.071428571)),
    .Names = c("Temperature", "Dev.Rate"), class = "data.frame",
    row.names = c(NA, 107L))

--
==================
Kevin J Emerson
Bradshaw-Holzapfel Lab
Center for Ecology and Evolutionary Biology
1210 University of Oregon
Eugene, Oregon 97403
kemerson at uoregon.edu
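[A minimal sketch of the point estimate itself, added for illustration and assuming the sample data t above; the question of attaching error to it is taken up in the replies below.]

fit <- lm(Dev.Rate ~ Temperature, data = t)   # simple linear regression
b <- coef(fit)                                # b[1] = intercept, b[2] = slope
-b[1] / b[2]                                  # x-intercept (developmental zero) point estimate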
Thomas Lumley
2009-Mar-24 08:15 UTC
[R] confidence interval or error of x intercept of a linear regression
On Mon, 23 Mar 2009, Kevin J Emerson wrote:

> Now, it is simple enough to calculate the x-intercept itself
> ( - intercept / slope ), but it is a whole separate process to
> generate a confidence interval for it. I can't figure out how to
> propagate the error of the slope and intercept into the ratio of
> the two. The other option I have tried is to use the predict
> function to look for where the confidence intervals cross the axis,
> but this hasn't been too fruitful either.

A few people have written code to do nonlinear transformations
automatically. There's one in my 'survey' package, I think there's an
example in Venables & Ripley's "S Programming", and I'm sure there are
others. For example

> library(survey)
> m <- lm(Dev.Rate ~ Temperature, data = t)
> svycontrast(m, quote(-`(Intercept)`/Temperature))
          nlcon     SE
contrast 3.8061 0.1975

The `backquotes` are necessary because the coefficient name,
(Intercept), isn't a legal variable name.

The svycontrast() function works on anything that has coef() and
vcov() methods. The actual delta-method computations are in
survey:::nlcon.

    -thomas

Thomas Lumley            Assoc. Professor, Biostatistics
tlumley at u.washington.edu    University of Washington, Seattle
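[For readers without the survey package, a rough sketch of the same delta-method calculation done by hand from coef() and vcov() of the fit m above; this mirrors the computation rather than survey's actual code, and the variable names are illustrative.]

b <- coef(m)
V <- vcov(m)
theta <- -b[1] / b[2]                              # x-intercept estimate
grad  <- c(-1 / b[2], b[1] / b[2]^2)               # gradient of -b0/b1 w.r.t. (b0, b1)
se    <- sqrt(drop(crossprod(grad, V %*% grad)))   # first-order (delta-method) SE
c(estimate = unname(theta), SE = se)
unname(theta) + c(-1, 1) * qnorm(0.975) * se       # approximate 95% Wald interval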
Thomas Lumley
2009-Mar-24 08:18 UTC
[R] confidence interval or error of x intercept of a linear regression
I should probably also add the health warning that the delta-method
approach to ratios of coefficients is REALLY BAD unless the denominator
is well away from zero. When the denominator is near zero nothing
really works (Fieller's method is valid, but only because 'valid' means
less than you might like for a confidence interval method, and the
bootstrap is useful but not necessarily valid).

    -thomas

Thomas Lumley            Assoc. Professor, Biostatistics
tlumley at u.washington.edu    University of Washington, Seattle
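[A rough sketch of a Fieller-type interval for the x-intercept, added for illustration under the usual normal-error assumptions; it inverts the t-test of b0 + b1*x0 = 0, which reduces to a quadratic in x0. The function name fieller.xint is illustrative, not from any package.]

fieller.xint <- function(fit, level = 0.95) {
  b <- coef(fit); V <- vcov(fit)
  tcrit <- qt(1 - (1 - level) / 2, df.residual(fit))
  ## the confidence set is {x0 : (b0 + b1*x0)^2 <= tcrit^2 * Var(b0 + b1*x0)},
  ## i.e. A*x0^2 + B*x0 + C <= 0 with:
  A <- b[2]^2 - tcrit^2 * V[2, 2]
  B <- 2 * (b[1] * b[2] - tcrit^2 * V[1, 2])
  C <- b[1]^2 - tcrit^2 * V[1, 1]
  disc <- B^2 - 4 * A * C
  if (A <= 0 || disc < 0)              # slope not clearly away from zero:
    return(c(-Inf, Inf))               # the set is unbounded
  sort((-B + c(-1, 1) * sqrt(disc)) / (2 * A))
}
fieller.xint(lm(Dev.Rate ~ Temperature, data = t))

[With a slope far from zero, as in the sample data, this gives a finite interval; when the slope is not clearly nonzero the set is unbounded, which is the same degeneracy Ted Harding discusses below.]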
(Ted Harding)
2009-Mar-24 10:04 UTC
[R] confidence interval or error of x intercept of a linear regression
On 24-Mar-09 03:31:32, Kevin J Emerson wrote:

> Hello all,
>
> This is something that I am sure has a really suave solution in R,
> but I can't quite figure out the best (or even a basic) way to do it.
>
> I have a simple linear regression fit with lm() for which I would
> like to estimate the x-intercept with some measure of error around
> it (a confidence interval). In biology, there is the concept of a
> developmental zero - a temperature below which development will not
> happen. It is often estimated by extrapolating a curve of
> developmental rate as a function of temperature, and it is generally
> reported without error. I intend to change this! There has to be
> some way to assign error to this term; I just have yet to figure
> it out.
>
> Now, it is simple enough to calculate the x-intercept itself
> ( - intercept / slope ), but it is a whole separate process to
> generate a confidence interval for it. I can't figure out how to
> propagate the error of the slope and intercept into the ratio of
> the two. The other option I have tried is to use the predict
> function to look for where the confidence intervals cross the axis,
> but this hasn't been too fruitful either.
>
> I would greatly appreciate any insight you may be able to share.
>
> Cheers,
> Kevin
>
> Here is a small representative sample of some of my data where
> Dev.Rate ~ Temperature.

The difficulty with the situation you describe is that you are up
against what is known as the "Calibration Problem" in statistics.
Essentially, the difficulty is that in the theory of the linear
regression there is sufficient probability for the slope to be very
close to zero -- and hence for the "X-intercept" to be very far out to
the left or the right -- that the whole concept of standard error
ceases to exist. Even the expected position of the intercept does not
exist. And this is true regardless of the data (the one exception
would be where it is absolutely certain that the data will lie exactly
on a straight line).

This has been addressed a number of times, and controversially, in the
statistical literature. One approach which I like (having thought of
it myself :)) is the use of "likelihood-based confidence intervals".

Given a linear regression Y = a + b*X + E (where the error E has a
N(0, sig^2) distribution), suppose you want to estimate the value X0
of X for which Y = Y0, a given value (in your case Y0 = 0). For
simplicity, measure X and Y from their means sum(X)/N and sum(Y)/N.

The approach is based on comparing the likelihoods

[A] for unrestricted ML estimation of a, b and sig
    --> a.hat, b.hat, sig.hat
    (where a.hat = 0 because of the origins of X and Y)

[B] for ML estimation assuming a particular value X1 for X0
    --> a.tilde, b.tilde, sig.tilde

where

[A] a.hat = 0 (as above)
    b.hat = sum(X*Y)/sum(X^2)
    X0.hat = Y0/b.hat  [ = -mean(Y)/b.hat in your case ]
    sig.hat^2 = sum((Y - b.hat*X)^2)/(N+1)

[B] a.tilde = (Y0*sum(X^2) - X1*sum(X*Y))/D
    b.tilde = ((N+1)*sum(X*Y) + N*X1*Y0)/D
    sig.tilde^2 = (sum((Y - a.tilde - b.tilde*X)^2)
                   + (Y0 - a.tilde - b.tilde*X1)^2)/(N+1)
    where D = (N+1)*sum(X^2) + N*X1^2

Then let R(X1) = (sig.hat^2)/(sig.tilde^2).

A confidence interval for X0 is the set of values of X1 such that
R(X1) > R0, say, where Prob(R(X0) > R0) = P, the desired confidence
level, when X0 is the true value. It can be established that
(N-2)*(1 - R(X0))/R(X0) has the F distribution with 1 and (N-1)
degrees of freedom.
Since this is independent of X0, the resulting set of values of X1
constitutes a confidence interval.

This was described in Harding, E.F. (1986). Modelling: the classical
approach. The Statistician, vol. 35, pp. 115-134 (see pages 123-126),
and has been referred to later by P.J. Brown (though I don't have the
references to hand just now).

When I have time for it (not today) I'll see if I can implement this
neatly in R. It's basically a question of solving

    (N-2)*(1 - R(X1))/R(X1) = qf(P, 1, N-1)

for X1 (two solutions, maybe one, if any exist). However, it is quite
possible that no solution exists (depending on P), so that the
confidence interval would then be (-Inf, +Inf); or, when only one
exists, it will be either (-Inf, X1) or (X1, +Inf). These possibilities
of infinite intervals are directly related to the possibility that, at
significance level alpha = (1-P), the estimated slope may not differ
significantly from 0.

Maybe more later!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Mar-09  Time: 10:04:30
------------------------------ XFMail ------------------------------
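[A rough sketch, added for illustration, of the likelihood-based interval described above, using the formulas as written (X and Y centred at their means, Y0 = 0 on the original scale) and a numerical search for the points where (N-2)*(1 - R(X1))/R(X1) crosses qf(P, 1, N-1). The function name lik.xint.ci and the search range are arbitrary choices, not Ted's promised implementation.]

lik.xint.ci <- function(x, y, P = 0.95) {
  N <- length(x)
  X <- x - mean(x); Y <- y - mean(y)    # centre the data
  Y0 <- 0 - mean(y)                     # target Y value on the centred scale
  Sxx <- sum(X^2); Sxy <- sum(X * Y)
  b.hat    <- Sxy / Sxx
  sig2.hat <- sum((Y - b.hat * X)^2) / (N + 1)
  X0.hat   <- Y0 / b.hat                # centred x-intercept estimate
  crit <- qf(P, 1, N - 1)
  f <- function(X1) {                   # (N-2)*(1 - R(X1))/R(X1) - crit
    D <- (N + 1) * Sxx + N * X1^2
    a.t <- (Y0 * Sxx - X1 * Sxy) / D
    b.t <- ((N + 1) * Sxy + N * X1 * Y0) / D
    sig2.t <- (sum((Y - a.t - b.t * X)^2) + (Y0 - a.t - b.t * X1)^2) / (N + 1)
    R <- sig2.hat / sig2.t
    (N - 2) * (1 - R) / R - crit
  }
  ## look for sign changes on either side of X0.hat; if none is found,
  ## the interval is taken as unbounded on that side
  w <- 50 * abs(X0.hat) + 50
  lo <- tryCatch(uniroot(f, c(X0.hat - w, X0.hat))$root, error = function(e) -Inf)
  hi <- tryCatch(uniroot(f, c(X0.hat, X0.hat + w))$root, error = function(e) Inf)
  c(lower = lo, upper = hi) + mean(x)   # back to the original X scale
}
lik.xint.ci(t$Temperature, t$Dev.Rate)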