Dan E. Kelley
2000-Jun-03 15:32 UTC
[R] How to do linear regression with errors in x and y?
QUESTION: how should I do a linear regression in which there are errors in x as well as y?

SUPPLEMENT: I've seen folks approach this problem by computing eigenvectors of the covariance matrix, and that makes sense to me. But I'm wondering if this has a "pedigree" (i.e. if it makes sense to folks on this list, and if it's something that has been published, so I can refer to it.)

BACKGROUND: (I'm providing this for interest of readers, since I personally find such ancillary comments on this list to be quite intriguing.) My problem is something that comes up all the time in physics (in this case, fluid mechanics). I have measured variables, let's call them X and Y, and dimensional analysis suggests that these be scaled by Lx and Ly say, so the Buckingham Pi theorem says that we must have

    Y/Ly = f(X/Lx, ...)

where the ... is a list of nondimensional parameters of the problem. (As an aside, the X is depth below the ocean surface, Lx is the RMS height of waves on the surface, Y is a measure of the turbulence in the ocean, and Ly is related to the wind stress on the water surface. The ... is a list of parameters that includes how long the wind has been blowing; sailors will know that waves take a while to build up.)

A power-law dependence, i.e.

    Y/Ly = (X/Lx)^alpha

seems justified by theory, but the value of alpha is contentious and we seek to determine it empirically. (Engineers reading this will recognize that alpha=-1 is the so-called "law of the wall" for the decay of turbulence away from a frictional wall.)

Thus, my approach is to try to fit a line like

    log(Y/Ly) ~ log(X/Lx)

but since there are errors in (X,Y,Lx,Ly) (all of which rely on measurement), we emphatically have errors in both the dependent and independent variable. If our scaling is correct, X/Lx and Y/Ly are roughly of order unity. The data suggest log(X/Lx) and log(Y/Ly) have roughly comparable scatter.

Thus, I'd be happy to state that the errors in the dependent and independent variables are comparable. And so my question becomes, on this assumption, how to fit a line through data in which both "x" and "y" have (equal) uncertainty. I'm thinking the eigenvector approach is fine. Comments?

--
Dan E. Kelley  phone:(902)494-1694
Oceanography Department, Dalhousie University  fax:(902)494-2885
Halifax, Nova Scotia  mailto:Dan.Kelley at Dal.CA
Canada B3H 4J1  http://www.phys.ocean.dal.ca/~kelley/Kelley_Dan.html

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
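A minimal sketch of the eigenvector fit described in the question, assuming equal error variance in log(X/Lx) and log(Y/Ly). It uses plain Python (standard library only) rather than R. The function name `power_law_exponent` and the sample values are hypothetical and do not come from the original post. Centering the logs before forming the covariance matrix lets the fitted line have an intercept, i.e. a constant prefactor in the power law, which is an extra assumption beyond Y/Ly = (X/Lx)^alpha.

```python
import math

def power_law_exponent(x_scaled, y_scaled):
    """Estimate alpha in y = x**alpha from the principal axis of the log-log scatter.

    x_scaled, y_scaled: positive nondimensional values (X/Lx and Y/Ly).
    Assumes the errors in log(x) and log(y) have comparable variance.
    """
    u = [math.log(v) for v in x_scaled]
    w = [math.log(v) for v in y_scaled]
    n = len(u)
    u_mean = sum(u) / n
    w_mean = sum(w) / n
    # Elements of the 2x2 sample covariance matrix of (u, w).
    s_uu = sum((a - u_mean) ** 2 for a in u) / (n - 1)
    s_ww = sum((b - w_mean) ** 2 for b in w) / (n - 1)
    s_uw = sum((a - u_mean) * (b - w_mean) for a, b in zip(u, w)) / (n - 1)
    # Orientation of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * s_uw, s_uu - s_ww)
    # The slope of that eigenvector in the (u, w) plane is the exponent.
    return math.tan(theta)

# Hypothetical, noise-free data obeying the law of the wall (alpha = -1).
depth_over_wave_height = [0.5, 1.0, 2.0, 4.0]
turbulence_scaled = [1.0 / v for v in depth_over_wave_height]
alpha = power_law_exponent(depth_over_wave_height, turbulence_scaled)
print(round(alpha, 6))
```

With real, noisy data the same call applies unchanged; the estimate is only as good as the equal-variance assumption stated in the question.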
Jan de Leeuw
2000-Jun-03 17:49 UTC
[R] How to do linear regression with errors in x and y?
Distinguished pedigree.

Karl Pearson
On Lines and Planes of Closest Fit to Systems of Points in Space
Phil Mag. 2, 1901, 559-572.

Goes back even further (to Adcock, around 1875).

At 12:32 -0300 06/03/2000, Dan E. Kelley wrote:
>QUESTION: how should I do a linear regression in which there are
>errors in x as well as y?

--
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
US mail: 8142 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550; fax (310)-206-5658; email: deleeuw at stat.ucla.edu
http://www.stat.ucla.edu/~deleeuw and http://home1.gte.net/datamine/

No matter where you go, there you are. --- Buckaroo Banzai
http://webdev.stat.ucla.edu/sounds/nomatter.au
Marc R. Feldesman
2000-Jun-03 18:06 UTC
[R] How to do linear regression with errors in x and y?
This is referred to in *my* trade as a Model II regression and is fit by finding either the major axis slope or the reduced major axis slope. We find the major axis slope using principal components analysis of the covariance matrix: the ratio of the y and x components of the first eigenvector gives the major axis slope. We get the reduced major axis slope by dividing the linear regression slope by the correlation coefficient for x & y. The original approach to this type of regression traces to at least Haldane and Kermack in 1950.

At 12:32 PM 6/3/00 -0300, Dan E. Kelley wrote:
>QUESTION: how should I do a linear regression in which there are
>errors in x as well as y?

Dr. Marc R. Feldesman
email: feldesmanm at pdx.edu
email: feldesman at ibm.net
fax: 503-725-3905

"Don't know where I'm going. Don't like where I've been. There may be no exit. But hell, I'm going in." Jimmy Buffett

Powered by Superchoerus - the 700 MHz Coppermine Box
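A rough sketch of the two Model II slopes mentioned above, next to the ordinary least-squares slope for comparison. It is plain Python, and all function names are hypothetical. The major axis slope uses the closed form for the leading eigenvector of the 2x2 covariance matrix. For the reduced major axis, dividing the regression slope by the correlation coefficient literally gives s_y/s_x, which is always positive, so this sketch attaches the sign of the covariance, as is the usual convention.

```python
import math

def second_moments(x, y):
    """Return the sample variances of x and y and their covariance."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((a - x_mean) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - y_mean) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)
    return s_xx, s_yy, s_xy

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (errors in y only)."""
    s_xx, _, s_xy = second_moments(x, y)
    return s_xy / s_xx

def major_axis_slope(x, y):
    """Slope of the leading eigenvector of the covariance matrix."""
    s_xx, s_yy, s_xy = second_moments(x, y)
    if s_xy == 0.0:
        raise ValueError("zero covariance: slope is 0, infinite, or undefined")
    diff = s_yy - s_xx
    return (diff + math.hypot(diff, 2.0 * s_xy)) / (2.0 * s_xy)

def reduced_major_axis_slope(x, y):
    """Ratio of standard deviations, signed like the covariance."""
    s_xx, s_yy, s_xy = second_moments(x, y)
    return math.copysign(math.sqrt(s_yy / s_xx), s_xy)

# Hypothetical data with equal scatter in x and y.
x = [0.0, 1.0, 2.0]
y = [0.0, 2.0, 1.0]
print(ols_slope(x, y), major_axis_slope(x, y), reduced_major_axis_slope(x, y))
```

In this small example the two Model II slopes agree because the variances of x and y are equal, while the least-squares slope is pulled toward zero.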
Hello Dan,

There are extensive sections about errors in variables in my on-line econometrics class notes at http://www.econ.utah.edu/ehrbar/ecmet.pdf (this is a 5 MB pdf file). Maybe this has something interesting for you.

Hans Ehrbar.

--
Hans G. Ehrbar  ehrbar at econ.utah.edu
Economics Department, University of Utah  (801) 581 7797 (my office)
1645 Campus Center Dr., Rm 308  (801) 581 7481 (econ office)
Salt Lake City UT 84112-9300  (801) 585 5649 (FAX)
Prof Brian D Ripley
2000-Jun-04 05:38 UTC
[R] How to do linear regression with errors in x and y?
On Sat, 3 Jun 2000, Dan E. Kelley wrote:

> QUESTION: how should I do a linear regression in which there are
> errors in x as well as y?

By definition, that is not a linear *regression*. More precisely, what you should do depends critically on the assumptions and purpose of the analysis. For example, for a calibration problem, regression of x on y (that is, least-squares fitting) is still a good idea. And it depends on whether the observed x values were controlled or the true values, or if this is a random sample of (x,y)'s.

In what I think you want there is a true linear relationship and both x and y are measured with error, and you are interested in the relationship. That's called a linear functional relationship model. (Econometricians use structural models, the random-sample version.)

[...]

> Thus, I'd be happy to state that the errors in the dependent and
> independent variables are comparable. And so my question becomes, on
> this assumption, how to fit a line through data in which both "x" and
> "y" have (equal) uncertainty. I'm thinking the eigenvector approach
> is fine. Comments?

As Jan de Leeuw has already commented, this is an extremely well re-discovered result, going back to Adcock ca. 1872. But minor variations still seem unknown (and I once wrote a paper on the variation in which the uncertainty in x and y depend on the true value, as occurs in analytical chemistry).

There is a whole book on this and related ideas:

@Book{Fuller.87,
  author = "Fuller, W.",
  title = "Measurement Error Models",
  publisher = "Wiley",
  year = "1987",
}

and you will find treatments in a few linear models books, AFAIR those by G.A.F. Seber and P. Sprent especially.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
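To make the dependence on assumptions concrete, here is a hedged sketch of the slope for a linear functional relationship when the ratio of the error variances is taken as known. This estimator is commonly called Deming regression; it is named here for illustration and is not claimed to be the method of any poster in this thread. The function name and the `error_variance_ratio` parameter (variance of the error in y divided by variance of the error in x) are hypothetical. A ratio of 1 reproduces the eigenvector (major axis) fit.

```python
import math

def functional_relationship_fit(x, y, error_variance_ratio=1.0):
    """Fit y = intercept + slope * x with measurement errors in both x and y.

    error_variance_ratio: var(error in y) / var(error in x), assumed known.
    Returns (slope, intercept).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((a - x_mean) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - y_mean) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)
    if s_xy == 0.0:
        raise ValueError("zero covariance: slope is not determined")
    delta = error_variance_ratio
    diff = s_yy - delta * s_xx
    # hypot(diff, 2*sqrt(delta)*s_xy) equals sqrt(diff**2 + 4*delta*s_xy**2).
    slope = (diff + math.hypot(diff, 2.0 * math.sqrt(delta) * s_xy)) / (2.0 * s_xy)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data; the same points under three error assumptions.
x = [0.0, 1.0, 2.0]
y = [0.0, 2.0, 1.0]
for ratio in (1e-6, 1.0, 1e6):
    print(ratio, functional_relationship_fit(x, y, ratio))
```

A very large ratio (negligible error in x) approaches the least-squares regression of y on x, and a very small ratio approaches the inverted regression of x on y, which shows how strongly the answer depends on the error assumptions.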