Jonathon Kopecky
2007-Jan-29 19:52 UTC
[R] Need to fit a regression line using orthogonal residuals
I'm trying to fit a simple linear regression of just Y ~ X, but both X and Y are noisy. Thus, instead of fitting a standard linear model that minimizes vertical residuals, I would like to minimize orthogonal (perpendicular) residuals. I have tried searching the R packages but have not found anything that seems suitable. I'm not sure what these residuals are usually called (they seem to go by many different names), so that may be my trouble.

I do not want to use Principal Components Analysis (as was the answer given to a previous questioner a few years ago); I just want to minimize the combined noise of my two variables. Is there a way for me to do this in R?

Jonathon Kopecky
University of Michigan
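For reference, the criterion described above is usually called orthogonal regression (in the two-variable case it is also the simplest instance of total least squares). A sketch of the standard result: with $\bar x, \bar y$ the sample means and $s_{xx}, s_{yy}, s_{xy}$ the sample variances and covariance, the perpendicular distance from $(x_i, y_i)$ to the line $y = a + bx$ and the minimizing line are as follows, assuming $s_{xy} \neq 0$. Equivalently, $(1, \hat b)$ points along the leading eigenvector of the sample covariance matrix, which is why the fit coincides with the first principal component.

$$
d_i = \frac{|y_i - a - b x_i|}{\sqrt{1 + b^2}}, \qquad
(\hat a, \hat b) = \arg\min_{a,\,b} \sum_{i=1}^{n} \frac{(y_i - a - b x_i)^2}{1 + b^2}
$$

$$
\hat b = \frac{s_{yy} - s_{xx} + \sqrt{(s_{yy} - s_{xx})^2 + 4 s_{xy}^2}}{2 s_{xy}}, \qquad
\hat a = \bar y - \hat b \, \bar x
$$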
Bill.Venables at csiro.au
2007-Jan-31 01:36 UTC
[R] Need to fit a regression line using orthogonal residuals
Jonathon Kopecky asks:

> Is there a way for me to do this in R?

[WNV] There's always a way if you are prepared to program it. Your question is a bit like asking "Is there a way to do this in Fortran?"

The most direct way to do it is to define a function that gives you the sum of the squared perpendicular distances and minimise it using, say, optim(). E.g.

    ## x and y are the two noisy variables
    ppdis <- function(b, x, y) sum((y - b[1] - b[2]*x)^2/(1 + b[2]^2))
    b0 <- lsfit(x, y)$coefficients   # initial value from ordinary least squares
    op <- optim(b0, ppdis, method = "BFGS", x = x, y = y)
    op                               # now to check the results
    plot(x, y, asp = 1)              # why 'asp = 1'?? exercise
    abline(b0, col = "red")          # least-squares line
    abline(op$par, col = "blue")     # perpendicular-distance line

There are a few things about this you should be aware of, though.

First, this is just a fiddly way of finding the first principal component, so your desire not to use Principal Component Analysis is somewhat thwarted, as it must be.

Second, the result is sensitive to scale: if you change the scale of either x or y, e.g. from pounds to kilograms, the answer is different.
This also means that unless your measurement units for x and y are comparable, it's hard to see how the result can make much sense. A related issue is that you have to take some care when plotting the result, or orthogonal distances will not appear to be orthogonal.

Third, the resulting line is not optimal for either predicting y for a new x or x from a new y. It's hard to see why it is ever of much interest.

Bill Venables.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
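Venables's first two points can be checked numerically. The sketch below simulates data, compares his optim() fit with the line through the means along the first principal component from prcomp(), and then refits after rescaling x. The simulated data, the helper names fit_optim() and fit_pca(), and the conversion factor k are illustrative assumptions, not part of the original post.

```r
## Hypothetical example data: x and y both measured with noise of the same variance
set.seed(1)
truth <- runif(50, min = 0, max = 10)
x <- truth + rnorm(50, sd = 0.5)
y <- 2 + 0.8 * truth + rnorm(50, sd = 0.5)

## Venables's criterion: sum of squared perpendicular distances to y = b[1] + b[2] * x
ppdis <- function(b, x, y) sum((y - b[1] - b[2] * x)^2 / (1 + b[2]^2))

## Hypothetical helper: minimise ppdis numerically, starting from least squares
fit_optim <- function(x, y) {
  b0 <- lsfit(x, y)$coefficients
  op <- optim(b0, ppdis, method = "BFGS", x = x, y = y)
  stopifnot(op$convergence == 0)
  unname(op$par)                       # c(intercept, slope)
}

## Hypothetical helper: the same line from the first principal component of the
## centred, unscaled data; the loading vector gives the direction and the line
## passes through the means
fit_pca <- function(x, y) {
  v <- prcomp(cbind(x, y))$rotation[, 1]
  slope <- unname(v[2] / v[1])
  c(mean(y) - slope * mean(x), slope)  # c(intercept, slope)
}

fit_optim(x, y)
fit_pca(x, y)

## Scale sensitivity: re-express x in other units (k is an assumed factor, roughly
## pounds to kilograms), refit, and map the slope back to the original units
k <- 0.4536
ols_back <- unname(lsfit(k * x, y)$coefficients[2]) * k
pca_back <- fit_pca(k * x, y)[2] * k
c(ols = unname(lsfit(x, y)$coefficients[2]), ols_rescaled = ols_back)
c(pca = fit_pca(x, y)[2], pca_rescaled = pca_back)
```

The least-squares slope is unchanged once converted back to the original units, while the perpendicular-distance slope moves, which is the scale sensitivity described above.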
davidr at rhotrading.com
2007-Jan-31 16:04 UTC
[R] Need to fit a regression line using orthogonal residuals
This problem also comes up in financial hedging problems, but there the 'errors' usually need not be of comparable size, so errors-in-variables or total least squares methods might be used.

David L. Reiner
Rho Trading Securities, LLC
Chicago IL 60605
312-362-4963

-----Original Message-----
From: Prof Brian Ripley
Sent: Wednesday, January 31, 2007 1:57 AM
To: Bill.Venables at csiro.au
Cc: jkopecky at umich.edu; r-help at stat.math.ethz.ch
Subject: Re: [R] Need to fit a regression line using orthogonal residuals

Just to pick up on:

> Third, the resulting line is not optimal for either predicting y for a
> new x or x from a new y. It's hard to see why it is ever of much
> interest.

It is not a regression (and hence the subject line was misleading), but it does come up in errors-in-variables problems. Suppose you have two sets of measurements of the same quantity with the same variance of measurement error, and you want a line calibrating set 2 to set 1. Then the optimal line (in the maximum-likelihood sense, for example) is this one, and it is symmetrical in the two sets. Those are rather specific assumptions, but they do come up in some problems in physics and analytical chemistry, and the result goes back to the 19th century. In the 1980s I implemented a version allowing for unequal (but known) heteroskedastic error variances, which is quite popular in analytical chemistry.

The literature is patchy: Fuller's 'Measurement Error Models' covers the general area, and I recall this being in Sprent's (1969) book 'Models in Regression and Related Topics'.

See also the thread starting at http://tolstoy.newcastle.edu.au/R/help/00a/0285.html, almost 7 years ago. If that is the thread Jonathon Kopecky refers to (how are we to know?), then he is misquoting me: I said it was the same thing as using the first principal component, not an alternative proposal.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
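Ripley's heteroskedastic method is not reproduced here. A simpler, commonly used relative of it is Deming regression, which assumes that the ratio lambda of the y-error variance to the x-error variance is known and constant; lambda = 1 gives the orthogonal line discussed in this thread. The sketch below uses the standard closed form. The function deming_fit() and the simulated calibration data are hypothetical. It also checks Ripley's symmetry remark: with equal error variances, swapping the two sets gives the same line, so the two slopes are reciprocals.

```r
## Hypothetical helper: Deming regression with a known, constant ratio
## lambda = var(error in y) / var(error in x); lambda = 1 is orthogonal regression.
## Assumes cov(x, y) != 0.
deming_fit <- function(x, y, lambda = 1) {
  sxx <- var(x); syy <- var(y); sxy <- cov(x, y)
  d <- syy - lambda * sxx
  slope <- (d + sqrt(d^2 + 4 * lambda * sxy^2)) / (2 * sxy)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

## Hypothetical calibration data: two sets of measurements of the same
## quantities, each with measurement error of the same variance
set.seed(2)
truth <- runif(40, 0, 100)
set1 <- truth + rnorm(40, sd = 2)
set2 <- 5 + 1.1 * truth + rnorm(40, sd = 2)

b21 <- deming_fit(set1, set2)   # calibrate set 2 against set 1
b12 <- deming_fit(set2, set1)   # roles swapped

## Symmetry: both fits describe the same line, so the slopes are reciprocal
c(b21["slope"], 1 / b12["slope"])

## As lambda grows (x nearly error-free) the fit approaches least squares of y on x
deming_fit(set1, set2, lambda = 1e8)["slope"]
coef(lm(set2 ~ set1))[2]
```

The ratio lambda has to come from outside the fitted data (for example, from replicate measurements); the closed form cannot estimate it.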