jonathan_li@agilent.com
2003-Dec-03 22:31 UTC
[R] add a point to regression line and cook's distance
Hi,

This is more a statistics question than an R question, but I thought people on this list might have some pointers.

My question is as follows: I would like to have a robust regression line. The data I have are mostly clustered in a small range, so the regression line tends to be strongly influenced by outlying points (with large Cook's distance). From the application's background, I know that the line should pass through (0,0), which is far away from the data cloud. I would like to add this point to get a more robust line. The question is: does it make sense to do this? What are the negative impacts, if any?

thanks,
jonathan
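A minimal R sketch of why the added point matters, using hypothetical simulated data (x confined to [10, 12], not the data from the question). A single point at (0, 0), far from the data cloud, receives a leverage (hat value) close to 1, so it largely dictates the fitted line on its own. cooks.distance() on the augmented fit reports its influence alongside the other points.

```r
# Hypothetical data for illustration: x clustered in a narrow range far from 0
set.seed(1)
x <- seq(10, 12, length.out = 50)
y <- 0.5 + 2 * x + rnorm(50, sd = 0.5)

# Append the point (0, 0) and refit by ordinary least squares
xa <- c(x, 0)
ya <- c(y, 0)
fit_aug <- lm(ya ~ xa)

# Leverage depends only on x; Cook's distance combines leverage and residual
h  <- hatvalues(fit_aug)
cd <- cooks.distance(fit_aug)
n_aug <- length(xa)

h[n_aug]        # leverage of the added (0, 0) point
max(h[-n_aug])  # largest leverage among the original points
cd[n_aug]       # Cook's distance of the added point
```

Because leverage grows with distance from the mean of x, the artificial point is itself exactly the kind of high-influence observation the question is trying to tame; the only difference is that the analyst chose it.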
Spencer Graves
2003-Dec-03 22:50 UTC
What is the context? What do the "outliers" represent? If you think carefully about the context, you may find the answer.

hope this helps. spencer graves

p.s. I know statisticians who worked for HP before the split and who still work for either HP or Agilent, I'm not certain which. If you want to contact me off-line, I can give you a couple of names if that might help.
Wiener, Matthew
2003-Dec-04 01:03 UTC
If you know that the line should pass through (0,0), would it make sense to do a regression without an intercept? You can do that by putting "-1" in the formula, like: lm(y ~ x - 1).

Hope this helps,

Matt

Matthew Wiener
RY84-202
Applied Computer Science & Mathematics Dept.
Merck Research Labs
126 E. Lincoln Ave.
Rahway, NJ 07065
732-594-5303
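A minimal sketch of the no-intercept fit Matt describes, on hypothetical simulated data. In R's formula syntax, "- 1" (equivalently "+ 0") removes the intercept column, and the least-squares slope then reduces to sum(x*y)/sum(x^2).

```r
# Hypothetical data for illustration only
set.seed(2)
x <- seq(10, 12, length.out = 40)
y <- 2 * x + rnorm(40, sd = 0.5)

fit_int  <- lm(y ~ x)      # ordinary fit, intercept estimated
fit_orig <- lm(y ~ x - 1)  # intercept dropped: line constrained through (0, 0)

# Closed-form least-squares slope for regression through the origin
slope_closed <- sum(x * y) / sum(x^2)

coef(fit_int)
coef(fit_orig)
slope_closed
```

One caution when comparing the two fits: for a model without an intercept, summary() computes R-squared about zero rather than about mean(y), so that R-squared is not comparable with the one from the model with an intercept.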
jonathan_li@agilent.com
2003-Dec-04 01:40 UTC
It is likely that the "true" relationship is nonlinear; there is no a priori knowledge about linearity. In the small range where we do have enough data, the relationship looks linear. Outside that range, the data are very scarce and have a high level of noise too. This is why adding (0,0) to the data could potentially improve the fit a great deal. At the same time, I have never heard of people doing it this way.

Jonathan

-----Original Message-----
From: Murray Jorgensen [mailto:maj at stats.waikato.ac.nz]
Sent: Wednesday, December 03, 2003 5:18 PM
To: Wiener, Matthew
Cc: jonathan_li at agilent.com; r-help at stat.math.ethz.ch
Subject: Re: [R] add a point to regression line and cook's distance

Not a good idea, unless the regression function is *known* to be linear. More likely it is only approximately linear over small ranges.

Murray Jorgensen

--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk +64 7 849 6486 home Mobile 021 1395 862
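To illustrate Murray's caution, a minimal sketch with a hypothetical curved relationship that does pass through (0, 0), f(x) = 2*sqrt(x), observed without noise only on [10, 12]. A straight line fitted with a free intercept recovers the local slope f'(11) = 1/sqrt(11), about 0.30, while forcing the line through the origin roughly doubles the slope, because the constraint ties the fit to a point where the straight-line approximation no longer holds.

```r
f <- function(x) 2 * sqrt(x)  # hypothetical true curve: f(0) = 0, but nonlinear

x <- seq(10, 12, length.out = 50)  # observations confined to a narrow range
y <- f(x)                          # no noise, so any discrepancy is pure bias

slope_local  <- coef(lm(y ~ x))[["x"]]      # straight line, intercept free
slope_origin <- coef(lm(y ~ x - 1))[["x"]]  # straight line through (0, 0)
slope_true   <- 1 / sqrt(11)                # f'(x) at the centre of the data

c(local = slope_local, origin = slope_origin, true = slope_true)
```

Appending (0, 0) as an ordinary data point behaves much the same way: its high leverage pulls the line toward the origin, importing the curvature bias into the slope when the relationship is only approximately linear.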
jonathan_li at agilent.com wrote:
> I would like to add this point to have a more robust line. The question is:
> does it make sense to do this? what are the negative impacts if any?

Have you tried a more robust fit (ltsreg() in the package lqs springs to mind)? Using this, without forcing the intercept to zero, might give you some idea of whether your idea makes sense.

Venables and Ripley (Modern Applied Statistics with S, Springer-Verlag, 2002) give a good introduction to robust linear models and how to estimate their error distribution. Julian Faraway also gives an overview of the same in his "Practical Regression and ANOVA using R":
http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf

Hope that helps

Jason
--
Indigo Industrial Controls Ltd.
http://www.indigoindustrial.co.nz
64-21-343-545
jasont at indigoindustrial.co.nz
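A minimal sketch of Jason's suggestion, on hypothetical simulated data with a few gross outliers. In current versions of R, ltsreg() is provided by the MASS package (the lqs package was later merged into MASS). Least trimmed squares fits the line to the best-fitting subset of roughly half the points, so a handful of outliers barely moves it, and the intercept stays free as Jason recommends.

```r
library(MASS)  # provides ltsreg() / lqs() in current R

# Hypothetical data: clean linear trend plus a few gross outliers
set.seed(3)
x <- seq(10, 12, length.out = 50)
y <- 1 + 2 * x + rnorm(50, sd = 0.05)
y[46:50] <- y[46:50] + 20  # contaminate the five largest-x points
d <- data.frame(x = x, y = y)

fit_ls  <- lm(y ~ x, data = d)      # least squares: pulled by the outliers
fit_lts <- ltsreg(y ~ x, data = d)  # least trimmed squares, intercept not constrained

coef(fit_ls)
coef(fit_lts)
```

On real data, a robust intercept near 0 would lend some support to the (0, 0) constraint, bearing in mind how poorly an intercept is determined when the data lie far from x = 0. LTS may fall back on random subsampling for larger problems, so set a seed if reproducible results matter.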
One way of implementing some Bayesian techniques is to add data points based on prior knowledge. E.g., see Gelman, Carlin, Stern & Rubin, "Bayesian Data Analysis" (1997), for how a prior on a regression parameter can be interpreted as an additional data point (Section 8.9 in my 2000 reprint).

hope this helps,

Tony Plate

Tony Plate tplate at acm.org
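A minimal sketch of the connection Tony points to, in R on hypothetical simulated data. Assume the residual variance sigma^2 is known, a flat prior on the slope, and a normal prior with mean 0 and variance tau^2 on the intercept (the value of the line at x = 0). The posterior mean is then the penalized least-squares estimate, which is exactly what you get by appending the pseudo-observation (0, 0) with weight w = sigma^2 / tau^2. The value of w below is an arbitrary illustrative choice.

```r
# Hypothetical data for illustration only
set.seed(4)
x <- seq(10, 12, length.out = 30)
y <- 0.5 + 2 * x + rnorm(30, sd = 0.5)

w <- 4  # hypothetical prior weight, w = sigma^2 / tau^2

# (1) Append (0, 0) as an extra observation carrying weight w
xa <- c(x, 0)
ya <- c(y, 0)
fit_aug <- lm(ya ~ xa, weights = c(rep(1, length(x)), w))

# (2) The same estimate as penalized least squares:
#     minimise sum((y - b0 - b1 * x)^2) + w * b0^2
X <- cbind(1, x)
b_pen <- solve(crossprod(X) + diag(c(w, 0)), crossprod(X, y))

# (3) A very strong prior (huge w) pushes b0 to 0 and approaches lm(y ~ x - 1)
b_strong   <- solve(crossprod(X) + diag(c(1e8, 0)), crossprod(X, y))
fit_origin <- lm(y ~ x - 1)

cbind(augmented = coef(fit_aug), penalized = drop(b_pen))
```

With w = 0 the ordinary fit is recovered, and as w grows the fit approaches regression through the origin. Appending a single unweighted (0, 0) point is the special case w = 1, i.e. a prior on the intercept whose variance equals the residual variance.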