Dear R-users,

There are tons of methods out there for fitting independent variables to a dependent variable. All stats books tell you about the assumptions behind OLS (ordinary least squares) and warn against abusing the method (warnings many of us disregard for lack of better knowledge). Most introductory textbooks stop there and don't tell you what the next best option might be. I am aware that there may be many, depending on the type of study, so here are the details needed to sort this question out.

In this instance, I am performing a regression on observations whose residuals show heteroscedasticity (the variance of the residuals is small for small values of the dependent variable and increases for larger ones), which violates one assumption of the OLS method. Which of the numerous options should I choose? glm, robust lm, ...

The problem is kept simple for now. I am only trying to explain the log of local topographic slope (dependent variable) by the distance to the outlet of a catchment (independent variable) for a fixed drained area. Both variables are continuous.

I ordered Venables and Ripley 2002, which I suspect is sound reading for advanced stats with R, but it has not arrived yet and I need to move on asap. Any advice or pointer to the appropriate literature is greatly appreciated.

Thomas

Dr Thomas Dewez
ENTEC Post-Doctoral Fellow
ARN - MAS
BRGM (French Geological Survey)
3 Av. C. Guillemin
45000 Orleans - France
Phone: +33 (0)2 38644606
Fax: +33 (0)2 38643361

Dear Thomas,

I believe a GLS (Generalized Least Squares, also known as the Aitken estimator) estimate could be used when the residuals are heteroskedastic. See ?lm.gls in the MASS package or ?gls in the nlme package.

Another way is to study the relationship between x (independent variable) and the variance of y (dependent variable) using a plot. I read, when I was a student, that there are some transformations which can reduce heteroskedasticity, but I don't remember the particulars.

For theory about GLS:
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-timeseries-regression.pdf (here you can find some applications of GLS with R)
http://www.sinica.edu.tw/as/ssrc/ckuan/pdf/et01/ch4.pdf
http://jackman.stanford.edu/papers/gls.pdf

Best,
Vito
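
As a rough illustration of the gls suggestion above, here is a minimal sketch on simulated data. Everything about the data is an assumption made for the example: the variable names dist and log_slope, the sample size, the true coefficients, and above all the premise that the residual standard deviation grows in proportion to dist. The varPower variance function from nlme is one possible choice among several, not something prescribed in the thread.

```r
library(nlme)  # recommended package shipped with R; provides gls() and varPower()

# Simulated stand-in for the real data (hypothetical values throughout):
# residual sd is assumed to grow in proportion to the predictor.
set.seed(1)
n <- 200
dist <- runif(n, min = 1, max = 10)
log_slope <- 0.5 + 0.3 * dist + rnorm(n, sd = 0.1 * dist)
d <- data.frame(dist = dist, log_slope = log_slope)

# Ordinary least squares fit, used here only to inspect the residuals.
ols <- lm(log_slope ~ dist, data = d)

# Numeric counterpart of a residual plot: mean absolute residual in four
# bins of the predictor. A graphical version would be
# plot(d$dist, abs(resid(ols))).
spread_by_bin <- tapply(abs(resid(ols)), cut(d$dist, breaks = 4), mean)
print(spread_by_bin)

# GLS fit in which the residual sd is modelled as a power of dist;
# the power itself is estimated from the data.
fit_gls <- gls(log_slope ~ dist, data = d,
               weights = varPower(form = ~ dist))
print(summary(fit_gls))
```

If the binned spread does not change with dist in the real data, the variance model adds little and a plain lm fit may be adequate.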

Just my $0.02...

Depending on what you are going to do with the model, heteroscedasticity might be low on the list of things you should worry about. I'd say that the assumption that the model is a straight line might be high, if not the highest, on that list. That might be a reasonable assumption in your case, but you definitely should investigate.

If a straight line is a reasonable model for the data, then OLS may not be such a bad thing, provided you don't have skewed data or outliers. You should try several methods and see which looks most reasonable. (I don't think there's anything wrong with trying different methods of fitting the same model; at least it seems less dangerous than choosing among many models fitted with the same method.)

Non-constant variance only affects the efficiency of the estimator and the inference (CI, hypothesis tests). If you need to do inference, you need to address that, and the two most popular ways are weighted least squares and transformation.

HTH,
Andy
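
The two points made in the reply above (check the straight-line assumption, and use weighted least squares if inference matters) can be sketched as follows. This is only an illustration on simulated data: the variable names, the true coefficients, and the weights 1 / dist^2 (which assume the residual standard deviation is proportional to dist) are all assumptions made for the example and would need to be justified on the real data.

```r
# Simulated stand-in for the real data (hypothetical values throughout).
set.seed(42)
n <- 300
dist <- runif(n, min = 1, max = 10)
log_slope <- 0.5 + 0.3 * dist + rnorm(n, sd = 0.1 * dist)
d <- data.frame(dist = dist, log_slope = log_slope)

# Ordinary and weighted least squares fits of the same straight-line model.
# Weights are inverse variances; 1 / dist^2 assumes sd proportional to dist.
ols <- lm(log_slope ~ dist, data = d)
wls <- lm(log_slope ~ dist, data = d, weights = 1 / dist^2)

# Slope estimate and its standard error under each fitting method.
slope_table <- rbind(
  OLS = coef(summary(ols))["dist", c("Estimate", "Std. Error")],
  WLS = coef(summary(wls))["dist", c("Estimate", "Std. Error")]
)
print(slope_table)

# Crude check of the straight-line assumption: add a quadratic term under
# the same weights and compare the nested models with an F test.
wls_quad <- lm(log_slope ~ dist + I(dist^2), data = d, weights = 1 / dist^2)
lin_check <- anova(wls, wls_quad)
print(lin_check)
```

A small p-value in the last table would suggest curvature that a straight line misses, which, as argued above, may matter more than the non-constant variance.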