Greetings,

My question is more algorithmic than practical. What I am trying to determine is: are the GAM algorithms used in the mgcv package affected by non-normally distributed residuals?

As I understand the theory of linear models, the Gauss-Markov theorem guarantees that least-squares regression is optimal over all unbiased estimators iff the data meet the conditions of linearity, homoscedasticity, independence, and normally distributed residuals. Absent the last requirement it is optimal, but only over unbiased linear estimators.

What I am trying to determine is whether or not it is necessary to check for normally distributed errors in a GAM from mgcv. I know that the unsmoothed terms, if any, will be fitted by ordinary least squares, but I am unsure whether the default Penalized Iteratively Reweighted Least Squares method used in the package is also based upon this assumption, or falls under any analogue to the Gauss-Markov theorem.

Thank you in advance for any help.

Sincerely,
Collin Lynch.
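[For concreteness, the kind of fit being asked about can be sketched as follows. This is a minimal example on simulated data, not from the original thread; mgcv's `gam.check()` is the package's standard residual-diagnostic routine.]

```r
# Minimal sketch: a Gaussian GAM fitted by mgcv's penalized regression
# splines, followed by the package's standard residual diagnostics.
library(mgcv)

set.seed(1)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)  # smooth signal + Gaussian noise

fit <- gam(y ~ s(x))  # smooth term fitted by penalized least squares
gam.check(fit)        # qq plot, residuals vs. fitted, basis-dimension checks
```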
On Nov 6, 2013, at 12:46 PM, Collin Lynch wrote:

> ... are the GAM algorithms used in the mgcv package
> affected by nonnormally-distributed residuals? ...

The default functional link for mgcv::gam is "log", so I doubt that your theoretical understanding applies to GAMs in general. When Simon Wood wrote his book on GAMs, his first chapter was on linear models and his second chapter was on generalized linear models, at which point he had written over 100 pages, and only then did he "introduce" GAMs. I think you need to follow the same progression, and this forum is not the correct one for statistics education. Perhaps pose your follow-up questions to CrossValidated.com.

--
David Winsemius
Alameda, CA, USA
If you use GCV smoothness selection then, in the Gaussian case, the key assumptions are constant variance and independence. As with linear modelling, the normality assumption only comes in when you want to find confidence intervals or p-values. (The Gauss-Markov theorem does not require normality, by the way, but I don't know whether it has a penalized analogue.) With REML smoothness selection it's less clear (at least to me).

Beyond the Gaussian case the situation is much as it is with GLMs. The key assumptions are independence and that the mean-variance relationship is correct. The theory of quasi-likelihood tells you that you can make valid inference based only on specifying the mean-variance relationship for the response, rather than the whole distribution, the price being a small loss of efficiency. It follows that getting the distribution exactly right is of secondary importance.

It's also quite easy to be misled by normal qq plots of the deviance residuals when you have low count data. For example, section 4 of http://opus.bath.ac.uk/27091/1/qq_gam_resub.pdf shows a real example where the usual qq plots look awful, suggesting massive zero inflation, but if you compute the correct reference quantiles for the qq plot you find that there is nothing wrong and no evidence of zero inflation.

best,
Simon

ps. In response to the follow-up discussion: the default link depends on the family, rather than being a gam (or glm) default. E.g. the default is log for the Poisson, but identity for the Gaussian.

On 06/11/13 21:46, Collin Lynch wrote:

> ... are the GAM algorithms used in the mgcv package
> affected by nonnormally-distributed residuals? ...
--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603  http://people.bath.ac.uk/sw283
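[Simon's qq-plot point can be sketched with mgcv's own `qq.gam()`, which simulates reference quantiles appropriate to the fitted family. Simulated low-count data below; a hedged illustration, not code from the thread.]

```r
# With low expected counts, a naive normal qq plot of deviance residuals
# can look bad even when the model is correct. qq.gam() instead computes
# simulation-based reference quantiles for the fitted family.
library(mgcv)

set.seed(2)
n <- 300
x <- runif(n)
mu <- exp(-1 + sin(2 * pi * x))  # low expected counts, so many zeros
y <- rpois(n, mu)

fit <- gam(y ~ s(x), family = poisson)
qqnorm(residuals(fit, type = "deviance"))  # naive plot: can mislead here
qq.gam(fit, rep = 100)                     # simulated reference quantiles
```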
Hi Colin,

The GAMLSS package allows modelling of the response variable distribution using either exponential-family or non-exponential-family distributions. It also allows modelling of the scale parameter (and hence the dispersion parameter for exponential-family distributions) using explanatory variables. This can be important for selecting mean model terms, and is particularly important when interest lies in the variance and/or quantiles of the response variable.

Robert Rigby

On 06/11/13 21:46, Collin Lynch wrote:

> ... are the GAM algorithms used in the mgcv package
> affected by nonnormally-distributed residuals? ...
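[A hedged sketch of Robert's suggestion, assuming the gamlss package is installed (it is not part of base R): `sigma.formula` lets explanatory variables enter the scale parameter, not just the mean.]

```r
# Heteroscedastic simulated data: both the mean (mu) and the scale (sigma)
# of a normal response depend on x. gamlss() models each with its own formula.
library(gamlss)

set.seed(3)
n <- 200
x <- runif(n)
y <- rnorm(n, mean = 2 + 3 * x, sd = exp(0.2 + x))  # sd varies with x

fit <- gamlss(y ~ x,                 # model for mu
              sigma.formula = ~ x,   # model for log(sigma)
              family = NO)           # normal response distribution
summary(fit)
```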