Given a data set with a response and a set of predictors, we would like to find the model that fits the data best. Suppose we do not know what kind of model (linear, polynomial regression, ...) might be good: is there an R package that can do this automatically? Otherwise, could you point me to references, or outline the basic steps for doing this? Thanks.
-james
First off, there are multiple definitions of "best"; you need to decide which "best" is best for you. Second, for reasonable definitions of "best", deciding between linear, polynomial, and ... requires background knowledge and real thought. R can fit many different models and give you numerical and graphical summaries of those models, but determining the "best" model requires at least one person, some actual knowledge, and some real thought.

Basic steps: take a bunch more statistics classes and/or hire a statistician.

If you can come back with more of an example of what you are trying to accomplish, along with the background, what you mean by "best", and what your thought process has been so far, then we may be able to direct you better. You should also read through the posting guide that is linked to at the bottom of most posts to the group.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of guox at ucalgary.ca
> Sent: Thursday, April 24, 2008 10:58 AM
> To: r-help at r-project.org
> Subject: [R] R Newbie Question/Data Model
>
> Given a data set and a set of predictors and a response in
> the data, we would like to find a model that fits the data set best.
> Suppose that we do not know what kind of model (linear,
> polynomial regression,... ) might be good, we are wondering
> if there are R package(s) that can automatically do this.
> Otherwise, can you direct me, or point out reference(s),
> basic steps to do this. Thanks.
>
> -james
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
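To make the point about numerical and graphical summaries concrete, here is a minimal sketch on simulated data (the variables `x`, `y`, `dat` and the two candidate formulas are invented for illustration, not taken from the question). R fits both models and reports `AIC()` and an `anova()` F test, but these only rank the candidates the analyst chose to fit; they do not decide which family of models is appropriate.

```r
# Simulated data for illustration only: a curved relationship plus noise
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x - 0.08 * x^2 + rnorm(50, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Two candidate models chosen by the analyst, not by R
fit_lin  <- lm(y ~ x, data = dat)
fit_quad <- lm(y ~ poly(x, 2), data = dat)

# Numerical summaries
summary(fit_quad)
AIC(fit_lin, fit_quad)    # lower AIC is preferred, but only among these two candidates
anova(fit_lin, fit_quad)  # F test: does the quadratic term improve on the straight line?

# Graphical summary: standard residual diagnostic plots for the quadratic fit
par(mfrow = c(2, 2))
plot(fit_quad)
```

Whether a lower AIC matters, and whether either model makes sense for the process that generated the data, is still a judgement the analyst has to make.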
guox at ucalgary.ca wrote:
> Given a data set and a set of predictors and a response in the data,
> we would like to find a model that fits the data set best.
> Suppose that we do not know what kind of model (linear, polynomial
> regression,... ) might be good, we are wondering if there are R package(s)
> that can automatically do this.
> Otherwise, can you direct me, or point out reference(s),
> basic steps to do this. Thanks.
>
> -james

The best-fitting model for any data is a model with a lot of parameters, so perhaps the best-fitting model for any data is a model with an infinite number of parameters. However, any model with more parameters than data points will have a negative number of degrees of freedom, and you do not want that. The best-fitting model for any data, subject to the constraint that the number of degrees of freedom is non-negative, is the data itself, with zero degrees of freedom. The AIC tells you this too: with n observations, the AIC for the model formed by the data itself is 2n, whereas the AIC for any model with negative degrees of freedom is greater than 2n.

But I guess you want to make inferences from the sample to the population. If that is indeed the case, then you should consider changing your focus from finding "a model that fits the data set best" to finding a model that best summarizes the information your sample contains about the population it comes from.

To do that, start by defining the nature of your response variable. What natural process generates it? Is it continuous or discrete? Is it univariate or multivariate? Can it take negative and positive values? Can it take values of zero?

After you have clarified the probabilistic model for the response variable, you can start thinking about the mathematical relation between the response variable and the predictors. Is it linear or nonlinear? Are the predictors categorical or continuous?

Read the posting guide, formulate a clear question, and maybe you will be given more specific help.
Rubén
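Rubén's point that "the data itself, with zero degrees of freedom" is the best-fitting admissible model can be seen directly in R. A minimal sketch, assuming invented data (`n`, `x`, `y` are hypothetical): with n observations at n distinct x values, a polynomial of degree n - 1 has n coefficients, leaves zero residual degrees of freedom, and reproduces every observation, while saying nothing useful about the population.

```r
# Invented data: 8 observations at distinct x values
set.seed(42)
n <- 8
x <- 1:n
y <- 3 + 0.7 * x + rnorm(n)

# Saturated model: a degree n - 1 polynomial, i.e. n coefficients for n points
fit_sat <- lm(y ~ poly(x, n - 1))

length(coef(fit_sat))     # n coefficients, one per observation
df.residual(fit_sat)      # zero residual degrees of freedom
max(abs(resid(fit_sat)))  # essentially zero: the fit interpolates the data

# A straight line, by contrast, keeps n - 2 residual degrees of freedom
fit_line <- lm(y ~ x)
df.residual(fit_line)
```

The saturated fit has no residual degrees of freedom left with which to estimate the error variance, so it cannot support inference from the sample to the population.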