Gundala Viswanath
2008-Jul-06 23:50 UTC
[R] Method for checking automatically which distribtions fits a data
Hi,

Suppose I have a vector of data. Is there a method in R that can automatically suggest which distribution fits that data (e.g. normal, gamma, multinomial, etc.)?

- Gundala Viswanath
Jakarta - Indonesia
ctu at bigred.unl.edu
2008-Jul-07 02:27 UTC
[R] Method for checking automatically which distribtions fits a data
Hi,

In my experience, I just plot the data set and then figure it out. Maybe you could try this? I do wonder whether such an R function exists. If it does, please let me know.

Thanks,
Chunhao Tu
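A minimal sketch of the "plot it and look" approach, using only base R graphics. The data vector `x` is simulated here purely for illustration, and the choice of a histogram plus a normal Q-Q plot is an assumption about what such a visual check might include, not something the poster specified.

```r
set.seed(1)
# Simulated stand-in for the data vector (an assumption for illustration)
x <- rgamma(200, shape = 2, rate = 1)

op <- par(mfrow = c(1, 2))          # two panels side by side

# Histogram on the density scale with a kernel density estimate overlaid
h <- hist(x, breaks = 30, freq = FALSE,
          main = "Histogram with density estimate")
lines(density(x))

# Normal Q-Q plot: points far from the line suggest non-normality
qqnorm(x)
qqline(x)

par(op)                             # restore the previous layout
```

Skewness in the histogram or curvature in the Q-Q plot is a hint that a normal model is a poor candidate and that a skewed family may be worth trying.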
Stephen Tucker
2008-Jul-07 07:03 UTC
[R] Method for checking automatically which distribtions fits a data
I don't know of a single function, but you can perhaps apply a sequence of available functions. For instance, you can use fitdistr() in library(MASS) to estimate the maximum-likelihood parameters for each distribution in a candidate set; then look at each fit and compare the deviance among the fits, possibly penalizing distributions that require more parameters (for instance, with something like the Akaike Information Criterion).
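The fit-then-compare idea can be sketched roughly as follows. The data are simulated and the candidate set is an arbitrary choice made for this example; `fitdistr()` comes from MASS, and `AIC()` works on its result through the `logLik` method that MASS supplies.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
# Simulated stand-in for the data vector (an assumption for illustration)
x <- rgamma(200, shape = 2, rate = 1)

# Candidate families: an arbitrary choice for this sketch.
# All four require strictly positive data except "normal".
candidates <- c("normal", "gamma", "lognormal", "exponential")

# Fit each candidate by maximum likelihood
fits <- lapply(candidates, function(d) fitdistr(x, d))
names(fits) <- candidates

# AIC = -2 * logLik + 2 * (number of parameters); lower is better
aic <- sapply(fits, AIC)
print(sort(aic))
```

The ranking only says which of the listed candidates fits best in the AIC sense; it says nothing about distributions that were left out of the list, and the best candidate can still fit poorly.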
David Reinke
2008-Jul-07 16:46 UTC
[R] Method for checking automatically which distribtions fits a data
The function ks.test(x, y, ...) performs a Kolmogorov-Smirnov test of a set of sample values x against a distribution y. Here x is a numeric vector of data values; y can be either a second numeric vector of data values (for a two-sample test) or a cumulative distribution function such as pnorm, given as a function or by name.

David Reinke
Senior Transportation Engineer/Economist
Dowling Associates, Inc.

-----Original Message-----
From: hadley wickham
Sent: Monday, July 07, 2008 8:10 AM
To: Ben Bolker
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Method for checking automatically which distribtions fits a data

> See
>
> https://stat.ethz.ch/pipermail/r-help/2008-June/166259.html
>
> For example, normal vs. gamma might be a sensible question (for which you can use fitdistr() as suggested above), but "multinomial" implies a very specific kind of response: discrete data with a specified number of possible outcomes.

Yes, the question as it stands is poorly stated. If you have a small (finite) choice of possible distributions, you can use some kind of likelihood-based statistic to determine which fits the data best. But what is the population of distributions in this case? All distributions that you see in stats101? All distributions that have names? All continuous distributions?

Hadley
--
http://had.co.nz/
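A short sketch of the two common ways to call `ks.test()` from the stats package. The samples are simulated for illustration, and the parameter values passed to `pnorm` are assumed to be known in advance rather than estimated.

```r
set.seed(1)
# Simulated sample (an assumption for illustration)
x <- rnorm(100, mean = 5, sd = 2)

# One-sample test: compare x with a fully specified normal CDF.
# Extra arguments (mean, sd) are passed on to pnorm.
one_sample <- ks.test(x, "pnorm", mean = 5, sd = 2)
print(one_sample)

# Two-sample test: compare x with a second sample y
y <- rexp(100, rate = 1 / 5)
two_sample <- ks.test(x, y)
print(two_sample)
```

In the one-sample form the distribution parameters are meant to be specified beforehand. If they are estimated from the same data (for example with `mean(x)` and `sd(x)`), the reported p-value is no longer valid, which limits the test as a tool for choosing among fitted distributions.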
Frank E Harrell Jr
2008-Jul-07 17:00 UTC
[R] Method for checking automatically which distribtions fits a data
David Reinke wrote:
> The function ks.test(x,y, ...) performs a Kolmogorov-Smirnov test on a set
> of sample values x against a distribution y. [...]

If you find which distribution best fits the empirical distribution, the resulting estimates will have variances (once model uncertainty is taken into account through bootstrapping) that are equal to those from the empirical CDF, so nothing is gained. You can use the empirical CDF as the "final answer" unless prior knowledge of the distributional shape is available.

Frank Harrell

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
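As a rough illustration of working directly with the empirical CDF, the sketch below uses base R's `ecdf()` and `quantile()` and a plain nonparametric bootstrap. The data are simulated, the 90th percentile is an arbitrary quantity of interest chosen for the example, and this does not reproduce the full comparison against a model-selection procedure described in the post.

```r
set.seed(1)
# Simulated stand-in for the data vector (an assumption for illustration)
x <- rgamma(200, shape = 2, rate = 1)

# Empirical CDF as a step function
Fn <- ecdf(x)
p_le_2 <- Fn(2)                               # estimated P(X <= 2)
q90 <- quantile(x, 0.9, names = FALSE)        # empirical 90th percentile

# Nonparametric bootstrap: resample the data with replacement and
# recompute the quantile to gauge its sampling variability
B <- 1000
boot_q90 <- replicate(
  B, quantile(sample(x, replace = TRUE), 0.9, names = FALSE)
)
se_q90 <- sd(boot_q90)                        # bootstrap standard error

print(c(p_le_2 = p_le_2, q90 = q90, se_q90 = se_q90))
```

To compare this with a parametric route, the distribution-selection step itself would have to be repeated inside each bootstrap replicate, so that the uncertainty from choosing the family is included.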