digitalpenis@bluebottle.com
2005-Jan-01 16:18 UTC
[R] find parameters for a gamma distribution
hello, i have just started exploring R as an alternative to matlab for data analysis. so far everything is _very_ promising. i have a question though regarding parameter estimation. i have some data which, from a histogram plot, appears to arise from a gamma distribution. i gather that you can fit the data to the distribution using glm(). i am just not quite sure how this is done in practice... so here is a simple example with artificial data: d <- rgamma(100000, 20, scale = 2) h <- hist(d, breaks = c(seq(10, 80, 2), 100)) H <- data.frame(x = h$mids, y = h$density) g <- glm(y ~ x, data = H, family = Gamma) summary(g) Call: glm(formula = y ~ x, family = Gamma, data = H) Deviance Residuals: Min 1Q Median 3Q Max -3.8654 -2.0887 -0.7685 0.7147 1.4508 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 30.4758 26.7258 1.140 0.262 x 1.0394 0.6825 1.523 0.137 (Dispersion parameter for Gamma family taken to be 1.343021) Null deviance: 119.51 on 35 degrees of freedom Residual deviance: 116.28 on 34 degrees of freedom AIC: -260.49 Number of Fisher Scoring iterations: 7 now i suppose that the estimates parameters are: shape = 30.4758 scale = 1.0394 am i interpreting the output correctly? and, if so, why are these estimates so poor? i would, perhaps naively, expected the parameters from an artificial sample like this to be pretty good. my apologies if i am doing something stupid here but my statistics capabilties are rather limited! best regards, andrew collier.
<digitalpenis <at> bluebottle.com> writes: : : hello, : : i have just started exploring R as an alternative to matlab for data analysis. so far everything is _very_ : promising. i have a question though regarding parameter estimation. i have some data which, from a : histogram plot, appears to arise from a gamma distribution. i gather that you can fit the data to the : distribution using glm(). i am just not quite sure how this is done in practice... so here is a simple : example with artificial data: : : d <- rgamma(100000, 20, scale = 2) : h <- hist(d, breaks = c(seq(10, 80, 2), 100)) : : H <- data.frame(x = h$mids, y = h$density) : : g <- glm(y ~ x, data = H, family = Gamma) : summary(g) : : Call: : glm(formula = y ~ x, family = Gamma, data = H) : : Deviance Residuals: : Min 1Q Median 3Q Max : -3.8654 -2.0887 -0.7685 0.7147 1.4508 : : Coefficients: : Estimate Std. Error t value Pr(>|t|) : (Intercept) 30.4758 26.7258 1.140 0.262 : x 1.0394 0.6825 1.523 0.137 : : (Dispersion parameter for Gamma family taken to be 1.343021) : : Null deviance: 119.51 on 35 degrees of freedom : Residual deviance: 116.28 on 34 degrees of freedom : AIC: -260.49 : : Number of Fisher Scoring iterations: 7 : : now i suppose that the estimates parameters are: : : shape = 30.4758 : scale = 1.0394 : : am i interpreting the output correctly? and, if so, why are these estimates so poor? i would, perhaps : naively, expected the parameters from an artificial sample like this to be pretty good. : : my apologies if i am doing something stupid here but my statistics capabilties are rather limited! library(MASS) ?fitdistr example(fitdistr) # note the gamma example
hello, i have just started exploring R as an alternative to matlab for data analysis. so +far everything is _very_ promising. i have a question though regarding parameter +estimation. i have some data which, from a histogram plot, appears to arise from +a gamma distribution. i gather that you can fit the data to the distribution +using glm(). i am just not quite sure how this is done in practice... so here is +a simple example with artificial data: d <- rgamma(100000, 20, scale = 2) h <- hist(d, breaks = c(seq(10, 80, 2), 100)) H <- data.frame(x = h$mids, y = h$density) g <- glm(y ~ x, data = H, family = Gamma) summary(g) Call: glm(formula = y ~ x, family = Gamma, data = H) Deviance Residuals: Min 1Q Median 3Q Max -3.8654 -2.0887 -0.7685 0.7147 1.4508 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 30.4758 26.7258 1.140 0.262 x 1.0394 0.6825 1.523 0.137 (Dispersion parameter for Gamma family taken to be 1.343021) Null deviance: 119.51 on 35 degrees of freedom Residual deviance: 116.28 on 34 degrees of freedom AIC: -260.49 Number of Fisher Scoring iterations: 7 now i suppose that the estimates parameters are: shape = 30.4758 scale = 1.0394 am i interpreting the output correctly? and, if so, why are these estimates so +poor? i would, perhaps naively, expected the parameters from an artificial +sample like this to be pretty good. my apologies if i am doing something stupid here but my statistics capabilties +are rather limited! best regards, andrew collier. -- Andrew B. Collier Antarctic Research Fellow tel: +27 31 2601157 Space Physics Research Institute fax: +27 31 2616550 University of KwaZulu-Natal, Durban, 4041, South Africa
You want: library(MASS) ?fitdist cheers Brett Brett Melbourne, Postdoctoral Fellow Biological Invasions IGERT www.cpb.ucdavis.edu/bioinv Center for Population Biology University of California Davis CA 95616 ----- Original Message ----- From: "Andrew Collier" <colliera at ukzn.ac.za> To: <r-help at stat.math.ethz.ch> Sent: Wednesday, January 05, 2005 11:58 AM Subject: [R] find parameters for a gamma distribution> hello, > > i have just started exploring R as an alternative to matlab for data > analysis. so > +far everything is _very_ promising. i have a question though regarding > parameter > +estimation. i have some data which, from a histogram plot, appears to > arise from > +a gamma distribution. i gather that you can fit the data to the > distribution > +using glm(). i am just not quite sure how this is done in practice... so > here is > +a simple example with artificial data: > > d <- rgamma(100000, 20, scale = 2) > h <- hist(d, breaks = c(seq(10, 80, 2), 100)) > > H <- data.frame(x = h$mids, y = h$density) > > g <- glm(y ~ x, data = H, family = Gamma) > summary(g) > > Call: > glm(formula = y ~ x, family = Gamma, data = H) > > Deviance Residuals: > Min 1Q Median 3Q Max > -3.8654 -2.0887 -0.7685 0.7147 1.4508 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 30.4758 26.7258 1.140 0.262 > x 1.0394 0.6825 1.523 0.137 > > (Dispersion parameter for Gamma family taken to be 1.343021) > > Null deviance: 119.51 on 35 degrees of freedom > Residual deviance: 116.28 on 34 degrees of freedom > AIC: -260.49 > > Number of Fisher Scoring iterations: 7 > > now i suppose that the estimates parameters are: > > shape = 30.4758 > scale = 1.0394 > > am i interpreting the output correctly? and, if so, why are these > estimates so > +poor? i would, perhaps naively, expected the parameters from an > artificial > +sample like this to be pretty good. > > my apologies if i am doing something stupid here but my statistics > capabilties > +are rather limited! > > best regards, > andrew collier. > -- > Andrew B. Collier > > Antarctic Research Fellow tel: +27 31 > 2601157 > Space Physics Research Institute fax: +27 31 > 2616550 > University of KwaZulu-Natal, Durban, 4041, South Africa > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html
First, you want to fit the data not the histogram counts (binned data). Second, glm does not do a very principled fit of a gamma (it is a moment estimator). Something like> d <- rgamma(1000, 20, scale = 2) > summary(glm(d ~ 1, Gamma))Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.0253072 0.0001822 138.9 <2e-16 (Dispersion parameter for Gamma family taken to be 0.05185392) where the shape = 1/0.05185392 (no s.e.) and 0.0253072 is the mean = 1/(scale*shape) Try instead (on 100000 obs here) library(MASS) fitdistr(d, "Gamma") shape rate 19.812444027 0.495157589 ( 0.087858938) ( 0.002223778) which works for me. You can make it use scale =1/rate by> fitdistr(d, dgamma, start=list(shape=20,scale=2))shape scale 19.812702659 2.019419771 ( 0.087774433) ( 0.009060263) On Wed, 5 Jan 2005, Andrew Collier wrote:> hello, > > i have just started exploring R as an alternative to matlab for data analysis. so > +far everything is _very_ promising. i have a question though regarding parameter > +estimation. i have some data which, from a histogram plot, appears to arise from > +a gamma distribution. i gather that you can fit the data to the distribution > +using glm(). i am just not quite sure how this is done in practice... so here is > +a simple example with artificial data: > > d <- rgamma(100000, 20, scale = 2) > h <- hist(d, breaks = c(seq(10, 80, 2), 100)) > > H <- data.frame(x = h$mids, y = h$density) > > g <- glm(y ~ x, data = H, family = Gamma) > summary(g) > > Call: > glm(formula = y ~ x, family = Gamma, data = H) > > Deviance Residuals: > Min 1Q Median 3Q Max > -3.8654 -2.0887 -0.7685 0.7147 1.4508 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 30.4758 26.7258 1.140 0.262 > x 1.0394 0.6825 1.523 0.137 > > (Dispersion parameter for Gamma family taken to be 1.343021) > > Null deviance: 119.51 on 35 degrees of freedom > Residual deviance: 116.28 on 34 degrees of freedom > AIC: -260.49 > > Number of Fisher Scoring iterations: 7 > > now i suppose that the estimates parameters are: > > shape = 30.4758 > scale = 1.0394 > > am i interpreting the output correctly? and, if so, why are these estimates so > +poor? i would, perhaps naively, expected the parameters from an artificial > +sample like this to be pretty good. > > my apologies if i am doing something stupid here but my statistics capabilties > +are rather limited! > > best regards, > andrew collier. > -- > Andrew B. Collier > > Antarctic Research Fellow tel: +27 31 2601157 > Space Physics Research Institute fax: +27 31 2616550 > University of KwaZulu-Natal, Durban, 4041, South Africa > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Actually, that should be: library(MASS) ?fitdistr> You want: > library(MASS) > ?fitdist > > cheers > Brett > > Brett Melbourne, Postdoctoral Fellow > Biological Invasions IGERT www.cpb.ucdavis.edu/bioinv > Center for Population Biology > University of California Davis CA 95616 > > > ----- Original Message ----- > From: "Andrew Collier" <colliera at ukzn.ac.za> > To: <r-help at stat.math.ethz.ch> > Sent: Wednesday, January 05, 2005 11:58 AM > Subject: [R] find parameters for a gamma distribution > > >> hello, >> >> i have just started exploring R as an alternative to matlab for data >> analysis. so >> +far everything is _very_ promising. i have a question though regarding >> parameter >> +estimation. i have some data which, from a histogram plot, appears to >> arise from >> +a gamma distribution. i gather that you can fit the data to the >> distribution >> +using glm(). i am just not quite sure how this is done in practice... so >> here is >> +a simple example with artificial data: >> >> d <- rgamma(100000, 20, scale = 2) >> h <- hist(d, breaks = c(seq(10, 80, 2), 100)) >> >> H <- data.frame(x = h$mids, y = h$density) >> >> g <- glm(y ~ x, data = H, family = Gamma) >> summary(g) >> >> Call: >> glm(formula = y ~ x, family = Gamma, data = H) >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >> -3.8654 -2.0887 -0.7685 0.7147 1.4508 >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 30.4758 26.7258 1.140 0.262 >> x 1.0394 0.6825 1.523 0.137 >> >> (Dispersion parameter for Gamma family taken to be 1.343021) >> >> Null deviance: 119.51 on 35 degrees of freedom >> Residual deviance: 116.28 on 34 degrees of freedom >> AIC: -260.49 >> >> Number of Fisher Scoring iterations: 7 >> >> now i suppose that the estimates parameters are: >> >> shape = 30.4758 >> scale = 1.0394 >> >> am i interpreting the output correctly? and, if so, why are these >> estimates so >> +poor? i would, perhaps naively, expected the parameters from an >> artificial >> +sample like this to be pretty good. >> >> my apologies if i am doing something stupid here but my statistics >> capabilties >> +are rather limited! >> >> best regards, >> andrew collier. >> -- >> Andrew B. Collier >> >> Antarctic Research Fellow tel: +27 31 >> 2601157 >> Space Physics Research Institute fax: +27 31 >> 2616550 >> University of KwaZulu-Natal, Durban, 4041, South Africa >> >> ______________________________________________ >> R-help at stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide! >> http://www.R-project.org/posting-guide.html >
On Wed, 5 Jan 2005, Prof Brian Ripley wrote:> First, you want to fit the data not the histogram counts (binned data). > > Second, glm does not do a very principled fit of a gamma (it is a moment > estimator). Something like > >> d <- rgamma(1000, 20, scale = 2) >> summary(glm(d ~ 1, Gamma)) > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.0253072 0.0001822 138.9 <2e-16 > > (Dispersion parameter for Gamma family taken to be 0.05185392) > > where the shape = 1/0.05185392 (no s.e.) and 0.0253072 is the mean > = 1/(scale*shape)[In case it confuses you, that got garbled in a cut-and-paste: 0.0253072 is the rate = 1/mean = 1/(scale*shape) as the default link for the Gamma glm is reciprocal. ]> > Try instead (on 100000 obs here) > > library(MASS) > fitdistr(d, "Gamma") > shape rate > 19.812444027 0.495157589 > ( 0.087858938) ( 0.002223778) > > which works for me. You can make it use scale =1/rate by >> fitdistr(d, dgamma, start=list(shape=20,scale=2)) > shape scale > 19.812702659 2.019419771 > ( 0.087774433) ( 0.009060263) > > > On Wed, 5 Jan 2005, Andrew Collier wrote: > >> hello, >> >> i have just started exploring R as an alternative to matlab for data >> analysis. so >> +far everything is _very_ promising. i have a question though regarding >> parameter >> +estimation. i have some data which, from a histogram plot, appears to >> arise from >> +a gamma distribution. i gather that you can fit the data to the >> distribution >> +using glm(). i am just not quite sure how this is done in practice... so >> here is >> +a simple example with artificial data: >> >> d <- rgamma(100000, 20, scale = 2) >> h <- hist(d, breaks = c(seq(10, 80, 2), 100)) >> >> H <- data.frame(x = h$mids, y = h$density) >> >> g <- glm(y ~ x, data = H, family = Gamma) >> summary(g) >> >> Call: >> glm(formula = y ~ x, family = Gamma, data = H) >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >> -3.8654 -2.0887 -0.7685 0.7147 1.4508 >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 30.4758 26.7258 1.140 0.262 >> x 1.0394 0.6825 1.523 0.137 >> >> (Dispersion parameter for Gamma family taken to be 1.343021) >> >> Null deviance: 119.51 on 35 degrees of freedom >> Residual deviance: 116.28 on 34 degrees of freedom >> AIC: -260.49 >> >> Number of Fisher Scoring iterations: 7 >> >> now i suppose that the estimates parameters are: >> >> shape = 30.4758 >> scale = 1.0394 >> >> am i interpreting the output correctly? and, if so, why are these estimates >> so >> +poor? i would, perhaps naively, expected the parameters from an artificial >> +sample like this to be pretty good. >> >> my apologies if i am doing something stupid here but my statistics >> capabilties >> +are rather limited! >> >> best regards, >> andrew collier. >> -- >> Andrew B. Collier >> >> Antarctic Research Fellow tel: +27 31 >> 2601157 >> Space Physics Research Institute fax: +27 31 >> 2616550 >> University of KwaZulu-Natal, Durban, 4041, South Africa >> >> ______________________________________________ >> R-help at stat.math.ethz.ch mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide! >> http://www.R-project.org/posting-guide.html >> > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595