My problem actually arose when fitting the data to the Weibull distribution, where it is hard to see whether the proposed parameter estimates make sense.

data1: 2743;4678;21427;6194;10286;1505;12811;2161;6853;2625;14542;694;11491;
       14924;28640;17097;2136;5308;3477;91301;11488;3860;64114;14334

How am I supposed to know which starting values to take? I get different parameter estimates depending on the starting values I choose. This shouldn't happen, should it? How am I supposed to know which estimates are the "right" ones?

> library(MASS)
> fitdistr(data2,densfun=dweibull,start=list(scale=2 ,shape=1 ))
      scale          shape
  1.378874e+04   8.788857e-01
 (3.842224e+03) (1.312395e-01)
> fitdistr(data2,densfun=dweibull,start=list(scale=6 ,shape=2 ))
    scale       shape
  7.81875000   0.12500000
 (4.18668905) (0.01803669)

#if I use the lognormal distribution instead, I get the same estimates
#no matter what starting values I choose.

#when I tried it with mle(), I also got different values depending on the
#starting values. I use trial and error to find appropriate
#starting values, but I am sure there is a clear way to do it, no?
#shouldn't I actually get more or less the same parameter estimates with both
#methods?
> library(stats4)
> ll<-function(alfa,beta)
+ {n<-24
+ x<-data2
+ -n*log(alfa)-n*log(beta)+alfa*sum(x^beta)-(beta-1)*sum(log(x))}
> est<-mle(minuslog=ll, start=list(alfa=10, beta=1))
There were 50 or more warnings (use warnings() to see the first 50)
> summary(est)
Maximum likelihood estimation

Call:
mle(minuslogl = ll, start = list(alfa = 10, beta = 1))

Coefficients:
        Estimate   Std. Error
alfa 0.002530163 0.0006828505
beta 0.641873010 0.0333072184

-2 log L: 511.6957

> library(stats4)
> ll<-function(alfa,beta)
+ {n<-24
+ x<-data2
+ -n*log(alfa)-n*log(beta)+alfa*sum(x^beta)-(beta-1)*sum(log(x))}
> est<-mle(minuslog=ll, start=list(alfa=5, beta=17))
There were 50 or more warnings (use warnings() to see the first 50)
> summary(est)
Maximum likelihood estimation

Call:
mle(minuslogl = ll, start = list(alfa = 5, beta = 17))

Coefficients:
        Estimate  Std. Error
alfa 0.002143305 0.000378592
beta 0.660359789 0.026433665

-2 log L: 511.1296

Thank you very much for all your comments, they really help me to get further!

Nadja
Dear Nadja,

if the log-likelihood function has several local maxima, the result may depend on the starting values. This is not unusual. The best estimator is the one with the maximum log-likelihood, i.e., the smallest value of -2 log L in the mle output. (Unfortunately, it seems that the log-likelihood value is not accessible using fitdistr - you would have to implement the log-likelihood function on your own.)

You could use a lot of starting values, for example generated by some random mechanism, and take the best estimator. If you want a single good starting value, you could try to fit a Weibull distribution to the histogram "by eye" and by trial and error, and use the corresponding parameters.

Best,
Christian

PS: Please use informative subject lines.

On Tue, 6 Sep 2005, Nadja Riedwyl wrote:
> how am I supposed to know what starting values i have to take?
> i get different parameterestimates depending on the starting values i choose,
> this shouldn't be, no? how am i supposed to know, which the "right" estimates
> should be?

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
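
A minimal sketch of the "many random starting values, keep the best" idea described above. It is only an illustration under stated assumptions: the data vector is the one posted in the original message, the log-likelihood is written with dweibull() in R's own (shape, scale) parameterization rather than the posted (alfa, beta) one, the optimization runs over log-parameters so that both stay positive, and the number of starts and the ranges they are drawn from are arbitrary choices.

```r
# Data as posted in the original message (24 observations).
x <- c(2743, 4678, 21427, 6194, 10286, 1505, 12811, 2161, 6853, 2625,
       14542, 694, 11491, 14924, 28640, 17097, 2136, 5308, 3477, 91301,
       11488, 3860, 64114, 14334)

# Negative log-likelihood in R's (shape, scale) parameterization.
# p = c(log(shape), log(scale)), so both parameters are positive by construction.
negll <- function(p) {
  -sum(dweibull(x, shape = exp(p[1]), scale = exp(p[2]), log = TRUE))
}

set.seed(1)      # arbitrary seed, for reproducibility only
n_starts <- 50   # arbitrary number of random starts

# Random starts: shape between 0.2 and 5, scale within the range of the data
# (both ranges are assumptions made for this illustration).
starts <- cbind(log_shape = runif(n_starts, log(0.2), log(5)),
                log_scale = runif(n_starts, log(min(x)), log(max(x))))

# One Nelder-Mead run per starting value.
fits   <- lapply(seq_len(n_starts), function(i) optim(starts[i, ], negll))
values <- vapply(fits, function(f) f$value, numeric(1))

# Keep the fit with the smallest negative log-likelihood.
best <- fits[[which.min(values)]]
est  <- c(shape = exp(best$par[[1]]), scale = exp(best$par[[2]]))

print(est)
print(2 * best$value)   # -2 log L of the best fit
print(range(values))    # a wide spread would indicate runs stuck elsewhere
```

Comparing -2 log L across runs, rather than the parameter estimates themselves, is what identifies the best fit; runs that end with a clearly larger value can be discarded.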
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Nadja Riedwyl
> Sent: Tuesday, September 06, 2005 10:22 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] (no subject)
>
> my problem actually arised with fitting the data to the weibulldistribution,
> where it is hard to see, if the proposed parameterestimates make sense.
>
> data1:2743;4678;21427;6194;10286;1505;12811;2161;6853;2625;14542;694;11491;
> 14924;28640;17097;2136;5308;3477;91301;11488;3860;64114;14334
>
> how am I supposed to know what starting values i have to take?
> i get different parameterestimates depending on the starting values i choose,
> this shouldn't be, no? how am i supposed to know, which the "right" estimates
> should be?

This is a general issue with all (gradient-based) optimization methods when the response to be optimized has many local optima and/or is poorly conditioned. As Doug Bates and others have often remarked, finding good starting values is an "art" that is often problem-specific. Ditto for "good" parameterizations. There is no universal "magic" answer.

In many respects, this is the monster hiding in the closet of many of the complex modeling methods being proposed in statistics and other disciplines: when the response function to be optimized is a nonlinear function of "many" parameters, convergence may be difficult to achieve. Presumably stochastic optimization methods like simulated annealing and MCMC are less susceptible to such problems, but they pay a large price in efficiency for it.

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
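
For the Weibull case specifically, one possible way to get problem-specific starting values and a better-conditioned parameterization is sketched below. It is not a general recipe. The assumptions: starting values come from a straight-line fit on the Weibull plot, since log(-log(1 - F(x))) = shape * log(x) - shape * log(scale); the likelihood is optimized over log(shape) and log(scale); and the names wp, shape0, scale0 and nll are made up for this illustration. The posted ll() corresponds to the density alfa * beta * x^(beta - 1) * exp(-alfa * x^beta), so beta = shape and alfa = scale^(-shape), which lets the two sets of estimates be compared.

```r
library(stats4)

# Data as posted in the original message (24 observations).
x <- c(2743, 4678, 21427, 6194, 10286, 1505, 12811, 2161, 6853, 2625,
       14542, 694, 11491, 14924, 28640, 17097, 2136, 5308, 3477, 91301,
       11488, 3860, 64114, 14334)

# Starting values from the Weibull plot: regress log(-log(1 - p)) on log(x),
# with p the plotting positions of the sorted data.
p  <- ppoints(length(x))
wp <- lm(log(-log(1 - p)) ~ log(sort(x)))
shape0 <- unname(coef(wp)[2])                  # slope estimates the shape
scale0 <- exp(-unname(coef(wp)[1]) / shape0)   # intercept = -shape * log(scale)

# Negative log-likelihood on the log scale of both parameters.
nll <- function(lshape, lscale) {
  -sum(dweibull(x, shape = exp(lshape), scale = exp(lscale), log = TRUE))
}

est <- mle(minuslogl = nll,
           start = list(lshape = log(shape0), lscale = log(scale0)))

# Back-transform, and convert to the (alfa, beta) form used in the posted ll().
shape_hat <- exp(coef(est)[["lshape"]])
scale_hat <- exp(coef(est)[["lscale"]])
alfa_hat  <- scale_hat^(-shape_hat)
beta_hat  <- shape_hat

print(c(shape = shape_hat, scale = scale_hat, alfa = alfa_hat, beta = beta_hat))
print(-2 * as.numeric(logLik(est)))   # -2 log L at the estimate
```

The standard errors reported by summary(est) refer to the log-parameters here, not to shape and scale themselves.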