ahimsa campos-arceiz
2008-Feb-22 15:00 UTC
[R] fitting a lognormal distribution using cumulative probabilities
Dear all, I'm trying to estimate the parameters of a lognormal distribution fitted from some data. The tricky thing is that my data represent the time at which I recorded certain events. However, in many cases I don't really know when the event happened. I' only know the time at which I recorded it as already happened. Therefore I want to fit the lognormal from the cumulative distribution function (cdf) rather than from the probability distribution function (pdf). My understanding is that methods based on Maximum Likelihood (e.g. fitdistr {MASS}) are based on the pdf. Nonlinear least-squares methods seem to be based on the cdf... however I was unable to use nls{stat} for lognormal. I found a website that explains how to fit univariate distribution functions based on cumulative probabilities, including a lognormal example, in Matlab: http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/cdffitdemo.html and other programs like TableCurve 2D seem to do this too. There must be a straightforward method in R which I have overlooked. Any suggestion on how can I estimate these parameters in R or helpful references are very much appreciated. (not sure if it helps but) here is an example of my type of data: treat.1 <- c(21.67, 21.67, 43.38, 35.50, 32.08, 32.08, 21.67, 21.67, 41.33, 41.33, 41.33, 32.08, 21.67, 22.48, 23.25, 30.00, 26.00, 19.37, 26.00 , 32.08, 21.67, 26.00, 26.00, 43.38, 26.00, 21.67, 22.48, 35.50, 38.30, 32.08) treat.2 <- c(35.92, 12.08, 12.08, 30.00, 33.73, 35.92, 12.08, 30.00, 56.00, 30.00, 35.92, 33.73, 12.08, 26.00, 54.00, 12.08, 12.08, 35.92, 35.92 , 12.08, 33.73, 35.92, 63.20, 30.00, 26.00, 33.73, 23.50, 30.00, 35.92 , 30.00) Thank you very much! Ahimsa -- ahimsa campos-arceiz www.camposarceiz.com [[alternative HTML version deleted]]
Prof Brian Ripley
2008-Feb-22 17:35 UTC
[R] fitting a lognormal distribution using cumulative probabilities
On Sat, 23 Feb 2008, ahimsa campos-arceiz wrote:> Dear all, > > I'm trying to estimate the parameters of a lognormal distribution fitted > from some data. > > The tricky thing is that my data represent the time at which I recorded > certain events. However, in many cases I don't really know when the event > happened. I' only know the time at which I recorded it as already happened.So this is a rather extreme form of censoring.> Therefore I want to fit the lognormal from the cumulative distribution > function (cdf) rather than from the probability distribution function (pdf). > > My understanding is that methods based on Maximum Likelihood (e.g. fitdistr > {MASS}) are based on the pdf. Nonlinear least-squares methods seem to be > based on the cdf... however I was unable to use nls{stat} for lognormal.Not so: ML fitting can be done for censored data. However, I don't think you have a valid description here: it seems you never recorded a time at which the event had not happened, and the most likely fit is a probability mass at zero (since this is a perfect explanation for your data). To make any progress with censoring, you need to see both positive and negative events. If you told us that none of these events happened before t=15, it would be possible to fit the model (although you would need far more data to get a good fit). Generally code to handle censoring is in survival analysis: e.g. survreg() in package survival. In the terminiology of the latter, all your observations are left-censored.> I found a website that explains how to fit univariate distribution functions > based on cumulative probabilities, including a lognormal example, in Matlab: > http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/cdffitdemo.html > > and other programs like TableCurve 2D seem to do this too.Maybe, but that is a different problem. If you have an ECDF, the jumps give you the data so you can just use fitdistr(). (And you will see comparing observed and fitted CDFs in MASS, the book.)> There must be a straightforward method in R which I have overlooked. Any > suggestion on how can I estimate these parameters in R or helpful references > are very much appreciated. > > (not sure if it helps but) here is an example of my type of data: > > treat.1 <- c(21.67, 21.67, 43.38, 35.50, 32.08, 32.08, 21.67, 21.67, 41.33, > 41.33, 41.33, 32.08, 21.67, 22.48, 23.25, 30.00, 26.00, 19.37, 26.00 > , > 32.08, 21.67, 26.00, 26.00, 43.38, 26.00, 21.67, 22.48, 35.50, 38.30, > > 32.08) > > treat.2 <- c(35.92, 12.08, 12.08, 30.00, 33.cy73, 35.92, 12.08, 30.00,56.00,> 30.00, 35.92, 33.73, 12.08, 26.00, 54.00, 12.08, 12.08, 35.92, 35.92 > , > 12.08, 33.73, 35.92, 63.20, 30.00, 26.00, 33.73, 23.50, 30.00, 35.92 > , > 30.00) > > Thank you very much! > > Ahimsa > > > -- > ahimsa campos-arceiz > www.camposarceiz.com > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Apparently Analagous Threads
- reshaping data frame
- scaling y-axis to relative frequency in multiple histogram (multhist)
- silly, extracting the value of "C" from the results of somers2
- lmer Error: Downdated X'X is not positive definite
- how to apply the function cut( ) to many columns in a data.frame?