Gough Lauren
2008-Oct-07 10:11 UTC
[R] Fitting weibull, exponential and lognormal distributions to left-truncated data.
Dear All, I have two questions regarding distribution fitting. I have several datasets, all left-truncated at x=1, that I am attempting to fit distributions to (lognormal, weibull and exponential). I had been using fitdistr in the MASS package as follows: fitdistr<-(x,"weibull") However, this does not take into consideration the truncation at x=1. I read another posting in this forum that suggested using the argument "lower" to truncate the distribution fitting. However, this does not seem to be working. For example, when I attempt to fit a weibull distribution truncated at x=1 using "lower", it seems to set the best-fit shape parameter at 1:> fitdistr(x,"weibull",lower=1)shape scale 1.00000000 9.87964337 (0.02358731) (0.40649570) ##I have tried this on other datasets also truncated at x=1 and get the same result (i.e. shape=1). Does anyone know how to successfully fit the exponential, weibull and lognormal distributions to truncated data? Secondly, as my datasets are large (>1000 data points) assessing the fit of the distribution with kolmogorov smirnov goodness of fit tests is routinely showing statistical significance for all distributions. Therefore, I would like to plot the observed data with the theoretical best fit distributions (weibull, exponential and lognormal) to visually assess which fits the observed data best. So far I have been doing this as follows:>fitdistr(x,"weibull")shape scale a b>D1<-density(x) ##density distribution of observed data >D2<-density(rweibull(1500,shape=a,scale=b)) ##density of a randomvariable following the theoretical best fit weibull distribution with shape parameter =a, scale parameter = b.>plot(range(D1$x),range(D1$y,D2$y),type="n",xlab="x",ylab="Density") >lines(D1,col="red") >lines(D2,col="blue")This successfully plots the two density curves on the same graph, but it plots data below the x=1 threshold - even for the observed data! I have tried limiting the scale of x-axis using xlim=c(1,150) but the graph still plots the origin of the graph as (0,0). I can only get different origins if I limit x more extremely e.g. xlim=c(50,150). Does anyone know how I can successfully change the origin of the graph to (1,0)? Sorry for the long e-mail! Any help would be greatly appreciated. Regards, Lauren This message has been checked for viruses but the contents of an attachment may still contain software viruses, which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.
vito muggeo
2008-Oct-07 10:45 UTC
[R] Fitting weibull, exponential and lognormal distributions to left-truncated data.
Hi Gough, A possible solution is to use the survreg() in the survival package without specifying the covariates, i.e. library(survival) survreg(Surv(..)~1, dist="weibull") where Surv(..) accepts information about "times", censoring/truncation variables and dist allows to specify alternative distributions. See ?Surv e ?survreg hope this helps you, Gough Lauren ha scritto:> Dear All, > > I have two questions regarding distribution fitting. > > I have several datasets, all left-truncated at x=1, that I am attempting > to fit distributions to (lognormal, weibull and exponential). I had > been using fitdistr in the MASS package as follows: > > fitdistr<-(x,"weibull") > > However, this does not take into consideration the truncation at x=1. I > read another posting in this forum that suggested using the argument > "lower" to truncate the distribution fitting. However, this does not > seem to be working. For example, when I attempt to fit a weibull > distribution truncated at x=1 using "lower", it seems to set the > best-fit shape parameter at 1: > >> fitdistr(x,"weibull",lower=1) > shape scale > 1.00000000 9.87964337 > (0.02358731) (0.40649570) ##I have tried this on other datasets also > truncated at x=1 and get the same result (i.e. shape=1). > > Does anyone know how to successfully fit the exponential, weibull and > lognormal distributions to truncated data? > > > > Secondly, as my datasets are large (>1000 data points) assessing the fit > of the distribution with kolmogorov smirnov goodness of fit tests is > routinely showing statistical significance for all distributions. > Therefore, I would like to plot the observed data with the theoretical > best fit distributions (weibull, exponential and lognormal) to visually > assess which fits the observed data best. So far I have been doing this > as follows: > >> fitdistr(x,"weibull") > shape scale > a b > >> D1<-density(x) ##density distribution of observed data >> D2<-density(rweibull(1500,shape=a,scale=b)) ##density of a random > variable following the theoretical best fit weibull distribution with > shape parameter =a, scale parameter = b. > >> plot(range(D1$x),range(D1$y,D2$y),type="n",xlab="x",ylab="Density") >> lines(D1,col="red") >> lines(D2,col="blue") > > This successfully plots the two density curves on the same graph, but it > plots data below the x=1 threshold - even for the observed data! I have > tried limiting the scale of x-axis using xlim=c(1,150) but the graph > still plots the origin of the graph as (0,0). I can only get different > origins if I limit x more extremely e.g. xlim=c(50,150). Does anyone > know how I can successfully change the origin of the graph to (1,0)? > > > Sorry for the long e-mail! Any help would be greatly appreciated. > > Regards, > > Lauren > > This message has been checked for viruses but the contents of an attachment > may still contain software viruses, which could damage your computer system: > you are advised to perform your own checks. Email communications with the > University of Nottingham may be monitored as permitted by UK legislation. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- ===================================Vito M.R. Muggeo Dip.to Sc Statist e Matem `Vianelli' Universit? di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 6626240 fax: 091 485726/485612 http://dssm.unipa.it/vmuggeo
Richard.Cotton at hsl.gov.uk
2008-Oct-07 10:57 UTC
[R] Fitting weibull, exponential and lognormal distributions to left-truncated data.
> > I have several datasets, all left-truncated at x=1, that I amattempting> > to fit distributions to (lognormal, weibull and exponential). I had > > been using fitdistr in the MASS package as follows:> A possible solution is to use the survreg() in the survival package > without specifying the covariates, i.e. > > library(survival) > survreg(Surv(..)~1, dist="weibull") > > where Surv(..) accepts information about "times", censoring/truncation > variables and dist allows to specify alternative distributions. > See ?Surv e ?survregThe survival package is mostly targeted at right-censored data. The NADA package provides wrappers for many of the survival routines so they work with left-censored data. Regards, Richie. Mathematical Sciences Unit HSL ------------------------------------------------------------------------ ATTENTION: This message contains privileged and confidential inform...{{dropped:20}}
Prof Brian Ripley
2008-Oct-07 11:24 UTC
[R] Fitting weibull, exponential and lognormal distributions to left-truncated data.
On Tue, 7 Oct 2008, Richard.Cotton at hsl.gov.uk wrote:>>> I have several datasets, all left-truncated at x=1, that I am > attempting >>> to fit distributions to (lognormal, weibull and exponential). I had >>> been using fitdistr in the MASS package as follows: > >> A possible solution is to use the survreg() in the survival package >> without specifying the covariates, i.e. >> >> library(survival) >> survreg(Surv(..)~1, dist="weibull") >> >> where Surv(..) accepts information about "times", censoring/truncation >> variables and dist allows to specify alternative distributions. >> See ?Surv e ?survreg > > The survival package is mostly targeted at right-censored data. The NADA > package provides wrappers for many of the survival routines so they work > with left-censored data.Left-censoring and left-truncation are not the same thing. With left-censoring you see that you had observations < 1, and with left-truncation you do not (at least how the terms are usually applied: occasionally the meanings are reversed). For left-truncation it is relatively easy, e.g. ltwei <- function(x, shape, scale = 1, log = FALSE) dweibull(x, shape, scale, log)/pweibull(1, shape, scale, lower=FALSE) and use this in fitdistr. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595