Hi all,

I'm working with data to which I'm fitting three-parameter Weibull distributions (shape, scale and shift). The samples are small (between 10 and 80 observations), so I'm reluctant to check my fits with a chi-square test (I'd also like to avoid bin-choice issues). I'd use the Kolmogorov-Smirnov test, but of course it is invalid when the distribution parameters are estimated from the data.

So I'm tinkering with an alternative method (excuse my naïveté if this is a bad idea; I'm a relative statistical novice) that calculates the area between the ECDF of the data and the CDF of the estimated distribution (somewhat like KS, which looks at the greatest distance between the two). My thought is to compare this observed area to a distribution of simulated areas obtained by Monte Carlo simulation: draw N random samples from the estimated distribution, calculate the area, and repeat 1e5 times. If the observed area is greater than, say, 95% of the simulated areas, I'd reject the fit.

My problem is that I can't figure out how to efficiently calculate the area between the ECDF and the CDF. I can of course calculate the integral of each easily, and if one were consistently larger than the other, simple subtraction of the integrals would yield the area between them. However, when the functions cross, as frequently occurs, the solution seems much more complex. Any suggestions? Since, as noted above, I'll be doing the area calculation 1e5 times or so per test, a computationally frugal solution would be much appreciated!

Here's some code that I've been toying with:

#set up some true parameters
shape=2
scale=.5
shift=.3
n=10

#generate some observed data
obs=rweibull(n,shape,scale)+shift

#let's say that the following are the estimated parameters from whatever
#estimation process I'm using
est.shape=1.9
est.scale=.6
est.shift=.35

#Calculate area between ECDF and CDF of the function defined by the
#estimated parameters
# ???
#The following would work if the ECDF were consistently higher or lower
#than the CDF

#Get the CDF area between 0 and some large number (here, 10 is pretty
#large); integrate() returns a list, so keep only its $value
cdf.area=integrate(pweibull,0,10,shape=est.shape,scale=est.scale)$value

#Get the ECDF area.
#first get rid of the shift in obs
obs=obs-est.shift
#calculate area by multiplying cumulative proportions by distance
#between knots, then summing
#add knot at 10 to match cdf
k=c(knots(ecdf(obs)),10)
ecdf.area=vector("numeric",length(k)-1)
for(i in seq_along(ecdf.area)){
  ecdf.area[i]=(k[i+1]-k[i])*(sum(obs<=k[i])/n)
}
ecdf.area=sum(ecdf.area)

#again, subtraction of the areas works if the ecdf is consistently lower
#than the cdf
diff=cdf.area-ecdf.area
#or consistently higher than the cdf
diff=ecdf.area-cdf.area

#but how to calculate when the functions cross?

Cheers,
Mike

--
Mike Lawrence
http://artsweb.uwaterloo.ca/~m4lawren

"The road to wisdom? Well, it's plain and simple to express:
Err and err and err again,
but less and less and less."
- Piet Hein
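The crossing problem asked about above can be handled exactly, without numerical integration, by working one interval at a time. Between consecutive sorted observations the ECDF is a constant p, and because the CDF F is increasing it crosses p at most once, at F^{-1}(p) (qweibull plus the shift). Each interval therefore splits into at most two pieces whose signs are known. The sketch below is one possible way to do this in R, assuming the shifted Weibull from the post, F(x) = pweibull(x - shift, shape, scale), and using the identity that the integral of exp(-(t/scale)^shape) from 0 to y equals scale * gamma(1 + 1/shape) * pgamma((y/scale)^shape, shape = 1/shape). The helper names (cdf_integral, upper_tail_area, area_between, mc_area_pvalue) are hypothetical, chosen for illustration only.

```r
# Integral of the shifted-Weibull CDF F(t) from -Inf to x (F is 0 below shift).
# Assumes: integral_0^y (1 - F) = scale * gamma(1 + 1/shape) * P(1/shape, (y/scale)^shape),
# where P is the regularized lower incomplete gamma function (pgamma in R).
cdf_integral <- function(x, shape, scale, shift) {
  y <- pmax(x - shift, 0)
  y - scale * gamma(1 + 1 / shape) * pgamma((y / scale)^shape, shape = 1 / shape)
}

# Integral of (1 - F(t)) from x to Inf (includes shift - x if x lies below the shift).
upper_tail_area <- function(x, shape, scale, shift) {
  y <- pmax(x - shift, 0)
  pmax(shift - x, 0) +
    scale * gamma(1 + 1 / shape) *
      pgamma((y / scale)^shape, shape = 1 / shape, lower.tail = FALSE)
}

# Exact area between the ECDF of obs and the shifted-Weibull CDF: integral of |Fn - F|.
area_between <- function(obs, shape, scale, shift) {
  x <- sort(obs)
  n <- length(x)
  # Left of the smallest observation the ECDF is 0, so the area is the integral of F.
  left <- cdf_integral(x[1], shape, scale, shift)
  # Right of the largest observation the ECDF is 1, so the area is the integral of 1 - F.
  right <- upper_tail_area(x[n], shape, scale, shift)
  # Interior intervals [a, b) on which the ECDF equals p = i/n (empty when n == 1).
  a <- x[-n]
  b <- x[-1]
  p <- seq_len(n - 1) / n
  # F crosses p at most once, at F^{-1}(p); clamp the crossing point into [a, b].
  cross <- pmin(pmax(shift + qweibull(p, shape, scale), a), b)
  Ia <- cdf_integral(a, shape, scale, shift)
  Ib <- cdf_integral(b, shape, scale, shift)
  Ic <- cdf_integral(cross, shape, scale, shift)
  # |p - F| = (F - p) + 2 * max(p - F, 0), and p - F > 0 exactly on [a, cross).
  middle <- (Ib - Ia - p * (b - a)) + 2 * (p * (cross - a) - (Ic - Ia))
  left + sum(middle) + right
}

# Monte Carlo comparison following the scheme in the post: simulate samples of the
# same size from the estimated distribution and compare the observed area to them.
mc_area_pvalue <- function(obs, shape, scale, shift, n_sim = 1e4) {
  observed <- area_between(obs, shape, scale, shift)
  n <- length(obs)
  simulated <- replicate(n_sim,
    area_between(rweibull(n, shape, scale) + shift, shape, scale, shift))
  # Proportion of simulated areas at least as large as the observed one.
  mean(simulated >= observed)
}

# Usage with the numbers from the post.
set.seed(1)
obs <- rweibull(10, shape = 2, scale = 0.5) + 0.3
area_between(obs, 1.9, 0.6, 0.35)
mc_area_pvalue(obs, 1.9, 0.6, 0.35, n_sim = 1e4)
```

Everything is vectorized over the n - 1 intervals, so each area costs a handful of pgamma and qweibull calls rather than an integrate() call. One assumption worth flagging: mc_area_pvalue keeps the estimated parameters fixed, as described in the post. Because those parameters were themselves estimated from obs, the same concern that invalidates the plain KS test would apply unless each simulated sample is re-fitted with the same estimation procedure before its area is computed (a parametric bootstrap); that step is left out here because the estimation method isn't specified.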