Hi all,
How can we use ks.test() to evaluate the goodness of fit
of a mixture of distributions?
For example, I have the following dataset:
> x
[1] 176.1 176.8 259.6 171.6 90.0 234.3 145.7 113.7 105.9 176.2 168.9 136.1
[13] 109.2 110.3 164.3 117.7 131.3 163.7 200.4 196.4 196.2 168.6 190.4 127.5
[25] 136.0 114.2 112.0 91.9 333.4 295.5 172.0 293.3 91.7 289.7 118.8 55.1
[37] 161.9 233.9 197.7 118.4 139.1 189.8 45.6 167.8 53.1 86.2 148.4 80.3
[49] 105.8 160.6 217.1 134.4 103.1 221.6 163.8 171.5 195.1 201.5 145.3 97.4
[61] 287.9 352.6 173.6 85.7 182.0 166.4 175.4 224.4 167.2 143.7 168.9 205.3
[73] 192.8 203.7 195.5 193.6 201.2 280.9 159.8 115.4 113.5 216.5 140.0 164.6
[85] 341.3 301.8 146.1 182.6 263.5 318.0 168.5 205.2 204.7 213.0 250.2 265.9
[97] 215.4 344.3 191.1 175.1 188.9 206.0 127.5 148.0 172.7 193.1 150.7 195.5
[109] 142.4 232.3 92.7 127.6 227.0 358.5 202.4 224.6 374.6 220.8 173.8 120.6
[121] 265.0 151.1
After fitting a two-component gamma mixture, I obtain the following parameters:
> param_of_2comp
          comp.1    comp.2
alpha   5.911165 95.662958
beta   30.455212  1.927715
> lmbd
[1] 0.8058989 0.1941011
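In other words, the fitted model is the two-component mixture CDF below, where G is the gamma CDF with shape α_j and scale β_j (assuming beta is the scale parameter, as in the pgamma(..., scale = ...) call in the function further down). ks.test() needs this F evaluated once per observation:

```latex
F(x) = \sum_{j=1}^{k} \lambda_j \, G(x;\alpha_j,\beta_j),
\qquad
G(x;\alpha,\beta) = \int_0^{x} \frac{t^{\alpha-1} e^{-t/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}\,dt,
\qquad
\sum_{j=1}^{k} \lambda_j = 1
```

Here k = 2 and (λ_1, λ_2) = (0.8058989, 0.1941011).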
> ks <- ks.test(x, "gamma_pdf", lmbd, alpha, beta, noc)
> print(ks$statistic[["D"]])
0.80599
> print(ks$p.value)
0
The problem is that, across many datasets, ks.test() always
gives a p-value of 0 and a very high D when testing mixtures with multiple components.
Moreover, my plot shows that the two components fit the data well:
http://docs.google.com/View?docid=dcvdrfrh_3cm63hcfn
(The gamma mixture is the bottom plot.)
What's wrong with my approach above?
Here is the function I passed to ks.test(). Despite its name, gamma_pdf
calls pgamma(), which is the gamma CDF, not the density:
__BEGIN__
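gamma_pdf <- function(x, lambda, alpha, beta, k) {
  temp <- NULL
  al <- alpha[1:k]
  be <- beta[1:k]
  for (j in 1:k) {
    # column j: CDF of gamma component j evaluated at every x
    temp <- cbind(temp, pgamma(x, shape = al[j], scale = be[j]))
  }
  # scale column j by its mixing weight lambda[j]
  temp <- t(lambda * t(temp))
  # flatten the n x k matrix column by column (result has length n*k)
  as.vector(temp)
}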
__END__
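A likely cause, for anyone hitting the same thing: ks.test() evaluates the supplied CDF on the sorted sample and expects exactly one value per observation, but gamma_pdf() returns as.vector() of the n x k matrix, i.e. n*k weighted component CDFs instead of their row sums. The sketch below is a minimal corrected version under that assumption. The name gamma_mix_cdf is hypothetical, beta is assumed to be the scale parameter (as in the original pgamma call), and the data and parameter values are copied from this post.

```r
# Data and fitted parameters copied from the post above.
x <- c(176.1, 176.8, 259.6, 171.6,  90.0, 234.3, 145.7, 113.7, 105.9, 176.2, 168.9, 136.1,
       109.2, 110.3, 164.3, 117.7, 131.3, 163.7, 200.4, 196.4, 196.2, 168.6, 190.4, 127.5,
       136.0, 114.2, 112.0,  91.9, 333.4, 295.5, 172.0, 293.3,  91.7, 289.7, 118.8,  55.1,
       161.9, 233.9, 197.7, 118.4, 139.1, 189.8,  45.6, 167.8,  53.1,  86.2, 148.4,  80.3,
       105.8, 160.6, 217.1, 134.4, 103.1, 221.6, 163.8, 171.5, 195.1, 201.5, 145.3,  97.4,
       287.9, 352.6, 173.6,  85.7, 182.0, 166.4, 175.4, 224.4, 167.2, 143.7, 168.9, 205.3,
       192.8, 203.7, 195.5, 193.6, 201.2, 280.9, 159.8, 115.4, 113.5, 216.5, 140.0, 164.6,
       341.3, 301.8, 146.1, 182.6, 263.5, 318.0, 168.5, 205.2, 204.7, 213.0, 250.2, 265.9,
       215.4, 344.3, 191.1, 175.1, 188.9, 206.0, 127.5, 148.0, 172.7, 193.1, 150.7, 195.5,
       142.4, 232.3,  92.7, 127.6, 227.0, 358.5, 202.4, 224.6, 374.6, 220.8, 173.8, 120.6,
       265.0, 151.1)
lambda <- c(0.8058989, 0.1941011)  # mixing proportions (lmbd)
alpha  <- c(5.911165, 95.662958)   # shape of each component
beta   <- c(30.455212, 1.927715)   # scale of each component (assumed, as in scale = be[j])

# Hypothetical corrected mixture CDF:
#   F(q) = sum_j lambda[j] * pgamma(q, shape = alpha[j], scale = beta[j])
# It returns one value per element of q, which is what ks.test() expects.
gamma_mix_cdf <- function(q, lambda, alpha, beta, k = length(lambda)) {
  total <- numeric(length(q))
  for (j in seq_len(k)) {
    # add the weighted CDF of component j at every q
    total <- total + lambda[j] * pgamma(q, shape = alpha[j], scale = beta[j])
  }
  total
}

# Pass the function itself; the extra named arguments are forwarded to it.
ks <- ks.test(x, gamma_mix_cdf, lambda = lambda, alpha = alpha, beta = beta)
print(ks$statistic[["D"]])
print(ks$p.value)
```

With the row-wise sum, D is the largest vertical gap between the empirical CDF and the fitted mixture CDF. Two caveats remain. The data contain ties (for example 168.9 and 195.5 each appear twice), so ks.test() will warn about them. And because lambda, alpha and beta were estimated from the same data, the standard KS p-value is only approximate (it tends to be conservative); a parametric bootstrap that refits the mixture to simulated samples would be the more careful check.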
- Gundala Viswanath
Jakarta - Indonesia