Johannes Moser
2014-Apr-30 20:30 UTC
[R] Fitting a Mixture of Noncentral Student t Distributions to a one-dimensional sample
Dear R community, I`d like to extract the parameters of a two-component mixture distribution of noncentral student t distributions which was fitted to a one-dimensional sample. There are many packages for R that are capable of handling mixture distributions in one way or another. Some in the context of a Bayesian framework requiring kernels. Some in a regression framework. Some in a nonparametric framework. ... So far the "mixdist"-package seems to come closest to my wish. This package fits parametric mixtures to a sample of data. Unfortunately it doesn`t support the student t distribution. I have also tried to manually set up a likelihood function as described here: http://stackoverflow.com/questions/6485597/r-how-to-fit-a-large-dataset-with-a-combination-of-distributions But the result is far from perfect. The "gamlss.mx"-package might be helping, but originally it seems to be set up for another context, i.e. regression. I tried to regress my data on a constant and then extract the parameters for the estimated mixture error distribution. But the estimated parameters seem to be not directly accessable individually by some command (such as fit1$sigma). And there seem to be serious convergence problems even in pretty simple and nonambiguous cases (see example 2). The following syntax is my gamlss.mx-setup so far: library(gamlss.dist) library(gamlss.mx) library(MASS) # 1: data(geyser) plot(density(geyser$waiting) ) fit1 <- gamlssMX(waiting~1,data=geyser,family="TF",K=2) fit1 # works fine # 2: N <- 100000 components <- sample(1:2,prob=c(0.6,0.4),size=N,replace=TRUE) mus <- c(3,-6) sds <- c(1,9) nus <- c(25,3) mixsim <- data.frame(rTF2(N,mu=mus[components],sigma=sds[components],nu=nus[components])) colnames(mixsim) <- "MCsim" plot(density(mixsim$MCsim) , xlim=c(-50,50)) fit2 <- gamlssMX(MCsim~1,data=mixsim,family="TF",K=2) fit2 # no convergence With another dataset and when using the same two component densities for the mixture as above I ended up with negative estimates for sigma (which should be positive). I would be very grateful for any advice. I`ve read through many manuals and vignettes today but it seems that I am nearly in the same place where I was this morning. A small example for a setup that works sort of reliably would be fantastic! Thanks a lot in advance!! Johannes
Ingmar Visser
2014-May-08 08:08 UTC
[R] Fitting a Mixture of Noncentral Student t Distributions to a one-dimensional sample
Hi Johannes, Below code gives good results for me; note that trying multiple starting is often important in fitting mixture models, even in simple cases like this. Note also that the sigma and nu parameters in gamlssMX are fitted on a log scale, hence the possible occurrence of negative results. hth, Ingmar # 2: N <- 2000 components <- sample(1:2,prob=c(0.6,0.4),size=N,replace=TRUE) mus <- c(3,-6) sds <- c(1,9) nus <- c(25,3) mixsim <- data.frame(rTF2(N,mu=mus[components],sigma=sds[components],nu=nus[components])) colnames(mixsim) <- "MCsim" plot(density(mixsim$MCsim) , xlim=c(-50,50)) set.seed(2) fit2 <- gamlssMX(MCsim~1,data=mixsim,family="TF",K=2) fit2 On Wed, Apr 30, 2014 at 10:30 PM, Johannes Moser <jzmoser at gmail.com> wrote:> Dear R community, > > I`d like to extract the parameters of a two-component mixture distribution > of noncentral student t distributions which was fitted to a one-dimensional > sample. > > There are many packages for R that are capable of handling mixture > distributions in one way or another. Some in the context of a Bayesian > framework requiring kernels. Some in a regression framework. Some in a > nonparametric framework. ... > > So far the "mixdist"-package seems to come closest to my wish. This package > fits parametric mixtures to a sample of data. Unfortunately it doesn`t > support the student t distribution. > > I have also tried to manually set up a likelihood function as described > here: > http://stackoverflow.com/questions/6485597/r-how-to-fit-a-large-dataset-with-a-combination-of-distributions > But the result is far from perfect. > > The "gamlss.mx"-package might be helping, but originally it seems to be set > up for another context, i.e. regression. I tried to regress my data on a > constant and then extract the parameters for the estimated mixture error > distribution. But the estimated parameters seem to be not directly > accessable individually by some command (such as fit1$sigma). And there seem > to be serious convergence problems even in pretty simple and nonambiguous > cases (see example 2). The following syntax is my gamlss.mx-setup so far: > > > library(gamlss.dist) > library(gamlss.mx) > library(MASS) > > # 1: > data(geyser) > plot(density(geyser$waiting) ) > fit1 <- gamlssMX(waiting~1,data=geyser,family="TF",K=2) > fit1 > # works fine > > # 2: > N <- 100000 > components <- sample(1:2,prob=c(0.6,0.4),size=N,replace=TRUE) > mus <- c(3,-6) > sds <- c(1,9) > nus <- c(25,3) > mixsim <- > data.frame(rTF2(N,mu=mus[components],sigma=sds[components],nu=nus[components])) > colnames(mixsim) <- "MCsim" > plot(density(mixsim$MCsim) , xlim=c(-50,50)) > fit2 <- gamlssMX(MCsim~1,data=mixsim,family="TF",K=2) > fit2 > # no convergence > > With another dataset and when using the same two component densities for the > mixture as above I ended up with negative estimates for sigma (which should > be positive). > > I would be very grateful for any advice. I`ve read through many manuals and > vignettes today but it seems that I am nearly in the same place where I was > this morning. > A small example for a setup that works sort of reliably would be fantastic! > > Thanks a lot in advance!! > Johannes > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Johannes Moser
2014-Jul-15 10:28 UTC
Restricted fitting of two-component mixture distribution in R possible?
Dear list, In fitting a two-component Student's t mixture distribution to some data (standardized GARCH residuals) one of the components has an estimated degree of freedom of 0.6. This means that even the first moment of the mixture distribution would not exist. The gamlss.mx package in R is used for estimation. gamlss.control, glim.control and MX.control seem not to support this kind of option -- or I couldn't find out how... Is there a way to restrict the degree of freedom parameter estimates to be larger than, say, 3? Many thanks in advance, Johannes