Bill McNeill (UW)
2008-Dec-19 05:14 UTC
[R] How do I generate one vector for every row of a data frame?
I am trying to generate a set of data points from a Gaussian mixture model. My mixture model is represented by a data frame that looks like this:

    > gmm
      weight mean  sd
    1    0.3    0 1.0
    2    0.2   -2 0.5
    3    0.4    4 0.7
    4    0.1    5 0.3

I have written the following function that generates the appropriate data:

    gmm_data <- function(n, gmm) {
            c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd),
              rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd),
              rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd),
              rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd))
    }

However, the fact that my mixture has four components is hard-coded into this function. A better implementation of gmm_data() would generate data points for an arbitrary number of mixture components (i.e. an arbitrary number of rows in the data frame).

How do I do this? I'm sure it's simple, but I can't figure it out.

Thanks.
--
Bill McNeill
http://staff.washington.edu/billmcn/index.shtml
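A minimal sketch of the most literal generalisation, assuming you want to keep the original function's behaviour exactly (a fixed n * weight draws from each row on every call): apply the same rnorm() call to every row with mapply() and concatenate the pieces. The name `gmm_data_rows` is hypothetical.

```r
# Mixture parameters from the question, one row per component
gmm <- data.frame(weight = c(0.3, 0.2, 0.4, 0.1),
                  mean   = c(0, -2, 4, 5),
                  sd     = c(1.0, 0.5, 0.7, 0.3))

# Hypothetical generalisation of gmm_data(): run the same
# rnorm(n * weight, mean, sd) call for every row, then concatenate.
gmm_data_rows <- function(n, gmm) {
  per_row <- mapply(function(w, m, s) rnorm(n * w, m, s),
                    gmm$weight, gmm$mean, gmm$sd,
                    SIMPLIFY = FALSE)   # keep a list, one vector per row
  unlist(per_row)                       # join the per-row vectors
}

set.seed(1)
x <- gmm_data_rows(1000, gmm)
length(x)
```

This works for any number of rows, but it still draws a fixed number of points per component rather than sampling the component at random for each point.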
andrew
2008-Dec-19 05:52 UTC
[R] How do I generate one vector for every row of a data frame?
I think this should work:

    rgmm <- function(n, gmm) {
      # one component index per draw; nrow(gmm) instead of a hard-coded 4
      M <- sample(seq_len(nrow(gmm)), n, replace = TRUE, prob = gmm$weight)
      mean <- gmm[M, ]$mean
      sd <- gmm[M, ]$sd
      # scale and shift standard normals by the chosen component's parameters
      return(sd * rnorm(n) + mean)
    }

    hist(rgmm(10000, gmm), breaks = 500)
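As a rough sanity check on this latent-component approach, the sketch below compares sample moments with the mixture's theoretical mean, sum(w_k * mu_k), and variance, sum(w_k * (sd_k^2 + mu_k^2)) - mean^2. For the example weights these come to 1.7 and 7.365. The function `rgmm_sketch`, the seed, and the sample size are assumptions made for illustration.

```r
# Example mixture from the question
gmm <- data.frame(weight = c(0.3, 0.2, 0.4, 0.1),
                  mean   = c(0, -2, 4, 5),
                  sd     = c(1.0, 0.5, 0.7, 0.3))

# Hypothetical sampler following the reply's idea: pick a component index
# for each point, then scale/shift a standard normal by that component.
rgmm_sketch <- function(n, gmm) {
  M <- sample(seq_len(nrow(gmm)), n, replace = TRUE, prob = gmm$weight)
  gmm$sd[M] * rnorm(n) + gmm$mean[M]
}

# Theoretical moments of the mixture
theory_mean <- sum(gmm$weight * gmm$mean)
theory_var  <- sum(gmm$weight * (gmm$sd^2 + gmm$mean^2)) - theory_mean^2

set.seed(2)
x <- rgmm_sketch(1e5, gmm)
rbind(mean = c(empirical = mean(x), theoretical = theory_mean),
      var  = c(empirical = var(x),  theoretical = theory_var))
```

With 1e5 draws the empirical values should sit close to the theoretical ones; a large gap would point to a bug such as a mismatched mean/sd index.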
Simon Knapp
2008-Dec-19 06:16 UTC
[R] How do I generate one vector for every row of a data frame?
Your code will always generate the same number of samples from each of the normals on every call, where the number of samples from each is (roughly) proportional to the weights column. If the weights column in your data frame represents the probabilities that draws come from each distribution, then this behaviour is not correct. Further, it does not guarantee that the sample size is actually n, since n*weight need not be a whole number.

This definition will work with arbitrary numbers of rows:

    gmm_data <- function(n, data) {
      # choose a row (component) for each draw, with probability data$weight
      rows <- sample(seq_len(nrow(data)), n, replace = TRUE, prob = data$weight)
      # rnorm() recycles the per-draw mean and sd vectors
      rnorm(n, data$mean[rows], data$sd[rows])
    }

and this one enforces a bit more sanity :-)

    gmm_data <- function(n, data, tol = 1e-8) {
      if (any(data$sd < 0)) stop("all of data$sd must be >= 0")
      if (any(data$weight < 0)) stop("all of data$weight must be >= 0")
      # rescale (with a warning) if the weights do not sum to 1
      wgts <- if (abs(sum(data$weight) - 1) > tol) {
        warning("data$weight does not sum to 1 - rescaling")
        data$weight / sum(data$weight)
      } else data$weight
      rows <- sample(seq_len(nrow(data)), n, replace = TRUE, prob = wgts)
      rnorm(n, data$mean[rows], data$sd[rows])
    }

Regards,
Simon Knapp.
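To illustrate the point about fixed versus random component sizes, here is a small sketch that counts how many of the n draws fall in each component over several calls, using the same sample(..., prob = weight) step as gmm_data() above. The helper `component_counts` and the seed are hypothetical.

```r
gmm <- data.frame(weight = c(0.3, 0.2, 0.4, 0.1),
                  mean   = c(0, -2, 4, 5),
                  sd     = c(1.0, 0.5, 0.7, 0.3))
n <- 1000

# How many of the n draws land in each component on one call
component_counts <- function(n, weight) {
  rows <- sample(seq_along(weight), n, replace = TRUE, prob = weight)
  tabulate(rows, nbins = length(weight))
}

set.seed(3)
counts <- replicate(5, component_counts(n, gmm$weight))  # one column per call
counts
colSums(counts)        # each call still returns exactly n points
rowMeans(counts) / n   # averages out to roughly gmm$weight
```

The per-component counts vary from call to call while always summing to n. Their joint distribution is Multinomial(n, weight), so `rmultinom(1, n, gmm$weight)` draws one such count vector directly.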