Ralf B
2010-Jul-15 21:11 UTC
[R] Repeated analysis over groups / Splitting by group variable
I am performing some analysis over a large data frame and would like to conduct repeated analysis over grouped subsets. How can I do that?

Here is some example code for clarification:

require("flexmix") # for Kullback-Leibler divergence
n <- 23
groups <- c(1,2,3)
mydata <- data.frame(
  sequence=c(1:n),
  data1=c(rnorm(n)),
  data2=c(rnorm(n)),
  group=rep(sample(groups, n, replace=TRUE))
)
# Part 1: full stats (works fine)
dataOnly <- cbind(mydata$data1, mydata$data2, mydata$group)
KLdiv(dataOnly)

#
# Part 2: again - but once for each group (error)
#
by(dataOnly, groups, KLdiv(dataOnly))

The error I am getting is:

Error in tapply(1L:23L, list(INDICES = c(1, 2, 3)), function (x) :
  arguments must have same length

Are there better ways than 'by'? I would like to use different stats and functions, and therefore I am looking for a splitter whose output I can hand to any statistical function I want.

Any ideas?

Ralf
Phil Spector
2010-Jul-15 21:42 UTC
[R] Repeated analysis over groups / Splitting by group variable
Ralf -
   If you want to use by(), I think it should look like this:

by(dataOnly,dataOnly[,3],function(x)KLdiv(as.matrix(x)))

But you might find the following more useful:

lapply(split(as.data.frame(dataOnly),dataOnly[,3]),
       function(x)KLdiv(as.matrix(x)))

since it returns its results in a list.

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spector at stat.berkeley.edu
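
For reference, the original by() call appears to fail for two reasons, both of which the corrected calls address: the grouping argument `groups` has only 3 elements while the data has 23 rows, and the last argument was the result of calling KLdiv(dataOnly) rather than a function. The following is a minimal sketch of the split/lapply pattern in base R only. It assumes colMeans() as a stand-in statistic so that it runs without flexmix; the names `pieces`, `group_means`, and `group_sizes` are illustrative and not from the thread.

```r
# Sketch only: base R, no flexmix needed. colMeans() is a stand-in
# for KLdiv() or any other per-group statistic.
set.seed(1)
n <- 23
groups <- c(1, 2, 3)
mydata <- data.frame(
  sequence = 1:n,
  data1 = rnorm(n),
  data2 = rnorm(n),
  group = sample(groups, n, replace = TRUE)
)

# The grouping vector must have one entry per row (length n),
# not one entry per distinct group.
pieces <- split(mydata[, c("data1", "data2")], mydata$group)

# Pass a function (not the result of a call) to apply to each piece.
# The result is a list named by group.
group_means <- lapply(pieces, function(d) colMeans(as.matrix(d)))

# The same list of pieces can be handed to any other function.
group_sizes <- sapply(pieces, nrow)
```

Because split() returns an ordinary named list of data frames, the statistic can be swapped without touching the splitting step, for example by replacing colMeans(as.matrix(d)) with KLdiv(as.matrix(d)) once flexmix is loaded.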