Hello,

I am trying to do something that I am able to do with the "by" function within data.frame but can't figure out how to achieve with data.table.

Consider

dt<-data.table(name=c(rep("a",5),rep("b",6)),var1=0:10,var2=20:30,var3=40:50)
myFunction <- function(x) { mean(x) }

I am aware that I can do something like:

dt[, .(meanVar1=myFunction(var1)) ,by=.(name)]

but how could I do the equivalent of:

df<-data.frame(name=c(rep("a",5),rep("b",6)),var1=0:10,var2=20:30,var3=40:50)
myFunction <- function(x) { mean(x) }

columnNames <- c("var1","var2","var3")
result <- by(df, df$name, function(x) {
  output <- c()
  for(col in columnNames) {
    output[col] <- myFunction(x[,col])
  }
  output
})
do.call(rbind,result)

Thanks in advance,
Ramiro
Try this:

> dt[
+     , {
+         result <- list()
+         for (i in names(.SD)){
+             result[[i]] <- myFunction(unlist(.SD[, i, with = FALSE]))
+         }
+         result
+     }
+     , by = name
+     ]
   name var1 var2 var3
1:    a  2.0   22   42
2:    b  7.5   28   48
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Jun 9, 2015 at 4:22 PM, Ramiro Barrantes <ramiro at precisionbioassay.com> wrote:
> I am trying to do something that I am able to do with the "by" function
> within data.frame but can't figure out how to achieve with data.table.
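For readers following the loop-based reply, here is a minimal sketch of the same idea in script form, without the console prompts. It assumes the data.table package is installed and swaps unlist(.SD[, i, with = FALSE]) for .SD[[i]], which extracts the column as a plain vector; the name loopResult is only illustrative.

```r
library(data.table)

dt <- data.table(name = c(rep("a", 5), rep("b", 6)),
                 var1 = 0:10, var2 = 20:30, var3 = 40:50)
myFunction <- function(x) { mean(x) }

# Inside j, .SD is the subset of rows for the current group,
# holding every column except the grouping column.
loopResult <- dt[, {
  result <- list()
  for (i in names(.SD)) {
    # .SD[[i]] pulls column i out as a vector (assumed substitute for unlist())
    result[[i]] <- myFunction(.SD[[i]])
  }
  result  # a named list becomes one row per group
}, by = name]

print(loopResult)
```

Returning a named list from j is what makes each element a separate output column.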
Hi Ramiro,

There is a demonstration of this on the data.table wiki at
https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-intro-vignette.html.
You can do

dt[, lapply(.SD, mean), by=name]

or

dt[, as.list(colMeans(.SD)), by=name]

BTW, there are pretty straightforward ways to do this in base R as well, e.g.,

data.frame(t(sapply(split(df[-1], df$name), colMeans)))

Best,
Ista
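The original question applied a custom function to a chosen set of columns. A hedged sketch of how that might look with the lapply(.SD, ...) form is below. It assumes the data.table package is installed and uses the .SDcols argument to restrict .SD to the columns in columnNames; the names dtResult and byResult are only illustrative. The by() call is the one from the question, shortened with sapply, and is there only to check that the two approaches agree.

```r
library(data.table)

df <- data.frame(name = c(rep("a", 5), rep("b", 6)),
                 var1 = 0:10, var2 = 20:30, var3 = 40:50)
dt <- as.data.table(df)
myFunction <- function(x) { mean(x) }
columnNames <- c("var1", "var2", "var3")

# data.table: apply myFunction to each selected column within each group
dtResult <- dt[, lapply(.SD, myFunction), by = name, .SDcols = columnNames]

# base R: the by() approach from the question, with sapply replacing the loop
byResult <- do.call(rbind, by(df, df$name, function(x) {
  sapply(columnNames, function(col) myFunction(x[, col]))
}))

# Compare values only; by() yields a matrix with the group names as row names
print(all.equal(as.matrix(dtResult[, columnNames, with = FALSE]),
                byResult, check.attributes = FALSE))
```

Leaving out .SDcols applies the function to every non-grouping column, which is what the shorter calls in the reply do.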
Ramiro,

`dt[, lapply(.SD, mean), by=name]` is the idiomatic way.

I suggest reading through the new HTML vignettes at
https://github.com/Rdatatable/data.table/wiki/Getting-started

Ista, thanks for linking to the new vignette.

On Wed, Jun 10, 2015 at 2:17 AM, Ista Zahn <istazahn at gmail.com> wrote:
> There is a demonstration of this on the data.table wiki at
> https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-intro-vignette.html.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.