Hi all,

I have a data frame with columns over which I would like to run repeated functions for data analysis. Currently I am only running recursively over two columns, where column 1 has two states over which I split and column 2 has three states. The function therefore runs 2 x 3 = 6 times, as shown when running the following code:

    mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6),
                         taskid = c(1, 1, 2, 2, 3, 3),
                         stuff = 11:16)
    mydata
    mydata <- mydata[with(mydata, order(userid, taskid)), ]
    mydata

    lapply(split(mydata, mydata[,1]), function(x){
      lapply(split(x, x[,2]), function(y){
        print(paste("result:", y))
      })
    })

This traverses the tree like this:

    5,1
    5,2
    5,3
    6,1
    6,2
    6,3

Is there an easier way of doing that? I would like to provide the two columns (index 1 and index 2) directly and have the ?lapply function apply its lambda function to each member of the tree automatically. How can I do that?

Best,
Ralf
On Aug 3, 2010, at 10:48 PM, Ralf B wrote:

> Is there an easier way of doing that? I would like to provide the two
> columns (index 1 and index 2) directly and have the ?lapply function
> perform its lambda function directly on each member of the tree
> automatically? How can I do that?

    split(mydata, with(mydata, paste(userid, taskid, sep=".")))

Perhaps something like:

    lapply(split(mydata, with(mydata, paste(userid, taskid, sep="."))),
           function(x) paste("result:", x))

David Winsemius, MD
West Hartford, CT
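A related variant, offered only as a sketch: split() also accepts a list of grouping vectors and splits on their interaction, so the key does not have to be pasted together by hand. The names `groups` and `results` and the use of sum() on `stuff` are illustrative assumptions, not part of the original thread.

```r
mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6),
                     taskid = c(1, 1, 2, 2, 3, 3),
                     stuff  = 11:16)

# split() takes a list of grouping vectors and uses their interaction;
# drop = TRUE discards userid/taskid combinations that never occur.
groups <- split(mydata, list(mydata$userid, mydata$taskid), drop = TRUE)

# One function call per (userid, taskid) group; group names look like "5.1".
# sum(g$stuff) is only a stand-in for the real per-group analysis.
results <- lapply(groups, function(g) {
  paste("result:", g$userid[1], g$taskid[1], sum(g$stuff))
})

names(results)
```

Note that the groups may not come back in the same order as the nested lapply traversal; sorting with `results[order(names(results))]` is one way to restore it if the order matters.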
I don't know what you really want, but reshape will help you.

    reshape(mydata, timevar="taskid", idvar="userid", direction="wide")

-----
A R learner.
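For reference, a small sketch of what that reshape() call produces on the example data from the original post. It reorganizes the data into one row per userid rather than applying a function per group, so it may or may not be what was asked for; the variable name `wide` is an assumption for illustration.

```r
mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6),
                     taskid = c(1, 1, 2, 2, 3, 3),
                     stuff  = 11:16)

# Wide format: one row per userid, one "stuff" column per taskid
# (stuff.1, stuff.2, stuff.3).
wide <- reshape(mydata, timevar = "taskid", idvar = "userid",
                direction = "wide")
wide
```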
Besides beauty, is there an actual advantage in terms of run-time and/or memory use?

Ralf

On Wed, Aug 4, 2010 at 3:44 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> It's not that it's "bad" -- it's just unnecessarily clumsy. Almost
> always, tapply/by will do the same thing more simply.
>
> -- Bert
>
> On Wed, Aug 4, 2010 at 10:10 AM, Ralf B <ralf.bierig at gmail.com> wrote:
>>> In general, the lapply(split(...)) construction should never be used.
>>
>> Why? What makes it so bad to use?
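To make the tapply/by suggestion concrete, here is a minimal sketch on the example data from the original post. The variable names `sums` and `res` and the choice of sum() as the per-group function are assumptions for illustration; the thread itself reports no timing or memory figures, so any comparison would have to be measured, for example with system.time().

```r
mydata <- data.frame(userid = c(5, 6, 5, 6, 5, 6),
                     taskid = c(1, 1, 2, 2, 3, 3),
                     stuff  = 11:16)

# tapply: apply a function to one column, grouped by both id columns.
# The result is a 2 x 3 matrix indexed by userid (rows) and taskid (columns).
sums <- tapply(mydata$stuff, mydata[c("userid", "taskid")], sum)
sums

# by: hand each (userid, taskid) sub-data-frame to the function.
res <- by(mydata, mydata[c("userid", "taskid")],
          function(g) paste("result:", g$stuff))
res
```

tapply fits when the function works on a single column; by fits when the function needs the whole sub-data-frame, which is closer to the nested lapply(split(...)) in the original question.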