Hi All,

I'm trying to understand the difference between do.call and lapply for applying a function to a list. Below is one of the variations of programs (by Marc Schwartz) discussed here recently to select the first and last n observations per group.

I've looked in several books, the R FAQ and searched the archives, but I can't find enough to figure out why lapply doesn't do what do.call does in this case. The help files & newsletter descriptions of do.call sound like it would do the same thing, but I'm sure that's due to my lack of understanding about their specific terminology. I would appreciate it if you could take a moment to enlighten me.

Thanks,
Bob

mydata <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)
mydata

mylast <- lapply( split(mydata,mydata$id), tail, n=1)
mylast
class(mylast) #It's a list, so lapply will do *something* with it.

#This gets the desired result:
do.call("rbind", mylast)

#This doesn't do the same thing, which confuses me:
lapply(mylast,rbind)

#...and data.frame won't fix it as I've seen it do in other circumstances:
data.frame( lapply(mylast,rbind) )

========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: muenchen at utk.edu
Web: http://oit.utk.edu/scc,
News: http://listserv.utk.edu/archives/statnews.html

On 4/9/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> mylast <- lapply( split(mydata,mydata$id), tail, n=1)
>
> #This gets the desired result:
> do.call("rbind", mylast)

This does a single 'rbind' with the elements of the list as its arguments, so you are effectively creating a single data frame from them.

> #This doesn't do the same thing, which confuses me:
> lapply(mylast,rbind)

This applies 'rbind' separately to each element of the list (that is what lapply does: it calls the function once per element) and returns a list with the same contents as the one you started with.

> #...and data.frame won't fix it as I've seen it do in other
> circumstances:
> data.frame( lapply(mylast,rbind) )

What you are effectively doing is calling data.frame with as many arguments as the list has elements. See what happens with:

> data.frame(a=list(a=1,b=2), b=list(a=3,b=4))
  a.a a.b b.a b.b
1   1   2   3   4

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
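
The three behaviours described in this reply can be compared side by side. The following is only a sketch: it rebuilds mylast from the original post, and the names stacked, per_element, wide and jim_example are made up for this illustration. It looks only at the shapes of the results, because the exact column names produced by data.frame() for the list case may vary.

```r
# Rebuild the data from the original post.
mydata <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# One rbind() call that receives all three data frames as arguments:
# the result is a single data frame with three rows.
stacked <- do.call("rbind", mylast)

# Three rbind() calls, each receiving one data frame:
# the result is still a list of three one-row data frames.
per_element <- lapply(mylast, rbind)

# data.frame() treats each list element as a separate set of columns,
# so the three one-row data frames end up side by side in one wide row.
wide <- data.frame(per_element)

# The small example from the reply, kept for comparison.
jim_example <- data.frame(a = list(a = 1, b = 2), b = list(a = 3, b = 4))

dim(stacked)         # rows x columns of the stacked result
length(per_element)  # number of list elements returned by lapply()
dim(wide)            # rows x columns of the side-by-side result
names(jim_example)
```

Comparing dim(stacked) with dim(wide) shows the difference directly: the first stacks rows, the second lines the same values up as extra columns.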

On Mon, 2007-04-09 at 12:45 -0400, Muenchen, Robert A (Bob) wrote:
> mylast <- lapply( split(mydata,mydata$id), tail, n=1)
>
> #This gets the desired result:
> do.call("rbind", mylast)
>
> #This doesn't do the same thing, which confuses me:
> lapply(mylast,rbind)
>
> #...and data.frame won't fix it as I've seen it do in other
> circumstances:
> data.frame( lapply(mylast,rbind) )

Bob,

A key difference is that do.call() operates (in the above example) as if the actual call was:

> rbind(mylast[[1]], mylast[[2]], mylast[[3]])
   id math reading
3 001   70      88
4 002   65      NA
6 003   70      NA

In other words, do.call() takes the quoted function and passes the list object as if it was a list of individual arguments. So rbind() is only called once.

In this case, rbind() internally handles all of the factor level issues, etc. to enable a single common data frame to be created from the three independent data frames contained in 'mylast':

> str(mylast)
List of 3
 $ 001:'data.frame':   1 obs. of  3 variables:
  ..$ id     : Factor w/ 3 levels "001","002","003": 1
  ..$ math   : num 70
  ..$ reading: num 88
 $ 002:'data.frame':   1 obs. of  3 variables:
  ..$ id     : Factor w/ 3 levels "001","002","003": 2
  ..$ math   : num 65
  ..$ reading: num NA
 $ 003:'data.frame':   1 obs. of  3 variables:
  ..$ id     : Factor w/ 3 levels "001","002","003": 3
  ..$ math   : num 70
  ..$ reading: num NA

On the other hand, lapply() (as above) calls rbind() _separately_ for each component of mylast. It therefore acts as if the following series of three separate calls were made:

> rbind(mylast[[1]])
   id math reading
3 001   70      88

> rbind(mylast[[2]])
   id math reading
4 002   65      NA

> rbind(mylast[[3]])
   id math reading
6 003   70      NA

Of course, the result of lapply() is that the above are combined into a single R list object and returned:

> lapply(mylast, rbind)
$`001`
   id math reading
3 001   70      88

$`002`
   id math reading
4 002   65      NA

$`003`
   id math reading
6 003   70      NA

It is a subtle, but of course critical, difference in how the internal function is called and how the arguments are passed.

Does that help?

Regards,

Marc Schwartz
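
The equivalence described above can be checked directly. This is a minimal sketch, assuming the data from the original post; by_hand and via_do_call are illustrative names. The list is passed through unname() as a precaution: with a named list, do.call() passes the names along as argument names, and rbind() may then use them as row labels, which would make the two results differ in their row names only.

```r
# Rebuild the data from the original post.
mydata <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# The call written out by hand, one argument per list element.
by_hand <- rbind(mylast[[1]], mylast[[2]], mylast[[3]])

# The same call constructed by do.call(); unname() drops the list names
# so that they are not passed on as argument names.
via_do_call <- do.call("rbind", unname(mylast))

# Compare the two results; all.equal() reports any differences it finds.
all.equal(by_hand, via_do_call)
```

The practical advantage of do.call() is that it works for a list of any length, whereas the hand-written call has to name each element explicitly.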

Consider this. If L is a list with n components then

- do.call(f, L) calls f once
- lapply(L, f) calls f n times

On 4/9/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> I'm trying to understand the difference between do.call and lapply for
> applying a function to a list.
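
The call counts stated in this reply can be observed by wrapping rbind() in a function that records how often it is invoked. This is just a sketch: counting_rbind, calls, n_do_call and n_lapply are hypothetical names introduced for the illustration, and L is a small stand-in list rather than the data from the thread.

```r
# A counter and a wrapper around rbind() that increments it on every call.
calls <- 0
counting_rbind <- function(...) {
  calls <<- calls + 1
  rbind(...)
}

# A stand-in list with n = 3 components.
L <- list(a = 1:2, b = 3:4, c = 5:6)

# do.call() hands all components to the function in one call.
calls <- 0
combined <- do.call(counting_rbind, L)
n_do_call <- calls

# lapply() calls the function once per component.
calls <- 0
separate <- lapply(L, counting_rbind)
n_lapply <- calls

n_do_call
n_lapply
```

Here combined is a single matrix built from all three vectors, while separate is a list holding one single-row matrix per component.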