You are chasing ghosts of performance past, Denes. The data.frame function
causes no problems, and if it is used then the OP would not need to presume
they know the internal structure of the data frame. See below. (I am using
R 3.1.2.)

a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
a3 <- list(x = rnorm(1e6), y = rnorm(1e6))

# get names of the objects
out_names <- ls(pattern="a[[:digit:]]$")

# amount of memory allocated
gc(reset=TRUE)

# Explicitly call data frame
out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )

# No copying.
gc()

# Your suggested retrieval method
out3a <- lapply( lapply( out_names, get ), "[[", "x" )
names( out3a ) <- out_names
# The "obvious" way to finish the job works fine.
out3 <- do.call( data.frame, out3a )

# No copying... well, you do end up with a new list in out3, but the data
# itself doesn't get copied.
gc()

On Tue, 16 Dec 2014, Dénes Tóth wrote:

> On 12/16/2014 06:06 PM, SH wrote:
>> Dear List,
>>
>> I hope this posting is not redundant. I have several list outputs with the
>> same components. I ran a function with three different scenarios below
>> (e.g., scen1, scen2, and scen3,...,scenN). I would like to extract the
>> same components and group them as a data frame. For example,
>> pop.inf.r1 <- scen1[['pop.inf.r']]
>> pop.inf.r2 <- scen2[['pop.inf.r']]
>> pop.inf.r3 <- scen3[['pop.inf.r']]
>> ...
>> pop.inf.rN<-scenN[['pop.inf.r']]
>> new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)
>>
>> My final output would be 'new.df'. Could you help me how I can do that
>> efficiently?
>
> If efficiency is of concern, do not use data.frame() but create a list and
> add the required attributes with data.table::setattr (the setattr function of
> the data.table package). (You can also consider creating a data.table instead
> of a data.frame.)
>
> # some largish lists
> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>
> # amount of memory allocated
> gc(reset=TRUE)
>
> # get names of the objects
> out_names <- ls(pattern="a[[:digit:]]$")
>
> # create a list
> out <- lapply(lapply(out_names, get), "[[", "x")
>
> # note that no copying occurred
> gc()
>
> # decorate the list
> data.table::setattr(out, "names", out_names)
> data.table::setattr(out, "row.names", seq_along(out[[1]]))
> class(out) <- "data.frame"
>
> # still no copy
> gc()
>
> # output
> head(out)
>
> HTH,
> Denes
>
>> Thanks in advance,
>>
>> Steve
>>
>> P.S.: Below are some examples of summary outputs.
>>
>> > summary(scen1)
>>                 Length Class  Mode
>> aql             1      -none- numeric
>> rql             1      -none- numeric
>> alpha           1      -none- numeric
>> beta            1      -none- numeric
>> n.sim           1      -none- numeric
>> N               1      -none- numeric
>> n.sample        1      -none- numeric
>> n.acc           1      -none- numeric
>> lot.inf.r       1      -none- numeric
>> pop.inf.n       2000   -none- list
>> pop.inf.r       2000   -none- list
>> pop.decision.t1 2000   -none- list
>> pop.decision.t2 2000   -none- list
>> sp.inf.n        2000   -none- list
>> sp.inf.r        2000   -none- list
>> sp.decision     2000   -none- list
>> > summary(scen2)
>>                 Length Class  Mode
>> aql             1      -none- numeric
>> rql             1      -none- numeric
>> alpha           1      -none- numeric
>> beta            1      -none- numeric
>> n.sim           1      -none- numeric
>> N               1      -none- numeric
>> n.sample        1      -none- numeric
>> n.acc           1      -none- numeric
>> lot.inf.r       1      -none- numeric
>> pop.inf.n       2000   -none- list
>> pop.inf.r       2000   -none- list
>> pop.decision.t1 2000   -none- list
>> pop.decision.t2 2000   -none- list
>> sp.inf.n        2000   -none- list
>> sp.inf.r        2000   -none- list
>> sp.decision     2000   -none- list
>> > summary(scen3)
>>                 Length Class  Mode
>> aql             1      -none- numeric
>> rql             1      -none- numeric
>> alpha           1      -none- numeric
>> beta            1      -none- numeric
>> n.sim           1      -none- numeric
>> N               1      -none- numeric
>> n.sample        1      -none- numeric
>> n.acc           1      -none- numeric
>> lot.inf.r       1      -none- numeric
>> pop.inf.n       2000   -none- list
>> pop.inf.r       2000   -none- list
>> pop.decision.t1 2000   -none- list
>> pop.decision.t2 2000   -none- list
>> sp.inf.n        2000   -none- list
>> sp.inf.r        2000   -none- list
>> sp.decision     2000   -none- list
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
Dear Jeff,

On 12/17/2014 01:46 AM, Jeff Newmiller wrote:
> You are chasing ghosts of performance past, Denes.

In terms of memory efficiency, yes. In terms of CPU time, there can be a
significant difference, see below.

> The data.frame function causes no problems, and if it is used then the OP
> would not need to presume they know the internal structure of the data
> frame. See below. (I am using R3.1.2.)
>
> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>
> # get names of the objects
> out_names <- ls(pattern="a[[:digit:]]$")
>
> # amount of memory allocated
> gc(reset=TRUE)
>
> # Explicitly call data frame
> out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )
>
> # No copying.
> gc()
>
> # Your suggested retrieval method
> out3a <- lapply( lapply( out_names, get ), "[[", "x" )
> names( out3a ) <- out_names
> # The "obvious" way to finish the job works fine.
> out3 <- do.call( data.frame, out3a )

BTW, the even more "obvious" as.data.frame() produces the same result with an
even more intuitive interface.

However, for lists with a larger number of elements the transformation to a
data.frame can be pretty slow. In the toy example, we created only a
three-element list. Let's increase it a little bit.
---

# this is not even that large
datlen <- 1e2
listlen <- 1e5

# create a toy list
mylist <- matrix(seq_len(datlen * listlen),
                 nrow = datlen, ncol = listlen)
mylist <- lapply(1:ncol(mylist), function(i) mylist[, i])
names(mylist) <- paste0("V", seq_len(listlen))

# define the more efficient function ---
# note that I put class(x) first so that setattr does not
# modify the attributes of the original input (see ?setattr,
# you have to be careful)
setAttrib <- function(x) {
  class(x) <- "data.frame"
  data.table::setattr(x, "row.names", seq_along(x[[1]]))
  x
}

# benchmarking
# (we do not need microbenchmark here, the differences are
# extremely large) - on my machine, 9.4 sec, 8.1 sec vs 0.15 sec
gc(reset=TRUE)
system.time(df1 <- do.call(data.frame, mylist))
gc()
system.time(df2 <- as.data.frame(mylist))
gc()
system.time(df3 <- setAttrib(mylist))
gc()

# check results
identical(df1, df2)
identical(df1, df3)

----

Of course for small datasets, one should use the built-in and safe functions
(either do.call or as.data.frame). BTW, for the original three-element list,
these are even faster than the workaround.

All the best,
Denes
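The attribute-based workaround discussed above skips the consistency checks that data.frame() performs. The following is a minimal base-R sketch of the same idea with a few of those checks added back. The function name list_to_df and the toy input are hypothetical, and it uses ordinary replacement functions instead of data.table::setattr, so it has not been timed against the figures quoted in the thread.

```r
# Sketch only: convert a named list of equal-length vectors into a data
# frame by setting attributes directly. list_to_df is a hypothetical helper,
# not a function from base R or data.table.
list_to_df <- function(x) {
  stopifnot(is.list(x), length(x) > 0L)

  # Column names must exist, be non-empty and be unique.
  nms <- names(x)
  stopifnot(!is.null(nms), !anyNA(nms), all(nzchar(nms)), !anyDuplicated(nms))

  # Every element must have the same length (the number of rows).
  n <- unique(lengths(x))
  stopifnot(length(n) == 1L)

  # Base replacement functions work on a local copy of the list header,
  # so the caller's list is left untouched.
  attr(x, "row.names") <- seq_len(n)
  class(x) <- "data.frame"
  x
}

cols <- list(a = c(1.5, 2.5, 3.5), b = 4:6)
df <- list_to_df(cols)
print(df)
```

Because the attributes are set with base replacement functions, the input list keeps its original class and attributes, which avoids the by-reference caveat mentioned for setattr. Anything data.frame() would normally convert or reject beyond the checks shown here (for example matrix columns) is not handled.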
Dear Dennis, David, Jeff, and Denes,

Thanks for your help and comments. The simple one seems good enough for my
work.

Best,

Steve
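For reference, the "simple" built-in route applied to the original question might look like the sketch below. The scenario objects are hypothetical stand-ins built by a made-up helper (make_scen), and the sketch assumes that each pop.inf.r component is a list of length-one numerics, which the summary output (Length 2000, Mode list) suggests but does not confirm.

```r
# Hypothetical stand-ins for the OP's scen1, scen2, scen3 objects.
# Assumption: pop.inf.r is a list whose elements are single numbers.
set.seed(1)
make_scen <- function(n) list(aql = 0.01, pop.inf.r = as.list(runif(n)))
scen1 <- make_scen(5)
scen2 <- make_scen(5)
scen3 <- make_scen(5)

# Keep the scenarios in one named list instead of looking them up by name.
scens <- list(scen1 = scen1, scen2 = scen2, scen3 = scen3)

# Extract the same component from each scenario and flatten it to a vector.
cols <- lapply(scens, function(s) unlist(s[["pop.inf.r"]], use.names = FALSE))
names(cols) <- paste0("pop.inf.r", seq_along(cols))

# All columns need the same length before they can form a data frame.
stopifnot(length(unique(lengths(cols))) == 1L)

new.df <- as.data.frame(cols)
str(new.df)
```

Holding the scenarios in a list from the start avoids the ls()/get() lookup used in the examples above and extends to N scenarios without code changes. If the pop.inf.r elements are not single numbers, the unlist() step would need to be replaced by something that matches their actual structure.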