I have very large data sets given in a format similar to d below. Converting these to a data frame is a bottleneck in my application. My fastest version is given below, but it looks clumsy to me.

Any ideas?

Dieter

# -----------------------
len = 100000
d = replicate(len, list(pH = 3,marker = TRUE,position = "A"),FALSE)
# Data are given as d

# preallocate vectors
pH =rep(0,len)
marker =rep(0,len)
position =rep(0,len)

system.time(
{
  for (i in 1:len)
  {
    d1 = d[[i]]
    #Assign to vectors
    pH[i] = d1[[1]]
    marker[i] = d1[[2]]
    position[i] = d1[[3]]
  }
  # combine vectors
  pHAll = data.frame(pH,marker,position)
}
)

--
View this message in context: http://n4.nabble.com/Fast-nested-List-data-frame-tp998871p998871.html
Sent from the R help mailing list archive at Nabble.com.

Dieter,

I'd approach this by first making a matrix, then converting to a data frame with appropriate types. I'm sure there is a way to do it with structure in one step. Operations on matrices are usually faster than on data frames.

len <- 100000
d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)

toDF <- function(alist){
  d.matrix <- matrix(unlist(alist), ncol = 3, byrow = TRUE)
  d.df <- as.data.frame(d.matrix)
  names(d.df) <- c('pH', 'marker', 'position')
  d.df$pH <- as.numeric(d.df$pH)
  d.df$marker <- as.logical(d.df$marker)
  return(d.df)
}

On my system,

system.time(b<-toDF(d))
   user  system elapsed
  0.560   0.033   0.592

and

head(b)
  pH marker position
1  1   TRUE        A
2  1   TRUE        A
3  1   TRUE        A
4  1   TRUE        A
5  1   TRUE        A
6  1   TRUE        A

and

sapply(b, class)
       pH    marker  position
"numeric" "logical"  "factor"

I hope this helps,

Greg

sessionInfo() ##old, I know.
R version 2.9.0 (2009-04-17)
i386-apple-darwin8.11.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] cimis_0.1-3 RLastFM_0.1-4 RCurl_0.98-1 bitops_1.0-4.1 XML_2.5-3
[6] lattice_0.17-22

loaded via a namespace (and not attached):
[1] grid_2.9.0

On 1/4/10 11:43 PM, Dieter Menne wrote:
> I have very large data sets given in a format similar to d below. Converting
> these to a data frame is a bottleneck in my application. My fastest version
> is given below, but it look clumsy to me.
>
> Any ideas?

--
Greg Hirson
ghirson at ucdavis.edu

Graduate Student
Agricultural and Environmental Chemistry

1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616
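
A possible alternative to the matrix route above, offered only as a sketch and not as a benchmarked answer: unlist() on mixed-type records coerces everything to character, so each column has to be converted back afterwards. Pulling each field out with vapply() keeps the original types and skips that round trip. The name list_to_df is made up for this illustration, and len is reduced to keep the example quick; whether this is faster than the matrix version on real data would need to be timed.

```r
# Smaller test data in the same shape as d in the original post
len <- 1000
d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)

# Hypothetical helper: extract each field by position with a declared
# result type, so no character coercion (and no factor) is involved.
list_to_df <- function(alist) {
  data.frame(
    pH       = vapply(alist, function(x) x[[1]], numeric(1)),
    marker   = vapply(alist, function(x) x[[2]], logical(1)),
    position = vapply(alist, function(x) x[[3]], character(1)),
    stringsAsFactors = FALSE
  )
}

b2 <- list_to_df(d)
sapply(b2, class)
```

vapply() stops with an error if any record holds a value of the wrong type or length, which can be useful when the input format is only "similar to" d.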

The as.numeric(d.df$pH) should have been an as.numeric(as.character(d.df$pH)). Sorry for the confusion.

Greg

On 1/5/10 12:33 AM, Greg Hirson wrote:
> Dieter,
>
> I'd approach this by first making a matrix, then converting to a data
> frame with appropriate types. I'm sure there is a way to do it with
> structure in one step, but here it is:
>
> a <- function() {
>   len <- 100000
>   d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)
>   d.matrix <- matrix(unlist(d), ncol = 3, byrow = TRUE)
>   d.df <- as.data.frame(d.matrix)
>   names(d.df) <- c('pH', 'marker', 'position')
>
>   #d.df$pH <- as.numeric(d.df$pH) #incorrect
d.df$pH <- as.numeric(as.character(d.df$pH)) #correct
>   d.df$marker <- as.logical(d.df$marker)
>   return(d.df)
> }
>
> system.time(a())
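
For anyone wondering why the extra as.character() matters: in the R 2.9.0 session shown earlier in this thread, as.data.frame() turned the character matrix columns into factors, and as.numeric() on a factor returns the internal level codes, not the values the labels spell out. That is why head(b) printed pH as 1 instead of 3. A small illustration with a made-up factor f:

```r
# A factor stores integer codes that index into its sorted levels.
f <- factor(c("3", "3", "7.5"))

levels(f)                    # the labels: "3" and "7.5"
as.numeric(f)                # the codes (1, 1, 2), not the values
as.numeric(as.character(f))  # the values the labels represent (3, 3, 7.5)
```

Newer R versions (4.0.0 and later) default to stringsAsFactors = FALSE, so the columns would arrive as character and a plain as.numeric() would do, but the as.character() form is harmless there and safe in both cases.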