I have very large data sets given in a format similar to d below. Converting
these to a data frame is a bottleneck in my application. My fastest version
is given below, but it looks clumsy to me.
Any ideas?
Dieter
# -----------------------
len = 100000
d = replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)
# Data are given as d
# preallocate vectors with the type of each field
pH = numeric(len)
marker = logical(len)
position = character(len)
system.time(
{
  for (i in 1:len)
  {
    d1 = d[[i]]
    # Assign to vectors
    pH[i] = d1[[1]]
    marker[i] = d1[[2]]
    position[i] = d1[[3]]
  }
  # combine vectors
  pHAll = data.frame(pH, marker, position)
}
)
--
View this message in context:
http://n4.nabble.com/Fast-nested-List-data-frame-tp998871p998871.html
Sent from the R help mailing list archive at Nabble.com.
Dieter,
I'd approach this by first making a matrix, then converting to a data
frame with appropriate types. I'm sure there is a way to do it with
structure in one step. Operations on matrices are usually faster than on
data frames.
len <- 100000
d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)
toDF <- function(alist){
  d.matrix <- matrix(unlist(alist), ncol = 3, byrow = TRUE)
  d.df <- as.data.frame(d.matrix)
  names(d.df) <- c('pH', 'marker', 'position')
  d.df$pH <- as.numeric(d.df$pH)
  d.df$marker <- as.logical(d.df$marker)
  return(d.df)
}
on my system,
system.time(b<-toDF(d))
   user  system elapsed
  0.560   0.033   0.592
and
head(b)
  pH marker position
1  1   TRUE        A
2  1   TRUE        A
3  1   TRUE        A
4  1   TRUE        A
5  1   TRUE        A
6  1   TRUE        A
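Note that pH prints as 1 although every input value is 3. A likely cause, assuming as.data.frame() turned the columns of the character matrix into factors (the default before R 4.0.0): as.numeric() on a factor returns the internal level codes, not the labels. A minimal sketch, where f is a made-up stand-in for the pH column:

```r
# f is a hypothetical stand-in for the pH column after as.data.frame()
f <- factor(c("3", "3", "3"))

codes  <- as.numeric(f)                # internal level codes
values <- as.numeric(as.character(f))  # labels converted back to numbers

print(codes)   # 1 1 1
print(values)  # 3 3 3
```

Going through as.character() first recovers the original values.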
and
sapply(b, class)
       pH    marker  position
"numeric" "logical"  "factor"
I hope this helps,
Greg
sessionInfo() ##old, I know.
R version 2.9.0 (2009-04-17)
i386-apple-darwin8.11.1
locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] cimis_0.1-3     RLastFM_0.1-4   RCurl_0.98-1    bitops_1.0-4.1  XML_2.5-3
[6] lattice_0.17-22
loaded via a namespace (and not attached):
[1] grid_2.9.0
On 1/4/10 11:43 PM, Dieter Menne wrote:
> [...]
--
Greg Hirson
ghirson at ucdavis.edu
Graduate Student
Agricultural and Environmental Chemistry
1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616
The as.numeric(d.df$pH) should have been an
as.numeric(as.character(d.df$pH)). Sorry for the confusion.

Greg

On 1/5/10 12:33 AM, Greg Hirson wrote:
> Dieter,
>
> I'd approach this by first making a matrix, then converting to a data
> frame with appropriate types. I'm sure there is a way to do it with
> structure in one step, but here it is:
>
> a <- function() {
>   len <- 100000
>   d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)
>   d.matrix <- matrix(unlist(d), ncol = 3, byrow = TRUE)
>   d.df <- as.data.frame(d.matrix)
>   names(d.df) <- c('pH', 'marker', 'position')
>
  #d.df$pH <- as.numeric(d.df$pH) #incorrect
  d.df$pH <- as.numeric(as.character(d.df$pH)) #correct
>   d.df$marker <- as.logical(d.df$marker)
>   return(d.df)
> }
>
> system.time(a())
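A possible alternative that avoids the detour through a character matrix, sketched under the assumption that every element of d holds the same three fields in the same order: pull each field out with vapply(), which keeps its type, and build the data frame once. The name toDF2 is made up for this illustration, and len is kept small here; no timings are claimed.

```r
len <- 1000
d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)

# toDF2 is a hypothetical name; assumes each element has three fields
# in the order pH (numeric), marker (logical), position (character)
toDF2 <- function(alist) {
  data.frame(
    pH       = vapply(alist, function(x) x[[1]], numeric(1)),
    marker   = vapply(alist, function(x) x[[2]], logical(1)),
    position = vapply(alist, function(x) x[[3]], character(1)),
    stringsAsFactors = FALSE  # keep position as character, not factor
  )
}

b2 <- toDF2(d)
sapply(b2, class)
```

Because each column is extracted with its own type, no as.numeric() or as.logical() conversion is needed afterwards, and vapply() stops with an error if an element does not have the expected type.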