Mike Nielsen
2006-Jul-08 22:40 UTC
[R] Combining a list of similar dataframes into a single dataframe
I would be very grateful to anyone who could point out the error of my ways in the following.

I have a dataframe called net1, as such:

> str(net1)
`data.frame': 114192 obs. of 9 variables:
 $ server         : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts             :'POSIXct', format: chr "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" ...
 $ instance       : Factor w/ 22 levels "1","2","Compaq Ethernet_Fast Ethernet Adapter_Module",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ instanceno     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ perftime       : num 3.16e+13 3.16e+13 3.16e+13 3.16e+13 3.16e+13 ...
 $ perffreq       : num 6.99e+08 6.99e+08 6.99e+08 6.99e+08 6.99e+08 ...
 $ perftime100nsec: num 1.28e+17 1.28e+17 1.28e+17 1.28e+17 1.28e+17 ...
 $ countername    : Factor w/ 4 levels "Bytes Received/sec",..: 1 3 2 4 1 3 2 4 1 3 ...
 $ countervalue   : num 6.08e+07 6.64e+07 5.58e+06 1.00e+08 6.09e+07 ...

What I am trying to do is subset this thing down by server, instance, instanceno and countername, and then apply a function to each subsetted dataframe. The function performs a calculation on countervalue, essentially "collapsing" instanceno and instance down to a single value.

Here is a snippet of my code:

t1 <- by(net1,
         list(net1$server,
              factor(as.character(net1$countername))),  # get rid of unused levels of countername for this server
         function(x) {
           g <- by(x,
                   list(factor(as.character(x$instance)),     # get rid of unused levels of instance for this server
                        factor(as.character(x$instanceno))),  # same with instanceno
                   function(y) {
                     c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))
                   })
           data.frame(server = x$server,
                      ts = x$ts,
                      countername = x$countername,
                      countervalue = apply(sapply(g[!sapply(g, is.null)], I), 1, sum))
         })

So t1 is then a list of dataframes, each with an identical set of columns:

> str(t1[[1]])
`data.frame': 149 obs. of 4 variables:
 $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts          :'POSIXct', format: chr "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
 $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countervalue: num NA 938 816 4213 906 ...

What I'd dearly love to do, without looping or lapply-ing through t1 and rbinding (too much data for this to finish quickly enough -- this is about 10% of what I'm eventually going to have to manage), is convert t1 to one big dataframe.

On the other hand, I admit that I may be going about this wrongly from the start; perhaps there's a better approach?

Any pointers would be most gratefully received.

Many thanks!

--
Regards,

Mike Nielsen
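For readers following the inner calculation, here is a minimal sketch of what the innermost function in the post computes for a single instance. The numbers are invented for illustration. Reading perftime as raw counter ticks and perffreq as ticks per second is an assumption; the post does not state the units.

```r
# Hypothetical samples for one counter on one interface (not from net1).
y <- data.frame(
  perftime     = c(1000, 2000, 3000),  # assumed: raw counter ticks
  perffreq     = c(100, 100, 100),     # assumed: ticks per second
  countervalue = c(0, 5000, 15000)     # cumulative raw counter value
)

# Same expression as in the post: the change in the counter divided by the
# elapsed ticks, scaled by the tick frequency. The leading NA pads the
# result so it has one entry per sample (the first sample has no predecessor).
rate <- c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))
rate
```

Under these assumptions the result has the same length as the input, which is what lets the outer function place it next to ts in a data frame.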
Mike Nielsen
2006-Jul-08 23:20 UTC
[R] Combining a list of similar dataframes into a single dataframe
Well, this worked, and rather more quickly than I had expected. Many thanks to the dogs, who told me the answer in return for walking them and feeding them!

> jj <- eval(parse(text=paste(sep=" ","rbind(",paste(sep=" ","t1[[",1:length(t1),"]]",collapse=","),")")))
> str(jj)
`data.frame': 85644 obs. of 4 variables:
 $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts          :'POSIXct', format: chr "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
 $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countervalue: num NA 938 816 4213 906 ...

--
Regards,

Mike Nielsen
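The eval(parse()) call above builds a single rbind() call with every element of t1 as an argument. As a hedged sketch, the same call can usually be made without constructing code as text by using do.call(). The list t1_toy below is a hypothetical stand-in for t1, including a NULL entry like the ones by() leaves for empty factor combinations; it has not been run against the original data.

```r
# Toy stand-in for t1: data frames with identical columns, plus a NULL
# such as by() produces for an empty combination. Hypothetical data.
t1_toy <- list(
  data.frame(server = "A", countervalue = c(NA, 1, 2)),
  data.frame(server = "B", countervalue = c(NA, 3)),
  NULL
)

# The approach from the post, applied to the toy list.
jj_parse <- eval(parse(text = paste(sep = " ", "rbind(",
  paste(sep = " ", "t1_toy[[", 1:length(t1_toy), "]]", collapse = ","),
  ")")))

# One rbind() call with the list elements as arguments, no text parsing.
jj_docall <- do.call(rbind, t1_toy)

# Compare the two results.
all.equal(jj_parse, jj_docall)
```

Both forms hand all the pieces to rbind() at once, which is likely why this runs much faster than growing a data frame one rbind() at a time inside a loop.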