thr3ads.net - R help - [R] Combining a list of similar dataframes into a single dataframe [Jul 2006]


Mike Nielsen

2006-Jul-08 22:40 UTC

[R] Combining a list of similar dataframes into a single dataframe

I would be very grateful to anyone who could point to the error of my
ways in the following.

I have a dataframe called net1, as such:

> str(net1)
`data.frame':    114192 obs. of  9 variables:
 $ server         : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts             :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" ...
 $ instance       : Factor w/ 22 levels "1","2","Compaq Ethernet_Fast Ethernet Adapter_Module",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ instanceno     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ perftime       : num  3.16e+13 3.16e+13 3.16e+13 3.16e+13 3.16e+13 ...
 $ perffreq       : num  6.99e+08 6.99e+08 6.99e+08 6.99e+08 6.99e+08 ...
 $ perftime100nsec: num  1.28e+17 1.28e+17 1.28e+17 1.28e+17 1.28e+17 ...
 $ countername    : Factor w/ 4 levels "Bytes Received/sec",..: 1 3 2 4 1 3 2 4 1 3 ...
 $ countervalue   : num  6.08e+07 6.64e+07 5.58e+06 1.00e+08 6.09e+07 ...

What I am trying to do is subset this thing down by server, instance,
instanceno, countername and then apply a function to each subsetted
dataframe.  The function performs a calculation on countervalue,
essentially "collapsing" instanceno and instance down to a single
value.
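The per-group calculation turns a cumulative counter into a per-second rate between consecutive samples. Here is a minimal sketch on made-up numbers, reusing the formula from the inner function in the snippet. The data frame `toy` and the helper `counter_rate()` are hypothetical, and it assumes perftime is a tick count and perffreq is ticks per second (as with Windows performance counters), so diff(perftime)/perffreq is elapsed seconds.

```r
# Hypothetical toy data: one server/instance/counter, four samples.
toy <- data.frame(
  countervalue = c(1000, 3000, 4000, 8000),  # cumulative counter
  perftime     = c(0, 200, 300, 500),        # assumed: ticks
  perffreq     = c(100, 100, 100, 100)       # assumed: ticks per second
)

# Hypothetical helper using the formula from the post:
# frequency * (change in counter) / (change in ticks).
# The leading NA keeps the result aligned with the input rows.
counter_rate <- function(y) {
  c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))
}

rates <- counter_rate(toy)
print(rates)  # values: NA, 1000, 1000, 2000
```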

Here is a snippet of my code:
t1 <- by(net1,
         list(net1$server,
              factor(as.character(net1$countername))),  # get rid of unused levels of countername for this server
         function(x) {
           g <- by(x,
                   list(factor(as.character(x$instance)),    # get rid of unused levels of instance for this server
                        factor(as.character(x$instanceno))), # same with instanceno
                   function(y) {
                     c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))
                   })
           data.frame(server       = x$server,
                      ts           = x$ts,
                      countername  = x$countername,
                      countervalue = apply(sapply(g[!sapply(g, is.null)], I), 1, sum))
         })

So t1 is then a list of dataframes, each with an identical set of columns:

> str(t1[[1]])
`data.frame':   149 obs. of  4 variables:
 $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
 $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countervalue: num    NA  938  816 4213  906 ...
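The inner by() call cross-classifies instance by instanceno, and any combination with no rows comes back as a NULL cell; that is why the code filters with g[!sapply(g, is.null)] before summing across instances. A minimal sketch of that step on hypothetical toy data (`toy` and `counter_rate()` are illustrative names, not from the post). It swaps in vapply() and rowSums() for the post's sapply(..., I) and apply(..., 1, sum), and assumes every instance has the same number of samples, as the post's column-binding also does.

```r
# Hypothetical toy data: one server and counter, two instances
# ("A" with instanceno "1", "B" with instanceno "2"), three samples each.
toy <- data.frame(
  instance     = factor(c("A", "A", "A", "B", "B", "B")),
  instanceno   = factor(c("1", "1", "1", "2", "2", "2")),
  countervalue = c(0, 100, 300, 0, 50, 60),
  perftime     = c(0, 10, 20, 0, 10, 20),
  perffreq     = 1
)

# Same per-group formula as the inner function in the post.
counter_rate <- function(y) {
  c(NA, mean(y$perffreq) * diff(y$countervalue) / diff(y$perftime))
}

# by() gives a 2 x 2 result; A/"2" and B/"1" have no rows, so those cells are NULL.
g <- by(toy, list(toy$instance, toy$instanceno), counter_rate)

cells <- unclass(g)                                  # plain list array
kept  <- cells[!vapply(cells, is.null, logical(1))]  # drop the empty cells

# Collapse the instances: sum the rate vectors element by element.
collapsed <- rowSums(do.call(cbind, kept))
print(collapsed)  # [1] NA 15 21
```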

What I'd dearly love to do, without looping or lapply-ing through t1
and rbinding (too much data for this to finish quickly enough -- this
is about 10% of what I'm eventually going to have to manage), is
convert t1 to one big dataframe.
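A common base-R idiom for this is do.call(rbind, ...), which builds one rbind() call with every list element as an argument, so the result is assembled in a single call rather than grown repeatedly in a loop. A minimal sketch on a hypothetical stand-in list `parts` (not the real t1), assuming empty combinations appear as NULL, as they do in a by() result:

```r
# Hypothetical stand-in for t1: data frames with identical columns,
# plus a NULL where a server/counter combination had no rows.
parts <- list(
  data.frame(server = "AB93-99", countervalue = c(NA, 938, 816)),
  NULL,
  data.frame(server = "AMP93-1", countervalue = c(NA, 4213))
)

# Drop the empty cells, then rbind() all remaining data frames in one call.
parts    <- Filter(Negate(is.null), parts)
combined <- do.call(rbind, parts)
str(combined)
```

Applied to the post's object this would presumably be do.call(rbind, Filter(Negate(is.null), unclass(t1))), with unclass() stripping the "by" class first; that line is untested against the real data.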

On the other hand, I admit that I may be going about this wrongly from
the start; perhaps there's a better approach?
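One possible alternative (not from the thread) is to skip the per-group data frames entirely: base R's ave() computes a per-group result that stays aligned with the original rows, and aggregate() can then do the collapsing. A minimal sketch on hypothetical toy data `d`. It assumes rows are time-ordered within each group, and it collapses instances by summing rates that share a timestamp, which is an assumption about how the instances line up.

```r
# Hypothetical toy data in the same long layout as net1 (columns trimmed).
d <- data.frame(
  server       = rep("S1", 6),
  instance     = c("A", "A", "A", "B", "B", "B"),
  countername  = "Bytes Received/sec",
  ts           = rep(1:3, 2),
  countervalue = c(0, 100, 300, 0, 50, 60),
  perftime     = rep(c(0, 10, 20), 2),
  perffreq     = 1
)

# Per-row rate computed within each server/instance/counter group.
# ave() splits the row indices by group and returns a vector aligned with d.
grp <- interaction(d$server, d$instance, d$countername, drop = TRUE)
d$rate <- ave(seq_len(nrow(d)), grp, FUN = function(i)
  c(NA, mean(d$perffreq[i]) * diff(d$countervalue[i]) / diff(d$perftime[i])))

# Collapse instances: sum rates sharing server, counter and timestamp.
# na.pass keeps the leading NA rows instead of silently dropping them.
collapsed <- aggregate(rate ~ server + countername + ts, data = d,
                       FUN = sum, na.action = na.pass)
collapsed
```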

Any pointers would be most gratefully received.

Many thanks!


-- 
Regards,

Mike Nielsen

Mike Nielsen

2006-Jul-08 23:20 UTC


[R] Combining a list of similar dataframes into a single dataframe

Well, this worked, and rather more quickly than I had expected.

Many thanks to the dogs, who told me the answer in return for walking
them and feeding them!
> jj <- eval(parse(text = paste(sep = "", "rbind(", paste(sep = "", "t1[[", 1:length(t1), "]]", collapse = ","), ")")))
> str(jj)
`data.frame':   85644 obs. of  4 variables:
 $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
 $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countervalue: num    NA  938  816 4213  906 ...
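The one-liner works because it writes out a single rbind() call naming every element of t1 and then evaluates it. The sketch below shows the generated text for a hypothetical three-element stand-in list (named t1 only so the text refers to it) and compares the result with do.call(rbind, t1), which builds the same call without going through parse(). It uses sep = "" for compact text; the separator is garbled in the archive, and either a space or an empty string yields parseable code.

```r
# Hypothetical stand-in for t1.
t1 <- list(
  data.frame(countervalue = c(NA, 938)),
  data.frame(countervalue = c(816, 4213)),
  data.frame(countervalue = 906)
)

# The text the eval(parse()) one-liner builds and then evaluates.
call_text <- paste(sep = "", "rbind(",
                   paste(sep = "", "t1[[", 1:length(t1), "]]", collapse = ","),
                   ")")
print(call_text)  # [1] "rbind(t1[[1]],t1[[2]],t1[[3]])"

jj_text <- eval(parse(text = call_text))

# do.call() makes the same single rbind() call directly from the list.
jj_call <- do.call(rbind, t1)
identical(jj_text, jj_call)
```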

-- 
Regards,

Mike Nielsen
