thr3ads.net - R help - [R] Using by() and stacking back sub-data frames to one data frame [Jun 2009]

Stephan Lindner

2009-Jun-25 03:34 UTC

[R] Using by() and stacking back sub-data frames to one data frame

Dear all,


I have code that subsets a data frame to match entries within
levels of a factor (actually, the full script uses three different
factors to do that). I'm very happy with the precision with which I can
work with R, but since I loop over factor levels, and the data frame is
big, the process is slow. So I've been trying to speed up the process
using by(), but I got stuck at the point where I want to stack the
sub-data frames back together, and I was wondering whether someone could
help me out.

Here is an example:

<-- 
> y <- data.frame(suid  = c(rep(1074034,16),rep(1123003,4)),
                 month = rep(c(12,1,2,3),5),
                 esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

> by(y,y$month,function(x)return(x))
y$month: 1
      suid month esr
2  1074034     1   2
6  1074034     1   1
10 1074034     1   2
14 1074034     1   9
18 1123003     1   2
------------------------------------------------------------ 
y$month: 2
      suid month esr
3  1074034     2   2
7  1074034     2   1
11 1074034     2   2
15 1074034     2   9
19 1123003     2   2
------------------------------------------------------------ 
y$month: 3
      suid month esr
4  1074034     3   2
8  1074034     3   1
12 1074034     3   2
16 1074034     3   9
20 1123003     3   2
------------------------------------------------------------ 
y$month: 12
      suid month esr
1  1074034    12   6
5  1074034    12   1
9  1074034    12   2
13 1074034    12   9
17 1123003    12   2

--> 

What I would like to do is stack these four data frames back into one
data frame, which in this simple example would just be y. I tried
unlist(), unclass() and rbind(), but none of them worked.


Thanks a lot,

Stephan

-- 
-----------------------
Stephan Lindner
University of Michigan

Kingsford Jones

2009-Jun-25 05:15 UTC

try

do.call(rbind, yourByList)


hth,
Kingsford Jones
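
For readers following along, here is a minimal sketch of that suggestion applied to the example data from the original post. The names pieces and stacked are placeholders chosen for illustration. Note that the rows come back grouped by month rather than in the original order of y.

```r
# Example data from the original post
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# by() returns a list-like object holding one sub-data frame per month
pieces <- by(y, y$month, function(x) x)

# do.call() passes every element of that list to rbind() as a separate
# argument, which stacks the pieces into a single data frame
stacked <- do.call(rbind, pieces)

# Drop the row names that rbind() generates (an optional tidy-up step)
rownames(stacked) <- NULL
```

If the original row order matters, the result can be reordered afterwards, or the pieces can be reassembled by position instead of stacked.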

jim holtman

2009-Jun-25 11:32 UTC

One thing you might consider when working with large data frames is that,
instead of partitioning the data frame into smaller ones, you can create a
list of row indices and use it to access each subset.  This works especially
well when using 'lapply' to work through many segments of a data frame:

> y
      suid month esr
1  1074034    12   6
2  1074034     1   2
3  1074034     2   2
4  1074034     3   2
5  1074034    12   1
6  1074034     1   1
7  1074034     2   1
8  1074034     3   1
9  1074034    12   2
10 1074034     1   2
11 1074034     2   2
12 1074034     3   2
13 1074034    12   9
14 1074034     1   9
15 1074034     2   9
16 1074034     3   9
17 1123003    12   2
18 1123003     1   2
19 1123003     2   2
20 1123003     3   2
> y.ind <- split(seq(nrow(y)), y$month)
> y.ind
$`1`
[1]  2  6 10 14 18

$`2`
[1]  3  7 11 15 19

$`3`
[1]  4  8 12 16 20

$`12`
[1]  1  5  9 13 17

> # a subset
> y[y.ind[['12']],]
      suid month esr
1  1074034    12   6
5  1074034    12   1
9  1074034    12   2
13 1074034    12   9
17 1123003    12   2
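
As a rough sketch of how the index list might be used with a per-group computation, the snippet below takes the mean of esr within each month and then pulls the rows back out in their original order. The names esr.mean and restacked are made up for illustration, and the mean is only a stand-in for whatever the real script computes.

```r
# Example data from the original post
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# One vector of row numbers per month; no sub-data frames are created here
y.ind <- split(seq_len(nrow(y)), y$month)

# Apply a function to each group by indexing into the full data frame
esr.mean <- sapply(y.ind, function(i) mean(y$esr[i]))

# Sorting the combined indices recovers the rows in their original order
restacked <- y[sort(unlist(y.ind, use.names = FALSE)), ]
```

Because only integer vectors are stored per group, this avoids holding a second copy of the data in memory while the groups are processed.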

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

hadley wickham

2009-Jun-25 12:59 UTC

Have a look at ddply from the plyr package, http://had.co.nz/plyr.
It's made for exactly this type of operation.

Hadley

-- 
http://had.co.nz/

David Winsemius

2009-Jun-25 13:07 UTC

Your request for a more general approach is precisely the reason that
Hadley Wickham wrote the plyr package. He describes a split-apply-combine
strategy for a variety of data structures, and tools to implement
those strategies, here:

http://had.co.nz/plyr/plyr-intro-090510.pdf

The argument for the "by" step is a column name rather than a list or
object as it would be in tapply or split. I acts here as the identity
function, which stands in for return(x) in your code.

> library(plyr)
> ddply(y, "month", .fun = I)
      suid month esr
1  1074034     1   2
2  1074034     1   1
3  1074034     1   2
4  1074034     1   9
5  1123003     1   2
6  1074034     2   2
7  1074034     2   1
8  1074034     2   2
9  1074034     2   9
10 1123003     2   2
11 1074034     3   2
12 1074034     3   1
13 1074034     3   2
14 1074034     3   9
15 1123003     3   2
16 1074034    12   6
17 1074034    12   1
18 1074034    12   2
19 1074034    12   9
20 1123003    12   2
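
For comparison, the same split-apply-combine idea can be sketched in base R without plyr. This is only an illustration under stated assumptions: the column name month.mean is invented, and the per-month mean stands in for whatever the real script computes. Unlike the ddply() result above, unsplit() returns the rows in their original order.

```r
# Example data from the original post
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# Split: one sub-data frame per month
pieces <- split(y, y$month)

# Apply: add the per-month mean of esr as a new column
# (month.mean is a hypothetical column name)
pieces <- lapply(pieces, function(d) {
  d$month.mean <- mean(d$esr)
  d
})

# Combine: unsplit() reverses split(), using the same grouping vector
# to put every row back in its original position
y2 <- unsplit(pieces, y$month)
```

When the original order does not matter, do.call(rbind, pieces) would also combine the pieces, grouped by month.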

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
