thr3ads.net - R help - [R] Working with data-frame [Nov 2014]

If this information is useful, please help other people find it:
Share via:

Christofer Bogaso

2014-Nov-09 19:58 UTC

[R] Working with data-frame

Hi again,

Let say, I have following data frame:


Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b",
"c"), class = "factor"),
    A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
    2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names =
c("A1",
"A2", "C1"), row.names = c(NA, -30L), class =
"data.frame")


Now my goal is :
1: Find all possible unique combinations of column 'A1' & column
'A2'.
For example A1 = c, A2 = 2 is 1 unique combination.

2. For each such unique combination, calculate sum for 'A3'.

Is there any direct R function to achieve this faster way? I have very
large data-frame to handle with such calculation.

I tried with spilt() function. However it looks to me that, it can
split a data-frame w.r.t. only one column.

Thanks for your suggestion

Bert Gunter

2014-Nov-09 20:50 UTC

head link

[R] Working with data-frame

Christopher:

If I understand correctly, see ?ave, or ?tapply, depending on what
form you want to be returned.

The trick is to first paste the columns together on whose unique you
want to split to form a singtl factor. e.g. of the form

lapply(split(yourcolumn,paste0(...)),FUN= sum)

However, the above functions already have this "built" in, so you
don't need to do this explicitly, although my impression is that it
may be a tad more efficient to do it the long way. But don't quote me
on this!

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sun, Nov 9, 2014 at 11:58 AM, Christofer Bogaso
<bogaso.christofer at gmail.com> wrote:> Hi again,
>
> Let say, I have following data frame:
>
>
> Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
> 3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
> 3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b",
"c"), class = "factor"),
>     A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
>     2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names =
c("A1",
> "A2", "C1"), row.names = c(NA, -30L), class =
"data.frame")
>
>
> Now my goal is :
> 1: Find all possible unique combinations of column 'A1' &
column 'A2'.
> For example A1 = c, A2 = 2 is 1 unique combination.
>
> 2. For each such unique combination, calculate sum for 'A3'.
>
> Is there any direct R function to achieve this faster way? I have very
> large data-frame to handle with such calculation.
>
> I tried with spilt() function. However it looks to me that, it can
> split a data-frame w.r.t. only one column.
>
> Thanks for your suggestion
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

William Dunlap

2014-Nov-09 21:39 UTC

head link

[R] Working with data-frame

> I tried with spilt() function. However it looks to me that, it can
> split a data-frame w.r.t. only one column.
(I assume you you meant 'split', not 'spilt'.)

You did not show what you tried, but the following splits Dat by its
"A1"
and "A2" columns (creating a list of data.frames):
   split(Dat, f=Dat[,c("A1","A2")])

aggregate(), in core R, combine the split and the lapply needed to
calculate groupwise sums.  E.g.,
  aggregate(Dat$C1, by=Dat[,c("A1","A2")], FUN=sum)
  aggregate(C1 ~ A1 + A2, data=Dat, FUN=sum)

The plyr and dplyr packages have other ways to do this sort of thing.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Nov 9, 2014 at 11:58 AM, Christofer Bogaso <
bogaso.christofer at gmail.com> wrote:
> Hi again,
>
> Let say, I have following data frame:
>
>
> Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
> 3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
> 3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b",
"c"), class = "factor"),
>     A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
>     2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names =
c("A1",
> "A2", "C1"), row.names = c(NA, -30L), class =
"data.frame")
>
>
> Now my goal is :
> 1: Find all possible unique combinations of column 'A1' &
column 'A2'.
> For example A1 = c, A2 = 2 is 1 unique combination.
>
> 2. For each such unique combination, calculate sum for 'A3'.
>
> Is there any direct R function to achieve this faster way? I have very
> large data-frame to handle with such calculation.
>
> I tried with spilt() function. However it looks to me that, it can
> split a data-frame w.r.t. only one column.
>
> Thanks for your suggestion
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

R help - Nov 2014 - Working with data-frame

[R] Working with data-frame

[R] Working with data-frame

[R] Working with data-frame