thr3ads.net - R help - [R] summarize dataframe based on multiple cols, not their combinations [Mar 2013]


Alexander Shenkin

2013-Mar-20 19:57 UTC

[R] summarize dataframe based on multiple cols, not their combinations

Hi folks,

I'm trying to figure out how to get summarized data based on multiple
columns. However, instead of giving summaries for every combination of
the categorical columns, I want a summary for each value of each categorical
column, regardless of the other columns. I could do this with three separate
commands, but I'm wondering if there's a more elegant way that I'm
missing. Thanks!

allie
> library(plyr)
> my_df = data.frame(a = c(1,1,1,0,0,0), b = c(0,0,0,1,1,1), c = c(1,0,1,0,1,0), dat = c(10,11,12,13,14,15))
> my_df
  a b c dat
1 1 0 1  10
2 1 0 0  11
3 1 0 1  12
4 0 1 0  13
5 0 1 1  14
6 0 1 0  15
> # not what I want
> ddply(my_df, .(a,b,c), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
  a b c mean n
1 0 1 0   14 2
2 0 1 1   14 1
3 1 0 0   11 1
4 1 0 1   11 2

What I want:
  a b c mean n
1 1 * *   11 3
2 * 1 *   14 3
3 * * 1   12 3

where "*" refers to any value of the other columns.
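For reference, the "one summary per column" idea can be written as a single base-R call that loops over the indicator columns, so no plyr is needed. This is a minimal sketch, assuming the flag columns only ever hold 0/1 and that only the rows where a flag equals 1 should be summarised (as in the "What I want" table). The names `cols`, `flags` and `want` are hypothetical, chosen for illustration.

```r
# Sample data from the question
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0), dat = c(10, 11, 12, 13, 14, 15))

cols <- c("a", "b", "c")  # hypothetical: the indicator columns to summarise by

# For each indicator column, summarise `dat` over the rows where that column
# is 1, whatever the other columns hold; "*" marks "any value".
want <- do.call(rbind, lapply(cols, function(col) {
  sel <- my_df[[col]] == 1
  flags <- as.list(setNames(rep("*", length(cols)), cols))
  flags[[col]] <- "1"
  data.frame(flags, mean = mean(my_df$dat[sel]), n = sum(sel),
             stringsAsFactors = FALSE)
}))
want
```

The flag columns become character so that "1" and "*" can share a column; if you only need the numbers, drop `flags` and keep a single `column = col` label instead.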

Ista Zahn

2013-Mar-20 20:18 UTC


How about

library(plyr)
library(reshape2)
mdf.m <- melt(my_df, measure.vars = c("a", "b", "c"))
mdf.m <- mdf.m[mdf.m$value > 0, ]

ddply(mdf.m, "variable", function(x) c("mean" = mean(x$dat), "n" = nrow(x)))

?

Best,
Ista

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
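Ista's melt-then-filter idea can also be sketched without reshape2, by building the long table by hand and summarising it with `tapply()`. This is a minimal base-R sketch, assuming the same 0/1 flag columns; `long` and `out` are hypothetical names, and the hand-built table only stands in for what `melt()` produces, it is not reshape2's code.

```r
# Sample data from the question
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0), dat = c(10, 11, 12, 13, 14, 15))
cols <- c("a", "b", "c")

# Long format by hand: one row per (flag column, original row) pair,
# which is roughly what melt(my_df, measure.vars = cols) gives.
long <- data.frame(
  variable = rep(cols, each = nrow(my_df)),
  value    = unlist(my_df[cols], use.names = FALSE),
  dat      = rep(my_df$dat, times = length(cols)),
  stringsAsFactors = FALSE
)

# Keep only the rows where the flag is switched on, as in mdf.m$value > 0
long <- long[long$value > 0, ]

# Mean and count of dat per flag column
means  <- tapply(long$dat, long$variable, mean)
counts <- tapply(long$dat, long$variable, length)
out <- data.frame(variable = names(means), mean = as.vector(means),
                  n = as.vector(counts), stringsAsFactors = FALSE)
out
```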

John Kane

2013-Mar-20 20:24 UTC


Will this do?

library(plyr)

ddply(my_df, .(a), summarize, mm = mean(dat), number = length(dat))

John Kane
Kingston ON Canada
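The `ddply(my_df, .(a), ...)` call above groups by `a` only, so it returns one row for `a == 0` and one for `a == 1`. Repeating that grouping for every flag column and stacking the results gives all three summaries at once. This is a base-R sketch, assuming 0/1 flag columns; `by_col` is a hypothetical name and `tapply()` stands in for `ddply()`.

```r
# Sample data from the question
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0), dat = c(10, 11, 12, 13, 14, 15))
cols <- c("a", "b", "c")

# Group dat by each flag column in turn (both levels, 0 and 1), then stack.
# tapply() orders groups by sorted level, matching sort(unique(g)).
by_col <- do.call(rbind, lapply(cols, function(col) {
  g <- my_df[[col]]
  data.frame(column = col,
             level  = sort(unique(g)),
             mean   = as.vector(tapply(my_df$dat, g, mean)),
             n      = as.vector(tapply(my_df$dat, g, length)),
             stringsAsFactors = FALSE)
}))
by_col

# Keeping only the level == 1 rows gives the layout asked for
by_col[by_col$level == 1, ]
```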


arun

2013-Mar-20 20:47 UTC


Hi,
library(plyr)
lst1 <- lapply(letters[1:3], function(i) {
  df1 <- data.frame(my_df[i], my_df["dat"])
  res <- ddply(df1, .(df1[[i]]), function(x) c("mean" = mean(x$dat), "n" = nrow(x)))
  names(res)[1] <- i
  res[res[, 1] == 1, ]
})

res1 <- Reduce(function(...) merge(..., all = TRUE), lst1)
res1[is.na(res1)] <- "*"
res1
#   mean n a b c
# 1   11 3 1 * *
# 2   12 3 * * 1
# 3   14 3 * 1 *

A.K.
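One thing to watch in the `Reduce()`/`merge()` step: `merge()` joins on every column the pieces share, here `mean` and `n`, so two flag columns that happened to have the same mean and count would, as far as I can tell, be folded into a single row. Because the flags are 0/1, the per-column sums and counts can instead be computed directly with matrix products, which avoids the merge entirely. This is a sketch under that 0/1 assumption (it gives wrong answers for any other coding); `ind`, `sums` and `res2` are hypothetical names.

```r
# Sample data from the question
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0), dat = c(10, 11, 12, 13, 14, 15))
cols <- c("a", "b", "c")

ind  <- as.matrix(my_df[cols])           # 0/1 indicator matrix, one column per flag
n    <- colSums(ind)                     # number of rows with each flag set
sums <- drop(crossprod(ind, my_df$dat))  # sum of dat over those rows: t(ind) %*% dat
res2 <- data.frame(column = cols, mean = sums / n, n = n,
                   row.names = NULL, stringsAsFactors = FALSE)
res2
```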



