thr3ads.net - R help - [R] computing marginal values based on multiple columns? [Dec 2012]


Simon

2012-Dec-04 09:49 UTC

[R] computing marginal values based on multiple columns?

Hello all,

I have what feels like a simple problem, but I can't find a simple
answer. Consider this data frame:
> x <- data.frame(sample1=c(35,176,182,193,124), sample2=c(198,176,190,23,15),
+                 sample3=c(12,154,21,191,156), class=c('a','a','c','b','c'))
> x
  sample1 sample2 sample3 class
1      35     198      12     a
2     176     176     154     a
3     182     190      21     c
4     193      23     191     b
5     124      15     156     c

Now I wish to know: for each sample, for values < 20% of the sample mean,
what percentage of those are class a?

I want to end up with a table like:

   sample1 sample2 sample3
1      1.0     0     0.5

I can calculate this for an individual sample using this rather clumsy
expression:

length(which(x$sample1 < mean(x$sample1) & x$class=='a')) /
length(which(x$sample1 < mean(x$sample1)))

I'd normally propagate it across the data frame using apply, but I
can't because it depends on more than one column.

Any help much appreciated!

Cheers,

Simon


Gerrit Eichner

2012-Dec-04 10:59 UTC


Hello, Simon,

see below!


On Tue, 4 Dec 2012, Simon wrote:
> Hello all,
>
> I have what feels like a simple problem, but I can't find a simple
> answer. Consider this data frame:
>
>> x <- data.frame(sample1=c(35,176,182,193,124),
> sample2=c(198,176,190,23,15), sample3=c(12,154,21,191,156),
> class=c('a','a','c','b','c'))
>
>> x
>  sample1 sample2 sample3 class
> 1      35     198      12     a
> 2     176     176     154     a
> 3     182     190      21     c
> 4     193      23     191     b
> 5     124      15     156     c
>
> Now I wish to know: for each sample, for values < 20% of the sample mean,
> what percentage of those are class a?
>
> I want to end up with a table like:
>
>   sample1 sample2 sample3
> 1      1.0     0     0.5

I can't reproduce this result from your description above, but if I 
understand the latter correctly, maybe the following does what you want:

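x.wo.class <- subset( x, select = -class)
   # extract only the sample-columns

x.small <- sweep( as.matrix( x.wo.class), 2, 0.2 * colMeans( x.wo.class), "<")
   # TRUE where a value is below 20% of its own column's mean

apply( x.small, 2, function( xx) mean( x$class[ xx] == "a"))
   # share of class "a" among those values (NaN if a column has none)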


  Hth  --  Gerrit
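
One detail worth checking when every column is compared against its own threshold: R recycles a plain vector down the rows of a matrix (column-major order), so a vector of per-column thresholds is not matched to columns automatically. The sketch below, using the data frame from the question, contrasts the plain comparison with sweep() and with an explicit per-column loop. The names m, thr, naive, aligned and by_col are illustrative only.

```r
x <- data.frame(sample1 = c(35, 176, 182, 193, 124),
                sample2 = c(198, 176, 190, 23, 15),
                sample3 = c(12, 154, 21, 191, 156),
                class   = c("a", "a", "c", "b", "c"))

m   <- as.matrix(x[c("sample1", "sample2", "sample3")])
thr <- 0.2 * colMeans(m)   # one threshold per column

# Plain comparison: thr is recycled in column-major order, so the
# thresholds cycle down the rows instead of lining up with the columns.
naive <- m < thr

# sweep() compares every entry of column j with thr[j].
aligned <- sweep(m, 2, thr, "<")

# The same comparison written one column at a time.
by_col <- sapply(colnames(m), function(j) m[, j] < thr[[j]])
```

On this data the two approaches disagree in row 4 of sample2 (the value 23), so the recycling behaviour is easy to overlook on small examples.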


arun

2012-Dec-04 17:07 UTC


Hi,

I am not sure the output you wanted is correct: 

"
  sample1 sample2 sample3
1     1.0       0     0.5
"

because
0.2*colMeans(x[,-4])
#sample1 sample2 sample3 
#  28.40   24.08   21.36 


This might help you:
apply(x[-4],2,function(y) length(y[y <0.2*mean(y) & x$class=="a"])/sum(y <0.2*mean(y)))
#sample1 sample2 sample3 
#    NaN     0.0     0.5 
A.K.
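
The question's single-column expression can be generalised by looping over the sample columns while the class column is passed in separately, which sidesteps the "depends on more than one column" problem. What follows is only a sketch under that reading of the question: the helper name share_of_class_below and its arguments (labels, target, frac) are hypothetical, and returning NA for a column with no values below the threshold is a design choice, not something the thread specifies.

```r
# Hypothetical helper: for each numeric column of `samples`, take the rows
# whose value is below `frac` times the column mean and return the share
# of those rows whose label equals `target`.
share_of_class_below <- function(samples, labels, target = "a", frac = 0.2) {
  vapply(samples, function(v) {
    below <- v < frac * mean(v)
    if (!any(below)) return(NA_real_)   # no values below the threshold
    mean(labels[below] == target)
  }, numeric(1))
}

x <- data.frame(sample1 = c(35, 176, 182, 193, 124),
                sample2 = c(198, 176, 190, 23, 15),
                sample3 = c(12, 154, 21, 191, 156),
                class   = c("a", "a", "c", "b", "c"))

res <- share_of_class_below(x[c("sample1", "sample2", "sample3")], x$class)

# frac = 1 compares against the mean itself, as the expression in the
# original question does.
res_mean <- share_of_class_below(x["sample1"], x$class, frac = 1)
```

With frac = 0.2, sample1 has no values below its threshold of 28.4, so its entry is NA rather than a proportion.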



