Hello all,

I have what feels like a simple problem, but I can't find a simple answer. Consider this data frame:

> x <- data.frame(sample1=c(35,176,182,193,124), sample2=c(198,176,190,23,15), sample3=c(12,154,21,191,156), class=c('a','a','c','b','c'))
> x
  sample1 sample2 sample3 class
1      35     198      12     a
2     176     176     154     a
3     182     190      21     c
4     193      23     191     b
5     124      15     156     c

Now I wish to know: for each sample, for values < 20% of the sample mean, what percentage of those are class a?

I want to end up with a table like:

  sample1 sample2 sample3
1     1.0       0     0.5

I can calculate this for an individual sample using this rather clumsy expression:

length(which(x$sample1 < mean(x$sample1) & x$class=='a')) /
  length(which(x$sample1 < mean(x$sample1)))

I'd normally propagate it across the data frame using apply, but I can't because it depends on more than one column.

Any help much appreciated!

Cheers,

Simon
Gerrit Eichner
2012-Dec-04 10:59 UTC
[R] computing marginal values based on multiple columns?
Hello, Simon,

see below!

On Tue, 4 Dec 2012, Simon wrote:

> [...]
> Now I wish to know: for each sample, for values < 20% of the sample mean,
> what percentage of those are class a?
>
> I want to end up with a table like:
>
>   sample1 sample2 sample3
> 1     1.0       0     0.5

I can't reproduce this result from your description above, but if I understand the latter correctly, maybe the following does what you want:

x.wo.class <- subset( x, select = -class)  # extract only the sample-columns
x.small.and.a <- x.wo.class < 0.2 * colMeans( x.wo.class) & x$class == "a"
apply( x.small.and.a, 2, function( xx) mean( x$class[ xx] == "a"))

Hth -- Gerrit

> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
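A possible variant of the approach above, offered only as a sketch: comparing a data frame directly against the vector of column means relies on R's recycling, which fills column by column, so each threshold may not line up with its own column. The version below assumes the intended threshold is 20% of each column's own mean and uses sweep() to make that alignment explicit. It also leaves the class condition out of the mask, so that the mean is taken over all small values rather than only the class "a" ones. The names samples, small and share.a are illustrative and do not come from the thread.

```r
# Data frame from the original question
x <- data.frame(sample1 = c(35, 176, 182, 193, 124),
                sample2 = c(198, 176, 190, 23, 15),
                sample3 = c(12, 154, 21, 191, 156),
                class   = c("a", "a", "c", "b", "c"))

# Keep only the sample columns
samples <- x[, names(x) != "class"]

# TRUE where a value lies below 20% of its own column mean;
# sweep() applies threshold j to column j
small <- sweep(as.matrix(samples), 2, 0.2 * colMeans(samples), "<")

# Share of class "a" among the small values of each sample
# (NaN when a sample has no small values at all)
share.a <- apply(small, 2, function(is.small) mean(x$class[is.small] == "a"))
share.a
# sample1 sample2 sample3
#     NaN     0.0     0.5
```

With the thresholds 28.40, 24.08 and 21.36, sample1 has no value below its threshold, which is why its entry is NaN rather than the 1.0 in the requested table.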
Hi,

I am not sure the output you wanted is correct:

  sample1 sample2 sample3
1     1.0       0     0.5

because

0.2*colMeans(x[,-4])
# sample1 sample2 sample3
#   28.40   24.08   21.36

This might help you:

apply(x[-4],2,function(y) length(y[y <0.2*mean(y) & x$class=="a"])/length(x[x$class=="a"]))
# sample1 sample2 sample3
#     0.0     0.0     0.5

A.K.

----- Original Message -----
From: Simon <simonzmail at gmail.com>
To: r-help at r-project.org
Sent: Tuesday, December 4, 2012 4:49 AM
Subject: [R] computing marginal values based on multiple columns?

[...]
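On the point that apply seems unusable because the calculation depends on more than one column: one way around it, sketched here under assumptions, is to loop over the sample columns with sapply() and let the function look up the class column itself. The question asks for the share among the values below the threshold, so this sketch divides by the count of small values rather than by the count of class "a" rows. The helper share_of_class and its arguments class.col, target and frac are hypothetical names introduced for illustration, not something proposed in the thread.

```r
# Data frame from the original question
x <- data.frame(sample1 = c(35, 176, 182, 193, 124),
                sample2 = c(198, 176, 190, 23, 15),
                sample3 = c(12, 154, 21, 191, 156),
                class   = c("a", "a", "c", "b", "c"))

# Hypothetical helper: for each sample column, the share of rows of class
# `target` among the values below `frac` times that column's mean
share_of_class <- function(df, class.col = "class", target = "a", frac = 0.2) {
  is.sample <- names(df) != class.col
  sapply(df[is.sample], function(y) {
    small <- y < frac * mean(y)            # rows below the threshold
    sum(small & df[[class.col]] == target) / sum(small)
  })
}

res.20 <- share_of_class(x)              # threshold: 20% of the mean
res.mean <- share_of_class(x, frac = 1)  # threshold: the mean itself
res.20
res.mean
```

Setting frac = 1 corresponds to the single-sample expression in the original question, which compares against the mean itself rather than 20% of it. A sample with no values below its threshold yields NaN (0/0).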