thr3ads.net - R help - [R] remove rows based on row mean [Aug 2016]


Adrian Johnson

2016-Aug-18 21:33 UTC

[R] remove rows based on row mean

Hi Group,
I have a data frame sm (dput output given below).

For each gene that appears in more than one row, I want to keep only
the row with the highest mean, so that the result has one row per gene.

> sm
     Gene GSM529305 GSM529306 GSM529307 GSM529308
1    A1BG      6.57      6.72      6.83      6.69
2    A1CF      2.91      2.80      3.08      3.00
3   A2LD1      5.82      7.01      6.62      6.87
4     A2M      9.21      9.35      9.32      9.19
5     A2M      2.94      2.50      3.16      2.76
6  A4GALT      6.86      5.75      6.06      7.04
7   A4GNT      3.97      3.56      4.22      3.88
8    AAA1      3.39      2.90      3.16      3.23
9    AAAS      8.26      8.63      8.40      8.70
10   AAAS      6.82      7.15      7.33      6.51

For example, rows 4 and 5 have the same Gene value, A2M. I want to
keep only the row that has the higher mean. I wrote the following code,
which finds the row with the higher mean for each gene, but I cannot
store the result properly. Could someone help? Thanks.

ugns <- unique(sm$Gene)

exwidh = c()

for(i in 1:length(ugns)){
k = ugns[i]
exwidh[i] <-
sm[names(sort(rowMeans(sm[which(sm[,1]==k),2:ncol(sm)]),decreasing=TRUE)[1]),]
}





structure(list(Gene = c("A1BG", "A1CF", "A2LD1",
"A2M", "A2M",
"A4GALT", "A4GNT", "AAA1", "AAAS",
"AAAS"), GSM529305 = c(6.57,
2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82), GSM529306 = c(6.72,
2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15), GSM529307 = c(6.83,
3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33), GSM529308 = c(6.69,
3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51)), .Names = c("Gene",
"GSM529305", "GSM529306", "GSM529307",
"GSM529308"), row.names = c(NA,
10L), class = "data.frame")
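
For reference, a minimal sketch of how the loop above could be made to store whole rows. In the original loop, exwidh[i] <- ... assigns a one-row data frame into a single element, which appears to keep only part of the row. The sketch assumes the rows are collected in a list and stacked afterwards; the name best_rows is hypothetical, and the data frame is rebuilt from the dput output so the block runs on its own.

```r
# Data from the dput() output above.
sm <- data.frame(
  Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
           "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"),
  GSM529305 = c(6.57, 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82),
  GSM529306 = c(6.72, 2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15),
  GSM529307 = c(6.83, 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33),
  GSM529308 = c(6.69, 3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51),
  stringsAsFactors = FALSE
)

ugns <- unique(sm$Gene)
best_rows <- vector("list", length(ugns))  # hypothetical name: one slot per gene

for (i in seq_along(ugns)) {
  rows  <- sm[sm$Gene == ugns[i], , drop = FALSE]  # all rows for this gene
  means <- rowMeans(rows[, -1, drop = FALSE])      # mean over the sample columns
  best_rows[[i]] <- rows[which.max(means), , drop = FALSE]  # keep the whole row
}

# Stack the kept rows into a single data frame, one row per gene.
exwidh <- do.call(rbind, best_rows)
```

Using [[i]] on a list keeps each row intact, and do.call(rbind, ...) combines them at the end instead of growing a vector inside the loop.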

jeremiah rounds

2016-Aug-18 22:21 UTC


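library(data.table)
setDT(dt)  # converts dt to a data.table by reference
op = function(s){
  mean0 = apply(s, 1, mean)   # mean of each row within this gene's group
  ret = s[which.max(mean0)]   # keep the row with the largest mean
  ret$mean = max(mean0)       # store that row's mean (a single value)
  ret
}
max_row = dt[, op(.SD), by = "Gene"]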

Thanks,
Jeremiah


Jim Lemon

2016-Aug-18 22:28 UTC


Hi Adrian,
Try this:

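# mean of the sample columns for every row
sm$rowmeans<-rowMeans(sm[,2:length(sm)])
# sort so that the highest mean comes first within each gene
sm<-sm[order(sm$Gene,sm$rowmeans,decreasing=TRUE),]
# keep the first (highest-mean) row per gene; also safe when no gene is duplicated
sm[!duplicated(sm$Gene),]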

Jim
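
The sort-and-deduplicate approach above returns the rows in descending gene order. If the original row order matters, one possible variant is sketched below using base R's ave() to compare each row mean with the largest mean in its gene group. This is an alternative technique, not what was posted; the names row_mean, is_best and result are hypothetical, and tied means within a gene would keep every tied row.

```r
# Data from the dput() output in the original question.
sm <- data.frame(
  Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
           "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"),
  GSM529305 = c(6.57, 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82),
  GSM529306 = c(6.72, 2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15),
  GSM529307 = c(6.83, 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33),
  GSM529308 = c(6.69, 3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51),
  stringsAsFactors = FALSE
)

# Mean of the sample columns for every row.
row_mean <- rowMeans(sm[, -1])

# ave() returns, for each row, the largest row mean within that row's gene.
is_best <- row_mean == ave(row_mean, sm$Gene, FUN = max)

# Rows stay in their original order; no helper column is added to sm.
result <- sm[is_best, ]
```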



jeremiah rounds

2016-Aug-18 22:31 UTC


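Oh, I forgot that I had renamed sm.

dt = sm
library(data.table)
setDT(dt)  # converts dt to a data.table by reference
op = function(s){
  mean0 = apply(s, 1, mean)   # mean of each row within this gene's group
  ret = s[which.max(mean0)]   # keep the row with the largest mean
  ret$mean = max(mean0)       # store that row's mean (a single value)
  ret
}
max_row = dt[, op(.SD), by = "Gene"]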


Thanks,
Jeremiah


Adrian Johnson

2016-Aug-18 22:33 UTC


Wow. This is much cleaner and smarter than the for loop and cbind.
Thanks a lot.

