Hi Group,
I have a data frame sm (dput code given below).

For rows that share the same Gene, I want to keep only the row with the higher mean.

> sm
     Gene GSM529305 GSM529306 GSM529307 GSM529308
1    A1BG      6.57      6.72      6.83      6.69
2    A1CF      2.91      2.80      3.08      3.00
3   A2LD1      5.82      7.01      6.62      6.87
4     A2M      9.21      9.35      9.32      9.19
5     A2M      2.94      2.50      3.16      2.76
6  A4GALT      6.86      5.75      6.06      7.04
7   A4GNT      3.97      3.56      4.22      3.88
8    AAA1      3.39      2.90      3.16      3.23
9    AAAS      8.26      8.63      8.40      8.70
10   AAAS      6.82      7.15      7.33      6.51

For example, rows 4 and 5 have the same Gene, A2M. I want to select only the row that has the higher mean. I wrote the following code, which finds the row with the higher mean for each gene, but I cannot store the result properly. Could someone help?

Thanks

ugns <- unique(sm$Gene)

exwidh = c()

for(i in 1:length(ugns)){
k = ugns[i]
exwidh[i] <- sm[names(sort(rowMeans(sm[which(sm[,1]==k),2:ncol(sm)]),decreasing=TRUE)[1]),]
}

structure(list(Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
"A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"), GSM529305 = c(6.57,
2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82), GSM529306 = c(6.72,
2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15), GSM529307 = c(6.83,
3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33), GSM529308 = c(6.69,
3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51)), .Names = c("Gene",
"GSM529305", "GSM529306", "GSM529307", "GSM529308"), row.names = c(NA,
10L), class = "data.frame")
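One possible reason the loop cannot store its result: exwidh starts as an empty vector, and assigning a one-row data frame into a single element with exwidh[i] <- ... appears to keep only the first column (with a warning) rather than the whole row. A minimal sketch of a repaired loop, assuming the goal is a data frame with one row per gene, collects the rows in a list and binds them at the end. The names picked, rows and means are introduced here for illustration only.

```r
# Data rebuilt from the dput output in the question.
sm <- data.frame(
  Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
           "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"),
  GSM529305 = c(6.57, 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82),
  GSM529306 = c(6.72, 2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15),
  GSM529307 = c(6.83, 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33),
  GSM529308 = c(6.69, 3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51),
  stringsAsFactors = FALSE
)

ugns <- unique(sm$Gene)

# A list can hold a whole data-frame row per gene; a plain vector cannot.
picked <- vector("list", length(ugns))

for (i in seq_along(ugns)) {
  rows  <- sm[sm$Gene == ugns[i], , drop = FALSE]   # all rows for this gene
  means <- rowMeans(rows[, -1, drop = FALSE])       # mean over the sample columns
  picked[[i]] <- rows[which.max(means), , drop = FALSE]
}

# Stack the selected rows back into a single data frame.
exwidh <- do.call(rbind, picked)
```

If two rows of a gene tie on the mean, which.max() keeps the first one.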

library(data.table)
setDT(dt)
op = function(s){
  mean0 = apply(s, 1, mean)      # row means within one gene
  ret = s[which.max(mean0)]      # keep the row with the largest mean
  ret$mean = max(mean0)          # store that row's mean (a single value)
  ret
}
max_row = dt[, op(.SD), by = "Gene"]

Thanks,
Jeremiah

On Thu, Aug 18, 2016 at 2:33 PM, Adrian Johnson <oriolebaltimore at gmail.com> wrote:

> For example in rows 4 and 5 have same variable Gene A2M. I want to
> select only row that has higher mean.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
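For readers without data.table, the same "apply a function to each gene's rows" idea can be sketched in base R with split() and lapply(). This is an illustrative equivalent, not the code from the message above; the helper name top_row is made up here.

```r
# Data rebuilt from the dput output in the question.
sm <- data.frame(
  Gene = c("A1BG", "A1CF", "A2LD1", "A2M", "A2M",
           "A4GALT", "A4GNT", "AAA1", "AAAS", "AAAS"),
  GSM529305 = c(6.57, 2.91, 5.82, 9.21, 2.94, 6.86, 3.97, 3.39, 8.26, 6.82),
  GSM529306 = c(6.72, 2.8, 7.01, 9.35, 2.5, 5.75, 3.56, 2.9, 8.63, 7.15),
  GSM529307 = c(6.83, 3.08, 6.62, 9.32, 3.16, 6.06, 4.22, 3.16, 8.4, 7.33),
  GSM529308 = c(6.69, 3, 6.87, 9.19, 2.76, 7.04, 3.88, 3.23, 8.7, 6.51),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given the rows of one gene, return the row with the
# largest mean and attach that mean as an extra column.
top_row <- function(g) {
  m <- rowMeans(g[, -1, drop = FALSE])
  best <- g[which.max(m), , drop = FALSE]
  best$mean <- max(m)
  best
}

# Split by gene, pick one row per piece, then stack the pieces.
max_row <- do.call(rbind, lapply(split(sm, sm$Gene), top_row))
rownames(max_row) <- NULL
```

The result has one row per gene plus a mean column, which mirrors the shape the grouped data.table call is aiming for.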

Hi Adrian,
Try this:

# mean of the sample columns for each row
sm$rowmeans<-rowMeans(sm[,2:length(sm)])
# within each gene, put the row with the highest mean first
sm<-sm[order(sm$Gene,sm$rowmeans,decreasing=TRUE),]
# keep the first row per gene (also safe when no gene is duplicated)
sm[!duplicated(sm$Gene),]

Jim

On Fri, Aug 19, 2016 at 7:33 AM, Adrian Johnson <oriolebaltimore at gmail.com> wrote:

> For example in rows 4 and 5 have same variable Gene A2M. I want to
> select only row that has higher mean.
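A note on the de-duplication step: dropping rows with a negative which() index and keeping rows with a logical !duplicated() index agree when duplicates exist, but they differ when there are none. The sketch below uses two tiny made-up data frames (df_dup and df_uniq, with a placeholder column x) to show the difference.

```r
# Toy data for illustration only; x is a placeholder column.
df_dup  <- data.frame(Gene = c("A2M", "A2M", "AAAS"),    x = 1:3)
df_uniq <- data.frame(Gene = c("A1BG", "A1CF", "A2LD1"), x = 1:3)

# With duplicates present, both forms keep the same rows.
a <- df_dup[-which(duplicated(df_dup$Gene)), ]
b <- df_dup[!duplicated(df_dup$Gene), ]

# With no duplicates, which() returns integer(0). Negating it is still
# integer(0), and indexing with integer(0) selects zero rows.
c_bad  <- df_uniq[-which(duplicated(df_uniq$Gene)), ]

# The logical form keeps every row, as intended.
c_good <- df_uniq[!duplicated(df_uniq$Gene), ]
```

For that reason the logical form is usually the safer choice when the input may or may not contain repeated genes.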

Oh, I forgot that I renamed sm.

dt = sm
library(data.table)
setDT(dt)
op = function(s){
  mean0 = apply(s, 1, mean)      # row means within one gene
  ret = s[which.max(mean0)]      # keep the row with the largest mean
  ret$mean = max(mean0)          # store that row's mean (a single value)
  ret
}
max_row = dt[, op(.SD), by = "Gene"]

Thanks,
Jeremiah

On Thu, Aug 18, 2016 at 3:21 PM, jeremiah rounds <roundsjeremiah at gmail.com> wrote:

> library(data.table)
> setDT(dt)

Wow. This is much cleaner and smarter than the for loop and cbind. Thanks a lot.

On Thu, Aug 18, 2016 at 6:28 PM, Jim Lemon <drjimlemon at gmail.com> wrote:

> Hi Adrian,
> Try this:
>
> sm$rowmeans<-rowMeans(sm[,2:length(sm)])
> sm<-sm[order(sm$Gene,sm$rowmeans,decreasing=TRUE),]
> sm[!duplicated(sm$Gene),]
>
> Jim