Seungyeul Yoo
2012-Jun-07 16:07 UTC
[R] select subrows based on a specific column in a matrix
Hi all,

I have a matrix with 10000 rows and 10 columns. The last column contains another identifier, but its values are not unique, so I want to generate another matrix that keeps only the rows with unique values in the last column.

If I do

tmp<-unique(my_mat$col10)

this gives me 8560 unique entries, so the ideal matrix would be 8560 rows x 10 columns.

I tried

sub_mat<-my_mat[tmp,]

but it generated weird results with many "NA" values, and the order was not changed. The original matrix was ranked from the top, so I don't want to lose the order either.

For a similar problem, I have used the "match" function and some manipulation to identify the index of the first appearance of each value, but is there a better and neater way to do the same thing?

Thanks,

Seungyeul Yoo

Postdoc Fellow,
Institute of Genomics and Multiscale Biology
Department of Genetics and Genomic Sciences
Mount Sinai School of Medicine
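The match() idea mentioned in the question can be made concrete in a couple of lines. This is a minimal sketch on hypothetical toy data (my_mat, key, first_idx and the values are invented for illustration). It assumes my_mat is a true matrix, so the identifier column is read with [, "col10"]; $col10 only works on data frames. The values returned by unique() are identifiers, not row numbers, which is likely why my_mat[tmp, ] misbehaves: on a data frame, character row indices that match no row name produce NA rows.

```r
# Hypothetical toy data standing in for the 10000 x 10 matrix:
# the last column "col10" holds identifiers, some of which repeat.
my_mat <- cbind(score = c(9, 8, 7, 6, 5, 4),
                col10 = c("g1", "g2", "g1", "g3", "g2", "g4"))

# For a matrix, extract the column with [, "col10"] rather than $col10.
key <- my_mat[, "col10"]

# unique() keeps values in order of first appearance; match() turns each
# of those values into the row index where it first occurs.
first_idx <- match(unique(key), key)

# Subset by row number; drop = FALSE keeps a matrix even if one row is left.
sub_mat <- my_mat[first_idx, , drop = FALSE]
sub_mat
```

Because each new identifier first appears after the previous one, first_idx is increasing, so the subset keeps the original ranking from the top.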
Rui Barradas
2012-Jun-07 16:50 UTC
[R] select subrows based on a specific column in a matrix
Hello,

You should post a data example, as the posting guide says. If your dataset is large, use something like

dput(head(dat, 20)) # paste the output of this in your post.

where 'dat' is your dataset.

Now, try

# make up some data
set.seed(12)
dat <- matrix(c(sort(rnorm(10)), sample(letters[1:4], 10, TRUE)), ncol=2)
colnames(dat) <- c("A", "col10")
dat

# this does it: within each col10 value, flag the row with the smallest row number
ix <- as.logical(ave(seq_len(nrow(dat)), dat[, "col10"], FUN=function(x) ifelse(x == min(x), TRUE, FALSE)))
dat[ix, ] # rows 1, 2, 4, 6

Hope this helps,

Rui Barradas
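For comparison, base R's duplicated() gives the same first-occurrence flags without grouping. This sketch is not from the thread; it reuses the made-up data from the reply and checks that both index vectors agree. The ifelse() wrapper is dropped here because x == min(x) already returns TRUE/FALSE. The names ix_ave and ix_dup are hypothetical.

```r
# Same made-up data as in the reply.
set.seed(12)
dat <- matrix(c(sort(rnorm(10)), sample(letters[1:4], 10, TRUE)), ncol = 2)
colnames(dat) <- c("A", "col10")

# ave() approach: within each col10 group, TRUE for the smallest row number.
ix_ave <- as.logical(ave(seq_len(nrow(dat)), dat[, "col10"],
                         FUN = function(x) x == min(x)))

# duplicated() is TRUE for every repeat after the first occurrence,
# so its negation flags the first row of each identifier.
ix_dup <- !duplicated(dat[, "col10"])

identical(ix_ave, ix_dup)
dat[ix_dup, , drop = FALSE]
```

Both keep the rows in their original order; duplicated() simply avoids building a group-wise index first.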