Hello list,

I have a data frame M like:

BAC          chr        pos  s1  s2
RP11-80G24     1   77465510  -1   0
RP11-198H14    1   78696291  -1   0
RP11-267M21    1   79681704  -1   0
RP11-89A19     1   80950808  -1   0
RP11-6B16      1   82255496  -1   0
RP11-210E16    1  228801510   0  -1
RP11-155C15    1  230957584   0  -1
RP11-210F8     1  237932418   0  -1
RP11-263L17    2   65724492   0   1
RP11-340F16    2   65879898   0   1
RP11-68A1      2   67718674   0   0
RP11-474G23    2   68318411   0   0
RP11-218N6     2   68454651   0   0
CTD-2003M22    2   68567494   0   0
.....

How can I remove the rows that have 0 in both columns s1 and s2?
Something like M[!M$s1=0&!M$s2=0]?

Moreover, I want to list the subsets of rows that share the same
pattern of data. For example, the first 8 rows in M can be clustered
into 2 groups (represented below in 2 rows) and shown as:

chr      Start        End  # of rows  Pattern
1     77465510   82255496          5  (-1 0)
1    228801510  237932418          3  (0 -1)

Can anybody help me out with this? Thank you very much and happy holiday!

Best,
Allen

On Dec 23, 2007 4:28 PM, affy snp <affysnp at gmail.com> wrote:
> how to remove those rows which have 0 for both of columns s1,s2?
> sth like M[!M$s1=0&!M$s2=0]?
>
> Moreover, I want to get a list which could find a subset of rows which have
> the same pattern of data. For example, the first 8 rows in M can be
> clustered into 2 groups (represented below in 2 rows) and shown as:
>
> chr      Start        End  # of rows  Pattern
> 1     77465510   82255496          5  (-1 0)
> 1    228801510  237932418          3  (0 -1)

Using:

M <- structure(list(
    BAC = structure(c(13L, 3L, 8L, 14L, 12L, 4L, 2L, 5L, 7L, 9L, 11L, 10L,
        6L, 1L), .Label = c("CTD-2003M22", "RP11-155C15", "RP11-198H14",
        "RP11-210E16", "RP11-210F8", "RP11-218N6", "RP11-263L17",
        "RP11-267M21", "RP11-340F16", "RP11-474G23", "RP11-68A1",
        "RP11-6B16", "RP11-80G24", "RP11-89A19"), class = "factor"),
    chr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
    pos = c(77465510L, 78696291L, 79681704L, 80950808L, 82255496L,
        228801510L, 230957584L, 237932418L, 65724492L, 65879898L,
        67718674L, 68318411L, 68454651L, 68567494L),
    s1 = c(-1L, -1L, -1L, -1L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    s2 = c(0L, 0L, 0L, 0L, 0L, -1L, -1L, -1L, 1L, 1L, 0L, 0L, 0L, 0L)),
    .Names = c("BAC", "chr", "pos", "s1", "s2"),
    class = "data.frame", row.names = c(NA, -14L))

# try this
subset(M, s1 | s2)  # 0 is treated as FALSE and any other value as TRUE

# and for the second question:
f <- function(x) with(x,
    c(start = pos[1], end = tail(pos, 1), chr = chr[1],
      nrow = NROW(x), s1 = s1[1], s2 = s2[1])
)
do.call(rbind, by(M, M[4:5], f))
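
The by() call above groups on s1 and s2 only, so rows with the same pattern on different chromosomes, or in separate stretches of the same chromosome, end up in one group. If "same pattern" is meant as one output row per contiguous run, something along these lines might work. It is only a sketch: it assumes M is already sorted by chr and pos, it leaves out the BAC column, and the names key, run and runs are made up for illustration.

```r
# Same chr/pos/s1/s2 values as in the original post (BAC omitted)
M <- data.frame(
  chr = c(rep(1L, 8), rep(2L, 6)),
  pos = c(77465510L, 78696291L, 79681704L, 80950808L, 82255496L,
          228801510L, 230957584L, 237932418L, 65724492L, 65879898L,
          67718674L, 68318411L, 68454651L, 68567494L),
  s1 = c(rep(-1L, 5), rep(0L, 9)),
  s2 = c(rep(0L, 5), rep(-1L, 3), 1L, 1L, rep(0L, 4))
)

# Drop rows where both s1 and s2 are 0
M2 <- M[M$s1 != 0 | M$s2 != 0, ]

# A new run starts whenever chr, s1 or s2 differs from the previous row
key <- paste(M2$chr, M2$s1, M2$s2)
run <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# One summary row per run
runs <- do.call(rbind, lapply(split(M2, run), function(d) {
  data.frame(chr = d$chr[1],
             start = min(d$pos),
             end = max(d$pos),
             nrows = nrow(d),
             pattern = paste0("(", d$s1[1], " ", d$s2[1], ")"))
}))
rownames(runs) <- NULL
runs
```

For the sample data this gives three runs: the two shown in the question plus the (0 1) stretch on chromosome 2.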

At 4:28 PM -0500 12/23/07, affy snp wrote:
>how to remove those rows which have 0 for both of columns s1,s2?
>sth like M[!M$s1=0&!M$s2=0]?

M[ !(M$s1==0 & M$s2==0) , ]

>Moreover, I want to get a list which could find a subset of rows which have
>the same pattern of data. For example, the first 8 rows in M can be
>clustered into 2 groups (represented below in 2 rows) and shown as:
>
>chr      Start        End  # of rows  Pattern
>1     77465510   82255496          5  (-1 0)
>1    228801510  237932418          3  (0 -1)
>
>Can anybody help me out of this? Thank you very much and happy holiday!

pat <- paste(M$s1, M$s2)
upat <- unique(pat)
## to find the first subset:
M[ pat == upat[1], ]
## to find the second subset:
M[ pat == upat[2], ]
## and so on, for however many unique patterns there are.
## also try
table(pat)

Of course, your example does more than just "find" the subsets. It also
summarizes them. That's a little more complicated. I might start with
the summarize() function in the Hmisc package, but there are potentially
many ways to do the summarizing.

-Don

>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov
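
The "and so on, for however many unique patterns there are" step can probably be done in a single call with split(), which returns one data frame per distinct pattern. A small sketch, assuming the same sample data as in the original post (BAC and chr are left out, and the name groups is only illustrative):

```r
# pos/s1/s2 values from the original post
M <- data.frame(
  pos = c(77465510L, 78696291L, 79681704L, 80950808L, 82255496L,
          228801510L, 230957584L, 237932418L, 65724492L, 65879898L,
          67718674L, 68318411L, 68454651L, 68567494L),
  s1 = c(rep(-1L, 5), rep(0L, 9)),
  s2 = c(rep(0L, 5), rep(-1L, 3), 1L, 1L, rep(0L, 4))
)

pat <- paste(M$s1, M$s2)

# Named list of data frames, one element per distinct pattern
groups <- split(M, pat)

names(groups)        # the distinct patterns
groups[["-1 0"]]     # all rows with s1 == -1 and s2 == 0
sapply(groups, nrow) # group sizes, the same counts as table(pat)
```

Each element of groups can then be summarized separately, for example with lapply().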

To answer your first question, try

M[-which( M$s1 == 0 & M$s2 == 0),]

For the second question, you must start with a more precise definition of the grouping criterion.
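
One caveat with the -which() form: if no row has 0 in both columns, which() returns an empty vector, and negative indexing with an empty vector selects no rows instead of all of them. The logical form does not have this problem. A toy illustration (the three-row data frame and the names drop, lost and kept are made up for this example):

```r
# Toy data with no all-zero rows
M <- data.frame(s1 = c(-1L, 0L, 0L), s2 = c(0L, -1L, 1L))

drop <- which(M$s1 == 0 & M$s2 == 0)   # integer(0): nothing to drop

lost <- M[-drop, ]                      # zero rows: every row is lost
kept <- M[!(M$s1 == 0 & M$s2 == 0), ]   # all three rows are kept

nrow(lost)
nrow(kept)
```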