Ralph S.
2008-Aug-13 18:00 UTC
[R] subsetting matrix according to columns with character index
Hi,

I have a long matrix of the following form, which I would like to subset according to the third column:

[x y z]:

a1 c1 1
a1 c1 2
a2 c1 1
a1 c2 1
a1 c2 2
. . .

The first two columns are characters ai and cj.

I would like to keep all the rows where there are two entries for z, 1 and 2.

That is, I want:

a1 c1 1
a1 c1 2
a1 c2 1
a1 c2 2
. . .

I tried something like df[by(df,c(df$x,df$y),sum(z)==3),] but that only gives me one line of data per x y combination.

Is there an easy way to keep all rows for a and c combinations where z has both entries, 1 and 2?

Many thanks,

Ralph
Henrique Dallazuanna
2008-Aug-13 18:11 UTC
[R] subsetting matrix according to columns with character index
Try this:

x
  V1 V2 V3
1 a1 c1  1
2 a1 c1  2
3 a2 c1  1
4 a1 c2  1
5 a1 c2  2

lis <- split(x, list(x$V1, x$V2), drop = TRUE)
do.call(rbind, unname(lis[sapply(lis, function(x) all(1:2 %in% x[,3]))]))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
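For readers who want to run the split-and-filter idea above end to end, here is a minimal sketch. It assumes the data is a data frame with the column names x, y, z from the original question (the reply itself uses V1, V2, V3); the names DF, groups, keep and result are illustrative only.

```r
# Example data from the question; the column names x, y, z are an assumption.
DF <- data.frame(
  x = c("a1", "a1", "a2", "a1", "a1"),
  y = c("c1", "c1", "c1", "c2", "c2"),
  z = c(1, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# One data frame per (x, y) combination; drop = TRUE discards
# combinations that never occur in the data.
groups <- split(DF, list(DF$x, DF$y), drop = TRUE)

# TRUE for each group whose z column contains both 1 and 2.
keep <- sapply(groups, function(g) all(1:2 %in% g$z))

# Stack the surviving groups back into a single data frame.
result <- do.call(rbind, unname(groups[keep]))
print(result)
```

The test is applied once per group and then every row of a passing group is kept, which is why both rows of each a1 combination survive while the lone a2/c1 row is dropped.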
Ralph S.
2008-Aug-13 18:45 UTC
[R] subsetting matrix according to columns with character index
I tried this - I get an empty set:

<0 rows> (or 0-length row.names)

I guess this happens because the z variable takes only one value per row??

What works is:

DFsub<-DF[DF$z == 1 | DF$z == 2,]

but then I do not eliminate the entries where there is only one entry for z given an a and c combination.

Any idea what to do?

-Ralph

> Date: Wed, 13 Aug 2008 13:05:25 -0500
> From: markleeds@verizon.net
> Subject: RE: [R] subsetting matrix according to columns with character index
> To: ruffel1@hotmail.com
>
> it must be a dataframe so, if it was DF, then, assuming i understand
> what you want then either of the following should work:
>
> DFsub<-DF[DF$z == 1 & DF$z == 2,]
>
> or
>
> DFsub<-subset(DF, z == 1 & z == 2 )
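The difference between the two row-wise filters discussed in this message, and one possible group-wise alternative, can be sketched as follows. This is only an illustration under the assumption that the data frame has columns x, y, z as in the question; the use of ave() is a swapped-in base-R technique, not something proposed in the thread, and the names n_and, n_or and has_both are hypothetical.

```r
# Example data from the question; the column names x, y, z are an assumption.
DF <- data.frame(
  x = c("a1", "a1", "a2", "a1", "a1"),
  y = c("c1", "c1", "c1", "c2", "c2"),
  z = c(1, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Row-wise AND: a single z value cannot equal 1 and 2 at once,
# so no row satisfies the condition.
n_and <- nrow(DF[DF$z == 1 & DF$z == 2, ])

# Row-wise OR: every row with z equal to 1 or 2 matches,
# including the lone a2/c1 row that should be removed.
n_or <- nrow(DF[DF$z == 1 | DF$z == 2, ])

# Group-wise test: within each (x, y) combination, mark all rows with 1
# when the group's z values include both 1 and 2, otherwise with 0.
has_both <- ave(DF$z, DF$x, DF$y,
                FUN = function(z) as.numeric(all(1:2 %in% z))) == 1
DFsub <- DF[has_both, ]
print(DFsub)
```

The key point is that the condition has to be evaluated per (x, y) group rather than per row; ave() returns a vector the same length as the input, so it can be used directly as a row index.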
markleeds at verizon.net
2008-Aug-13 19:06 UTC
[R] subsetting matrix according to columns with character index
i don't think i understood what you were trying to do, at least based on Henrique's solution, which I haven't cut and pasted yet in order to understand. Did Henrique's solution do what you wanted?
markleeds at verizon.net
2008-Aug-13 19:23 UTC
[R] subsetting matrix according to columns with character index
sorry ralph. i meant the OR instead of the AND, so that was my mistake. the subset function should also work with the OR.

i think i understand better what you want now also. the approach below assumes that, if there are 2 rows associated with the values in the first 2 columns, then they will be 1 and 2. If they are 1,1 or 2,2, then it won't work. So, henrique's solution could be better and more general.

Assume your dataframe is called DF.

# split the whole dataframe (not just one column) by the second column
tempres<-split(DF,DF$y)
# keep a piece only if it has exactly 2 rows
onlytwo<-lapply(tempres, function(.df) if (nrow(.df) == 2) { return(.df) } else { return(NULL) } )
# drop the pieces that were set to NULL
onlytwo<-onlytwo[!sapply(onlytwo,is.null)]
result<-do.call(rbind,onlytwo)
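The caveat stated above (a group with two rows that are 1,1 or 2,2 slips through a row-count test) can be illustrated with a small sketch. The data below is hypothetical and chosen only to show the difference; the names by_count and by_value are illustrative, and the grouping uses both x and y.

```r
# Hypothetical data: a1/c1 has z = 1, 2 while a2/c1 has z = 1, 1.
DF <- data.frame(
  x = c("a1", "a1", "a2", "a2"),
  y = c("c1", "c1", "c1", "c1"),
  z = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)

groups <- split(DF, list(DF$x, DF$y), drop = TRUE)

# Count-based test: passes any group that happens to have exactly 2 rows.
by_count <- sapply(groups, function(g) nrow(g) == 2)

# Value-based test: passes only groups whose z contains both 1 and 2.
by_value <- sapply(groups, function(g) all(1:2 %in% g$z))

print(rbind(by_count, by_value))
```

Both groups pass the count-based test, but only a1/c1 passes the value-based one, which is the reason the value check is the more general choice.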
markleeds at verizon.net
2008-Aug-13 20:14 UTC
[R] subsetting matrix according to columns with character index
Ralph: I looked at Henrique's solution and he does 2 things which make it better than mine.

1) He splits based on the first two columns, whereas I split based only on the second. So my split assumes that the "same rows" are next to each other, which is an unnecessary assumption.

2) He actually checks that 1 and 2 are in the third column of the resulting dataframes that split returns. I assumed that, if a dataframe had 2 rows, then the latter would be true automatically.

So, even though mine worked for what you needed, in the spirit of generality and minimal assumptions, it is better to use Henrique's solution. Also, make sure you understand it because you can learn a lot from it (this is also true of his solutions in general).

On Wed, Aug 13, 2008 at 3:37 PM, Ralph S. wrote:

> yes this works, very elegant, thank you. I didn't get Henrique's message
> in my mailbox immediately for some reason.
>
> -Ralph