Neha Aggarwal
2018-Apr-21 07:27 UTC
[R] Check if row of dataframe is superset of any row in another dataframe.
Hi,

I am looking for a way to check whether the rows of one data frame are present in another data frame, in the following sense: a row in one data frame should be a superset of any row in the other data frame.

I can write a for loop for it, however, that will be inefficient. So, I am looking for an efficient way to do this in R.

I have explained it with an example below:

I want to check if a row in dataframe B:
1) is either equal to any row in A, or
2) has 1's at least in the columns where (any) row in A has 1's.

My output/result is a vector of 1 (TRUE) or 0 (FALSE) of length equal to the number of rows in B. The first row in B is exactly present in A, so the result has its first bit as 1. The second row in B matches the 2nd row of dataframe A (it has an extra 1 in the 3rd column, which is ok), so the second bit of the result is also 1. Similarly, the 3rd row of B can match any row in A, so the 3rd bit in the result is also a 1. Next, the 4th row in B has a 1 in a column where no row in A has a 1, so the last bit in the result is 0.

Dataframe A
1 0 1 0
1 1 0 0
0 1 1 0

Dataframe B
1 0 1 0
1 1 1 0
1 1 1 1
0 0 0 1

Result <- 1 1 1 0

Thanks for the help,
Neha
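For readers who want to reproduce the example, the following is a minimal sketch that enters the two data frames and computes the expected result by brute force. The helper name `is_superset` is hypothetical (it is not from the original post), and the sketch assumes all entries are 0 or 1.

```r
# Example data from the question, entered row by row.
A <- data.frame(matrix(c(1, 0, 1, 0,
                         1, 1, 0, 0,
                         0, 1, 1, 0), ncol = 4, byrow = TRUE))
B <- data.frame(matrix(c(1, 0, 1, 0,
                         1, 1, 1, 0,
                         1, 1, 1, 1,
                         0, 0, 0, 1), ncol = 4, byrow = TRUE))

# Hypothetical helper: TRUE if `b` has a 1 wherever `a` has a 1.
is_superset <- function(b, a) all(b[a == 1] == 1)

# Brute-force reference: test each row of B against every row of A.
result <- sapply(seq_len(nrow(B)), function(j) {
  b <- unlist(B[j, ])
  hits <- sapply(seq_len(nrow(A)), function(i) is_superset(b, unlist(A[i, ])))
  as.integer(any(hits))
})
result  # 1 1 1 0
```

This version is slow for large inputs, but it states the matching rule directly and can serve as a reference when checking faster solutions.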
Eric Berger
2018-Apr-21 08:32 UTC
[R] Check if row of dataframe is superset of any row in another dataframe.
Hi Neha,

How about this?

A <- as.matrix(A)
B <- as.matrix(B)
# C[i, j] counts the columns where A[i, ] and B[j, ] are both 1
C <- A %*% t(B)
# number of 1's in each row of A and of B
SA <- apply(A, MARGIN = 1, sum)
SB <- apply(B, MARGIN = 1, sum)
# row j of B matches if some row of A has all of its 1's shared with it
vapply(1:nrow(B), function(j) { sum(C[, j] == SA & SA <= SB[j]) > 0 }, 1)

HTH,
Eric
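To see why the matrix product works, here is a small sketch on the example data. It is a compact variant of the same idea, not Eric's exact code: the names `shared` and `contained` are introduced here for illustration, and the inputs are assumed to be 0/1 matrices.

```r
A <- matrix(c(1, 0, 1, 0,
              1, 1, 0, 0,
              0, 1, 1, 0), ncol = 4, byrow = TRUE)
B <- matrix(c(1, 0, 1, 0,
              1, 1, 1, 0,
              1, 1, 1, 1,
              0, 0, 0, 1), ncol = 4, byrow = TRUE)

# shared[i, j] = number of columns where A[i, ] and B[j, ] are both 1.
shared <- tcrossprod(A, B)  # same value as A %*% t(B)

# A[i, ] is contained in B[j, ] exactly when every 1 in A[i, ] is shared.
# rowSums(A) has one entry per row of `shared`, so it recycles down each column.
contained <- shared == rowSums(A)

# A row of B qualifies if at least one row of A is contained in it.
result <- as.integer(colSums(contained) > 0)
result  # 1 1 1 0
```

Because the shared count can never exceed the number of 1's in the row of B, the extra `SA <= SB[j]` test is already implied by `C[, j] == SA`; it is harmless but not required.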
Jim Lemon
2018-Apr-21 08:57 UTC
[R] Check if row of dataframe is superset of any row in another dataframe.
Hi Neha,

How about this?

find_subset <- function(x, y) {
  yrows <- dim(y)[1]
  found <- FALSE
  for (row in 1:yrows) {
    # take row `row` of y as a plain vector
    yrow <- unlist(y[row, ])
    # x matches if it has a 1 wherever this row of y has a 1;
    # keep any match found in an earlier row
    found <- found || sum(x & yrow) >= sum(yrow)
  }
  return(as.integer(found))
}
apply(B, 1, find_subset, A)

This is somewhat obscure, as the dataframe B is coerced to a matrix by the apply function.

Jim
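As a rough sanity check, the loop approach and the matrix-product approach can be compared on random 0/1 data. This is only a sketch: the dimensions, probabilities, and seed are arbitrary choices, and the names `by_loop` and `by_matrix` are introduced here for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
A <- matrix(rbinom(5 * 6, 1, 0.4), nrow = 5)
B <- matrix(rbinom(8 * 6, 1, 0.6), nrow = 8)

# Loop version: does any row of y have all of its 1's present in x?
find_subset <- function(x, y) {
  found <- FALSE
  for (row in seq_len(nrow(y))) {
    found <- found || sum(x & y[row, ]) >= sum(y[row, ])
  }
  as.integer(found)
}
by_loop <- apply(B, 1, find_subset, A)

# Matrix version: compare shared 1's with the number of 1's in each row of A.
by_matrix <- as.integer(colSums(tcrossprod(A, B) == rowSums(A)) > 0)

# Both approaches should give the same 0/1 vector, one entry per row of B.
stopifnot(identical(by_loop, by_matrix))
```

Agreement on random inputs does not prove either version correct, but it is a cheap way to catch indexing mistakes such as selecting a column where a row was intended.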