Rama Ramakrishnan
2009-Oct-07 19:52 UTC
[R] Need a vectorized way to avoid two nested FOR loops
Hi Friends,

I have a data frame d. Let vars be the column indices for a subset of the columns in d (e.g., vars <- c(1,3,4,8)).

For each row r in d, I want to collect all the other rows in d that match the values in row r for just the columns in vars.

The naive way to do this is to have a for loop stepping through each row in d, and within the loop have another loop going through all the rows again, checking for equality. This is quadratic in the number of rows and takes way too long. Is there a better, "vectorized" way to do this?

Thanks in advance!

Rama Ramakrishnan
Here is one way of doing it:

> n <- 20
> set.seed(2)
> # create test dataframe
> x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
> x
   V1 V2 V3 V4 V5 V6
1   1  2  2  2  1  1
2   2  1  1  2  2  1
3   2  2  1  2  1  2
4   1  1  1  1  1  2
5   2  1  2  2  1  1
6   2  1  2  1  2  2
7   1  1  2  1  2  2
8   2  1  1  1  1  1
9   1  2  2  1  2  1
10  2  1  2  1  1  1
11  2  1  1  1  2  1
12  1  1  1  1  1  2
13  2  2  2  1  1  1
14  1  2  2  1  2  2
15  1  2  1  1  1  2
16  2  2  2  2  1  2
17  2  2  2  1  1  2
18  1  1  2  2  1  1
19  1  2  2  1  1  2
20  1  1  2  2  1  2
> x.col <- c(1,3,5)
> # find matching columns by testing the first against all others
> x.match <- x[, x.col[1]] == x[, x.col[-1]]
> # print them out
> x[apply(x.match, 1, all),]
   V1 V2 V3 V4 V5 V6
4   1  1  1  1  1  2
6   2  1  2  1  2  2
12  1  1  1  1  1  2
15  1  2  1  1  1  2

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
I answered the wrong question. Here is the code to find all the matches for each row:

n <- 20
set.seed(2)
# create test dataframe
x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
x
x.col <- c(1,3,5)
# match against all the other rows: for each row a, find every row z
# that agrees with it on all of the selected columns
x.match1 <- apply(x[, x.col], 1, function(a){
    which(apply(x[, x.col], 1, function(z){
        all(a == z)
    }))
})
# remove matches to itself
x.match2 <- lapply(seq(length(x.match1)), function(z){
    x.match1[[z]][!(x.match1[[z]] %in% z)]
})
# x.match2 contains the indices of the matching rows

--
Jim Holtman
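The nested apply() calls above still compare every row against every other row, so the work remains quadratic in the number of rows. One possible way around that, offered here only as a sketch and not as part of the original replies, is to collapse the selected columns of each row into a single string key and then group the row indices by that key. The names key, groups and x.match3 are made up for this illustration, and the "\r" separator assumes that character never appears in the data.

```r
n <- 20
set.seed(2)
# same test data frame as in the replies above
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

# one string key per row, built from the selected columns only
# (assumption: "\r" does not occur in the data, so keys cannot collide)
key <- do.call(paste, c(unname(as.list(x[, x.col, drop = FALSE])), sep = "\r"))

# row indices grouped by key: rows sharing a key match on all selected columns
groups <- split(seq_len(nrow(x)), key)

# for each row, the other rows in its group (hypothetical name x.match3)
x.match3 <- lapply(seq_len(nrow(x)), function(i) setdiff(groups[[key[i]]], i))
```

Here x.match3[[i]] holds the indices of the other rows that agree with row i on the columns in x.col, or integer(0) if there are none. Building the keys and splitting are each a single vectorized pass over the rows; the final lapply() only looks up a group that has already been computed.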