thr3ads.net - R help - [R] Need a vectorized way to avoid two nested FOR loops [Oct 2009]

Rama Ramakrishnan

2009-Oct-07 19:52 UTC

[R] Need a vectorized way to avoid two nested FOR loops

Hi Friends,

I have a data frame d. Let vars be the column indices for a subset of  
the columns in d (e.g., vars <- c(1,3,4,8))

For each row r in d, I want to collect all the other rows in d that  
match the values in row r for just the columns in vars.

The naive way to do this is to have a for loop stepping through each  
row in d, and within the loop have another loop going through all the  
rows again, checking for equality. This is quadratic in the number of  
rows and takes way too long. Is there a better, "vectorized" way to do
this?

Thanks in advance!

Rama Ramakrishnan
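
For reference, the quadratic approach described in the question might look roughly like the sketch below. The data frame d and the index vector vars here are made-up stand-ins (the post gives no data), and the sketch assumes the compared columns contain no NA values.

```r
# Hypothetical example data; d and vars stand in for the poster's objects.
d <- data.frame(a = c(1L, 2L, 1L, 2L, 1L),
                b = c(5L, 6L, 5L, 6L, 7L),
                c = c(10L, 20L, 30L, 40L, 50L))
vars <- c(1, 2)

n <- nrow(d)
matches <- vector("list", n)
for (i in seq_len(n)) {
  hits <- integer(0)
  for (j in seq_len(n)) {
    # compare row i with row j on the selected columns only
    if (j != i && all(d[i, vars] == d[j, vars])) {
      hits <- c(hits, j)
    }
  }
  matches[[i]] <- hits
}

# matches[[1]] is 3, matches[[2]] is 4, matches[[5]] is empty
```

Every row is compared with every other row, so the work grows with the square of nrow(d), which is the cost the question wants to avoid.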

jim holtman

2009-Oct-08 12:04 UTC

Here is one way of doing it:
> n <- 20
> set.seed(2)
> # create test dataframe
> x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
> x
   V1 V2 V3 V4 V5 V6
1   1  2  2  2  1  1
2   2  1  1  2  2  1
3   2  2  1  2  1  2
4   1  1  1  1  1  2
5   2  1  2  2  1  1
6   2  1  2  1  2  2
7   1  1  2  1  2  2
8   2  1  1  1  1  1
9   1  2  2  1  2  1
10  2  1  2  1  1  1
11  2  1  1  1  2  1
12  1  1  1  1  1  2
13  2  2  2  1  1  1
14  1  2  2  1  2  2
15  1  2  1  1  1  2
16  2  2  2  2  1  2
17  2  2  2  1  1  2
18  1  1  2  2  1  1
19  1  2  2  1  1  2
20  1  1  2  2  1  2
> x.col <- c(1,3,5)
> # find matching columns by testing the first against all others
> x.match <- x[, x.col[1]] == x[, x.col[-1]]
> # print them out
> x[apply(x.match, 1, all),]
   V1 V2 V3 V4 V5 V6
4   1  1  1  1  1  2
6   2  1  2  1  2  2
12  1  1  1  1  1  2
15  1  2  1  1  1  2

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

jim holtman

2009-Oct-08 12:24 UTC

I answered the wrong question.  Here is the code to find all the
matches for each row:

n <- 20
set.seed(2)
# create test dataframe
x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
x
x.col <- c(1,3,5)

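# match against all the other rows
x.sub <- as.matrix(x[, x.col])
# lapply always returns a list, even when every row has the same number
# of matches (apply would simplify that case to a matrix)
x.match1 <- lapply(seq_len(nrow(x.sub)), function(i){
    which(apply(x.sub, 1, function(z){
        all(x.sub[i, ] == z)
    }))
})

# remove matches to itself
x.match2 <- lapply(seq_along(x.match1), function(z){
    x.match1[[z]][!(x.match1[[z]] %in% z)]
})
# x.match2 contains the row indices that match each row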
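
The code above still compares every row with every other row. One possible way to avoid that, offered here only as a sketch and not as part of the original reply, is to build a single key per row from the selected columns and group the row indices by that key. The names key, groups and x.match are illustrative. The sketch assumes the separator "\r" never occurs in the data and that the values survive conversion to character unchanged (doubles that differ only after many digits could collide).

```r
# Same recipe for the test data as in the post
# (the sampled values may differ across R versions).
n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

# one character key per row, built from the selected columns only
key <- do.call(paste, c(x[, x.col, drop = FALSE], sep = "\r"))

# row indices grouped by key: rows with equal keys land in the same group
groups <- split(seq_len(nrow(x)), key)

# for each row, the other members of its group
x.match <- lapply(seq_len(nrow(x)), function(i) {
    g <- groups[[key[i]]]
    g[g != i]
})
```

Here x.match[[i]] holds the indices of the other rows that agree with row i on the columns in x.col. The grouping itself takes a single pass over the rows, so the remaining cost is mostly the size of the result.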
