Rathore, Saubhagya Singh
2017-Jun-23 15:19 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
For some reason, the content was not visible in my last mail, so I am posting it again.

Dear Members,

I have two data frames with different numbers of rows. I need to apply a set of functions to each possible combination of rows, with one row coming from the first data frame and the other from the second. Although I can perform this task using for loops, I feel that there must be a more efficient way to do it. An example case is given below. D1 and D2 are two data frames. I need to evaluate D3, with one column as the Euclidean distance in the x-y plane and a second column as the squared difference of the z values, for each row pair from D1 and D2.

    D1<-data.frame(x=1:5,y=6:10,z=rnorm(5))
    D2<-data.frame(x=19:30,y=41:52,z=rnorm(12))
    D3<-data.frame(distance=integer(0),difference=integer(0))

    for (i in 1:nrow(D1)){
      for (j in 1:nrow(D2)) {
        ## one result row per (i, j) pair, appended to D3
        temp<-data.frame(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
                         difference=(D1[i,3]-D2[j,3])^2)
        D3<-rbind(D3,temp)
      }
    }

Thank you
Rui Barradas
2017-Jun-23 15:35 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
Hello,

The obvious way would be to preallocate the resulting data.frame, since expanding an empty one on each iteration is a time-expensive operation.

    n <- nrow(expand.grid(1:nrow(D1), 1:nrow(D2)))  ## number of row pairs
    D4 <- data.frame(distance=integer(n),difference=integer(n))
    k <- 0
    for (i in 1:nrow(D1)){
      for (j in 1:nrow(D2)) {
        k <- k + 1
        ## fill row k in place instead of growing the data.frame
        D4[k, ] <- c(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
                     difference=(D1[i,3]-D2[j,3])^2)
      }
    }

    identical(D3, D4)

Hope this helps,

Rui Barradas
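A variation on the preallocation idea, as a minimal sketch rather than a definitive version: instead of writing into data.frame rows, fill two preallocated numeric vectors and build the data frame once at the end. The names `dist_v`, `diff_v` and `D4b` and the seed are made up for this illustration; D1 and D2 are defined as in the original post.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

n <- nrow(D1) * nrow(D2)   # number of row pairs
dist_v <- numeric(n)       # preallocated result vectors (hypothetical names)
diff_v <- numeric(n)

k <- 0
for (i in seq_len(nrow(D1))) {
  for (j in seq_len(nrow(D2))) {
    k <- k + 1
    # Euclidean distance in the x-y plane for the pair (i, j)
    dist_v[k] <- sqrt((D1$x[i] - D2$x[j])^2 + (D1$y[i] - D2$y[j])^2)
    # squared difference of the z values
    diff_v[k] <- (D1$z[i] - D2$z[j])^2
  }
}

# Build the data frame once, after the loop
D4b <- data.frame(distance = dist_v, difference = diff_v)
```

Assigning into a plain vector avoids the per-row overhead of data.frame assignment, which is the reason for building `D4b` only at the end.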
Rui Barradas
2017-Jun-23 16:02 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
Hello,

Another way would be

    n <- nrow(expand.grid(1:nrow(D1), 1:nrow(D2)))
    D5 <- data.frame(distance=integer(n),difference=integer(n))

    ## inner sapply: all rows of D2 against row i of D1; outer lapply: all rows of D1
    D5[] <- do.call(rbind, lapply(seq_len(nrow(D1)), function(i)
      t(sapply(seq_len(nrow(D2)), function(j){
        c(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
          difference=(D1[i,3]-D2[j,3])^2)
      }
    ))))

    identical(D3, D5)

In my first answer I forgot to say that constructs like 1:nrow(...), or more generally 1:m, are error prone. If m == 0, then 1:0 evaluates to c(1, 0), so the perfectly legal loop for(i in 1:0) runs twice, once with the out-of-range index 1 and once with the zero index, instead of not running at all. The solution is to use ?seq_len or ?seq_along (same help page), like this: for(i in seq_len(m)). In your case m is either nrow(D1) or nrow(D2).

Hope this helps,

Rui Barradas
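The 1:m pitfall can be seen in a few lines. This is only an illustrative sketch; the counters `runs_colon` and `runs_seq` and the vector `v` are names invented for it.

```r
m <- 0

# 1:m counts down when m is 0, so it is not empty
length(1:m)          # 2, because 1:0 is c(1, 0)

# seq_len(m) is empty when m is 0
length(seq_len(m))   # 0

# Count how often each loop body actually runs
runs_colon <- 0
for (i in 1:m) runs_colon <- runs_colon + 1

runs_seq <- 0
for (i in seq_len(m)) runs_seq <- runs_seq + 1

# seq_along() plays the same role for an existing (here empty) vector
v <- character(0)
length(seq_along(v))
```

With zero-row data frames, the `1:nrow(...)` loops would therefore execute their bodies with indices that do not correspond to any row, while the `seq_len(nrow(...))` loops are skipped.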
Bert Gunter
2017-Jun-23 19:19 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
You appear to be trying to write C code in R. Don't do this.

If you can trade off space for efficiency, the calculation can be easily vectorized (assuming I correctly understand what you want to do, of course).

    set.seed(135) ## for reproducibility
    D1<-data.frame(x=1:5,y=6:10,z=rnorm(5))
    D2<-data.frame(x=19:30,y=41:52,z=rnorm(12))

    D.all <-merge(D1,D2, by.x=NULL,by.y=NULL) ## Cartesian product of the two frames
    D.all$distance <- sqrt(rowSums((D.all[,1:2] - D.all[,4:5])^2)) ## note use of rowSums
    D.all$difference <- (D.all[,3] - D.all[,6])^2
    D.all

Cheers,
Bert

Bert Gunter
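The same vectorized idea can also be sketched without merge(), by building the pair indices directly with rep(). This is an alternative illustration, not the method from the post; `i`, `j` and `D6` are hypothetical names. The indices are arranged so that the rows come out in the order of the nested loops from the original question (the second data frame varies fastest), which may differ from the row order that merge() produces.

```r
set.seed(135)  # same seed as in the post
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

# Index vectors covering every (row of D1, row of D2) pair;
# j cycles through D2 for each i, matching the nested loops
i <- rep(seq_len(nrow(D1)), each  = nrow(D2))
j <- rep(seq_len(nrow(D2)), times = nrow(D1))

# All pairs are computed in one vectorized step
D6 <- data.frame(
  distance   = sqrt((D1$x[i] - D2$x[j])^2 + (D1$y[i] - D2$y[j])^2),
  difference = (D1$z[i] - D2$z[j])^2
)
```

As with the merge() approach, this trades memory for speed: the index vectors and results hold nrow(D1) * nrow(D2) elements each.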
Rathore, Saubhagya Singh
2017-Jun-23 19:53 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
Thank you very much, Mr. Gunter, for making me realize the power of vectorization. I need to work a lot more to exploit this strength of R. I applied your suggested method to my problem, where D1 and D2 have 600 observations each. The time was significantly reduced compared to the fastest working code I had (user:61, system: 0.01, elapsed: 0.62).

Thank you again for your generous help.

Saubhagya
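For readers who want to try a comparison of this kind themselves, here is a rough sketch using system.time(). Everything in it is an assumption made for illustration: the data frames `A` and `B`, their size (30 rows each, far smaller than the 600 mentioned above, so that the slow version finishes quickly), and the helper names `grow_pairs` and `merge_pairs`. Timings depend on the machine, so none are shown.

```r
set.seed(42)                     # arbitrary seed
n1 <- 30; n2 <- 30               # assumed sizes, kept small on purpose
A <- data.frame(x = runif(n1), y = runif(n1), z = rnorm(n1))
B <- data.frame(x = runif(n2), y = runif(n2), z = rnorm(n2))

# Loop version that grows the result with rbind(), as in the original question
grow_pairs <- function(A, B) {
  out <- data.frame(distance = numeric(0), difference = numeric(0))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(nrow(B))) {
      out <- rbind(out, data.frame(
        distance   = sqrt(sum((A[i, 1:2] - B[j, 1:2])^2)),
        difference = (A[i, 3] - B[j, 3])^2))
    }
  }
  out
}

# Vectorized version along the lines of the merge() answer
merge_pairs <- function(A, B) {
  P <- merge(A, B, by = NULL)    # Cartesian product: columns 1-3 from A, 4-6 from B
  data.frame(
    distance   = sqrt((P[, 1] - P[, 4])^2 + (P[, 2] - P[, 5])^2),
    difference = (P[, 3] - P[, 6])^2)
}

t_loop <- system.time(r_loop <- grow_pairs(A, B))
t_vec  <- system.time(r_vec  <- merge_pairs(A, B))
print(t_loop)
print(t_vec)
```

The two results contain the same pairs but not necessarily in the same row order, so any check of agreement should compare them after sorting.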