Rathore, Saubhagya Singh
2017-Jun-23 15:19 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
For some reason, the content was not visible in my last mail, so I am posting it
again.
Dear Members,
I have two data frames with different numbers of rows. I need to apply
a set of functions to every possible pair of rows, with one row coming from
the first data frame and the other from the second. Although I can do this
with for loops, I feel that there must be a more efficient way.
An example case is given below. D1 and D2 are the two data frames. I need to build
D3, whose first column is the Euclidean distance in the x-y plane and whose second
column is the squared difference of the z values, for each row pair from D1 and D2.
D1<-data.frame(x=1:5,y=6:10,z=rnorm(5))
D2<-data.frame(x=19:30,y=41:52,z=rnorm(12))
D3<-data.frame(distance=integer(0),difference=integer(0))
for (i in 1:nrow(D1)){
  for (j in 1:nrow(D2)) {
    temp<-data.frame(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
                     difference=(D1[i,3]-D2[j,3])^2)
    D3<-rbind(D3,temp)
  }
}
Thank you
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of r-help-owner
at r-project.org
Sent: Friday, June 23, 2017 10:47 AM
To: Rathore, Saubhagya Singh <saubhagya at gatech.edu>
Subject: R version 3.3.2, Windows 10: Applying a function to each possible pair
of rows from two different data-frames
The message's content type was not explicitly allowed
Rui Barradas
2017-Jun-23 15:35 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
Hello,
The obvious way would be to preallocate the resulting data.frame, since
expanding an empty one on each iteration is a time-expensive operation.
n <- nrow(D1) * nrow(D2)  # number of row pairs
D4 <- data.frame(distance=numeric(n),difference=numeric(n))
k <- 0
for (i in 1:nrow(D1)){
  for (j in 1:nrow(D2)) {
    k <- k + 1
    D4[k, ] <- c(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
                 difference=(D1[i,3]-D2[j,3])^2)
  }
}
identical(D3, D4)
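A minimal sketch of a variation on the preallocation idea above, which is not part of the original reply: the results are written into two preallocated numeric vectors and the data frame is built once at the end, since assigning into a data frame row by row is typically much slower than assigning into a plain vector. The seed and the name D4v are assumptions made only for illustration.

```r
set.seed(135)  # assumed seed, only so the sketch is reproducible
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

n <- nrow(D1) * nrow(D2)   # one result per (i, j) row pair
distance   <- numeric(n)   # preallocated result vectors
difference <- numeric(n)

k <- 0
for (i in seq_len(nrow(D1))) {
  for (j in seq_len(nrow(D2))) {
    k <- k + 1
    distance[k]   <- sqrt((D1$x[i] - D2$x[j])^2 + (D1$y[i] - D2$y[j])^2)
    difference[k] <- (D1$z[i] - D2$z[j])^2
  }
}

# Build the data frame once, after the loops have finished.
D4v <- data.frame(distance = distance, difference = difference)
```

The row order is the same as in the nested loops above (j varies fastest), so the result can be compared with D3 or D4 column by column, for example with all.equal().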
Hope this helps,
Rui Barradas
Rui Barradas
2017-Jun-23 16:02 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
Hello,
Another way would be
n <- nrow(expand.grid(1:nrow(D1), 1:nrow(D2)))
D5 <- data.frame(distance=integer(n),difference=integer(n))
D5[] <- do.call(rbind, lapply(seq_len(nrow(D1)), function(i)
  t(sapply(seq_len(nrow(D2)), function(j){
    c(distance=sqrt(sum((D1[i,1:2]-D2[j,1:2])^2)),
      difference=(D1[i,3]-D2[j,3])^2)
  }))
))
identical(D3, D5)
In my first answer I forgot to say that constructs like 1:nrow(...) or,
more generally, 1:m are error-prone. If m == 0, then 1:0 is c(1, 0), so the
perfectly legal loop for(i in 1:0) runs twice, with the invalid indices 1 and 0,
instead of not running at all.
The solution is to use ?seq_len or ?seq_along (same help page), like
this: for(i in seq_len(m)). In your case m is either nrow(D1) or nrow(D2).
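The difference between 1:m and seq_len(m) can be seen in a small sketch. The counter names runs_colon and runs_seq_len are made up for this illustration.

```r
m <- 0  # e.g. nrow() of an empty data frame

# 1:m counts down from 1 to 0 when m is 0, so the loop body runs twice.
runs_colon <- 0
for (i in 1:m) runs_colon <- runs_colon + 1

# seq_len(m) is an empty integer vector, so the loop body never runs.
runs_seq_len <- 0
for (i in seq_len(m)) runs_seq_len <- runs_seq_len + 1

print(1:m)         # prints: [1] 1 0
print(seq_len(m))  # prints: integer(0)
print(c(colon = runs_colon, seq_len = runs_seq_len))
```

With m == 0 the first loop executes two iterations and the second executes none, which is the behaviour one usually wants for an empty input.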
Hope this helps,
Rui Barradas
Bert Gunter
2017-Jun-23 19:19 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
You appear to be trying to write C code in R. Don't do this. If you can trade
off space for efficiency, the calculation can be easily vectorized (assuming I
correctly understand what you want to do, of course).
set.seed(135) ## for reproducibility
D1<-data.frame(x=1:5,y=6:10,z=rnorm(5))
D2<-data.frame(x=19:30,y=41:52,z=rnorm(12))
D.all <-merge(D1,D2, by.x=NULL,by.y=NULL) ## Cartesian product of the two frames
D.all$distance <- sqrt(rowSums((D.all[,1:2] - D.all[,4:5])^2)) ## note use of rowSums
D.all$difference <- (D.all[,3] - D.all[,6])^2
D.all
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
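The same space-for-speed trade can also be sketched without merge(), by building the row indices of every pair explicitly. This is an alternative illustration, not the method from the reply above. The index vectors i and j and the name D.vec are assumptions introduced here. The indices are arranged so that j varies fastest, which gives the same row order as the nested for loops in the original question.

```r
set.seed(135)  # same seed as in the reply, for reproducibility
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

# Index vectors for every (i, j) pair, with j varying fastest,
# which matches the row order produced by the nested for loops.
i <- rep(seq_len(nrow(D1)), each = nrow(D2))
j <- rep(seq_len(nrow(D2)), times = nrow(D1))

# All distances and differences are computed in one vectorized step.
D.vec <- data.frame(
  distance   = sqrt((D1$x[i] - D2$x[j])^2 + (D1$y[i] - D2$y[j])^2),
  difference = (D1$z[i] - D2$z[j])^2
)
```

Because the row order follows the loops, D.vec can be checked against a loop-built result with all.equal() on each column.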
Rathore, Saubhagya Singh
2017-Jun-23 19:53 UTC
[R] R version 3.3.2, Windows 10: Applying a function to each possible pair of rows from two different data-frames
Thank you very much, Mr. Gunter, for making me realize the power of vectorization.
I need to work a lot more to exploit this strength of R. I applied your suggested
method to my problem, where D1 and D2 have 600 observations each. The time was
significantly reduced compared to the fastest working code I had
(user:61, system: 0.01 , elapsed: 0.62).
Thank you again for your generous help.
Saubhagya
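A timing comparison of this kind can be sketched with system.time(). The helper names pairs_loop and pairs_merge, the seed, and the small data sizes are assumptions chosen only to keep the run short; they are not the 600-row data from the message above, and the timings will differ from machine to machine.

```r
set.seed(135)  # assumed seed and sizes, chosen only to keep the run short
D1 <- data.frame(x = runif(20), y = runif(20), z = rnorm(20))
D2 <- data.frame(x = runif(25), y = runif(25), z = rnorm(25))

# Hypothetical helper: nested loops that grow the result with rbind().
pairs_loop <- function(D1, D2) {
  D3 <- data.frame(distance = numeric(0), difference = numeric(0))
  for (i in seq_len(nrow(D1))) {
    for (j in seq_len(nrow(D2))) {
      D3 <- rbind(D3, data.frame(
        distance   = sqrt(sum((D1[i, 1:2] - D2[j, 1:2])^2)),
        difference = (D1[i, 3] - D2[j, 3])^2
      ))
    }
  }
  D3
}

# Hypothetical helper: the merge()-based Cartesian product approach.
pairs_merge <- function(D1, D2) {
  D.all <- merge(D1, D2, by.x = NULL, by.y = NULL)
  data.frame(
    distance   = sqrt(rowSums((D.all[, 1:2] - D.all[, 4:5])^2)),
    difference = (D.all[, 3] - D.all[, 6])^2
  )
}

t_loop  <- system.time(res_loop  <- pairs_loop(D1, D2))
t_merge <- system.time(res_merge <- pairs_merge(D1, D2))
print(t_loop)   # user, system and elapsed times for the loop version
print(t_merge)  # the same for the vectorized version
```

The two results contain the same values but not necessarily in the same row order, so a comparison should sort each column first, for example all.equal(sort(res_loop$distance), sort(res_merge$distance)).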