thr3ads.net - R help - [R] Trying to avoid the loop while merging two data frames [Dec 2015]

Dimitri Liakhovitski

2015-Dec-22 17:27 UTC

[R] Trying to avoid the loop while merging two data frames

Hello!
I have a loop-based solution for my task, but it is too slow for my
real-life problem, which is much larger in scope. I don't think I can
use merge here. Any advice on how to do it faster?
Thanks a lot for any hint on how to speed it up!

# I have a data frame 'mydata':
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
head(mydata)
table(mydata$version)

# I have a data frame 'myinfo' that contains information for each 'version':
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)), a = rnorm(60),
                     b = rnorm(60), c = rnorm(60), d = rnorm(60))
# (600 rows: each version appears 30 times; the rnorm(60) columns are recycled)
head(myinfo, 40)

### MY SOLUTION WITH A LOOP:
### Looping through each id of mydata and grabbing
### all columns from 'myinfo' for the corresponding 'version':

# 1. Creating placeholder list for the results:
result <- split(mydata[c("myid", "version")], f = list(mydata$myid))
length(result)
result[1:3]


# 2. Looping through each element of 'result':
for (i in seq_along(result)) {
      id <- result[[i]]$myid
      result[[i]] <- myinfo[myinfo$version == result[[i]]$version, ]
      result[[i]]$myid <- id   # columns are now: version, a, b, c, d, myid
      result[[i]] <- result[[i]][c(5, 1:4)]
}
result <- do.call(rbind, result)
head(result) # This is the desired result

-- 
Dimitri Liakhovitski
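A loop-free sketch of the same lookup, assuming the intended output is one row per pair of a 'myid' and a matching 'myinfo' row, with 'myid' as the first column. Instead of subsetting 'myinfo' once per id, it computes all row indices at once with base R's split() and lengths() (R >= 3.2.0) and subsets a single time. The names 'rows_by_version', 'idx' and 'result2' are illustrative, not from the post.

```r
# Sketch: vectorised alternative to the per-id loop (no merge() needed).
# Data set-up copied from the post, with the missing '=' after 'b' restored.
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)), a = rnorm(60),
                     b = rnorm(60), c = rnorm(60), d = rnorm(60))

# Row numbers of 'myinfo', grouped by version; list names are "1", "2", ..., "20"
rows_by_version <- split(seq_len(nrow(myinfo)), myinfo$version)

# For each id (in mydata order), the myinfo rows that share its version
idx <- rows_by_version[as.character(mydata$version)]

# Subset myinfo once, then repeat each id as many times as it has matching rows
result2 <- myinfo[unlist(idx, use.names = FALSE), ]
result2 <- cbind(myid = rep(mydata$myid, lengths(idx)), result2)
rownames(result2) <- NULL
head(result2)
```

Because every version has 30 rows in 'myinfo', the result has 100 x 30 = 3000 rows, in the same id order as the loop.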

jim holtman

2015-Dec-22 19:26 UTC


You seem to be saving 'myid' and then overwriting it with the last
statement:

 result[[i]] <- result[[i]][c(5, 1:4)]
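To see what that statement does: after 'result[[i]]$myid <- id' each piece has six columns, so c(5, 1:4) puts 'd' first and drops 'myid' (column 6). A small sketch on one version's rows from the posted 'myinfo' (the id 1001 is only an example value):

```r
# Rebuild 'myinfo' as in the post (missing '=' after 'b' restored)
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)), a = rnorm(60),
                     b = rnorm(60), c = rnorm(60), d = rnorm(60))

piece <- myinfo[myinfo$version == 3, ]  # all rows for one version
piece$myid <- 1001                       # example id, appended as column 6
print(names(piece))                      # "version" "a" "b" "c" "d" "myid"

print(names(piece[c(5, 1:4)]))  # "d" "version" "a" "b" "c": 'myid' is gone
print(names(piece[c(6, 1:5)]))  # "myid" "version" "a" "b" "c" "d"
```

Selecting by name, e.g. piece[c("myid", names(myinfo))], avoids depending on column positions at all.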

Why doesn't 'merge' work for you? I tried it on your data and seem to get
back the same number of rows; they may not be in the same order, but the
content looks the same, and it does have 'myid' on it.
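A sketch of the merge() approach described above, using the posted data. merge() joins on 'version' and returns one row per pair of matching rows; since merge() sorts by the key by default, the sketch reorders by 'myid' and puts it first. Row order within an id may still differ from the loop's.

```r
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)), a = rnorm(60),
                     b = rnorm(60), c = rnorm(60), d = rnorm(60))

# Inner join on 'version': one output row per (mydata row, matching myinfo row)
merged <- merge(mydata, myinfo, by = "version")

# Put 'myid' first and sort by it, roughly matching the loop's layout
merged <- merged[order(merged$myid), c("myid", "version", "a", "b", "c", "d")]
rownames(merged) <- NULL
nrow(merged)   # 100 ids x 30 rows per version = 3000
```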


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Dimitri Liakhovitski

2015-Dec-22 20:34 UTC


I know I am overwriting.
merge doesn't solve it because each version in mydata is assigned to more
than one id. Hence, I thought I couldn't merge by version.
I am not sure how to answer the question about "the problem".
I described the current state and the desired state. If possible, I'd
like to get from the current state to the desired state faster than
with a loop.
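A version shared by several ids does not prevent merge() from working: the ?merge help page notes that if there is more than one match, all possible matches contribute one row each. A toy sketch with two hypothetical data frames ('ids' and 'info' are made-up names and values, not from the thread):

```r
# Ids 1 and 2 share version 1; version 1 has two rows of information
ids  <- data.frame(myid = c(1, 2, 3), version = c(1, 1, 2))
info <- data.frame(version = c(1, 1, 2), a = c(10, 11, 20))

res <- merge(ids, info, by = "version")
print(res)  # 5 rows: ids 1 and 2 each get both version-1 rows, id 3 gets one
```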



-- 
Dimitri Liakhovitski
