thr3ads.net - R help - [R] Reducing execution time [Jul 2016]

sri vathsan

2016-Jul-27 14:47 UTC

[R] Reducing execution time

Hi,

Apologies for the lack of information.

Basically, myCombos is a matrix with 3 columns; each row is a triplet
drawn from a set of 79 codes. There are around 3 lakh (300,000) such
triplets, and the matrix looks like this:

V1 V2 V3
65 23 77
77 34 65
55 34 23
23 77 34
34 65 55

Each triplet is compared against a list (myList) of 8177 elements, which
looks like this:

77,65,34,23,55
65,23,77,65,55,34
77,34,65
55,78,56
98,23,77,65,34

Now I want to count the number of list elements in which each triplet
occurs. For example, the triplet 65 23 77 is found in 3 elements of the
list. So my output looks like this:

V1 V2 V3 Freq
65 23 77  3
77 34 65  4
55 34 23  2

I hope I have made it clear this time.


On Wed, Jul 27, 2016 at 7:00 PM, Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> Not entirely sure I understand, but match() is already vectorized, so you
> should be able to lose the sapply(). This would speed things up a lot.
> Please re-read ?match *carefully* .
>
> Bert
>
> On Jul 27, 2016 6:15 AM, "sri vathsan" <srivibish at gmail.com> wrote:
>
> Hi,
>
> I created a list of 3-number combinations (myCombos, around 3 lakh
> combinations) and am counting the occurrences of those combinations in
> another list. This comparison list (myList) has around 8000 records. I am
> using the following code.
>
> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
>   sum(sapply(myList, function(j) {
>     sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
>
> The above code takes a very long time to execute. Is there any other
> effective method which will reduce the time?
> --
>
> Regards,
> Srivathsan.K
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>

-- 

Regards,
Srivathsan.K
Phone : 9600165206


Sarah Goslee

2016-Jul-27 15:32 UTC

[R] Reducing execution time

Hi,

It's really a good idea to use dput() or some other reproducible way
to provide data. I had to guess as to what your data looked like.
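
As a minimal sketch of what that could look like: dput() prints R code that recreates an object, so a small slice of the real data can be pasted into a message. The objects below are hypothetical stand-ins built from the values quoted in this thread, not the poster's actual data.

```r
# Hypothetical stand-ins for the poster's objects (names taken from the thread)
myCombos <- matrix(c(65, 23, 77,
                     77, 34, 65), ncol = 3, byrow = TRUE)
myList <- list(c(77, 65, 34, 23, 55), c(65, 23, 77, 65, 55, 34))

# dput() prints R code that recreates the object; paste that output into
# the message so that others can rebuild the same data
dput(head(myCombos, 2))
dput(head(myList, 2))

# Round trip: deparse() produces the same text, and parsing it back
# gives an identical object
restored <- eval(parse(text = deparse(myList)))
```

Sending only head() of each object keeps the message short while still showing the structure (matrix vs. data frame, integer vs. double).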

It appears that order doesn't matter?

Given that, here's one approach:

combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L),
                        V2 = c(23L, 34L, 34L, 77L, 65L),
                        V3 = c(77L, 65L, 23L, 34L, 55L)),
                   .Names = c("V1", "V2", "V3"),
                   class = "data.frame", row.names = c(NA, -5L))

dat <- list(
c(77,65,34,23,55),
c(65,23,77,65,55,34),
c(77,34,65),
c(55,78,56),
c(98,23,77,65,34))


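# For each triplet (row of combs), count the list elements that contain
# all three codes, in any order
sapply(seq_len(nrow(combs)), function(i)
  sum(sapply(dat, function(j) all(combs[i, ] %in% j))))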

On a larger test dataset (100 triplets against a list of 10,000 elements),
it takes me under a minute and a half.

> combs <- combs[rep(1:nrow(combs), length=100), ]
> dat <- dat[rep(1:length(dat), length=10000)]
>
> dim(combs)
[1] 100   3
> length(dat)
[1] 10000
>
> system.time(test <- sapply(seq_len(nrow(combs)),
+   function(i)sum(sapply(dat, function(j)all(combs[i,] %in% j)))))
   user  system elapsed
 86.380   0.006  86.391
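
One way the nested sapply() above might be sped up, sketched here under the assumption that order does not matter: build a logical incidence matrix once (one row per list element, one column per code), so that each triplet needs only three column lookups instead of a pass over the whole list. The function name count_triplets() and the variable names are illustrative and do not come from the thread.

```r
# Example data from the thread
combs <- data.frame(V1 = c(65L, 77L, 55L, 23L, 34L),
                    V2 = c(23L, 34L, 34L, 77L, 65L),
                    V3 = c(77L, 65L, 23L, 34L, 55L))
dat <- list(c(77, 65, 34, 23, 55),
            c(65, 23, 77, 65, 55, 34),
            c(77, 34, 65),
            c(55, 78, 56),
            c(98, 23, 77, 65, 34))

# Hypothetical helper: count, for each triplet, the list elements that
# contain all three codes (order ignored)
count_triplets <- function(combs, dat) {
  combs <- as.matrix(combs)
  codes <- sort(unique(c(combs, unlist(dat))))
  # inc[e, k] is TRUE when list element e contains codes[k]
  inc <- matrix(FALSE, nrow = length(dat), ncol = length(codes))
  for (e in seq_along(dat)) inc[e, ] <- codes %in% dat[[e]]
  # Translate each code in combs to its column index in inc
  idx <- matrix(match(combs, codes), ncol = 3)
  vapply(seq_len(nrow(idx)), function(i) {
    sum(inc[, idx[i, 1]] & inc[, idx[i, 2]] & inc[, idx[i, 3]])
  }, integer(1))
}

fast <- count_triplets(combs, dat)

# Cross-check against the nested sapply() approach
slow <- sapply(seq_len(nrow(combs)), function(i)
  sum(sapply(dat, function(j) all(combs[i, ] %in% j))))
```

The per-triplet work is then two vectorized `&` operations and a sum over a logical vector as long as the list, which should be much cheaper than calling %in% once per list element. No timing on the full data is claimed here.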





sri vathsan

2016-Jul-27 17:29 UTC

[R] Reducing execution time

Hi,

Thanks for the solution. But I am afraid this code still takes a long
time to run. It has been an hour and it is still executing. I understand
the delay, because each triplet has to be compared against almost 9000
elements.

Regards,
Sri
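
A rough back-of-the-envelope estimate suggests why an hour is not enough. This assumes the run time of the nested sapply() grows linearly with the number of triplet/element pairs and that the hardware is similar to that used for the 86.391 s benchmark quoted in this thread. Both are assumptions, not measurements.

```r
n_combos   <- 3e5    # "around 3 lakh" triplets
n_elements <- 8177   # length of the list
pairs_full <- n_combos * n_elements

# Benchmark from the thread: 100 triplets x 10000 elements, 86.391 s elapsed
secs_per_pair <- 86.391 / (100 * 10000)

# Extrapolated run time for the full problem, in hours
est_hours <- pairs_full * secs_per_pair / 3600
est_hours
```

Under those assumptions the full job works out to roughly 59 hours, so the structure of the computation needs to change, not just the inner function.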


jeremiah rounds

2016-Jul-27 19:17 UTC

[R] Reducing execution time

If I understood the request, this is the same programming task as counting
words in a document, counting character sequences in a string, or
matching bytes in byte arrays (though you don't want to go down that far).
You can do something like what follows. There are also vectorized
pattern-matching functions in stringr.

combs = structure(list(V1 = c(65L, 77L, 55L, 23L, 34L),
                       V2 = c(23L, 34L, 34L, 77L, 65L),
                       V3 = c(77L, 65L, 23L, 34L, 55L)),
                  .Names = c("V1", "V2", "V3"),
                  class = "data.frame", row.names = c(NA, -5L))

dat = list(
c(77,65,34,23,55, 65,23,77, 44),
c(65,23,77,65,55,34, 77, 34,65, 10),
c(77,34,65),
c(55,78,56),
c(98,23,77,65,34, 65, 23, 77, 34))


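# One "word" per triplet, e.g. "65 23 77"
words = unlist(apply(combs, 1, function(d) paste(as.character(d),
  collapse = " ")))
# One string per list element, joined into a single document
dat = lapply(dat, function(d) paste(as.character(d), collapse = " "))
doc = paste(dat, collapse = " ## ") # an arbitrary separator that isn't in your words
# grep() on a single string can only report 0 or 1, so count every
# (non-overlapping) occurrence with gregexpr(); fixed = TRUE matches literally.
# Note: this counts the three codes appearing adjacently, in the given order.
counts = sapply(words, function(w) sum(gregexpr(w, doc, fixed = TRUE)[[1]] > 0))
names(counts) = words
counts
cbind(combs, data.frame(N = counts))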
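
A variant of the string-matching idea above, sketched under two assumptions that are not stated in the thread: that a triplet should be counted at most once per list element, and that the codes must appear adjacently in the given order. Each string is padded with spaces so that a code such as 65 cannot match inside a longer number such as 165. The helper names are hypothetical.

```r
combs <- data.frame(V1 = c(65L, 77L, 55L, 23L, 34L),
                    V2 = c(23L, 34L, 34L, 77L, 65L),
                    V3 = c(77L, 65L, 23L, 34L, 55L))
dat <- list(c(77, 65, 34, 23, 55),
            c(65, 23, 77, 65, 55, 34),
            c(77, 34, 65),
            c(55, 78, 56),
            c(98, 23, 77, 65, 34))

# Hypothetical helper: number of list elements containing each triplet as
# a contiguous, ordered run of codes
count_sequences <- function(combs, dat) {
  # " 65 23 77 ": the surrounding spaces force whole-code matches
  pad <- function(v) paste0(" ", paste(v, collapse = " "), " ")
  words <- unname(apply(as.matrix(combs), 1, pad))
  docs <- vapply(dat, pad, character(1))
  # grepl() is vectorized over docs; fixed = TRUE disables regex parsing
  vapply(words, function(w) sum(grepl(w, docs, fixed = TRUE)),
         integer(1), USE.NAMES = FALSE)
}

seq_counts <- count_sequences(combs, dat)
```

On this example data the ordered counts differ from the order-free counts given in the original question (3, 4, 2 for the first three triplets), so string matching of this kind only answers the question if order and adjacency are meant to matter.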



