Hi,

Thanks for the solution, but I am afraid it still takes too long. It has
been running for an hour and is still executing. I understand the delay,
because each triplet has to be compared against almost 9000 elements.

Regards,
Sri

On Wed, Jul 27, 2016 at 9:02 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi,
>
> It's really a good idea to use dput() or some other reproducible way
> to provide data. I had to guess as to what your data looked like.
>
> It appears that order doesn't matter?
>
> Given that, here's one approach:
>
> combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L, 34L,
> 34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)), .Names = c("V1",
> "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
>
> dat <- list(
> c(77,65,34,23,55),
> c(65,23,77,65,55,34),
> c(77,34,65),
> c(55,78,56),
> c(98,23,77,65,34))
>
> sapply(seq_len(nrow(combs)), function(i)sum(sapply(dat,
> function(j)all(combs[i,] %in% j))))
>
> On a dataset of comparable size to yours, it takes me under a minute and a
> half.
>
> > combs <- combs[rep(1:nrow(combs), length=100), ]
> > dat <- dat[rep(1:length(dat), length=10000)]
> >
> > dim(combs)
> [1] 100   3
> > length(dat)
> [1] 10000
> >
> > system.time(test <- sapply(seq_len(nrow(combs)),
> > function(i)sum(sapply(dat, function(j)all(combs[i,] %in% j)))))
>    user  system elapsed
>  86.380   0.006  86.391
>
> On Wed, Jul 27, 2016 at 10:47 AM, sri vathsan <srivibish at gmail.com> wrote:
> > Hi,
> >
> > Apologies for the lack of information.
> >
> > Basically, myCombos is a matrix with 3 columns; each row is a triplet
> > drawn from a set of 79 codes. There are around 3 lakh (300,000) such
> > combinations, and it looks like this:
> >
> > V1 V2 V3
> > 65 23 77
> > 77 34 65
> > 55 34 23
> > 23 77 34
> > 34 65 55
> >
> > Each triplet is compared against a list (myList) of 8177 elements,
> > which looks like this:
> >
> > 77,65,34,23,55
> > 65,23,77,65,55,34
> > 77,34,65
> > 55,78,56
> > 98,23,77,65,34
> >
> > Now I want to count the number of occurrences of each triplet in the
> > above list. For example, the triplet 65 23 77 is seen 3 times in the
> > list, so my output looks like this:
> >
> > V1 V2 V3 Freq
> > 65 23 77 3
> > 77 34 65 4
> > 55 34 23 2
> >
> > I hope I made it clear this time.
> >
> > On Wed, Jul 27, 2016 at 7:00 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >
> >> Not entirely sure I understand, but match() is already vectorized, so you
> >> should be able to lose the sapply(). This would speed things up a lot.
> >> Please re-read ?match *carefully*.
> >>
> >> Bert
> >>
> >> On Jul 27, 2016 6:15 AM, "sri vathsan" <srivibish at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I created a list of 3-number combinations (myCombos, around 3 lakh
> >> combinations) and am counting the occurrences of those combinations in
> >> another list. This comparison list (myList) has around 8000 records.
> >> I am using the following code.
> >>
> >> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
> >>   sum(sapply(myList, function(j) {
> >>     sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
> >>
> >> The above code takes a very long time to execute. Is there a more
> >> efficient method that will reduce the time?
> >> --
> >>
> >> Regards,
> >> Srivathsan.K

--
Regards,
Srivathsan.K
Phone : 9600165206
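Bert's hint can be pushed one step further. A minimal sketch, assuming (as Sarah guessed) that order within a triplet doesn't matter and that a code repeated within one record counts once: build a logical record-by-code incidence matrix once, then count each triplet with a single vectorized AND over all records. The incidence matrix is a swapped-in technique, not code from the thread; `dat` is the toy list from the clarification above, `combs` is reduced to its first three rows, and `codes`, `M`, `idx` and `count_triplet()` are illustrative names.

```r
# Toy records from the clarification: each element is one record's codes
dat <- list(
  c(77, 65, 34, 23, 55),
  c(65, 23, 77, 65, 55, 34),
  c(77, 34, 65),
  c(55, 78, 56),
  c(98, 23, 77, 65, 34)
)
# First three triplets from the clarification, one per row
combs <- matrix(c(65, 23, 77,
                  77, 34, 65,
                  55, 34, 23), ncol = 3, byrow = TRUE)

# Incidence matrix: one row per record, one column per distinct code;
# M[r, k] is TRUE when code k occurs in record r (duplicates just set TRUE again)
codes <- sort(unique(c(unlist(dat), combs)))
M <- matrix(FALSE, nrow = length(dat), ncol = length(codes))
rec <- rep(seq_along(dat), lengths(dat))
M[cbind(rec, match(unlist(dat), codes))] <- TRUE

# Column index of every triplet member, from one vectorized match() call;
# match() returns column-major order, so refill with ncol = 3
idx <- matrix(match(combs, codes), ncol = 3)

# A record contains the triplet when all three of its columns are TRUE
count_triplet <- function(k) sum(M[, k[1]] & M[, k[2]] & M[, k[3]])
freq <- apply(idx, 1, count_triplet)
print(cbind(combs, Freq = freq))
```

apply() still loops over the triplets, but each iteration is one vectorized operation on logical columns instead of an inner sapply() over every record. How much that helps on the full 8177-record list would need to be measured, for example with system.time().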
You said you had 79 triplets and 8000 records.

When I compared 100 triplets to 10000 records, it took 86 seconds.

So obviously there is something you're not telling us about the format
of your data.

If you use dput() to provide actual examples, you will get better
results than if we on R-help have to guess, because we tend to guess in
ways that make the most sense after extensive R experience, and that's
probably not what you have.

Sarah

On Wed, Jul 27, 2016 at 1:29 PM, sri vathsan <srivibish at gmail.com> wrote:
> Hi,
>
> Thanks for the solution, but I am afraid it still takes too long. It has
> been running for an hour and is still executing.
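For scale, a rough extrapolation from the 86-second timing, assuming run time grows linearly with (number of triplets) × (number of records) and that the original poster's machine is about as fast as Sarah's. The two cases are 79 codes taken three at a time, and the "around 3 lakh" combinations quoted in the thread:

```latex
\begin{aligned}
t &\approx \frac{86.4\ \text{s}}{100 \times 10\,000} = 8.64 \times 10^{-5}\ \text{s per triplet-record comparison} \\
\binom{79}{3} &= \frac{79 \cdot 78 \cdot 77}{6} = 79\,079 \\
79\,079 \times 8177 \times t &\approx 5.6 \times 10^{4}\ \text{s} \approx 15.5\ \text{h} \\
3 \times 10^{5} \times 8177 \times t &\approx 2.1 \times 10^{5}\ \text{s} \approx 59\ \text{h}
\end{aligned}
```

Either way the loop-over-everything approach runs for many hours, so an hour without finishing is expected. The gap between 79,079 and 3 lakh may also be worth checking: one candidate cause is that splitting strings such as "11, 22" on "," alone leaves a leading space, so "22" and " 22" become distinct codes.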
Hi,

It is not just 79 triplets. As I said, there are 79 codes; I am making
triplets out of those 79 codes and matching the triplets in the list.
Please find the dput of the data below.

> dput(head(newd,10))
structure(list(uniq_id = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), hi = c("11, 22, 84, 85, 108, 111", "18, 84, 85, 87, 122, 134",
"2, 18, 22", "18, 108, 122, 134, 176", "19, 85, 87, 100, 107",
"79, 85, 111", "11, 88, 108", "19, 88, 96", "19, 85, 96",
"19, 100, 103")), .Names = c("uniq_id", "hi"), row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))

I am trying to count the frequency of the triplets in the above data
using the code below.

# split the column into a list of code vectors; the codes are separated
# by a comma and a space, so split on ",\\s*" (splitting on "," alone
# keeps the leading space, and " 22" would then differ from "22")
myList <- strsplit(newd$hi, split = ",\\s*")
# get all 3-way combinations (triplets) of the unique codes
myCombos <- t(combn(unique(unlist(myList)), 3))
# count the records in which all three codes of the triplet are present
myCounts <- sapply(1:nrow(myCombos), FUN = function(i) {
  sum(sapply(myList, function(j) {
    sum(!is.na(match(c(myCombos[i, ]), j)))
  }) == 3)
})
# final matrix: the three codes of each triplet plus its count
final <- cbind(matrix(as.integer(myCombos), nrow(myCombos)), myCounts)

I hope I made my point clear. Please let me know if I missed anything.

Regards,
Sri

On Wed, Jul 27, 2016 at 11:19 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> You said you had 79 triplets and 8000 records.
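A different approach, not proposed in the thread: instead of testing every candidate triplet against every record, enumerate only the triplets that actually occur in each record with combn() and tally them with table(). The sketch assumes the `hi` format shown in the dput() (codes separated by a comma and optional spaces) and that order within a triplet doesn't matter. `count_triplets()` is a hypothetical helper, and `newd` is rebuilt here as a plain data.frame from the ten dput() rows rather than a tbl_df.

```r
# Plain data.frame stand-in for the ten dput() rows above
newd <- data.frame(
  uniq_id = as.character(1:10),
  hi = c("11, 22, 84, 85, 108, 111", "18, 84, 85, 87, 122, 134",
         "2, 18, 22", "18, 108, 122, 134, 176", "19, 85, 87, 100, 107",
         "79, 85, 111", "11, 88, 108", "19, 88, 96", "19, 85, 96",
         "19, 100, 103"),
  stringsAsFactors = FALSE
)

count_triplets <- function(hi) {
  # Split "11, 22, 84" into sorted, de-duplicated integer codes;
  # ",\\s*" also removes the space after each comma
  recs <- lapply(strsplit(hi, split = ",\\s*"),
                 function(x) sort(unique(as.integer(x))))
  # Records with fewer than three distinct codes contain no triplet
  # (this also avoids combn()'s special case for a single number)
  recs <- recs[lengths(recs) >= 3]
  # Every triplet present in each record, encoded as an "a-b-c" key;
  # sorting above makes the key independent of the original order
  keys <- unlist(lapply(recs, function(x) {
    tr <- combn(x, 3)
    paste(tr[1, ], tr[2, ], tr[3, ], sep = "-")
  }))
  counts <- table(keys)
  # Decode the keys back into three integer columns plus the count
  parts <- do.call(rbind, strsplit(names(counts), "-", fixed = TRUE))
  data.frame(V1 = as.integer(parts[, 1]), V2 = as.integer(parts[, 2]),
             V3 = as.integer(parts[, 3]), Freq = as.vector(counts))
}

res <- count_triplets(newd$hi)
print(head(res))
```

Only triplets that occur at least once appear in the result; if the full myCombos table with zero counts is needed, it could be merged back with merge(..., all.x = TRUE). A record with k codes yields choose(k, 3) keys (20 for k = 6), so the work grows with the size of the records rather than with (number of triplets) × (number of records).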