For 10 million data points,
table(interaction(vec_D, vec_C, vec_B, vec_A))
took 11.45 seconds on my laptop, while the following function took 0.18 seconds:
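f0 <- function(vec_A, vec_B, vec_C, vec_D)
{
    # Encode each DCBA pattern as a bin number 1..16 (A is the low bit)
    x <- 1 + vec_A + 2 * (vec_B + 2 * (vec_C + 2 * vec_D))
    # Count how many observations fall into each of the 16 bins
    tab <- tabulate(x, nbins = 16)
    # Name the bins "0000".."1111" in DCBA order (expand.grid varies its first column fastest)
    names(tab) <- do.call(paste0, rev(expand.grid(0:1, 0:1, 0:1, 0:1)))
    tab
}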
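To see why the names line up with the tabulate() bins, here is a small check (not part of the timing comparison): with A as the lowest bit, the arithmetic 1 + A + 2*(B + 2*(C + 2*D)) numbers the 16 patterns in exactly the row order that expand.grid() produces, because expand.grid() varies its first column fastest. The names g, x and labels are only for illustration.

```r
# All 16 patterns; expand.grid() varies its first column (A) fastest
g <- expand.grid(A = 0:1, B = 0:1, C = 0:1, D = 0:1)
# Same encoding as in f0: A is the low bit, D the high bit, offset by 1
x <- 1 + g$A + 2 * (g$B + 2 * (g$C + 2 * g$D))
labels <- paste0(g$D, g$C, g$B, g$A)
# Row i of the grid maps to bin i, so bin i can be named labels[i]
stopifnot(identical(x, as.numeric(1:16)))
labels[c(1, 2, 11, 16)]  # "0000" "0001" "1010" "1111"
```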
Aside from the order of the entries in the output tables (and the "0.0.0.0"-style
names that interaction() produces), table(interaction(...)) and f0 gave the same counts.
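If you want to confirm the agreement on your own data, a sketch like the following renames and reorders the table() result before comparing counts. It uses simulated data (100,000 made-up observations, not the 10 million from the timing above) and repeats f0 so it runs on its own; fast, slow, slow_counts and same are illustrative names.

```r
# Copy of f0 from above, so this block runs on its own
f0 <- function(vec_A, vec_B, vec_C, vec_D) {
  x <- 1 + vec_A + 2 * (vec_B + 2 * (vec_C + 2 * vec_D))
  tab <- tabulate(x, nbins = 16)
  names(tab) <- do.call(paste0, rev(expand.grid(0:1, 0:1, 0:1, 0:1)))
  tab
}

# Simulated 0/1 data (far smaller than 10 million, so it runs quickly)
set.seed(1)
n <- 1e5
vec_A <- sample(0:1, n, replace = TRUE)
vec_B <- sample(0:1, n, replace = TRUE)
vec_C <- sample(0:1, n, replace = TRUE)
vec_D <- sample(0:1, n, replace = TRUE)

fast <- f0(vec_A, vec_B, vec_C, vec_D)
slow <- table(interaction(vec_D, vec_C, vec_B, vec_A))

# interaction() labels levels like "1.0.1.0" (D.C.B.A); drop the dots,
# then reorder to f0's order before comparing the counts
slow_counts <- as.vector(slow)
names(slow_counts) <- gsub(".", "", names(slow), fixed = TRUE)
same <- identical(unname(fast), unname(slow_counts[names(fast)]))
same
```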
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Sridhar Iyer
> Sent: Saturday, June 01, 2013 2:57 PM
> To: r-help at r-project.org
> Subject: [R] Frequency count of Boolean pattern in 4 vectors.
>
> I need to do this on very large datasets (more than a few million data points),
> so I am seeking help in figuring out an efficient implementation.
>
> Input: 4 vectors containing values 0 or 1 (as integers, not Boolean bits).
> vec_A = c(0, 1, 0, 0, ..., 1, 0, 1, 0), etc.
> vec_B = c(0, 0, 1, 1, ...)
> vec_C, vec_D (similar to the above)
> All four vectors are the same length.
>
> I need to compute the frequency count of each Boolean pattern DCBA:
> DCBA
> 0000
> 0001
> 0010
> 0011
> ..
> ..
> 1111
>
> Questions:
> a) Is there a mechanism for combining the 4 vectors (in integer format) into
> 4 bits of a new vector or some other type? (Or to treat them as Boolean
> values TRUE/FALSE instead of 0/1 integers?)
> b) What is the most efficient mechanism for obtaining the frequency count of
> each of the sixteen Boolean combinations?
>
> I need to do this frequently on large datasets, so I am trying to get an
> efficient implementation (instead of a quick and dirty scheme). Thank you
> very much in advance.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
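On question (a): the arithmetic in f0 already packs the four 0/1 vectors into one number per observation. If you would rather use explicit bit operations, or your vectors are logical (TRUE/FALSE), here is a sketch using base R's as.integer(), bitwShiftL(), bitwOr(), intToBits() and tabulate(). pack_dcba() and count_dcba() are hypothetical helper names, not functions from any package, and this version is not claimed to be faster than f0.

```r
# Hypothetical helper: pack four 0/1 (or logical) vectors into one
# integer code 0..15, with D as the high bit and A as the low bit
pack_dcba <- function(vec_A, vec_B, vec_C, vec_D) {
  # as.integer() turns TRUE/FALSE into 1/0 and leaves 0/1 integers unchanged
  a <- as.integer(vec_A); b <- as.integer(vec_B)
  c_ <- as.integer(vec_C); d <- as.integer(vec_D)
  bitwOr(bitwOr(bitwShiftL(d, 3L), bitwShiftL(c_, 2L)),
         bitwOr(bitwShiftL(b, 1L), a))
}

# Hypothetical helper: count the 16 codes and label them "0000".."1111"
count_dcba <- function(code) {
  # tabulate() counts values 1..nbins, so shift the 0..15 codes up by one
  tab <- tabulate(code + 1L, nbins = 16L)
  # intToBits() lists bits low-first; keep 4 and reverse to get DCBA order
  names(tab) <- vapply(0:15, function(k)
    paste(rev(as.integer(intToBits(k))[1:4]), collapse = ""), character(1))
  tab
}

# Three observations with DCBA = 0001, 0001, 1010, given as logicals
code <- pack_dcba(c(TRUE, TRUE, FALSE), c(FALSE, FALSE, TRUE),
                  c(FALSE, FALSE, FALSE), c(FALSE, FALSE, TRUE))
code  # 1 1 10
counts <- count_dcba(code)
counts[counts > 0]
```

The packed integer codes can also be stored and reused if the same counts are needed repeatedly.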