Hi,
On Wed, Jan 11, 2012 at 9:57 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Dear R-ers,
>
> I have a loop below that loops through my numeric variables in data
> frame x and through levels of the factor "group" and multiplies
> (group by group) the values of numeric variables in x by the
> corresponding group-specific values from data frame y. In reality, my:
> dim(x) is 300,000 rows by 100 variables, and
> dim(y) is 120 levels of "group" by 100 variables.
> So, my huge data frame x takes up a lot of space in memory. This is
> why I am actually replacing values of "a" and "b" in x with newly
> calculated values, rather than adding them.
> The code does what I need, but it takes forever.
>
> Is there maybe a more speedy way to achieve what I need?
> Thanks a lot!

Here's a way to do it with data.table that keeps all the intermediate
steps explicit. If you use more data.table-centric idioms (the `:=`
operator and other ways to `merge`) you can likely eke out lower memory
use and higher speed, but I'll leave it like this for pedagogical
purposes ;-)

===
library(data.table)

## your data
xx <- data.table(group=c(rep("group1",5), rep("group2",5)),
                 a=1:10, b=seq(10,100,by=10), key="group")
yy <- data.table(group=c("group1","group2"), a=c(10,20), b=c(2,3),
                 key="group")

## temp data.table to get your ducks in a row
m <- merge(xx, yy, by="group", suffixes=c(".x", ".y"))

## your answers will be in the aa and bb columns
result <- transform(m, aa=a.x * a.y, bb=b.x * b.y)
===
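
For comparison, here is a minimal sketch of the more data.table-centric
route hinted at above: an update join that overwrites "a" and "b" in
place with `:=`, so no merged intermediate table is built. It assumes a
data.table release recent enough to support the `on=` argument and the
`i.` prefix for columns of the joined table. Treat it as an illustration
under those assumptions, not as the one right way to do it.

```r
library(data.table)

## same toy data as above; "a" is stored as double so that the
## in-place product does not need an integer/double coercion
xx <- data.table(group = rep(c("group1", "group2"), each = 5),
                 a = as.numeric(1:10), b = seq(10, 100, by = 10))
yy <- data.table(group = c("group1", "group2"), a = c(10, 20), b = c(2, 3))

## update join: for every row of xx, look up its group in yy and
## overwrite a and b by reference; i.a and i.b are yy's columns
xx[yy, on = "group", `:=`(a = a * i.a, b = b * i.b)]
```

Because `:=` modifies xx by reference, the original values of "a" and
"b" are gone afterwards, which matches the "replace rather than add"
requirement in the question.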
Truth be told, if you use normal data.frames, the code will look very
similar to above, so you can try that, too.
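
If you stay with plain data.frames and want to avoid the merged copy, one
alternative (a different technique from the merge above, sketched under
the assumption that every group in x also appears in y) is to line up
the rows of y with `match` and multiply the numeric columns in one
vectorized step. The name `vars` is just an illustrative placeholder for
your 100 numeric columns.

```r
## same toy data as plain data.frames
x <- data.frame(group = rep(c("group1", "group2"), each = 5),
                a = 1:10, b = seq(10, 100, by = 10))
y <- data.frame(group = c("group1", "group2"), a = c(10, 20), b = c(2, 3))

## columns to rescale (hypothetical name; list your numeric variables here)
vars <- c("a", "b")

## for each row of x, the row of y holding that group's multipliers
idx <- match(x$group, y$group)

## overwrite the columns with the element-wise product
x[vars] <- x[vars] * y[idx, vars]
```

This replaces the loop over groups with a single lookup, although
`y[idx, vars]` still temporarily expands y to the number of rows in x.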
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact