Try aggregate(). It takes only about 8 seconds for the 800,000 rows in
the example below:
m <- as.data.frame(matrix(rnorm(16000000), ncol = 20))  # 800,000 rows x 20 columns
m$ID <- rbinom(nrow(m), 10, prob = 0.5)                 # grouping variable (values 0 to 10)
system.time(
  aggregate(m[, 5:20], list(ID = m$ID), sd)             # per-ID SD of columns 5 to 20
)
   user  system elapsed
   6.14    1.37    7.55
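If you have to repeat this hundreds of times, it may pay to split the
row indices by ID once and reuse that split. A rough, unbenchmarked
sketch of the same computation (using m and ID from the example above):

# Split row indices by ID once, then take column-wise SDs per group.
grp <- split(seq_len(nrow(m)), m$ID)
res <- t(sapply(grp, function(idx) apply(m[idx, 5:20], 2, sd)))
# res: one row per ID, one column per variable (row names are the IDs).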
HTH,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. +32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On behalf of: Daren Tan
Sent: Monday, July 7, 2008 11:18 AM
To: r-help at stat.math.ethz.ch
Subject: [R] How can I optimize this piece of code?
Currently it needs 50+ minutes to run on 800,000 rows, and I need to
run it hundreds of times :P
t(apply(unique_ids, 1, function(x) {
  sd(subset(m[, 5:20], m[, "ID"] == x))
}))
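Most of that time likely goes into re-scanning the full ID column and
subsetting the whole data frame once for every ID. A per-column sketch
of the same loop, assuming unique_ids holds one distinct ID per row
(untested):

t(apply(unique_ids, 1, function(x) {
  rows <- m[, "ID"] == x           # full scan of 800,000 rows per ID
  apply(m[rows, 5:20], 2, sd)      # column-wise SD for this group
}))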
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.