2009 Nov 19
Performance of 'by' and 'ddply' on a large data frame
...sing R. One of the problems I come up
against is that, after extracting a large dataset (>5M rows) from a
database, I realize I need another variable. In this case I have a data
frame with dates. I want to find the minimum date for each value of x1
and add that minimum date to my data.frame.
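The goal described above (a per-group minimum broadcast back onto every row) can be computed without calling a function once per data frame slice. The following is a minimal sketch, not the poster's code: it assumes a grouping column `x1` and a Date column `x2` as in the post, and `add_min_date` and the output column `minx2` are hypothetical names. It uses base R `tapply` to get one minimum per group, then looks each row's group up by name.

```r
# Hypothetical helper (not from the original post): attach the per-group
# minimum of a Date column as a new column.
add_min_date <- function(df, group = "x1", date = "x2", out = "minx2") {
  # One minimum per group, computed on the numeric day counts behind the Dates
  mins <- tapply(as.numeric(df[[date]]), df[[group]], min)
  # Look up each row's group minimum by name and convert back to Date
  df[[out]] <- as.Date(unname(mins[as.character(df[[group]])]),
                       origin = "1970-01-01")
  df
}

# Small illustrative data frame (made up for this sketch)
df <- data.frame(x1 = c(1, 2, 1, 2),
                 x2 = as.Date(c("2009-03-01", "2009-01-15",
                                "2008-12-31", "2009-02-01")))
res <- add_min_date(df)
print(res)
```

Because `tapply` visits each group once and the lookup is a single vectorized index, this avoids building a separate data frame per group, which is what makes `by` slow on millions of rows.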
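randomdf <- function(p) {
  # 10^p rows: integer group id x1, a random date x2 from roughly the
  # last three years, and an integer value y1
  data.frame(x1 = sample(1:10^4, 10^p, replace = TRUE),
             x2 = sample(seq.Date(Sys.Date() - 365*3, Sys.Date(), by = "day"), 10^p, replace = TRUE),
             y1 = sample(1:100, 10^p, replace = TRUE))
}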
testby <- function(p) {
df <- randomdf(p)
system.time(by(df, df$x1, function(dfi) { min(dfi$x2)...