2009 Nov 19
Performance of 'by' and 'ddply' on a large data frame
...sing R. One of the problems I come up against is that, after having extracted a large dataset (>5M rows) out of a database, I realize I need another variable. In this case I have a data frame with dates. I want to find the minimum date for each value of x1 and add that minimum date to my data.frame.

> randomdf <- function(p) {
+   data.frame(x1=sample(1:10^4, 10^p, replace=T),
+              x2=sample(seq.Date(Sys.Date() - 356*3, Sys.Date(), by="day"), 10^p, replace=T),
+              y1=sample(1:100, 10^p, replace=T))
+ }
> testby <- function(p) {
+   df <- randomdf(p)
+   system.time(by(df, df$x1, function(dfi) { min(dfi$x2)...
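The task described above (find the minimum date per value of x1 and attach it to every row) can also be sketched in base R without by() or ddply. This is only an illustrative sketch, not the approach taken in the thread: the small data frame, the column name min_x2 and the seed are assumptions made for the example. by() builds a sub-data-frame for every group, whereas ave() works on a plain vector and returns a result of the same length as its input, so the new column can be assigned directly.

```r
# Hypothetical small data frame mirroring the columns in the post (x1, x2, y1).
# The size and seed are arbitrary choices for illustration.
set.seed(1)
n <- 1000
df <- data.frame(
  x1 = sample(1:10, n, replace = TRUE),
  x2 = sample(seq(Sys.Date() - 365, Sys.Date(), by = "day"), n, replace = TRUE),
  y1 = sample(1:100, n, replace = TRUE)
)

# ave() applies FUN within each group of x1 and returns a vector aligned
# with the original rows. The dates are converted to numeric first so the
# class is handled explicitly, then converted back to Date.
min_num <- ave(as.numeric(df$x2), df$x1, FUN = min)
df$min_x2 <- as.Date(min_num, origin = "1970-01-01")  # min_x2 is a made-up name

head(df)
```

Whether this is faster than by() or ddply on 5M rows would have to be measured, for example by wrapping the ave() call in system.time() as the post does for by().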