Hi,

I use the code below to aggregate / cnt my test data. It works fine, but the problem is with my real data (33'000 rows) where the function is really slow (nothing happened in half an hour).

Does anybody know of other functions that I could use?

Thanks,
Hans-Peter

--------------
dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699 ),
                   FischerID = c( 58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870 ),
                   Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )
f <- function(x) data.frame( Datum = x[1,1], FischerID = x[1,2],
                             Anzahl = sum( x[,3] ), Cnt = dim( x )[1] )
t.a <- do.call("rbind", by(dat, dat[,1:2], f))  # slow for 33'000 rows
t.a <- t.a[order( t.a[,1], t.a[,2] ),]

# show data
dat
t.a
Convert dat to a matrix and see if working with the matrix instead of a data frame speeds things up enough.

On 10/13/05, Hans-Peter <gchappi at gmail.com> wrote:
> Hi,
>
> I use the code below to aggregate / cnt my test data. It works fine,
> but the problem is with my real data (33'000 rows) where the function
> is really slow (nothing happened in half an hour).
>
> Does anybody know of other functions that I could use?
>
> [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
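As a rough illustration of the matrix suggestion, here is a minimal sketch using only base R. It assumes every column of dat is numeric; the names m, grp, sums and first are made up for this example, and using rowsum() for the per-group totals is one possible choice, not something proposed in the thread.

```r
# Test data from the original post
dat <- data.frame(
  Datum     = c(32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699),
  FischerID = c(58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870),
  Anzahl    = c(2, 2, 1, 1, 2, 1, 7, 1, 1, 2)
)

m <- as.matrix(dat)          # all columns are numeric, so this is a numeric matrix
stopifnot(is.numeric(m))

# One key string per row identifying its (Datum, FischerID) group
grp <- paste(m[, "Datum"], m[, "FischerID"], sep = "_")

# rowsum() adds up rows by group; the column of ones yields the group size
sums <- rowsum(cbind(Anzahl = m[, "Anzahl"], Cnt = 1), grp)

# Recover the key columns from the first row of each group
first <- !duplicated(grp)
t.a <- data.frame(m[first, c("Datum", "FischerID"), drop = FALSE],
                  sums[grp[first], , drop = FALSE])

# Same ordering as in the original code
t.a <- t.a[order(t.a$Datum, t.a$FischerID), ]
rownames(t.a) <- NULL
t.a
```

The point of the sketch is that it never builds a small data frame per group, which is the expensive part of the by() / rbind approach.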
Gabor Grothendieck wrote:
> Convert dat to a matrix and see if working with the
> matrix instead of a data frame speeds things up
> enough.

In the Hmisc package, the asNumericMatrix and matrix2dataFrame functions facilitate this. Also look at the summarize and mApply functions in Hmisc, which can be quite fast.

Frank Harrell
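For readers who want to try the summarize route, the following is a hedged sketch, not code from the thread. It assumes the Hmisc package is installed and that summarize() names the first statistic after the summarized variable and takes the remaining names from the value returned by the function. The helper name stats is invented for this example.

```r
# Test data from the original post
dat <- data.frame(
  Datum     = c(32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699),
  FischerID = c(58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870),
  Anzahl    = c(2, 2, 1, 1, 2, 1, 7, 1, 1, 2)
)

# Only runs if Hmisc is available (assumption: it may not be installed)
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Hypothetical helper: per-group sum and row count
  stats <- function(y) c(Anzahl = sum(y), Cnt = length(y))

  # Summarize Anzahl within each (Datum, FischerID) combination
  s <- with(dat, Hmisc::summarize(Anzahl, Hmisc::llist(Datum, FischerID), stats))
  print(s)
}
```

Whether this is fast enough for 33'000 rows would need to be timed on the real data, for example with system.time().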
Hi,

Yesterday I analysed data with 160000 rows and 10 columns. Aggregation would be impossible in data frame format, but after converting the data to a matrix with *numeric* entries (check that the variables are of class numeric!) the computation needs only 7 seconds on a Pentium III. I'm sad to say that this is still slow in comparison with proc summary in SAS (less than one second), but the code is much more elegant in R!

Best,
Matthias
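The class check mentioned above can be sketched as follows. This is a base R illustration on the test data from the original post; the names all_numeric, bad and m are made up for the example.

```r
# Test data from the original post
dat <- data.frame(
  Datum     = c(32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699),
  FischerID = c(58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870),
  Anzahl    = c(2, 2, 1, 1, 2, 1, 7, 1, 1, 2)
)

# Class of every column; all of them should be numeric (or integer)
sapply(dat, class)
all_numeric <- all(sapply(dat, is.numeric))

# A single character (or factor) column turns the whole matrix into character
bad <- dat
bad$FischerID <- as.character(bad$FischerID)
is.character(as.matrix(bad))   # TRUE

# Convert only after the check has passed
m <- if (all_numeric) as.matrix(dat) else stop("non-numeric column found")
is.numeric(m)
```

If the check fails on real data, the offending columns would have to be converted explicitly before calling as.matrix().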