I have a (very big - 1.5 rows) dataframe with a (POSIXt" "POSIXlt") column h (hour). Surprisingly, I cannot calculate a simple aggregate over the dataframe.> n.h1 = sqldf("select distinct h, count(*) from x group by h")Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: x) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'> n.h2 = aggregate(x$price, by = x$h, FUN = nrow)Error in names(y) <- c(names(by), names(x)) : 'names' attribute [10] must be the same length as the vector [2] Arrgh... -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2528613.html Sent from the R help mailing list archive at Nabble.com.
On Sep 6, 2010, at 12:15 PM, Dimitri Shvorob wrote:> > I have a (very big - 1.5 rows) dataframe with a (POSIXt" "POSIXlt") > column h > (hour). Surprisingly, I cannot calculate a simple aggregate over the > dataframe. > >> n.h1 = sqldf("select distinct h, count(*) from x group by h") > Error in sqliteExecStatement(con, statement, bind.data) : > RS-DBI driver: (error in statement: no such table: x) > In addition: Warning message: > In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a > 'double' > >> n.h2 = aggregate(x$price, by = x$h, FUN = nrow)A vector argument (x$price) would only have one row (at most). nrow(c(1,2) NULL> Error in names(y) <- c(names(by), names(x)) : > 'names' attribute [10] must be the same length as the vector [2]Try: tapply(x$price, by = x$h, FUN = length) -- David.
On Mon, Sep 6, 2010 at 12:15 PM, Dimitri Shvorob <dimitri.shvorob at gmail.com> wrote:> > I have a (very big - 1.5 rows) dataframe with a (POSIXt" ?"POSIXlt") column h > (hour). Surprisingly, I cannot calculate a simple aggregate over the > dataframe. > >> n.h1 = sqldf("select distinct h, count(*) from x group by h") > Error in sqliteExecStatement(con, statement, bind.data) : > ?RS-DBI driver: (error in statement: no such table: x) > In addition: Warning message: > In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' > >> n.h2 = aggregate(x$price, by = x$h, FUN = nrow) > Error in names(y) <- c(names(by), names(x)) : > ?'names' attribute [10] must be the same length as the vector [2]Since you are using group by you don't want "distinct". In aggregate use x["price"] and x["h"] rather than x$price and x$h or use a formula. Also use nrow or length in place of NROW. library(sqldf) x <- data.frame(price = 1:4, h = c(1, 1, 2, 3)) sqldf("select h, count(*) from x group by h") aggregate(x["price"], by = x["h"], FUN = NROW) aggregate(x["price"], by = x["h"], FUN = length) aggregate(price ~ h, x, FUN = length) -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
On Wed, Sep 8, 2010 at 4:48 AM, Dimitri Shvorob <dimitri.shvorob at gmail.com> wrote:> > I was able to aggregate (with sqldf, at least), after saving and re-loading > the dataframe. My first guess was that h (and/or price?) now being a factor > - stringsAsFactors = T by default - made the difference, and I tried to > convert x$h to factor, but received an error.Please provide enough of x to reproduce your problem. e.g. x <- head(x) dput(x) # repeat code and ensure it still shows problem -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com