I have a data frame with two columns, a factor and a numeric. I want to create a data frame with the factor, its frequency, and the median of the numeric column.

> head(motifList)
  events       score
1  aeijm -0.25000000
2  begjm -0.25000000
3  afgjm -0.25000000
4  afhjm -0.25000000
5  aeijm -0.25000000
6  aehjm  0.08333333

To get the frequency table of events:

> motifTable <- as.data.frame(table(motifList$events))
> head(motifTable)
   Var1 Freq
1 aeijm  110
2 begjm   46
3 afgjm  337
4 afhjm  102
5 aehjm  190
6 adijm   18

Now get the score column back in:

> motifTable2 <- merge(motifList, motifTable, by="events")
> head(motifTable2)
  events     percent freq
1  adgjm  0.00000000  111
2  adgjm          NA  111
3  adgjm  0.13333333  111
4  adgjm  0.06666667  111
5  adgjm -0.16666667  111
6  adgjm          NA  111

Then lastly, aggregate on the events column to get the median of the score:

> motifTable3 <- aggregate.data.frame(motifTable2, by=list(motifTable2$events), FUN=median, na.rm=TRUE)
Error in median.default(X[[1L]], ...) : need numeric data

This gives the error because events is a factor. Can someone enlighten me to a more obvious approach?

dhs
On Jan 30, 2010, at 4:09 PM, david hilton shanabrook wrote:

> Then lastly to aggregate on the events column getting the median of
> the score
>> motifTable3 <- aggregate.data.frame(motifTable2,
>> by=list(motifTable2$events), FUN=median, na.rm=TRUE)
> Error in median.default(X[[1L]], ...) : need numeric data
>
> Which gives the error as events are a factor. Can someone enlighten
> me to a more obvious approach?

I don't think grouping on a factor is the source of your error. You have NA's in your data and median will choke on those unless you specify na.rm=TRUE.

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
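Since the call quoted above already passes na.rm=TRUE, it may help to check where the message comes from. The following is a minimal sketch, not a diagnosis of the poster's actual data: the toy motifList and its values are made up for illustration, and the names whole and medians are hypothetical. It suggests that handing the whole data frame to aggregate() makes median() run on the factor column as well, while restricting the first argument to the numeric column avoids that.

```r
# Toy stand-in for motifList; the values are invented for illustration only
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "aeijm", "aehjm", "aeijm")),
  score  = c(-0.25, -0.25, 0.5, NA, 0.0)
)

# Passing the whole data frame applies median() to every column,
# including the factor 'events'; capture the error instead of stopping
whole <- tryCatch(
  aggregate(motifList, by = list(motifList$events), FUN = median, na.rm = TRUE),
  error = function(e) e
)

# Restricting the first argument to the numeric column keeps median()
# away from the factor; na.rm = TRUE is forwarded to median()
medians <- aggregate(motifList["score"],
                     by = list(events = motifList$events),
                     FUN = median, na.rm = TRUE)
```

In this sketch a group whose scores are all NA still appears in medians, with NA as its median, so no event is silently dropped.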
Hi:

You could complete the entire process in one shot with the plyr package, using the function ddply. Using the piece of data supplied,

> ddply(motifList, .(events), summarize, freq = length(events), score = median(score))
  events freq       score
1  aehjm    1  0.08333333
2  aeijm    2 -0.25000000
3  afgjm    1 -0.25000000
4  afhjm    1 -0.25000000
5  begjm    1 -0.25000000

HTH,
Dennis

On Sat, Jan 30, 2010 at 1:09 PM, david hilton shanabrook <dhshanab@acad.umass.edu> wrote:

> I have a data frame with two columns, a factor and a numeric. I want to
> create data frame with the factor, its frequency and the median of the
> numeric column
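For readers without plyr installed, the same factor/frequency/median summary can be sketched in base R. This is one possible approach under stated assumptions, not the method used in the thread: it rebuilds the six rows shown in the original post, and the names freq, med, and motifSummary are hypothetical.

```r
# The six rows shown by head(motifList) in the original post
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "afgjm", "afhjm", "aeijm", "aehjm")),
  score  = c(-0.25, -0.25, -0.25, -0.25, -0.25, 0.08333333)
)

# Frequency of each event, in factor-level order
freq <- table(motifList$events)

# Median score per event; na.rm = TRUE is passed through to median()
med <- tapply(motifList$score, motifList$events, median, na.rm = TRUE)

# Combine both by event name so the rows stay aligned
motifSummary <- data.frame(
  events = names(freq),
  freq   = as.vector(freq),
  score  = as.vector(med[names(freq)])
)
```

Indexing med by names(freq) rather than relying on position is a deliberate choice here, so the two summaries cannot drift out of step if one of them is later reordered.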