David Winsemius
2011-Jul-16 17:27 UTC
[R] Fwd: construct boxplots from data with varying column widths
From: David Winsemius <dwinsemius at comcast.net>

On Jul 16, 2011, at 12:15 PM, Rory Campbell-Lange wrote:

> On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
>>
>> On Jul 16, 2011, at 11:19 AM, Rory Campbell-Lange wrote:
>>
>>> I'm an R beginner, and I would like to construct a set of boxplots
>>> showing database function runtimes.
>
>>> I can easily reformat the base data to provide it to R in a format
>>> such as:
>>>
>>> function1,12.5
>>> function1,13.11
>>> function1,35.2
>>> ...
>
>> That is definitely to be preferred. Read that into R and show us the
>> results of str on your R data object.
>
> Thanks for your suggestion.
>
>> str(data2)
> 'data.frame': 1940170 obs. of 2 variables:
>  $ function.: Factor w/ 127 levels "fn_activities01_list",..: 102 102 102 102 102 102 102 102 102 102 ...
>  $ runtime  : num 38.1 32.4 41.2 92.9 130.5 ..
>
>> head(data2)
>            function. runtime
> 1 fn_slot03_byperson  38.083
> 2 fn_slot03_byperson  32.396
> 3 fn_slot03_byperson  41.246
> 4 fn_slot03_byperson  92.904
> 5 fn_slot03_byperson 130.512
> 6 fn_slot03_byperson 113.853
>
>> tmp <- data2[data2$dbfunc=='fn_slot03_byperson',]
>> length(tmp$runtime)
> [1] 24004
>> ave(tmp$runtime)[1]
> [1] 41.8108

I would have guessed you would get an error, but maybe if ave() is given
no grouping factor it just returns a grand mean.

Try instead one of these:

aggregate(data2, data2$function. , FUN=mean)

tapply(data2$runtime, data2$function. , FUN=mean)

data2$grpmean <- ave( data2$runtime, data2$function. , FUN=mean)

The last one adds a column to the data frame and could be useful for
identifying items that are some particular distance away from their
group mean.

--
David Winsemius, MD
West Hartford, CT
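As Rory reports in his reply, the aggregate() call above fails because aggregate.data.frame() requires its by argument to be a list. A minimal sketch of working versions of all three approaches follows, run on a made-up five-row stand-in for data2. The values are invented, and the grouping column is assumed to be called dbfunc, the name used in the tmp line above. It also confirms David's guess that ave() with no grouping factor returns the grand mean for every element.

```r
# Hypothetical stand-in for data2: values are invented for illustration,
# and the grouping column is assumed to be named 'dbfunc'.
data2 <- data.frame(
  dbfunc  = factor(c("fn_a", "fn_a", "fn_b", "fn_b", "fn_b")),
  runtime = c(10, 20, 100, 200, 300)
)

# ave() with no grouping factor: every element becomes the grand mean
grand <- ave(data2$runtime)

# aggregate(): 'by' must be a list of grouping vectors...
by_list <- aggregate(data2$runtime, by = list(dbfunc = data2$dbfunc), FUN = mean)
# ...or use the formula interface, which keeps 'runtime' as the column name
by_formula <- aggregate(runtime ~ dbfunc, data = data2, FUN = mean)

# tapply(): a named vector with one mean per level of dbfunc
means <- tapply(data2$runtime, data2$dbfunc, FUN = mean)

# ave() with a grouping factor: one group mean per row, same length as runtime
data2$grpmean <- ave(data2$runtime, data2$dbfunc, FUN = mean)

print(by_formula)
print(means)
```

aggregate() and tapply() collapse the data to one value per function, which is the closest analogue of an SQL GROUP BY; ave() keeps one value per original row, which is what makes it convenient for filtering rows by their group mean.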
Rory Campbell-Lange
2011-Jul-17 04:47 UTC
[R] construct boxplots from data with varying column widths
On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
> From: David Winsemius <dwinsemius at comcast.net>
> On Jul 16, 2011, at 12:15 PM, Rory Campbell-Lange wrote:
> >On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
> >>On Jul 16, 2011, at 11:19 AM, Rory Campbell-Lange wrote:
> >>
> >>>I'm an R beginner, and I would like to construct a set of boxplots
> >>>showing database function runtimes.
> >
> >>>I can easily reformat the base data to provide it to R in a format
> >>>such as:
> >>>
> >>>function1,12.5
> >>>function1,13.11
> >>>function1,35.2
> >
> I would have guessed you would get an error, but maybe if ave() is
> given no grouping factor it just returns a grand mean.

You are correct, and my apologies for cross-posting this question both
here and on Stack Overflow.

> Try instead one of these:
>
> aggregate(data2, data2$function. , FUN=mean)
>
> tapply(data2$runtime, data2$function. , FUN=mean)

Both of the above give me errors; aggregate(), for example, complains
about 'by':

> aggregate(data2, data2$dbfunc , FUN=mean)
Error in aggregate.data.frame(data2, data2$dbfunc, FUN = mean) :
  'by' must be a list

I tried to construct a list of names for the 'by' clause and tried again:

> funcnames <- levels(data2$dbfunc)
> aggregate(data2, funcnames , FUN=mean)

but that causes the same error.

> data2$grpmean <- ave( data2$runtime, data2$function. , FUN=mean)
>
> The last one adds a column to the data frame and could be useful for
> identifying items that are some particular distance away from their
> group mean.

I failed initially to see the purpose of adding the grpmean column.
However, I think I now 'get it': it allows one to filter.

a. build data frame

              dbfunc runtime
1 fn_slot03_byperson  38.083
2 fn_slot03_byperson  32.396
3 fn_slot03_byperson  41.246
4 fn_slot03_byperson  92.904
5 fn_slot03_byperson 130.512
6 fn_slot03_byperson 113.853

b. add groupmean

data2$grpmean <- ave(data2$runtime, data2$dbfunc, FUN=mean)

              dbfunc runtime grpmean
1 fn_slot03_byperson  38.083 41.8108
2 fn_slot03_byperson  32.396 41.8108
3 fn_slot03_byperson  41.246 41.8108
4 fn_slot03_byperson  92.904 41.8108
5 fn_slot03_byperson 130.512 41.8108
6 fn_slot03_byperson 113.853 41.8108

c. filter by grpmean where grpmean is over 150 ms

data3 <- data2[data2$grpmean > 150,]

d. attempt to plot

boxplot(runtime ~ dbfunc, data3)

This produces a set of circles for each function, rather than the
box-and-whisker plot I'm expecting. I'm not sure how to 'fold' the
results to get the equivalent of an SQL 'group by' in the results.

Thanks very much for your help, and my apologies for the cross-posting
on Stack Overflow
(http://stackoverflow.com/questions/6720036/r-summarise-data-frame-with-repeating-rows-into-boxplots)

Rory
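Without Rory's data the cause of the circles in step d is only a guess, but two common ones are worth checking. First, subsetting a data frame keeps every level of the dbfunc factor, so functions filtered out in step c still get an (empty) slot on the axis; droplevels() removes them. Second, heavily skewed runtimes produce many points beyond the whiskers, which boxplot() draws as circles and which can squash the boxes; the outline and log arguments address that. The formula runtime ~ dbfunc already performs the "group by", so no separate folding step should be needed. The sketch below uses an invented three-function stand-in for data2; the data and the name data3_clean are hypothetical.

```r
# Hypothetical stand-in for data2 with made-up runtimes (ms): one fast
# function and two slow ones, one of which has a single long-running call.
data2 <- data.frame(
  dbfunc  = factor(rep(c("fn_fast", "fn_slow1", "fn_slow2"), each = 6)),
  runtime = c(10, 12, 14, 16, 18, 20,
              100, 150, 200, 250, 300, 900,
              160, 170, 180, 190, 200, 210)
)

# Steps b and c: per-row group mean, then keep functions averaging over 150 ms
data2$grpmean <- ave(data2$runtime, data2$dbfunc, FUN = mean)
data3 <- data2[data2$grpmean > 150, ]

# Subsetting keeps every original level, including 'fn_fast' with no rows left
print(levels(data3$dbfunc))

# droplevels() discards the now-empty levels so they get no slot in the plot
data3_clean <- droplevels(data3)

# The formula groups runtime by dbfunc: one box per remaining level.
# plot = FALSE returns the computed statistics instead of drawing.
b <- boxplot(runtime ~ dbfunc, data = data3_clean, plot = FALSE)
print(b$stats)  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$out)    # values beyond the whiskers, which are drawn as circles

# In an interactive session: hide the outlier circles, or use a log scale
if (interactive()) {
  boxplot(runtime ~ dbfunc, data = data3_clean, outline = FALSE)
  boxplot(runtime ~ dbfunc, data = data3_clean, log = "y")
}
```

Here the 900 ms call in fn_slow1 lies more than 1.5 times the interquartile range above the upper hinge, so it is reported in b$out rather than stretching the whisker. With roughly two million rows, many such points per function would be expected.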