Rory Campbell-Lange
2011-Jul-16 15:19 UTC
[R] construct boxplots from data with varying column widths
I'm an R beginner, and I would like to construct a set of boxplots showing database function runtimes.

The data I have is currently in the following format:

function1,12.5,13.11,35.2,11.1.....n
function2,21.5,42.22,17.3,14.2....................n
...

That is, each line holds the function name followed by somewhere between 1 and 10,000 runtimes for that function. The runtimes are in milliseconds.

I can easily reformat the base data to provide it to R in a format such as:

function1,12.5
function1,13.11
function1,35.2
...

There are about 120 individual functions. I wish to show the top 20 functions by average runtime (ideally sorted by average runtime descending). Using a boxplot will help show the variation in runtime for each function.

I don't know how to read this data into R so that I can construct the boxplots. I'd also be grateful for advice on how to filter the output of the boxplot to show only the top 20.

Rory
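As an illustration of the reformatting step described above, the ragged one-line-per-function layout could also be reshaped into the one-runtime-per-row layout inside R. This is only a sketch under assumptions: the two sample lines, the object names, and the column names `fn` and `ms` are hypothetical, and with a real file the character vector would come from `readLines()` instead.

```r
# Hypothetical sample of the ragged layout: a function name followed by
# any number of runtimes. With a real file, use readLines("<your file>").
raw_lines <- c("function1,12.5,13.11,35.2,11.1",
               "function2,21.5,42.22")

# Split each line on commas: element 1 is the name, the rest are runtimes.
parts <- strsplit(raw_lines, ",", fixed = TRUE)

# Build one row per (function, runtime) pair and stack the pieces.
long <- do.call(rbind, lapply(parts, function(p) {
  data.frame(fn = p[1], ms = as.numeric(p[-1]), stringsAsFactors = FALSE)
}))

str(long)
```

Because each line is split separately, lines with different numbers of runtimes need no padding.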
David Winsemius
2011-Jul-16 15:47 UTC
[R] construct boxplots from data with varying column widths
On Jul 16, 2011, at 11:19 AM, Rory Campbell-Lange wrote:

> I'm an R beginner, and I would like to construct a set of boxplots
> showing database function runtimes.
>
> The data I have is currently in the following format:
>
> function1,12.5,13.11,35.2,11.1.....n
> function2,21.5,42.22,17.3,14.2....................n
> ...
>
> That is, each line holds the function name followed by somewhere
> between 1 and 10,000 runtimes for that function. The runtimes are in
> milliseconds.
>
> I can easily reformat the base data to provide it to R in a format
> such as:
>
> function1,12.5
> function1,13.11
> function1,35.2
> ...

That is definitely to be preferred. Read that into R and show us the results of str on your R data object.

> There are about 120 individual functions. I wish to show the top 20
> functions by average runtime (ideally sorted by average runtime
> descending). Using a boxplot will help show the variation in runtime
> for each function.
>
> I don't know how to read this data into R so that I can construct the
> boxplots. I'd also be grateful for advice on how to filter the
> output of the boxplot to show only the top 20.

Oh. That is material covered in introductory texts, of which there are many in the contributed documentation at the CRAN website. There is also an Import/Export Manual. After it's in an R workspace, you may want to look at the ave or aggregate functions to compute mean runtime by group.

Rhelp is not set up as a tutorial service. The format laid out in the Posting Guide is: User(reads pages and pages of documentation), User(makes effort, encounters difficulty with R code), User(constructs detailed posting with code, data and verbatim error messages).

--
David Winsemius, MD
West Hartford, CT
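For readers following the thread, the steps named in the reply (read the one-runtime-per-row data, inspect it with str, compute the mean runtime by group with aggregate, then plot) might be sketched roughly as below. This is a minimal sketch and not code from the thread: the inline sample data, the column names `fn` and `ms`, and the variable `top_n` are assumptions for illustration, and with a real file the `text =` argument would be replaced by the file name.

```r
# Hypothetical sample in the "function,runtime" layout (no header row).
runtimes <- read.csv(
  text = "function1,12.5
function1,13.11
function1,35.2
function2,21.5
function2,42.22
function3,5.0",
  header = FALSE,
  col.names = c("fn", "ms"),
  stringsAsFactors = FALSE
)
str(runtimes)

# Mean runtime per function.
means <- aggregate(ms ~ fn, data = runtimes, FUN = mean)

# Names of the top_n functions, ordered by mean runtime, descending.
top_n <- 20
ranked <- means$fn[order(means$ms, decreasing = TRUE)]
top_fns <- head(ranked, top_n)

# Keep only those functions; the factor levels set the plotting order.
top <- runtimes[runtimes$fn %in% top_fns, ]
top$fn <- factor(top$fn, levels = top_fns)

# One box per function; las = 2 turns the axis labels perpendicular.
boxplot(ms ~ fn, data = top, las = 2, xlab = "", ylab = "runtime (ms)")
```

Setting the factor levels explicitly is what controls the left-to-right order of the boxes; without it, boxplot would order the functions alphabetically.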