I'm relatively new to R. I have looked through the documentation and the FAQ but have not been able to find out how to accomplish something. If someone can point me in the right direction, I would greatly appreciate it.

I am doing system performance tests, and end up with large volumes of data consisting of transaction timings coupled with system timings. The analysis process is iterative, and I've been using Excel (too many limitations, but the pivot tables have been extremely useful) for some of the drill-down processing, such as identifying bottlenecks.

To generalize the problem with an example, I have a series of data for which I would like to produce a box plot. However, much of the data varies from run to run or from system to system. For example, the UNIX sar utility can produce a snapshot of disk activity for each disk in the system. Each snapshot lists a number of statistics for each disk, and after some cleanup with other utilities, you end up with something like:

    Time      Device  Busy  Queue  AvServ
    10:00:00  d1       0.0    0.0     8.3
    10:00:00  d2      35.5    5.6    37.8
    10:00:00  d3      10.5    0.8    16.0
    10:00:30  d1       0.8    0.0    10.2
    10:00:30  d2      42.1    5.9    42.5
    10:00:30  d3       3.2    0.1    12.0
    ........

The same set of statistics for each disk (d1-d3) is repeated for each time snapshot. I'd like to be able to produce a box plot of any of these statistics for each disk: for example, a box plot of the percent busy for each disk, or of the average service time for each disk.

The basic problem I am having is how to do this in an automated fashion, without knowing the names of the disks in advance. I built a data.frame using read.csv (is this even the correct terminology?) and tried using unique() to identify the names of the disks, but then I got caught up in trying to build vectors of data for each disk from the specified column. And even if I did accomplish this, I couldn't figure out how to pass a variable number of vectors to boxplot.
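A minimal sketch of the setup described above. It assumes the cleaned-up sar output is whitespace-separated with a header row, so it uses read.table() with header = TRUE rather than read.csv(). The inline text is the sample shown above; with real data you would pass a file name instead. Once Device is a factor, its levels are the disk names, so nothing has to be hard-coded. The name `disks` is only an illustrative choice.

```r
# Sample rows from the question, assumed whitespace-separated with a header.
# With real data, replace `text = sar_text` by the file name.
sar_text <- "
Time     Device Busy Queue AvServ
10:00:00 d1      0.0   0.0    8.3
10:00:00 d2     35.5   5.6   37.8
10:00:00 d3     10.5   0.8   16.0
10:00:30 d1      0.8   0.0   10.2
10:00:30 d2     42.1   5.9   42.5
10:00:30 d3      3.2   0.1   12.0
"

# stringsAsFactors = TRUE makes Device a factor (the default became FALSE
# in R 4.0.0), so the disk names are available as its levels.
disks <- read.table(text = sar_text, header = TRUE, stringsAsFactors = TRUE)

str(disks)
print(levels(disks$Device))   # disk names discovered from the data itself
```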
If someone can point me in the right direction, I can apply the concepts to other tasks I would like to accomplish.

Thank you
Prof Brian D Ripley
2001-Apr-29 15:44 UTC
[R] Using R for processing computer performance data
On Sun, 29 Apr 2001, Peter Gallanis wrote:
> [...] The basic problem I am having is how to do this in an automated fashion, without knowing the names of the disks. [...]
> And even then, if I did accomplish this, I couldn't figure out how to pass a variable number of vectors to boxplot.

Function by(), or more generally tapply(). The disk names will be the levels of the factor `Device'.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
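A minimal sketch of the by()/tapply() suggestion. It again assumes the whitespace-separated layout from the question, and it repeats the sample rows so the block runs on its own. split() is not mentioned in the reply. It is used here because it returns a named list with one vector per disk, however many disks there are, and boxplot() accepts such a list directly, which addresses the variable-number-of-vectors problem. All variable names are illustrative.

```r
# Sample rows from the question, assumed whitespace-separated with a header.
disks <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
Time     Device Busy Queue AvServ
10:00:00 d1      0.0   0.0    8.3
10:00:00 d2     35.5   5.6   37.8
10:00:00 d3     10.5   0.8   16.0
10:00:30 d1      0.8   0.0   10.2
10:00:30 d2     42.1   5.9   42.5
10:00:30 d3      3.2   0.1   12.0
")

stats <- c("Busy", "Queue", "AvServ")

# tapply(): apply a function to one column, grouped by the levels of Device.
mean_busy <- tapply(disks$Busy, disks$Device, mean)
print(mean_busy)

# by(): apply a function to each per-disk subset of several columns.
print(by(disks[stats], disks$Device, summary))

# split(): a named list with one numeric vector per disk. boxplot() accepts
# a list, so the number of disks never has to be known in advance.
busy_by_disk <- split(disks$Busy, disks$Device)
boxplot(busy_by_disk, main = "Busy by disk", xlab = "Device")

# One box plot per statistic, selecting each column by name.
for (stat in stats) {
  boxplot(split(disks[[stat]], disks$Device),
          main = paste(stat, "by disk"), xlab = "Device", ylab = stat)
}
```

Because the loop selects columns by name, the same code works for any set of statistics and any number of disks. For a single column, the formula interface boxplot(Busy ~ Device, data = disks) is an equivalent alternative.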