David Wolfskill
2011-Feb-22 19:44 UTC
[R] Advice/suggestions on using a stacked barplot with "zoo"?
I'm not sure I'm using appropriate tools for what I'm trying to do; one reason for my uncertainty is that I'm having difficulty with what I thought would be fairly simple: labeling the X axis of my graphs with human-readable timestamps.

I'm using R 2.12.1, running in a FreeBSD 8.2-PRERELEASE r218945 environment.

As I mentioned in a previous note, I'm working with data collected by sampling over a period of time. Each sample has a (unique) POSIXct timestamp and a couple hundred data values. By the time I feed them to R, they're in a format suitable for read.table(..., header = TRUE).

As each sample has an associated unique timestamp, and I'm investigating relationships of the data over time (intending to identify correlations between trends and with externally-observed behavior), and it may happen that a sample gets "missed," I'm using library(zoo). [Thank you, Achim Zeileis, Gabor Grothendieck, and Ajay Shah!]

The samples may occur as frequently as once per second; unfortunately, that's the current upper limit of precision of the sampling technique I'm using. On the other hand, storing the timestamps as POSIXct is both convenient and natural for this situation.

The bulk of the (other) data are numeric (though not quite all). Many of the numeric data are "counters" (that is, the "interesting bit" is the difference in value from one sample to another, vs. the magnitude of the value itself); others are (in RRDtool parlance) "gauges": their value in a given sample represents the state of the corresponding environment, quite apart from any corresponding values from other samples.

One set of data that serves to illustrate what I'm doing is the counters for CPU state. The way this works is that the system receives a periodic interrupt and uses the occasion of servicing this "statclock" interrupt to sample the CPU state and increment a corresponding counter.
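Since the timestamps are POSIXct, human-readable X-axis labels can be derived with format() (time of day) or difftime() (elapsed time). One plausible reason axis() calls after barplot(..., axisnames = FALSE) don't line up is that barplot() draws bars at its own x coordinates (it invisibly returns the bar midpoints), not at the time values, so the labels have to be placed at those midpoints. This is a minimal sketch, assuming the index values shown in the session further down (e.g. 1298333415) are plain seconds since the Unix epoch and assuming tz = "UTC"; the bar heights are the "user" percentages from that session.

```r
# Assumption: index values are seconds since the Unix epoch; tz = "UTC" is
# also an assumption (substitute the zone the samples were taken in).
secs <- c(1298333415, 1298333425, 1298333435, 1298333445)
user_pct <- c(0.02807412, 8.76240404, 28.82149209, 29.55162408)

times <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Time-of-day labels; "%H:%M:%S" is equivalent to "%T".
tod <- format(times, "%H:%M:%S")

# Alternative labels: elapsed time since the first sample, in minutes.
elapsed_min <- as.numeric(difftime(times, times[1], units = "mins"))

# barplot() returns the x coordinates of the bar midpoints; suppress its
# own axis labels and draw ours at those midpoints instead.
mids <- barplot(user_pct, space = 0, axisnames = FALSE,
                ylim = c(0, 100), ylab = "%CPU (user)")
axis(1, at = mids, labels = tod)

print(tod)  # "00:10:15" "00:10:25" "00:10:35" "00:10:45"
```

Passing round(elapsed_min, 2) as labels instead of tod would give the elapsed-time variant of the same axis.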
At any given moment, the CPU is considered to be in one of 5 states:

* user
* nice
* system
* interrupt
* idle

There is a separate counter for each of these 5 states; at boot time, each counter starts at 0, and at each statclock interrupt, precisely one of these counters is incremented (by one).

Given samples of each of these counters at each end of a suitable time interval, it is then possible to determine the (mean) breakdown of CPU usage during that interval by:

* Determining the corresponding differences for each of the above states.
* Determining the total "statclock ticks" during the interval (by summing the differences).
* Determining the proportion of time spent in each state by dividing the difference for each state by the total (and multiplying by 100 to get a percentage, if that's desired; in my case, it is).

After using the data to populate a zoo, I generate another zoo of the lagged differences (via diff()). I then graph the result using a stacked barplot(). In my case, I choose to graph the "system" time first, then "interrupt", then "user", then "nice". I explicitly do not graph the "idle" time; thus, the top of the graph represents total %CPU busy.

In doing this, I (belatedly) encountered prop.table(), which seems appropriate for this calculation, but I seem to need to feed it a matrix rather than a zoo, and then convert the result back to a zoo. That level of awkwardness seems suspect to me. (Before I encountered prop.table(), I had cobbled together a function that seemed to work, at least as far as I tested it (not much!); using it did manage to avoid the coercion to matrix and back to zoo.)
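The three steps above can be sketched in base R on a plain numeric matrix, here holding the first three samples from the session further down. This sketch divides by the row totals with rowSums() instead of prop.table(), and labels the bars with elapsed seconds via barplot()'s names.arg argument. Whether the same arithmetic works unchanged on a zoo object (avoiding the matrix round trip) is an assumption that is untested here.

```r
# Cumulative statclock counters (first three samples from the session).
counts <- matrix(
  c(28722903, 25098, 4900282, 1809059, 2811144985,
    28722906, 25098, 4900289, 1809059, 2811155661,
    28723842, 25098, 4900478, 1809068, 2811165209),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("user", "nice", "sys", "intr", "idle")))
secs <- c(1298333405, 1298333415, 1298333425)

# Step 1: lagged differences give the ticks spent in each state per interval.
ticks <- diff(counts)

# Steps 2 and 3: divide each row by its total and scale to percent.
# Dividing a matrix by a vector of length nrow() recycles down the
# columns, so row i is divided by the i-th row total.
pct <- 100 * ticks / rowSums(ticks)

# Stack the busy states (idle omitted); barplot() stacks the rows of a
# matrix, hence t(). Each bar is labelled with the elapsed seconds at the
# end of its interval.
plot_states <- c("sys", "intr", "user", "nice")
barplot(t(pct[, plot_states]), space = 0, border = NA, ylim = c(0, 100),
        names.arg = as.character(secs[-1] - secs[1]),
        legend.text = plot_states,
        ylab = "%CPU", xlab = "Elapsed time (seconds)")

print(round(pct, 4))
```

With the data already in a zoo, the same expression could be applied to coredata(CPUd); writing it directly as 100 * CPUd / rowSums(CPUd) might keep the zoo index as well, but that is untested here.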
Here are some samples:

> cpu_states
[1] "user" "nice" "sys"  "intr" "idle"
> plot_states
[1] "sys"  "intr" "user" "nice"
> CPU[1:10]
               user  nice     sys    intr       idle
1298333405 28722903 25098 4900282 1809059 2811144985
1298333415 28722906 25098 4900289 1809059 2811155661
1298333425 28723842 25098 4900478 1809068 2811165209
1298333435 28726921 25098 4901270 1809077 2811172012
1298333445 28730078 25098 4902053 1809086 2811178746
1298333455 28732541 25098 4902176 1809099 2811186833
1298333465 28735600 25098 4902473 1809105 2811194152
1298333475 28737791 25098 4902654 1809105 2811202463
1298333485 28740727 25098 4902855 1809108 2811210006
1298333495 28741826 25098 4903096 1809109 2811219348
> CPUd <- diff(CPU)
> CPUd[1:10]
           user nice sys intr  idle
1298333415    3    0   7    0 10676
1298333425  936    0 189    9  9548
1298333435 3079    0 792    9  6803
1298333445 3157    0 783    9  6734
1298333455 2463    0 123   13  8087
1298333465 3059    0 297    6  7319
1298333475 2191    0 181    0  8311
1298333485 2936    0 201    3  7543
1298333495 1099    0 241    1  9342
1298333505 4401    0 390    4  5889
> prop.table(CPUd[1:10], 1)
Error in dn[[2L]] : subscript out of bounds
> prop.table(as.matrix(CPUd[1:10]), 1)
           user nice          sys         intr      idle
2  0.0002807412    0 0.0006550627 0.000000e+00 0.9990642
3  0.0876240404    0 0.0176933159 8.425389e-04 0.8938401
4  0.2882149209    0 0.0741364785 8.424600e-04 0.6368061
5  0.2955162408    0 0.0732940185 8.424600e-04 0.6303473
6  0.2304884896    0 0.0115103874 1.216545e-03 0.7567846
7  0.2863964048    0 0.0278063852 5.617452e-04 0.6852355
8  0.2050922026    0 0.0169428063 0.000000e+00 0.7779650
9  0.2748291678    0 0.0188149396 2.808200e-04 0.7060751
10 0.1028737246    0 0.0225592062 9.360666e-05 0.8744735
11 0.4119243729    0 0.0365031823 3.743916e-04 0.5511981
> prop.table(as.matrix(CPUd[1:10]), 1)*100
          user nice        sys        intr     idle
2   0.02807412    0 0.06550627 0.000000000 99.90642
3   8.76240404    0 1.76933159 0.084253885 89.38401
4  28.82149209    0 7.41364785 0.084245998 63.68061
5  29.55162408    0 7.32940185 0.084245998 63.03473
6  23.04884896    0 1.15103874 0.121654501 75.67846
7  28.63964048    0 2.78063852 0.056174515 68.52355
8  20.50922026    0 1.69428063 0.000000000 77.79650
9  27.48291678    0 1.88149396 0.028081999 70.60751
10 10.28737246    0 2.25592062 0.009360666 87.44735
11 41.19243729    0 3.65031823 0.037439161 55.11981
> zoo(prop.table(as.matrix(CPUd[1:10]), 1), index(CPUd))[1:10]*100
                  user nice        sys        intr     idle
1298333415  0.02807412    0 0.06550627 0.000000000 99.90642
1298333425  8.76240404    0 1.76933159 0.084253885 89.38401
1298333435 28.82149209    0 7.41364785 0.084245998 63.68061
1298333445 29.55162408    0 7.32940185 0.084245998 63.03473
1298333455 23.04884896    0 1.15103874 0.121654501 75.67846
1298333465 28.63964048    0 2.78063852 0.056174515 68.52355
1298333475 20.50922026    0 1.69428063 0.000000000 77.79650
1298333485 27.48291678    0 1.88149396 0.028081999 70.60751
1298333495 10.28737246    0 2.25592062 0.009360666 87.44735
1298333505 41.19243729    0 3.65031823 0.037439161 55.11981
> barplot(zoo(prop.table(as.matrix(CPUd[, c(plot_states, "idle")]), 1)*100,
+     index(CPUd))[, 1:4], border = NA, col = plot_colors, ylim = c(0, 100),
+     space = 0, legend.text = plot_states,
+     args.legend = c(x = "topleft", title = "CPU states"), ylab = "%CPU",
+     xlab = "Time (seconds)",
+     main = "CPU utilization during FreeBSD \"make -j12 buildworld\"")

The resulting graph is mostly OK, but the X axis is labeled with what appears to be a sequence of raw POSIXct values.

I've tried various forms of evasive maneuvers (mostly, specifying "..., axisnames = FALSE" in the barplot() invocation, followed by subsequent invocations involving axis()), but I seem to be unable to get either human-readable time-of-day representations ("%T") or a sequence showing the elapsed time from the origin (in seconds or minutes, for example).

So I'd appreciate suggestions for:

* Simplifying the approach: I have difficulty believing that coercing a zoo to a matrix and back again is sensible for something as simple as what I'm trying to do.
* Making the X axis a bit more human-friendly.

Thanks!

Peace,
david
-- 
David H. Wolfskill    david at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.