David Wolfskill
2015-Mar-31 17:22 UTC
[R] data.frame: data-driven column selections that vary by row??
On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
> I think we need some data and code
> Reproducibility
> https://github.com/hadley/devtools/wiki/Reproducibility
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> ....

I apologize for failing to provide that.

Here is a quite small subset of the data (with a few edits to reduce
excess verbosity in names of things) that still illustrates the
challenge I perceive:

> dput(bw)
structure(list(timestamp = c(1426892400L, 1426892400L, 1426892400L,
1426892400L, 1426892400L, 1426892400L, 1426892460L, 1426892460L,
1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892520L,
1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L
), hostname = c("c001", "c002", "c021", "c022", "c041", "c051",
"c001", "c002", "c021", "c022", "c041", "c051", "c001", "c002",
"c021", "c022", "c041", "c051"), health = c(0.0549374999999983,
0.250585416666667, 1, 1, 0.577784167075767, 0.546805261621527,
0.1599375, 0.24954375, 1, 1, 0.582307554123614, 0.558298168996525,
0.2813125, 0.270877083333333, 1, 1, 0.579231349457365, 0.542973020177151
), hw = c(1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5,
1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5), fw = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2015Q1.2", class = "factor"), role = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("control", "test"), class = "factor"), type = structure(c(3L,
3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L,
2L), .Label = c("D", "F", "H"), class = "factor"), da20_busy_pct = c(79.1,
62.8, NA, NA, NA, NA, 75, 64.8, NA, NA, NA, NA, 72.2, 74.5, NA,
NA, NA, NA), da20_dev_type = structure(c(2L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("",
"hdd"), class = "factor"), da20_kb_per_xfer_read = c(727.23,
665.81, NA, NA, NA, NA, 737.04, 691.38, NA, NA, NA, NA, 721.71,
668.96, NA, NA, NA, NA), da20_kb_per_xfer_write = c(0, 0, NA,
NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_mb_per_sec_read = c(39.77,
31.21, NA, NA, NA, NA, 36.71, 32.41, NA, NA, NA, NA, 35.94, 37.24,
NA, NA, NA, NA), da20_mb_per_sec_write = c(0, 0, NA, NA, NA,
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_ms_per_xactn_read = c(43.5,
31.6, NA, NA, NA, NA, 35.7, 30.2, NA, NA, NA, NA, 32.7, 34.6,
NA, NA, NA, NA), da20_ms_per_xactn_write = c(0, 0, NA, NA, NA,
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_Q_length = c(0,
0, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA
), da20_xfers_per_sec_other = c(0, 0, NA, NA, NA, NA, 0, 0, NA,
NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_xfers_per_sec_read = c(56,
48, NA, NA, NA, NA, 51, 48, NA, NA, NA, NA, 51, 57, NA, NA, NA,
NA), da20_xfers_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0,
NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da2_busy_pct = c(84.5,
81.8, 29.5, 26.7, 55.5, 50.9, 80.6, 79.7, 29.2, 27.3, 58.8, 50.2,
74.6, 79.3, 29.4, 26.6, 55.4, 50.1), da2_dev_type = structure(c(2L,
2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L,
3L), .Label = c("", "hdd", "ssd"), class = "factor"), da2_kb_per_xfer_read = c(690.67,
686.63, 613.78, 587, 571.64, 553.27, 692.26, 660.05, 612.01,
594.28, 560.16, 566.41, 672.68, 670.25, 604.64, 592.16, 565.02,
564.43), da2_kb_per_xfer_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), da2_mb_per_sec_read = c(44.52, 41.57,
134.26, 120.38, 252.88, 229.09, 41.24, 39.96, 132.68, 123.61,
268.04, 227.34, 37.44, 39.93, 133.45, 120.28, 251.06, 225.99),
    da2_mb_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0), da2_ms_per_xactn_read = c(49.1, 47.8,
    2, 1.8, 2.6, 2.4, 40.3, 43.9, 2, 1.8, 2.8, 2.4, 37.1, 40.9,
    1.9, 1.8, 2.6, 2.4), da2_ms_per_xactn_write = c(0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_Q_length = c(0,
    2, 0, 1, 3, 0, 3, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 3), da2_xfers_per_sec_other = c(0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_xfers_per_sec_read = c(66,
    62, 224, 210, 453, 424, 61, 62, 222, 213, 490, 411, 57, 61,
    226, 208, 455, 410), da2_xfers_per_sec_write = c(0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("timestamp",
"hostname", "health", "hw", "fw", "role", "type", "da20_busy_pct",
"da20_dev_type", "da20_kb_per_xfer_read", "da20_kb_per_xfer_write",
"da20_mb_per_sec_read", "da20_mb_per_sec_write", "da20_ms_per_xactn_read",
"da20_ms_per_xactn_write", "da20_Q_length", "da20_xfers_per_sec_other",
"da20_xfers_per_sec_read", "da20_xfers_per_sec_write", "da2_busy_pct",
"da2_dev_type", "da2_kb_per_xfer_read", "da2_kb_per_xfer_write",
"da2_mb_per_sec_read", "da2_mb_per_sec_write", "da2_ms_per_xactn_read",
"da2_ms_per_xactn_write", "da2_Q_length", "da2_xfers_per_sec_other",
"da2_xfers_per_sec_read", "da2_xfers_per_sec_write"), class = "data.frame", row.names = c(1L,
2L, 7L, 8L, 13L, 16L, 19L, 20L, 25L, 26L, 31L, 34L, 37L, 38L,
43L, 44L, 49L, 52L))
> dim(bw)
[1] 18 31

(In the current case, there are a few more columns per device, as
well as about 40 more devices -- and thousands of rows -- represented
in the data.)

For reference (as well):

> version
               _
platform       i386-portbld-freebsd10.1
arch           i386
os             freebsd10.1
system         i386, freebsd10.1
status         Patched
major          3
minor          0.2
year           2013
month          11
day            12
svn rev        64207
language       R
version.string R version 3.0.2 Patched (2013-11-12 r64207)
nickname       Frisbee Sailing

[BTW: the first link cited (above) is now a redirect to
<http://adv-r.had.co.nz/Reproducibility.html>.]

Peace,
david
--
David H. Wolfskill    r at catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Ista Zahn
2015-Mar-31 18:19 UTC
[R] data.frame: data-driven column selections that vary by row??
Hi David,

I suggest reading http://www.jstatsoft.org/v59/i10, then:

library(tidyr)
library(dplyr)
bw <- gather(bw, key = "tmp", value = "value",
             matches("^d[a-z]+[0-9]+"))
bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
bw <- spread(bw, var, value)

Best,
Ista
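The wide-to-long reshaping suggested in this reply can also be sketched in base R, which may help when checking what the result should look like. This is only an illustration under stated assumptions: `bw_small` is a hypothetical two-row stand-in for `bw` with made-up coverage of just two metrics per device, and `to_long` is a helper name invented here, not part of any package. Each device's columns are handled as a block, so numeric metrics and the device type keep their own column types.

```r
# Hypothetical toy stand-in for `bw`: two hosts, two devices, two metrics each.
bw_small <- data.frame(
  hostname      = c("c001", "c021"),
  da20_busy_pct = c(79.1, NA),
  da20_dev_type = c("hdd", ""),
  da2_busy_pct  = c(84.5, 29.5),
  da2_dev_type  = c("hdd", "ssd"),
  stringsAsFactors = FALSE
)

# Device prefixes are everything before the first underscore in the
# per-device column names (here "da20" and "da2").
dev_cols <- grep("^da[0-9]+_", names(bw_small), value = TRUE)
devices  <- unique(sub("_.*$", "", dev_cols))

# Hypothetical helper: pull out one device's columns, strip the device
# prefix so every block has the same column names, and tag the rows.
to_long <- function(df, dev, id_cols) {
  prefix <- paste0("^", dev, "_")
  cols   <- grep(prefix, names(df), value = TRUE)
  block  <- df[, c(id_cols, cols), drop = FALSE]
  names(block) <- sub(prefix, "", names(block))
  data.frame(disc = dev, block, stringsAsFactors = FALSE)
}

# Stack one block per device into a single long data frame.
bw_long <- do.call(
  rbind,
  lapply(devices, function(dev) to_long(bw_small, dev, "hostname"))
)

# Drop host/device pairs where the device is absent (empty dev_type).
bw_long <- bw_long[bw_long$dev_type != "", ]
print(bw_long)
```

With one row per host and device, selecting "the columns for this row's SSDs" becomes an ordinary row filter on `dev_type`. Since `gather()` stacks numeric and factor columns into a single `value` column, it is worth checking the column types after `spread()` when using the tidyr version.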
Tom Wright
2015-Mar-31 18:31 UTC
[R] data.frame: data-driven column selections that vary by row??
Not entirely sure I understand your problem here (your first email was a
lot of reading).

Would it make sense to add an extra column, device_name? You would then
end up with something like:

Host     Device   Type
host_A   ada0     ssd
host_A   ada1     ssd
host_A   ada2     hdd
...
host_N   da3      ssd

You could then subset this data frame:

subset(data, Type == "ssd" & Device == "ada0")
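A minimal sketch of the subsetting idea in this reply, assuming the data are already in the one-row-per-host-and-device layout. The data frame `disks` and its `busy_pct` values are hypothetical, made up purely for illustration.

```r
# Hypothetical long-format data: one row per host/device pair.
disks <- data.frame(
  Host     = c("host_A", "host_A", "host_A", "host_N"),
  Device   = c("ada0", "ada1", "ada2", "da3"),
  Type     = c("ssd", "ssd", "hdd", "ssd"),
  busy_pct = c(29.5, 26.7, 84.5, 55.5),   # made-up metric values
  stringsAsFactors = FALSE
)

# Rows for one specific SSD device.
ssd_ada0 <- subset(disks, Type == "ssd" & Device == "ada0")
print(ssd_ada0)

# Per-type summary: mean busy percentage for each device type.
mean_busy <- aggregate(busy_pct ~ Type, data = disks, FUN = mean)
print(mean_busy)
```

Once the device name and type are ordinary columns, the choice of devices can differ from host to host without any per-row column selection.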
Tom Wright
2015-Mar-31 18:35 UTC
[R] data.frame: data-driven column selections that vary by row??
Nice clean-up!!!

On Tue, 2015-03-31 at 14:19 -0400, Ista Zahn wrote:
> library(tidyr)
> library(dplyr)
> bw <- gather(bw, key = "tmp", value = "value",
>              matches("^d[a-z]+[0-9]+"))
> bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
> bw <- spread(bw, var, value)