thr3ads.net - R help - [R] data.frame: data-driven column selections that vary by row?? [Mar 2015]

David Wolfskill

2015-Mar-31 17:22 UTC

[R] data.frame: data-driven column selections that vary by row??

On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
> I think we need some data and code
> Reproducibility
> https://github.com/hadley/devtools/wiki/Reproducibility
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> ....

I apologize for failing to provide that.

Here is a quite small subset of the data (with a few edits to reduce
excess verbosity in names of things) that still illustrates the
challenge I perceive:
> dput(bw)
structure(list(timestamp = c(1426892400L, 1426892400L, 1426892400L, 
1426892400L, 1426892400L, 1426892400L, 1426892460L, 1426892460L, 
1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892520L, 
1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L
), hostname = c("c001", "c002", "c021",
"c022", "c041", "c051",
"c001", "c002", "c021", "c022",
"c041", "c051", "c001", "c002",
"c021", "c022", "c041", "c051"), health
= c(0.0549374999999983,
0.250585416666667, 1, 1, 0.577784167075767, 0.546805261621527, 
0.1599375, 0.24954375, 1, 1, 0.582307554123614, 0.558298168996525, 
0.2813125, 0.270877083333333, 1, 1, 0.579231349457365, 0.542973020177151
), hw = c(1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 
1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5), fw = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2015Q1.2", class = "factor"), role =
structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("control", "test"), class =
"factor"), type = structure(c(3L,
3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 
2L), .Label = c("D", "F", "H"), class =
"factor"), da20_busy_pct = c(79.1,
62.8, NA, NA, NA, NA, 75, 64.8, NA, NA, NA, NA, 72.2, 74.5, NA, 
NA, NA, NA), da20_dev_type = structure(c(2L, 2L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("", 
"hdd"), class = "factor"), da20_kb_per_xfer_read = c(727.23,
665.81, NA, NA, NA, NA, 737.04, 691.38, NA, NA, NA, NA, 721.71, 
668.96, NA, NA, NA, NA), da20_kb_per_xfer_write = c(0, 0, NA, 
NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_mb_per_sec_read =
c(39.77,
31.21, NA, NA, NA, NA, 36.71, 32.41, NA, NA, NA, NA, 35.94, 37.24, 
NA, NA, NA, NA), da20_mb_per_sec_write = c(0, 0, NA, NA, NA, 
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_ms_per_xactn_read =
c(43.5,
31.6, NA, NA, NA, NA, 35.7, 30.2, NA, NA, NA, NA, 32.7, 34.6, 
NA, NA, NA, NA), da20_ms_per_xactn_write = c(0, 0, NA, NA, NA, 
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_Q_length = c(0, 
0, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA
), da20_xfers_per_sec_other = c(0, 0, NA, NA, NA, NA, 0, 0, NA, 
NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_xfers_per_sec_read = c(56, 
48, NA, NA, NA, NA, 51, 48, NA, NA, NA, NA, 51, 57, NA, NA, NA, 
NA), da20_xfers_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0, 
NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da2_busy_pct = c(84.5, 
81.8, 29.5, 26.7, 55.5, 50.9, 80.6, 79.7, 29.2, 27.3, 58.8, 50.2, 
74.6, 79.3, 29.4, 26.6, 55.4, 50.1), da2_dev_type = structure(c(2L, 
2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 
3L), .Label = c("", "hdd", "ssd"), class =
"factor"), da2_kb_per_xfer_read = c(690.67,
686.63, 613.78, 587, 571.64, 553.27, 692.26, 660.05, 612.01, 
594.28, 560.16, 566.41, 672.68, 670.25, 604.64, 592.16, 565.02, 
564.43), da2_kb_per_xfer_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), da2_mb_per_sec_read = c(44.52, 41.57, 
134.26, 120.38, 252.88, 229.09, 41.24, 39.96, 132.68, 123.61, 
268.04, 227.34, 37.44, 39.93, 133.45, 120.28, 251.06, 225.99), 
    da2_mb_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), da2_ms_per_xactn_read = c(49.1, 47.8, 
    2, 1.8, 2.6, 2.4, 40.3, 43.9, 2, 1.8, 2.8, 2.4, 37.1, 40.9, 
    1.9, 1.8, 2.6, 2.4), da2_ms_per_xactn_write = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_Q_length = c(0, 
    2, 0, 1, 3, 0, 3, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 3), da2_xfers_per_sec_other
= c(0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_xfers_per_sec_read =
c(66,
    62, 224, 210, 453, 424, 61, 62, 222, 213, 490, 411, 57, 61, 
    226, 208, 455, 410), da2_xfers_per_sec_write = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names =
c("timestamp",
"hostname", "health", "hw", "fw",
"role", "type", "da20_busy_pct",
"da20_dev_type", "da20_kb_per_xfer_read",
"da20_kb_per_xfer_write",
"da20_mb_per_sec_read", "da20_mb_per_sec_write",
"da20_ms_per_xactn_read",
"da20_ms_per_xactn_write", "da20_Q_length",
"da20_xfers_per_sec_other",
"da20_xfers_per_sec_read", "da20_xfers_per_sec_write",
"da2_busy_pct",
"da2_dev_type", "da2_kb_per_xfer_read",
"da2_kb_per_xfer_write",
"da2_mb_per_sec_read", "da2_mb_per_sec_write",
"da2_ms_per_xactn_read",
"da2_ms_per_xactn_write", "da2_Q_length",
"da2_xfers_per_sec_other",
"da2_xfers_per_sec_read", "da2_xfers_per_sec_write"), class
= "data.frame", row.names = c(1L,
2L, 7L, 8L, 13L, 16L, 19L, 20L, 25L, 26L, 31L, 34L, 37L, 38L, 
43L, 44L, 49L, 52L))
> dim(bw)
[1] 18 31

(In the current case, there are a few more columns per device, as
well as about 40 more devices -- and thousands of rows -- represented
in the data.)

For reference (as well):
> version
               _
platform       i386-portbld-freebsd10.1                   
arch           i386                                       
os             freebsd10.1                                
system         i386, freebsd10.1                          
status         Patched                                    
major          3                                          
minor          0.2                                        
year           2013                                       
month          11                                         
day            12                                         
svn rev        64207                                      
language       R                                          
version.string R version 3.0.2 Patched (2013-11-12 r64207)
nickname       Frisbee Sailing                           
> 
[BTW: the first link cited (above) is now a redirect to
<http://adv-r.had.co.nz/Reproducibility.html>.]

Peace,
david
-- 
David H. Wolfskill				r at catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.

Ista Zahn

2015-Mar-31 18:19 UTC

Hi David,

I suggest reading http://www.jstatsoft.org/v59/i10, then:

library(tidyr)
library(dplyr)
bw <- gather(bw, key = "tmp", value = "value", matches("^d[a-z]+[0-9]+"))
bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
bw <- spread(bw, var, value)

Best,
Ista
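
For readers following along, here is a minimal sketch of what those three steps do, run on a small made-up frame rather than on `bw` itself. The frame `toy` and its values are invented for illustration and only mimic the `<device>_<variable>` naming scheme of the posted data. The `convert = TRUE` argument to `spread()` and the final `filter()` call are additions that are not part of the suggestion above. Because `gather()` stacks numeric and text columns into one `value` column, everything becomes character; `convert = TRUE` asks `spread()` to restore sensible column types afterwards.

```r
library(tidyr)
library(dplyr)

# Toy frame (made-up values): one group of columns per device,
# named <device>_<variable>, as in the posted data.
toy <- data.frame(
  hostname      = c("c001", "c021"),
  da2_busy_pct  = c(84.5, 29.5),
  da2_dev_type  = c("hdd", "ssd"),
  da20_busy_pct = c(79.1, NA),
  da20_dev_type = c("hdd", ""),
  stringsAsFactors = FALSE
)

# Step 1: stack every device column into key/value pairs.
long <- gather(toy, key = "tmp", value = "value", matches("^d[a-z]+[0-9]+"))

# Step 2: split "da2_busy_pct" into disc = "da2" and var = "busy_pct".
# extra = "merge" keeps any later underscores inside var.
long <- separate(long, tmp, c("disc", "var"), sep = "_", extra = "merge")

# Step 3: one row per host and device, one column per variable.
# convert = TRUE (an addition here) re-infers the column types.
wide <- spread(long, var, value, convert = TRUE)

# The device type is now an ordinary column, so a selection that
# depends on it becomes a plain row filter.
hdd <- filter(wide, dev_type == "hdd")
```

`gather()` and `spread()` still work in current tidyr, although they have since been superseded by `pivot_longer()` and `pivot_wider()`.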


Tom Wright

2015-Mar-31 18:31 UTC

I'm not entirely sure I understand your problem here (your first email was a
lot of reading).

Would it make sense to add an extra column, device_name?

You would then end up with something like:

Host      Device  Type
host_A    ada0    ssd
host_A    ada1    ssd
host_A    ada2    hdd
...
host_N    da3     ssd

You could then subset this data frame:

subset(data, Type == "ssd" & Device == "ada0")
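
One way to get from per-device column groups to the Host/Device/Type layout above, using base R only, might look like the sketch below. The frame `wide`, its column names, and its values are hypothetical and chosen to match the table in this message; this is an illustration of the idea, not code from the thread.

```r
# Toy wide frame (made-up values): one group of columns per device.
wide <- data.frame(
  Host      = c("host_A", "host_B"),
  ada0_Type = c("ssd", "hdd"),
  ada0_busy = c(30, 80),
  ada1_Type = c("ssd", "ssd"),
  ada1_busy = c(25, 35),
  stringsAsFactors = FALSE
)

# Device names are whatever precedes the first underscore.
devices <- unique(sub("_.*$", "", grep("_", names(wide), value = TRUE)))

# For each device, take its columns, strip the prefix, and tag the rows
# with the device name; then stack the pieces.
data <- do.call(rbind, lapply(devices, function(dev) {
  # The trailing "_" in the pattern keeps e.g. "da2" from matching "da20".
  cols <- grep(paste0("^", dev, "_"), names(wide), value = TRUE)
  part <- wide[, c("Host", cols)]
  names(part) <- c("Host", sub(paste0("^", dev, "_"), "", cols))
  part$Device <- dev
  part
}))

# The selection from the message above now works as written.
ssd_ada0 <- subset(data, Type == "ssd" & Device == "ada0")
```

This assumes every device has the same set of per-device variables, since `rbind()` needs matching column names across the pieces.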


Tom Wright

2015-Mar-31 18:35 UTC

Nice clean-up!!!

On Tue, 2015-03-31 at 14:19 -0400, Ista Zahn wrote:
> library(tidyr)
> library(dplyr)
> bw <- gather(bw, key = "tmp", value = "value", matches("^d[a-z]+[0-9]+"))
> bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
> bw <- spread(bw, var, value)
