Jabez Wilson
2011-Dec-01 21:15 UTC
[R] Fw: calculate mean of multiple rows in a data frame
NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 Sorry, that should look like this: NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 NAME ID a b c d 1 Control_1 probe~B01R01C01 3 22 926 774 2 Control_2 probe~B01R01C02 712 13 32 179 3 Control_1 probe~B01R09C01 937 824 898 668 4 Control_3 probe~B01R09C02 464 836 508 53 5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984 6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995 --- On Thu, 1/12/11, Jabez Wilson <jabezwuk@yahoo.co.uk> wrote: From: Jabez Wilson <jabezwuk@yahoo.co.uk> Subject: calculate mean of multiple rows in a data frame To: "R-Help" <r-help@stat.math.ethz.ch> Date: Thursday, 1 December, 2011, 20:45 Dear all, I have a data frame (DF) in the following format: NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686..... I would like to consolidate the data frame by parsing through the rows, and where the NAME is identical, consolidate into one row and return the mean. I can do this for the first lines (Control_1 etc) by using aggregate() aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean) but since aggregate looks for unique lines it won't consolidate e.g. lines 5/6 and 7/8. Is there a way of telling aggregate to grep just the first part of the name (i.e. up to "~") and consolidate those? I could pre-grep the file before importing into R, but I'd like to do it within R if possible. Thanks for any suggestions [[alternative HTML version deleted]]
Jean V Adams
2011-Dec-02 14:29 UTC
[R] Fw: calculate mean of multiple rows in a data frame
It's easier for folks to help you if you put your example data in a format
that can be readily read in R. See, for example, the dput() function,
which you can use to provide us with something like this:
DF <- structure(list(NAME = c("Control_1", "Control_2",
"Control_1",
"Control_3", "MM0289~RFU:11810.15",
"MM0289~RFU:9238.41",
"MM16597~RFU:36765.38",
"MM16597~RFU:41258.94"), ID = c("probe~B01R01C01",
"probe~B01R01C02",
"probe~B01R09C01", "probe~B01R09C02",
"probe~B29R13C06",
"probe~B29R13C05",
"probe~B44R15C20", "probe~B44R15C19"), a = c(3L, 712L, 937L,
464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L,
603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L,
497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names =
c("NAME",
"ID", "a", "b", "c", "d"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6",
"7", "8"))
If I understand what you're after, you want to summarize data within
groups, but your NAME variable is not as general as you would like. You
can get around this by creating a new variable which is a shorter and more
general version of the NAME variable. I did this by saving just the part
of the NAME before the colon, ":".
shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1)
aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean)
shortname a b c d
1 Control_1 470 423.0 912.0 721.0
2 Control_2 712 13.0 32.0 179.0
3 Control_3 464 836.0 508.0 53.0
4 MM0289~RFU 352 573.5 734.5 779.5
5 MM16597~RFU 416 850.0 358.0 788.5
Jean
> Jabez Wilson wrote on 12/01/2011 03:15:39 PM:
> NAME
> ID
> a
> b
> c
> d
>
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
>
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
>
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
>
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
>
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
>
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
>
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
>
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686
>
>
>
> NAME
> ID
> a
> b
> c
> d
>
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
>
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
>
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
>
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
>
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
>
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
>
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
>
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686
> Sorry, that should look like this:
>
>
>
>
> NAME
> ID
> a
> b
> c
> d
>
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
>
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
>
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
>
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
>
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
>
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
>
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
>
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686 NAME ID a b c d
> 1 Control_1 probe~B01R01C01 3 22 926 774
> 2 Control_2 probe~B01R01C02 712 13 32 179
> 3 Control_1 probe~B01R09C01 937 824 898 668
> 4 Control_3 probe~B01R09C02 464 836 508 53
> 5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984
> 6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575
> 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582
> 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995
>
> --- On Thu, 1/12/11, Jabez Wilson <jabezwuk@yahoo.co.uk> wrote:
>
>
> From: Jabez Wilson <jabezwuk@yahoo.co.uk>
> Subject: calculate mean of multiple rows in a data frame
> To: "R-Help" <r-help@stat.math.ethz.ch>
> Date: Thursday, 1 December, 2011, 20:45
>
> Dear all, I have a data frame (DF) in the following format:
>
> NAME
> ID
> a
> b
> c
> d
>
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
>
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
>
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
>
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
>
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
>
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
>
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
>
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686.....
> I would like to consolidate the data frame by parsing through the
> rows, and where the NAME is identical, consolidate into one row and
> return the mean.
> I can do this for the first lines (Control_1 etc) by using aggregate()
> aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)
> but since aggregate looks for unique lines it won't consolidate e.g.
> lines 5/6 and 7/8.
> Is there a way of telling aggregate to grep just the first part of
> the name (i.e. up to "~") and consolidate those?
> I could pre-grep the file before importing into R, but I'd like to
> do it within R if possible.
> Thanks for any suggestions
[[alternative HTML version deleted]]