thr3ads.net - R help - [R] Fw: calculate mean of multiple rows in a data frame [Dec 2011]

If this information is useful, please help other people find it:
Share via:

Jabez Wilson

2011-Dec-01 21:15 UTC

[R] Fw: calculate mean of multiple rows in a data frame

NAME
ID
a
b
c
d

1
Control_1
probe~B01R01C01
381
213
345
653

2
Control_2
probe~B01R01C02
574
629
563
783

3
Control_1
probe~B01R09C01
673
511
521
967

4
Control_3
probe~B01R09C02
53
809
999
50

5
MM0289~RFU:11810.15
probe~B29R13C06
681
34
115
587

6
MM0289~RFU:9238.41
probe~B29R13C05
784
443
20
784

7
MM16597~RFU:36765.38
probe~B44R15C20
719
251
790
445

8
MM16597~RFU:41258.94
probe~B44R15C19
677
363
268
686



NAME
ID
a
b
c
d

1
Control_1
probe~B01R01C01
381
213
345
653

2
Control_2
probe~B01R01C02
574
629
563
783

3
Control_1
probe~B01R09C01
673
511
521
967

4
Control_3
probe~B01R09C02
53
809
999
50

5
MM0289~RFU:11810.15
probe~B29R13C06
681
34
115
587

6
MM0289~RFU:9238.41
probe~B29R13C05
784
443
20
784

7
MM16597~RFU:36765.38
probe~B44R15C20
719
251
790
445

8
MM16597~RFU:41258.94
probe~B44R15C19
677
363
268
686
Sorry, that should look like this:




NAME
ID
a
b
c
d

1
Control_1
probe~B01R01C01
381
213
345
653

2
Control_2
probe~B01R01C02
574
629
563
783

3
Control_1
probe~B01R09C01
673
511
521
967

4
Control_3
probe~B01R09C02
53
809
999
50

5
MM0289~RFU:11810.15
probe~B29R13C06
681
34
115
587

6
MM0289~RFU:9238.41
probe~B29R13C05
784
443
20
784

7
MM16597~RFU:36765.38
probe~B44R15C20
719
251
790
445

8
MM16597~RFU:41258.94
probe~B44R15C19
677
363
268
686 NAME ID a b c d 
1 Control_1 probe~B01R01C01 3 22 926 774 
2 Control_2 probe~B01R01C02 712 13 32 179 
3 Control_1 probe~B01R09C01 937 824 898 668 
4 Control_3 probe~B01R09C02 464 836 508 53 
5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984 
6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575 
7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582 
8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995

--- On Thu, 1/12/11, Jabez Wilson <jabezwuk@yahoo.co.uk> wrote:


From: Jabez Wilson <jabezwuk@yahoo.co.uk>
Subject: calculate mean of multiple rows in a data frame
To: "R-Help" <r-help@stat.math.ethz.ch>
Date: Thursday, 1 December, 2011, 20:45







Dear all, I have a data frame (DF) in the following format:










NAME
ID
a
b
c
d

1
Control_1
probe~B01R01C01
381
213
345
653

2
Control_2
probe~B01R01C02
574
629
563
783

3
Control_1
probe~B01R09C01
673
511
521
967

4
Control_3
probe~B01R09C02
53
809
999
50

5
MM0289~RFU:11810.15
probe~B29R13C06
681
34
115
587

6
MM0289~RFU:9238.41
probe~B29R13C05
784
443
20
784

7
MM16597~RFU:36765.38
probe~B44R15C20
719
251
790
445

8
MM16597~RFU:41258.94
probe~B44R15C19
677
363
268
686.....
I would like to consolidate the data frame by parsing through the rows, and
where the NAME is identical, consolidate into one row and return the mean.
I can do this for the first lines (Control_1 etc) by using aggregate()
aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)
but since aggregate looks for unique lines it won't consolidate e.g. lines
5/6 and 7/8.
Is there a way of telling aggregate to grep just the first part of the name
(i.e. up to "~") and consolidate those?
I could pre-grep the file before importing into R, but I'd like to do it
within R if possible.
Thanks for any suggestions
 
	[[alternative HTML version deleted]]

Jean V Adams

2011-Dec-02 14:29 UTC

head link

[R] Fw: calculate mean of multiple rows in a data frame

It's easier for folks to help you if you put your example data in a format 
that can be readily read in R.  See, for example, the dput() function, 
which you can use to provide us with something like this:

DF <- structure(list(NAME = c("Control_1", "Control_2",
"Control_1",
"Control_3", "MM0289~RFU:11810.15",
"MM0289~RFU:9238.41",
"MM16597~RFU:36765.38", 
"MM16597~RFU:41258.94"), ID = c("probe~B01R01C01",
"probe~B01R01C02",
"probe~B01R09C01", "probe~B01R09C02",
"probe~B29R13C06",
"probe~B29R13C05", 
"probe~B44R15C20", "probe~B44R15C19"), a = c(3L, 712L, 937L,
464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L, 
603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L, 
497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names = 
c("NAME", 
"ID", "a", "b", "c", "d"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6",
"7", "8"))

If I understand what you're after, you want to summarize data within 
groups, but your NAME variable is not as general as you would like.  You 
can get around this by creating a new variable which is a shorter and more 
general version of the NAME variable.  I did this by saving just the part 
of the NAME before the colon, ":".

shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1)
aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean) 

    shortname   a     b     c     d
1   Control_1 470 423.0 912.0 721.0
2   Control_2 712  13.0  32.0 179.0
3   Control_3 464 836.0 508.0  53.0
4  MM0289~RFU 352 573.5 734.5 779.5
5 MM16597~RFU 416 850.0 358.0 788.5

Jean

> Jabez Wilson wrote on 12/01/2011 03:15:39 PM:
> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686
> 
> 
> 
> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686
> Sorry, that should look like this:
> 
> 
> 
> 
> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686 NAME ID a b c d 
> 1 Control_1 probe~B01R01C01 3 22 926 774 
> 2 Control_2 probe~B01R01C02 712 13 32 179 
> 3 Control_1 probe~B01R09C01 937 824 898 668 
> 4 Control_3 probe~B01R09C02 464 836 508 53 
> 5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984 
> 6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575 
> 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582 
> 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995
> 
> --- On Thu, 1/12/11, Jabez Wilson <jabezwuk@yahoo.co.uk> wrote:
> 
> 
> From: Jabez Wilson <jabezwuk@yahoo.co.uk>
> Subject: calculate mean of multiple rows in a data frame
> To: "R-Help" <r-help@stat.math.ethz.ch>
> Date: Thursday, 1 December, 2011, 20:45
> 
> Dear all, I have a data frame (DF) in the following format:
> 
> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686.....
> I would like to consolidate the data frame by parsing through the 
> rows, and where the NAME is identical, consolidate into one row and 
> return the mean.
> I can do this for the first lines (Control_1 etc) by using aggregate()
> aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)
> but since aggregate looks for unique lines it won't consolidate e.g.
> lines 5/6 and 7/8.
> Is there a way of telling aggregate to grep just the first part of 
> the name (i.e. up to "~") and consolidate those?
> I could pre-grep the file before importing into R, but I'd like to 
> do it within R if possible.
> Thanks for any suggestions
	[[alternative HTML version deleted]]

Maybe Matching Threads

calculate mean of multiple rows in a data frame

R help - Dec 2011 - Fw: calculate mean of multiple rows in a data frame

[R] Fw: calculate mean of multiple rows in a data frame

[R] Fw: calculate mean of multiple rows in a data frame

Maybe Matching Threads