Jabez Wilson
2011-Dec-01 21:15 UTC
[R] Fw: calculate mean of multiple rows in a data frame
NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 Sorry, that should look like this: NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686 NAME ID a b c d 1 Control_1 probe~B01R01C01 3 22 926 774 2 Control_2 probe~B01R01C02 712 13 32 179 3 Control_1 probe~B01R09C01 937 824 898 668 4 Control_3 probe~B01R09C02 464 836 508 53 5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984 6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995 --- On Thu, 1/12/11, Jabez Wilson <jabezwuk@yahoo.co.uk> wrote: From: Jabez Wilson <jabezwuk@yahoo.co.uk> Subject: calculate mean of multiple rows in a data frame To: "R-Help" <r-help@stat.math.ethz.ch> Date: Thursday, 1 December, 2011, 20:45 Dear all, I have a data frame (DF) in the following format: NAME ID a b c d 1 Control_1 probe~B01R01C01 381 213 345 653 2 Control_2 probe~B01R01C02 574 629 563 783 3 Control_1 probe~B01R09C01 673 511 521 967 4 Control_3 probe~B01R09C02 53 809 999 50 5 MM0289~RFU:11810.15 probe~B29R13C06 681 34 115 587 6 MM0289~RFU:9238.41 probe~B29R13C05 784 443 20 784 7 MM16597~RFU:36765.38 probe~B44R15C20 719 251 790 445 8 MM16597~RFU:41258.94 probe~B44R15C19 677 363 268 686..... I would like to consolidate the data frame by parsing through the rows, and where the NAME is identical, consolidate into one row and return the mean. I can do this for the first lines (Control_1 etc) by using aggregate() aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean) but since aggregate looks for unique lines it won't consolidate e.g. lines 5/6 and 7/8. Is there a way of telling aggregate to grep just the first part of the name (i.e. up to "~") and consolidate those? I could pre-grep the file before importing into R, but I'd like to do it within R if possible. Thanks for any suggestions [[alternative HTML version deleted]]
Jean V Adams
2011-Dec-02 14:29 UTC
[R] Fw: calculate mean of multiple rows in a data frame
It's easier for folks to help you if you put your example data in a format that can be readily read in R. See, for example, the dput() function, which you can use to provide us with something like this: DF <- structure(list(NAME = c("Control_1", "Control_2", "Control_1", "Control_3", "MM0289~RFU:11810.15", "MM0289~RFU:9238.41", "MM16597~RFU:36765.38", "MM16597~RFU:41258.94"), ID = c("probe~B01R01C01", "probe~B01R01C02", "probe~B01R09C01", "probe~B01R09C02", "probe~B29R13C06", "probe~B29R13C05", "probe~B44R15C20", "probe~B44R15C19"), a = c(3L, 712L, 937L, 464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L, 603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L, 497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names = c("NAME", "ID", "a", "b", "c", "d"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8")) If I understand what you're after, you want to summarize data within groups, but your NAME variable is not as general as you would like. You can get around this by creating a new variable which is a shorter and more general version of the NAME variable. I did this by saving just the part of the NAME before the colon, ":". shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1) aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean) shortname a b c d 1 Control_1 470 423.0 912.0 721.0 2 Control_2 712 13.0 32.0 179.0 3 Control_3 464 836.0 508.0 53.0 4 MM0289~RFU 352 573.5 734.5 779.5 5 MM16597~RFU 416 850.0 358.0 788.5 Jean> Jabez Wilson wrote on 12/01/2011 03:15:39 PM:> NAME > ID > a > b > c > d > > 1 > Control_1 > probe~B01R01C01 > 381 > 213 > 345 > 653 > > 2 > Control_2 > probe~B01R01C02 > 574 > 629 > 563 > 783 > > 3 > Control_1 > probe~B01R09C01 > 673 > 511 > 521 > 967 > > 4 > Control_3 > probe~B01R09C02 > 53 > 809 > 999 > 50 > > 5 > MM0289~RFU:11810.15 > probe~B29R13C06 > 681 > 34 > 115 > 587 > > 6 > MM0289~RFU:9238.41 > probe~B29R13C05 > 784 > 443 > 20 > 784 > > 7 > MM16597~RFU:36765.38 > probe~B44R15C20 > 719 > 251 > 790 > 445 > > 8 > MM16597~RFU:41258.94 > probe~B44R15C19 > 677 > 363 > 268 > 686 > > > > NAME > ID > a > b > c > d > > 1 > Control_1 > probe~B01R01C01 > 381 > 213 > 345 > 653 > > 2 > Control_2 > probe~B01R01C02 > 574 > 629 > 563 > 783 > > 3 > Control_1 > probe~B01R09C01 > 673 > 511 > 521 > 967 > > 4 > Control_3 > probe~B01R09C02 > 53 > 809 > 999 > 50 > > 5 > MM0289~RFU:11810.15 > probe~B29R13C06 > 681 > 34 > 115 > 587 > > 6 > MM0289~RFU:9238.41 > probe~B29R13C05 > 784 > 443 > 20 > 784 > > 7 > MM16597~RFU:36765.38 > probe~B44R15C20 > 719 > 251 > 790 > 445 > > 8 > MM16597~RFU:41258.94 > probe~B44R15C19 > 677 > 363 > 268 > 686 > Sorry, that should look like this: > > > > > NAME > ID > a > b > c > d > > 1 > Control_1 > probe~B01R01C01 > 381 > 213 > 345 > 653 > > 2 > Control_2 > probe~B01R01C02 > 574 > 629 > 563 > 783 > > 3 > Control_1 > probe~B01R09C01 > 673 > 511 > 521 > 967 > > 4 > Control_3 > probe~B01R09C02 > 53 > 809 > 999 > 50 > > 5 > MM0289~RFU:11810.15 > probe~B29R13C06 > 681 > 34 > 115 > 587 > > 6 > MM0289~RFU:9238.41 > probe~B29R13C05 > 784 > 443 > 20 > 784 > > 7 > MM16597~RFU:36765.38 > probe~B44R15C20 > 719 > 251 > 790 > 445 > > 8 > MM16597~RFU:41258.94 > probe~B44R15C19 > 677 > 363 > 268 > 686 NAME ID a b c d > 1 Control_1 probe~B01R01C01 3 22 926 774 > 2 Control_2 probe~B01R01C02 712 13 32 179 > 3 Control_1 probe~B01R09C01 937 824 898 668 > 4 Control_3 probe~B01R09C02 464 836 508 53 > 5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984 > 6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575 > 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582 > 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995 > > --- On Thu, 1/12/11, Jabez Wilson <jabezwuk@yahoo.co.uk> wrote: > > > From: Jabez Wilson <jabezwuk@yahoo.co.uk> > Subject: calculate mean of multiple rows in a data frame > To: "R-Help" <r-help@stat.math.ethz.ch> > Date: Thursday, 1 December, 2011, 20:45 > > Dear all, I have a data frame (DF) in the following format: > > NAME > ID > a > b > c > d > > 1 > Control_1 > probe~B01R01C01 > 381 > 213 > 345 > 653 > > 2 > Control_2 > probe~B01R01C02 > 574 > 629 > 563 > 783 > > 3 > Control_1 > probe~B01R09C01 > 673 > 511 > 521 > 967 > > 4 > Control_3 > probe~B01R09C02 > 53 > 809 > 999 > 50 > > 5 > MM0289~RFU:11810.15 > probe~B29R13C06 > 681 > 34 > 115 > 587 > > 6 > MM0289~RFU:9238.41 > probe~B29R13C05 > 784 > 443 > 20 > 784 > > 7 > MM16597~RFU:36765.38 > probe~B44R15C20 > 719 > 251 > 790 > 445 > > 8 > MM16597~RFU:41258.94 > probe~B44R15C19 > 677 > 363 > 268 > 686..... > I would like to consolidate the data frame by parsing through the > rows, and where the NAME is identical, consolidate into one row and > return the mean. > I can do this for the first lines (Control_1 etc) by using aggregate() > aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean) > but since aggregate looks for unique lines it won't consolidate e.g. > lines 5/6 and 7/8. > Is there a way of telling aggregate to grep just the first part of > the name (i.e. up to "~") and consolidate those? > I could pre-grep the file before importing into R, but I'd like to > do it within R if possible. > Thanks for any suggestions[[alternative HTML version deleted]]