Joel Fürstenberg-Hägg
2010-Jan-15 10:03 UTC
[R] Remove part of string in colname and calculate mean for columns groups
Hi all,

I have two questions. First, how can I remove part of the column names in a matrix? I would like to remove the "_ACCX" or "_NAX" part below. Is there a method where the "_" as well as all characters after it can be removed?

> dim(exprdata)
[1]  88 512
> colnames(exprdata[,c(1:20)])
 [1] "Akita_ACC1" "Akita_ACC2" "Akita_ACC3" "Akita_ACC4" "Alc.0_ACC1" "Alc.0_ACC2" "Alc.0_ACC3"
 [8] "Alc.0_ACC4" "Alc.0_ACC5" "Bl.1_ACC1"  "Bl.1_ACC2"  "Bl.1_ACC3"  "Bl.1_ACC4"  "Bla.1_ACC1"
[15] "Bla.1_ACC2" "Bla.1_ACC3" "Bla.1_ACC4" "Blh.1_ACC1" "Blh.1_ACC2" "Blh.1_ACC3"

Secondly, I would like to calculate the mean of each column group in the matrix, for instance all columns beginning with "Akita", and save all the new columns as a new matrix. For instance, use:

> head(exprdata[,c(1:4)])
            Akita_ACC1 Akita_ACC2 Akita_ACC3 Akita_ACC4
A100005-101   6.668931         NA         NA         NA
A122001-101  10.562564  11.706395  11.608989   8.289093
A128001-101  14.946749   8.112625   8.176438  10.104254
A133001-101   5.186679   6.089870   4.119589   3.168841
A133003-101         NA         NA  19.825480   2.587695
A134001-101   3.259402   4.835642   4.679607   4.490254

To get something like:

               Akita
A100005-101 6.668931
A122001-101 10.54176
A128001-101 10.10425
A133001-101 3.168841
A133003-101 2.587695
A134001-101 4.490254

However, the column groups are of different sizes (3-10 columns), so I guess I'll need a method based on the column names.

Can anyone help me?

Best regards,

Joel
Dieter Menne
2010-Jan-15 11:20 UTC
[R] Remove part of string in colname and calculate mean for columns groups
Joel Fürstenberg-Hägg wrote:
>
> I have two questions. First, I wonder how to remove a part of the column
> names in a matrix? I would like to remove the "_ACCX" or "_NAX" part
> below. Is there a method where the "_" as well as all characters after it
> can be removed?
>
> Secondly, I would like to calculate the mean of each column group in the
> matrix, for instance all columns beginning with "Akita", and save all new
> columns as a new matrix.
>

If you do that in the example you gave, duplicate column names would be the result, but let's assume that was a typo.

    # Simplify names: drop the first "_" and everything after it
    data = data.frame(Akita.0_ACC1=1:2,Akita.1_ACC2=2:3,Alc.0_ACC1=3:4)
    names(data) = sub("_.*","",names(data))

    # Make a new data frame with the Akitas only
    dataAkita = data[,grep("Akita",names(data))]

For the averaging, check the examples in package plyr.

Dieter
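For reference, the averaging step can probably also be done in base R without plyr. The following is only a sketch and is not taken from the thread: the toy matrix, its values, and the names `groups` and `groupMeans` are made up for illustration. It assumes that the original column names are kept and that the group label is everything before the first "_". Since the duplicate names mentioned above are never assigned to the matrix, they are not a problem here.

```r
# Toy stand-in for exprdata (hypothetical values, same naming scheme as the post)
exprdata <- matrix(
  c( 6, NA, NA, 3, 5,
    10, 12, 11, 4, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("A100005-101", "A122001-101"),
    c("Akita_ACC1", "Akita_ACC2", "Akita_ACC3", "Alc.0_ACC1", "Alc.0_ACC2")
  )
)

# Group label for each column: strip the first "_" and everything after it
groups <- sub("_.*", "", colnames(exprdata))

# One column of row means per group; NA values are ignored within each row.
# drop = FALSE keeps a one-column group as a matrix so rowMeans() still works.
groupMeans <- sapply(unique(groups), function(g) {
  rowMeans(exprdata[, groups == g, drop = FALSE], na.rm = TRUE)
})

print(groupMeans)
```

The result is a matrix with one row per original row and one column per group, so groups of different sizes (3-10 columns) are handled the same way. Note that with na.rm = TRUE a row whose values are all NA within a group comes out as NaN rather than NA.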