I think I understand tapply, but I still can't figure out how to do the following.

I have a data frame where some of the column names are the same, and I want to make a new data frame in which columns that have the same name are averaged by row.

So, if the data frame, DF, was

AAA BBB CCC AAA DDD
  1   0   7  11  13
  2   0   8  12  14
  3   0   6   0  15

then the resulting data frame would be exactly the same, except that there would be a single AAA column containing

6, which comes from (11 + 1)/2
7, which comes from (12 + 2)/2
3, which stays 3 because the element in the other AAA column is zero, so I don't want to average that one. It should just stay 3.

So, I do

DF[DF == 0]<-NA
rowaverage<-function(x) x[rowMeans(forecastDf[x],na.rm=TRUE)
revisedDF<-tapply(seq(DF),names(DF),rowmeans)

There are two problems with this:

1) I need to go through the rows of the columns with the same name, not the columns themselves, so I don't think seq(DF) is right, because that goes through the columns but I want to go through rows.

2) BBB will come back with all NAs (since it was unique and there was nothing else to average), and I don't know how to transform that BBB column to all zeros.

Thanks, and I'm sorry for so many questions. I'm getting better with this stuff and my questions will decrease soon.

My guess is that I should no longer be using tapply, and should be using some other version of apply?

Thanks,
Mark

I think you can't have columns with the same name; by default data.frame() renames them:

> data.frame(AAA=1:3, AAA=4:6)
  AAA AAA.1
1   1     4
2   2     5
3   3     6

But you could subset the data frame by name using substring():

# drop = FALSE keeps a single matching column as a data frame, so rowMeans() still works
sapply(unique(substring(names(data1), 1, 3)), function(x)
    rowMeans(data1[, substring(names(data1), 1, 3) == x, drop = FALSE]))

-------------------------------------------------------------------
Jacques VESLOT

CNRS UMR 8090
I.B.L (2ème étage)
1 rue du Professeur Calmette
B.P. 245
59019 Lille Cedex

Tel : 33 (0)3.20.87.10.44
Fax : 33 (0)3.20.87.10.31

http://www-good.ibl.fr
-------------------------------------------------------------------

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
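
To make the substring() idea concrete, here is a minimal sketch applied to the example from the question. The data frame data1 is an assumption: it is the question's data as data.frame() would rename it, with the second AAA column becoming AAA.1. The names prefix and averaged are made up for illustration. Note that this version takes a plain average, so a zero still counts as a value; it does not treat zeros as missing.

```r
# Assumed data: the question's example after data.frame() has renamed
# the duplicated column (second AAA becomes AAA.1).
data1 <- data.frame(AAA   = c(1, 2, 3),
                    BBB   = c(0, 0, 0),
                    CCC   = c(7, 8, 6),
                    AAA.1 = c(11, 12, 0),
                    DDD   = c(13, 14, 15))

# Group columns by the first three characters of their names.
prefix <- substring(names(data1), 1, 3)

# One row-wise mean per group; drop = FALSE keeps single-column groups
# as data frames so that rowMeans() accepts them.
averaged <- sapply(unique(prefix), function(p)
  rowMeans(data1[, prefix == p, drop = FALSE]))

# sapply() returns a matrix; convert it back to a data frame.
averaged <- as.data.frame(averaged)
print(averaged)
```

Grouping on a fixed three-character prefix only works when every column name shares that layout; with names of varying length, stripping the numeric suffix would probably be a safer way to build the groups.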

I think this does what you want:

> In <- "AAA BBB CCC AAA DDD
+ 1 0 7 11 13
+ 2 0 8 12 14
+ 3 0 6 0 15"
> DF <- read.table(textConnection(In), header=TRUE, check.names=FALSE)
>
> DF[DF == 0]<-NA
> rowaverage<-function(x) rowMeans(DF[x],na.rm=TRUE)
> revisedDF<-tapply(seq(DF),names(DF),rowaverage)
> revisedDF
$AAA
1 2 3
6 7 3

$BBB
 1  2  3
NA NA NA

$CCC
1 2 3
7 8 6

$DDD
 1  2  3
13 14 15

> do.call('cbind', revisedDF)
  AAA BBB CCC DDD
1   6  NA   7  13
2   7  NA   8  14
3   3  NA   6  15
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?
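
The second problem from the question (BBB coming back as all missing) is still open in the output above. One possible way to finish the job is sketched below; it is only a sketch under assumptions, not part of the original replies. It works on a matrix copy of the data, uses split() on the column positions in place of tapply(), and the names m, groups and res are made up for illustration. It also assumes that an all-zero group should come back as zero.

```r
# The example data from the question; check.names = FALSE keeps both AAA columns.
In <- "AAA BBB CCC AAA DDD
1 0 7 11 13
2 0 8 12 14
3 0 6 0 15"
DF <- read.table(text = In, header = TRUE, check.names = FALSE)

# Work on a matrix copy and treat zeros as missing.
m <- as.matrix(DF)
m[m == 0] <- NA

# Column positions grouped by column name, e.g. AAA -> 1 and 4.
groups <- split(seq_along(DF), names(DF))

# Row-wise mean within each group, ignoring the missing values.
res <- sapply(groups, function(idx)
  rowMeans(m[, idx, drop = FALSE], na.rm = TRUE))

# Rows where every value in the group was zero have no mean; set them back to 0.
res[is.na(res)] <- 0
print(res)
```

The result is a matrix with one column per distinct name; as.data.frame(res) would turn it back into a data frame if that is what is needed downstream.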