I think I understand tapply, but I still
can't figure out how to do the following.
I have a data frame where some of the column names are the same,
and I want to make a new data frame where columns
that have the same name are averaged by row.
So, if the data frame, DF, was
AAA BBB CCC AAA DDD
1 0 7 11 13
2 0 8 12 14
3 0 6 0 15
then the resulting data frame would be exactly the same except
that the AAA column would be
6, which comes from (11 + 1)/2
7, which comes from (12 + 2)/2
3, which stays 3 because the element in the other AAA is zero,
so I don't want to average that one. It should just stay 3.
So, I do
DF[DF == 0]<-NA
rowaverage<-function(x) x[rowMeans(forecastDf[x],na.rm=TRUE)
revisedDF<-tapply(seq(DF),names(DF),rowmeans)
There are two problems with this:
1) I need to go through the rows of the same name, not the columns,
so I don't think seq(DF) is right, because that goes through
the columns but I want to go through rows.
2) BBB will come back with all NAs (since
it was unique and there was nothing else to average), and I don't know how
to transform that BBB column to all zeros.
Thanks, and I'm sorry for so many questions. I'm getting better with
this stuff and my questions will decrease soon.
My guess is that I should no longer be using tapply
and should be using some other version of apply?
Thanks,
Mark
I think you can't have columns with the same name:

> data.frame(AAA=1:3, AAA=4:6)
  AAA AAA.1
1   1     4
2   2     5
3   3     6

But you could subset the data frame by name using substring():

# drop = FALSE keeps a single matching column as a data frame,
# so rowMeans() still works for names that occur only once
sapply(unique(substring(names(data1), 1, 3)), function(x)
    rowMeans(data1[, substring(names(data1), 1, 3) == x, drop = FALSE]))

-------------------------------------------------------------------
Jacques VESLOT

CNRS UMR 8090
I.B.L (2ème étage)
1 rue du Professeur Calmette
B.P. 245
59019 Lille Cedex

Tel : 33 (0)3.20.87.10.44
Fax : 33 (0)3.20.87.10.31

http://www-good.ibl.fr
-------------------------------------------------------------------

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
I think this does what you want:

> In <- "AAA BBB CCC AAA DDD
+ 1 0 7 11 13
+ 2 0 8 12 14
+ 3 0 6 0 15"
> DF <- read.table(textConnection(In), header=TRUE, check.names=FALSE)
>
> DF[DF == 0]<-NA
> rowaverage<-function(x) rowMeans(DF[x],na.rm=TRUE)
> revisedDF<-tapply(seq(DF),names(DF),rowaverage)
> revisedDF
$AAA
1 2 3
6 7 3

$BBB
 1  2  3
NA NA NA

$CCC
1 2 3
7 8 6

$DDD
 1  2  3
13 14 15

> do.call('cbind', revisedDF)
  AAA BBB CCC DDD
1   6  NA   7  13
2   7  NA   8  14
3   3  NA   6  15

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?
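
A possible follow-up for the second problem in the question (BBB coming back as all NA instead of all zeros): the sketch below averages same-named columns row by row, then puts zeros back wherever a whole group was missing. It is only a minimal sketch, assuming that a zero always means "no value" as in the example. The names M, nms and avg are made up for illustration and do not come from the posts above, and read.table(text = ...) is used in place of textConnection().

```r
# Rebuild the example data, keeping the duplicated column name "AAA".
In <- "AAA BBB CCC AAA DDD
1 0 7 11 13
2 0 8 12 14
3 0 6 0 15"
DF <- read.table(text = In, header = TRUE, check.names = FALSE)

# Work on a matrix and treat zeros as missing (assumption: 0 means "no value").
M <- as.matrix(DF)
M[M == 0] <- NA

# Average the columns that share a name, row by row.
# drop = FALSE keeps a single matching column as a matrix for rowMeans().
nms <- unique(colnames(M))
avg <- sapply(nms, function(nm) {
  rowMeans(M[, colnames(M) == nm, drop = FALSE], na.rm = TRUE)
})

# Rows in which every value of a group was missing have no mean; set them to 0.
avg[is.na(avg)] <- 0

revisedDF <- as.data.frame(avg)
revisedDF
```

With the example data, AAA should come out as 6, 7, 3 and BBB as 0, 0, 0, with CCC and DDD unchanged. If a zero can also be a real measurement, the final replacement step would need a different rule.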