thr3ads.net - R help - [R] tapply question [Jul 2006]


markleeds at verizon.net

2006-Jul-06 13:59 UTC

[R] tapply question

I think I understand tapply, but I still
can't figure out how to do the following.

I have a data frame where some of the column names are the same,
and I want to make a new data frame where columns
that have the same name are averaged by row.

So, if the data frame, DF, was

AAA  BBB  CCC  AAA  DDD
  1    0    7   11   13
  2    0    8   12   14
  3    0    6    0   15

then the resulting data frame would be exactly the same except
that the AAA column would be

6   comes from  (11 + 1)/2
7   comes from  (12 + 2)/2
3   stays 3 because the element in the other AAA is zero,
so I don't want to average that one. It should just stay 3.

So, I do 

DF[DF == 0]<-NA
rowaverage<-function(x) x[rowMeans(forecastDf[x],na.rm=TRUE)
revisedDF<-tapply(seq(DF),names(DF),rowmeans)

There are two problems with this:

1) I need to go through the rows of the same name, not the columns,
so I don't think seq(DF) is right, because that goes through
the columns but I want to go through rows.

2) BBB will come back with all NAs (since
it was unique and there was nothing else to average), and I don't know how
to transform that BBB column back to all zeros.

Thanks, and I'm sorry for so many questions. I'm getting better with
this stuff and my questions will decrease soon.

My guess is that I should no longer be using tapply
and should be using some other version of apply?
Thanks,
                                         mark

Jacques VESLOT

2006-Jul-06 14:10 UTC


I think you can't have columns with the same name:

> data.frame(AAA=1:3, AAA=4:6)
  AAA AAA.1
1   1     4
2   2     5
3   3     6
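
For what it's worth, the renaming shown above comes from the check.names argument of data.frame(), which defaults to TRUE. A minimal sketch, assuming the same toy columns (the names d1 and d2 are only for illustration), of how duplicate names survive when that check is switched off:

```r
# Default behaviour: duplicate names are made unique (AAA, AAA.1)
d1 <- data.frame(AAA = 1:3, AAA = 4:6)
names(d1)

# With check.names = FALSE the names are kept exactly as given
d2 <- data.frame(AAA = 1:3, AAA = 4:6, check.names = FALSE)
names(d2)

# Name-based extraction only finds the first match,
# so duplicated columns are safer to address by position
d2[["AAA"]]
d2[[2]]
```

Even when duplicate names are allowed, selecting by name is ambiguous, which is why the approaches in this thread work with column positions or logical masks instead.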

But you could subset the data frame by name using substring():

sapply(unique(substring(names(data1), 1, 3)), function(x)
	rowMeans(data1[, substring(names(data1), 1, 3) == x, drop = FALSE]))
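
As a rough illustration of the substring() idea on the data from the question, here is a sketch that assumes data1 was built with data.frame() (so the second AAA has become AAA.1) and that zeros should count as missing, as in the original post. The names grp and res are hypothetical.

```r
# Toy data from the question; data.frame() renames the second AAA to AAA.1
data1 <- data.frame(AAA = c(1, 2, 3), BBB = c(0, 0, 0), CCC = c(7, 8, 6),
                    AAA = c(11, 12, 0), DDD = c(13, 14, 15))

# Treat zeros as missing so they are left out of the averages
data1[data1 == 0] <- NA

# The first three characters identify the group: AAA BBB CCC AAA DDD
grp <- substring(names(data1), 1, 3)

# One row-wise mean per group; drop = FALSE keeps single-column
# groups as data frames so rowMeans() still accepts them
res <- sapply(unique(grp), function(g)
  rowMeans(data1[, grp == g, drop = FALSE], na.rm = TRUE))
res
```

This only works when the first three characters really identify the group. With longer or unevenly sized names, some other key (for example the name with a trailing ".1" suffix stripped) would be needed.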


-------------------------------------------------------------------
Jacques VESLOT

CNRS UMR 8090
I.B.L (2ème étage)
1 rue du Professeur Calmette
B.P. 245
59019 Lille Cedex

Tel : 33 (0)3.20.87.10.44
Fax : 33 (0)3.20.87.10.31

http://www-good.ibl.fr
-------------------------------------------------------------------



jim holtman

2006-Jul-06 14:16 UTC


I think this does what you want:

> In <- "AAA    BBB      CCC   AAA DDD
+ 1       0        7     11  13
+ 2        0       8     12  14
+ 3        0       6      0  15"
> DF <- read.table(textConnection(In), header=TRUE, check.names=FALSE)
>
> DF[DF == 0]<-NA
> rowaverage<-function(x) rowMeans(DF[x],na.rm=TRUE)
> revisedDF<-tapply(seq(DF),names(DF),rowaverage)
> revisedDF
$AAA
1 2 3
6 7 3

$BBB
 1  2  3
NA NA NA

$CCC
1 2 3
7 8 6

$DDD
 1  2  3
13 14 15

> do.call('cbind', revisedDF)
  AAA BBB CCC DDD
1   6  NA   7  13
2   7  NA   8  14
3   3  NA   6  15
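
The second problem in the question (getting the all-missing BBB column back to zeros) is not covered by the transcript above. One possible way to handle it, sketched under a few assumptions: the data is read with the text argument of read.table(), tapply() is swapped for split() plus sapply() on a matrix, and every group mean with no non-missing values is assumed to mean "originally all zeros". The names m, groups, avg and revised are illustrative only.

```r
In <- "AAA BBB CCC AAA DDD
1 0 7 11 13
2 0 8 12 14
3 0 6 0 15"
DF <- read.table(text = In, header = TRUE, check.names = FALSE)

# Work on a matrix, where duplicate column names cause no trouble
m <- as.matrix(DF)
m[m == 0] <- NA                      # zeros are left out of the averages

# Column positions grouped by name, keeping the original column order
groups <- split(seq_len(ncol(m)),
                factor(colnames(m), levels = unique(colnames(m))))

# Row-wise mean within each group of same-named columns
avg <- sapply(groups, function(idx)
  rowMeans(m[, idx, drop = FALSE], na.rm = TRUE))

# Rows with nothing to average come back missing; restore them to zero
# (is.na() is TRUE for both NA and NaN)
avg[is.na(avg)] <- 0

revised <- as.data.frame(avg)
revised
```

Restoring zeros this way cannot tell a group that was genuinely all zero apart from one that was missing for some other reason, so it only fits data where zero is the sole source of NA.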




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?

