Frank Mattes
2003-May-26 18:38 UTC
[R] help with subset(), still original dataframe in tapply
Dear R-help reader,

it would be great if someone knows what I'm doing wrong. I have a (shortened) data frame, which consists of a group identification and a number:

> ex
        UID   REL
1  R1.B8.31 0.000
2  R1.B8.31 0.000
3  R1.B8.31 0.000
4  R1.B8.31 0.000
5  R1.B8.38 0.010
6  R1.B8.38 0.060
7  R1.B8.38 0.006
8  R1.B8.38 0.010
9  R1.B8.48 0.080
10 R1.B8.48    NA
11 R1.B8.48 0.006

I'm creating now a subset missing the values 0 and "NA"

> newex<-subset(ex,ex$REL>0)
> newex
        UID   REL
5  R1.B8.38 0.010
6  R1.B8.38 0.060
7  R1.B8.38 0.006
8  R1.B8.38 0.010
9  R1.B8.48 0.080
11 R1.B8.48 0.006

and now would like to apply the mean to each group in (UID)

> tapply(newex$REL,newex$UID,mean,rm.na=T)
R1.B8.31 R1.B8.38 R1.B8.48
      NA   0.0215   0.0430

To my surprise, I still have the mean for group R1.B8.31, which has been removed by the subset function before. I can remove the NA by

tapply(newex$REL,interaction(newex$UID,drop=T),mean,rm.na=T)

but I would like to know why the tapply still uses the original dataframe.

Many thanks for your help

Frank

--
Frank Mattes, e-mail: f.mattes at ucl.ac.uk
Department of Virology                fax 0044(0)207 8302854
Royal Free Hospital and               tel 0044(0)207 8302997
University College Medical School
London
Peter Dalgaard BSA
2003-May-26 18:53 UTC
[R] help with subset(), still original dataframe in tapply
Frank Mattes <f.mattes at rfc.ucl.ac.uk> writes:

> I'm creating now a subset missing the values 0 and "NA"
>
> > newex<-subset(ex,ex$REL>0)
> > newex
>         UID   REL
> 5  R1.B8.38 0.010
> 6  R1.B8.38 0.060
> 7  R1.B8.38 0.006
> 8  R1.B8.38 0.010
> 9  R1.B8.48 0.080
> 11 R1.B8.48 0.006
>
> and now would like to apply the mean to each group in (UID)
>
> > tapply(newex$REL,newex$UID,mean,rm.na=T)
> R1.B8.31 R1.B8.38 R1.B8.48
>       NA   0.0215   0.0430
>
> to my surprise, I still have the mean for group R1.B8.31, which has
> been removed by the subset function before.

A subset of a three-level factor is still a three-level factor. If you want it to become a factor with only those levels that are present in data, you need to say so, e.g. with

tapply(newex$REL,factor(newex$UID),mean)

> but I would like to know why the tapply still uses the original dataframe.

It doesn't.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
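
For readers who want to reproduce the behaviour described in the reply, here is a minimal sketch in R. It rebuilds the shortened data frame from the question (assuming UID is stored as a factor, which the output in the thread implies) and compares tapply() on the unchanged factor with tapply() on a re-created factor. The object names means_all, means_used, newex2 and means_dropped are made up for this illustration. The droplevels() call is an alternative that is not mentioned in the thread and is only available in more recent versions of R. Note also that the argument of mean() is spelled na.rm; as far as I can tell, the rm.na=T in the question is silently absorbed by the "..." argument and has no effect.

```r
# Rebuild the shortened data frame from the question (UID assumed to be a factor).
ex <- data.frame(
  UID = factor(rep(c("R1.B8.31", "R1.B8.38", "R1.B8.48"), times = c(4, 4, 3))),
  REL = c(0, 0, 0, 0, 0.010, 0.060, 0.006, 0.010, 0.080, NA, 0.006)
)

# subset() drops rows where the condition is FALSE or NA.
newex <- subset(ex, REL > 0)

# The rows of group R1.B8.31 are gone, but the factor keeps all three levels.
print(levels(newex$UID))
print(table(newex$UID))

# tapply() returns one cell per level; the empty level is reported as NA.
means_all <- tapply(newex$REL, newex$UID, mean)
print(means_all)

# Re-creating the factor keeps only the levels present in the data.
means_used <- tapply(newex$REL, factor(newex$UID), mean)
print(means_used)

# Alternative (later versions of R): drop unused levels for the whole data frame.
newex2 <- droplevels(newex)
means_dropped <- tapply(newex2$REL, newex2$UID, mean)
print(means_dropped)
```

The design choice is where to drop the unused level: factor(newex$UID) does it only for this one call, while droplevels(newex) changes the data frame so that every later summary sees two groups.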