David Afshartous
2008-Jun-16 15:30 UTC
[R] aggregate() function, strange behavior for augmented data
All,

I'm re-running some analysis that has been augmented with additional data. When I use the exact same code for the augmented data, the behavior of the aggregate function is very strange, viz., one of the resulting variables is now coded as a factor while it was coded as numeric for the original data. Unfortunately, I cannot provide a reproducible code example since it only seems to occur with this data. I've checked and re-checked both the original and augmented data but nothing appears inconsistent. Any suggestions much appreciated. See below for specifics.

Cheers,
David

# original data
> dim(junk1)
[1] 96  3
> junk1[1,]
  Hour Drug Aldo
1    0    P    9
> junk1$Hour
 [1] 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3
[39] 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0
[77] 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5   ### Not coded as a factor
> junk1.mean.time.drug = aggregate(junk1[3], junk1[c(1,2)], mean)
> junk1.mean.time.drug$Hour
[1] 0 3 5 0 3 5   ### not coded as a factor

# augmented data
> dim(junk1)
[1] 108   3
> junk1[1,]
  Hour Drug Aldo
1    0    P    9
> junk1$Hour
  [1] 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3
 [51] 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0 3 5 0
[101] 3 5 0 3 5 0 3 5   ### not coded as a factor
> junk1.mean.time.drug = aggregate(junk1[3], junk1[c(1,2)], mean)
> junk1.mean.time.drug$Hour
[1] 0 3 5 0 3 5
Levels: 0 3 5   ################## coded as a factor now!

## of course, I can recode it again but I'm curious as to why this is
## changing here
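The recoding mentioned at the end of the message can be sketched as follows. This is only a minimal sketch on synthetic data: the data frame `demo` and the helper `to_numeric` are hypothetical names, not from the post, and the values are made up. It does not explain why the class changes; it just makes downstream code give a numeric Hour whether or not aggregate() returned a factor.

```r
# Synthetic stand-in for junk1 (hypothetical values, not the poster's data)
demo <- data.frame(
  Hour = rep(c(0, 3, 5), times = 4),
  Drug = factor(rep(c("P", "D"), each = 3, times = 2)),
  Aldo = c(9, 15, 4, 8, 13, 3, 5, 11, 5, 7, 12, 6)
)

# Hypothetical helper: return a numeric vector whether x is a factor or not.
# as.numeric() applied directly to a factor returns the internal level codes,
# so convert through as.character() first to recover the printed values.
to_numeric <- function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
}

# Same style of call as in the post, with columns selected by name
demo.mean <- aggregate(demo["Aldo"], demo[c("Hour", "Drug")], mean)
demo.mean$Hour <- to_numeric(demo.mean$Hour)

class(demo.mean$Hour)  # "numeric", whatever aggregate() returned
```

Going through as.character() matters: calling as.numeric() straight on a factor with levels 0, 3, 5 would give the codes 1, 2, 3 instead of the hours.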
David Afshartous
2008-Jun-16 15:50 UTC
[R] aggregate() function, strange behavior for augmented data
Everything was read in the same way, and str(junk1) confirms that they have the same structure. This is very strange.

## original data:
> str(junk1)
'data.frame':   96 obs. of  3 variables:
 $ Hour: int  0 3 5 0 3 5 0 3 5 0 ...
 $ Drug: Factor w/ 2 levels "D","P": 2 2 2 1 1 1 2 2 2 1 ...
 $ Aldo: int  9 15 4 8 13 3 5 11 5 7 ...

## augmented data:
> str(junk1)
'data.frame':   108 obs. of  3 variables:
 $ Hour: int  0 3 5 0 3 5 0 3 5 0 ...
 $ Drug: Factor w/ 2 levels "D","P": 2 2 2 1 1 1 2 2 2 1 ...
 $ Aldo: int  9 15 4 8 13 3 5 11 5 7 ...

On 6/16/08 11:37 AM, "markleeds at verizon.net" <markleeds at verizon.net> wrote:

> hi: do str(junk1) and it will tell you what the components of junk1 are.
>
> the only thing i can think of is that you used stringsAsFactors=FALSE
> when you (probably) used read.table to read in junk but you didn't use
> that option when you used read.table to read in junk1?
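The str() comparison above can also be done programmatically, for both the inputs and the aggregate() results. This is a minimal sketch under stated assumptions: `original`, `augmented`, `classes_of` and `agg` are hypothetical names, and the data are synthetic stand-ins rather than the poster's files.

```r
# Hypothetical stand-ins for the two versions of junk1
original <- data.frame(
  Hour = rep(c(0L, 3L, 5L), times = 2),
  Drug = factor(rep(c("P", "D"), each = 3)),
  Aldo = c(9L, 15L, 4L, 8L, 13L, 3L)
)
augmented <- rbind(original, original)

# First class of every column, as a named character vector
classes_of <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

# TRUE when both inputs have the same column names and classes
inputs_match <- identical(classes_of(original), classes_of(augmented))

# Same check on the aggregated results
agg <- function(df) aggregate(df["Aldo"], df[c("Hour", "Drug")], mean)
results_match <- identical(classes_of(agg(original)), classes_of(agg(augmented)))

inputs_match
results_match
```

If the inputs match but the results do not, it may be worth confirming that both runs used the same R session and version (for example by printing R.version.string), since that is one thing str() on the data cannot show.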