Recently, I was using aggregate() to develop averages by trial for an experiment I was running. Trials were indicated as ordinal numbers for each subject. aggregate() turned trial into a factor during the aggregation process. I then wanted to create a scatter plot of subject performance by trial, so I applied as.numeric to the (now) factor variable trial. as.numeric reordered the trial indicator, creating some (at first) incomprehensible results.

Investigation revealed that aggregate must first be interpreting trial as a character and then turning it into a factor. The behavior I observed is reproducible from the following transcript using R 1.6.1 on RH Linux 7.3.

> test <- as.factor(as.character(c(1,2,3,4,5,6,7,8,9,10,11)))
> test
 [1] 1  2  3  4  5  6  7  8  9  10 11
Levels: 1 10 11 2 3 4 5 6 7 8 9
> as.numeric(test)
 [1]  1  4  5  6  7  8  9 10 11  2  3

It strikes me that as.numeric should *never* reorder the vector it is working on. There is this workaround for the problem:

> as.numeric(as.character(test))
 [1]  1  2  3  4  5  6  7  8  9 10 11

However, I should not have to know about the internals of aggregate to be able to use its results.

Bud Gibson

Bud Gibson <fpgibson at umich.edu> writes:

> > test <- as.factor(as.character(c(1,2,3,4,5,6,7,8,9,10,11)))
> > test
>  [1] 1  2  3  4  5  6  7  8  9  10 11
> Levels: 1 10 11 2 3 4 5 6 7 8 9
> > as.numeric(test)
>  [1]  1  4  5  6  7  8  9 10 11  2  3
>
> It strikes me that as.numeric should *never* reorder the vector it is
> working on. There is this workaround for the problem:

as.numeric is not reordering anything. It returns the integer codes of the factor, that is, each element's position in the list of levels. "2" is the 4th level of the test factor, which in turn is due to the alphabetic ordering of the factor levels in as.factor() [or factor() for that matter]. If you want to avoid that, set the factor levels explicitly:

test <- factor(as.character(c(1:11)), levels=c(1:11))
test
as.numeric(test)

I suppose that similar treatment of your "trial" variable prior to calling aggregate() could solve your problem.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

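The distinction between a factor's integer codes and its labels can be checked directly. The following is a minimal sketch in base R, not part of the original exchange; the variable names x, f_default and f_ordered are made up for illustration.

```r
# Same data as in the thread: numbers converted to character strings.
x <- as.character(1:11)

# Default levels are the sorted unique strings, so "10" and "11" sort before "2".
f_default <- factor(x)
levels(f_default)
as.integer(f_default)   # integer codes: positions in levels(), not the labels

# With explicit levels in numeric order, code and label agree for this data.
f_ordered <- factor(x, levels = as.character(1:11))
as.integer(f_ordered)

# Two ways to recover the original numbers from the labels,
# independent of how the levels happen to be ordered.
as.numeric(as.character(f_default))
as.numeric(levels(f_default))[f_default]
```

The last form converts each level only once and then indexes by the factor, which may matter for long vectors with few levels.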
On Sun, 08 Dec 2002 10:03:54 -0500 Bud Gibson <fpgibson at umich.edu> wrote:

> Recently, I was using aggregate() to develop averages by trial for an
> experiment I was running. Trials were indicated as ordinal numbers for
> each subject. aggregate() turned trial into factors during the
> aggregation process. I then wanted to create a scatter plot of subject
> performance by trial, so I applied as.numeric to the (now) factor
> variable trial. as.numeric reordered the trial indicator creating some
> (at first) incomprehensible results.
>
> Investigation revealed that aggregate must first be interpreting trial
> as a character and then turning it into a factor. The behavior I
> observed is reproducible from the following transcript using R1.6.1 on
> RH linux 7.3.
>
> > test <- as.factor(as.character(c(1,2,3,4,5,6,7,8,9,10,11)))
> > test
>  [1] 1  2  3  4  5  6  7  8  9  10 11
> Levels: 1 10 11 2 3 4 5 6 7 8 9
> > as.numeric(test)
>  [1]  1  4  5  6  7  8  9 10 11  2  3
>
> It strikes me that as.numeric should *never* reorder the vector it is
> working on. There is this workaround for the problem:
>
> > as.numeric(as.character(test))
>  [1]  1  2  3  4  5  6  7  8  9 10 11
>
> However, I should not have to know about the internals of aggregate to
> be able to use its results.
>
> Bud Gibson

One of the reasons the summarize function in the Hmisc library (http://hesweb1.med.virginia.edu/biostat/s/Hmisc.html) exists is that it preserves the nature of the stratification variables. summarize produces data frames that are like the original data except with the response variables replaced by scalar or vector statistical summaries.

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
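
For readers staying with base R rather than Hmisc, the workflow from the original question might look like the sketch below. The data frame df and its columns (subject, trial, score) are invented for illustration. Whether aggregate() returns the grouping column as a factor depends on the R version, so the conversion is written to work either way.

```r
# Hypothetical data: 2 subjects, 11 trials each, with made-up scores.
df <- data.frame(
  subject = rep(1:2, each = 11),
  trial   = rep(1:11, times = 2),
  score   = c(1:11, (1:11) + 2)
)

# Mean score per trial, averaged over subjects.
agg <- aggregate(df["score"], by = list(trial = df$trial), FUN = mean)

# Convert the grouping column via its labels, not its integer codes.
# This is harmless if trial is already numeric and correct if it is a factor.
agg$trial <- as.numeric(as.character(agg$trial))

# Put the rows in numeric trial order before plotting.
agg <- agg[order(agg$trial), ]
agg

# plot(agg$trial, agg$score)   # scatter plot of mean performance by trial
```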