Andrea Bernasconi DG
2010-Jun-14 12:52 UTC
[R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
Hi R help,

Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical?

Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE

I consider the Latin square example described on page 157 of the book Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. Box, J. Stuart Hunter, William G. Hunter.

This example uses the data file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

The file tab0408.dat contains the following DATA:

> DATA
   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now

> summary( aov(MODEL, data=DATA) )
            Df Sum Sq Mean Sq F value Pr(>F)
cars         1   12.8  12.800  0.8889 0.3680
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634
Residuals   10  144.0  14.400
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results differ from the book's result on p. 159, because "cars" and "driver" are treated as numerical variables by "aov": each gets 1 degree of freedom instead of 3.

BRUTE FORCE SOLUTION

Manually transforming "cars" and "driver" into categorical variables, I obtain the correct result:

> DATA_AB
   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16

> summary( aov(MODEL, data=DATA_AB) )
            Df Sum Sq Mean Sq F value   Pr(>F)
cars         3     24   8.000     1.5 0.307174
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490
Residuals    6     32   5.333
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

QUESTION

Which is the easiest (most elegant) way to force "driver" and "cars" from DATA to be treated as categorical variables by "aov"?
More generally, which is the easiest way to force "aov" to treat numerical variables as categorical?
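For readers without access to tab0408.dat, the listing can be typed in by hand to reproduce the symptom. The sketch below is only an illustration: the post never shows how MODEL was defined, so `y ~ cars + driver + additive` is an assumption based on the term order in the printed table, and the names `col_classes` and `numeric_fit` are made up here.

```r
# Rebuild the 16-row Latin square from the listing (values typed in by hand).
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# Integer columns are treated as numeric covariates by aov(),
# which is why each of them gets a single degree of freedom.
col_classes <- sapply(DATA, class)
print(col_classes)

# Assumed model formula (not shown in the original post).
MODEL <- y ~ cars + driver + additive
numeric_fit <- summary(aov(MODEL, data = DATA))[[1]]
print(numeric_fit)
```

Checking the column classes first is usually the quickest way to see whether a design variable will enter the model as a slope or as a set of levels.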
Ivan Calandra
2010-Jun-14 13:05 UTC
[R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
Hi,

See ?factor, e.g.:

DATA$driver <- factor(DATA$driver)

See also the levels= argument if you want to change the order of your levels.

HTH,
Ivan

On 6/14/2010 14:52, Andrea Bernasconi DG wrote:
> Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
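The factor() suggestion can be applied to several columns at once. The following is a minimal sketch, assuming the data frame is typed in from the listing in the question and that the model is `y ~ cars + driver + additive` (the thread does not show the formula); `design_cols`, `factor_fit` and `driver_rev` are illustrative names only.

```r
# Data typed in from the listing in the question.
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# Convert both design columns to factors in one step.
design_cols <- c("driver", "cars")
DATA[design_cols] <- lapply(DATA[design_cols], factor)

# Assumed model formula; each factor now contributes 3 degrees of freedom.
factor_fit <- summary(aov(y ~ cars + driver + additive, data = DATA))[[1]]
print(factor_fit)

# levels= fixes the level order explicitly (reversed here purely for illustration).
driver_rev <- factor(DATA$driver, levels = c("4", "3", "2", "1"))
print(levels(driver_rev))
```

Converting the columns inside the data frame means every later model fitted to DATA sees them as categorical, without any change to the formula.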
Andrea Bernasconi DG
2010-Jun-14 13:07 UTC
[R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
I think I found the solution!

> cc <- factor(DATA$cars)      # DATA$ prefix is needed unless DATA is attached
> dd <- factor(DATA$driver)
> MODEL <- y ~ cc + dd + additive
> summary(aov(MODEL, data=DATA))

On 14 Jun, 2010, at 2:52 PM, Andrea Bernasconi DG wrote:
> Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?
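A variant of the same idea is to call factor() directly inside the formula, so that no extra objects such as cc and dd are left in the workspace and DATA itself stays unchanged. This is a sketch under the same assumptions as the thread's example (data typed in from the listing; `inline_fit` is a made-up name), not necessarily what the book's authors used.

```r
# Data typed in from the listing in the question.
DATA <- data.frame(
  driver   = rep(1:4, times = 4),
  cars     = rep(1:4, each = 4),
  additive = factor(c("A", "D", "B", "C",  "B", "C", "D", "A",
                      "D", "A", "C", "B",  "C", "B", "A", "D")),
  y        = c(19, 23, 15, 19,  24, 24, 14, 18,
               23, 19, 15, 19,  26, 30, 16, 16)
)

# factor() is evaluated inside the formula, using the columns of DATA,
# so the original integer columns are left untouched.
inline_fit <- summary(
  aov(y ~ factor(cars) + factor(driver) + additive, data = DATA)
)[[1]]
print(inline_fit)
```

The trade-off is that the terms are labelled factor(cars) and factor(driver) in the output, which some readers find less tidy than converting the columns up front.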