jens.lund@nordea.com
2001-Oct-10  15:43 UTC
[Rd] Wrong df in bartlett.test when subsetting data (PR#1124)
Full_Name: Jens Lund
Version: 1.3.1
OS: Windows
Submission from: (NULL) (194.239.194.210)
When we subset a dataset bartlett.test does not calculate the df correctly:
X <- rnorm(50)     # Generate simulated samples
f <- gl(10,5)      # Generate a factor with 10 levels and
                   # 5 replications of each level
# Works correctly
bartlett.test(X,f) # Correct number of df = 9
# Just use the first 5 levels of f
# Does not work correctly
bartlett.test(X[1:25],f[1:25]) # Should have df = 4, but uses df = 9
bartlett.test(X~f,subset=1:25) # Should have df = 4, but uses df = 9
In version 1.2.2 the problem could be solved by substituting
        k <- nlevels(g)
with    k <- length(unique(g))
(I do not get the R-code in the new version get by typing bartlett.test at the
prompt.)
This might leed to a discussion on whether nlevels should return the number of
attained factor levels or the number of "theoretical" levels:
nlevels(f)       # Returns 10, fine!
nlevels(f[1:25]) # Returns 10, might not be fine; 5 might be the correct
answer.
The same question might be asked for levels, levels(f) versus levels(f[1:25]).
At least the documentation should be clear on this point, which I do not think
it is now. (Perhaps the levels attribute on the factor should be changed for a
subset of the data, but this is probably not an efficient way of doing
things...)
A similar problem also comes up with interactions between factors in unbalanced
designs:
g <- gl(2,40,50)
nlevels(f:g)         # Returns 20
length(unique(f:g))  # Returns 10
table(f,g)           # See the design
This bug has also been seen on version 1.2.2 on Linux, but is reproduced for
this bug report on:> version
         _              
platform i386-pc-mingw32
arch     x86            
os       Win32          
system   x86, Win32     
status                  
major    1              
minor    3.1            
year     2001           
month    08             
day      31             
language R              
Best regards, Jens
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Thomas Lumley
2001-Oct-10  15:52 UTC
[Rd] Wrong df in bartlett.test when subsetting data (PR#1124)
On Wed, 10 Oct 2001 jens.lund@nordea.com wrote: <stuff about actual bug in bartlett.test() snipped> > This might leed to a discussion on whether nlevels should return the number of > attained factor levels or the number of "theoretical" levels: > nlevels(f) # Returns 10, fine! > nlevels(f[1:25]) # Returns 10, might not be fine; 5 might be the correct > answer. > The same question might be asked for levels, levels(f) versus levels(f[1:25]). > At least the documentation should be clear on this point, which I do not think > it is now. (Perhaps the levels attribute on the factor should be changed for a > subset of the data, but this is probably not an efficient way of doing > things...) > > A similar problem also comes up with interactions between factors in unbalanced > designs: > g <- gl(2,40,50) > nlevels(f:g) # Returns 20 > length(unique(f:g)) # Returns 10 > table(f,g) # See the designI strongly believe that levels() and nlevels() are doing the right thing here. If a variable is a factor with two levels (eg Male & Female) then it shouldn't stop being a factor with two levels just because you have a small sample. In particular, when you have a single observation g[i] it is still important to know the set of levels it comes from, rather than just the category it is in. -thomas Thomas Lumley Asst. Professor, Biostatistics tlumley@u.washington.edu University of Washington, Seattle -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._