thr3ads.net - R devel - [Rd] Wrong df in bartlett.test when subsetting data (PR#1124) [Oct 2001]

If this information is useful, please help other people find it:
Share via:

jens.lund@nordea.com

2001-Oct-10 15:43 UTC

[Rd] Wrong df in bartlett.test when subsetting data (PR#1124)

Full_Name: Jens Lund
Version: 1.3.1
OS: Windows
Submission from: (NULL) (194.239.194.210)


When we subset a dataset bartlett.test does not calculate the df correctly:

X <- rnorm(50)     # Generate simulated samples
f <- gl(10,5)      # Generate a factor with 10 levels and
                   # 5 replications of each level

# Works correctly
bartlett.test(X,f) # Correct number of df = 9

# Just use the first 5 levels of f
# Does not work correctly
bartlett.test(X[1:25],f[1:25]) # Should have df = 4, but uses df = 9
bartlett.test(X~f,subset=1:25) # Should have df = 4, but uses df = 9

In version 1.2.2 the problem could be solved by substituting
        k <- nlevels(g)
with    k <- length(unique(g))
(I do not get the R-code in the new version get by typing bartlett.test at the
prompt.)

This might leed to a discussion on whether nlevels should return the number of
attained factor levels or the number of "theoretical" levels:
nlevels(f)       # Returns 10, fine!
nlevels(f[1:25]) # Returns 10, might not be fine; 5 might be the correct
answer.
The same question might be asked for levels, levels(f) versus levels(f[1:25]).
At least the documentation should be clear on this point, which I do not think
it is now. (Perhaps the levels attribute on the factor should be changed for a
subset of the data, but this is probably not an efficient way of doing
things...)

A similar problem also comes up with interactions between factors in unbalanced
designs:
g <- gl(2,40,50)
nlevels(f:g)         # Returns 20
length(unique(f:g))  # Returns 10
table(f,g)           # See the design

This bug has also been seen on version 1.2.2 on Linux, but is reproduced for
this bug report on:> version         _              
platform i386-pc-mingw32
arch     x86            
os       Win32          
system   x86, Win32     
status                  
major    1              
minor    3.1            
year     2001           
month    08             
day      31             
language R              

Best regards, Jens

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

2001-Oct-10 15:52 UTC

head link

[Rd] Wrong df in bartlett.test when subsetting data (PR#1124)

On Wed, 10 Oct 2001 jens.lund@nordea.com wrote:

<stuff about actual bug in bartlett.test() snipped>
> This might leed to a discussion on whether nlevels should return the number
of
> attained factor levels or the number of "theoretical" levels:
> nlevels(f)       # Returns 10, fine!
> nlevels(f[1:25]) # Returns 10, might not be fine; 5 might be the correct
> answer.
> The same question might be asked for levels, levels(f) versus
levels(f[1:25]).
> At least the documentation should be clear on this point, which I do not
think
> it is now. (Perhaps the levels attribute on the factor should be changed
for a
> subset of the data, but this is probably not an efficient way of doing
> things...)
>
> A similar problem also comes up with interactions between factors in
unbalanced
> designs:
> g <- gl(2,40,50)
> nlevels(f:g)         # Returns 20
> length(unique(f:g))  # Returns 10
> table(f,g)           # See the design
I strongly believe that levels() and nlevels() are doing the right thing
here.  If a variable is a factor with two levels (eg Male & Female) then
it shouldn't stop being a factor with two levels just because you have a
small sample.  In particular, when you have a single observation g[i] it
is still important to know the set of levels it comes from, rather than
just the category it is in.

	-thomas


Thomas Lumley			Asst. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Seemingly Similar Threads

Search for more reasonably related threads

R devel - Oct 2001 - Wrong df in bartlett.test when subsetting data (PR#1124)

[Rd] Wrong df in bartlett.test when subsetting data (PR#1124)

[Rd] Wrong df in bartlett.test when subsetting data (PR#1124)

Seemingly Similar Threads