In the course of applying Shapiro-Wilk to 100,000 samples of 60 items
from 100,000 different distributions, I encountered a fatal error in
apply(). This can be reconstructed as follows, using the attached data
file distr.dat containing 2 lines of my original 100,000-line file:
> version
         _         
platform Windows   
arch     x86       
os       Win32     
system   x86, Win32
status             
major    1         
minor    1.1       
year     2000      
month    August    
day      15        
language R
> # Read the data in
> dist.dat <- matrix(scan("distr.dat"), byrow=T, ncol=60)
Read 120 items
> # Define a function to perform Shapiro-Wilk, with protection
> # against values that would cause fatal errors in shapiro.test()
> shap <- function(x){
+   result.W <- NA; result.p <- NA    # Set default return value
+   if (length(x[!is.na(x)])>3){      # Check for bad values
+     x.var <- var(x, na.rm=T)
+     if (!is.na(x.var)){
+       if(x.var>0){                  # If values OK, perform test
+         shap.res <- shapiro.test(x)
+         result.W <- shap.res$statistic    # Problem line
+         result.p <- shap.res$p.value
+       }
+     }
+   }
+   c(result.W, result.p)
+ }
> apply(dist.dat, 1, shap)
Error in names(x) == ans.names : comparison (1) is possible only for
vector types
>
If we look at the structure of the value returned by shap() for the
first sample, we see that the W element carries a name whereas the p
element does not (its entry in the "names" attribute is the empty string):
> str(shap(dist.dat[1,]))
 Named num [1:2] 0.887519 0.000622
 - attr(*, "names")= chr [1:2] "W" ""
>
but in the case of the second sample (all NA except one), shapiro.test()
was not called, and the preset default value c(NA, NA) was returned, with
no "names" attribute at all:
> str(shap(dist.dat[2,]))
 logi [1:2] NA NA
>
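
The mismatch in names can be reproduced without shapiro.test() at all. The
following is a minimal sketch using made-up numbers (not the real test
output): c() keeps the name of a named element and pads the unnamed one
with an empty string, while a vector of plain NAs has no "names" attribute.

```r
# Made-up stand-ins for shap.res$statistic and shap.res$p.value
w <- c(W = 0.8875)     # named, like the test statistic
p <- 0.0006            # unnamed, like the p-value

tested  <- c(w, p)     # shape of shap()'s result when the test runs
skipped <- c(NA, NA)   # shape of shap()'s result when the test is skipped

names(tested)          # "W" ""
names(skipped)         # NULL
```

So the two calls hand apply() results whose names differ in kind (a
character vector in one case, NULL in the other), not just in content.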
Looking directly at the result of shapiro.test(), we get:
> str(shapiro.test(dist.dat[1,]))
List of 4
 $ statistic: Named num 0.888
  ..- attr(*, "names")= chr "W"
 $ p.value  : num 0.000622
 $ method   : chr "Shapiro-Wilk normality test"
 $ data.name: chr "dist.dat[1, ]"
 - attr(*, "class")= chr "htest"
>
So shapiro.test()$statistic has a "names" attribute, whereas
shapiro.test()$p.value does not, and my default return value c(NA, NA)
does not. If I remove the "names" attribute from W by changing the line
of code marked "# Problem line" to
        result.W <- as.numeric(shap.res$statistic)
the error disappears. Reading the help for apply() I could find no
reference to "names" attributes, let alone any restrictions on them.
It appears to me that there is a bug in apply(), in that it cannot deal
gracefully with this somewhat unusual situation. However, some blame may
be attributable to me or to shapiro.test(), so I leave it for the
R-gurus to look at and to forward to R-bugs if appropriate.
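
An alternative to stripping the name is to make every call return the same
names, whether or not the test was run. The sketch below is only an
illustration under assumptions: shap2 is a hypothetical name, the input
matrix is simulated rather than read from distr.dat, and it has not been
checked against R 1.1.1. It relies on unname(), which does the same job
here as as.numeric().

```r
# Hypothetical variant of shap() that always returns names "W" and "p"
shap2 <- function(x) {
  result <- c(W = NA_real_, p = NA_real_)    # default when the test is skipped
  x <- x[!is.na(x)]                          # drop missing values up front
  if (length(x) > 3 && isTRUE(var(x) > 0)) { # same guards as shap()
    res <- shapiro.test(x)
    result["W"] <- unname(res$statistic)     # strip the "W" name before storing
    result["p"] <- res$p.value
  }
  result
}

# Simulated input: one usable row, one row that is all NA except one value
set.seed(1)
m <- rbind(rnorm(60), c(rep(NA, 59), 0.016))
out <- apply(m, 1, shap2)   # one column per sample, rows named "W" and "p"
```

Because both branches return a vector with identical names, apply() never
has to reconcile a named result with an unnamed one.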
Clive Jenkins.
-------------- next part --------------
NA 0.000632 NA 0.009640 NA 0.000632 NA -0.001176 0.004235 NA 0.004235 0.002418
0.011395 0.002433 0.004235 -0.001170 -0.001170 0.000632 0.002433 0.000623
0.000632 -0.001170 0.006037 0.004235 0.004235 0.002418 NA NA NA 0.002418 NA
0.009640 0.000623 NA NA 0.004213 NA -0.001170 0.009600 0.006037 -0.001170
0.006009 -0.001173 0.002433 0.004235 NA NA NA NA NA 0.009640 0.002433 0.004235
NA 0.002433 -0.001170 0.004235 -0.001173 -0.001170 -0.001170
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA 0.015956 NA NA NA NA