Please report bugs in contributed R packages to the packages'
maintainers, in this case Alvaro A. Novo (in CC).
Uwe Ligges
Andreas Wolf wrote:
> dear list members,
> there seems to be a problem with the prelim.norm function (package norm)
> as the number of items in the dataset increases.
>
> the output of prelim.norm() is a list with different summary statistics,
> one of them is the missingness indicator matrix "r". it lists all
> patterns of missing data and a count of how often each pattern occurred
> in the dataset. as the number of items and number of patterns increases,
> it seems to malfunction, as it stops after less than 200 patterns and
> the count for the last row/pattern equals the number of subjects minus
> the number of patterns listed before.
>
> let's give an example: i generate multivariate normal data for 40
> variables and 500 observations. i randomly delete 10 percent of the
> values for each person (i.e. set them to NA). as the number of possible
> patterns of missings (combinations without repetition: 40 choose 4) is
> 91390, you'd expect to have (almost) as many different patterns of
> missings as subjects in the dataset (~ 500). however, running
> prelim.norm, the "r" matrix indicates some 170 patterns (it varies in
> multiple runs !!), and the last pattern supposedly occurs some 320 times
> in the dataset (which is, of course, not true if you check).
>
> any ideas?
>
>
> INPUT:
> x <- matrix(rnorm(20000),500,40) # generate 40 variables with 500 observations
>
> for (tmp in 1:500) {
>   draw <- sample(1:40, 4, replace = FALSE)
>   x[tmp, draw] <- NA
> } # set (random) 10 percent of values (4 of 40) per observation to NA
>
> library(norm)
> s <- prelim.norm(x) # run prelim.norm from package norm
> s$r # missingness indicator matrix (0-missing, 1-observed)
> dimnames(s$r)[[1]][length(s$r[,1])] # count for (supposedly) last pattern
>
> tmp <- which(s$r[length(s$r[,1]),] == 0) # vector of items (supposedly) missing in last pattern
> which(is.na(x[,tmp[1]]) & is.na(x[,tmp[2]]) & is.na(x[,tmp[3]]) &
>       is.na(x[,tmp[4]])) # list cases with last pattern
>
>
>
>
> p.s. it works fine up to 30 items ... hence, it's not due to the
> absolute number of patterns, as there're almost as many patterns as
> subjects with 3 out of 30 items missing (possible patterns: 30 choose 3 =
> 4060)
>
> p.p.s. i first thought of the recursion limit in R, but it doesn't help
> ( options(expressions = 100000) )
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
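
The "if you check" step in the quoted message can be done without package norm at all. Below is a minimal sketch in base R that rebuilds the example data and counts the missingness patterns directly. The seed and the names n, p, k, pattern, counts, n_patterns and max_count are assumptions chosen for illustration; they are not part of the original post or of the norm package.

```r
# Independent count of missingness patterns, using base R only.
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 500                           # observations
p <- 40                            # variables (items)
k <- 4                             # values deleted per observation (10 percent)

x <- matrix(rnorm(n * p), n, p)
for (i in seq_len(n)) {
  x[i, sample(p, k)] <- NA         # delete k of the p values in row i
}

choose(p, k)                       # number of possible patterns: 40 choose 4

# Encode each row's pattern as a string of 0/1 (0 = missing, 1 = observed),
# matching the coding used for the "r" matrix in the post.
pattern <- apply(!is.na(x), 1, function(obs) paste(as.integer(obs), collapse = ""))
counts <- table(pattern)

n_patterns <- length(counts)       # distinct patterns actually present
max_count  <- max(counts)          # most rows sharing any single pattern
```

With only 500 draws from 91390 possible patterns, n_patterns should come out close to n and max_count should be very small. Comparing n_patterns with nrow(s$r) from the prelim.norm output would show how many patterns went missing in the reported run.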