thr3ads.net - R help - [R] patterns of missing data: determining monotonicity [Jan 2005]

Michael Friendly

2005-Jan-06 16:50 UTC

[R] patterns of missing data: determining monotonicity

Here is a problem that perhaps someone out here has an idea about.  It 
vaguely reminds me of something
I've seen before, but can't place.  Can anyone help?

For multiple imputation, there are simpler methods available if  the 
patterns of missing data are 'monotone' ---
if Vj is missing then all variables Vk, k>j are also missing, vs. more 
complex methods required when the patterns are not monotone.  The 
problem is to determine if, for a collection of variables, there is an 
ordering of them with a monotone
missing data pattern, or, if not, what the longest monotone sequence is.

Here is an example, where in a dataset of 65 observations, there are 8 
different patterns of missingness,
with X and . representing observed and missing:

Group   V2   V3   V4   V5   V6   V7   V8   V9   V10   V11   nmiss
  1     X    X    X    X    X    X    X    X     X     X      0  
  2     X    X    X    X    X    X    .    X     X     X      1  
  3     X    X    X    X    X    .    X    X     X     X      1  
  4     X    X    X    X    X    .    .    X     X     X      2  
  5     X    X    .    X    .    X    X    X     X     X      2  
  6     X    X    .    .    X    X    X    X     X     X      2  
  7     X    X    .    .    X    .    X    X     X     X      3  
  8     X    X    .    .    .    X    X    X     X     X      3  

Treated as a binary matrix, one can sort the columns by the number
of non-missing for each variable, and monotone means that there
are at most 2 runs -- a string of 0s followed by all 1s for *all*
patterns. But how
to determine an ordering (or orderings) of variables of maximal length?

Group   V2   V3   V9   V10   V11   V6   V8   V5   V7   V4   nmiss
  1      0    0    0    0     0     0    0    0    0    0     0  
  2      0    0    0    0     0     0    1    0    0    0     1  
  3      0    0    0    0     0     0    0    0    1    0     1  
  4      0    0    0    0     0     0    1    0    1    0     2  
  5      0    0    0    0     0     1    0    0    0    1     2  
  6      0    0    0    0     0     0    0    1    0    1     2  
  7      0    0    0    0     0     0    0    1    1    1     3  
  8      0    0    0    0     0     1    0    1    0    1     3  
        ==   ==   ==   ===   ===   ==   ==   ==   ==   ==
         0    0    0    0     0     2    2    3    3    4
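
A minimal sketch in R of the check described above, not a full answer to the question: code missing as 1 and observed as 0, sort the columns by their number of missing entries, and test whether every pattern then reads as a run of 0s followed by a run of 1s. The matrix `M` is typed in from the first table; the function name `is_monotone` is made up for this illustration.

```r
# Missingness indicators for the 8 patterns above (1 = missing, 0 = observed),
# typed in from the first table; columns are V2..V11.
vars <- paste0("V", 2:11)
M <- matrix(0L, nrow = 8, ncol = 10, dimnames = list(NULL, vars))
M[2, "V8"] <- 1L
M[3, "V7"] <- 1L
M[4, c("V7", "V8")] <- 1L
M[5, c("V4", "V6")] <- 1L
M[6, c("V4", "V5")] <- 1L
M[7, c("V4", "V5", "V7")] <- 1L
M[8, c("V4", "V5", "V6")] <- 1L

# is_monotone() is a hypothetical helper, not part of any package.
# If some column ordering is monotone, the missing sets are nested, so
# ordering by missing count is one such ordering (ties are identical columns).
is_monotone <- function(M) {
  ord <- order(colSums(M))             # fewest missing first
  S <- M[, ord, drop = FALSE]
  # each row must be non-decreasing: 0s first, then 1s
  all(apply(S, 1, function(r) all(diff(r) >= 0)))
}

colSums(M)        # missing counts per variable
is_monotone(M)    # FALSE for these patterns
```

Sorting by missing count is enough for a yes/no answer; it does not by itself say which subset of variables is the longest monotone one.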



-- 
Michael Friendly     Email: friendly at yorku.ca 
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

Michael Friendly

2005-Jan-07 13:13 UTC

[R] re: patterns of missing data: determining monotonicity

[Sorry for the re-post; my examples got garbled in the original cut/paste.]

Here is a problem that perhaps someone out here has an idea about.  It 
vaguely reminds me of something
I've seen before, but can't place.  Can anyone help?

For multiple imputation, there are simpler methods available if  the 
patterns of missing data are 'monotone' ---
if Vj is missing then all variables Vk, k>j are also missing, vs. more 
complex methods required when the patterns are not monotone.  The 
problem is to determine if, for a collection of variables, there is an 
ordering of them with a monotone
missing data pattern, or, if not, what the longest monotone sequence is.

Here is an example, where in a dataset of 65 observations, there are 8 
different patterns of missingness,
with x and . representing observed and missing:

Group  V2  V3  V4  V5  V6  V7  V8  V9  V10  V11   nmiss

  1    x   x   x   x   x   x   x   x    x    x      0  
  2    x   x   x   x   x   x   .   x    x    x      1  
  3    x   x   x   x   x   .   x   x    x    x      1  
  4    x   x   x   x   x   .   .   x    x    x      2  
  5    x   x   .   x   .   x   x   x    x    x      2  
  6    x   x   .   .   x   x   x   x    x    x      2  
  7    x   x   .   .   x   .   x   x    x    x      3  
  8    x   x   .   .   .   x   x   x    x    x      3  

Treated as a binary matrix, one can sort the columns by the number
of non-missing for each variable, and monotone means that there
are at most 2 runs -- a string of 0s followed by all 1s for *all*
patterns. But how
to determine an ordering (or orderings) of variables of maximal length?

Group   V2  V3  V9 V10  V11  V6  V8  V5  V7  V4   nmiss

  1     0   0   0   0    0    0   0   0   0   0    0  
  2     0   0   0   0    0    0   1   0   0   0    1  
  3     0   0   0   0    0    0   0   0   1   0    1  
  4     0   0   0   0    0    0   1   0   1   0    2  
  5     0   0   0   0    0    1   0   0   0   1    2  
  6     0   0   0   0    0    0   0   1   0   1    2  
  7     0   0   0   0    0    0   0   1   1   1    3  
  8     0   0   0   0    0    1   0   1   0   1    3  
       ==  ==  ==  ===  ===  ==  ==  ==  ==  ==
        0   0   0   0    0    2   2   3   3   4


-- 
Michael Friendly     Email: friendly at yorku.ca 
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

(Ted Harding)

2005-Jan-07 14:13 UTC

[R] patterns of missing data: determining monotonicity

On 06-Jan-05 Michael Friendly wrote:
> Here is a problem that perhaps someone out here has an idea
> about. It vaguely reminds me of something I've seen before,
> but can't place.  Can anyone help?
> 
> For multiple imputation, there are simpler methods available
> if  the patterns of missing data are 'monotone' --- if Vj is
> missing then all variables Vk, k>j are also missing, vs. more 
> complex methods required when the patterns are not monotone.
> The problem is to determine if, for a collection of variables,
> there is an ordering of them with a monotone missing data pattern,
> or, if not, what the longest monotone sequence is.
> 
> Here is an example, where in a dataset of 65 observations, there
> are 8 different patterns of missingness, with X and . representing
> observed and missing:
> 
> Group   V2   V3   V4   V5   V6   V7   V8   V9   V10   V11   nmiss
>   1     X    X    X    X    X    X    X    X     X     X      0  
>   2     X    X    X    X    X    X    .    X     X     X      1  
>   3     X    X    X    X    X    .    X    X     X     X      1  
>   4     X    X    X    X    X    .    .    X     X     X      2  
>   5     X    X    .    X    .    X    X    X     X     X      2  
>   6     X    X    .    .    X    X    X    X     X     X      2  
>   7     X    X    .    .    X    .    X    X     X     X      3  
>   8     X    X    .    .    .    X    X    X     X     X      3  
> 
> Treated as a binary matrix, one can sort the columns by the number
> of non-missing for each variable, and monotone means that there
> are at most 2 runs -- a string of 0s followed by all 1s for *all*
> patterns. But how
> to determine an ordering (or orderings) of variables of maximal length?
> 
> Group   V2   V3   V9   V10   V11   V6   V8   V5   V7   V4   nmiss
>   1      0    0    0    0     0     0    0    0    0    0     0  
>   2      0    0    0    0     0     0    1    0    0    0     1  
>   3      0    0    0    0     0     0    0    0    1    0     1  
>   4      0    0    0    0     0     0    1    0    1    0     2  
>   5      0    0    0    0     0     1    0    0    0    1     2  
>   6      0    0    0    0     0     0    0    1    0    1     2  
>   7      0    0    0    0     0     0    0    1    1    1     3  
>   8      0    0    0    0     0     1    0    1    0    1     3  
>         ==   ==   ==   ===   ===   ==   ==   ==   ==   ==
>          0    0    0    0     0     2    2    3    3    4

Hi Michael,

Consider the following approach. It's not a full solution to
the specific problem you have posed above, but it contains
pathways to solutions.

If you're doing multiple imputation anyway, you should install
the packages "cat" (for categorical data), "norm" (for continuous
data, assumed Normal) and "mix" (for data mixing both kinds),
and also "pan" for MI on "panel" data, which might also be useful
to you.

I'll discuss the situation using "cat" as an example, though
"norm" works the same way as far as this question is concerned.

First make sure your data are arranged as a matrix X (say)
with rows representing "cases" and columns variables. If the
variables are categorical, make sure that their values are
represented as integers 1, 2, 3, ... (don't start with "0"),
and represent missing values as NA.

Example of data matrix X:

  X
        [,1] [,2] [,3]
   [1,]    3    1    2
   [2,]    2    1    3
   [3,]    2    1   NA
   [4,]    2    3   NA
   [5,]    1    3   NA
   [6,]    2   NA   NA
   [7,]    2   NA   NA
   [8,]    3   NA   NA
   [9,]   NA   NA   NA
  [10,]   NA   NA   NA

(constructed to have monotone pattern). Now shuffle it:

  X<-X[,sample(1:3)]
  X<-X[sample(1:10),]

  X
        [,1] [,2] [,3]
   [1,]    1    2    3
   [2,]    3   NA    2
   [3,]    1   NA    2
   [4,]    3   NA    1
   [5,]    1    3    2
   [6,]   NA   NA   NA
   [7,]   NA   NA    2
   [8,]   NA   NA    3
   [9,]   NA   NA    2
  [10,]   NA   NA   NA
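
Because `sample()` is random, rerunning the two lines above will not reproduce this particular shuffle. To follow along with exactly the matrix shown, it can be typed in directly; a small sketch using only base R:

```r
# The shuffled matrix printed above, entered row by row.
X <- matrix(c( 1,  2,  3,
               3, NA,  2,
               1, NA,  2,
               3, NA,  1,
               1,  3,  2,
              NA, NA, NA,
              NA, NA,  2,
              NA, NA,  3,
              NA, NA,  2,
              NA, NA, NA),
            nrow = 10, byrow = TRUE)

colSums(is.na(X))   # missing values per column
rowSums(is.na(X))   # missing values per row
```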

Consider this as a real data matrix where now it is not
obvious that it has monotone missingness pattern. Then:

  library(cat)
  s <- prelim.cat(X)

Now read *very* carefully:

  ?prelim.cat

and in particular what is said about its value (the value of s).
Note also what is *not* said about it!

Now look at "s" by printing it to the console. Amongst its 17
components the following are of particular interest.

  s$x
        [,1] [,2] [,3]
   [1,]    1    3    2
   [2,]    1    2    3
   [3,]    3   NA    1
   [4,]    1   NA    2
   [5,]    3   NA    2
   [6,]   NA   NA    2
   [7,]   NA   NA    2
   [8,]   NA   NA    3
   [9,]   NA   NA   NA
  [10,]   NA   NA   NA

You can see that this is the same as X except that rows have
been permuted to push the NAs downwards. The component

   s$ro
   [1]  2  5  4  3  1  9  6  8  7 10

shows the permutation: the original Row 1 of X is Row 2 of s$x,
the original row 2 of X is Row 5 of s$x, and so on.

Now look at the component s$nmis of s:

   s$nmis
   [1] 5 8 2

This gives the numbers of missing values in the different
columns of X (and of s$x since the order of columns has not
been changed).

Now you can sort s$nmis into increasing order using the
"index.return=TRUE" option of 'sort' so as to get the
column permutation:

  sort(s$nmis,index.return=TRUE)
  $x
  [1] 2 5 8

  $ix
  [1] 3 1 2

You can check directly that s$x[,c(3,1,2)] is in monotone
pattern; more directly, you can get X re-structured into
monotone pattern as

  s$x[,sort(s$nmis,index.return=TRUE)$ix]
        [,1] [,2] [,3]
   [1,]    2    1    3
   [2,]    3    1    2
   [3,]    1    3   NA
   [4,]    2    1   NA
   [5,]    2    3   NA
   [6,]    2   NA   NA
   [7,]    2   NA   NA
   [8,]    3   NA   NA
   [9,]   NA   NA   NA
  [10,]   NA   NA   NA
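
The same re-ordering can be sketched in base R, without `prelim.cat`, by sorting the columns and then the rows by their number of NAs. This is only an illustration under the assumption that a monotone ordering exists; the helper name `to_monotone` is invented here, and the row order it produces need not match `s$x` exactly.

```r
# Start from the monotone example matrix and shuffle it, as in the text.
X0 <- matrix(c( 3,  1,  2,
                2,  1,  3,
                2,  1, NA,
                2,  3, NA,
                1,  3, NA,
                2, NA, NA,
                2, NA, NA,
                3, NA, NA,
               NA, NA, NA,
               NA, NA, NA),
             nrow = 10, byrow = TRUE)
set.seed(1)                       # arbitrary seed, only for repeatability
X <- X0[sample(1:10), sample(1:3)]

# to_monotone() is a hypothetical helper: columns with fewest NAs go first,
# then rows with fewest NAs go first.
to_monotone <- function(X) {
  X <- X[, order(colSums(is.na(X))), drop = FALSE]
  X[order(rowSums(is.na(X))), , drop = FALSE]
}

Y <- to_monotone(X)
Y
# In a monotone layout, no observed value sits to the right of an NA.
all(apply(is.na(Y), 1, function(r) all(diff(r) >= 0)))
```

`order(v)` returns the same permutation as `sort(v, index.return=TRUE)$ix`, so either form can be used for the column step.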

I hope this is some help. At least it shows you places where
you can start digging. If the original X is incompatible with
monotone pattern, then the above should give you something
which is close to monotone, though I'm not sure whether it
will get you "as close as possible"; and you may need to
do some more work to uncover how to determine your "longest
monotone sequence".
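
One way to make "longest monotone sequence" concrete, offered as an assumption rather than as what any of the MI packages do: a set of variables can be put in monotone order exactly when their sets of missing cases are nested, so the task becomes finding the longest chain of nested missing sets. The sketch below does this with a simple dynamic programme over columns sorted by missing count; `longest_monotone` is a made-up name, and `M` is the 0/1 missingness matrix typed in from Michael's eight patterns.

```r
# Missingness indicators for the 8 patterns (1 = missing); columns V2..V11.
vars <- paste0("V", 2:11)
M <- matrix(0L, 8, 10, dimnames = list(NULL, vars))
miss <- list(character(0), "V8", "V7", c("V7", "V8"), c("V4", "V6"),
             c("V4", "V5"), c("V4", "V5", "V7"), c("V4", "V5", "V6"))
for (i in seq_along(miss)) M[i, miss[[i]]] <- 1L

# longest_monotone() is a hypothetical helper, not from any package.
longest_monotone <- function(M) {
  ord <- order(colSums(M))                 # candidates, fewest missing first
  S <- M[, ord, drop = FALSE]
  p <- ncol(S)
  len <- rep(1L, p)                        # best chain length ending at column j
  prev <- rep(NA_integer_, p)              # predecessor in that chain
  for (j in seq_len(p)) {
    for (i in seq_len(j - 1)) {
      nested <- all(S[, i] <= S[, j])      # missing set of i inside that of j
      if (nested && len[i] + 1L > len[j]) {
        len[j] <- len[i] + 1L
        prev[j] <- i
      }
    }
  }
  j <- which.max(len)                      # walk back from the best endpoint
  chain <- integer(0)
  while (!is.na(j)) {
    chain <- c(j, chain)
    j <- prev[j]
  }
  colnames(S)[chain]                       # variables in monotone order
}

longest_monotone(M)
```

For the eight patterns above this returns a chain of seven variables ending in V4. A chain through V5 instead of V6 is equally long, so the answer is not unique.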

In any case, since these MI packages (all based on Schafer's
original S code) work internally with monotonicity in mind,
for reasons of efficiency and fast convergence, you may find
that your imputation needs are met by them.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 07-Jan-05                                       Time: 14:13:20
------------------------------ XFMail ------------------------------
