On Sun, Aug 12, 2012 at 10:58 PM, Louise Cowpertwait
<louisecowpertwait at gmail.com> wrote:
> Hi there,
>
> I have subscribed to R-help but am not sure how to view or post questions?
> I think this is the right way.
Indeed!
>
> I am planning on doing a multivariate regression investigating the
> relationship between depression (a continuous variable) and social support
> variables (mostly continuous, some categorical) among older people. I have a
> number of demographic and health-related variables that I am including as
> control variables. I have a large dataset from nearly 4,000 individuals.
>
> I need to check whether my data is 1) Missing at Random (MAR) and 2)
> Missing Completely At Random (MCAR).
>
> Here are three questions that I have related to this:
>
>
> 1) To check whether the data is MAR, I dichotomised a variable into missing
> and not missing, and checked for any significant differences in means (for
> continuous) or proportions (for categorical) of the other variables. I did this
> for each of the variables in my analysis. Is this correct?
Something like a classic chi-squared test or its small-sample analogue?
That seems reasonable, though with this many tests you'll have to
correct for multiple comparisons. See ?p.adjust.methods for some
references.
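A rough sketch of that workflow in base R, in case it helps. The data are simulated and every variable name is made up for illustration; the choice of Holm's method is also just an assumption, and any method listed in ?p.adjust.methods could be swapped in.

```r
# Simulated stand-in data; all variable names here are hypothetical.
set.seed(1)
n <- 500
dat <- data.frame(
  depression = rnorm(n),
  support    = rnorm(n),
  age        = rnorm(n, mean = 70, sd = 5),
  sex        = factor(sample(c("F", "M"), n, replace = TRUE))
)
dat$support[sample(n, 60)] <- NA   # knock out some values at random

# Dichotomise one variable into missing / not missing.
miss <- is.na(dat$support)

# Compare every other variable across the two groups:
# t-test for numeric columns, chi-squared test for factors.
others <- setdiff(names(dat), "support")
pvals <- sapply(others, function(v) {
  x <- dat[[v]]
  if (is.numeric(x)) t.test(x ~ miss)$p.value
  else chisq.test(table(x, miss))$p.value
})

# Adjust for the number of tests (Holm here; see ?p.adjust.methods).
padj <- p.adjust(pvals, method = "holm")
print(round(cbind(raw = pvals, holm = padj), 3))
```

To cover every variable in the analysis, the same steps would be repeated with each variable in turn playing the role of "support", and the adjustment applied to the full set of p-values.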
>
> 2) Because of the size of my dataset, relationships for my MAR analysis are
> coming up as significant when, practically, the differences in means or
> proportions are not meaningful. Is it acceptable for me to argue as such, and
> say that the data is effectively MAR despite statistical significance?
I'm not sure there is a "statistical" answer to that: it's going to
depend much more on the nature of your data set. Let your "meta"
knowledge of the source of missingness guide things here.
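If you do want to report how large the differences are alongside the p-values, one option (a sketch only, not a rule) is a standardised mean difference. The helper smd() below is hypothetical, not from any package, and the data are simulated.

```r
# Hypothetical helper: standardised mean difference of x between rows
# where some other variable is missing (miss = TRUE) and observed.
smd <- function(x, miss) {
  m1 <- mean(x[miss], na.rm = TRUE)
  m0 <- mean(x[!miss], na.rm = TRUE)
  # Pool the two group variances by simple averaging.
  s  <- sqrt((var(x[miss], na.rm = TRUE) + var(x[!miss], na.rm = TRUE)) / 2)
  (m1 - m0) / s
}

set.seed(2)
n <- 4000
age <- rnorm(n, mean = 70, sd = 5)
# Missingness depends weakly on age (simulated, for illustration only).
miss <- rbinom(n, 1, plogis(-2 + 0.03 * (age - 70))) == 1

t.test(age ~ miss)$p.value   # can be small at this sample size
smd(age, miss)               # size of the difference in SD units
```

Whether a given standardised difference is small enough to ignore is still a judgement about your data, not something the number decides for you.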
>
> Sorry this is not a question specifically to R (more of a stats question)
> so no problem if no-one can help, though it would be greatly appreciated.
>
> 3) I have no idea how to check whether the data is Missing Completely At
> Random in R. I think this involves seeing whether those who had missing data for
> one variable were more likely to have missing data in other variables? If so, I
> don't know how to do this. Or, I need to do an overall test like
> Little's test of missing completely at random. I have spent ages looking
> online and at packages and can't find anything.
You might want to check the rms package and the accompanying book
(Regression Modeling Strategies) by Frank Harrell. It has the best
coverage of MAR/MCAR/Imputation/etc that I've read on a "practical"
basis.
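As for seeing whether missingness in one variable goes along with missingness in others, here is a minimal base R sketch. It is descriptive only and is not Little's test; the data frame and its columns a, b, c are simulated placeholders.

```r
set.seed(3)
n <- 1000
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$a[sample(n, 100)] <- NA
dat$b[sample(n, 100)] <- NA
dat$c[is.na(dat$a) & runif(n) < 0.5] <- NA  # c tends to go missing with a

# One 0/1 indicator per variable: 1 = missing.
M <- 1 * is.na(dat)

# Fraction missing per variable.
colMeans(M)

# Do variables tend to be missing together? Correlate the indicators.
round(cor(M), 2)

# Pairwise cross-tabulation with a test of association.
tab <- table(a_missing = M[, "a"], c_missing = M[, "c"])
tab
chisq.test(tab)$p.value

# Frequency of each full missingness pattern ("010" = only b missing).
pattern <- apply(M, 1, paste, collapse = "")
sort(table(pattern), decreasing = TRUE)
```

In this simulated example the indicators for a and c are positively correlated by construction, while b is unrelated to either.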
Cheers,
Michael
>
> Please help! I don't want to use SPSS!
>
> Cheers,
>
> Louise
>
> Louise Cowpertwait
> louisecowpertwait at gmail.com
> 021 258 9795
> Auckland, NZ