Hi Bert,
Thanks for your reply.
I AM making an assumption of MAR data, because:
- informative missingness (I assume you mean NMAR) is too hard to deal with;
- I have quite a few covariates, so the observed data are likely to predict
  the missing values and mitigate informative missingness;
- the missingness is not supposed to be censoring;
- I doubt the missingness on the covariates (mostly environmental-type
  measures) is censoring with respect to the independent variables, which
  are genotypes.
I don't like complete-case logistic regression because:
- it is less robust;
- it throws away information.
However, I don't have time to do anything clever, so I'm just going to go
along with the complete-case logistic regression (a quick sketch of what I
mean is below).
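For the record, a minimal sketch of the complete-case analysis I mean, with
placeholder names (data frame dat, binary outcome y, covariates x1, x2)
standing in for my real variables:

# how many study cases are complete with respect to all columns of dat?
table(complete.cases(dat))

# glm() drops incomplete study cases by default (na.action = na.omit),
# so this is the complete-case logistic regression
fit <- glm(y ~ x1 + x2, family = binomial(logit), data = dat)
summary(fit)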
Thanks again.
regards
Desmond
Bert Gunter wrote:
> Desmond:
>
> The problem with ML with missing data is both the M and the L. Under MAR, the
> L factors into a part involving the missingness parameters and a part
> involving the model parameters, and you can maximize the model-parameter part
> without worrying about missingness, because the model parameters depend only
> on the observed data. (MCAR is even easier, since missingness doesn't change
> the likelihood.)
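To spell Bert's point out for myself (a sketch, assuming MAR plus distinct
model parameters theta and missingness parameters phi, with R the vector of
missingness indicators):

\[
L(\theta, \phi \mid y_{\mathrm{obs}}, R)
  = \int f(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid \theta)\,
         f(R \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, \phi)\, dy_{\mathrm{mis}}
  = f(R \mid y_{\mathrm{obs}}, \phi)\, f(y_{\mathrm{obs}} \mid \theta)
  \quad \text{under MAR,}
\]

so the theta part, f(y_obs | theta), can be maximized on its own, ignoring
the missingness model.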
>
> For informative missingness you have to come up with an L to maximize, and
> this is hard. There's also no way of checking the adequacy of the L (since
> the data to check it are missing). And when you choose your L, the M may be
> hard to do numerically.
>
> As Emmanuel indicated, Bayes may help, but now I'm at the end of MY
> knowledge.
>
> Note that in many cases, "missing" is actually not missing -- it's
> censoring. And for that, likelihoods can be obtained (and maximized).
>
> Cheers,
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Desmond D Campbell
> Sent: Monday, April 05, 2010 3:19 PM
> To: Emmanuel Charpentier
> Cc: r-help at r-project.org; Desmond Campbell
> Subject: Re: [R] logistic regression in an incomplete dataset
>
> Dear Emmanuel,
>
> Thank you.
>
> Yes I broadly agree with what you say.
> I think ML is a better strategy than complete case, because I think its
> estimates will be more robust than those from complete case.
> For unbiased estimates, I think:
> - ML requires that the data are MAR;
> - complete case requires that the data are MCAR.
>
> Anyway I would have thought ML could be done without resorting to Multiple
> Imputation, but I'm at the edge of my knowledge here.
>
> Thanks once again,
>
> regards
> Desmond
>
>
> From: Emmanuel Charpentier <charpent <at> bacbuc.dyndns.org>
> Subject: Re: logistic regression in an incomplete dataset
> Newsgroups: gmane.comp.lang.r.general
> Date: 2010-04-05 19:58:20 GMT (2 hours and 10 minutes ago)
>
> Dear Desmond,
>
> a somewhat analogous question has been posed recently (about 2 weeks
> ago) on the sig-mixed-model list, and I tried (in two posts) to give
> some elements of information (and some bibliographic pointers). To
> summarize tersely:
>
> - a model of missingness (i.e. *why* are some data missing?) is
> necessary to choose the right measures to take. Two special cases
> (Missing At Random and Missing Completely At Random) allow for
> (semi-)automated compensation. See the literature for further details.
>
> - complete-case analysis may give seriously weakened and *biased*
> results. Pairwise-complete-case analysis is usually *worse*.
>
> - simple imputation leads to underestimated variances and might also
> give biased results.
>
> - multiple imputation is currently thought of as a good way to deal with
> missing data if you have a missingness model (or can honestly bet on
> MCAR or MAR), and if you properly combine the results of your
> imputations.
>
> - A few missing-data packages exist in R to handle this case. My personal
> selection at this point would be mice, mi, Amelia, and possibly mitools,
> but none of them is fully satisfying (in particular, accounting for a
> random effect needs special handling all the way through in all packages...).
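For my own notes, a minimal sketch of the multiple-imputation route with
mice (the mice/with/pool calls are the package's own; dat, y, x1, x2 are
placeholders for my data):

library(mice)

# create m = 5 completed data sets (default imputation methods per column type)
imp <- mice(dat, m = 5, seed = 1)

# fit the logistic regression in each completed data set
fits <- with(imp, glm(y ~ x1 + x2, family = binomial(logit)))

# combine estimates and standard errors across imputations (Rubin's rules)
summary(pool(fits))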
>
> - An interesting alternative is to write a full probability model (in
> BUGS, for example) and use Bayesian estimation; in this framework,
> missing data are "naturally" modeled in the model used for analysis.
> However, this might entail a *large* amount of work, be difficult, and not
> always succeed (numerical difficulties). Furthermore, the results of a
> Bayesian analysis might not be what you seek...
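And a rough sketch of that full-probability-model route via rjags/JAGS (one
way of doing what Emmanuel describes, not necessarily what he has in mind).
Missing values of a covariate left as NA in the data are sampled by the MCMC
as long as the model gives that covariate a distribution. The names dat, y,
x1, x2 and the vague priors are placeholders, and x2 is assumed fully observed:

library(rjags)

model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- b0 + b1 * x1[i] + b2 * x2[i]
    x1[i] ~ dnorm(mu.x1, tau.x1)   # gives missing x1 values a distribution
  }
  b0 ~ dnorm(0, 0.001)
  b1 ~ dnorm(0, 0.001)
  b2 ~ dnorm(0, 0.001)
  mu.x1 ~ dnorm(0, 0.001)
  tau.x1 ~ dgamma(0.001, 0.001)
}
"

jm <- jags.model(textConnection(model_string),
                 data = list(y = dat$y, x1 = dat$x1, x2 = dat$x2, N = nrow(dat)),
                 n.chains = 3)
post <- coda.samples(jm, variable.names = c("b0", "b1", "b2"), n.iter = 10000)
summary(post)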
>
> HTH,
>
> Emmanuel Charpentier
>
> On Monday, 05 April 2010 at 11:34 +0100, Desmond Campbell wrote:
>
>> Dear all,
>>
>> I want to do a logistic regression.
>> So far I've only found out how to do that in R, in a dataset of complete
>> cases.
>>
>> I'd like to do logistic regression via max likelihood, using all the
>> study cases (complete and incomplete). Can you help?
>
>> I'm using glm() with family=binomial(logit).
>> If any covariate in a study case is missing then the study case is
>> dropped, i.e. it is doing a complete-cases analysis.
>>
>> As a lot of study cases are being dropped, I'd rather it did maximum
>> likelihood using all the study cases.
>>
>> I tried setting glm()'s na.action to NULL, but then it complained about
>> NAs present in the study cases.
>>
>> I've about 1000 unmatched study cases and fewer than 10 covariates, so I
>> could use unconditional ML estimation (as opposed to conditional ML
>> estimation).
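(A quick sketch of what I see glm() doing with the different na.action
settings, again with placeholder names dat, y, x1, x2:)

# default: na.action = na.omit, so incomplete study cases are silently dropped
fit <- glm(y ~ x1 + x2, family = binomial(logit), data = dat)
nobs(fit)    # cases actually used
nrow(dat)    # cases supplied

# na.action = NULL keeps the NAs, so the fit stops with a complaint about NAs
# rather than doing maximum likelihood over the incomplete cases:
# glm(y ~ x1 + x2, family = binomial(logit), data = dat, na.action = NULL)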
>
>> regards
>> Desmond
>>
>>
>> --
>> Desmond Campbell
>> UCL Genetics Institute
>> D.Campbell at ucl.ac.uk
>> Tel. ext. 020 31084006, int. 54006
>>
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
Desmond Campbell
UCL Genetics Institute
D.Campbell at ucl.ac.uk
Tel. ext. 020 31084006, int. 54006