thr3ads.net - R help - [R] mice: selecting small subset of variables to impute from dataset with many variables (> 2500) [Jul 2022]

Bert Gunter

2022-Jul-14 18:09 UTC

[R] mice: selecting small subset of variables to impute from dataset with many variables (> 2500)

If I understand your query correctly, you can use negative indexing to
omit variables. See ?'[' for details.

> dat <- data.frame(a = 1:3, b = letters[1:3], c = 4:6, d = letters[5:7])
> dat
  a b c d
1 1 a 4 e
2 2 b 5 f
3 3 c 6 g
> dat[, -c(2, 4)]
  a c
1 1 4
2 2 5
3 3 6

Of course you have to know the numerical index of the columns you wish
to omit, but something of the sort seems unavoidable in any case.
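
One way around looking up numeric positions, sketched here on the assumption that the names of the columns to drop are known: index by name using setdiff() or %in% on names(dat). The vector name `drop_vars` is only illustrative.

```r
# Same toy data frame as above.
dat <- data.frame(a = 1:3, b = letters[1:3], c = 4:6, d = letters[5:7])

# Hypothetical list of columns to omit, given by name rather than position.
drop_vars <- c("b", "d")

# A minus sign cannot be applied to a character vector, so either
# keep the complement of the names ...
kept1 <- dat[, setdiff(names(dat), drop_vars), drop = FALSE]

# ... or build a logical mask with %in%.
kept2 <- dat[, !(names(dat) %in% drop_vars), drop = FALSE]

# If positions are still wanted, match() recovers them from the names.
idx <- match(drop_vars, names(dat))
kept3 <- dat[, -idx, drop = FALSE]
```

All three results hold only columns a and c; drop = FALSE keeps the result a data frame even if a single column remains.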

Cheers,
Bert

On Thu, Jul 14, 2022 at 11:00 AM Ian McPhail <ivmcphail at gmail.com> wrote:
>
> Hello,
>
> I am looking for some advice on how to select subsets of variables for
> imputing when using the mice package.
>
> From Van Buuren's original mice paper, I see that selecting variables to be
> 'skipped' in an imputation can be written as:
>
> ini <- mice(nhanes2, maxit = 0, print = FALSE)
> pred <- ini$pred
> pred[, "bmi"] <- 0
> meth <- ini$meth
> meth["bmi"] <- ""
>
> With the last two lines specifying that the "bmi" variable gets skipped over
> and not imputed.
>
> And I have come across other examples, but all that I have seen lay out a
> method of skipping variables where EVERY variable is named (as "bmi" is
> named above). I am wondering if there is a reasonably easy way to select
> out approximately 30 variables for imputation from a larger dataset with
> around 2500 variables, without having to name all 2450+ other variables.
>
> Thank you,
>
> Ian
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
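
For the mice-specific part of the question, a minimal sketch, assuming the goal is to impute only a short list of named variables and leave every other variable untouched. It uses the small nhanes2 data shipped with mice; `targets`, `has_na` and `unimputed` are illustrative names. With around 2500 columns the predictor matrix would very likely need further pruning as well (mice provides quickpred() for that), which this sketch does not attempt.

```r
library(mice)

# Variables to impute, listed by name; everything else is skipped.
targets <- c("bmi", "chl")

# Dry run (maxit = 0) to obtain the default method vector and predictor matrix.
ini  <- mice(nhanes2, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# Blank the method for every variable NOT in targets, so only the
# targets ever have to be named.
meth[setdiff(names(meth), targets)] <- ""

# Skipped variables that still contain NAs should not serve as predictors,
# mirroring the pred[, "bmi"] <- 0 step quoted above.
has_na    <- names(which(colSums(is.na(nhanes2)) > 0))
unimputed <- setdiff(has_na, targets)
pred[, unimputed] <- 0

imp <- mice(nhanes2, method = meth, predictorMatrix = pred,
            printFlag = FALSE, seed = 1)
```

The key line is the setdiff() call: it inverts the naming burden, so the 30 or so variables of interest are listed once and the remaining ones are derived from names(meth).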

Ebert,Timothy Aaron

2022-Jul-14 18:11 UTC

[R] mice: selecting small subset of variables to impute from dataset with many variables (> 2500)

Maybe this is too simple but could you use the select() function from dplyr?
Tim
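
A rough sketch of what that could look like, assuming dplyr is installed and the variables of interest are held in a character vector (here the hypothetical `keep_vars`). Note that subsetting the data before calling mice also removes the dropped columns as potential predictors, which may or may not be wanted.

```r
library(dplyr)

dat <- data.frame(a = 1:3, b = letters[1:3], c = 4:6, d = letters[5:7])

# Hypothetical vector naming the columns of interest.
keep_vars <- c("a", "c")

# all_of() selects exactly the named columns and errors if one is missing.
kept <- select(dat, all_of(keep_vars))

# A minus sign inverts the selection, giving the remaining columns.
rest <- select(dat, -all_of(keep_vars))

# Helpers such as starts_with() avoid spelling out names that share a pattern.
by_prefix <- select(dat, starts_with("a"))
```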

