thr3ads.net - R help - [R] Linear Model and Missing Data in Predictors [Mar 2016]

Lorenzo Isella

2016-Mar-15 15:14 UTC

[R] Linear Model and Missing Data in Predictors

Dear All,
Here is a situation that surely comes up very often. Suppose you have the
following data and model:

set.seed(1235)
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))

y <- exp(seq(1, 5, length.out = 30))


mm <- lm(y ~ x1 + x2 + x3)

That is, you fit a linear regression with several regressors, some of
which have missing values.
This happens to me when I work with time series that I use as
regressors and whose missing values are padded with NAs.
By default, lm discards every incomplete observation and therefore
drops quite a lot of data.
Is there any way around this? I mean, is there a way to come up with a
piecewise linear regression that uses all 3 regressors whenever
possible, but switches to 1 or 2 of them where data are missing?
I ask because it is not feasible to work out the values of the missing
data in my regressors, but at the same time I cannot restrict my model
to the intersection of the non-NA values in the 3 regressors. If this
makes sense, do I have to code it myself, or is there a package that
already implements this?
Any suggestion is appreciated.
Cheers

Lorenzo
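
For reference, a minimal sketch of what the post describes, in base R only. It first counts how many rows the default lm() call actually uses, then fits one submodel per missingness pattern, so that each row is predicted from whichever regressors it has. The names dat, pattern, fits and the fitted column are hypothetical and not from the thread. This is an illustration under stated assumptions, not a recommended estimator: the submodels have different coefficients, and whether the result is sensible depends on why the values are missing.

```r
# Rebuild the example data from the post.
set.seed(1235)
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))
y  <- exp(seq(1, 5, length.out = 30))
dat <- data.frame(y, x1, x2, x3)

# Default behaviour: lm() keeps only rows that are complete in every variable.
mm <- lm(y ~ x1 + x2 + x3, data = dat)
n_used    <- nobs(mm)                # rows used in the fit
n_dropped <- length(na.action(mm))   # rows removed because of NAs

# Label each row by the set of predictors that are observed in it.
predictors <- c("x1", "x2", "x3")
observed <- !is.na(dat[predictors])
pattern <- apply(observed, 1,
                 function(r) paste(predictors[r], collapse = " + "))

# Fit one submodel per pattern. Each submodel uses every row in which
# its own predictors are available, not just the rows with that pattern.
fits <- lapply(unique(pattern), function(p) {
  lm(as.formula(paste("y ~", p)), data = dat)
})
names(fits) <- unique(pattern)

# Predict each row from the submodel that matches its pattern.
dat$fitted <- NA_real_
for (p in names(fits)) {
  rows <- pattern == p
  dat$fitted[rows] <- predict(fits[[p]], newdata = dat[rows, ])
}

sapply(fits, nobs)   # number of rows behind each submodel
```

In this example x1 is never missing, so every row has at least one predictor. With real data a row could have none, which this sketch does not handle.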

Jeff Newmiller

2016-Mar-15 16:36 UTC

IMHO this is not a question about R; it is a question about statistics, whether
or not R is involved. As such, a forum like stats.stackexchange.com would be
better suited to address it.

FWIW I happen to think that expecting R to solve this for you is unreasonable.
-- 
Sent from my phone. Please excuse my brevity.


William Dunlap

2016-Mar-15 16:47 UTC

One technique for dealing with this is called 'multiple imputation'.
Google for 'multiple imputation in R' to find R packages that implement
it (e.g., the 'mi' package).

Bill Dunlap
TIBCO Software
wdunlap tibco.com
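
As a rough illustration of the multiple-imputation workflow mentioned above, here is a minimal sketch using the 'mice' package rather than 'mi'. That swap is an assumption made for the example, as are the names dat, imp, fit and pooled, the choice of m = 5 imputations, and the seed. It assumes 'mice' is installed and uses its default imputation method. With this few observations the pooled estimates should not be taken at face value.

```r
library(mice)  # assumption: the 'mice' package is installed

# Rebuild the example data from the original post.
set.seed(1235)
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))
y  <- exp(seq(1, 5, length.out = 30))
dat <- data.frame(y, x1, x2, x3)

# Step 1: create m = 5 completed copies of the data set.
imp <- mice(dat, m = 5, seed = 2016, printFlag = FALSE)

# Step 2: fit the same linear model to each completed copy.
fit <- with(imp, lm(y ~ x1 + x2 + x3))

# Step 3: pool the five sets of estimates into one.
pooled <- pool(fit)
summary(pooled)
```

Unlike the default lm() call, every one of the 30 rows contributes to each of the five fits, and the pooled standard errors reflect the extra uncertainty due to the imputed values.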

