thr3ads.net - R help - [R] can predict ignore rows with insufficient info [Sep 2003]

Peter Whiting

2003-Sep-16 16:44 UTC

[R] can predict ignore rows with insufficient info

I need predict to ignore rows that contain levels not in the
model.

Consider a data frame, "const", that has columns for the number of
days required to construct a site and the city and state the site
was constructed in.

g<-lm(days~city,data=const)

Some of the sites in const have not yet been completed, and therefore
they have days==NA. I want to predict how many days these sites
will take to complete (I've simplified the above discussion to
remove many of the other factors involved.)

nconst<-subset(const,is.na(const$days))
x<-predict(g,nconst)
Error in model.frame.default(object, data, xlev = xlev) :
        factor city has new level(s) ALBANY

This is because we haven't yet completed a site in Albany.
If I just had one to worry about I could easily fix it (choose
a nearby market with similar characteristics) but I am dealing
with several hundred cities. Instead, for the cities not
modeled by g I'd simply like to use the state, even though I
don't expect it to be as good:

g<-lm(days~state,data=const)
x<-predict(g,nconst)

I'm not sure how to identify the cities in nconst that are not
modeled by g (my actual model has many more predictors in the
formula). Is there a way to instruct predict to only predict the
rows for which it has enough information and not complain about
the others? Something like:

g<-lm(days~city,data=const)
x<-predict(g,nconst) ## the rows of x with city=ALBANY will be NA
g<-lm(days~state,data=const)
y<-predict(g,nconst)
x[is.na(x)]<-y[is.na(x)]

thanks,
pete

Peter Whiting

2003-Sep-16 19:15 UTC

[R] can predict ignore rows with insufficient info

On Tue, Sep 16, 2003 at 11:44:02AM -0500, Peter Whiting wrote:
>
> I'm not sure how to identify the cities in nconst that are not
> modeled by g (my actual model has many more predictors in the
> formula)
I guess I could use some form of
subset(const,const$city%in%g$xlevels$city) 
over and over again for each factor...

as usual, there has to be a better way.

pete
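
The g$xlevels check above can be looped over every factor the model
stores levels for, so it does not have to be repeated by hand. What
follows is only a minimal sketch under stated assumptions: the data are
made up to stand in for const, and covered() is a hypothetical helper,
not a function that ships with R. It predicts from the city model where
the city is known to the model and falls back to the state model
elsewhere.

```r
# Toy data standing in for "const" (values are invented for illustration).
const <- data.frame(
  days  = c(30, 35, 50, 55, 40, NA, NA, NA),
  city  = factor(c("BUFFALO", "BUFFALO", "AUSTIN", "AUSTIN", "DALLAS",
                   "ALBANY", "AUSTIN", "BUFFALO")),
  state = factor(c("NY", "NY", "TX", "TX", "TX", "NY", "TX", "NY"))
)
nconst <- subset(const, is.na(days))

# Hypothetical helper: TRUE for rows of newdata whose factor values all
# appear among the levels the fitted model saw (stored in fit$xlevels).
covered <- function(fit, newdata) {
  ok <- rep(TRUE, nrow(newdata))
  for (v in names(fit$xlevels))
    ok <- ok & as.character(newdata[[v]]) %in% fit$xlevels[[v]]
  ok
}

g.city  <- lm(days ~ city,  data = const)
g.state <- lm(days ~ state, data = const)

# Predict from the city model only where every level is covered.
x  <- rep(NA_real_, nrow(nconst))
ok <- covered(g.city, nconst)
x[ok] <- predict(g.city, nconst[ok, , drop = FALSE])

# Fall back to the state model where the city model could not predict.
fill <- is.na(x) & covered(g.state, nconst)
x[fill] <- predict(g.state, nconst[fill, , drop = FALSE])
```

Rows that neither model covers stay NA in x, so they are easy to spot
afterwards.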




Thomas W Blackwell

2003-Sep-16 20:17 UTC

[R] can predict ignore rows with insufficient info

Peter  -

Your subsequent email seems just right.  You have to determine
ahead of time which rows can be estimated.  Here's a strategy,
and possibly some code to implement it.

Let  supported(i,y,d)  be a user-written function which returns
a logical vector indicating rows which should be omitted from
the prediction on account of a non-covered covariate in column i
of data frame d with outcome variable y.  Apply this function to
all columns in your data frame using  lapply().  Then do the "or"
of all the logical vectors by calculating the row sums of the
numeric (0 or 1) equivalents.  Last, convert back to logical,
and subscript your data frame with this in the call to  predict().

Here's some rough code:

supported <- function(i,y,d)  {
   result <- rep(FALSE, dim(d)[1])  # default return value when
   if (is.factor(d[[i]]))           #  d[[i]] is not a factor.
     # TRUE marks a row to omit: its level never occurs with observed y
     result <- !(d[[i]] %in% unique(d[[i]][ !is.na(d[[y]]) ]))
   result  }

tmp.1 <- lapply(seq(along=const), supported, "days", const)
tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]),
                nrow=dim(const)[1])
# row sums > 0 mean at least one column flagged the row for omission
tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))

x <- predict(g, const[ is.na(const$days) & !tmp.3, ])

This code uses a few arcane maneuvers.  Look at the help pages for
the relevant functions to work out what it is doing, particularly
those for  lapply(), seq(), rep(), unlist(), unique(), "%*%", "%in%".
(The last two must be quoted in order to see the help.)

However, the code might work for you right out of the box !
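
For comparison, the "or" across columns can also be taken directly with
Reduce() rather than through row sums. This is only a sketch of the
same strategy under stated assumptions: the data are invented, the
model uses city alone, and unsupported() is a hypothetical name chosen
here because the function returns TRUE for rows to omit.

```r
# Toy data standing in for "const" (values are invented for illustration).
const <- data.frame(
  days  = c(30, 35, 50, 55, 40, NA, NA, NA),
  city  = factor(c("BUFFALO", "BUFFALO", "AUSTIN", "AUSTIN", "DALLAS",
                   "ALBANY", "AUSTIN", "BUFFALO")),
  state = factor(c("NY", "NY", "TX", "TX", "TX", "NY", "TX", "NY"))
)

# TRUE where column i of d holds a factor level that never occurs in a
# row with an observed outcome y; always FALSE for non-factor columns.
unsupported <- function(i, y, d) {
  if (!is.factor(d[[i]])) return(rep(FALSE, nrow(d)))
  seen <- unique(d[[i]][!is.na(d[[y]])])
  !(d[[i]] %in% seen)
}

# Check every column except the outcome, then "or" the flags together.
preds <- setdiff(names(const), "days")
omit  <- Reduce(`|`, lapply(preds, unsupported, y = "days", d = const))

g <- lm(days ~ city, data = const)
x <- predict(g, const[is.na(const$days) & !omit, ])
```

Here omit flags only the ALBANY row, so predict() is called on the
remaining unfinished sites and no longer stops on the new level.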

-  tom blackwell  -  u michigan medical school  -  ann arbor  -
