Paul Johnson
2011-Apr-26 15:13 UTC
[Rd] Wish R Core had a standard format (or generic function) for "newdata" objects
Is anybody working on a way to standardize the creation of "newdata"
objects for predict methods?

When using predict, I find it difficult/tedious to create newdata data
frames when there are many variables. It is necessary to set all
variables at the mean/mode/median, and then for some variables of
interest, one has to insert values for which predictions are desired.

I was at a presentation by Scott Long last week and he was discussing
the increasing emphasis in Stata on calculations of marginal
predictions and "SPost" and several other packages, and, coincidentally,
I had a student visit who is learning to use polr from R's MASS
(W. Venables and B. Ripley), and we wrestled for quite a while trying
to make the same calculations that Stata makes automatically. It spits
out predicted probabilities for each independent variable, keeping the
other variables at a reference level.

I've found R packages that aim to do essentially the same thing.

In Frank Harrell's Design/rms framework, he uses a "datadist" function
that generates an object that the user has to put into the R options.
I think many users trip over the use of "options" there. If I don't
use that for a month or two, I completely forget the fine points and
have to fight with it. But it does "work" to give plots and predict
functions the information they require.

In Zelig (by Kosuke Imai, Gary King, and Olivia Lau), a function "setx"
does the work of creating "newdata" objects. That appears to be about
right as a candidate for a generic "newdata" function. Perhaps it
could directly generalize to all R regression functions, but right now
it is tailored to the models in Zelig. It has separate methods for the
different types of models, and that is a bit confusing to me, since
the "newdata" in one model should be the same as the newdata in
another, I'm guessing. But the code is all there; I'll keep looking.

In effects (by John Fox), there are internal functions to create
newdata and plot the marginal effects. If you load effects and run,
for example, "effects:::effect.lm", you see Prof. Fox has his own way
of grabbing information from model columns and calculating predictions.

I think it is time the R Core Team looked at this and told "us" what
the right way to do this is. I think the interface to setx in Zelig is
pretty easy to understand, at least for numeric variables.

In R's termplot function, such a thing could be put to use. As far as
I can tell now, termplot is doing most of the work of creating a
newdata object, but not exactly.

It seems like it would be a shame to proliferate more functions that
do the same thing, when it is such a common task.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
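P.S. To make the request concrete, here is a rough sketch of the kind
of helper I am imagining. The name make_newdata is just a placeholder,
not an existing function, and the sketch assumes the model frame holds
the raw variables (no transformed terms like log(x)):

# Sketch only: numeric variables are held at their means, factors at
# their most frequent level, and any variable named in ... is replaced
# by user-supplied values.
make_newdata <- function(model, ...) {
  at <- list(...)                      # e.g. a = seq(-1, 1, by = 0.5)
  mf <- model.frame(model)[, -1, drop = FALSE]  # drop the response
  baseline <- lapply(mf, function(v) {
    if (is.numeric(v)) mean(v)         # numeric: hold at the mean
    else names(sort(table(v), decreasing = TRUE))[1]  # modal level
  })
  baseline[names(at)] <- at
  newdata <- expand.grid(baseline)
  # restore factor levels so predict() builds the same contrasts
  for (nm in names(mf))
    if (is.factor(mf[[nm]]))
      newdata[[nm]] <- factor(newdata[[nm]], levels = levels(mf[[nm]]))
  newdata
}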
Duncan Murdoch
2011-Apr-27 00:39 UTC
[Rd] Wish R Core had a standard format (or generic function) for "newdata" objects
On 26/04/2011 11:13 AM, Paul Johnson wrote:
> Is anybody working on a way to standardize the creation of "newdata"
> objects for predict methods?

They're generally just dataframes. Use the data.frame() function.

> When using predict, I find it difficult/tedious to create newdata data
> frames when there are many variables. It is necessary to set all
> variables at the mean/mode/median, and then for some variables of
> interest, one has to insert values for which predictions are desired.

In most models, all variables are necessary in order to produce
predictions. If you want to do predictions for one variable, holding
the others at particular fixed values, just create a dataframe. For
example, suppose the original data is

X <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
y <- with(X, a + 2*b + 3*c + rnorm(100))

# You use lm() to get a fit:

fit <- lm(y ~ ., data=X)

# Compute the means of all the covariates:

means <- lapply(X, mean)

# Replace a by a range of values from -1 to 1:

means$a <- seq(-1, 1, len=11)

# Convert to a data.frame:

newdata <- as.data.frame(means)

# Do the predictions:

predict(fit, newdata=newdata)

That was three lines of code to produce the newdata dataframe. It's
not that hard. I would think it's easier to write those lines than to
specify how to do this in general.

> [... discussion of SPost, rms/datadist, Zelig's setx, and effects
> snipped ...]
>
> I think it is time the R Core Team looked at this and told "us" what
> the right way to do this is. I think the interface to setx in Zelig
> is pretty easy to understand, at least for numeric variables.

If you don't like the way this was done in my three lines above, or by
Frank Harrell, or the Zelig group, or John Fox, why don't you do it
yourself, and get it right this time?

It's pretty rude to complain about things that others have given you
for free, and demand they do it better.

Duncan Murdoch
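P.S. The lapply(X, mean) step assumes every covariate is numeric. If
some are factors, mean() no longer applies; the same approach works if
you hold each factor at a level you choose. A sketch, with made-up
data:

X2 <- data.frame(a = rnorm(100),
                 g = factor(sample(c("u", "v"), 100, replace = TRUE)))
y2 <- with(X2, a + (g == "v") + rnorm(100))
fit2 <- lm(y2 ~ ., data = X2)

# Hold g at level "u", vary a from -1 to 1:
vals <- list(a = seq(-1, 1, len = 11),
             g = factor("u", levels = levels(X2$g)))
newdata2 <- expand.grid(vals)
predict(fit2, newdata = newdata2)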
Christophe Dutang
2011-Apr-27 16:53 UTC
[Rd] Wish R Core had a standard format (or generic function) for "newdata" objects
Among many solutions, I generally use the following code, which avoids
the "ideal average individual" by taking the mean of the predicted
values across the data instead:

averagingpredict <- function(model, varname, varseq, type, subset=NULL)
{
    # assumes the fitted object keeps the original data in $data,
    # as glm() does
    if (is.null(subset))
        mydata <- model$data
    else
        mydata <- model$data[subset, ]

    f <- function(x)
    {
        # set the focal variable to x for every observation, then
        # average the predictions over the whole data set
        mydata[, varname] <- x
        mean(predict(model, newdata=mydata, type=type), na.rm=TRUE)
    }
    sapply(varseq, f)
}

It is time consuming, but it deals with non-numeric variables.

Christophe

2011/4/26 Paul Johnson <pauljohn32@gmail.com>:
> Is anybody working on a way to standardize the creation of "newdata"
> objects for predict methods?
> [...]

--
Christophe DUTANG
Ph.D. student at ISFA, Lyon, France
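P.S. A usage sketch (made-up data; glm() stores its data argument in
the fitted object, so model$data is available):

set.seed(1)
d <- data.frame(x1 = rnorm(200),
                x2 = factor(sample(c("a", "b"), 200, replace = TRUE)))
d$y <- rbinom(200, 1, plogis(d$x1 + (d$x2 == "b")))
fit <- glm(y ~ x1 + x2, family = binomial, data = d)

# average predicted probability as x1 moves over a grid
averagingpredict(fit, "x1", seq(-2, 2, by = 0.5), type = "response")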