thr3ads.net - R help - [R] Out-of-sample predictions with boosting model [Jul 2010]

Travis Berge

2010-Jul-28 18:53 UTC

[R] Out-of-sample predictions with boosting model

Hi UseRs -

I am new to R, and could use some help making out-of-sample predictions
using a boosting model (the mboost function). The issue is complicated by the
fact that I have panel data (time by country), and am estimating the model
separately for each country. The data are monthly, covering 1986m1 to
2009m12 for 9 countries.

To give you a flavor of what I am doing, here is a simple example to show
how I make in-sample predictions:

library(mboost)

# data has the following columns: country year month y x1 x2 x3
dat = read.csv("data.csv")

# Function that estimates the model on df and returns in-sample predictions
bbox = function(df)
{
  blackbox = mboost(y ~ x1 + x2 + x3, data = df)
  predict(blackbox)
}

# Use lapply to estimate by country
bycountry = lapply(split(dat, dat$country), bbox)


In the end I have an object bycountry that contains the in-sample
predictions of the model, estimated for each country separately. What I
would like to do instead is estimate the model for each country on an
initial window of data. I.e., estimate Australia with 1986m1-2003m12,
predict 2004m1, then roll the window forward: estimate AUS with
1986m2-2004m1, predict 2004m2, and so on for all data points. Then do the
same for Canada, Denmark, etc.

So I guess my problem is twofold. 1) How to make these out-of-sample
predictions, by country, when my data has not been declared as time-series?
(I do not think that mboost can handle time-series data...x1 x2 and x3 have
been lagged appropriately). 2) How to save the one-step ahead predictions
into a vector?

Any thoughts would be greatly appreciated. Many thanks!

-Travis

Benjamin Hofner

2010-Jul-30 09:48 UTC

Hi Travis,

I'll try to give you some hints that might bring you closer to a solution.
The key to your problem (as far as I understand it) might just be to use
the predict function of mboost appropriately. You can specify a new data
set (e.g. a part of your original data set not used for estimation) and
call

 > predict(model, newdata = newdata)

which gives you a vector of predictions as you wanted. Thus, you could, 
for example, specify newdata such that you get your one-step ahead 
predictions.
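
A minimal sketch of the rolling scheme described in the question, built only on mboost() and predict(..., newdata = ...). The helper name roll_one_step, the window length, the fixed mstop = 100 and the simulated data are all assumptions made for illustration; they are not part of the original post.

```r
library(mboost)

# Hypothetical helper: rolling one-step-ahead forecasts for ONE country.
# `df` must be sorted by time; `window` is the number of rows used for fitting.
roll_one_step <- function(df, window, mstop = 100) {
  n <- nrow(df)
  preds <- rep(NA_real_, n)                  # NA where no forecast is made
  for (i in seq_len(n - window)) {
    train <- df[i:(i + window - 1), ]        # current estimation window
    test  <- df[i + window, , drop = FALSE]  # the next observation
    fit <- mboost(y ~ x1 + x2 + x3, data = train,
                  control = boost_control(mstop = mstop))
    preds[i + window] <- as.numeric(predict(fit, newdata = test))
  }
  preds
}

# Simulated stand-in for the real panel (2 countries, 60 months each)
set.seed(1)
dat <- data.frame(country = rep(c("AUS", "CAN"), each = 60),
                  x1 = rnorm(120), x2 = rnorm(120), x3 = rnorm(120))
dat$y <- sin(dat$x1) + 0.5 * dat$x2 + rnorm(120, sd = 0.1)

# One vector of one-step-ahead forecasts per country
oos <- lapply(split(dat, dat$country), roll_one_step, window = 48)
```

Each element of oos lines up with the rows of that country's data, so it can be compared directly with the observed y. If a test row lies outside the covariate range of its training window, the spline base-learners have to extrapolate, and mboost may warn about that.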

To estimate the model only on a subset of the data you could either use

 > mboost(y ~ x1 + x2 + x3, data = some_part_of_your_dataset)

or you can apply weights

 > model <- mboost(y ~ x1 + x2 + x3, data = data,
+                 weights = c(rep(1, 100), rep(0, nrow(data) - 100)))
 > predict(model) ## gives you predictions for all observations in data

Now you can extract the subset of out-of-bag predictions, i.e., 
predictions for observations with weight 0.
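
As a rough illustration of the weights approach, the out-of-bag predictions can be picked out by indexing with the weight vector. The simulated data and the object names (d, w, oob_pred, oob_mse) are assumptions for this sketch only.

```r
library(mboost)

# Simulated data standing in for one country's series
set.seed(2)
d <- data.frame(x1 = runif(150), x2 = runif(150), x3 = runif(150))
d$y <- sin(2 * pi * d$x1) + d$x2 + rnorm(150, sd = 0.1)

# Weight 1 = used for fitting, weight 0 = held out
w <- c(rep(1, 100), rep(0, nrow(d) - 100))
model <- mboost(y ~ x1 + x2 + x3, data = d, weights = w)

fitted_all <- as.numeric(predict(model))   # one prediction per row of d
oob_pred   <- fitted_all[w == 0]           # predictions for held-out rows
oob_mse    <- mean((d$y[w == 0] - oob_pred)^2)
```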

One further thing to mention:
You call your model blackbox; note, however, that you do NOT fit a
black-box model but an additive model using P-splines (which is the
default). You can see this if you type, e.g.,

 > coef(model)

and look at the names.

Another idea for your data problem might be to fit ONE model with 
country as an effect modifier, specified via the "by" argument in all 
base-learners. A call could look like

 > mboost(y ~ bbs(x1, by = country) + bbs(x2, by = country)
+            + bbs(x3, by = country), data = data)

Or you could use random effects via brandom() base-learners. Oh, and 
please note that you need to tune your mstop value (e.g. via cvrisk)!
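
A small sketch of tuning mstop with cvrisk, under assumed simulated data and an assumed starting value of mstop = 300. It uses cvrisk's default resampling folds, which are drawn at random; for time-ordered data one may prefer to supply a custom folds matrix instead.

```r
library(mboost)

set.seed(3)
d <- data.frame(x1 = runif(200), x2 = runif(200), x3 = runif(200))
d$y <- sin(2 * pi * d$x1) + d$x2 + rnorm(200, sd = 0.2)

# Start with a generous number of boosting iterations
model <- mboost(y ~ x1 + x2 + x3, data = d,
                control = boost_control(mstop = 300))

# Estimate the out-of-sample risk along the boosting path;
# papply = lapply runs the folds sequentially
cvm  <- cvrisk(model, papply = lapply)
best <- mstop(cvm)       # iteration with the lowest average risk

model <- model[best]     # set the model to the selected mstop
```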

HTH
  Benjamin

TravisB

2010-Jul-30 15:12 UTC

Thanks, Ben.

Specifying the data in this way should give me what I need. I can do this
in a loop fairly easily and wind up with what I want in the end. I should
have been more specific: I think that mboost can handle time-series data;
it was the panel structure of the data that was giving me issues. I'd like
to think about using base learners that can handle panel data in the
future, so if you have any suggestions there I'd be happy to hear them.

Also, don't worry, the models I specify are tuned and I have thought about
what base learners are appropriate; I left it as a blackbox for the
exposition of my post. :)

Anyway, many thanks,

Travis