thr3ads.net - R help - [R] MOB (party package) Question


mijohnso at eos.ubc.ca

2013-Aug-07 16:58 UTC

[R] MOB (party package) Question - Variable Selection

Hi. I am a grad student currently using the mob() function in the R
party package, and I have a question. I am working on an environmental problem
with about 100 predictors. I am having trouble determining which predictors to
use for regression and which for partitioning. Is there any method to
determine this? Does it cause problems if a variable is used for both regression
and partitioning? I attempted to pre-screen the variables using stepwise linear
regression, and I used the selected variables for regression and all others for
partitioning. However, this led to the model having only one node. Any
suggestions would be very much appreciated, thanks.

Achim Zeileis

2013-Aug-07 17:37 UTC


[R] MOB (party package) Question - Variable Selection

Michael:
> Hi. I am a grad student currently using the mob() function in the
> R party package, and I have a question. I am working on an environmental
> problem with about 100 predictors. I am having trouble determining which
> predictors to use for regression and which for partitioning. Is there
> any method to determine this?
That depends a little bit on what exactly you are trying to achieve. When 
we developed MOB, we had the following situation in mind:

- You have some sort of data for which you know from the literature that a 
certain type of model works well. For example, log(y) ~ log(x1) + log(x2) 
or something like that.

- But you also have data on a bunch of other variables, and you don't yet
know how they should enter the model. Often these are categorical variables
or numerical variables that are not part of the standard theory.

- Then MOB is one possible approach to check whether these additional
variables affect the basic standard model or not. Through recursive
partitioning, it can capture various types of main and interaction
effects.
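A minimal sketch of this setup with party::mob(), on simulated data. All variable names (ly, x1, x2, z1, z2), the coefficients, and the minsplit setting are hypothetical choices for illustration: the "standard theory" part goes to the left of the |, and the candidate partitioning variables go to the right.

```r
## Sketch only: simulated data, hypothetical variable names.
library("party")  # mob(); linearModel comes from modeltools, which party attaches

set.seed(1)
n  <- 500
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
z1 <- runif(n)                                        # numeric, irrelevant here
z2 <- factor(sample(c("a", "b"), n, replace = TRUE))  # categorical

## Basic log-log model, but the x1 coefficient differs between levels of z2
b1 <- ifelse(z2 == "a", 0.5, 1.5)
ly <- 1 + b1 * log(x1) + 0.8 * log(x2) + rnorm(n, sd = 0.2)
sim <- data.frame(ly, x1, x2, z1, z2)

## Regressors of the basic model left of "|", partitioning variables right of it
fit <- mob(ly ~ log(x1) + log(x2) | z1 + z2,
           data = sim, model = linearModel,
           control = mob_control(minsplit = 40))

fit        # tree with a linear model in each terminal node
coef(fit)  # coefficients of the terminal-node models
```

Keeping the basic model small (three coefficients here) is what makes the per-node coefficients easy to compare.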

However, suppose you just have a response variable and a bunch of regressors
about which you have little prior knowledge, and you want to select both the
relevant variables and their functional form. Then MOB might help you, but
there might also be other methods that are more natural, for example GAMs
or boosting.
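For comparison, a minimal GAM sketch with the mgcv package, again on hypothetical simulated data. Each smooth term s() lets the data choose the functional form, and select = TRUE adds an extra penalty so that an irrelevant term can be shrunk out of the model entirely:

```r
## Sketch only: simulated data; x3 has no effect on y by construction.
library("mgcv")  # gam(), s()

set.seed(3)
n <- 300
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
sim$y <- sin(2 * pi * sim$x1) + 2 * sim$x2 + rnorm(n, sd = 0.3)

## Smooth terms for every candidate; select = TRUE allows terms to be penalized to zero
fit_gam <- gam(y ~ s(x1) + s(x2) + s(x3), data = sim, select = TRUE)

summary(fit_gam)  # inspect effective degrees of freedom for each smooth
```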
> Does it cause problems if a variable is used for both regression and 
> partitioning?
In principle, this is possible. Whether or not this is meaningful and/or 
easy to interpret depends on the particular data though.
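As a hypothetical illustration of one case where it can be meaningful: if the effect of x changes slope at some unknown point, putting x on both sides of the | yields a segmented linear regression. A minimal sketch on simulated data, assuming party::mob() accepts the same variable in both parts of the formula:

```r
## Sketch only: simulated piecewise-linear data with a slope change at x = 5.
library("party")

set.seed(2)
n <- 400
x <- runif(n, 0, 10)
y <- ifelse(x < 5, 1 + 0.5 * x, 3.5 + 2 * (x - 5)) + rnorm(n, sd = 0.5)
seg <- data.frame(x, y)

## x is both the regressor and the partitioning variable,
## so any split on x produces a separate linear fit on each side.
fit_seg <- mob(y ~ x | x, data = seg, model = linearModel)

fit_seg
coef(fit_seg)
```

Here the split point itself has a direct interpretation; with less structured data, the same construction can be much harder to read.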
> I attempted to pre-screen the variables using stepwise linear regression,
> and I used the selected variables for regression and all others for
> partitioning. However, this led to the model having only one node.
That's not very surprising, is it? With the stepwise selection you already
tried to capture the potential influence of all regressors on your response.
Of course, MOB might have turned up a few additional interactions, but I'm
not surprised that it didn't.

We've obtained the most useful results when the basic model had relatively 
few parameters and was easy/natural to interpret.

Hope that helps,
Z
