tudor
2010-Oct-05 13:20 UTC
[R] party with mob - parameter estimates not significant in terminal nodes
Dear useRs:

I successfully model-based partitioned several datasets through the use of mob from the party package (thanks Achim et al. once again !!!). At times, however, the partitioning leads to terminal nodes in which the parameter estimates of the model are not significant (although the split points and in general the proposed segmentation both seem reasonable).

As I do not seem to be able to come up with an intuitive explanation/interpretation for this (other than that the partitioning model may be appropriate for parts of the dataset(s)), I wonder if any of you could share your thoughts on this topic with me. For your convenience I attached a relevant set of results below.

My system: Windows XP, R 2.11.1, party version 0.9-9997.

Thanks.
Tudor

$`2`

Call:
NULL

Deviance Residuals:
                Min                   1Q               Median                   3Q                  Max
-2.1613499829328759  -0.1182099512510448   0.0000000000000000   0.1199438072333263   1.7963628663418680

Coefficients:
                        Estimate           Std. Error  z value              Pr(>|z|)
(Intercept)  38.6736721222665096   5.1182299436934375  7.55606  0.000000000000041545 ***
P            -3.8195232976021787   0.5042297985419135 -7.57497  0.000000000000035922 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 407.0806101624161 on 293 degrees of freedom
Residual deviance: 132.0087256781199 on 292 degrees of freedom
AIC: 136.0087256781199

Number of Fisher Scoring iterations: 7

$`3`

Call:
NULL

Deviance Residuals:
                    Min                       1Q                   Median                       3Q                      Max
-0.00009134433923085110   0.00000000000000000000   0.00000000000000000000   0.00000000000000000000   0.00009204763394325872

Coefficients:
                       Estimate            Std. Error  z value Pr(>|z|)
(Intercept)  1755.7555999083327  601505.6700290179579  0.00292  0.99767
P            -181.3394660743267   62127.5207770660636 -0.00292  0.99767

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 94.20918454290385568583588 on 67 degrees of freedom
Residual deviance: 0.00000001683616309495537 on 66 degrees of freedom
AIC: 4.000000016836163

Number of Fisher Scoring iterations: 25

--
View this message in context: http://r.789695.n4.nabble.com/party-with-mob-parameter-estimates-not-significant-in-terminal-nodes-tp2955980p2955980.html
Sent from the R help mailing list archive at Nabble.com.
Achim Zeileis
2010-Oct-05 13:45 UTC
[R] party with mob - parameter estimates not significant in terminal nodes
Tudor:

> I successfully model-based partitioned several datasets through the use
> of mob from the party package (thanks Achim et al. once again !!!). At
> times, however, the partitioning leads to terminal nodes in which the
> parameter estimates of the model are not significant (although the split
> points and in general the proposed segmentation both seem reasonable).

There are two aspects to this:

(1) The algorithm just determines whether the coefficients between two child nodes are significantly different. It may or may not be the case that they are significantly different from zero within each node. As an example: You may have a tree with a single split and two child nodes. In the first child node, you have a highly significant parameter value, but in the second node, you have no significant value.

(2) Due to partitioning, it may be the case that not all parameters of the model are identified in all child nodes. Currently, within mob(), this is not systematically checked. In particular, you may have (quasi-)complete separation in binomial GLMs if a child node is particularly "pure". This seems to have happened in your example below. From a machine learning point of view, this is not a bad thing; you just need to interpret it correctly.

> As I do not seem to be able to come up with an intuitive
> explanation/interpretation for this (other than that the partitioning
> model may be appropriate for parts of the dataset(s)), I wonder if any
> of you could share your thoughts on this topic with me. For your
> convenience I attached a relevant set of results below.

I guess that the variable "P" is binary and that, when you cross-tabulate it with the response for Node 3, there are zeros in the contingency table. I.e., you may have a perfect split in that one sub-sample.

hth,
Z

> $`2`
>
> Call:
> NULL
>
> Deviance Residuals:
>                 Min                   1Q               Median                   3Q                  Max
> -2.1613499829328759  -0.1182099512510448   0.0000000000000000   0.1199438072333263   1.7963628663418680
>
> Coefficients:
>                         Estimate           Std. Error  z value              Pr(>|z|)
> (Intercept)  38.6736721222665096   5.1182299436934375  7.55606  0.000000000000041545 ***
> P            -3.8195232976021787   0.5042297985419135 -7.57497  0.000000000000035922 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 407.0806101624161 on 293 degrees of freedom
> Residual deviance: 132.0087256781199 on 292 degrees of freedom
> AIC: 136.0087256781199
>
> Number of Fisher Scoring iterations: 7
>
> $`3`
>
> Call:
> NULL
>
> Deviance Residuals:
>                     Min                       1Q                   Median                       3Q                      Max
> -0.00009134433923085110   0.00000000000000000000   0.00000000000000000000   0.00000000000000000000   0.00009204763394325872
>
> Coefficients:
>                        Estimate            Std. Error  z value Pr(>|z|)
> (Intercept)  1755.7555999083327  601505.6700290179579  0.00292  0.99767
> P            -181.3394660743267   62127.5207770660636 -0.00292  0.99767
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 94.20918454290385568583588 on 67 degrees of freedom
> Residual deviance: 0.00000001683616309495537 on 66 degrees of freedom
> AIC: 4.000000016836163
>
> Number of Fisher Scoring iterations: 25
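
For readers who want to see the separation effect from point (2) in isolation, here is a minimal sketch using only base R's glm(). The data are simulated and are not the original poster's dataset; the variable names y and P, the value ranges, and the cut-off 9.7 are assumptions chosen purely for illustration. It should show symptoms similar to those reported for node 3: very large estimates, even larger standard errors, and a residual deviance close to zero.

```r
# Simulated data (an assumption for illustration, NOT the poster's data):
# a "pure" node in which the regressor P separates the binary response y.
P <- c(seq(9.0, 9.6, length.out = 34), seq(9.8, 10.4, length.out = 34))
y <- as.integer(P < 9.7)  # y is 1 exactly when P lies below the cut-off

# Under complete separation glm() usually emits warnings (fitted
# probabilities numerically 0 or 1); suppressWarnings() keeps this quiet.
fit <- suppressWarnings(glm(y ~ P, family = binomial))

# Coefficient table: estimates, standard errors, z values and p-values.
print(coef(summary(fit)))

# Residual deviance and number of Fisher scoring iterations used.
print(deviance(fit))
print(fit$iter)

# A simple diagnostic: cross-tabulate the response against the side of
# the cut-off. Empty off-diagonal cells indicate a perfect split.
tab <- table(y = y, P_below_cut = P < 9.7)
print(tab)
```

With a continuous regressor the cut-off has to be guessed or read off a plot; with a binary regressor, as guessed in the reply above, table(y, P) alone would be enough to reveal empty cells.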