Weidong Gu
2009-Feb-23 22:38 UTC
[R] why results from regression tree (rpart) are totally inconsistent with ordinary regression
Hi,

In my analysis of the impact of insecticide-treated bednets on malaria, I look at the relationship between malaria incidence and mosquito behaviors. The condensed data set is copied below.

Ordinary regression (lm) shows that Incidence is negatively related to Mortality. This makes sense because Mortality reflects how strongly the insecticide-treated nets kill mosquitoes. Since the original data set has a complex structure with more parameters and scenarios, I thought a tree model would help explore the structure of the data. However, the regression tree (rpart(Incidence~Mortality+Deterrence)) indicates that Mortality is positively related to Incidence. How can this unintuitive result arise?

Advice is appreciated.

Weidong Gu, Department of Medicine, University of Alabama, Birmingham

Deterrence Mortality Incidence
0.695 0.51 66
0.255 0.501 48
0.612 0.483 55
0.209 0.158 47
0.499 0.589 53
0.755 0.285 73
0.764 0.351 77
0.749 0.211 64
0.101 0.336 45
0.556 0.066 72
0.576 0.403 45
0.232 0.667 35
0.424 0.891 34
0.432 0.458 54
0.197 0.269 59
0.188 0.523 40
0.291 0.864 32
0.504 0.791 36
0.387 0.138 66
0.71 0.676 56
0.235 0.183 59
0.358 0.579 41
0.718 0.57 49
0.775 0.254 46
0.269 0.633 42
0.443 0.741 40
0.28 0.438 49
0.385 0.778 37
0.539 0.653 37
0.73 0.094 84
0.489 0.611 40
0.595 0.431 39
0.305 0.003 69
0.511 0.595 37
0.394 0.798 37
0.369 0.541 47
0.414 0.552 51
0.468 0.858 34
0.311 0.201 59
0.142 0.36 43
0.514 0.195 46
0.365 0.325 48
0.608 0.224 67
0.177 0.04 62
0.475 0.146 65
0.526 0.702 46
0.735 0.372 43
0.172 0.66 36
0.622 0.531 53
0.651 0.055 76
0.223 0.296 54
0.783 0.566 52
0.439 0.698 34
0.527 0.493 41
0.766 0.89 49
0.634 0.749 42
0.24 0.732 35
0.792 0.764 36
0.268 0.823 34
0.418 0.407 53
0.251 0.241 54
0.705 0.843 40
0.546 0.474 55
0.685 0.384 62
0.582 0.086 72
0.63 0.618 57
0.131 0.028 56
0.555 0.803 41
0.463 0.299 57
0.154 0.164 55
0.406 0.074 66
0.168 0.118 58
0.597 0.323 47
0.672 0.816 42
0.698 0.623 48
0.676 0.177 43
0.743 0.109 81
0.121 0.244 49
0.799 0.014 85
0.45 0.645 36
0.484 0.448 52
0.585 0.307 68
0.348 0.417 43
0.345 0.459 44
0.374 0.835 30
0.657 0.134 65
0.331 0.022 67
0.141 0.045 66
0.568 0.1 67
0.11 0.876 30
0.212 0.39 46
0.298 0.519 40
0.322 0.721 44
0.201 0.77 35
0.641 0.855 39
0.156 0.277 48
0.327 0.714 40
0.663 0.231 44
0.119 0.688 37
0.287 0.354 46
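For readers who want to reproduce the comparison described in this post, here is a minimal sketch. It assumes only the first 20 rows of the posted data (to keep the example short), so the fitted numbers will differ from a fit on all 100 rows. The object names d, fit_lm and fit_tree are illustrative and do not come from the original post.

```r
library(rpart)  # recommended package, normally installed alongside R

# First 20 rows of the posted data (a subset, used here only for brevity)
d <- read.table(header = TRUE, text = "
Deterrence Mortality Incidence
0.695 0.51 66
0.255 0.501 48
0.612 0.483 55
0.209 0.158 47
0.499 0.589 53
0.755 0.285 73
0.764 0.351 77
0.749 0.211 64
0.101 0.336 45
0.556 0.066 72
0.576 0.403 45
0.232 0.667 35
0.424 0.891 34
0.432 0.458 54
0.197 0.269 59
0.188 0.523 40
0.291 0.864 32
0.504 0.791 36
0.387 0.138 66
0.71 0.676 56
")

# Ordinary least squares: the sign of the Mortality coefficient
# gives the direction of the linear relationship
fit_lm <- lm(Incidence ~ Mortality + Deterrence, data = d)
print(coef(fit_lm))

# Regression tree with the same formula as in the post
fit_tree <- rpart(Incidence ~ Mortality + Deterrence, data = d)
print(fit_tree)
```

The printed tree lists one line per node with the split condition, the number of observations, the deviance and the mean Incidence (yval) in that node, which is a less ambiguous place to read the direction from than the plot.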
Bert Gunter
2009-Feb-23 23:14 UTC
[R] why results from regression tree (rpart) are totally inconsistent with ordinary regression
You did not read the tree graph correctly. Mortality is **not** "positively related" to incidence. You're reading the tree backwards. Read the output of summary() on your rpart fit object for clarity.

-- Bert Gunter
Genentech
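One way to check the direction without depending on how the plotted branches are laid out is to look at the text summary and at the tree's predictions for low and high Mortality. The sketch below is an illustration under stated assumptions, not part of the original reply: it uses only the first 20 rows of the posted data, and it fits a tree on Mortality alone so that the split is necessarily on Mortality. The names d, fit, grid and pred are illustrative.

```r
library(rpart)  # recommended package, normally installed alongside R

# First 20 rows of the posted data (a subset, used here only for brevity)
d <- read.table(header = TRUE, text = "
Deterrence Mortality Incidence
0.695 0.51 66
0.255 0.501 48
0.612 0.483 55
0.209 0.158 47
0.499 0.589 53
0.755 0.285 73
0.764 0.351 77
0.749 0.211 64
0.101 0.336 45
0.556 0.066 72
0.576 0.403 45
0.232 0.667 35
0.424 0.891 34
0.432 0.458 54
0.197 0.269 59
0.188 0.523 40
0.291 0.864 32
0.504 0.791 36
0.387 0.138 66
0.71 0.676 56
")

# Assumption for illustration: a single-predictor tree, so any split is on Mortality
fit <- rpart(Incidence ~ Mortality, data = d)

# Detailed text output: split conditions, node sizes and mean Incidence per node
summary(fit)

# Direction check: predicted Incidence at a low and a high Mortality value
grid <- data.frame(Mortality = c(0.1, 0.9))
pred <- predict(fit, newdata = grid)
print(cbind(grid, predicted_incidence = pred))
```

If the predicted Incidence is higher at the low Mortality value than at the high one, the tree agrees with the negative relationship found by lm, and the apparent contradiction comes from reading the branches in the wrong order.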