Carlos M. Zambrana-Torrelio
2009-Oct-19 19:46 UTC
[R] Random Forest - partial dependence plot
Hi everybody,

I used random forest regression to explain patterns of species richness using a set of climate variables (e.g. temperature, precipitation, etc.). All are continuous variables. My results are really interesting, and my model explained 96.7% of the variance.

Now I am trying to take advantage of the variable importance function and depict the observed patterns using partial dependence plots.

However, I found a really strange (at least for me...) behavior: the species number ranges from 1 to 150, but when I make the partial plot, the graph only shows values between 43 and 50!

I use the following code to get the partial plot:

    partialPlot(ampric.rf, amp.data, "Temp")

where ampric.rf is the random forest object, amp.data is the data, and Temp is the variable I am interested in.

How can I get a partial plot covering the full range of species numbers (from 1 to 150)? Also, I read the RF documentation and was wondering what the meaning of "marginal effect of a variable" is.

Thanks for your help,
Carlos

--
Carlos M. Zambrana-Torrelio
Department of Biology
University of Puerto Rico - RP
PO BOX 23360
San Juan, PR 00931-3360
Are you talking about the y-axis or the x-axis? If you're talking about the y-axis, that range isn't really very meaningful. The partial dependence function basically gives you the "average" trend of that variable (integrating out all others in the model). It's the shape of that trend that is "important". You may compare the relative ranges of these plots across different predictor variables, but you should not interpret the absolute range.

Hope that helps.

Andy
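To make the "average trend" idea concrete, here is a minimal sketch of a partial dependence computation in base R. It is not the code from the thread: the helper partial_dependence(), the simulated data frame sim, and the lm fit are all hypothetical stand-ins, chosen so the example runs without extra packages. For each grid value of the chosen variable, every row is set to that value, the model predicts, and the predictions are averaged over the other predictors. As far as I understand it, partialPlot() performs an equivalent averaging for a randomForest regression fit, which would explain why its y-axis covers a much narrower band than the observed response.

```r
# Manual partial dependence for any fitted model with a predict() method.
# All names here (partial_dependence, sim, fit) are illustrative assumptions.
partial_dependence <- function(model, data, var, n_grid = 25) {
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = n_grid)
  avg <- vapply(grid, function(v) {
    tmp <- data
    tmp[[var]] <- v                      # force every row to the same value
    mean(predict(model, newdata = tmp))  # average over the other predictors
  }, numeric(1))
  data.frame(x = grid, y = avg)
}

# Simulated data: the response depends weakly on Temp, strongly on Precip.
set.seed(1)
n <- 500
sim <- data.frame(Temp = runif(n, 0, 30), Precip = runif(n, 0, 3000))
sim$richness <- 1 + 0.2 * sim$Temp + 0.045 * sim$Precip + rnorm(n, sd = 3)

# A linear model stands in for the random forest in this sketch.
fit <- lm(richness ~ Temp + Precip, data = sim)
pd <- partial_dependence(fit, sim, "Temp")

range(sim$richness)  # spread of the observed response
range(pd$y)          # narrower: only the averaged Temp trend remains
```

The second range is narrow because the variation due to Precip has been averaged away at every grid point. The curve therefore shows how the prediction changes with Temp on average (the "marginal effect"), not the full spread of species richness.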