thr3ads.net - similar to: "predict rpart newdata - introduce only values variables used in the tree"

Displaying 20 results from an estimated 8000 matches similar to: "predict rpart newdata - introduce only values variables used in the tree"

assign same legend colors than in the grouped data plot

2012 Feb 15

Dear community, I've plotted data coloured by the factor variable var3. In the legend, I'd like to assign the same colors as in the plot (the factor has 5 levels). I've been trying this but it doesn't work: plot(var1, var2, xlab = "var1", ylab = "var2", col = var3, bty = 'L') legend(locator(1), c("level 1 var3",
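One way to keep the legend in step with the points is to build a single colour vector indexed by factor level and use it in both calls. The following is a minimal sketch, not taken from the thread: the simulated var1, var2 and var3, the cols vector, and the fixed "topright" position (instead of locator(1)) are all assumptions made for illustration.

```r
# Hypothetical data standing in for the poster's var1, var2 and var3.
set.seed(1)
var1 <- rnorm(50)
var2 <- rnorm(50)
var3 <- factor(sample(paste("level", 1:5), 50, replace = TRUE))

# One colour per factor level; integers index into the current palette().
cols <- seq_along(levels(var3))

# Each point takes the colour of its level's integer code.
plot(var1, var2, xlab = "var1", ylab = "var2",
     col = cols[as.integer(var3)], bty = "L")

# The legend reuses the same vector in level order, so the colours match.
legend("topright", legend = levels(var3), col = cols, pch = 1)
```

Replacing cols with a character vector of colour names of the same length would work the same way, since both calls index it by level.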

Difference between "tree" and "rpart"

2005 May 04

In the help for rpart it says, "This differs from the tree function mainly in its handling of surrogate variables." And it says that an rpart object is a superset of a tree object. Both cite Breiman et al. 1984. Both call external code which looks like Martian poetry to me. I've seen posts in the archives where BDR, and other knowledgeable folks, have said that rpart() is to be

outlier identify in qqplot

2011 Nov 16

Dear Community, I want to identify outliers in my data. I don't know how to use the identify command in the plots obtained. I've gone through the help files and used the mahalanobis example for my purpose: NormalMultivarianteComparefunc <- function(x) { Sx <- cov(x) D2 <- mahalanobis(x, colMeans(x), Sx) plot(density(D2, bw=.5), main="Squared Mahalanobis distances, n=nrow(x),

rpart

2013 Jan 27

Hi, when I look at the summary of an rpart object run on my data, I get 7 nodes, but when I plot the rpart object, I get only 3 nodes. Shouldn't the number of nodes match between the two functions (summary and plot), or is it not always the same? I look forward to your reply, Carol -------------------------------------------- summary(rpart.res) Call: rpart(formula = mydata$class ~ ., data

How to measure/rank variable importance when using rpart?

2011 Jan 24

--- included message ---- Thus, my question is: *What common measures exist for ranking/measuring variable importance of participating variables in a CART model? And how can this be computed using R (for example, when using the rpart package)* ---end ---- Consider the following printout from rpart: summary(rpart(time ~ age + ph.ecog + pat.karno, data=lung)) Node number 1: 228 observations,

rpart package: why does predict.rpart require values for "unused" predictors?

2012 Aug 01

After fitting and pruning an rpart model, it is often the case that one or more of the original predictors is not used by any of the splits of the final tree. It seems logical, therefore, that values for these "unused" predictors would not be needed for prediction. But when predict() is called on such models, all predictors seem to be required. Why is that, and can it be easily
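A possible workaround, sketched here under assumptions and not taken from the thread: predict() rebuilds a model frame from the full model formula, so every predictor has to exist as a column in newdata, but the columns for predictors that no split uses can apparently be filled with NA. The kyphosis data (shipped with rpart), the formula, and the names used, all_pred, unused and newdata are chosen only for illustration, and the NA_real_ padding assumes the unused predictors are numeric.

```r
library(rpart)

# kyphosis ships with rpart; the formula is only an illustration.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Predictors that appear in at least one split ("<leaf>" marks terminal nodes).
used     <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
all_pred <- attr(fit$terms, "term.labels")
unused   <- setdiff(all_pred, used)

# New data that carries only the predictors the tree actually splits on ...
newdata <- kyphosis[1:5, used, drop = FALSE]

# ... padded with NA columns so that the model frame can still be built.
# Assumption: the unused predictors are numeric, hence NA_real_.
for (v in unused) newdata[[v]] <- NA_real_

pred <- predict(fit, newdata = newdata, type = "class")
```

Because the padded columns never take part in a primary split, the predictions should agree with those obtained from the complete rows.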

help with rpart

2008 May 12

Hi, I am using rpart as part of my master's project. I am trying to print out the resulting model using the plot() function along with the text() function. I am having difficulties with labels being cut off. In the text() function, I am using the use.n=T option to get the number of people in each node, but on the lower and left parts of the plot the numbers get cut off. Thanks! Linus

rpart$where and predict.rpart

2008 Jul 22

Hello there. I have fitted an rpart model. > rpartModel <- rpart(y~., data=data.frame(y=y,x=x),method="class", ....) and can use rpart$where to find the terminal node to which each observation belongs. Now, I have a set of new data and used predict.rpart, which seems to give only the predicted value with no information similar to rpart$where. May I know how

predict rpart: new data has new level

2008 Jul 31

Hi. I use rpart to build a regression tree. Y is continuous. Now, I try to predict on a new set of data. In the new set of data, one of my x variables (called Incoterm, a factor) has a new level. I wonder why the error below appears as the guide says "For factor predictors, if an observation contains a level not used to grow the tree, it is left at the deepest possible node and

help with predict.rpart

2011 Jul 29

data=read.table("http://statcourse.com/research/boston.csv", sep=",", header = TRUE) library(rpart) fit=rpart (MV~ CRIM+ZN+INDUS+CHAS+NOX+RM+AGE+DIS+RAD+TAX+ PT+B+LSTAT) predict(fit,data[4,]) plot only reveals part of the tree, in contrast to the results one obtains with CART or C5 -------- Original Message -------- Subject: Re: [R] help with rpart From: Sarah

predict.rpart question

2008 Feb 26

Dear All, I have a question regarding predict.rpart. I use rpart to build classification and regression trees and I deal with data with a relatively large number of input variables (predictors). For example, I build an rpart model like this: rpartModel <- rpart(Y ~ X, method="class", minsplit =1, minbucket=nMinBucket,cp=nCp); and get the predictors used in building the model like

Rpart -- using predict() when missing data is present?

2005 Oct 08

I am doing > library(rpart) > m <- rpart("y ~ x", D[insample,]) > D[outsample,] y x 8 0.78391922 0.579025591 9 0.06629211 NA 10 NA 0.001593063 > p <- predict(m, newdata=D[9,]) Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : invalid result from na.action How do I persuade it to give me NA

cv.lm syntax error

2011 Mar 04

Dear all, I've tried a multiple regression, and now I want to try a cross-validation. I obtain this error (it must be something related to df) that I don't understand; any help would be appreciated. cv.lm(df= dat, lm2.52f, m=3) Error en `[.data.frame`(df, , ynam) : undefined columns selected lm2.52f is my lm object, dat is a dataframe where the variables involved in the lm are. I tried CVlm

prune in rpart: choose number terminal nodes

2012 Sep 21

Dear community, I have an rpart object, and I know the CP I want. I'd like to know if it's also possible to fix the number of terminal nodes I want. Thanks in advance,
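One way to approach this, sketched under assumptions and not taken from the thread: the cptable component of a fitted rpart object lists, for each complexity value, the number of splits (nsplit), and a binary tree with nsplit splits has nsplit + 1 terminal nodes. Picking the table row with the largest tree that does not exceed the target and passing its CP to prune() gives the closest available subtree. The kyphosis data and the names wanted_leaves, cp_tab, row and n_leaves are hypothetical.

```r
library(rpart)

# kyphosis ships with rpart; the formula is only an illustration.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

cp_tab <- fit$cptable      # columns include "CP" and "nsplit"
wanted_leaves <- 3         # hypothetical target number of terminal nodes

# Largest subtree in the table with at most the wanted number of leaves
# (leaves = nsplit + 1). The root-only row always qualifies.
row <- max(which(cp_tab[, "nsplit"] + 1 <= wanted_leaves))

pruned <- prune(fit, cp = cp_tab[row, "CP"])

# Count the terminal nodes of the pruned tree.
n_leaves <- sum(pruned$frame$var == "<leaf>")
```

Only the tree sizes that appear in the cptable can be reached this way, so the result may have fewer terminal nodes than requested.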

bug in rpart?

2009 May 22

Greetings, I checked the Indian diabetes data again and get one tree for the data with reordered columns and another tree for the original data. I compared these two trees: the split points are exactly the same, but the fitted classes are not the same for some cases. And the misclassification errors are different too. I know how CART deals with ties --- even we are using the

oblique.tree : the predict function asserts the dependent variable to be included in "newdata"

2012 Nov 01

Dear R community, I have recently discovered the package oblique.tree and I must admit that it was a nice surprise for me, since I have actually made my own version of a kind of a classifier which uses the idea of oblique splits (splits by means of hyperplanes). So I am now interested in comparing these two classifiers. But what I do not seem to understand is why the function

predict() an rpart() model: how to ignore missing levels in a factor

2010 Nov 18

I am using an algorithm to split my data set into two random sections repeatedly, construct a model using rpart() on one, test on the other, and average out the results. One of my variables is a factor (crop) where each crop type has a code. Some crop types occur infrequently or singly. When the data set is randomly split, it may be that the first data set has a crop type which is not present in
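A hedged sketch of one way to sidestep unseen levels, not taken from the thread: re-encode the factor in the test set with the levels the tree was grown on, so that unseen levels become NA, and let rpart's missing-value handling deal with them. The iris-based split and the names train, test and known are assumptions for illustration; whether treating an unseen crop type as missing is appropriate depends on the application.

```r
library(rpart)

# Hypothetical split of iris: the training part lacks the "virginica" level.
train <- droplevels(subset(iris, Species != "virginica"))
test  <- iris[c(1, 51, 101), ]   # row 101 is a virginica flower

fit <- rpart(Sepal.Length ~ Species + Sepal.Width, data = train)

# Factor levels seen during fitting, as stored on the fitted object.
known <- attr(fit, "xlevels")$Species

# Re-encode so that any level unseen at fit time becomes NA.
test$Species <- factor(as.character(test$Species), levels = known)

# Rows with an NA predictor are still passed down the tree by predict().
pred <- predict(fit, newdata = test)
```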

rpart on Alpha under OSF

1999 Dec 23

Running on an Alpha machine which reports (uname -a) OSF1 bsdx01.bs.ehu.es V4.0 878 alpha and using the binary distribution put together by Albrecht Gebhardt (in http://cran.at.r-project.org/bin/osf/osf4.0/tar/alpha_ev5/) I obtain core dumps whenever I try to use package rpart. I have R REMOVE'd the rpart package, downloaded the source rpart_1.0-7.tar from CRAN and

NA's when subset in a dataframe

2012 May 03

Dear community, I'm having this silly problem. I have a linear model. After fixing it, I wanted to know which data had studentized residuals larger than 3, so I tried this: d1 <- cooks.distance(lmmodel) r <- sqrt(abs(rstandard(lmmodel))) rstu <- abs(rstudent(lmmodel)) a <- cbind( mydata, d1, r,rstu) alargerthan3 <- a[rstu >3, ] And suddenly a[rstu >3, ] has
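One common cause of NA rows in a subset, assumed here since the snippet is cut off, is an NA in the logical index: a[NA, ] returns a row of NAs instead of dropping the row. Wrapping the condition in which() keeps only the positions that are TRUE. The small data frame below is hypothetical and only stands in for the poster's a.

```r
# Hypothetical stand-in for the poster's data frame, with one missing residual.
a <- data.frame(id = 1:4, rstu = c(0.5, 3.2, NA, 4.1))

# A logical index containing NA yields an extra all-NA row.
with_na_rows <- a[a$rstu > 3, ]

# which() returns only the positions where the condition is TRUE.
clean <- a[which(a$rstu > 3), ]

nrow(with_na_rows)  # 3: two matches plus one NA row
nrow(clean)         # 2
```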

Missing value in Rpart

2001 Aug 02

Hi all, our understanding of how classification trees in Rpart treat missing values is that if the variable is ordinal (continuous), Rpart, by default, imputes a value for missing. How do we do the classification tree and tell Rpart not to impute? That is, what command is used to turn off the imputation? Also, if we do get true missing, how does classification tree analysis in Rpart treat missing when
