similar to: predict.rpart and large datasets

2005 Oct 18
1
Memory problems with large dataset in rpart
Dear helpers, I am a Dutch student from the Erasmus University. For my Bachelor thesis I have written a script in R using boosting by means of classification and regression trees. This script uses the predefined function rpart. My input file consists of about 4000 vectors each having 2210 dimensions. In the third iteration R complains of a lack of memory, although in each iteration
2008 Jul 22
2
rpart$where and predict.rpart
Hello there. I have fitted an rpart model. > rpartModel <- rpart(y~., data=data.frame(y=y,x=x),method="class", ....) and can use rpart$where to find the terminal node to which each observation belongs. Now, I have a set of new data and used predict.rpart, which seems to give only the predicted value with no information similar to rpart$where. May I know how
2011 Jul 29
1
help with predict.rpart
data=read.table("http://statcourse.com/research/boston.csv", , sep=",", header = TRUE) library(rpart) fit=rpart (MV~ CRIM+ZN+INDUS+CHAS+NOX+RM+AGE+DIS+RAD+TAX+ PT+B+LSTAT) predict(fit,data[4,]) plot only reveals part of the tree in contrast to the results one obtains with CART or C5 -------- Original Message -------- Subject: Re: [R] help with rpart From: Sarah
2008 Feb 26
1
predict.rpart question
Dear All, I have a question regarding predict.rpart. I use rpart to build classification and regression trees, and I deal with data with a relatively large number of input variables (predictors). For example, I build an rpart model like this rpartModel <- rpart(Y ~ X, method="class", minsplit =1, minbucket=nMinBucket,cp=nCp); and get predictors used in building the model like
2009 Jul 26
3
Question about rpart decision trees (being used to predict customer churn)
Hi, I am using rpart decision trees to analyze customer churn. I am finding that the decision trees created are not effective because they are not able to recognize factors that influence churn. I have created an example situation below. What do I need to do for rpart to build a tree with the variable experience? My guess is that this would happen if rpart used the loss matrix while creating
2005 Oct 08
1
Rpart -- using predict() when missing data is present?
I am doing > library(rpart) > m <- rpart("y ~ x", D[insample,]) > D[outsample,] y x 8 0.78391922 0.579025591 9 0.06629211 NA 10 NA 0.001593063 > p <- predict(m, newdata=D[9,]) Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : invalid result from na.action How do I persuade him to give me NA
2006 Apr 07
1
rpart.predict error--subscript out of bounds
Hi, I am using rpart to do leave-one-out cross validation, but ran into a problem. Data is a data frame: the first column is the subject id, the second column is the group id, and the remaining columns are numerical variables, > Data[1:5,1:10] sub.id group.id X3262.345 X3277.402 X3369.036 X3439.895 X3886.935 X3939.054 X3953.777 X3970.352 1 32613 HAM_TSP 417.7082 430.4895 619.4776 720.8246
2013 Aug 12
2
Multiple return values / bug in rpart?
In the recommended package rpart (version 4.1-1), the file rpartpl.R contains the following line: return(x = x[!erase], y = y[!erase]) AFAIK, returning multiple values like this is not valid R. Is that correct? I can't seem to make it work in my own code. It doesn't appear that rpartpl.R is used anywhere, so this may have never caused an issue. But it's tripping up my R compiler.
2012 May 15
2
rpart - predict terminal nodes for new observations
Dear useRs: Is there a way I could predict the terminal node associated with a new data entry in an rpart environment? In the example below, if I had a new data entry with an AM of 5, I would like to link it to the terminal node 2. My searches led to http://tolstoy.newcastle.edu.au/R/e4/help/08/07/17702.html but I do not seem to be able to operationalize Professor Ripley's suggestions. Many
2012 Aug 01
1
rpart package: why does predict.rpart require values for "unused" predictors?
After fitting and pruning an rpart model, it is often the case that one or more of the original predictors is not used by any of the splits of the final tree. It seems logical, therefore, that values for these "unused" predictors would not be needed for prediction. But when predict() is called on such models, all predictors seem to be required. Why is that, and can it be easily
2008 Jul 31
1
predict rpart: new data has new level
Hi. I use rpart to build a regression tree. Y is continuous. Now, I try to predict on a new set of data. In the new set of data, one of my x variables (called Incoterm, a factor) has a new level. I wonder why the error below appears as the guide says "For factor predictors, if an observation contains a level not used to grow the tree, it is left at the deepest possible node and
2011 Mar 23
2
predict.rpart help
Hi Everyone, Is there a way to get predict.rpart() to return the nodes reached by the new examples in addition to the predicted probabilities it already returns? In other words, I would like to know the leaf node in the tree object that each new example data drops down to. Thanks in advance for your help. Osei
2009 Feb 03
5
Large file size while persisting rpart model to disk
I am using rpart to build a model for later predictions. To save the prediction across restarts and share the data across nodes I have been using "save" to persist the result of rpart to a file and "load" it later. But the saved size was becoming unusually large (even with binary, compressed mode). The size was also proportional to the amount of data that was used to create the
2010 Nov 18
1
predict() an rpart() model: how to ignore missing levels in a factor
I am using an algorithm to split my data set into two random sections repeatedly and construct a model using rpart() on one, test on the other and average out the results. One of my variables is a factor(crop) where each crop type has a code. Some crop types occur infrequently or singly. When the data set is randomly split, it may be that the first data set has a crop type which is not present in
2005 Oct 14
1
Predicting classification error from rpart
Hi, I think I'm missing something very obvious, but I am missing it, so I would be very grateful for help. I'm using rpart to analyse data on skull base morphology, essentially predicting sex from one or several skull base measurements. The sex of the people whose skulls are being studied is known, and lives as a factor (M,F) in the data. I want to get back predictions of gender, and
2005 Aug 04
1
Puzzled at rpart prediction
I'm in a situation where I say: > predict(m.rpart, newdata=D[N1+t,]) 0 1 173 0.8 0.2 which I interpret as meaning: an 80% chance of "0" and a 20% chance of "1". Okay. This is consistent with: > predict(m.rpart, newdata=D[N1+t,], type="class") [1] 0 Levels: 0 1 But I'm puzzled at the following. If I say: > predict(m.rpart,
2004 Mar 19
2
Why is rpart() so slow?
I've had rpart running on a problem now for a couple of *days*, but I'd expect a decision tree builder to run in minutes if not seconds. Why is rpart slow? Is there anything I can do to make it quicker?
2018 Aug 14
2
Xenial rpart package on CRAN built with wrong R version?
Hello, I just upgraded my Ubuntu Xenial system to R 3.5.1 (from 3.4.?) by changing the sources.list entry and doing an "apt-get dist-upgrade". Everything works except loading the rpart package in R: > library(rpart) Error: package or namespace load failed for ‘rpart’: package ‘rpart’ was installed by an R version with different internals; it needs to be reinstalled for use with
2002 Feb 21
2
question regarding to The tree Package for R
I have a problem with running the tree package (Dec. 8, 2001) for R. The problem is that it will only give me 5 or 6 terminal nodes and then stop, while using Splus's tree on the same data with the same specification gives me hundreds of nodes. Here's a little more background info: R-1.4.1 Solaris 5.7 rpart (most recent version) tree (..) Splus 6.0 Solaris 5.7 tree
2009 Jun 09
3
rpart - the xval argument in rpart.control and in xpred.rpart
Dear R users, I'm working with the rpart package and want to evaluate the performance of user defined split functions. I have some problems in understanding the meaning of the xval argument in the two functions rpart.control and xpred.rpart. In the former it is defined as the number of cross-validations while in the latter it is defined as the number of cross-validation groups. If I am