Displaying 20 results from an estimated 10000 matches similar to: "predict.rpart and large datasets"
2005 Oct 18
1
Memory problems with large dataset in rpart
Dear helpers,
I am a Dutch student from the Erasmus University. For my Bachelor thesis I
have written a script in R using boosting by means of classification and
regression trees. This script uses the predefined function
rpart. My input file consists of about 4000 vectors each having 2210
dimensions. In the third iteration R complains of a lack of memory,
although in each iteration
2008 Jul 22
2
rpart$where and predict.rpart
Hello there. I have fitted an rpart model.
> rpartModel <- rpart(y~., data=data.frame(y=y,x=x),method="class", ....)
and can use rpart$where to find out the terminal node that each
observation belongs to.
Now, I have a set of new data and used predict.rpart which seems to give
only the predicted value with no information similar to rpart$where.
May I know how
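predict.rpart() has no option that returns the leaf for new rows, but the leaf lookup it performs internally can be exposed. A minimal sketch, assuming the kyphosis data shipped with rpart as a stand-in for the poster's data; `predict_where()` is a hypothetical helper, not part of rpart. It copies the fitted tree, overwrites the per-node fitted values (`frame$yval`) with the row numbers of `frame`, and lets `predict(type = "vector")` look them up, so the result should use the same indexing as `$where`.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Hypothetical helper (not part of rpart): for each row of newdata, return the
# row of fit$frame holding the leaf it lands in, i.e. the same index as fit$where.
predict_where <- function(fit, newdata) {
  tmp <- fit
  # predict(type = "vector") returns frame$yval for the leaf each row reaches,
  # so storing frame row numbers there makes predict() report the leaf instead.
  tmp$frame$yval <- seq_len(nrow(tmp$frame))
  unname(predict(tmp, newdata = newdata, type = "vector"))
}

new_obs <- data.frame(Age = c(12, 150), Number = c(3, 5), Start = c(14, 2))
leaf_rows <- predict_where(fit, new_obs)
fit$frame[leaf_rows, c("var", "n", "yval")]
```

On the training data (which has no missing values) this reproduces `fit$where`, which is a quick way to check the trick against your own tree.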
2011 Jul 29
1
help with predict.rpart
data=read.table("http://statcourse.com/research/boston.csv",
sep=",", header = TRUE)
library(rpart)
fit=rpart(MV~ CRIM+ZN+INDUS+CHAS+NOX+RM+AGE+DIS+RAD+TAX+
PT+B+LSTAT, data=data)
predict(fit,data[4,])
The plot only reveals part of the tree, in contrast to the results one obtains
with CART or C5.
-------- Original Message --------
Subject: Re: [R] help with rpart
From: Sarah
2008 Feb 26
1
predict.rpart question
Dear All,
I have a question regarding predict.rpart. I use
rpart to build classification and regression trees and I deal with data with
a relatively large number of input variables (predictors). For example, I build an
rpart model like this
rpartModel <- rpart(Y ~ X, method="class",
minsplit =1, minbucket=nMinBucket,cp=nCp);
and get predictors used in building the model like
2009 Jul 26
3
Question about rpart decision trees (being used to predict customer churn)
Hi,
I am using rpart decision trees to analyze customer churn. I am finding that
the decision trees created are not effective because they are not able to
recognize factors that influence churn. I have created an example situation
below. What do I need to do for rpart to build a tree with the variable
experience? My guess is that this would happen if rpart used the loss matrix
while creating
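rpart accepts a loss matrix for classification trees through `parms = list(loss = ...)`. In rpart's documentation L[i, j] is the loss for classifying a true class i as class j, with rows and columns in the order of the response's factor levels and zeros on the diagonal. Making a missed churner more expensive than a false alarm shifts both the splits (via altered priors) and the class predicted in each leaf. A minimal sketch on simulated data (the poster's example is not shown); the churn rates and the cost of 10 are illustrative assumptions.

```r
library(rpart)

# Simulated churn data: low-experience customers churn more often.
set.seed(2009)
n <- 2000
churn_data <- data.frame(experience = runif(n, 0, 10))
p_churn <- ifelse(churn_data$experience < 2, 0.3, 0.05)
churn_data$churn <- factor(ifelse(runif(n) < p_churn, "yes", "no"),
                           levels = c("no", "yes"))

# Default: equal misclassification costs.
fit_default <- rpart(churn ~ experience, data = churn_data, method = "class")

# Loss matrix in level order (no, yes): rows = true class, columns = predicted.
# A missed churner (true "yes" predicted "no") costs 10; a false alarm costs 1.
L <- matrix(c(0, 1,
              10, 0), nrow = 2, byrow = TRUE)
fit_loss <- rpart(churn ~ experience, data = churn_data, method = "class",
                  parms = list(loss = L))

table(default = predict(fit_default, type = "class"))
table(with_loss = predict(fit_loss, type = "class"))
```

With the default costs a 30% churn rate never wins a leaf, so the tree has little reason to act on experience; with the loss matrix the low-experience group is predicted to churn.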
2005 Oct 08
1
Rpart -- using predict() when missing data is present?
I am doing
> library(rpart)
> m <- rpart("y ~ x", D[insample,])
> D[outsample,]
y x
8 0.78391922 0.579025591
9 0.06629211 NA
10 NA 0.001593063
> p <- predict(m, newdata=D[9,])
Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
invalid result from na.action
How do I persuade him to give me NA
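In current rpart versions predict.rpart() uses `na.action = na.pass` by default, so a row with a missing predictor is still sent down the tree (via surrogate splits or, with the default `usesurrogate = 2`, the majority direction) and receives a prediction rather than NA. A minimal sketch, assuming simulated data with the poster's column names; it also passes a formula object instead of the string "y ~ x". If NA is what you want for incomplete rows, the sketch masks them after predicting.

```r
library(rpart)

# Simulated data using the poster's column names.
set.seed(42)
D <- data.frame(x = runif(200))
D$y <- ifelse(D$x > 0.5, 1, 0) + rnorm(200, sd = 0.1)

# A formula object rather than the string "y ~ x".
m <- rpart(y ~ x, data = D)

newD <- data.frame(x = c(0.2, NA, 0.9))

# Default na.action = na.pass: the row with x = NA is kept and routed by the
# surrogate/majority rules, so it gets a non-NA prediction.
p <- predict(m, newdata = newD)

# To get NA for incomplete rows instead, mask them afterwards.
p_na <- p
p_na[!complete.cases(newD)] <- NA
p_na
```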
2006 Apr 07
1
rpart.predict error--subscript out of bounds
Hi,
I am using rpart to do leave-one-out cross-validation, but ran into a problem.
Data is a data frame: the first column is the subject id, the second column is the group id, and the remaining columns are numerical variables.
> Data[1:5,1:10]
sub.id group.id X3262.345 X3277.402 X3369.036 X3439.895 X3886.935 X3939.054 X3953.777 X3970.352
1 32613 HAM_TSP 417.7082 430.4895 619.4776 720.8246
2013 Aug 12
2
Multiple return values / bug in rpart?
In the recommended package rpart (version 4.1-1), the file rpartpl.R
contains the following line:
return(x = x[!erase], y = y[!erase])
AFAIK, returning multiple values like this is not valid R. Is that
correct? I can't seem to make it work in my own code.
It doesn't appear that rpartpl.R is used anywhere, so this may have
never caused an issue. But it's tripping up my R compiler.
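The poster is right that R's return() takes a single value; current R versions stop with an error when return() is called with more than one argument. The usual idiom is to bundle the results in a named list. A minimal sketch; `keep_points()` is a hypothetical stand-in for the code in rpartpl.R, not the rpart source itself.

```r
# Hypothetical stand-in for the rpartpl.R snippet: drop the points flagged
# in `erase` and return both coordinate vectors.
keep_points <- function(x, y, erase) {
  # return(x = x[!erase], y = y[!erase]) is not valid R; return one list.
  return(list(x = x[!erase], y = y[!erase]))
}

res <- keep_points(x = 1:5, y = c(10, 20, 30, 40, 50),
                   erase = c(FALSE, TRUE, FALSE, TRUE, FALSE))
res$x  # 1 3 5
res$y  # 10 30 50

# The multi-argument form fails when the function is called.
bad <- function() return(x = 1, y = 2)
msg <- tryCatch(bad(), error = conditionMessage)
msg
```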
2012 May 15
2
rpart - predict terminal nodes for new observations
Dear useRs:
Is there a way I could predict the terminal node associated with a new data
entry in an rpart environment? In the example below, if I had a new data
entry with an AM of 5, I would like to link it to the terminal node 2. My
searches led to http://tolstoy.newcastle.edu.au/R/e4/help/08/07/17702.html
but I do not seem to be able to operationalize Professor Ripley's
suggestions.
Many
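The node numbers shown by print(fit) (1 for the root, 2 and 3 for its children, and so on) are stored as the row names of `fit$frame`. A minimal sketch, assuming simulated data with a single numeric predictor named AM to mirror the question (the poster's data is not shown); `predict_node()` is a hypothetical helper that reuses `predict(type = "vector")` after replacing each node's fitted value with its node number. This is a workaround, not a documented rpart feature.

```r
library(rpart)

# Hypothetical data standing in for the poster's example.
set.seed(5)
d <- data.frame(AM = runif(200, 0, 20))
d$resp <- ifelse(d$AM < 8, 1, 4) + rnorm(200, sd = 0.3)
fit <- rpart(resp ~ AM, data = d)

# Hypothetical helper: node number (as printed by print(fit)) of the leaf
# that each row of newdata reaches.
predict_node <- function(fit, newdata) {
  tmp <- fit
  tmp$frame$yval <- as.integer(rownames(fit$frame))
  unname(predict(tmp, newdata = newdata, type = "vector"))
}

print(fit)                              # node numbers start each line
predict_node(fit, data.frame(AM = 5))   # terminal node for a new entry with AM = 5
```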
2012 Aug 01
1
rpart package: why does predict.rpart require values for "unused" predictors?
After fitting and pruning an rpart model, it is often the case that one or
more of the original predictors is not used by any of the splits of the
final tree. It seems logical, therefore, that values for these "unused"
predictors would not be needed for prediction. But when predict() is called
on such models, all predictors seem to be required. Why is that, and can it
be easily
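predict.rpart() rebuilds a model frame from the terms stored in the fitted object, and those terms list every predictor in the original formula, so each one has to be present in newdata even if no split uses it. A workaround, sketched here under the assumption that the unused predictors are numeric: supply them as NA columns. Because the tree only consults surrogate variables when a primary split variable is missing, predictions should not change as long as every variable that appears in a split is supplied. The data and the helpers `split_vars()` and `fill_unused()` are hypothetical.

```r
library(rpart)

# Hypothetical data: only x1 carries signal.
set.seed(1)
n <- 300
train <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
train$y <- ifelse(train$x1 > 0.5, 5, 0) + rnorm(n)

fit <- prune(rpart(y ~ x1 + x2 + x3, data = train), cp = 0.05)

# Variables used as primary splits in the (pruned) tree.
split_vars <- function(fit) {
  setdiff(unique(as.character(fit$frame$var)), "<leaf>")
}

# Add every formula predictor missing from newdata as an NA column.
# Assumes those predictors are numeric (a factor would need
# factor(NA, levels = ...) instead).
fill_unused <- function(fit, newdata) {
  stopifnot(all(split_vars(fit) %in% names(newdata)))
  preds <- all.vars(delete.response(fit$terms))
  for (v in setdiff(preds, names(newdata))) newdata[[v]] <- NA_real_
  newdata
}

new_obs <- train[1:10, split_vars(fit), drop = FALSE]
predict(fit, newdata = fill_unused(fit, new_obs))
```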
2008 Jul 31
1
predict rpart: new data has new level
Hi. I use rpart to build a regression tree. Y is continuous. Now, I try
to predict on a new set of data. In the new set of data, one of my x variables (called
Incoterm, a factor) has a new level.
I wonder why the error below appears, since the guide says "For factor
predictors, if an observation contains a level not used to grow the tree, it
is left at the deepest possible node and
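The quoted passage appears to cover levels that the factor already had when the tree was grown (levels with no observations at some node); a level that did not exist in the training factor at all makes model.frame() stop with a "new level" error before the tree is consulted. One workaround, sketched here on hypothetical data: recode unseen levels to NA, after which predict.rpart() routes those rows with its missing-value rules (surrogates or the majority direction). `drop_new_levels()` is a hypothetical helper, and the Incoterm codes and costs are illustrative.

```r
library(rpart)

# Hypothetical training data: Incoterm is a factor, cost is continuous.
set.seed(3)
n <- 300
train <- data.frame(Incoterm = factor(sample(c("FOB", "CIF", "EXW"), n, replace = TRUE)),
                    weight = runif(n, 1, 100))
train$cost <- ifelse(train$Incoterm == "CIF", 50, 20) + 0.5 * train$weight +
  rnorm(n, sd = 3)
fit <- rpart(cost ~ Incoterm + weight, data = train)

# New data with a level ("DDP") never seen during fitting.
new_obs <- data.frame(Incoterm = c("FOB", "DDP"), weight = c(10, 80))

# Hypothetical helper: set factor values unknown to the fitted tree to NA,
# using the training levels stored in attr(fit, "xlevels").
drop_new_levels <- function(fit, newdata) {
  xlev <- attr(fit, "xlevels")
  for (v in intersect(names(xlev), names(newdata))) {
    vals <- as.character(newdata[[v]])
    vals[!(vals %in% xlev[[v]])] <- NA
    newdata[[v]] <- factor(vals, levels = xlev[[v]])
  }
  newdata
}

# predict(fit, new_obs) stops with a "new level" error;
# after recoding, the DDP row is routed by rpart's missing-value rules.
predict(fit, newdata = drop_new_levels(fit, new_obs))
```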
2011 Mar 23
2
predict.rpart help
Hi Everyone,
Is there a way to get predict.rpart() to return the nodes reached by the new examples in addition to the predicted probabilities it already returns? In other words, I would like to know the leaf node in the tree object that each new example data drops down to.
Thanks in advance for your help.
Osei
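predict.rpart() returns one kind of output per call, so one option is to call it twice and bind the results. A minimal sketch, assuming the kyphosis data shipped with rpart; the node lookup temporarily replaces the fitted values of a copy of the tree with its node numbers (the row names of `fit$frame`), which is a workaround rather than a documented rpart feature.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
new_obs <- data.frame(Age = c(12, 80, 150), Number = c(3, 4, 5), Start = c(14, 9, 2))

# Predicted class probabilities, as predict.rpart() already returns them.
probs <- predict(fit, newdata = new_obs, type = "prob")

# Leaf reached by each row: copy the tree and store the node numbers
# (row names of fit$frame) where the fitted values normally live.
tmp <- fit
tmp$frame$yval <- as.integer(rownames(fit$frame))
node <- unname(predict(tmp, newdata = new_obs, type = "vector"))

result <- cbind(node = node, probs)
result
```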
2009 Feb 03
5
Large file size while persisting rpart model to disk
I am using rpart to build a model for later predictions. To save the
prediction across restarts and share the data across nodes I have been
using "save" to persist the result of rpart to a file and "load" it
later. But the saved size was becoming unusually large (even with
binary, compressed mode). The size was also proportional to the amount
of data that was used to create the
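One likely contributor (an assumption, since the post is cut off before any diagnosis) is that an rpart object keeps components whose size grows with the training data: `$where` and `$y` have one entry per observation, and the `terms` component carries the environment in which the formula was created, which gets serialized with the object when that environment is not the global one (for example, when the model is fitted inside a function). A minimal sketch of slimming a model before save(); `train_model()` and `slim_rpart()` are hypothetical helpers. Other parts of the object, such as the closures in `$functions`, may still hold a copy of the response, so this reduces the size rather than making it independent of the data.

```r
library(rpart)

# Hypothetical training wrapper. The formula is created inside the function,
# so fit$terms refers to this function's environment, which contains `d`.
train_model <- function(n) {
  d <- data.frame(x = runif(n), z = runif(n))
  d$y <- 3 * d$x + rnorm(n, sd = 0.2)
  rpart(y ~ x + z, data = d)
}

# Hypothetical helper: drop per-observation components that predict() only
# needs when newdata is missing, and detach the terms from the training
# environment so that environment is not serialized with the model.
slim_rpart <- function(fit) {
  fit$where <- NULL
  fit$y <- NULL
  environment(fit$terms) <- globalenv()
  fit
}

set.seed(10)
fit <- train_model(20000)
slim <- slim_rpart(fit)

# Uncompressed serialized sizes in bytes.
c(full = length(serialize(fit, NULL)), slim = length(serialize(slim, NULL)))
```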
2010 Nov 18
1
predict() an rpart() model: how to ignore missing levels in a factor
I am using an algorithm to split my data set into two random sections
repeatedly, construct a model using rpart() on one, test it on the other, and
average the results.
One of my variables is a factor (crop) where each crop type has a code. Some
crop types occur infrequently or singly. When the data set is randomly
split, it may be that the first data set has a crop type which is not
present in
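With repeated random splits, one way to avoid the "new level" error is to create the factor once on the full data set, before splitting. Subsetting a data frame keeps all factor levels, so the fitted tree records the rare crop codes as known (but empty) levels, and predict.rpart() then treats them as levels not used to grow the tree (its documentation says such rows are left at the deepest node they can reach) instead of stopping. A minimal sketch on hypothetical crop data; calling droplevels() on the training subset would bring the error back.

```r
library(rpart)

# Hypothetical data: "rye" occurs once, so a random split can leave it out
# of the training set. The factor is created on the FULL data first.
set.seed(7)
crop_codes <- c(rep("wheat", 200), rep("maize", 190), rep("barley", 9), "rye")
dat <- data.frame(crop = factor(crop_codes), rain = runif(length(crop_codes)))
dat$yield <- ifelse(dat$crop == "wheat", 8, 5) + 2 * dat$rain +
  rnorm(nrow(dat), sd = 0.5)

# Force the single "rye" row into the test set for this illustration.
test_idx <- c(which(dat$crop == "rye"), sample(which(dat$crop != "rye"), 99))
train <- dat[-test_idx, ]
test <- dat[test_idx, ]

levels(train$crop)   # "rye" is still a level, with zero training rows
fit <- rpart(yield ~ crop + rain, data = train)
p <- predict(fit, newdata = test)
```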
2005 Oct 14
1
Predicting classification error from rpart
Hi,
I think I'm missing something very obvious, but I am missing it, so I
would be very grateful for help. I'm using rpart to analyse data on
skull base morphology, essentially predicting sex from one or several
skull base measurements. The sex of the people whose skulls are being
studied is known, and lives as a factor (M,F) in the data. I want to
get back predictions of gender, and
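`predict(type = "class")` gives the predicted class for each row, and cross-tabulating it against the known class yields the confusion matrix and error rate. A minimal sketch, assuming the kyphosis data shipped with rpart stands in for the skull measurements. The last lines connect this to printcp(): for a classification tree with default priors and losses, the training-set (resubstitution) error should equal the "rel error" of the final tree times the root node error. That error is optimistic; the xerror column, or predictions on held-out data, give a more honest estimate.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Predicted classes for the training data, cross-tabulated against the truth.
pred <- predict(fit, type = "class")
conf_mat <- table(observed = kyphosis$Kyphosis, predicted = pred)
conf_mat

# Resubstitution (training-set) misclassification rate.
resub_err <- 1 - sum(diag(conf_mat)) / sum(conf_mat)

# Same quantity from the cp table: rel error of the final tree times the
# root node error (1 minus the majority-class share) reported by printcp().
root_err <- 1 - max(table(kyphosis$Kyphosis)) / nrow(kyphosis)
rel_err <- fit$cptable[nrow(fit$cptable), "rel error"]
c(resub_err = resub_err, from_cptable = rel_err * root_err)
```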
2005 Aug 04
1
Puzzled at rpart prediction
I'm in a situation where I say:
> predict(m.rpart, newdata=D[N1+t,])
0 1
173 0.8 0.2
which I interpret as meaning: an 80% chance of "0" and a 20% chance of
"1". Okay. This is consistent with:
> predict(m.rpart, newdata=D[N1+t,], type="class")
[1] 0
Levels: 0 1
But I'm puzzled at the following. If I say:
> predict(m.rpart,
2004 Mar 19
2
Why is rpart() so slow?
I've had rpart running on a problem now for a couple of *days*,
but I'd expect a decision tree builder to run in minutes if not
seconds. Why is rpart slow? Is there anything I can do to make
it quicker?
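Two settings often account for much of rpart's running time (an assumption, since the poster's problem is not described): the default 10-fold cross-validation (`xval = 10`), which grows ten extra trees, and the search for competing and surrogate splits at every node. Factor predictors with many levels can also be costly, since for classification with more than two classes the number of candidate splits of a factor grows exponentially with its number of levels. A minimal sketch using rpart.control() to switch off the extras on hypothetical simulated data; without missing values the primary splits should be unchanged, although the cp table then lacks the xerror column.

```r
library(rpart)

# Hypothetical data: 5000 rows, 10 numeric predictors, no missing values.
set.seed(2004)
n <- 5000
X <- as.data.frame(matrix(runif(n * 10), ncol = 10))
X$y <- 2 * X$V1 - X$V2 + rnorm(n, sd = 0.3)

t_default <- system.time(fit_default <- rpart(y ~ ., data = X))

# No cross-validation, no competing splits, no surrogate splits.
fast_ctrl <- rpart.control(xval = 0, maxcompete = 0, maxsurrogate = 0)
t_fast <- system.time(fit_fast <- rpart(y ~ ., data = X, control = fast_ctrl))

rbind(default = t_default[["elapsed"]], fast = t_fast[["elapsed"]])
```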
2018 Aug 14
2
Xenial rpart package on CRAN built with wrong R version?
Hello,
I just upgraded my Ubuntu Xenial system to R 3.5.1 (from 3.4.?) by changing the sources.list entry and doing an "apt-get dist-upgrade". Everything works except loading the rpart package in R:
> library(rpart)
Error: package or namespace load failed for ‘rpart’:
package ‘rpart’ was installed by an R version with different internals; it needs to be reinstalled for use with
2002 Feb 21
2
question regarding to The tree Package for R
I have a problem with running the tree package (Dec. 8, 2001) for R. The
problem is that it will only give me 5/6 terminal nodes and then stop, while
Splus's tree on the same data with the same specification gives me hundreds
of nodes.
Here's a little more background info:
R-1.4.1
Solaris 5.7
rpart (most recent version)
tree (..)
Splus 6.0
Solaris 5.7
tree
2009 Jun 09
3
rpart - the xval argument in rpart.control and in xpred.rpart
Dear R users,
I'm working with the rpart package and want to evaluate the performance of
user defined split functions.
I have some problems in understanding the meaning of the xval argument in
the two functions rpart.control and xpred.rpart. In the former it is defined
as the number of cross-validations while in the latter it is defined as the
number of cross-validation groups. If I am
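As the question notes, rpart.control() documents xval as the number of cross-validations and xpred.rpart() as the number of cross-validation groups; in both places it is the k of k-fold cross-validation. In rpart.control() it determines how the xerror and xstd columns of the cp table are computed, while xpred.rpart() returns the cross-validated predictions themselves, one row per observation and one column per complexity value, and also accepts an explicit vector of group labels. A minimal sketch on the kyphosis data shipped with rpart; the fold assignment is an illustrative choice, and reading each entry as a predicted class code is an assumption based on the documented `return.all = FALSE` behaviour.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(xval = 10))   # 10-fold CV for the cp table

# Explicit fold labels (10 groups) for xpred.rpart().
set.seed(123)
groups <- sample(rep(1:10, length.out = nrow(kyphosis)))
xp <- xpred.rpart(fit, xval = groups)

dim(xp)   # observations x complexity values in fit$cptable

# For a classification tree each entry should be a predicted class code, so
# the cross-validated misclassification rate for each cp value is:
cv_err <- colMeans(xp != as.integer(kyphosis$Kyphosis))
cv_err
```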