search for: minbucket

Displaying 18 results from an estimated 18 matches for "minbucket".

2005 Dec 07
0
Are minbucket and minsplit rpart options working as expected?
...ffice>=0.5 148 35 0 (0.76351351 0.23648649) * 63) back_office< 0.5 250 109 1 (0.43600000 0.56400000) * So I decided not to consider branches with fewer than 1000 observations, 1% of the original number of observations. Therefore, according to the rpart.control help pages, I set minbucket=1000. However, > arbol.bsvg.02 n= 100000 node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 100000 6657 0 (0.9334300 0.0665700) * And I get an "empty" tree. But there were branches in the original tree with more than 1000 observations. Something similar happ...
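What minbucket enforces can be seen on a small example. This is only a sketch: the poster's data is not shown, so the kyphosis data set that ships with rpart stands in for it, and the thresholds 1 and 20 are chosen purely for illustration. The helper leaf_sizes is a hypothetical name, not part of rpart. Note that cp is set to 0 in both fits so that the complexity penalty (default cp = 0.01) does not also stop the splitting.

```r
library(rpart)

# kyphosis (81 rows) is a stand-in for the poster's data, which is not available
loose <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
               control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
strict <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
                control = rpart.control(minbucket = 20, cp = 0))

# Hypothetical helper: number of observations in each terminal node
leaf_sizes <- function(fit) fit$frame$n[fit$frame$var == "<leaf>"]

leaf_sizes(loose)   # leaves may be as small as one observation
leaf_sizes(strict)  # every leaf holds at least 20 observations
```

If no candidate split leaves at least minbucket observations on both sides and also clears the cp threshold, only the root node is returned.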
2010 May 26
1
how to Store loop output from a function
...t;,"position", "acid_per", "base_per", "charge_per", "hydrophob_per", "polar_per", "out") train<-train[myvar] # update training set valid<-valid[myvar] control<-rpart.control(xval=10, cp=0.01, minsplit=5, minbucket=5) #control the size of the initial tree tree.fit <- rpart(out ~ ., method="class", data=train, control=control) # model fitting p.tree<- prune(tree.fit, cp=tree.fit$cptable[which.min(tree.fit$cptable[,"xerror"]),"CP"]) # prune the tree #get the p...
2007 Jan 03
1
User defined split function in Rpart
...ame (note that x's values are already sorted) > D y x 1 7 0.428 2 3 0.876 3 1 1.467 4 6 1.492 5 3 1.703 6 4 2.406 7 8 2.628 8 6 2.879 9 5 3.025 10 3 3.494 11 2 3.496 12 6 4.623 13 4 4.824 14 6 4.847 15 2 6.234 16 7 7.041 17 2 8.600 18 4 9.225 19 5 9.381 20 8 9.986 Running rpart and setting minbucket=1 and maxdepth=1 we get the following tree (which uses, by default, deviance): > rpart(D$y~D$x,control=rpart.control(minbucket=1,maxdepth=1)) n= 20 node), split, n, deviance, yval * denotes terminal node 1) root 20 84.80000 4.600000 2) D$x< 9.6835 19 72.63158 4.421053 * 3) D$x>=9....
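The root figures quoted in this snippet can be checked by hand. A minimal sketch, assuming the twenty (y, x) pairs printed above are the complete data set; the formula y ~ x with data = D is a substitution for the poster's D$y~D$x and is not taken from the post.

```r
library(rpart)

# The twenty observations listed in the snippet
D <- data.frame(
  y = c(7, 3, 1, 6, 3, 4, 8, 6, 5, 3, 2, 6, 4, 6, 2, 7, 2, 4, 5, 8),
  x = c(0.428, 0.876, 1.467, 1.492, 1.703, 2.406, 2.628, 2.879, 3.025, 3.494,
        3.496, 4.623, 4.824, 4.847, 6.234, 7.041, 8.600, 9.225, 9.381, 9.986)
)

# For a regression (anova) tree, the node deviance is the sum of squared
# deviations of y from the node mean
root_mean <- mean(D$y)                  # 4.6
root_dev  <- sum((D$y - root_mean)^2)   # 84.8

# Same control settings as in the post
fit <- rpart(y ~ x, data = D,
             control = rpart.control(minbucket = 1, maxdepth = 1))
print(fit)
```

The hand-computed mean and deviance match the root line of the printed tree (20 84.80000 4.600000).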
2007 Dec 10
1
Multiple Response CART Analysis
...taxa abundances (prey.data) Neither rpart nor mvpart seems to allow me to do multiple responses. (Or if they can, I'm not using the functions properly.) > library(rpart) > turtle.rtree<-rpart(prey.data~., data=turtle.data$Clength, method="anova", maxsurrogate=0, minsplit=8, minbucket=4, xval=10); plot(turtle.rtree); text(turtle.rtree) Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument When I switch response for predictor, it works. But this is the opposite of what I wanted to test and gives me splits at abundance valu...
2008 Feb 26
1
predict.rpart question
...have a question regarding predict.rpart. I use rpart to build classification and regression trees and I deal with data with a relatively large number of input variables (predictors). For example, I build an rpart model like this rpartModel <- rpart(Y ~ X, method="class", minsplit =1, minbucket=nMinBucket,cp=nCp); and get predictors used in building the model like this colnamesUsed<-unique(rownames(rpartModel$splits)); When later I apply the rpart model to predict the new data I strip unnecessary columns from the input data and use only the X columns that exist in colnamesUsed. U...
2012 Jan 19
1
ctree question
Hello. I have used the "party" package to generate a regression tree as follows: >origdata<-read.csv("origdata.csv") >ctrl<-ctree_control(mincriterion=0.99,maxdepth=10,minbucket=10) >test.ct<-ctree(Y~X1+X2+X3,data=origdata,control=ctrl) The above works fine. Orig data was my training data. I now have a test data file (testdata), and would like to run the testdata through the above tree to see predictions. I tried using the following function >p...
2006 Apr 07
1
rpart.predict error--subscript out of bounds
...8 552.3632 719.9989 1306.6299 446.6184 1352.9955 867.4219 5 32629 HAM_TSP 898.8879 640.2680 342.5477 386.5816 811.6709 518.0244 715.9886 441.1622 For example, I use the first sample as test set, the rest as training set > fit <- rpart(as.factor(Data[-1,2]) ~., Data[-1, -c(1:2 ) ], minbucket=2 ) > predict(fit, Data[1,],type='prob') Error in predict.rpart(fit, Data[1, ]) : subscript out of bounds but when I change the type parameter to 'class' it works well > predict(fit, Data[1,-c(1:2)],type='class') [1] HTLV_Carrier Levels: HAM_TSP HTLV_Carrier...
2008 Jul 31
1
predict rpart: new data has new level
...servation contains a level not used to grow the tree, it is left at the deepest possible node and frame$yval at the node is the prediction. " Many thanks. > mod <- rpart(y~., data=data.frame(y=y,x=x), method="anova", + cp=0.05, minsplit=100, minbucket=50, maxdepth=5) > predictLost <- predict(mod, newdata=data.frame(y=yLost, x=xLost), type="vector") Error in model.frame.default(Terms, newdata, na.action = act, xlev = attr(object, : factor 'x.Incoterm' has new level(s) MTD ---- Chua Siang Li...
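The error quoted here reports that the new data contains a factor level (MTD) that was not present when the tree was grown. A minimal base-R sketch of a pre-check that lists such levels before calling predict. The data frames train and newdata, the column name Incoterm, and every level other than MTD are hypothetical stand-ins, since the poster's data is not shown.

```r
# Hypothetical training and scoring data; only the level "MTD" comes from the post
train   <- data.frame(Incoterm = factor(c("FOB", "CIF", "FOB", "EXW")))
newdata <- data.frame(Incoterm = factor(c("CIF", "MTD", "FOB")))

# Levels present in the new data but absent from the training data
unseen <- setdiff(levels(newdata$Incoterm), levels(train$Incoterm))
unseen

# One possible policy (an assumption, not the poster's choice):
# score only the rows whose level was seen in training
has_unseen <- newdata$Incoterm %in% unseen
scorable   <- newdata[!has_unseen, , drop = FALSE]
```

Whether to drop such rows, recode them, or refit the model is a modelling decision that the snippet does not settle.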
2001 Jul 02
1
text.rpart: Unwanted NA labels on terminal nodes (PR#1009)
...;- factor(paste("Leaf", 1:5)) Node <- factor(1:5) assign("tree.df", data.frame(Criterion = Criterion, Node = Node)) nobs <- dim(tree.df)[[1]] u.tree <- rpart(Node ~ Criterion, data = tree.df, all = F, control = list(minsplit = 2, minbucket = 1, cp = 9.999999999999998e-008)) plot(u.tree, uniform=T) text(u.tree) --please do not edit the information below-- Version: platform = i386-pc-mingw32 arch = x86 os = Win32 system = x86, Win32 status = major = 1 minor = 3.0 year = 2001 m...
2011 Feb 10
2
R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?
...it to a few simple lines of code to replicate the differences (but had to stay with the weather dataset from rattle since could not replicate on standard datasets yet). library(rpart) library(rattle) set.seed(41) model <- rpart(RainTomorrow ~ ., data=weather[-c(1, 2, 23)], control=rpart.control(minbucket=0)) print(model$cptable) Final row on 32bit: 9 0.01000000 23 0.1515152 1.1060606 0.1158273 Final row on 64bit: 9 0.01000000 23 0.1515152 1.0909091 0.1152273 Pretty minor, but different. I've not found any seed other than 41 (only tried a few) that results in a difference. library(ada...
2002 Feb 13
0
tree size in rpart()
Dear all, I know in rpart(), one can control the tree size (i.e. number of terminal nodes) through rpart.control(), e.g. minsplit, minbucket, maxdepth etc. But is there any more direct way to specify the number of terminal nodes when rpart() does the recursive partitioning? Your help is highly appreciated! Regards, -Ji
2010 Feb 03
0
mboost: how to implement cost-sensitive boosting family
...model.blackboost <- blackboost(tr[,1:DIM], tr.y, family=CSAdaExp, weights=NULL, control=boost_control(mstop=MSTOP, nu=0.1,savedata=TRUE,save_ensembless=TRUE,trace=TRUE), tree_controls=ctree_control(teststat = "max",testtype = "Teststatistic",mincriterion = 0,minsplit = 2000, minbucket = 700,maxdepth = TREEDEPTH)); -------------------------------- regards, Yuchun Tang, Ph.D. Principal Engineer, Lead McAfee, Inc. 4501 North Point Parkway Suite 300 Alpharetta, GA 30022 Main: 770.776.2685 www.mcafee.com www.trustedsource.org www.linkedin.com/in/yuchuntang
2011 May 20
0
RPART error
...ctly produce output. The second (problem.csv), with only one additional record, will return the error message and no output. I am running R 2.13.0 on a Windows XP platform. To reproduce the problem: library(rpart) data <- read.csv("problem.csv", header=T) control=rpart.control(minbucket=10) x <- rpart(cad~v1+v2+v3+v4+v5+v6+v7+v8+v9+v10,data=data, method = "class", control=control) summary(x) Similar code run on "noproblem.csv" will not produce the error. Any suggestions on how to proceed to debug this issue would be greatly appreciated. I am a novice R...
2012 Apr 24
0
mvpart versus SPSS
...ild nodes. Now we would like to proceed with fitting a multivariate tree. We only used pruning by the way, no v-fold cross validation afterwards. Using the aforementioned criteria in univariate analyses resulted in relatively large trees in SPSS, but using mvpart with xv=1se, cp=0.000001,minsplit=5,minbucket=3 resulted in a tree with only 1 or 2 splits. This makes us wonder what causes this dramatic difference in the tree size produced by SPSS vs. mvpart. If I use the "pick" option in mvpart I am able to "pick" the SPSS-tree, but the X-val Relative Error is quite large. The plot loo...
2000 Mar 27
1
Behavior different inside function?
...ste(fn, sesnum, ".ps", sep="", collapse=NULL) fm1 <- as.formula(fm) ds <- read.table(file=dsn, header=T) rownames(ds) <- ds$unit nmavgres <- ds$mavgres * 1000 nravgres <- ds$ravgres * 1000 ds.mrpt <- rpart(formula=fm1, data=ds, control=rpart.control(minbucket=20)) plot(prune(ds.mrpt, cp=0.018)) text(prune(ds.mrpt, cp=0.018), digits=2) wait() plotcp(ds.mrpt) wait() post.rpart(prune(ds.mrpt, cp=0.018), title=c(tit, paste("SES Quartile", sesnum, sep=" ", collapse=NULL)), pretty=0, filename=psn, hori...
2012 Jul 06
2
Plotting rpart trees with long list of class members
I have a class with 732 members, so using rpart.plot is giving me a tiny plot in the middle of the window. Is there a good way to modify the plot, or replace the long list with something like "group1"?
2005 May 25
0
Error with user defined split function in rpart (PR#7895)
...found by lining the groups up in this order # and going from left to right, so that only m-1 splits need to # be evaluated rather than 2^(m-1) # goodness = m-1 values, as before. # # The reason for returning a vector of goodness is that the C routine # enforces the "minbucket" constraint. It selects the best return value # that is not too close to an edge. temp2 <- function(y, wt, x, parms, continuous) { print("***** START: TEMP2 *****"); n <- length(y) # For binary y, get P(Y=0)/n and P(Y=1)/n at each split temp <- cumsum(y...
2007 Feb 27
3
rpart minimum sample size
Is there an optimal / minimum sample size for attempting to construct a classification tree using /rpart/? I have 27 seagrass disturbance sites (boat groundings) that have been monitored for a number of years. The monitoring protocol for each site is identical. From the monitoring data, I am able to determine the level of recovery that each site has experienced. Recovery is our