Dear r-help mailing list,

Is there a way to incorporate weights into the minsplit criterion in rpart when the weights are uneven? I could not find a way to make the minsplit threshold take the weights into account, and with uneven weights this becomes an issue, as the example below shows.

My current workaround is to expand the data so that each row is a single observation, but that seems wasteful in both time and memory (and I doubt I can keep the real datasets I need to work with in memory in their expanded form anyway), so I am turning to the list for help.

Thanks in advance for your help,
-Saar

The following code shows the issue: the first 3 trees are the same, but the following two (with uneven weights) turn out differently:

## playing with rpart weights
require(rpart)
dev.new()
par(mfrow=c(2,3), xpd=NA)
data(kyphosis)

fitOriginal <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis,
                     control=rpart.control(minsplit=15))
plot(fitOriginal)
text(fitOriginal, use.n=TRUE)

# this dataset is the original data repeated 3 times
kyphosisRepeated <- rbind(kyphosis, kyphosis, kyphosis)
fitRepeated <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosisRepeated,
                     control=rpart.control(minsplit=45))
plot(fitRepeated)
text(fitRepeated, use.n=TRUE)

# instead of repeating, use weights
kyphosisWeighted <- kyphosis
kyphosisWeighted$myWeights <- 3
fitWeighted <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosisWeighted,
                     weights=myWeights,
                     control=rpart.control(minsplit=15)) ## minsplit has to be adjusted for weights...
plot(fitWeighted)
text(fitWeighted, use.n=TRUE)

# uneven weights do not work the same way
kyphosisUnevenWeights <- rbind(kyphosis, kyphosis)
kyphosisUnevenWeights$myWeights <- c(rep(1, length.out=nrow(kyphosis)),
                                     rep(2, length.out=nrow(kyphosis)))

fitUneven15 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosisUnevenWeights,
                     weights=myWeights, control=rpart.control(minsplit=15))
plot(fitUneven15)
text(fitUneven15, use.n=TRUE)

fitUneven45 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosisUnevenWeights,
                     weights=myWeights, control=rpart.control(minsplit=45))
plot(fitUneven45)
text(fitUneven45, use.n=TRUE)

## 30 works, but seems like a special case
fitUneven30 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosisUnevenWeights,
                     weights=myWeights, control=rpart.control(minsplit=30))
plot(fitUneven30)
text(fitUneven30, use.n=TRUE)
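
For completeness, the expansion workaround mentioned above can be sketched roughly as follows. This is only a minimal sketch: it assumes the weights are positive integers (case counts), and the names expandedRows, kyphosisExpanded and fitExpanded are made up for illustration.

```r
library(rpart)

# Rebuild the uneven-weights data from the example:
# first copy of kyphosis has weight 1, second copy has weight 2.
kyphosisUnevenWeights <- rbind(kyphosis, kyphosis)
kyphosisUnevenWeights$myWeights <- rep(c(1, 2), each = nrow(kyphosis))

# Assumption: weights are positive integers, so each row can be
# repeated as many times as its weight.
expandedRows <- rep(seq_len(nrow(kyphosisUnevenWeights)),
                    times = kyphosisUnevenWeights$myWeights)
kyphosisExpanded <- kyphosisUnevenWeights[expandedRows, ]

# With one row per observation, minsplit is expressed in observations,
# so 45 here plays the role that 15 plays on the original data.
fitExpanded <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosisExpanded,
                     control = rpart.control(minsplit = 45))
```

The expanded data has as many rows as the sum of the weights, which is exactly the memory cost I would like to avoid on the real datasets; print(fitExpanded) can be compared against the weighted fits above.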