Dear R community,

I am trying to write my own user-defined split function for rpart. I read the example in the tests directory and I understand the general idea of how to implement user-defined splitting functions. However, I am having trouble accessing, from within my split functions, the data frame that was passed to rpart.

For example, in the evaluation function that is called once per node, I want to fit a proportional odds model to the data in the node and use its deviance as the node deviance:

    evalf <- function(y,x,parms) {
        pomnode<-polr(dataframe$y~dataframe$x,dataframe,weights=dataframe$Freq)
        # more code
    }

The dataframe used in the polr call should be the data of the current node. How can I access the data of the current node and assign it to dataframe?

Thank you for your help,
Tobias Guennel
Maybe I should explain my problem in a little more detail.

The rpart package allows for user-defined split functions. An example is given in the source/test directory of the package as usersplits.R. The comments say that three functions have to be supplied:

1. "The 'evaluation' function. Called once per node. Produce a label (1 or more elements long) for labeling each node, and a deviance."
2. The split function, where most of the work occurs. Called once per split variable per node.
3. The init function:
   - fix up y to deal with offsets
   - return a dummy parms list
   - numresp is the number of values produced by the eval routine's "label".

I have altered the evaluation function and the split function for my needs. Within those functions, I need to fit a proportional odds model to the data of the current node. I am using the polr() routine from the MASS package to fit the model. My problem is: how can I call polr() with only the data of the current node?

This is what I have tried so far:

    evalfunc <- function(y,x,parms,data) {
        pomnode<-polr(data$y~data$x,data,weights=data$Freq)
        parprobs<-predict(pomnode,type="probs")
        dev<-0
        K<-dim(parprobs)[2]
        N<-dim(parprobs)[1]/K
        for(i in 1:N){
            tempsum<-0
            Ni<-0
            for(l in 1:K){
                Ni<-Ni+data$Freq[K*(i-1)+l]
            }
            for(j in 1:K){
                tempsum<-tempsum+data$Freq[K*(i-1)+j]/Ni*log(parprobs[i,j]*Ni/data$Freq[K*(i-1)+j])
            }
            dev=dev+Ni*tempsum
        }
        dev=-2*dev
        wmean<-1
        list(label= wmean, deviance=dev)
    }

I get the error:

    Error in eval(expr, envir, enclos) :
      argument "data" is missing, with no default

How can I use the data of the current node?

Thank you
Tobias Guennel
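The error quoted above is what R reports when a function body touches a formal argument that the caller never supplied. The following is a minimal sketch of that mechanism only, assuming rpart calls the evaluation function with three positional arguments; `eval_with_extra_arg` is a hypothetical name and the call at the bottom merely imitates such a three-argument callback.

```r
# Hypothetical stand-in for an evaluation function that declares a fourth
# formal argument ('data') which the caller is assumed never to pass.
eval_with_extra_arg <- function(y, x, parms, data) {
  # 'data' is only looked up here, so the failure happens at this line,
  # not at the moment the function is called.
  data$Freq
}

# Imitate a three-argument callback and capture the resulting error message.
msg <- tryCatch(
  eval_with_extra_arg(1:3, c(1, 1, 1), list()),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Adding an argument to the function's signature therefore cannot bring the data frame in by itself; anything the function needs has to arrive through the arguments the caller actually supplies.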
I have made some progress with the user-defined splitting function and have got many of the things I needed to work. However, I am still stuck on accessing the node data. It would probably be enough if somebody could tell me how to access the original data frame from the call to rpart. So if the call is:

    fit0 <- rpart(Sat ~ Infl + Cont + Type, housing,
                  control=rpart.control(minsplit=10, xval=0), method=alist)

how can I access the housing data frame within the user-defined splitting function?

Any input would be highly appreciated!

Thank you
Tobias Guennel

-----Original Message-----
From: Tobias Guennel [mailto:tguennel at vcu.edu]
Sent: Monday, February 19, 2007 3:40 PM
To: 'r-help at stat.math.ethz.ch'
Subject: [R] User defined split function in rpart

[...]
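One possible way to reach the rows of the current node, sketched under stated assumptions rather than taken from the thread: if the callbacks only ever receive the node's slice of the response, a row index can travel as a second response column while the full data frame travels in `parms`. The sketch assumes the evaluation function is called as `eval(y, wt, parms)` and that the init function declares a two-column response so that `y` arrives as a matrix. The names `eval_node`, `toy` and `node_data`, and the placeholder deviance, are illustrative and not part of rpart. The function is called directly here instead of through rpart.

```r
# Toy data shaped like the housing data: one row per cell, counts in Freq.
toy <- data.frame(
  Sat  = factor(c("Low", "Medium", "High", "Low", "Medium", "High"),
                levels = c("Low", "Medium", "High"), ordered = TRUE),
  Infl = factor(c("Low", "Low", "Low", "High", "High", "High")),
  Freq = c(21, 21, 28, 6, 9, 20)
)

# Hypothetical evaluation function. Assumption: column 1 of y is the response
# code and column 2 is the row index into the data frame stored in parms$data.
eval_node <- function(y, wt, parms) {
  y <- matrix(y, ncol = 2)
  node_data <- parms$data[y[, 2], , drop = FALSE]

  # Placeholder deviance: multinomial deviance of Sat within the node.
  # A polr() fit on node_data would take the place of this block.
  counts <- tapply(node_data$Freq, node_data$Sat, sum)
  counts <- counts[!is.na(counts) & counts > 0]
  dev <- -2 * sum(counts * log(counts / sum(counts)))

  list(label = 1, deviance = dev)
}

# Direct call imitating what a node holding rows 4 to 6 would receive.
rows <- 4:6
res <- eval_node(cbind(as.integer(toy$Sat[rows]), rows),
                 wt = rep(1, length(rows)),
                 parms = list(data = toy))
print(res$deviance)
```

The design choice is that nothing outside the callback's own arguments is consulted: the row index identifies the node, and `parms` carries whatever columns the model fit needs, such as `Freq`.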