Christiaan Pauw
2013-Oct-16 18:18 UTC
[R] Extract predictors from a constparty object (CHAID output) in R
I have a large dataset (questionnaire results) of mostly categorical variables. I have tested for dependency between the variables using the chi-square test, and there is an overwhelming number of dependencies.

I used the chaid() function in the CHAID package to detect interactions and separate out (what I hope to be) the underlying structure of these dependencies for each variable. What typically happens is that the chi-square test reveals a large number of dependencies (say 10-20) for a variable, and chaid() reduces this to something much more comprehensible (say 3-5). What I want to do is extract the names of those variables that were shown to be relevant in the chaid() results.

The chaid() output is in the form of a constparty object. My question is how to extract the variable names associated with the nodes in such an object.

Here is a self-contained code example:

library(evtree) # for the ContraceptiveChoice dataset
library(CHAID)
library(vcd)
library(MASS)

data("ContraceptiveChoice")
longform <- formula(contraceptive_method_used ~ wifes_education +
                      husbands_education + wifes_religion + wife_now_working +
                      husbands_occupation + standard_of_living_index +
                      media_exposure)
z <- chaid(longform, data = ContraceptiveChoice)
# plot(z)
z
# This is the part I want to do programmatically
shortform <- formula(contraceptive_method_used ~ wifes_education +
                       husbands_occupation)
# The thing I want is a programmatic way to extract 'shortform' from 'z'

# Example of use of 'shortform'
loglm(shortform, data = ContraceptiveChoice)

Thanks in advance
Christiaan

--
Christiaan Pauw
Nova Institute
www.nova.org.za
Christiaan Pauw
2013-Oct-17 10:56 UTC
[R] Extract predictors from a constparty object (CHAID output) in R
For the record, I have found a possible solution:

# Root node of the tree; unlist() flattens it, nested kids included
nn <- nodeapply(z)
n.names <- names(unlist(nn[[1]]))
# This relies on the flattened element names embedding the split
# variable after "split.varid."
ext <- grep("split\\.varid\\.", n.names, value = TRUE)
# Strip everything up to and including "split.varid.", at any nesting
# depth, and drop variables that are used in more than one node
ext <- unique(sub("^.*split\\.varid\\.", "", ext))
# Name of the response variable
dep.var <- as.character(terms(z)[1][[2]])
plus <- paste(ext, collapse = " + ")
mul <- paste(ext, collapse = " * ")
# Main-effects formula and saturated (all interactions) formula
shortform <- as.formula(paste(dep.var, plus, sep = " ~ "))
satform <- as.formula(paste(dep.var, mul, sep = " ~ "))
mosaic(shortform, data = ContraceptiveChoice)
# stp <- step(glm(satform, data = ContraceptiveChoice, family = binomial), direction = "both")

--
Christiaan Pauw
Nova Institute
www.nova.org.za
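As an aside, the formula-building step could probably also be written with reformulate() from base R's stats package instead of pasting strings together. This is only a sketch: dep.var and ext are hard-coded here as stand-ins (the two predictors named in the original question) so that the snippet runs on its own; in practice they would come from the extraction step.

```r
# Stand-in values for illustration only; normally taken from the fitted tree
dep.var <- "contraceptive_method_used"
ext <- c("wifes_education", "husbands_occupation")

# Main-effects formula: response ~ term1 + term2
shortform <- reformulate(ext, response = dep.var)

# Saturated formula: collapse the terms with "*" into a single term label
satform <- reformulate(paste(ext, collapse = " * "), response = dep.var)

print(shortform)
print(satform)
```

The resulting objects are ordinary formulas, so they can be passed to loglm() or mosaic() in the same way as the pasted versions.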