Christiaan Pauw
2013-Oct-16 18:18 UTC
[R] Extract predictors from a constparty object (CHAID output) in R
I have a large dataset (questionnaire results) of mostly categorical
variables. I have tested for dependency between the variables using
chi-square tests. The number of dependencies is too large to interpret.
I used the chaid() function in the CHAID package to detect
interactions and separate out (what I hope to be) the underlying
structure of these dependencies for each variable. What typically
happens is that the chi-square test will reveal a large number of
dependencies (say 10-20) for a variable and the chaid function will
reduce this to something much more comprehensible (say 3-5). What I
want to do is to extract the names of those variables that were shown
to be relevant in the chaid() results.
The chaid() output is in the form of a constparty object. My question
is how to extract the variable names associated with the nodes in such
an object.
Here is a self-contained code example:
library(evtree) # for the ContraceptiveChoice dataset
library(CHAID)
library(vcd)
library(MASS)
data("ContraceptiveChoice")
longform <- formula(contraceptive_method_used ~ wifes_education +
husbands_education + wifes_religion + wife_now_working +
husbands_occupation + standard_of_living_index +
media_exposure)
z <- chaid(longform, data = ContraceptiveChoice)
# plot(z)
z
# This is the part I want to do programmatically
shortform <- formula(contraceptive_method_used ~ wifes_education +
husbands_occupation)
# The thing I want is a programmatic way to extract 'shortform' from 'z'
# Examples of use of 'shortform'
loglm(shortform, data = ContraceptiveChoice)
Thanks in advance
Christiaan
--
Christiaan Pauw
Nova Institute
www.nova.org.za
Christiaan Pauw
2013-Oct-17 10:56 UTC
[R] Extract predictors from a constparty object (CHAID output) in R
For the record, I have found a possible solution:
# Flatten the root node; split variables appear in the element names,
# e.g. "split.varid.<name>" or "kids.split.varid.<name>"
nn <- nodeapply(z)
n.names <- names(unlist(nn[[1]]))
ext <- grep("split.varid.", n.names, fixed = TRUE, value = TRUE)
# Strip everything up to and including "split.varid.", at any nesting depth,
# and drop variables that are used in more than one split
ext <- unique(sub("^.*split\\.varid\\.", "", ext))
dep.var <- as.character(terms(z)[1][[2]])
plus <- paste(ext, collapse = " + ")
mul <- paste(ext, collapse = " * ")
shortform <- as.formula(paste(dep.var, plus, sep = " ~ "))
satform <- as.formula(paste(dep.var, mul, sep = " ~ "))
mosaic(shortform, data = ContraceptiveChoice)
# stp <- step(glm(satform, data = ContraceptiveChoice, family = binomial),
#             direction = "both")
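A more direct route may be to read the split variables through partykit's
node accessors instead of parsing flattened names. The following is a minimal
sketch, assuming that chaid() returns a party object built with partykit and
that each split's varid indexes the columns of the tree's data component,
which is how partykit labels splits when printing. The helper name split_vars
is hypothetical, and ctree() on iris stands in for the CHAID tree so that the
example runs with partykit alone.

```r
library(partykit)

# Hypothetical helper: names of all variables used in splits of a party tree.
split_vars <- function(tree) {
  # Inner nodes are all node ids minus the terminal ones
  inner <- setdiff(nodeids(tree), nodeids(tree, terminal = TRUE))
  if (length(inner) == 0L) return(character(0))
  # varid_split() gives the column index of the split variable in tree$data
  vid <- unlist(nodeapply(tree, ids = inner,
                          FUN = function(node) varid_split(split_node(node))))
  unique(names(tree$data)[vid])
}

# Stand-in tree (assumption: a CHAID tree can be passed in the same way)
tree <- ctree(Species ~ ., data = iris)
vars <- split_vars(tree)

# Build the reduced formula from the extracted predictor names
shortform <- reformulate(vars, response = "Species")
shortform
```

If the assumptions hold for the CHAID tree from the question, split_vars(z)
could replace the grep/sub steps, with the response name still taken from
terms(z).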