I am exploring the result of clustering a large multivariate data set
into a number of groups, represented, say, by a factor G.
I wrote a function to see how categorical variables vary between groups:
> ddisp <- function(dvar) {
+ csqt <- chisq.test(G,dvar)
+ print(csqt$statistic)
+ print(csqt$observed)
+ print(round(csqt$expected))
+ round(csqt$residuals)
+ }
>
> x <- ceiling(4*runif(100))
> G <- gl(4,1,100)
> ddisp(x)
X-squared
 6.235645
   dvar
G    1  2  3  4
  1 10  5  5  5
  2  6  9  5  5
  3  8  6  5  6
  4  7  4  4 10
   dvar
G   1 2 3 4
  1 8 6 5 6
  2 8 6 5 6
  3 8 6 5 6
  4 8 6 5 6
   dvar
G    1  2  3  4
  1  1  0  0 -1
  2 -1  1  0 -1
  3  0  0  0  0
  4  0 -1  0  1
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(G, dvar)
As I need to apply this function to a large number of variables x it
would be helpful if the function printed "x" rather than the formal
argument "dvar". I have a vague idea that things like deparse() and
substitute() will come into the solution but I have not yet come up with
the right incantation. Any help appreciated!
Murray Jorgensen
--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 1395 862
ddisp <- function(dvar) {
    yn <- substitute(dvar)
    csqt <- eval.parent(substitute(chisq.test(G, dvar), list(dvar = yn)))
    ....
}
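
For anyone wanting to try this, here is one way the function above might look
when written out in full. This is only a sketch: the lines after the
chisq.test() call are simply copied from the original ddisp(), the seed is
arbitrary, and G is assumed to be visible from wherever ddisp() is called.

```r
# Sketch only: the print/round lines are copied from the original ddisp();
# G is assumed to be visible in the frame that calls ddisp().
ddisp <- function(dvar) {
    # Capture the argument unevaluated, e.g. the symbol x
    yn <- substitute(dvar)
    # Build the call chisq.test(G, x) and evaluate it in the caller's frame,
    # so chisq.test() labels the table with "x" rather than "dvar"
    csqt <- eval.parent(substitute(chisq.test(G, dvar), list(dvar = yn)))
    print(csqt$statistic)
    print(csqt$observed)
    print(round(csqt$expected))
    round(csqt$residuals)
}

set.seed(1)                        # arbitrary seed, for reproducibility only
x <- ceiling(4 * runif(100))
G <- gl(4, 1, 100)
# suppressWarnings() only hides the possible small-expected-count warning
res <- suppressWarnings(ddisp(x))
names(dimnames(res))               # the margins are now labelled "G" and "x"
```

The inner substitute() replaces only dvar, because only dvar appears in the
list; G is left as a symbol and looked up when the call is evaluated.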
There are other ways, such as forming the cross-classification table,
setting its dimnames and passing that to chisq.test.
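
The table-based alternative mentioned above could be sketched roughly as
follows. The names ddisp2, g and vname are made up for illustration and are
not from the thread; g defaults to a G found in the global environment, which
is an assumption about how the data are stored.

```r
# Hypothetical variant (ddisp2, g and vname are illustrative names):
# build the cross-classification table, label its margins, then test it.
ddisp2 <- function(dvar, g = G) {
    # Text of the actual argument, e.g. "x"; [1] guards against long expressions
    vname <- deparse(substitute(dvar))[1]
    tab <- table(g, dvar)                    # groups by variable
    names(dimnames(tab)) <- c("G", vname)    # relabel the two margins
    csqt <- chisq.test(tab)
    print(csqt$statistic)
    print(csqt$observed)
    print(round(csqt$expected))
    round(csqt$residuals)
}

set.seed(1)                        # arbitrary seed, for reproducibility only
x <- ceiling(4 * runif(100))
G <- gl(4, 1, 100)
# suppressWarnings() only hides the possible small-expected-count warning
res2 <- suppressWarnings(ddisp2(x))
names(dimnames(res2))              # "G" and "x"
```

This version avoids eval.parent() altogether, at the cost of building the
table by hand before calling chisq.test().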
On Mon, 13 Nov 2006, Murray Jorgensen wrote:
> [...]
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Thanks for these suggestions, Professor Ripley. It's interesting that the
function parameters in R are not truly "dummy", as they can affect the
result of a function.

Murray
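
A small illustration of the point about formal arguments not being truly
"dummy": a function can recover the expression it was called with, so the
same data can give different results under different names. The helper
show_arg and the variable height are made up for this example.

```r
# Hypothetical helper, not from the thread: returns the text of its argument.
show_arg <- function(dvar) deparse(substitute(dvar))

x <- 1:3
height <- 1:3        # same values as x, different name

show_arg(x)          # "x"
show_arg(height)     # "height"
```

chisq.test() uses the same deparse(substitute()) idiom internally to label
its output, which is why the original ddisp() printed "dvar".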