I am exploring the result of clustering a large multivariate data set
into a number of groups, represented, say, by a factor G.

I wrote a function to see how categorical variables vary between groups:

> ddisp <- function(dvar) {
+     csqt <- chisq.test(G,dvar)
+     print(csqt$statistic)
+     print(csqt$observed)
+     print(round(csqt$expected))
+     round(csqt$residuals)
+ }
>
> x <- ceiling(4*runif(100))
> G <- gl(4,1,100)
> ddisp(x)
X-squared
 6.235645
   dvar
G    1  2  3  4
  1 10  5  5  5
  2  6  9  5  5
  3  8  6  5  6
  4  7  4  4 10
   dvar
G   1 2 3 4
  1 8 6 5 6
  2 8 6 5 6
  3 8 6 5 6
  4 8 6 5 6
   dvar
G    1  2  3  4
  1  1  0  0 -1
  2 -1  1  0 -1
  3  0  0  0  0
  4  0 -1  0  1
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(G, dvar)

As I need to apply this function to a large number of variables x it
would be helpful if the function printed "x" rather than the formal
argument "dvar". I have a vague idea that things like deparse() and
substitute() will come into the solution but I have not yet come up with
the right incantation. Any help appreciated!

Murray Jorgensen

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                           Fax 7 838 4155
Phone +64 7 838 4773 wk   Home +64 7 825 0441   Mobile 021 1395 862
ddisp <- function(dvar) {
     yn <- substitute(dvar)
     csqt <- eval.parent(substitute(chisq.test(G,dvar), list(dvar=yn)))
     ....
}

There are other ways, such as forming the cross-classification table,
setting its dimnames and passing that to chisq.test.

On Mon, 13 Nov 2006, Murray Jorgensen wrote:

> As I need to apply this function to a large number of variables x it
> would be helpful if the function printed "x" rather than the formal
> argument "dvar". I have a vague idea that things like deparse() and
> substitute() will come into the solution but I have not yet come up with
> the right incantation. Any help appreciated!
>
> Murray Jorgensen

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
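The table-based alternative mentioned in the reply above might look
roughly like the following. This is only a sketch under assumptions: the
name ddisp_tab is hypothetical, G is still looked up in the calling
workspace as in the original ddisp(), and the dnn argument of table() is
used to label the dimensions in place of setting dimnames by hand. The
seed is arbitrary.

```r
# Hypothetical variant of ddisp(): build the cross-classification table
# first, label its dimensions, then pass the table to chisq.test().
ddisp_tab <- function(dvar) {
  # Text of the actual argument, e.g. "x" for the call ddisp_tab(x)
  vname <- deparse(substitute(dvar))
  # dnn sets the names of the table's dimnames (row and column headers)
  tab <- table(G, dvar, dnn = c("G", vname))
  # chisq.test() treats a two-way table as a contingency table
  csqt <- chisq.test(tab)
  print(csqt$statistic)
  print(csqt$observed)
  print(round(csqt$expected))
  round(csqt$residuals)
}

set.seed(1)                    # arbitrary seed, for reproducibility only
x <- ceiling(4 * runif(100))
G <- gl(4, 1, 100)
res <- ddisp_tab(x)            # tables are now headed "x", not "dvar"
```

The small-count warning from chisq.test() may still appear, as in the
original output; the sketch changes only the labelling.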
Thanks for these suggestions, Professor Ripley. It's interesting that
function parameters in R are not truly "dummy", as they can affect the
result of a function.

Murray

Prof Brian Ripley wrote:
> ddisp <- function(dvar) {
>      yn <- substitute(dvar)
>      csqt <- eval.parent(substitute(chisq.test(G,dvar), list(dvar=yn)))
>      ....
> }
>
> There are other ways, such as forming the cross-classification table,
> setting its dimnames and passing that to chisq.test.
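As a small illustration of the remark that parameters are not truly
"dummy": a function can recover the expression that was supplied for an
argument, which is what the suggested eval.parent(substitute(...)) line
relies on. The function names show_arg and build_call below are
hypothetical, chosen only for this sketch.

```r
# substitute() returns the unevaluated expression supplied for `arg`;
# deparse() converts that expression to a character string.
show_arg <- function(arg) {
  deparse(substitute(arg))
}

x <- 1:3
show_arg(x)        # "x"
show_arg(x + 1)    # "x + 1"

# The same idea applied to the suggested fix: construct the call with the
# caller's expression spliced in, without evaluating it here.
build_call <- function(dvar) {
  yn <- substitute(dvar)
  substitute(chisq.test(G, dvar), list(dvar = yn))
}

deparse(build_call(x))   # "chisq.test(G, x)"
```

Evaluating the constructed call in the caller's frame means chisq.test()
sees x rather than dvar, so it labels its tables accordingly.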