Hi all,

In a data frame, I have two columns of data that are categorical.

How do I form some sort of measure of correlation between these two columns?

For numerical data, I can just regress one on the other, or do a pairs plot.

But for categorical data, how do I find and/or visualize correlation between the two columns of data?

Thanks!
Not an expert, but I would try some of the following:

# tabulate joint frequencies
?table
?xtabs

# plotting
mosaicplot(Titanic, main = "Survival on the Titanic", color = TRUE, shade = TRUE)

# log-linear models

Check the library for more ideas.

Cheers,
Dylan

On Fri, Jun 19, 2009 at 2:04 PM, Michael <comtech.usa at gmail.com> wrote:
> [...]
> But for categorical data, how do I find and/or visualize correlation
> between the two columns of data?
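The suggestions above can be sketched on a small made-up data frame. The column names `colour` and `size` and the simulated values are purely illustrative assumptions, not data from this thread. The use of `loglin()` with `margin = list(1, 2)` is one possible way to fit the independence log-linear model, not necessarily what was meant above.

```r
# Hypothetical example data: two categorical columns in a data frame
set.seed(1)
df <- data.frame(
  colour = factor(sample(c("red", "green", "blue"), 200, replace = TRUE)),
  size   = factor(sample(c("small", "large"), 200, replace = TRUE))
)

# Joint frequencies, two equivalent ways
tab  <- table(df$colour, df$size)
xtab <- xtabs(~ colour + size, data = df)

# Row proportions are often easier to read than raw counts
print(prop.table(xtab, margin = 1))

# Mosaic plot; shade = TRUE colours the cells by their residuals
# under the independence model
mosaicplot(xtab, main = "colour by size", shade = TRUE)

# Log-linear model of independence: fit only the two one-way margins.
# A large likelihood-ratio statistic (lrt) relative to df suggests
# that the two columns are associated.
fit <- loglin(xtab, margin = list(1, 2), print = FALSE)
fit$lrt
fit$df
```

With real data, replace `df$colour` and `df$size` with the two columns of interest; everything else stays the same.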
On 2009.06.19 14:04:59, Michael wrote:
> [...]
> But for categorical data, how do I find and/or visualize correlation
> between the two columns of data?

As Dylan mentioned, using crosstabs may be the easiest way. Also, a simple correlation between the two variables may be informative.

If each variable is ordinal, you can use Kendall's tau-b (square table) or tau-c (rectangular table). The former you can calculate with ?cor (set method="kendall"). For the latter you may have to hack something together yourself; there is code on the Internet to do this.

If the data are nominal, then a simple chi-squared test (large n) or Fisher's exact test (small n) may be more appropriate.

There are rules about which to use when one variable is ordinal and one is nominal, but I don't have my notes in front of me. Maybe someone else can provide more assistance (and correct me if I'm wrong :).

Cheers,
~Jason

--
Jason W. Morgan
Graduate Student
Department of Political Science
*The Ohio State University*
154 North Oval Mall
Columbus, Ohio 43210
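A minimal sketch of the options Jason lists, on made-up data. The variable names (`edu`, `inc`, `tab_big`, `tab_small`) and the helper `tau_c()` are hypothetical and only for illustration. `tau_c()` is a hand-rolled version of Stuart's tau-c, 2m(C - D) / (n^2 (m - 1)), where C and D are the numbers of concordant and discordant pairs and m is the smaller table dimension; it is not a function from base R. Note that `cor()` needs numeric input, so ordered factors are converted to their integer level codes first.

```r
# Hypothetical ordinal data: two ordered factors
set.seed(42)
lv_edu <- c("low", "medium", "high")
lv_inc <- c("q1", "q2", "q3", "q4")
edu <- factor(sample(lv_edu, 150, replace = TRUE), levels = lv_edu, ordered = TRUE)
inc <- factor(sample(lv_inc, 150, replace = TRUE), levels = lv_inc, ordered = TRUE)

# Kendall's tau via cor(); integer codes preserve the level order
tau_b <- cor(as.integer(edu), as.integer(inc), method = "kendall")

# Hand-rolled tau-c for a two-way table of counts (illustrative helper)
tau_c <- function(tab) {
  nr <- nrow(tab); nc <- ncol(tab)
  n <- sum(tab); m <- min(nr, nc)
  conc <- 0; disc <- 0
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    # cells below and to the right form concordant pairs with cell (i, j)
    if (i < nr && j < nc)
      conc <- conc + tab[i, j] * sum(tab[(i + 1):nr, (j + 1):nc])
    # cells below and to the left form discordant pairs with cell (i, j)
    if (i < nr && j > 1)
      disc <- disc + tab[i, j] * sum(tab[(i + 1):nr, 1:(j - 1)])
  }
  2 * m * (conc - disc) / (n^2 * (m - 1))
}
tau_c_val <- tau_c(table(edu, inc))

# Nominal data, large n: chi-squared test of independence
tab_big <- matrix(c(60, 40, 30, 70), nrow = 2,
                  dimnames = list(group = c("a", "b"), answer = c("yes", "no")))
chi <- chisq.test(tab_big)

# Nominal data, small n: Fisher's exact test
tab_small <- matrix(c(8, 2, 1, 5), nrow = 2,
                    dimnames = list(group = c("a", "b"), answer = c("yes", "no")))
fis <- fisher.test(tab_small)

chi$p.value
fis$p.value
```

Both tests only say whether an association is present; the tau statistics also give its direction and strength for ordinal data.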