Lara Poplarski
2010-Nov-11 22:09 UTC
[R] exploratory analysis of large categorical datasets
Dear List,

I am looking to perform exploratory analyses of two (relatively) large datasets of categorical data. The first one is a binary 80x100 matrix, in the form:

    matrix(sample(c(0,1),25,replace=TRUE), nrow = 5, ncol=5,
           dimnames = list(c("group1", "group2","group3", "group4","group5"),
                           c("V.1", "V.2", "V.3", "V.4", "V.5")))

and the second one is a multistate 750x1500 matrix, with up to 15 *unordered* states per variable, in the form:

    matrix(sample(c(1:15),25,replace=TRUE), nrow = 5, ncol=5,
           dimnames = list(c("group1", "group2","group3", "group4","group5"),
                           c("V.1", "V.2", "V.3", "V.4", "V.5")))

Specifically, I am looking to see which pairs of variables are correlated. For continuous data, I would use cor() and cov() to generate the correlation matrix and the variance-covariance matrix, which I would then visualize with symnum() or image(). However, it is not clear to me whether this approach is suitable for categorical data of this kind.

Since I am new to R, I would greatly appreciate any input on how to approach this task and on efficient visualization of the results.

Many thanks in advance,
Lara
Dennis Murphy
2010-Nov-12 01:39 UTC
[R] exploratory analysis of large categorical datasets
Hi:

A good place to start would be the vcd package and its suite of demos and vignettes, as well as the vcdExtra package, which adds a few more goodies and a very nice introductory vignette by Michael Friendly. You can't fault the package for a lack of documentation :)

You might also find the following link useful:

http://www.datavis.ca/R/

Scroll down to 'vcd and vcdExtra', and further down to 'tableplot', which was recently released on CRAN.

HTH,
Dennis
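As a base-R complement to the packages mentioned above, here is a minimal sketch of one common way to screen pairs of unordered categorical variables: compute Cramér's V for every pair of columns and view the result with image(). This is an illustration under stated assumptions, not a method proposed in the thread. The helper names cramers_v() and pairwise_v() are hypothetical, the data are simulated in the form given in the question (with fewer columns to keep it quick), and V is used only as a rough screening statistic. With many states and few rows the contingency tables are sparse, so V is biased upward and the chi-squared approximation is poor.

```r
# Hypothetical helper (not part of base R or vcd): Cramer's V for two
# categorical vectors, based on the uncorrected Pearson chi-squared statistic.
cramers_v <- function(x, y) {
  tab <- table(x, y)
  k <- min(dim(tab)) - 1
  # A variable with a single observed state has no association to measure.
  if (k == 0) return(NA_real_)
  # Sparse tables trigger approximation warnings; silence them for screening.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  # Clamp to 1 to guard against floating-point overshoot.
  min(1, unname(sqrt(chi2 / (sum(tab) * k))))
}

# Hypothetical helper: symmetric matrix of Cramer's V over all column pairs.
pairwise_v <- function(m) {
  p <- ncol(m)
  out <- matrix(NA_real_, p, p, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(p)) {
    for (j in seq_len(i)) {
      out[i, j] <- out[j, i] <- cramers_v(m[, i], m[, j])
    }
  }
  out
}

set.seed(1)

# Simulated stand-in for the binary matrix (80 rows, 10 columns here).
bin <- matrix(sample(c(0, 1), 80 * 10, replace = TRUE), nrow = 80,
              dimnames = list(paste0("group", 1:80), paste0("V.", 1:10)))

# Simulated stand-in for the multistate matrix (750 rows, 10 columns here).
multi <- matrix(sample(1:15, 750 * 10, replace = TRUE), nrow = 750,
                dimnames = list(paste0("group", 1:750), paste0("V.", 1:10)))

v_bin   <- pairwise_v(bin)
v_multi <- pairwise_v(multi)

# For 0/1 data, cor() yields the phi coefficient, whose absolute value
# equals Cramer's V, so cor() remains usable in the binary case.
phi_bin <- cor(bin)

# Heat map of the multistate associations, on a fixed 0..1 scale.
image(v_multi, zlim = c(0, 1), main = "Pairwise Cramer's V")
```

The double loop visits every pair once, so the full 1500-column matrix means roughly 1.1 million contingency tables; expect that to take a while. For a closer look at a single pair, assocstats() in vcd reports Cramér's V together with related measures.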