Tim Finney
2007-Jan-11 19:53 UTC
[R] Exploratory multivariate analysis of categorical data
This is my first post to R-help. I am doing some research into the text of the New Testament, specifically places where textual variation occurs across manuscripts. (See http://purl.org/tfinney/NTText/book/index.html for details.)

New Testament textual critics call places where the text varies "variation units," and each state of the text in a variation unit is called a "reading." The apparatus of a critical edition can be transformed into a data matrix by making each witness (typically a manuscript, but might be an early version or church father) an observation (i.e. a row) and each variation unit a variable (i.e. a column). I encode readings, which consist of words or phrases, as numerals in the data matrix. (There are often more than two readings in a variation unit.) I make a dissimilarity matrix by calculating the proportion of variation units in which each pair of witnesses disagrees.

Here is my question: Which exploratory multivariate techniques are applicable to this kind of data matrix and this kind of dissimilarity matrix? From reading the R docs, it seems to me that MDS (metric and non-metric) and hierarchical clustering are appropriate, but I am not so sure about others.

Best

Tim Finney
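
To make the construction concrete, the encoding and the dissimilarity calculation described above can be sketched roughly as follows. This is only an illustration written in Python, not the code actually used: the witness names, the toy readings, and the `dissimilarity` helper are all hypothetical, and the treatment of missing readings (skipping units where either witness lacks a reading) is an assumption that the description above does not specify.

```python
# Hypothetical toy data matrix: each row is a witness, each column a
# variation unit. Readings are nominal categories encoded as integers,
# so the numerals carry no order. None marks a unit where the witness
# has no reading (an assumption; the post does not discuss missing data).
witnesses = {
    "A": [1, 2, 1, 3],
    "B": [1, 2, 2, 3],
    "C": [2, 1, 2, None],
}


def dissimilarity(x, y):
    """Proportion of variation units in which two witnesses disagree.

    Only units where both witnesses have a reading are counted.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return float("nan")  # no shared units, so the proportion is undefined
    return sum(a != b for a, b in pairs) / len(pairs)


# Full symmetric dissimilarity matrix, stored as a dict keyed by name pairs.
names = sorted(witnesses)
dist = {
    (p, q): dissimilarity(witnesses[p], witnesses[q])
    for p in names
    for q in names
}

for p in names:
    print(p, [round(dist[(p, q)], 3) for q in names])
```

Because the readings are unordered categories, this is a simple matching (mismatch proportion) measure rather than a distance computed on the numerals themselves.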