Hi all,

I'm trying to figure out whether it is appropriate to run a PCA on purely categorical (non-ordinal) data. So far I have only found the following quote:

One method to find such relationships is to select appropriate variables and to view the data using a method like Principle Components Analysis (PCA) [4]. This approach gives us a clear picture of the data using KL-plot of the PCA. However, the method is not settled for the data including categorical data.
[http://hp.vector.co.jp/authors/VA038807/personal/covEigGiniRep17.pdf]

but I'm still not sure whether it is WRONG to do so.

Any opinion or reference would be very helpful.

Thanks
You might want to look into correspondence analysis, which has several variants of PCA designed for categorical data.

On Fri, 6 Mar 2009, Galanidis Alexandros wrote:

> but I'm still not sure if it WRONG to do so.

Since categorical data are normally taken to be binomial or Poisson distributed, the variance varies with the mean, and least squares (the basis of PCA) is then sub-optimal. Correspondence analysis takes that into account (at least to some extent).

> Any opinion or reference would be very helpful

There is a basic introduction in MASS4 (Venables and Ripley, Modern Applied Statistics with S, 4th edition), with references to more comprehensive accounts.

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
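As a rough illustration of the suggestion above, here is a minimal sketch of multiple correspondence analysis with mca() from the MASS package, which accompanies the book mentioned. The choice of mca() (rather than corresp(), which handles a single two-way table) and of the farms example data set are assumptions made for illustration; neither is named in the thread.

```r
library(MASS)  # recommended package, normally installed with R

# farms: example data set in MASS, 20 farms described by four factors
data(farms)
str(farms)

# Multiple correspondence analysis on the factors, keeping two dimensions.
# mca() requires every column of the data frame to be a factor.
farms.mca <- mca(farms, nf = 2, abbrev = TRUE)

farms.mca$d          # singular values of the two retained dimensions
head(farms.mca$rs)   # coordinates of the rows (farms)
head(farms.mca$cs)   # coordinates of the factor levels

# plot(farms.mca)    # uncomment for a joint plot of rows and levels
```

The row and level coordinates play roughly the role that scores and loadings play in an ordinary PCA, which is why this family of methods is often described as PCA for categorical data.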
Hi Galanidis,

dudi.mix() in package ade4 does PCA using categorical and/or quantitative variables. Ordered factors are replaced by poly(x, deg=2), and squares of quantitative variables can also be added. The method is a generalization by Chessel of the method of Hill and Smith.

Regards,
Mark.
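A minimal sketch of a dudi.mix() call, for anyone who wants to try it. It assumes ade4 is installed and uses the dunedata example data set shipped with that package; the data set is an assumption chosen for illustration and is not mentioned in the thread.

```r
library(ade4)  # assumed to be installed: install.packages("ade4")

# dunedata$envir: 20 sites described by one numeric variable,
# three ordered factors and one unordered factor
data(dunedata)
str(dunedata$envir)

# scannf = FALSE skips the interactive scree-plot prompt;
# nf = 2 keeps two axes
dd1 <- dudi.mix(dunedata$envir, scannf = FALSE, nf = 2)

dd1$eig        # eigenvalues
head(dd1$li)   # row (site) scores on the two axes
dd1$cr         # link between each original variable and each axis

# scatter(dd1)  # uncomment for a plot of the analysis
```

Passing a data frame that contains only unordered factors goes through the same call; the package documentation describes that case as equivalent to a multiple correspondence analysis.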
See the homals package in R. Also look at the documentation for the ade4 package.

Justin BEM
BP 1917 Yaoundé
Tel (237) 76043774

________________________________
From: Galanidis Alexandros <agal@env.aegean.gr>
To: "r-help@r-project.org" <r-help@r-project.org>
Sent: Friday, 6 March 2009, 10:09:18
Subject: [R] PCA and categorical data