thr3ads.net - R help - [R] Newbie question regarding graphing of Princomp object [Jan 2005]

List account

2005-Jan-15 04:39 UTC

[R] Newbie question regarding graphing of Princomp object

Greetings,

I am working on a stylometric analysis of some latin texts; one of the  
latest stylometric techniques involves using principal components  
analysis.  Not being a statistician, I can't really fully rely on PCA  
as my primary tool, since I don't really understand the statistics  
behind the PCA technique.  Nevertheless, the ability to use PCA and  
graph the results has been marvelously helpful as a preliminary  
technique to determine what kinds of stylometric variables are worth  
pursuing as indicators of authorship.

For instance, I'm doing the following...  I have a set of data for  
approximately 120 different latin works, about half of which are by St.  
Thomas Aquinas, and the other half are by various other authors in the  
Thomistic tradition, some known and some anonymous.  My data for  
frequencies of prepositions looks like the following:

A,AD,CIRCA,CUM,DE, .... (total of 10 variables)
1,0.00967667222531036,0.0208124884194923,0.00142671854734112,0.00486381322957198,0.00758291643505651 ...
2,0.00874917700292081,0.0217315416668508,0.00133005165549453,0.00437900727772451,0.00537323193714733 ....
3,0.0064258378627327,0.0280901956627422,0.00178739176045295,0.00430582309573329,0.00821688482105979 ....
4,0.00706850368364528,0.027446604903448,0.000821141574836712,0.00461761547172807,0.00812783899774761 ....
5,0.010214039424891,0.015409971157808,0.000745993537614122,0.00584650749246416,0.00475787738815518 ....
6,0.00952534711010655,0.0180981595092025,0.00125928317726832,0.00515014530190507,0.00447206974491443 ...
.... (and so on for the rest of the 120 works)

The works are numbered such that works 100 and below are by St. Thomas,  
those from 101 to 117 are of dubious authenticity, and those from 118  
to 179 are by other authors.

When I perform a biplot, on the results of the princomp() function, I  
get a nice graph that plots the 120 works on the two principal  
component axes (I've figured out how to get rid of the red arrows  
already).  Given that the data points tend to jumble together, I'd like  
some way to color the different categories of works in the biplot, so  
that data points for works 1-100 are red, those from 101-117 are blue,  
and those from 118 to 179 are green (for instance).

I've included a sample of the output that I'm currently getting, in  
case it's helpful to anybody.  BTW, I am running RAqua (for the Mac),  
version 1.8.1.

Thanks in advance for any help!

-Erik Norvelle
erik (at) norvelle (dot) org
Facultad de Filosofía y Letras
Universidad de Navarra
Pamplona, Navarra, España

-------------- next part --------------
A non-text attachment was scrubbed...
Name: prepositions.pdf
Type: application/pdf
Size: 12639 bytes
Desc: not available
Url :
https://stat.ethz.ch/pipermail/r-help/attachments/20050115/3611db92/prepositions.pdf
-------------- next part --------------

Tobias Verbeke

2005-Jan-15 08:47 UTC

[R] Newbie question regarding graphing of Princomp object

On Sat, 15 Jan 2005 05:39:00 +0100
List account <lists at norvelle.org> wrote:
> Greetings,
> 
> I am working on a stylometric analysis of some latin texts; one of the  
> latest stylometric techniques involves using principal components  
> analysis.  Not being a statistician, I can't really fully rely on PCA  
> as my primary tool, since I don't really understand the statistics  
> behind the PCA technique.  Nevertheless, the ability to use PCA and  
> graph the results has been marvelously helpful as a preliminary  
> technique to determine what kinds of stylometric variables are worth  
> pursuing as indicators of authorship.
> 
> For instance, I'm doing the following...  I have a set of data for  
> approximately 120 different latin works, about half of which are by St.  
> Thomas Aquinas, and the other half are by various other authors in the  
> Thomistic tradition, some known and some anonymous.  My data for  
> frequencies of prepositions looks like the following:
> 
> A,AD,CIRCA,CUM,DE, .... (total of 10 variables)
> 1,0.00967667222531036,0.0208124884194923,0.00142671854734112,0.00486381322957198,0.00758291643505651 ...
> 2,0.00874917700292081,0.0217315416668508,0.00133005165549453,0.00437900727772451,0.00537323193714733 ....
> 3,0.0064258378627327,0.0280901956627422,0.00178739176045295,0.00430582309573329,0.00821688482105979 ....
> 4,0.00706850368364528,0.027446604903448,0.000821141574836712,0.00461761547172807,0.00812783899774761 ....
> 5,0.010214039424891,0.015409971157808,0.000745993537614122,0.00584650749246416,0.00475787738815518 ....
> 6,0.00952534711010655,0.0180981595092025,0.00125928317726832,0.00515014530190507,0.00447206974491443 ...
> .... (and so on for the rest of the 120 works)
> 
> The works are numbered such that works 100 and below are by St. Thomas,  
> those from 101 to 117 are of dubious authenticity, and those from 118  
> to 179 are by other authors.
> 
> When I perform a biplot, on the results of the princomp() function, I  
> get a nice graph that plots the 120 works on the two principal  
> component axes (I've figured out how to get rid of the red arrows  
> already).  Given that the data points tend to jumble together, I'd like
> some way to color the different categories of works in the biplot, so  
> that data points for works 1-100 are red, those from 101-117 are blue,  
> and those from 118 to 179 are green (for instance).
You can use the `col' argument in the biplot call. In this case, I
would do something like 

biplot(mydata, col = c(rep("red", 100), rep("blue", 17),
rep("green", 62)))

For a list of built-in color names, you can type colors() at the R prompt.
For more information on biplot, type ?biplot

VaRiis modis bene fit.

HTH,
Tobias
> I've included a sample of the output that I'm currently getting, in
> case it's helpful to anybody.  BTW, I am running RAqua (for the Mac),  
> version 1.8.1.
> 
> Thanks in advance for any help!
> 
> -Erik Norvelle
> erik (at) norvelle (dot) org
> Facultad de Filosofía y Letras
> Universidad de Navarra
> Pamplona, Navarra, España
> 
>

Tobias Verbeke

2005-Jan-15 15:30 UTC

[R] graphing of Princomp object

On Sat, 15 Jan 2005 15:53:18 +0100
List account <lists at norvelle.org> wrote:
> Thanks, Tobias for the response.
> 
> I tried the suggestion you gave, and apparently (at least according to
> the biplot manpage) only the first two members of the col vector are
> used: the first to plot the first set of values, i.e. the scores, and
> the second for the loadings (I think I have that right).
> At any rate, if I add the clause 'col = c(rep("red", 100), rep("blue",
> 17), rep("green", 62))' I just get a bunch of red points! :(
You're right. I'm sorry I did not read ?biplot, but only checked it had
a col argument (Semel in anno licet insanire..). 
Anyway, with PCA it is not a good idea to plot both variables and
cases on one single plot, because the temptation is too great to interpret
proximities between variables and cases. You'd better plot two different
graphs, one for the cases and one for the `circle of correlations'.

For plotting the cases, you could make up your own plot using
something similar to this:

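library(MASS) # for eqscplot (scatter plot with equal scales on both axes)
F1 <- yourpca$scores[, 1]  # scores on the first component
F2 <- yourpca$scores[, 2]  # scores on the second component
eqscplot(F1, F2, pch = 20)
# labels are the row names of the data; colours are assigned in row order
text(F1, F2, labels = names(F1), 
     col = c(rep("red", 100), 
             rep("blue", 17), 
             rep("green", 62)),
     pos = 3)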
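
Note that the colour vector above assigns colours by row position, so it
only matches the three categories if rows 1-100 really are the works by
St. Thomas, rows 101-117 the dubious ones, and so on. Since the question
mentions about 120 works numbered up to 179, it may be safer to derive the
colour from each work's number instead. What follows is only a sketch under
assumptions: the data are synthetic, the names ids, freqs, grp and cols are
made up for illustration, and base plot(..., asp = 1) stands in for
eqscplot so that nothing beyond base R is needed.

```r
# Synthetic stand-in for the preposition frequencies (NOT the real data):
# 120 works, 10 variables, with the work numbers as row names.
set.seed(1)
ids <- c(1:60, 101:117, 118:160)   # hypothetical work numbers, 120 in total
freqs <- matrix(runif(length(ids) * 10, min = 0, max = 0.03), ncol = 10,
                dimnames = list(as.character(ids), paste0("V", 1:10)))
pca <- princomp(freqs)

# Map each work number to a category, then to a colour.
# The breaks follow the numbering given in the original question:
# (0,100] St. Thomas, (100,117] dubious, (117,179] other authors.
grp <- cut(ids, breaks = c(0, 100, 117, 179),
           labels = c("Thomas", "dubious", "other"))
cols <- c("red", "blue", "green")[as.integer(grp)]

F1 <- pca$scores[, 1]
F2 <- pca$scores[, 2]

# asp = 1 gives equal scales on both axes, similar in spirit to eqscplot
plot(F1, F2, pch = 20, col = cols, asp = 1,
     xlab = "Comp.1", ylab = "Comp.2")
text(F1, F2, labels = names(F1), col = cols, pos = 3, cex = 0.7)
legend("topright", legend = levels(grp),
       col = c("red", "blue", "green"), pch = 20)
```

With real data, ids would come from the first column (or the row names) of
the data file rather than being typed in by hand.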


Tobias
> Si vales, valeo...
> 
> -Erik
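
The reply above recommends a second, separate graph for the `circle of
correlations' but gives no code for it. One possible way to draw it, offered
only as a sketch: compute the correlation of each original variable with the
first two component scores and draw them as arrows inside a unit circle. The
data below are synthetic, and the names freqs, pca, r and theta are
illustrative assumptions, not anything taken from the thread.

```r
# Synthetic stand-in for the preposition frequencies (NOT the real data)
set.seed(1)
freqs <- matrix(runif(120 * 10, min = 0, max = 0.03), ncol = 10,
                dimnames = list(NULL, paste0("V", 1:10)))
pca <- princomp(freqs)

# Correlation of each original variable with the first two components;
# r has one row per variable and one column per component.
r <- cor(freqs, pca$scores[, 1:2])

# Draw the unit circle, then one arrow per variable.
theta <- seq(0, 2 * pi, length.out = 200)
plot(cos(theta), sin(theta), type = "l", asp = 1,
     xlab = "Comp.1", ylab = "Comp.2")
abline(h = 0, v = 0, lty = 3)
arrows(0, 0, r[, 1], r[, 2], length = 0.08)
text(r[, 1], r[, 2], labels = rownames(r), pos = 3, cex = 0.8)
```

Variables whose arrows come close to the circle are well represented by the
first two components; short arrows indicate variables that these two
components describe poorly.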
