Hi, I am doing PCA on several columns of data in a data.frame.

I am interested in particular rows of data which may have a particular
combination of 'types' of column values (without any pre-conception of
what they may be).

I do the following...

# My data table.
allDat <- read.table("big_select_thresh_5", header=1)

# Where some rows look like this...
# PDB   SUNID1  SUNID2  AA  CH  IPCA  PCA  IBB  BB
# 3sdh  14984   14985   6   10  24    24   93   116
# 3hbi  14986   14987   6   10  20    22   94   117
# 4sdh  14988   14989   6   10  20    20   104  122

# NB First three columns = row ID, last 6 = variables

attach(allDat)

# My columns of interest (variables).
part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)

pc <- princomp(part)

plot(pc)

The above plot shows that 95% of the variance is due to the first
'Component' (which I assume is AA).

i.e. All the variables behave in quite much the same way.

I then did ...

biplot(pc)

Which showed some outliers with a numeric ID - How do I get back my old 3
part ID used in allDat?

In the above plot I saw all the variables (correctly named) pointing in
more or less the same direction (as shown by the variance). I then did the
following...

postscript(file="test.ps",paper="a4")

biplot(pc)

dev.off()

However, looking at test.ps shows that the arrows are missing (using
ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

Finally, I would like to make a contour plot of the above biplot, is this
possible? (or even a good way to present the data?

Thanks very much for any feedback,
Dan.

On 06/29/04 11:04, Dan Bolser wrote:
>
>Hi, I am doing PCA on several columns of data in a data.frame.
>
>I am interested in particular rows of data which may have a particular
>combination of 'types' of column values (without any pre-conception of
>what they may be).
>
>I do the following...
>
># My data table.
>allDat <- read.table("big_select_thresh_5", header=1)
>
># Where some rows look like this...
># PDB   SUNID1  SUNID2  AA  CH  IPCA  PCA  IBB  BB
># 3sdh  14984   14985   6   10  24    24   93   116
># 3hbi  14986   14987   6   10  20    22   94   117
># 4sdh  14988   14989   6   10  20    20   104  122
>
># NB First three columns = row ID, last 6 = variables
>
>attach(allDat)
>
># My columns of interest (variables).
>part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
>
>pc <- princomp(part)
>
>plot(pc)
>
>The above plot shows that 95% of the variance is due to the first
>'Component' (which I assume is AA).

No. It is the first principal component, which is some linear
combination of all the variables. Try loadings(pc). It sounds
like you need to read up on principal component analysis.

>i.e. All the variables behave in quite much the same way.
>
>I then did ...
>
>biplot(pc)
>
>Which showed some outliers with a numeric ID - How do I get back my old 3
>part ID used in allDat?

The numeric ID is taken from the row names of pc. So, if the IDs
in question are 3 and 5, then allDat[c(3,5),] should work.

>In the above plot I saw all the variables (correctly named) pointing in
>more or less the same direction (as shown by the variance). I then did the
>following...
>
>postscript(file="test.ps",paper="a4")
>
>biplot(pc)
>
>dev.off()
>
>However, looking at test.ps shows that the arrows are missing (using
>ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

I get red arrows for the components in both the original graph
and the ps output (R 1.9.1, Fedora Core 2). This may be a
platform-specific problem or one specific to ggv. I have neither
ggv nor pstoimg. (But xv and gv both work.)

>Finally, I would like to make a contour plot of the above biplot, is this
>possible? (or even a good way to present the data?

No idea how to do this or why you would want it.

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/

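
[Editorial sketch] The two suggestions in the reply above (inspect loadings(pc), and index allDat by the numeric labels shown on the biplot) might look roughly like the following. This is only a sketch: the file big_select_thresh_5 is not available here, so the data frame is an invented stand-in that merely reuses the column names from the post, and the flagged rows 3 and 5 are the hypothetical example from the reply, not real outliers.

```r
# Invented stand-in for allDat (NOT the poster's data): same column names,
# random values, made-up IDs.
set.seed(1)
n <- 10
allDat <- data.frame(
  PDB    = sprintf("p%03d", 1:n),
  SUNID1 = 1000 + 2 * (1:n),
  SUNID2 = 1001 + 2 * (1:n),
  AA     = rnorm(n, mean = 6,   sd = 1),
  CH     = rnorm(n, mean = 10,  sd = 1),
  IPCA   = rnorm(n, mean = 22,  sd = 3),
  PCA    = rnorm(n, mean = 24,  sd = 3),
  IBB    = rnorm(n, mean = 95,  sd = 8),
  BB     = rnorm(n, mean = 118, sd = 10)
)

# Select the six variables by name instead of using attach().
vars <- c("AA", "CH", "IPCA", "PCA", "IBB", "BB")
pc <- princomp(allDat[, vars])

# Each component is a linear combination of all six variables;
# the loadings give the coefficients, so Comp.1 is not simply "AA".
print(loadings(pc))
summary(pc)   # proportion of variance per component

# With default row names, the numeric labels on the biplot are row
# indices, so the three-part ID can be looked up directly.
flagged <- c(3, 5)   # hypothetical labels read off the biplot
ids <- allDat[flagged, c("PDB", "SUNID1", "SUNID2")]
print(ids)
```

The lookup by index relies on the data frame having the default row names 1..n; if rows have been reordered or renamed, match on the row names instead.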
On Tue, 29 Jun 2004, Dan Bolser wrote:

> Hi, I am doing PCA on several columns of data in a data.frame.
>
> I am interested in particular rows of data which may have a particular
> combination of 'types' of column values (without any pre-conception of
> what they may be).
>
> I do the following...
>
> # My data table.
> allDat <- read.table("big_select_thresh_5", header=1)
>
> # Where some rows look like this...
> # PDB   SUNID1  SUNID2  AA  CH  IPCA  PCA  IBB  BB
> # 3sdh  14984   14985   6   10  24    24   93   116
> # 3hbi  14986   14987   6   10  20    22   94   117
> # 4sdh  14988   14989   6   10  20    20   104  122
>
> # NB First three columns = row ID, last 6 = variables
>
> attach(allDat)
>
> # My columns of interest (variables).
> part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
>
> pc <- princomp(part)

Do you really want an unscaled PCA on that data set? Looks unlikely (but
then two of the columns are constant in the sample, which is also
worrying).

> plot(pc)
>
> The above plot shows that 95% of the variance is due to the first
> 'Component' (which I assume is AA).

No, it is the first (principal) component. You did ask for P>C<A!

> i.e. All the variables behave in quite much the same way.

Or you failed to scale the data so one dominates.

> I then did ...
>
> biplot(pc)
>
> Which showed some outliers with a numeric ID - How do I get back my old 3
> part ID used in allDat?

Set row names on your data frame. Like almost all of R, it is the row
names of a data frame that are used for labelling, and you did not give
any so you got numbers.

> In the above plot I saw all the variables (correctly named) pointing in
> more or less the same direction (as shown by the variance). I then did the
> following...
>
> postscript(file="test.ps",paper="a4")
>
> biplot(pc)
>
> dev.off()
>
> However, looking at test.ps shows that the arrows are missing (using
> ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

So ggv is unreliable, perhaps cannot cope with colours?

> Finally, I would like to make a contour plot of the above biplot, is this
> possible? (or even a good way to present the data?

What do you propose to represent by the contours? Biplots have a
well-defined interpretation in terms of distances and angles.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
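
[Editorial sketch] The two points made in the reply above (scale the variables, and set row names so the biplot labels carry the three-part ID) could be tried along these lines. Again this is only a sketch under assumptions: the data are an invented stand-in with the post's column names, the underscore-joined label format is an arbitrary choice, and cor = TRUE in princomp() is one way of doing a scaled PCA (it fails if any variable is constant, which the reply notes may be an issue for AA and CH in the real data).

```r
# Invented stand-in for allDat (NOT the poster's data): random values with
# deliberately different spreads, so that scaling makes a visible difference.
set.seed(1)
n <- 10
allDat <- data.frame(
  PDB    = sprintf("p%03d", 1:n),
  SUNID1 = 1000 + 2 * (1:n),
  SUNID2 = 1001 + 2 * (1:n),
  AA     = rnorm(n, mean = 6,   sd = 1),
  CH     = rnorm(n, mean = 10,  sd = 1),
  IPCA   = rnorm(n, mean = 22,  sd = 3),
  PCA    = rnorm(n, mean = 24,  sd = 3),
  IBB    = rnorm(n, mean = 95,  sd = 8),
  BB     = rnorm(n, mean = 118, sd = 10)
)

vars <- c("AA", "CH", "IPCA", "PCA", "IBB", "BB")
part <- allDat[, vars]

# Use the three ID columns as row names; biplot() labels points with these.
rownames(part) <- paste(allDat$PDB, allDat$SUNID1, allDat$SUNID2, sep = "_")

pc_raw    <- princomp(part)              # covariance matrix (unscaled)
pc_scaled <- princomp(part, cor = TRUE)  # correlation matrix (scaled)

# Compare how the variance is shared out with and without scaling.
prop_raw    <- pc_raw$sdev^2 / sum(pc_raw$sdev^2)
prop_scaled <- pc_scaled$sdev^2 / sum(pc_scaled$sdev^2)
print(round(rbind(raw = prop_raw, scaled = prop_scaled), 3))

# Points are now labelled with the three-part IDs instead of row numbers.
biplot(pc_scaled)
```

If the first component still dominates after scaling, that supports the reading that the variables genuinely move together; if it dominates only in the unscaled fit, the large-variance columns are driving it.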