thr3ads.net - R help - [R] PCA on high dimentional data [Dec 2011]

mail me

2011-Dec-10 15:56 UTC

[R] PCA on high dimentional data

Hi:

I have a large dataset mydata, of 1000 rows and 1000 columns. The rows
have gene names and columns have condition names (cond1, cond2, cond3,
etc).

mydata <- read.table(file = "c:/file1.mtx", header = TRUE, sep = "")

I applied PCA as follows:

data_after_pca <- prcomp(mydata, retx = TRUE, center = TRUE, scale. = TRUE)

Now I get 1000 PCs, and I choose the first three PCs to make a new data frame:

new_data_frame <- cbind(data_after_pca$x[, 1], data_after_pca$x[, 2],
                        data_after_pca$x[, 3])

After the PCA, in new_data_frame, I lose the previous cond1, cond2,
cond3 labels and instead have PC1, PC2, PC3 as column names.

My question is: is there any way to map PC1, PC2, PC3 back to the
original conditions, so that I can still refer to the original
condition labels after PCA?

Thanks:
deb

Stephen Sefick

2011-Dec-10 17:07 UTC

By doing PCA you are trying to find a lower-dimensional representation
of the major variation structure in your data. You get PC* to represent
the "new" data. If you want to know what loads on the axes, then you
need to look at the loadings (data_after_pca$rotation in your prcomp
result). These are the link between the original data and the "new"
data. Maybe you need to read up on what PCA does?
Or maybe I am misunderstanding your question...
FWIW


Stephen
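
As a rough illustration of "looking at the loadings", here is a minimal
sketch. It assumes a small simulated matrix (toy, with made-up gene and
condition names) standing in for mydata, which is not available here.
For prcomp, the loadings are stored in the rotation element and the
scores in x.

```r
# Hypothetical stand-in for mydata: 50 "genes" (rows) x 6 "conditions" (columns).
set.seed(1)
toy <- matrix(rnorm(50 * 6), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("cond", 1:6)))

pca <- prcomp(toy, retx = TRUE, center = TRUE, scale. = TRUE)

# Loadings: one row per original condition, one column per PC.
loadings <- pca$rotation[, 1:3]
print(round(loadings, 3))

# Conditions with the largest absolute loading on PC1.
top_pc1 <- names(sort(abs(loadings[, "PC1"]), decreasing = TRUE))[1:3]
print(top_pc1)

# Scores: gene names are kept as row names, PC1..PC3 as column names.
scores <- pca$x[, 1:3]
print(head(scores))
```

The row names of the loadings table are the original condition labels,
so this table is the place where cond1, cond2, ... remain visible after
the PCA.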

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

-- 
Stephen Sefick
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama 36849
sas0025 at auburn.edu
http://www.auburn.edu/~sas0025

Mark Difford

2011-Dec-10 18:40 UTC

On Dec 10, 2011 at 5:56pm deb wrote:
> My question is, is there any way I can map the PC1, PC2, PC3 to the
> original conditions, so that i can still have a reference to original
> condition labels after PCA?

deb,

To add to what Stephen has said: it is best to read up on principal
component analysis. Briefly, each PC is a composite variable, composed of
different "amounts" of each and every one of your column variables, i.e.
cond1, ..., cond1000.

So the short answer to your question is no. There is no way to do this
mapping, except as loadings on each principal component (PC).

Regards, Mark.
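
To make the "composite variable" point concrete, the sketch below
rebuilds PC1 by hand as a weighted sum of all the standardised columns.
It assumes a small simulated matrix (toy, with hypothetical gene and
condition names) in place of the poster's data, and the same
center = TRUE, scale. = TRUE settings used in the original call.

```r
# Hypothetical stand-in data: 40 "genes" x 5 "conditions".
set.seed(2)
toy <- matrix(rnorm(40 * 5), nrow = 40,
              dimnames = list(paste0("gene", 1:40), paste0("cond", 1:5)))

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Centre and scale the data with the same values prcomp stored.
z <- scale(toy, center = pca$center, scale = pca$scale)

# PC1 score of each gene = weighted sum over ALL conditions, with the
# weights taken from the PC1 column of the rotation (loadings) matrix.
pc1_by_hand <- as.vector(z %*% pca$rotation[, "PC1"])

# Compare with the scores prcomp returned.
print(all.equal(pc1_by_hand, unname(pca$x[, "PC1"])))

# The weights themselves, one per original condition.
print(round(pca$rotation[, "PC1"], 3))
```

Every condition gets a weight on every PC, which is why a PC cannot be
relabelled as a single original condition.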

-----
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa

Bert Gunter

2011-Dec-10 19:48 UTC

... and adding to what has already been said, PCA can be distorted by
non-ellipsoidal distributions or small numbers of unusual values.
Careful (chiefly graphical) examination of results is therefore
essential, and usually fairly easy to do. There are robust/resistant
versions of PCA in R, but they come with their own issues. As you have
already been told, you need to do some homework -- or get some local
advice.
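
A minimal sketch of the kind of check described above, using base R
only. The data are simulated and the single extreme row is an invented
outlier, so this illustrates the effect without saying anything about
the poster's dataset.

```r
# Hypothetical stand-in data: 100 "genes" x 4 "conditions", pure noise.
set.seed(3)
toy <- matrix(rnorm(100 * 4), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("cond", 1:4)))

# Same data with one grossly unusual row (an invented outlier).
toy_out <- toy
toy_out[1, ] <- c(25, 25, 25, 25)

pca_clean <- prcomp(toy, center = TRUE, scale. = TRUE)
pca_out   <- prcomp(toy_out, center = TRUE, scale. = TRUE)

# Proportion of variance carried by each PC, with and without the outlier.
prop_clean <- pca_clean$sdev^2 / sum(pca_clean$sdev^2)
prop_out   <- pca_out$sdev^2 / sum(pca_out$sdev^2)
print(round(rbind(clean = prop_clean, outlier = prop_out), 3))

# Graphical checks: scree plot, scores on the first two PCs, and a biplot.
screeplot(pca_out, type = "lines", main = "Scree plot (with outlier)")
plot(pca_out$x[, 1:2], main = "Scores on PC1 and PC2 (with outlier)")
biplot(pca_clean, cex = 0.6)
```

In the score plot the altered row sits far from the rest, and PC1 takes
a much larger share of the variance than in the clean fit: a single
unusual value is enough to dominate the first component.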

Also, you need to post on some other list, e.g.
stats.stackexchange.com, as you have wandered outside the realm of R
issues.

-- Bert


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

