thr3ads.net - R help - [R] Regarding Principal Component Analysis result Interpretation [Sep 2017]


Shylashree U.R

2017-Sep-15 10:43 UTC

[R] Regarding Principal Component Analysis result Interpretation

Dear Sir/Madam,

I am trying to run a PCA on the "iris" dataset and interpret the result.
The dataset contains 150 observations of 5 variables:

         Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
    1             5.1          3.5           1.4          0.2  setosa
    2             4.9          3.0           1.4          0.2  setosa
    ...
    150           5.9          3.0           5.1          1.8  virginica

Now I used the 'prcomp' function on the dataset and got the following result:

> print(pc)
Standard deviations (1, .., p=4):
[1] 1.7083611 0.9560494 0.3830886 0.1439265

Rotation (n x k) = (4 x 4):
                    PC1         PC2        PC3        PC4
Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971

I'm planning to use PCA as a feature selection step and to remove the
variables that are correlated in my project. I have interpreted the PCA
result, but I am not sure whether my interpretation is correct.
If you can correct me, it will be of great help.
Looking at the PC results, I see both positive and negative values.
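
A minimal sketch of how the output above can be reproduced and read. The original call to prcomp was not posted, so the arguments below are an assumption: dropping the Species column and standardizing the four measurements gives standard deviations that agree with the posted ones. The sketch also computes the share of variance per component and illustrates that the sign of a loading column is arbitrary, which is why positive and negative values appear.

```r
# Assumption: the poster's exact call is unknown; this is one call that
# yields the posted standard deviations.
x  <- iris[, 1:4]                        # numeric columns only, Species dropped
pc <- prcomp(x, center = TRUE, scale. = TRUE)

print(pc)                                # standard deviations and rotation matrix

# Share of total variance carried by each component, and the running total.
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
cum_explained <- cumsum(var_explained)
print(round(rbind(var_explained, cum_explained), 4))

# Flipping the sign of one loading column together with its scores leaves
# the reconstructed (standardized) data unchanged.
flipped <- pc
flipped$rotation[, 2] <- -flipped$rotation[, 2]
flipped$x[, 2]        <- -flipped$x[, 2]

recon_orig <- pc$x %*% t(pc$rotation)
recon_flip <- flipped$x %*% t(flipped$rotation)
print(all.equal(recon_orig, recon_flip))
```

The columns of the rotation matrix are weights for linear combinations of the standardized variables, so a negative entry only indicates direction relative to the other variables in that column.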


Suzen, Mehmet

2017-Sep-15 11:59 UTC


Usually, PCA is used when there is a large number of features. The
FactoMineR [1] package provides a couple of examples; check the
temperature example. You may also want to consult basic PCA material;
I suggest the book by Chris Bishop [2].


[1] https://cran.r-project.org/web/packages/FactoMineR/vignettes/clustering.pdf
[2] http://www.springer.com/de/book/9780387310732?referer=www.springer.de

Ismail SEZEN

2017-Sep-15 12:12 UTC


First, see the example at https://isezen.github.io/PCA/

You want to "remove variables which are correlated". Correlated among
themselves? If so, why don't you create a Pearson correlation matrix (see ?cor),
define a threshold, and remove the variables that are correlated according to
this threshold? Perhaps I did not understand you correctly; excuse me if so.
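
The correlation-threshold idea can be sketched as follows. The threshold of 0.9 and the rule "drop the later variable of each highly correlated pair" are assumptions made for illustration; neither was specified in the thread.

```r
x <- iris[, 1:4]                         # numeric columns only
r <- cor(x, method = "pearson")          # 4 x 4 Pearson correlation matrix
print(round(r, 3))

threshold <- 0.9                         # assumed cut-off, tune for your data

# Keep only the upper triangle so each pair is inspected once.
r_upper <- abs(r)
r_upper[lower.tri(r_upper, diag = TRUE)] <- 0

# A variable is dropped when it correlates above the threshold with any
# variable that comes before it in the column order.
to_drop <- colnames(r_upper)[apply(r_upper, 2, max) > threshold]
kept    <- setdiff(colnames(x), to_drop)

print(to_drop)
print(kept)
```

Which member of a correlated pair is removed depends on column order here, so it may be worth reordering the columns to put the variables you prefer to keep first.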

For the iris dataset, each variable will correlate with PC1 as far as it can,
the remainder will correlate with PC2, and so on. Hence, you can identify which
variables are similar in terms of VARIANCE. You can see this if you work
through the example I linked above.

In PCA, you can also calculate the correlations between the variables and the
PCs, but this shows you how the PCs are affected by these variables. I don't
know how you plan to carry out the feature selection, so I hope this helps.
Also note the resources section at the end of the example.
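
For reference, a short sketch of the variable-to-PC correlations mentioned above, assuming the PCA was run on the four standardized iris measurements (the original call was not shown). With standardized inputs, these correlations equal the loadings multiplied by each component's standard deviation.

```r
x  <- iris[, 1:4]
pc <- prcomp(x, center = TRUE, scale. = TRUE)   # assumed call

# Direct route: correlate each original variable with each score column.
cor_direct <- cor(x, pc$x)

# Equivalent route for standardized data: scale each loading column by
# the standard deviation of its component.
cor_from_loadings <- sweep(pc$rotation, 2, pc$sdev, "*")

print(round(cor_direct, 3))
print(all.equal(cor_direct, cor_from_loadings))
```

Rows with large absolute values in the same column belong to variables that move together along that component.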

isezen

Bert Gunter

2017-Sep-15 23:40 UTC


This list is about R programming, not statistics, although they do often
intersect. Nevertheless, this discussion seems to be all about the latter,
not the former, so I think you would do better bringing it to a statistics
list like stats.stackexchange.com rather than here.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

