rm(list=ls())
yx.df<-read.csv("c:/MK-2-72.csv",sep=',',header=T,dec='.')
dim(yx.df)
#get response y and X matrix
y<-yx.df[,1]
x<-yx.df[,2:643]
#convert to matrix
mat<-as.matrix(x)
#get row number
rownum<-nrow(mat)
#remove constant columns
mat1<-mat[,apply(mat,2,function(.col)!(all(.col[1]==.col[2:rownum])))]
dim(yx.df)
dim(mat1)
#remove columns where the proportion of zeros is >0.95
mat2<-mat1[,apply(mat1,2,function(.col)!(sum(.col==0)/rownum>0.95))]
dim(yx.df)
dim(mat2)
#remove columns with sd<0.5
mat3<-mat2[,apply(mat2,2,function(.col)!all(sd(.col)<0.5))]
dim(yx.df)
dim(mat3)
#PCA analysis
mat3.pr<-prcomp(mat3,cor=T)
summary(mat3.pr,loading=T)
pre.cmp<-predict(mat3.pr)
cmp<-pre.cmp[,1:3]
cmp
DF<-cbind(y,cmp)
DF<-as.data.frame(DF)
names(DF)<-c('y','p1','p2','p3')
DF
summary(lm(y~p1+p2+p3,data=DF))
mat3.pr<-prcomp(DF,cor=T)
summary(mat3.pr)
pre<-predict(mat3.pr)
pre1<-pre[,1:3]
pre1
colnames(pre1)<-c("x1","x2","x3")
pre1
pc<-cbind(y,pre1)
pc<-as.data.frame(pc)
lm.pc<-lm(y~x1+x2+x3,data=pc)
summary(lm.pc)
Above is my PCA code. After running it, the values of the first three PCs are
very large. Why? The R-squared of the fit is also poor. Below are my values for
the first three PCs:
> pre1
PC1 PC2 PC3
[1,] -15181.5190 1944.392700 -1074.326182
[2,] -32152.4533 1007.113729 3201.361408
[3,] -15836.5362 2117.988273 -555.799383
[4,] -1618.5561 1481.020337 255.530132
[5,] -5407.5030 1975.779398 -84.646283
[6,] -9662.1949 2611.220928 -417.435782
[7,] -30488.2102 577.385588 1853.420297
[8,] -2135.2563 -4506.112873 1382.413284
[9,] -1584.2796 -4645.142062 929.146895
[10,] -668.7664 -4876.250486 177.691446
[11,] -2188.5914 -4495.203080 1432.428127
[12,] -19633.9581 2159.000138 -1598.710872
[13,] -26849.1088 -515.574085 -2683.552623
[14,] -9492.9503 -4868.648205 1236.986097
[15,] -13857.6517 -4810.228193 1296.342199
[16,] -11596.5097 -8181.631403 462.913210
[17,] -25948.6564 -746.442386 -3415.426682
[18,] 15386.4477 709.974524 555.160973
[19,] 21642.7516 1163.456075 -609.437740
[20,] 22236.7094 675.562564 -136.992578
[21,] 14354.9927 611.996274 -4.867054
[22,] 12569.9493 1111.842240 585.540985
[23,] 20739.0219 3078.679745 1662.902248
[24,] 9472.0249 648.769910 381.487034
[25,] 17299.5307 1424.712428 1522.311676
[26,] 13231.2735 587.761915 170.448061
[27,] 10843.5590 705.485396 -79.931518
[28,] 9402.8803 -1978.216853 -1534.244078
[29,] 13094.9525 212.042937 -363.941664
[30,] 9337.3522 537.885230 189.558999
[31,] 7747.1347 -141.004825 -1664.082447
[32,] 4640.1161 -1489.652284 -3584.574135
[33,] 13241.5054 175.630689 -486.250927
[34,] 3867.2204 814.830143 1584.358007
[35,] 8614.5030 708.274447 814.295587
[36,] -18815.6774 -480.311541 1248.369916
[37,] -1860.0810 1195.557861 269.322703
[38,] 7172.0057 4.216905 -1191.448702
[39,] -7233.2271 -2361.951658 -235.293358
[40,] 1841.3548 1187.225488 632.116420
[41,] 12465.2336 367.822405 160.751014
[42,] -39021.7259 1972.333778 3167.504098
[43,] 13098.7736 -424.152058 -567.846037
[44,] 9793.7729 -559.084900 -210.696126
[45,] 13111.1861 22.772626 -318.242722
[46,] 13169.0604 7.808885 -363.995563
[47,] 3306.6293 -694.908211 -642.996604
[48,] 10779.8582 -989.175596 -1619.861931
[49,] 10872.6913 -747.979343 -1375.317959
[50,] -3057.5633 1838.449143 1454.886518
[51,] -6854.9316 2338.753165 1113.510561
[52,] -15077.1823 1917.776905 -1158.158633
[53,] -45862.8305 1173.157521 -1707.293955
[54,] -14294.1553 1716.708462 -1794.064434
[55,] 24645.0508 2519.904889 1424.233563
[56,] 23303.5998 2250.088386 839.587354
[57,] 18865.5231 897.566446 36.240598
[58,] 227.2659 -6582.661199 -712.892569
[59,] 15336.8371 722.953549 593.903314
[60,] 13030.8715 228.509670 -312.933654
[61,] 5826.0388 331.077814 -53.417878
[62,] 13150.4446 -437.612023 -608.342969
[63,] 11728.3897 -83.151510 569.007995
[64,] 11021.5720 -869.425283 -1216.724017
[65,] 9625.3142 137.388994 138.735249
[66,] -15905.2704 3735.547166 421.846379
[67,] -15539.7628 3331.399648 104.886572
[68,] -2294.9924 1648.164750 822.075221
[69,] -10120.0153 1558.766306 -333.378256
[70,] -24241.4554 -533.700229 1516.603088
[71,] -1036.6022 -4782.136067 475.195011
[72,] -24575.2244 2655.599986 -1965.946921
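A note on the scale of these scores: as far as I know, `prcomp()` has no `cor` argument (that one belongs to `princomp()`), so `cor=T` in the code above is not applied and the PCA runs on the unscaled columns. The scores then inherit the units of the highest-variance columns. The sketch below illustrates this on made-up data; the matrix `X` and its column names are assumptions for illustration, not the data from the CSV file.

```r
# Hypothetical data: 72 rows, 5 columns on very different scales
set.seed(1)
X <- cbind(v1 = rnorm(72, sd = 10000),
           v2 = rnorm(72, sd = 2000),
           v3 = rnorm(72, sd = 1),
           v4 = rnorm(72, sd = 0.5),
           v5 = rnorm(72, sd = 50))

# Unscaled PCA (covariance matrix): scores keep the units of the
# largest-variance columns
pca_raw <- prcomp(X)

# Scaled PCA (correlation matrix): each column is standardized first;
# in prcomp() this is requested with scale. = TRUE
pca_scaled <- prcomp(X, scale. = TRUE)

range(pca_raw$x[, 1])     # spans thousands
range(pca_scaled$x[, 1])  # spans a few units

# princomp() is the function that accepts cor = TRUE
pca_cor <- princomp(X, cor = TRUE)
```

Whether scaling is appropriate depends on whether the columns share comparable units; large scores are not wrong in themselves, they only reflect the scale of the input.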
The fit result is below:
Call:
lm(formula = y ~ x1 + x2 + x3, data = pc)
Residuals:
     Min       1Q   Median       3Q      Max
-1.29638 -0.47622  0.01059  0.49268  1.69335
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.613e+00  8.143e-02  68.932  < 2e-16 ***
x1          -3.089e-05  5.150e-06  -5.998 8.58e-08 ***
x2          -4.095e-05  3.448e-05  -1.188    0.239
x3          -8.106e-05  6.412e-05  -1.264    0.210
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.691 on 68 degrees of freedom
Multiple R-squared: 0.3644, Adjusted R-squared: 0.3364
F-statistic: 12.99 on 3 and 68 DF, p-value: 8.368e-07
x2 and x3 are not significant. In principle, after PCA the PCs should be
significant, but in my data they are not. Why?
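One point that may be relevant here: PCA orders components by how much variance of the predictors they capture and never looks at y, so there is no guarantee that the leading PCs are significant in a regression. The sketch below shows this on simulated data where y depends only on a low-variance direction; the names `x_big`, `x_small`, and `scores` are hypothetical and the data are invented for illustration.

```r
set.seed(42)
n <- 72
x_big   <- rnorm(n, sd = 10)  # high variance, unrelated to y
x_small <- rnorm(n, sd = 1)   # low variance, drives y
y <- 2 * x_small + rnorm(n, sd = 0.5)

# Unscaled PCA, so PC1 follows the high-variance column x_big
pca <- prcomp(cbind(x_big, x_small))
scores <- as.data.frame(pca$x)  # columns PC1 and PC2

# Regress y on the component scores
fit <- summary(lm(y ~ PC1 + PC2, data = scores))
fit$coefficients
```

In this setup the second component carries nearly all of the relationship with y, even though it explains far less of the predictor variance than the first. Methods that use y when building components, such as partial least squares, are a common alternative when prediction is the goal.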
--
View this message in context:
http://old.nabble.com/after-PCA%2C-the-pc-values-are-so-large%2C-wrong--tp26240926p26240926.html
Sent from the R help mailing list archive at Nabble.com.