Dear all:
I am comparing the PLS outputs of R and SAS for the following data set:
Y x1 x2 x3
3 6 2 2
3 1 5 5
4 7 4 1
5 6 5 6
2 4 3 2
8 5 0 9
where Y is the dependent variable and x1, x2, x3 are the independent
variables. I found several PLS algorithms in R (NIPALS, SIMPLS, kernel PLS). SAS
has SIMPLS and NIPALS.
The following are the NIPALS calculations of the regression coefficients for
the above data using 2 principal components:
Using R:
x1 0.4002324
x2 -0.2679829
x3 0.5684680
Using SAS:
x1 0.4671608452
x2 -.1537662492
x3 0.6090024992
Why is the discrepancy so large? I observed that SAS and Minitab give the
same output, but the R output is very different. The SIMPLS algorithm also
produces different outputs in R and SAS.
Any clarification on this matter will be greatly appreciated.
Sincerely,
Kyle Rogers
On 2007-05-21 21:59, Kyle Rogers wrote:
> Any clarification on this matter will be greatly appreciated.

The R result is calculated after centering the x1-x3 matrix around its
column means. The SAS result is calculated without any preprocessing of the
data. Unfortunately, the pls package has no easy option to turn data
centering off. If you really want to turn it off, you must comment out the
corresponding lines in, for example, simpls.fit:

    Xmeans <- colMeans(X)
    X <- X - rep(Xmeans, each = nobj)
    Ymeans <- colMeans(Y)
    Y <- Y - rep(Ymeans, each = nobj)

I do not know what your data are, but you probably WANT to center them
around the column means. Uncentered PLS, PCR and other multivariate
regression methods should be considered only if all columns of X are in the
same units and no intercept term is expected (for example, spectral data).
Any other situation requires at least centering. If you fit the uncentered
version, you should compare its RMSEP with that of the centered version and
choose the better variant.

Regards,
Lukasz
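For anyone who wants to check the effect of preprocessing without editing the package source, here is a minimal base-R sketch of single-response PLS (PLS1) with NIPALS-style deflation. The helper name pls1_coef and its center/scale flags are hypothetical and are not part of the pls package or of SAS. The scale flag is an extra assumption of mine, included only so that standardization can be toggled as well. Which setting reproduces which program's numbers is left for the reader to verify.

```r
# Data from the original post
d <- data.frame(
  Y  = c(3, 3, 4, 5, 2, 8),
  x1 = c(6, 1, 7, 6, 4, 5),
  x2 = c(2, 5, 4, 5, 3, 0),
  x3 = c(2, 5, 1, 6, 2, 9)
)

# Hypothetical helper (not from any package): PLS1 regression coefficients
# on the original scale of X, with optional centering and scaling.
pls1_coef <- function(X, y, ncomp, center = TRUE, scale = FALSE) {
  X <- as.matrix(X)
  y <- as.numeric(y)
  nm <- colnames(X)
  if (center) {
    X <- sweep(X, 2, colMeans(X))   # subtract column means
    y <- y - mean(y)
  }
  # sd is shift-invariant, so it does not matter whether X was centered first
  sds <- if (scale) apply(X, 2, sd) else rep(1, ncol(X))
  X <- sweep(X, 2, sds, "/")

  W <- P <- matrix(0, ncol(X), ncomp)  # weights and X-loadings
  q <- numeric(ncomp)                  # y-loadings
  for (a in seq_len(ncomp)) {
    w  <- crossprod(X, y)              # weight vector proportional to X'y
    w  <- w / sqrt(sum(w^2))
    sc <- X %*% w                      # scores
    tt <- sum(sc^2)
    p  <- crossprod(X, sc) / tt
    q[a] <- sum(y * sc) / tt
    X <- X - sc %*% t(p)               # deflate X
    y <- y - drop(sc) * q[a]           # deflate y
    W[, a] <- w
    P[, a] <- p
  }
  b <- W %*% solve(crossprod(P, W), q) # B = W (P'W)^{-1} q
  setNames(drop(b) / sds, nm)          # undo the column scaling
}

X <- d[, c("x1", "x2", "x3")]
print(pls1_coef(X, d$Y, ncomp = 2, center = TRUE))   # centered
print(pls1_coef(X, d$Y, ncomp = 2, center = FALSE))  # no preprocessing
print(pls1_coef(X, d$Y, ncomp = 2, scale = TRUE))    # centered and scaled
```

A quick sanity check on the sketch: with centering and ncomp = 3 (as many components as predictors), the coefficients should agree with the slopes from lm(Y ~ x1 + x2 + x3, data = d).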