Hello,

I am trying to understand how to use the "mvr" function in the pls package for R, with the R "pls Package" document dated 18 May 2005 as guidance. My data set is a 12 x 12 data frame created by reading in a table of values with the command:

    volumes <- read.table("THA_vol.txt", header = TRUE)

from which I then created a data.frame called "vol". My response variable is in the last column of the "vol" data frame and my predictor variables are in columns 1 through 11.

To familiarize myself with this approach I have used the NIR data set (included in the pls package). The following command works with the NIR data set:

    NIR.pcr <- pcr(y ~ X,6,data=NIR,validation="CV")

However, when I run the following script, which substitutes my data set into the above call (with the variable names modified accordingly):

    y <- vol[,12]
    X <- vol[,1:11]
    ans.pcr <- pcr(y ~ X,6,data=vol,validation="CV")

I get the following error:

    Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
            invalid variable type

I have looked at the NIR data set in the pls package and tried to see how its structure differs from that of my data set (other than in size). Does anyone have any insight they might be willing to share?

Thank you kindly.

- Jim
Jim,

I had some of the same difficulties. The NIR data frame consists of a column of y values and a matrix of X variables (until looking at this data set, I had not realized that data frames could hold matrices). So, after consulting the R-help sages, I turned my data into an identical structure using something like this:

    dataSet <- data.frame(y = vol[, 12])
    dataSet$X <- data.matrix(vol[, 1:11])
    ans.pcr <- pcr(y ~ X, 6, data = dataSet, validation = "CV")

If there's a more elegant way of doing this without using data frames of matrices, I'd be interested as well.

HTH,
Robert

-----Original Message-----
From: Jim BRINDLE [mailto:j_brindle at hotmail.com]
Sent: Wednesday, June 01, 2005 5:03 PM
To: r-help at stat.math.ethz.ch
Subject: [R] "mvr" function

[...]
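The structure Robert describes (a data frame whose second column is itself a matrix) can be checked with base R alone. This is only a sketch: the random values below are an assumed stand-in for THA_vol.txt, which is not available here, and only the shape (12 x 12) matches the original data.

```r
# Synthetic stand-in for the 12 x 12 data frame `vol`.
# ASSUMPTION: random values replace the real THA_vol.txt contents.
set.seed(1)
vol <- as.data.frame(matrix(rnorm(144), nrow = 12))

# Response in column 12, predictors in columns 1-11, as in the thread.
dataSet <- data.frame(y = vol[, 12])
dataSet$X <- data.matrix(vol[, 1:11])   # stored as ONE matrix-valued column

str(dataSet)                 # shows y plus a numeric matrix X
print(is.matrix(dataSet$X))  # the predictors are a matrix, not a data frame
print(dim(dataSet$X))        # rows x predictor columns
print(ncol(dataSet))         # number of top-level columns in dataSet
```

Running str() on the NIR data set and on your own data frame side by side is probably the quickest way to spot this kind of structural difference.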
Jim BRINDLE writes:

> volumes <- read.table("THA_vol.txt", header = TRUE)
>
> and then created a data.frame called "vol". My response variable is
> in the last column of the "vol" data frame and my predictor variables
> are in columns 1 through 11.

[...]

> y <- vol[,12]
> X <- vol[,1:11]
> ans.pcr <- pcr(y ~ X,6,data=vol,validation="CV")

There are two problems here:

1) X is a data frame, not a matrix. This is what causes the error message.

2) You specify in the call that pcr should look in the data frame `vol' for variables called `y' and `X'. Presumably they don't exist there, but in the global environment (because of the assignments `y <- vol[,12]', etc.). This will not lead to an error, because pcr will find the variables anyway, but it might lead to confusion or errors if you later modify those variables.

The first problem can be overcome by doing

    X <- as.matrix(vol[,1:11])

and the second one by

    ans.pcr <- pcr(y ~ X, 6, validation = "CV")

However, there are (as always in R :) several ways of accomplishing the same thing. One solution is simply

    ans.pcr <- pcr(V12 ~ ., 6, data = vol, validation = "CV")

(where V12 must be replaced with the name of the 12th variable of vol; see names(vol)). This formula tells pcr to use V12 as the response and the remaining variables in vol as predictors.

A more general solution is to say

    vol2 <- data.frame(y = vol[,12], X = I(as.matrix(vol[,1:11])))
    ans.pcr <- pcr(y ~ X, 6, data = vol2, validation = "CV")

The I() makes R store X as a matrix in vol2, instead of as 11 separate variables. This is handy for cases where you have several matrices.

The manual page for `lm' and the R manual `An Introduction to R' (chapter 11) are good references for the formula handling in R.

-- 
HTH,
Bjørn-Helge Mevik
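The points above can be tried without the real data. The sketch below uses base R's model.frame() (the function named in the error message) to show the failure with a data-frame X and the two suggested fixes. The random data and the names res, mf, mf_dot, fit and fit_dot are assumptions made for illustration; V12 is simply the name as.data.frame() gives the 12th column here, and the wording of the error message may differ between R versions.

```r
# ASSUMPTION: synthetic 12 x 12 data in place of THA_vol.txt;
# as.data.frame() names the columns V1 ... V12.
set.seed(1)
vol <- as.data.frame(matrix(rnorm(144), nrow = 12))

y <- vol[, 12]
X <- vol[, 1:11]          # still a data frame, i.e. a list of columns

# Problem 1: a data frame is not a valid term in a model frame.
res <- tryCatch(model.frame(y ~ X),
                error = function(e) conditionMessage(e))
print(res)                # the error text

# Fix A: keep the predictors as one matrix column, protected by I().
vol2 <- data.frame(y = vol[, 12], X = I(as.matrix(vol[, 1:11])))
mf <- model.frame(y ~ X, data = vol2)
print(dim(mf$X))

# Fix B: leave vol as it is and let `.` stand for all other columns.
mf_dot <- model.frame(V12 ~ ., data = vol)
print(ncol(mf_dot))

# The corresponding pcr() calls, run only if the pls package is installed.
if (requireNamespace("pls", quietly = TRUE)) {
  fit     <- pls::pcr(y ~ X, 6, data = vol2, validation = "CV")
  fit_dot <- pls::pcr(V12 ~ ., 6, data = vol, validation = "CV")
}
```

Both fixes pass data = explicitly, so the variables are looked up in the data frame rather than in the global environment, which also avoids the second problem.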