thr3ads.net - R help - [R] Predicting with a principal component regression model: "non-conformable arguments" error [Apr 2011]

Alison Callahan

2011-Apr-18 22:18 UTC

[R] Predicting with a principal component regression model: "non-conformable arguments" error

Hello all,

I have generated a principal components regression model using the pcr()
function from the PLS package (R version 2.12.0).  I am getting a
"non-conformable arguments" error when I try to use the predict() function
on new data, but only when I try to read in the new data from a separate
file.

More specifically, when my data looks like this

#########training data #1#################

var1          var2           var3             response            train
1                2              type1            33                     TRUE
2                23            type2            44                     TRUE
.....
   .......
18              11            type1            45                      FALSE


and I use the predict() function from the PLS package as in the example
from http://rss.acs.unt.edu/Rdoc/library/pls/html/predict.mvr.html, e.g.

###################################
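library(pls)  # provides pcr() and the predict() method used below

mydata <- read.csv("mydata.csv", header=TRUE)

mydata <- data.frame(mydata)

# fit on the rows flagged train = TRUE, then predict the remaining rows
pcrmodel <- pcr(response ~ var1+var2+var3, data = mydata[mydata$train,])

predict(pcrmodel, type = "response", newdata = mydata[!mydata$train,])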

###################################

the code works, and the model predicts new values for the "response"
variable rows where train=FALSE.

However, as soon as I put the rows where train = FALSE into a separate file
and remove the "train" column so that my training data looks like this:

#########training data #2 ################
var1          var2           var3             response
1                2              type1            33
2                23            type2            44
.....


and my new test data, saved in a separate file (say "newdata.csv") looks
like this

########test data in separate file, newdata.csv ###############
var1          var2           var3             response
3                5              type1            23
4                7              type2            30
.....
18              11            type1            45

if I train a PCR model using the training data #2 and try to predict with
the resulting model and the data from "newdata.csv", e.g.,

##################################
trainingdata <- read.csv("mydata_without_train_column.csv", header=TRUE)

trainingdata <- data.frame(trainingdata)

testingdata <- read.csv("newdata.csv", header=TRUE)

testingdata <- data.frame(testingdata)

pcrmodel2 <- pcr(response ~ var1+var2+var3, data = trainingdata)

# predict with the model fitted just above (pcrmodel2)
predict(pcrmodel2, type = "response", newdata = testingdata)
##############################

I get the following error:

"Error in newX %*% B : non-conformable arguments"

I don't understand why I get this error only when I put the non-training
data into a separate file from the training data and load it as a separate
object. Any help is appreciated,

Alison

Alison Callahan

2011-Apr-26 18:26 UTC

Hello again all,

I am responding to my own earlier post about a "non-conformable arguments"
error with the predict() function of the pls package
(http://cran.r-project.org/web/packages/pls/) in R 2.13.0 (running in Ubuntu
10.10).

I believe I have narrowed down the cause of the error. My new understanding
is that if the test data to be predicted using a regression model (where the
test data is passed in as 'newdata' to the predict() function) does not
contain all possible levels of factors in the training data then the
predict() function returns a "non-conformable arguments" error.

However, this seems like an odd behaviour to me. Surely not all new data for
which the dependent variable(s) are to be predicted will contain all levels
of a factor present in the training data. Can someone shed some light on why
the predict() function of the pls package has this behaviour? And how to
avoid it if possible in a way that doesn't involve users having to insert
dummy values in new data?
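
A minimal base-R sketch of the likely mechanism, under the assumption that
predict() builds its design matrix from the factor levels it finds in
'newdata' (which is consistent with the error coming from newX %*% B). The
data frames train, test_subset and test_file, their values, and the stand-in
coefficient matrix B are all made up for illustration; nothing here calls the
pls package itself.

```r
# Hypothetical toy data; column names follow the post, values are made up.
train <- data.frame(
  var1     = c(1, 2, 3, 4, 5, 6),
  var2     = c(2, 23, 5, 7, 9, 11),
  var3     = factor(c("type1", "type2", "type3", "type1", "type2", "type3")),
  response = c(33, 44, 23, 30, 41, 45)
)

# Case 1: held-out rows taken as a subset of the same data frame.
# Subsetting keeps the full level set, including levels with no rows.
test_subset <- train[1:2, ]
levels(test_subset$var3)

# Case 2: the same rows read from their own file. The factor is rebuilt
# from the values in that file alone, so "type3" is no longer a level.
test_file <- read.csv(
  text = "var1,var2,var3,response\n1,2,type1,33\n2,23,type2,44",
  stringsAsFactors = TRUE
)
levels(test_file$var3)

# One dummy column is created per non-reference level, so the design
# matrix built from the separate file is one column narrower.
f <- ~ var1 + var2 + var3
X_train  <- model.matrix(f, train)
X_subset <- model.matrix(f, test_subset)
X_file   <- model.matrix(f, test_file)
c(train = ncol(X_train), subset = ncol(X_subset), file = ncol(X_file))

# Stand-in for the fitted coefficients: one row per training column.
B <- matrix(1, nrow = ncol(X_train), ncol = 1)
msg <- tryCatch({ X_file %*% B; "no error" },
                error = function(e) conditionMessage(e))
msg
```

This would also explain why the single-file version works: the rows selected
with mydata[!mydata$train,] still carry every level of var3 from the full
data frame.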

Thanks,

Alison

Alison Callahan

2011-Apr-27 14:16 UTC

Hi Dennis,

My replies are in-line.

On Tue, Apr 26, 2011 at 9:15 PM, Dennis Murphy <djmuser@gmail.com> wrote:
> Hi:
>
> My view, which may well be narrow, is that techniques like PLS and PCR
> are useful fit procedures, but I would be very leery about using them
> as prediction machines. With new data, why should a similar set of
> principal components emerge? Why should the ordering be (close to) the
> same? Why should features present in the training data necessarily be
> present in test data? And if the PCs vary considerably from one set of
> data to another, what's the point of prediction, since the covariate
> set is variable from one iteration to the next? Thinking a little more
> mathematically, why should I believe that the same set of basis
> functions (covariates + PCs) would reasonably apply to future data?
> One problem, as I see it, is that the principal components, when used
> as basis functions, are functions of the training data; in that
> context, why is it believable that they would well predict future
> data? [If this is Greek to you (or 'Kling-on', as one of my friends
> says), the basis functions in regression are the columns of the model
> matrix X, which map to the terms in the 'linear predictor'.] One of
> the potential problems is that the effective dimension of the reduced
> PC space may well change from one data set to the next. If all PCs are
> retained, then there is a serious danger of overfitting, which is a
> serious problem in prediction.
>
> If you're going to contemplate using such models for prediction, I
> would seriously consider looking into model validation procedures;
> they should provide some clue about how well a fitted model predicts
> to new cases. One of the best treatments of the subject I know is
> Frank Harrell's Regression Modeling Strategies book (which I believe
> will have a new edition out within the next couple of months). There
> is a current thread about this topic re logistic regression validation
> where the OP has done a nice job of working through the process -
> Prof. Harrell has chimed in a few times with some nice comments and
> observations. Most of the code to do this kind of thing in R resides
> in the rms package; see ?validate and its related functions. I don't
> know if it can be applied to PLS/PCR models (rather doubtful) but the
> methodology is what is important; e.g., the estimation of optimism in
> various figures of merit (e.g., R^2, MSE) when applied over a number
> of test sets, which provides an indication of how much bias is present
> in the fitted model due to potential overfitting. The process relies
> heavily on bootstrapping, so is in some sense vulnerable to the issues
> that arise with the bootstrap (e.g., population undercoverage), but in
> very large training sets this becomes less of a problem. If you can
> validate a PCR model and provide evidence to back it up, then most
> people (present company included) would have less ammunition to attack
> your prediction model.
>

Thank you for these suggestions. The pls package I am using does include
methods for cross-validation to evaluate the quality of PCR/PLSR models, as
well as for selecting the optimal number of components to use for prediction
with a given model to avoid overfitting. I will also have a look at the
rms package.
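
To make the idea concrete, here is a hand-rolled base-R sketch of choosing the
number of components by cross-validation. It is not the pls implementation
(in pls, cross-validation is requested through the validation argument of
pcr()); the simulated data and the helpers fit_pcr() and predict_pcr() are
hypothetical names invented for this illustration. Note that prediction
projects the new rows onto the loadings estimated from the training rows, so
no new set of principal components is computed for the test data.

```r
set.seed(1)  # reproducible simulated data and fold assignment

# Simulated data (hypothetical): 60 rows, 5 predictors, two of them correlated.
n <- 60
z <- rnorm(n)
X <- cbind(z + rnorm(n, sd = 0.3), z + rnorm(n, sd = 0.3),
           rnorm(n), rnorm(n), rnorm(n))
colnames(X) <- paste0("x", 1:5)
y <- 2 * z + X[, 3] + rnorm(n, sd = 0.5)

# Fit PCR with k components: PCA on the training predictors only,
# then ordinary least squares on the first k score columns.
fit_pcr <- function(X, y, k) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  scores <- pca$x[, seq_len(k), drop = FALSE]
  list(pca = pca, k = k, lm = lm(y ~ ., data = data.frame(y = y, scores)))
}

# Predict: project new rows onto the training loadings, then apply the lm.
predict_pcr <- function(fit, Xnew) {
  scores <- predict(fit$pca, newdata = Xnew)[, seq_len(fit$k), drop = FALSE]
  predict(fit$lm, newdata = data.frame(scores))
}

# 5-fold cross-validated root mean squared error for k = 1..5 components.
folds <- sample(rep(1:5, length.out = n))
cv_rmsep <- sapply(1:5, function(k) {
  sq_err <- unlist(lapply(1:5, function(f) {
    fit <- fit_pcr(X[folds != f, , drop = FALSE], y[folds != f], k)
    (y[folds == f] - predict_pcr(fit, X[folds == f, , drop = FALSE]))^2
  }))
  sqrt(mean(sq_err))
})

best_k <- which.min(cv_rmsep)  # number of components with the lowest CV error
cv_rmsep
best_k
```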
>
> On Tue, Apr 26, 2011 at 11:26 AM, Alison Callahan
> <alison.callahan@gmail.com> wrote:
> > Hello again all,
> >
> > I am responding to my own earlier post about a "non-conformable
> > arguments" error with the predict() function of the pls package
> > (http://cran.r-project.org/web/packages/pls/) in R 2.13.0 (running in
> > Ubuntu 10.10).
> >
> > I believe I have narrowed down the cause of the error. My new
> > understanding is that if the test data to be predicted using a
> > regression model (where the test data is passed in as 'newdata' to the
> > predict() function) does not contain all possible levels of factors in
> > the training data then the predict() function returns a
> > "non-conformable arguments" error.
> >
> > However, this seems like an odd behaviour to me. Surely not all new
> > data for which the dependent variable(s) are to be predicted will
> > contain all levels of a factor present in the training data. Can
> > someone shed some light on why the predict() function of the pls
> > package has this behaviour? And how to avoid it if possible in a way
> > that doesn't involve users having to insert dummy values in new data?
>
> I don't find this odd at all; rather, I find it comforting. From an R
> programming perspective, the factors in your newdata should have
> exactly the same defined levels as those in the training data. You
> could do this with something like
>
> newdata$somefactor <- factor(newdata$somefactor,
>                              levels = levels(trainingdata$somefactor))
>
> What happens if, in future data, one or more new levels of a factor
> arise that were not in the training data used to build the prediction
> model?
>

I absolutely agree with you. New levels for factors in future data that
didn't exist in the training data would of course be a problem for
prediction. However, in my case, I am trying to use predict() on new data
that has a *subset* of the factor levels present in the training data, and I
am getting a "non-conformable arguments" error. For example, my training
data has levels A, B, C, D and E for a given factor, while my test data
contains only levels B, C and D.

Being somewhat new to R, I confused the values of the factor in the new data
with the possible levels of that factor. When I specified that the levels of
the factor in my test data were to be the same as in the training data, I
did not get the "non-conformable arguments" error.
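
For anyone finding this thread later, a small sketch of that fix applied to
every factor column at once. align_levels() is a hypothetical helper written
for this example, not a function from pls or base R, and the training and
testing data frames are made up. It also stops with an error if the new data
contains a level the training data never had, since factor() would otherwise
silently turn those values into NA.

```r
# Hypothetical helper: give each factor column in `newdata` the level set
# of the column with the same name in `training`.
align_levels <- function(newdata, training) {
  for (nm in intersect(names(newdata), names(training))) {
    if (is.factor(training[[nm]])) {
      seen <- unique(as.character(newdata[[nm]]))
      unseen <- setdiff(seen[!is.na(seen)], levels(training[[nm]]))
      if (length(unseen) > 0) {
        stop("column '", nm, "' has levels not seen in training: ",
             paste(unseen, collapse = ", "))
      }
      newdata[[nm]] <- factor(newdata[[nm]], levels = levels(training[[nm]]))
    }
  }
  newdata
}

# Made-up data mirroring the example: training has A-E, test has only B-D.
training <- data.frame(x = 1:5, g = factor(c("A", "B", "C", "D", "E")))
testing  <- data.frame(x = 6:8, g = factor(c("B", "C", "D")))

cols_before <- ncol(model.matrix(~ x + g, testing))   # built from B, C, D only
testing     <- align_levels(testing, training)
cols_after  <- ncol(model.matrix(~ x + g, testing))   # built from A-E
cols_train  <- ncol(model.matrix(~ x + g, training))

c(before = cols_before, after = cols_after, training = cols_train)
```

The values in the test data are unchanged; only the declared level set
differs, which is what determines how many dummy columns are generated.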

Thanks!

Alison
