I was under the impression that in PLS analysis R2 is calculated as
1 - (residual sum of squares) / (total sum of squares). Is this still what
you are referring to? I am aware of the R2 from linear regression, which
measures how well two variables are correlated, but the equation above seems
different to me. Could you explain whether they are the same concept?
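
To make the two definitions concrete, here is a minimal sketch in base R. The
vectors obs and pred are made up for illustration and do not come from any
model in this thread; the point is only that the two formulas need not agree
when the predictions are biased.

```r
# Made-up observed and predicted values, for illustration only.
obs  <- c(1, 2, 3, 4, 5)
pred <- c(2, 3, 4, 5, 6)   # perfectly correlated with obs, but shifted by +1

# "Traditional" R2: 1 - (residual sum of squares) / (total sum of squares)
rss <- sum((obs - pred)^2)
tss <- sum((obs - mean(obs))^2)
r2_traditional <- 1 - rss / tss

# Squared Pearson correlation between observed and predicted values
r2_corr <- cor(obs, pred)^2

print(c(traditional = r2_traditional, corr = r2_corr))
```

For an ordinary least-squares fit with an intercept, evaluated on the data
used to fit it, the two quantities coincide; they can diverge for biased or
held-out predictions. As far as I understand, the Rsquared that caret reports
is of the squared-correlation kind, while Q2 is usually described as the
1 - PRESS/TSS form computed from cross-validated predictions.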
Charles
On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn <mxkuhn@gmail.com> wrote:
> > Is there some literature that you make that statement?
>
> No, but there isn't literature on changing a lightbulb with a duck
> either.
>
> > Are these papers incorrect in using these statistics?
>
> Definitely, if they convert 3+ categories to integers (but there are
> specialized R^2 metrics for binary classification models). Otherwise, they
> are just using an ill-suited "score".
>
> How would you explain such an R^2 value to someone? R^2 is a function of
> correlation between the two random variables. For two classes, one of them
> is binary. What does it mean?
>
> Historically, models rooted in computer science (e.g. neural networks) used
> RMSE or SSE to fit models with binary outcomes, and that *can* work
> well.
>
> However, I don't think that communicating R^2 is effective. Other
> metrics (e.g. accuracy, Kappa, area under the ROC curve, etc.) are designed to
> measure the ability of a model to classify and work well. With 3+
> categories, I tend to use Kappa.
>
> Max
>
>
>
>
> On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr
> <deter088@umn.edu> wrote:
>
>> Thank you for your response, Max. Is there some literature that you make
>> that statement? I am confused, as I have seen many publications that
>> contain R^2 and Q^2 following PLSDA analysis. The analysis usually is to
>> discriminate groups (i.e. classification). Are these papers incorrect in
>> using these statistics?
>>
>> Regards,
>> Charles
>>
>>
>> On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn <mxkuhn@gmail.com> wrote:
>>
>>> Charles,
>>>
>>> You should not be treating the classes as numeric (is virginica really
>>> three times setosa?). Q^2 and/or R^2 are not appropriate for
>>> classification.
>>>
>>> Max
>>>
>>>
>>> On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr <deter088@umn.edu> wrote:
>>>
>>>> I have discovered one of my errors. The timematrix was unnecessary and
>>>> an unfortunate habit I brought from another package. The following
>>>> provides the same R2 values, as it should; however, I still don't know
>>>> how to retrieve Q2 values. Any insight would again be appreciated:
>>>>
>>>> library(caret)
>>>> library(pls)
>>>>
>>>> data(iris)
>>>>
>>>> # needed to convert to numeric in order to do regression
>>>> # I don't fully understand this, but if I left it as a factor I would
>>>> # get an error following the summary function
>>>> iris$Species=as.numeric(iris$Species)
>>>> inTrain1=createDataPartition(y=iris$Species,
>>>>                              p=.75,
>>>>                              list=FALSE)
>>>>
>>>> training1=iris[inTrain1,]
>>>> testing1=iris[-inTrain1,]
>>>>
>>>> ctrl1=trainControl(method="cv",
>>>>                    number=10)
>>>>
>>>> plsFit2=train(Species~.,
>>>>               data=training1,
>>>>               method="pls",
>>>>               trControl=ctrl1,
>>>>               metric="Rsquared",
>>>>               preProc=c("scale"))
>>>>
>>>> data(iris)
>>>> training1=iris[inTrain1,]
>>>> datvars=training1[,1:4]
>>>> dat.sc=scale(datvars)
>>>>
>>>> pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
>>>>              ncomp=3, method="oscorespls", data=training1)
>>>>
>>>> x=crossval(pls.dat, segments=10)
>>>>
>>>> summary(x)
>>>> summary(plsFit2)
>>>>
>>>> Regards,
>>>> Charles
>>>>
>>>> On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr <deter088@umn.edu> wrote:
>>>>
>>>> > Greetings,
>>>> >
>>>> > I have been exploring the use of the caret package to conduct some
>>>> > plsda modeling. Previously, I have come across methods that result in
>>>> > an R2 and Q2 for the model. Using the 'iris' data set, I wanted to see
>>>> > if I could accomplish this with the caret package. I use the following
>>>> > code:
>>>> >
>>>> > library(caret)
>>>> > library(pls)   # needed for plsr() and crossval() used below
>>>> > data(iris)
>>>> >
>>>> > # needed to convert to numeric in order to do regression
>>>> > # I don't fully understand this, but if I left it as a factor I would
>>>> > # get an error following the summary function
>>>> > iris$Species=as.numeric(iris$Species)
>>>> > inTrain1=createDataPartition(y=iris$Species,
>>>> >                              p=.75,
>>>> >                              list=FALSE)
>>>> >
>>>> > training1=iris[inTrain1,]
>>>> > testing1=iris[-inTrain1,]
>>>> >
>>>> > ctrl1=trainControl(method="cv",
>>>> >                    number=10)
>>>> >
>>>> > plsFit2=train(Species~.,
>>>> >               data=training1,
>>>> >               method="pls",
>>>> >               trControl=ctrl1,
>>>> >               metric="Rsquared",
>>>> >               preProc=c("scale"))
>>>> >
>>>> > data(iris)
>>>> > training1=iris[inTrain1,]
>>>> > datvars=training1[,1:4]
>>>> > dat.sc=scale(datvars)
>>>> >
>>>> > n=nrow(dat.sc)
>>>> > dat.indices=seq(1,n)
>>>> >
>>>> > timematrix=with(training1,
>>>> >                 classvec2classmat(Species[dat.indices]))
>>>> >
>>>> > pls.dat=plsr(timematrix ~ dat.sc,
>>>> >              ncomp=3, method="oscorespls", data=training1)
>>>> >
>>>> > x=crossval(pls.dat, segments=10)
>>>> >
>>>> > summary(x)
>>>> > summary(plsFit2)
>>>> >
>>>> > I see two different R2 values and I cannot figure out how to get the
>>>> > Q2 value. Any insight as to what my errors may be would be
>>>> > appreciated.
>>>> >
>>>> > Regards,
>>>> >
>>>> > --
>>>> > Charles
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Charles Determan
>>>> Integrated Biosciences PhD Student
>>>> University of Minnesota
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Max
>>>
>>
>>
>>
>> --
>> Charles Determan
>> Integrated Biosciences PhD Student
>> University of Minnesota
>>
>
>
>
> --
>
> Max
>
--
Charles Determan
Integrated Biosciences PhD Student
University of Minnesota