Dear R Help Team,

My research group and I use R scripts for our multivariate data screening routines. During routine use, we encountered some inconsistencies within the predict() function of the R stats package. Through internal research, we were unable to find the reason for this and have decided to contact your help team with the following issue:

The predict() function is used once to predict the class membership of a new sample (type = "class") on a trained linear SVM model for distinguishing two classes (using the caret package). It is then used again to obtain the probability of class membership (type = "prob"). Both results are then presented in an R Shiny output. Within the routine, we noticed two samples (out of 100+) where the class prediction and the probability prediction did not match: the predicted probability of one class (52%) did not agree with the class membership returned by predict(). We use the same seed, and the discrepancy is reproducible for this sample. The same problem did not occur with other trained models (LDA, random forest, radial SVM, ...).

Is there a weighting of classes within the prediction function, or is the classification threshold not at 50% (a majority vote)? If you have another explanation for this discrepancy, please let us know.

PS: If this is an issue based on the model training function of the caret package and therefore not your responsibility, please let us know.

Thank you in advance for your support!

Yours sincerely,
Sabine Milbert
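To locate samples like the two described above, one can compare the class output with the class implied by the highest predicted probability. This is a minimal base-R sketch: find_mismatches() and the toy values are hypothetical and only illustrate the check, and it assumes the probability columns are named after the class levels, as in caret's type = "prob" output.

```r
# Hypothetical helper: return the row indices where the predicted class
# differs from the class that has the highest predicted probability.
#   pred_class : factor of class predictions
#   pred_prob  : data frame or matrix of probabilities, one column per level
find_mismatches <- function(pred_class, pred_prob) {
  pred_prob <- as.matrix(pred_prob)
  # Class implied by a simple "largest probability wins" rule
  prob_class <- colnames(pred_prob)[max.col(pred_prob, ties.method = "first")]
  which(as.character(pred_class) != prob_class)
}

# Illustrative toy values only (not the data from this thread)
pred_class <- factor(c("A", "B", "A"), levels = c("A", "B"))
pred_prob  <- data.frame(A = c(0.90, 0.20, 0.48), B = c(0.10, 0.80, 0.52))
find_mismatches(pred_class, pred_prob)  # returns 3
```

Row 3 is flagged because the class output says "A" while the probabilities favour "B" (0.52), which is the pattern reported in the message.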
At 11:12 on 22/09/2023, Milbert, Sabine (LGL) wrote:
> Dear R Help Team,
> <SNIP>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello,

I cannot tell what is going on, but I would like to make a correction to your post.

predict() is a generic function with methods for objects of several classes in many packages. In base package stats you will find methods for objects (fits) of class lm, glm and others; see ?predict. The method you are asking about is predict.train, defined in package caret, not in package stats.

To see which predict method is being called, check

class(your_fit)

Hope this helps,

Rui Barradas
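Rui's point about dispatch can be checked without caret. This minimal sketch uses an lm fit on the built-in mtcars data purely as a stand-in for the caret model; getS3method() returns the method that S3 dispatch selects for a given class. For a caret::train() fit, class() should include "train", so the method would be caret's predict.train; that part is only shown as a comment because it needs caret and the original model.

```r
# predict() is an S3 generic: the method is chosen from class(fit).
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)  # "lm"

# The method dispatch would use for this class
method_used <- getS3method("predict", class(fit)[1])
identical(method_used, stats:::predict.lm)  # TRUE

# For a model fitted with caret::train() (not run here, requires caret):
# class(caret_fit)                  # should include "train"
# getS3method("predict", "train")   # caret's predict.train
```

methods(predict) lists every predict method currently available, which shows how many packages extend the generic.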
On Fri, 22 Sep 2023 10:12:51 +0000
"Milbert, Sabine (LGL)" <Sabine.Milbert at lgl.bayern.de> wrote:

> PS: If this is an issue based on the model training function of the
> caret package and therefore not your responsibility, please let us
> know.

Indeed, as Rui Barradas said, predict() is a generic function. Calling it with your model as an argument resolves to a function in the caret package.

It's hard to say without looking at your code and data (the R-help posting guide has some hints on how to prepare a reproducible example), but I think that the caret package fits your linear SVM models using kernlab::ksvm, and then predict() resolves to a combination of kernlab::predict (potentially with the argument type = "probabilities") and kernlab::lev. Try replicating your results using just the kernlab package.

-- 
Best regards,
Ivan
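Following Ivan's suggestion, the comparison can be reproduced with kernlab alone. This is a sketch under stated assumptions: it uses the built-in iris data (versicolor vs. virginica) as a stand-in for the unavailable screening data, and it assumes caret's linear SVM behaves like a plain ksvm() fit with kernel = "vanilladot" and prob.model = TRUE, which this thread does not confirm. One plausible mechanism, also not established in the thread: according to the kernlab documentation, the probability model is fitted on output from a 3-fold cross-validation on the training data, separately from the SVM decision function that gives the class, so a sample close to the boundary (such as one at 52%) can get a class label that disagrees with its larger probability.

```r
library(kernlab)

# Two-class stand-in data (iris), not the data from this thread
d <- droplevels(subset(iris, Species != "setosa"))
x_new <- d[, setdiff(names(d), "Species")]  # predictors only

set.seed(1)  # prob.model = TRUE uses internal cross-validation, so fix the seed
fit <- ksvm(Species ~ ., data = d, kernel = "vanilladot",  # linear kernel
            C = 1, prob.model = TRUE)

cls  <- predict(fit, x_new, type = "response")       # class from the decision function
prob <- predict(fit, x_new, type = "probabilities")  # probabilities from the separate probability model
prob <- prob[, lev(fit), drop = FALSE]               # columns in class-level order

# Rows where the class label and the larger probability disagree
prob_cls <- lev(fit)[max.col(prob, ties.method = "first")]
mismatch <- which(as.character(cls) != prob_cls)
cbind(d[mismatch, "Species", drop = FALSE], prob[mismatch, , drop = FALSE])
```

Whether any rows are flagged depends on the data and the seed; the point is that the same check can be run on the real model to see whether kernlab itself produces the disagreement.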
On Fri, 22 Sep 2023 10:12:51 +0000
"Milbert, Sabine (LGL)" <Sabine.Milbert at lgl.bayern.de> wrote:

> Dear R Help Team,

<SNIP>

In addition to other misapprehensions that others have pointed out, you seem to have a fundamental misunderstanding of R-help (and perhaps of R). There is no such thing as the "R Help Team". This is a *mailing list*, to which some R users subscribe, and from time to time contribute. All advice given is the personal opinion of the contributor. It has no official status, and may or may not be sound advice, depending on the contributor. (Those contributors who have responded to your enquiry so far may be relied upon to give sound advice.)

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone: +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619