Hello everyone,

I have text data whose output variable has three subgroups. I am using the following code but I get an error message (see the error after the code).

library(caret)

d=read.csv("SONAR_RULES.csv", stringsAsFactors = FALSE)
d$REMEDIATION_FUNCTION=NULL
d$DEF_REMEDIATION_GAP_MULT=NULL
d$REMEDIATION_BASE_EFFORT=NULL

index <- createDataPartition(d$TYPE, p = .70,list = FALSE)
tr <- d[index, ]
ts <- d[-index, ]

ctrl <- trainControl(method = "cv",number=3, index = index,
                     classProbs = TRUE, summaryFunction = multiClassSummary)

ran <- train(TYPE ~ ., data = tr,
             method = "rpart",
             ## Will create 48 parameter combinations
             tuneLength = 3,
             na.action= na.pass,
             metric = "Accuracy",
             preProc = c("center", "scale", "nzv"),
             trControl = ctrl)
getTrainPerf(ran)

It gives me this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

My data is as follows:

Rows: 1,819
Columns: 14
$ PLUGIN_RULE_KEY             <chr> "InsufficientBranchCoverage", "InsufficientLin~
$ PLUGIN_CONFIG_KEY           <chr> "", "", "", "", "", "", "", "", "", "", "S1120~
$ PLUGIN_NAME                 <chr> "common-java", "common-java", "common-java", "~
$ DESCRIPTION                 <chr> "An issue is created on a file as soon as the ~
$ SEVERITY                    <chr> "MAJOR", "MAJOR", "MAJOR", "MAJOR", "MAJOR", "~
$ NAME                        <chr> "Branches should have sufficient coverage by t~
$ DEF_REMEDIATION_FUNCTION    <chr> "LINEAR", "LINEAR", "LINEAR", "LINEAR_OFFSET",~
$ REMEDIATION_GAP_MULT        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ DEF_REMEDIATION_BASE_EFFORT <chr> "", "", "", "10min", "", "", "5min", "5min", "~
$ GAP_DESCRIPTION             <chr> "number of uncovered conditions", "number of l~
$ SYSTEM_TAGS                 <chr> "bad-practice", "bad-practice", "convention", ~
$ IS_TEMPLATE                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ DESCRIPTION_FORMAT          <chr> "HTML", "HTML", "HTML", "HTML", "HTML", "HTML"~
$ TYPE                        <chr> "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "COD~

Hi Neha,

The error message is about not having _factors_ with two or more levels. Because you read the file with stringsAsFactors=FALSE, you probably won't get any factors in "d". Apart from that, your sample data doesn't look like CSV format; perhaps the lines have been truncated. You may get something with stringsAsFactors=TRUE, but I don't know whether it will be sensible.

Jim

On Wed, Apr 13, 2022 at 8:12 AM Neha gupta <neha.bologna90 at gmail.com> wrote:
> Hello everyone, I have text data with output variable have three subgroups.
> I am using the following code but getting the error message (see error
> after the code).
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
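
To make the stringsAsFactors point concrete, here is a minimal sketch. The three-row CSV text is a made-up stand-in for SONAR_RULES.csv (only the column names are borrowed from the post), so treat the values as assumptions rather than the real data. It reads the same text both ways and counts the levels per column:

```r
# Hypothetical stand-in for the poster's file; the values are invented
csv_text <- "SEVERITY,DESCRIPTION_FORMAT,TYPE
MAJOR,HTML,CODE_SMELL
MINOR,HTML,BUG
MAJOR,HTML,VULNERABILITY"

# Same text read twice, differing only in stringsAsFactors
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Column classes under each setting
chr_classes <- sapply(as_chr, class)
fac_classes <- sapply(as_fac, class)

# Levels per factor column; a count of 1 marks a constant column,
# which is the kind of column the contrasts error complains about
level_counts <- sapply(as_fac, nlevels)
print(level_counts)
```

In this toy input DESCRIPTION_FORMAT has a single level. Whether the same holds for the full 1,819 rows can only be checked on the real file.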

This sounds like what I think is a bug in stats::model.matrix.default(): a numeric column with all identical entries is fine but a constant character or factor column is not.

> d <- data.frame(y=1:5, sex=rep("Female",5))
> d$sexFactor <- factor(d$sex, levels=c("Male","Female"))
> d$sexCode <- as.integer(d$sexFactor)
> d
  y    sex sexFactor sexCode
1 1 Female    Female       2
2 2 Female    Female       2
3 3 Female    Female       2
4 4 Female    Female       2
5 5 Female    Female       2
> lm(y~sex, data=d)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> lm(y~sexFactor, data=d)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> lm(y~sexCode, data=d)

Call:
lm(formula = y ~ sexCode, data = d)

Coefficients:
(Intercept)      sexCode
          3           NA

Calling traceback() after the error would clarify this.

-Bill

On Tue, Apr 12, 2022 at 3:12 PM Neha gupta <neha.bologna90 at gmail.com> wrote:
> Hello everyone, I have text data with output variable have three subgroups.
> I am using the following code but getting the error message (see error
> after the code).
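
One way to act on the constant-column observation is to look for such columns before fitting. The sketch below is only an illustration on toy data: `constant_categoricals` is a hypothetical helper name, not a function from base R or caret, and whether dropping columns is appropriate for the SONAR data is an assumption the poster would need to check.

```r
# Toy data in the spirit of the example above: 'sex' is constant, 'grp' is not
d <- data.frame(y = 1:6,
                sex = rep("Female", 6),
                grp = rep(c("a", "b"), 3),
                stringsAsFactors = FALSE)

# Hypothetical helper: names of character/factor columns that have
# fewer than two distinct non-NA values
constant_categoricals <- function(df) {
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  n_distinct <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  names(df)[is_cat & n_distinct < 2L]
}

bad <- constant_categoricals(d)

# With the constant column in the formula the fit fails;
# capture the message instead of stopping
err <- tryCatch(lm(y ~ ., data = d), error = function(e) conditionMessage(e))

# Without the constant column the fit goes through
fit <- lm(y ~ ., data = d[, setdiff(names(d), bad), drop = FALSE])
```

Here `bad` names the offending column and `err` holds the captured error message. The same check could be run on the training subset (`tr` in the post), since a column that varies in the full data may still be constant within a partition.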