Hi Neha,
The suggestion I made was to try stringsAsFactors=TRUE, although I
will be surprised if it solves your problem.
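
If it helps, here is a rough sketch of how you might look for the columns behind that error, that is, columns with fewer than two distinct values. The data frame "toy" and its column names are made up for illustration; they are not your actual data.

```r
# Hypothetical toy data frame standing in for the real file (assumption)
toy <- data.frame(
  TYPE     = c("CODE_SMELL", "BUG", "CODE_SMELL", "VULNERABILITY"),
  FORMAT   = c("HTML", "HTML", "HTML", "HTML"),
  GAP_MULT = c(NA, NA, NA, NA),
  stringsAsFactors = TRUE
)

# Count the distinct non-missing values in each column
n_distinct <- sapply(toy, function(x) length(unique(x[!is.na(x)])))

# Columns with fewer than two distinct values are candidates for the
# "contrasts can be applied only to factors with 2 or more levels" error
single_valued <- names(n_distinct)[n_distinct < 2]
print(single_valued)

# One possible remedy: keep only columns with at least two distinct values
toy_clean <- toy[, n_distinct >= 2, drop = FALSE]
```

On your real data you would run the same check on "d" after read.csv() and before calling train().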
CSV stands for "Comma-Separated Values". The following examples are
both valid CSV:
Date,Temperature,Humidity
13/04/2022,18,87

Country,PrimeMinister,Party
Australia,Morrison,Liberal
You could read the text columns of the second example in as either
character or factor type, depending on the setting of stringsAsFactors.
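
As a small illustration, the second example can be read both ways from an inline string. The object names here are my own and only for demonstration.

```r
# The second CSV example, supplied as text instead of a file
csv_text <- "Country,PrimeMinister,Party\nAustralia,Morrison,Liberal"

# Text columns stay as character vectors
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Text columns are converted to factors
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

print(class(as_chr$Party))
print(class(as_fct$Party))
```

Note that with a single data row each factor has only one level, which is exactly the situation the contrasts error complains about.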
Jim
On Wed, Apr 13, 2022 at 7:05 PM Neha gupta <neha.bologna90 at gmail.com> wrote:
>
> Thank you Jim
>
> So what solution do you suggest? The features are text, so it doesn't
> look like a CSV format.
>
> Best regards
>
> On Wednesday, April 13, 2022, Jim Lemon <drjimlemon at gmail.com> wrote:
>>
>> Hi Neha,
>> The error message is about not having _factors_ with two or more
>> levels. Apart from using stringsAsFactors=FALSE (meaning that you
>> probably won't get any factors in "d"), your sample data doesn't look
>> like CSV format. Perhaps the lines have been truncated. You may get
>> something with stringsAsFactors=TRUE, but I don't know whether it will
>> be more sensible.
>>
>> Jim
>>
>> On Wed, Apr 13, 2022 at 8:12 AM Neha gupta <neha.bologna90 at gmail.com> wrote:
>> >
>> > Hello everyone, I have text data whose output variable has three subgroups.
>> > I am using the following code but getting an error message (see the error
>> > after the code).
>> >
>> > d=read.csv("SONAR_RULES.csv", stringsAsFactors = FALSE)
>> > d$REMEDIATION_FUNCTION=NULL
>> > d$DEF_REMEDIATION_GAP_MULT=NULL
>> > d$REMEDIATION_BASE_EFFORT=NULL
>> >
>> > index <- createDataPartition(d$TYPE, p = .70,list = FALSE)
>> > tr <- d[index, ]
>> > ts <- d[-index, ]
>> >
>> > ctrl <- trainControl(method = "cv", number = 3, index = index,
>> >                      classProbs = TRUE, summaryFunction = multiClassSummary)
>> >
>> > ran <- train(TYPE ~ ., data = tr,
>> >              method = "rpart",
>> >              ## Will create 48 parameter combinations
>> >              tuneLength = 3,
>> >              na.action = na.pass,
>> >              metric = "Accuracy",
>> >              preProc = c("center", "scale", "nzv"),
>> >              trControl = ctrl)
>> > getTrainPerf(ran)
>> >
>> > It gives me this error:
>> >
>> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>> >   contrasts can be applied only to factors with 2 or more levels
>> >
>> >
>> > My data is as follows:
>> >
>> > Rows: 1,819
>> > Columns: 14
>> > $ PLUGIN_RULE_KEY             <chr> "InsufficientBranchCoverage", "InsufficientLin~
>> > $ PLUGIN_CONFIG_KEY           <chr> "", "", "", "", "", "", "", "", "", "", "S1120~
>> > $ PLUGIN_NAME                 <chr> "common-java", "common-java", "common-java", "~
>> > $ DESCRIPTION                 <chr> "An issue is created on a file as soon as the ~
>> > $ SEVERITY                    <chr> "MAJOR", "MAJOR", "MAJOR", "MAJOR", "MAJOR", "~
>> > $ NAME                        <chr> "Branches should have sufficient coverage by t~
>> > $ DEF_REMEDIATION_FUNCTION    <chr> "LINEAR", "LINEAR", "LINEAR", "LINEAR_OFFSET",~
>> > $ REMEDIATION_GAP_MULT        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
>> > $ DEF_REMEDIATION_BASE_EFFORT <chr> "", "", "", "10min", "", "", "5min", "5min", "~
>> > $ GAP_DESCRIPTION             <chr> "number of uncovered conditions", "number of l~
>> > $ SYSTEM_TAGS                 <chr> "bad-practice", "bad-practice", "convention", ~
>> > $ IS_TEMPLATE                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
>> > $ DESCRIPTION_FORMAT          <chr> "HTML", "HTML", "HTML", "HTML", "HTML", "HTML"~
>> > $ TYPE                        <chr> "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "COD~
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.