@Rui Barradas <ruipbarradas at sapo.pt>
I tried the code according to your comments and it works. However, when I
try it for another dataset with a different number of input features, it
again shows the same error message. I tried it with different types of
datasets and the same error appeared.
Best regards
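This is one possible reason the error returns on other datasets. It is offered as an assumption, not a confirmed diagnosis. caret fits preProc inside each resample, so the "nzv" filter can keep a different number of dummy-expanded columns in each CV training set. On the full data, the caret summary quoted below reports 48 columns kept and 58 removed. A grid capped at a fixed value such as ncol(tr) can therefore still exceed the column count of some fold, and the margin changes from dataset to dataset. The R sketch below counts the columns left in each fold and caps mtry at the smallest count. predictors_per_fold() and capped_ranger_grid() are hypothetical helpers, not caret functions. The sketch also assumes two things: that train()'s formula interface expands factors with model.matrix(), and that preProc = "nzv" applies nearZeroVar() with its default cutoffs.

```r
library(caret)

# Hypothetical helper (not part of caret): for each resampling training set,
# count the predictor columns left after dummy expansion and the "nzv" filter.
# Assumes train()'s formula interface builds the design matrix with
# model.matrix() and that preProc = "nzv" uses nearZeroVar() with defaults.
predictors_per_fold <- function(formula, data, folds) {
  x <- model.matrix(formula, data = data)
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]  # train() drops the intercept
  vapply(folds, function(idx) {
    dropped <- nearZeroVar(x[idx, , drop = FALSE])  # indices of near-zero-variance columns
    ncol(x) - length(dropped)
  }, integer(1))
}

# Hypothetical helper: ranger tuning grid whose mtry values never exceed p.
capped_ranger_grid <- function(p, len = 3) {
  expand.grid(
    mtry = unique(pmin(var_seq(p, len = len), p)),
    splitrule = c("variance", "extratrees"),
    min.node.size = 1:10,
    stringsAsFactors = FALSE
  )
}

# Usage with the thread's data (only runs if tr is defined): fix the folds,
# cap mtry at the smallest per-fold predictor count, and pass the same folds
# to trainControl() so train() resamples exactly those training sets.
if (exists("tr", envir = globalenv(), inherits = FALSE)) {
  set.seed(1)
  folds <- createFolds(tr$act_effort, k = 10, returnTrain = TRUE)
  p_min <- min(predictors_per_fold(act_effort ~ ., tr, folds))
  cv <- trainControl(method = "cv", index = folds)
  c1 <- train(act_effort ~ ., data = tr,
              method = "ranger",
              metric = "MAE",
              preProc = c("center", "scale", "nzv"),
              tuneGrid = capped_ranger_grid(p_min),
              trControl = cv)
}
```

Passing the same folds to trainControl() through index keeps the cap and the resamples in sync. If train() drew its own folds, the per-fold counts could differ from the ones computed here.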
On Fri, Jul 1, 2022 at 9:18 PM Neha gupta <neha.bologna90 at gmail.com> wrote:
> @Rui Barradas <ruipbarradas at sapo.pt>
>
> Thank you again for the useful explanation.
>
> Best regards
>
> On Fri, Jul 1, 2022 at 8:26 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>> Hello,
>>
>> The error doesn't arise in randomForest because rf has a function tuneRF
>> that looks for the best mtry (best relative to the OOB error estimate), and
>> it's this value that it uses.
>>
>> The question's code gives ranger errors, but it also gives R warnings:
>>
>> Warning messages:
>> 1: model fit failed for Fold01: mtry=48, min.node.size=5, splitrule=variance
>> Error in ranger::ranger(dependent.variable.name = ".outcome", data = x,  :
>>   User interrupt or internal error.
>>
>>
>> As you can see, mtry = 48 is double ncol(tr), when it should *never* be
>> greater than the number of variables in the data set. I don't know why
>> it is using this value. A function bug? Ask the package maintainer?
>>
>> And, by the way, package caret does, or can do, a grid search for optimal
>> parameter values. If that is giving errors and you are calling rf
>> directly, why bother with caret's error? Use the original function. Here
>> is an example with tuneRF. By setting the argument doBest to TRUE you'll
>> have both the optimal value for mtry and the fitted random forest. 2 in 1.
>>
>>
>> library(randomForest)
>> # randomForest 4.7-1.1
>> # Type rfNews() to see new features/changes/bug fixes.
>>
>> c2 <- tuneRF(
>> x = tr[-ncol(tr)],
>> y = tr$act_effort,
>> mtryStart = ncol(tr)/2,
>> doBest = TRUE
>> )
>> # mtry = 12 OOB error = 139920.7
>> # Searching left ...
>> # mtry = 6 OOB error = 170909.3
>> # -0.2214729 0.05
>> # Searching right ...
>> # mtry = 23 OOB error = 128566.7
>> # 0.08114586 0.05
>>
>> c2
>> #
>> # Call:
>> # randomForest(x = x, y = y, mtry = res[which.min(res[, 2]), 1])
>> # Type of random forest: regression
>> # Number of trees: 500
>> # No. of variables tried at each split: 23
>> #
>> # Mean of squared residuals: 129734.8
>> # % Var explained: 39.98
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>>
>> At 17:18 on 01/07/2022, Neha gupta wrote:
>> > Thank you so much for your help. I hope it will work.
>> >
>> > However, why doesn't the same error arise when I am using rf? They both
>> > have the same parameters and their default values.
>> >
>> > Best regards
>> >
>> > On Friday, July 1, 2022, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> >
>> > Hello,
>> >
>> > The error is caused by ranger's parameter mtry becoming greater than
>> > the number of variables (columns).
>> > mtry can be set manually in the caret::train argument tuneGrid. But for
>> > random forests you must also set the split rule and the minimum
>> > node size.
>> >
>> >
>> > library(caret)
>> > library(farff)
>> >
>> > boot <- trainControl(method = "cv", number = 10)
>> >
>> > # set the maximum mtry manually to ncol(tr)
>> > # this creates a sequence of mtry values
>> > mtry <- var_seq(ncol(tr), len = 3)  # 3 is the default value
>> > mtry
>> > # [1]  2 13 24
>> >
>> > splitrule <- c("variance", "extratrees")
>> > min.node.size <- 1:10
>> > mtrygrid <- expand.grid(mtry, splitrule, min.node.size)
>> > names(mtrygrid) <- c("mtry", "splitrule", "min.node.size")
>> >
>> > c1 <- train(act_effort ~ ., data = tr,
>> >             method = "ranger",
>> >             tuneLength = 5,  # ignored when tuneGrid is supplied
>> >             metric = "MAE",
>> >             preProc = c("center", "scale", "nzv"),
>> >             tuneGrid = mtrygrid,
>> >             trControl = boot)
>> > c1
>> > # Random Forest
>> > #
>> > # 30 samples
>> > # 23 predictors
>> > #
>> > # Pre-processing: centered (48), scaled (48), remove (58)
>> > # Resampling: Cross-Validated (10 fold)
>> > # Summary of sample sizes: 28, 27, 27, 28, 27, 27, ...
>> > # Resampling results across tuning parameters:
>> > #
>> > #   mtry  splitrule   min.node.size  RMSE      Rsquared   MAE
>> > #    2    variance     1             256.6391  0.8103759  186.3609
>> > #    2    variance     2             249.7120  0.8628109  183.6696
>> > #    2    variance     3             258.8240  0.8284449  189.0712
>> > #
>> > # [...omit...]
>> > #
>> > #   13    extratrees  10             254.9569  0.8918014  191.2524
>> > #   24    variance     1             177.7188  0.9458652  112.2800
>> > #   24    variance     2             172.6826  0.9204287  108.5943
>> > #   24    variance     3             172.9954  0.9271006  109.2554
>> > #   24    variance     4             172.2467  0.9523067  110.0776
>> > #   24    variance     5             175.2485  0.9283317  112.8798
>> > #   24    variance     6             177.9285  0.9369881  115.8970
>> > #   24    variance     7             180.5959  0.9485035  117.5816
>> > #   24    variance     8             178.8037  0.9358033  117.8725
>> > #   24    variance     9             176.5849  0.9210959  117.0055
>> > #   24    variance    10             178.6439  0.9257969  119.8035
>> > #   24    extratrees   1             219.1368  0.8801770  141.0720
>> > #   24    extratrees   2             216.1900  0.8550002  140.9263
>> > #   24    extratrees   3             212.4138  0.8979379  141.4282
>> > #   24    extratrees   4             218.2631  0.9121471  146.2908
>> > #   24    extratrees   5             212.5679  0.9279598  144.2715
>> > #   24    extratrees   6             218.9856  0.9141754  152.2099
>> > #   24    extratrees   7             222.8540  0.9412682  152.4614
>> > #   24    extratrees   8             228.1156  0.9423414  161.8456
>> > #   24    extratrees   9             226.6182  0.9408306  160.5264
>> > #   24    extratrees  10             226.9280  0.9429413  165.6878
>> > #
>> > # MAE was used to select the optimal model using the smallest value.
>> > # The final values used for the model were mtry = 24, splitrule = variance
>> > # and min.node.size = 2.
>> > plot(c1)
>> >
>> >
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>> >
>> >
>> > At 23:03 on 30/06/2022, Neha gupta wrote:
>> >
>> > Ok, the data is pasted below.
>> >
>> > But on the same data (everything the same) and with other models
>> > like RF, SVM, etc., it works fine.
>> >
>> > > dput(head(tr, 30))
>> > structure(list(recordnumber = c(0, 0.02, 0.04, 0.06, 0.07,
0.08,
>> > 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.16, 0.17, 0.18, 0.23,
0.24,
>> > 0.25, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.35, 0.36, 0.37,
0.38,
>> > 0.4, 0.41), projectname = structure(c(1L, 1L, 1L, 1L, 2L,
3L,
>> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L,
>> > 4L, 4L, 4L, 4L, 4L, 4L, 5L, 6L), levels =
c("de", "erb", "gal",
>> > "X", "hst", "slp", "spl", "Y"), class = "factor"), cat2 =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
9L, 9L,
>> > 9L, 11L, 5L, 4L, 6L, 8L, 3L, 9L, 9L, 9L, 9L, 6L, 7L),
levels = c("Avionics",
>> > "application_ground",
"avionicsmonitoring",
>> "batchdataprocessing",
>> > "communications", "datacapture",
"launchprocessing",
>> > "missionplanning",
>> > "monitor_control", "operatingsystem",
"realdataprocessing",
>> > "science",
>> > "simulation", "utility"), class = "factor"), forg = structure(c(2L,
>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L,
>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
levels = c("f",
>> > "g"), class = "factor"), center =
structure(c(2L, 2L, 2L, 2L,
>> > 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L,
>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 6L), levels =
c("1", "2",
>> > "3", "4", "5",
"6"), class = "factor"), year = c(0.5, 0.5, 0.5,
>> > 0.5, 0.6875, 0.5625, 0.5625, 0.8125, 0.5625, 0.875,
0.5625,
>> 0.75,
>> > 0.5625, 0.8125, 0.75, 0.9375, 0.9375, 0.9375, 0.6875,
0.6875,
>> > 0.6875, 0.6875, 0.875, 1, 0.9375, 0.9375, 0.9375, 0.9375,
>> 0.5625,
>> > 0.25), mode = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L,
>> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
>> > 3L, 3L, 3L, 3L, 3L), levels = c("embedded",
"organic",
>> > "semidetached"
>> > ), class = "factor"), rely = structure(c(4L, 4L,
4L, 4L, 4L,
>> > 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 3L,
3L, 3L,
>> > 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 4L), levels =
c("vl", "l", "n",
>> > "h", "vh", "xh"), class =
"factor"), data = structure(c(2L, 2L,
>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L,
>> > 5L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L), levels = c("vl",
>> > "l", "n", "h", "vh", "xh"), class = "factor"), cplx =
structure(c(4L,
>> > 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
3L, 4L,
>> > 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), time =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 3L,
>> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), stor =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 3L, 3L,
3L, 3L,
>> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), virt =
structure(c(2L,
>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L,
3L, 3L,
>> > 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), turn =
structure(c(2L,
>> > 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L,
>> > 3L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 2L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), acap =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
3L, 3L,
>> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), aexp =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 4L, 5L, 5L, 4L, 5L,
4L, 4L,
>> > 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), pcap =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 4L, 5L, 4L, 5L, 3L, 4L, 4L, 5L, 4L, 4L,
4L, 4L,
>> > 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 4L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), vexp =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L,
3L, 3L,
>> > 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), lexp =
structure(c(4L,
>> > 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 1L, 4L, 4L, 4L, 4L,
3L, 3L,
>> > 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 3L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), modp =
structure(c(4L,
>> > 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
>> > 3L, 5L, 5L, 5L, 5L, 4L, 4L, 3L, 3L, 4L, 3L, 4L, 4L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), tool =
structure(c(3L,
>> > 3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
>> > 3L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 1L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"), sced =
structure(c(2L,
>> > 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L,
>> > 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L),
>> > levels = c("vl", "l", "n", "h", "vh", "xh"), class = "factor"),
>> > equivphyskloc = c(0.025534,
>> > 0.006945, 0.008988, 0.002655, 0.067102, 0.006741,
0.019508,
>> > 0.005209,
>> > 0.101215, 0.010622, 0.101215, 0.019508, 0.152283,
0.031253,
>> > 0.014401,
>> > 0.014401, 0.037892, 0.009294, 0.015729, 0.012154,
0.032377,
>> > 0.035339,
>> > 0.004698, 0.009703, 0.00572, 0.012358, 0.091002, 0.007252,
>> 0.180778,
>> > 0.307527), act_effort = c(117.6, 31.2, 25.2, 10.8, 352.8,
72,
>> > 72, 24, 360, 36, 215, 48, 324, 60, 48, 90, 210, 48, 82,
62, 170,
>> > 192, 18, 50, 42, 60, 444, 42, 1248, 2400)), row.names =
c(1L,
>> > 3L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 17L,
18L, 19L,
>> > 24L, 25L, 26L, 29L, 30L, 31L, 32L, 33L, 34L, 36L, 37L,
38L, 39L,
>> > 41L, 42L), class = "data.frame")
>> >
>> >
>> >
>> > On Thu, Jun 30, 2022 at 11:28 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> >
>> > Hello,
>> >
>> > Please post the data in dput format; without it, it's difficult
>> > to tell.
>> > If I substitute
>> >
>> > mpg for act_effort
>> > mtcars for tr
>> >
>> > keeping everything else, I don't get any errors.
>> > And the error message clearly says that the error is in tr (the data).
>> >
>> > Can you post the output of dput(head(tr, 30))?
>> >
>> > Rui Barradas
>> >
>> >
>> > At 19:32 on 30/06/2022, Neha gupta wrote:
>> > > I am posting this a second time because I didn't get any response
>> > > from group members. I am not sure whether there is some problem
>> > > with the question.
>> > >
>> > >
>> > >
>> > > I cannot run the "ranger" model with caret. I am only using the
>> > > farff and caret libraries and the following code:
>> > >
>> > > boot <- trainControl(method = "cv", number = 10)
>> > >
>> > > c1 <- train(act_effort ~ ., data = tr,
>> > >             method = "ranger",
>> > >             tuneLength = 5,
>> > >             metric = "MAE",
>> > >             preProc = c("center", "scale", "nzv"),
>> > >             trControl = boot)
>> > >
>> > > The error I get is the following message, repeated until I
>> > > interrupt it:
>> > >
>> > > Error: mtry can not be larger than number of variables in data.
>> > > Ranger will EXIT now.
>> > >
>> > >
>> > > ______________________________________________
>> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>