Hello,
I think the function find_x_from_profile below does what you want.
I have used the data set from the first example in ?readARFF, the built-in
and ever-present iris data set.
The function returns a one-row data.frame with columns "x" and "y". Pass
the y-axis value in argument ynew; the value you want is in output
column "x".
The function only takes one y value at a time, but this can be changed
if needed.
suppressPackageStartupMessages({
library(farff)
library(mlr3)
library(mlr3learners)
library(mlr3filters)
library(mlr3extralearners)
library(DALEX)
library(DALEXtra)
library(readr)
library(ggplot2)
})
# make the results reproducible
set.seed(2022)
# this is the data for the reprex
path <- tempfile()
writeARFF(iris, path = path)
data <- readARFF(path)
# Parse with reader=readr : C:\Users\ruipb\AppData\Local\Temp\RtmpUxSDP3\file578778a1417
# header: 0.000000; preproc: 0.000000; data: 0.110000; postproc: 0.000000; total: 0.110000
# data = readARFF("ant.arff")
index <- sample(1:nrow(data), 0.7*nrow(data))
train <- data[index,]
test <- data[-index,]
task <- TaskRegr$new("data", backend = train, target = "Sepal.Length")
learner <- lrn("regr.randomForest")
model <- learner$train(task)
explainer <- explain_mlr3(model,
                          data = test[, -16],  # iris has 5 columns, so nothing is dropped here
                          y = as.numeric(test$Sepal.Length) - 1,
                          label = "RF")
# Preparation of a new explainer is initiated
# -> model label : RF
# -> data : 45 rows 5 cols
# -> target variable : 45 values
# -> predict function : yhat.LearnerRegr will be used ( default )
# -> predicted values : No value for predict function target column. ( default )
# -> model_info : package mlr3 , ver. 0.13.3 , task regression ( default )
# -> predicted values : numerical, min = 4.775823 , mean = 5.892271 , max = 7.226967
# -> residual function : difference between y and yhat ( default )
# -> residuals : numerical, min = -1.642701 , mean = -0.9922714 , max = -0.2101927
# A new explainer has been created!
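m <- model_profile(explainer = explainer, variables = "Sepal.Width")
find_x_from_profile <- function(model, xvar, ynew) {
  # 'xvar' is not used in the body; 'model' is assumed to hold the profile
  # of a single variable
  if (length(ynew) > 1) {
    warn <- "'ynew' length is greater than 1, only the first is considered."
    warning(warn)
    ynew <- ynew[1]
  }
  # aggregated profile of the object passed in: predicted values and x values
  ap <- model$agr_profiles[c("_yhat_", "_x_")]
  names(ap) <- c("yhat", "x")
  # findInterval() needs a sorted vector
  i <- order(ap$yhat)
  ap <- ap[i, ]
  j <- findInterval(ynew, ap$yhat)
  # rows j and j + 1 of the profile, taken in its original row order
  olddata <- data.frame(
    x = ap$yhat[order(i)][j:(j + 1)],
    y = ap$x[order(i)][j:(j + 1)]
  )
  # interpolate x as a function of yhat, then swap the names so that
  # column "x" holds the x-axis value and column "y" holds ynew
  newdata <- approx(olddata, xout = ynew)
  newdata <- as.data.frame(newdata)
  names(newdata) <- rev(names(newdata))
  newdata[2:1]
}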
find_x_from_profile(m, xvar = "Sepal.Width", 5.85)
# x y
# 1 2.941472 5.85
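The function above interpolates between one pair of profile points and takes one y value. If the profile is not monotone, the same y value can be reached at more than one x. What follows is only a minimal base R sketch of one way to handle that, not something DALEX provides: it walks the profile in x order and linearly interpolates every crossing, for one or more y values. The name find_all_x and the toy profile are made up for illustration; with a real profile you would pass the "_x_" and "_yhat_" columns of m$agr_profiles.

```r
# Hypothetical helper (not part of DALEX): all x values at which a
# piecewise linear profile (x, yhat) reaches each value in ynew.
find_all_x <- function(x, yhat, ynew) {
  o <- order(x)                 # walk the profile from left to right
  x <- x[o]
  yhat <- yhat[o]
  n <- length(x)
  res <- lapply(ynew, function(y0) {
    d <- yhat - y0              # signed distance to the target value
    lo <- d[-n]                 # left end of each segment
    hi <- d[-1]                 # right end of each segment
    k <- which(lo * hi < 0)     # segments that strictly cross y0
    xs <- x[k] + (x[k + 1] - x[k]) * (-lo[k]) / (hi[k] - lo[k])
    xs <- sort(c(xs, x[d == 0]))  # add exact hits on profile points
    data.frame(x = xs, y = rep(y0, length(xs)))
  })
  do.call(rbind, res)
}

# Toy profile with invented numbers, going up, down, then up again.
profile <- data.frame(x = 1:5, yhat = c(1, 3, 2, 4, 6))
find_all_x(profile$x, profile$yhat, ynew = c(2.5, 5))
# x = 1.75, 2.5 and 3.25 for y = 2.5; x = 4.5 for y = 5
```

A y value outside the range of the profile simply contributes no rows.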
newdata <- find_x_from_profile(m, xvar = "Sepal.Width", 5.85)
p <- plot(m)
p +
  geom_point(
    data = newdata,
    mapping = aes(x, y),
    color = "red",
    size = 2,
    inherit.aes = FALSE
  )
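The original question went in the other direction: the profile value at a given x (bug at rfc = 75). A minimal sketch of that lookup with approx(), assuming a profile table with "_x_" and "_yhat_" columns like the ones used in find_x_from_profile above. The toy numbers and the name profile_at are invented for illustration. Note that this reads a value off the averaged profile curve, which is not the same as a model prediction for a single observation.

```r
# Toy stand-in for m$agr_profiles; the numbers are invented.
ap <- data.frame(`_x_` = c(0, 50, 100, 150, 200),
                 `_yhat_` = c(0.2, 0.8, 1.6, 2.0, 2.1),
                 check.names = FALSE)

# Hypothetical helper: linearly interpolated profile value at xnew.
profile_at <- function(ap, xnew) {
  out <- approx(x = ap[["_x_"]], y = ap[["_yhat_"]], xout = xnew)
  data.frame(x = out$x, y = out$y)
}

profile_at(ap, 75)
# x = 75, y = 1.2 (halfway between 0.8 and 1.6)
```

With the default settings of approx(), an x outside the range of the profile gives NA rather than an extrapolated value.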
Hope this helps,
Rui Barradas
At 08:54 on 27/05/2022, Neha gupta wrote:
> I am sorry for that.
>
> I used
>
> library(farff)
> library(mlr3learners)
> library(mlr3filters)
> library(mlr3extralearners)
> library(mlr3)
> library(DALEX)
> library(DALEXtra)
>
> data = readARFF("ant.arff")
> index= sample(1:nrow(data), 0.7*nrow(data))
> train= data[index,]
> test= data[-index,]
> task = TaskRegr$new("data", backend = train, target = "bug")
>
> learner= lrn("regr.randomForest")
> model= learner$train(task )
>
> explainer = explain_mlr3(model,
>                          data = test[,-16],
>                          y = as.numeric(test$bug)-1,
>                          label="RF")
>
> m=model_profile(explainer = explainer, variables = "rfc")
>
> plot(m)
>
> And it shows a plot, with values of x axis (bug) and y axis (rfc)
>
> I can manually see what the value of bug is at rfc=75, but I need the
> exact value, and guessing the value of bug at rfc=75 from the plot
> might not give the exact value I need.
>
> Thank you
>
> On Fri, May 27, 2022 at 9:39 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
> Hello,
>
> Neha, this is not the first time you have posted questions to R-Help.
> Please, please, start your scripts by loading the packages needed.
>
> I have never used package DALEX, but from what I understand of its
> documentation it helps to explore and explain model behavior. If your
> profile plot was output by method plot.model_profile(), the workflow is,
> or seems to be,
>
> 1. fit a model;
> 2. create an object of S3 class "model_profile" with functions explain()
>    and model_profile();
> 3. plot that object.
>
>
> So to know the value of y for a given x, predict from the fitted
> model; package DALEX and its plots have nothing to do with it.
> If there's a predict method for the fitting function, then it should be
> as simple as
>
>
> newdata75 <- data.frame(x = 75)
> y75 <- predict(fit, newdata = newdata75)
>
>
> or something similar.
>
> I have never used this package so I might be completely wrong.
>
> Hope this helps,
>
> Rui Barradas
>
> At 08:09 on 27/05/2022, Neha gupta wrote:
> > Thank you Rui, Avi
> >
> > I am using plot() from the DALEX package, which returns a ggplot.
> >
> > So I only used plot(mydata) and it displays the ggplot. If we need to
> > adjust or make further changes in the plot, I think people use
> >
> > plot + .....
> > I don't know if this group supports image pasting, but my plot
> > looks like the one below. (bugs is a variable in my data whose values are
> > displayed on the y-axis and RFC is another variable in my dataset whose
> > value is shown on the x-axis.) I want to know exactly (not necessarily
> > using the plot, a simple print function should also work for me) what
> > the value of 'bug' is when the value of 'rfc' is 75.
> > image.png
> >
> >
> > On Fri, May 27, 2022 at 7:49 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >
> >     Hello,
> >
> >     If you cannot determine the exact value of y for a given x, then isn't
> >     your problem how to determine an approximate value of y? Once you have
> >     it, it's easy to plot it.
> >
> >     With newdata = data.frame(x = 75, y = ???),
> >
> >
> >     ggplot(mydata, mapping = aes(x, y)) +
> >         geom_point(color = "black") +
> >         geom_point(data = newdata, mapping = aes(x, y), color = "red") +
> >         xlim(0, 200)
> >
> >
> >     The question is how to find newdata$y: interpolation, or another method?
> >
> >     Hope this helps,
> >
> >     Rui Barradas
> >
> >     At 00:40 on 27/05/2022, Neha gupta wrote:
> >      > I have a ggplot2 plot which has x values 0-200 and y values 0-10
> >      >
> >      > p=plot(mydata)
> >      > p+xlim(0, 200)
> >      >
> >      > I want to show what the y value is when we have 75 as the x value.
> >      > The graph which is displayed has a broad range (like 0-50, 50-100
> >      > etc. on the x axis) and I cannot determine the exact value of y at
> >      > the value of 75 on the x-axis.
> >      >
> >      > Thank you
> >      >
> >      >        [[alternative HTML version deleted]]
> >      >
> >      > ______________________________________________
> >      > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >      > stat.ethz.ch/mailman/listinfo/r-help
> >      > PLEASE do read the posting guide R-project.org/posting-guide.html
> >      > and provide commented, minimal, self-contained, reproducible code.