Title: | Variable Importance via Oscillations |
Version: | 0.2.1 |
Description: | Provides an easy to calculate local variable importance measure based on Ceteris Paribus profile and global variable importance measure based on Partial Dependence Profiles. |
Depends: | R (≥ 3.0) |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | ggplot2, DALEX |
Suggests: | knitr, rmarkdown, mlbench, randomForest, gridExtra, grid, lattice, testthat, ingredients |
VignetteBuilder: | knitr |
RoxygenNote: | 7.1.0 |
URL: | https://github.com/ModelOriented/vivo |
BugReports: | https://github.com/ModelOriented/vivo/issues |
NeedsCompilation: | no |
Packaged: | 2020-09-07 08:52:51 UTC; AnnA |
Author: | Anna Kozak [aut, cre], Przemyslaw Biecek [aut, ths] |
Maintainer: | Anna Kozak <anna1993kozak@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2020-09-07 11:00:02 UTC |
Internal Function for Split Points for Selected Variables
Description
This function calculate candidate splits for each selected variable.
For numerical variables splits are calculated as percentiles
(in general uniform quantiles of the length grid_points
).
For all other variables splits are calculated as unique values.
Usage
calculate_variable_split(data, variables = colnames(data), grid_points = 101)
Arguments
data |
validation dataset. Is used to determine distribution of observations. |
variables |
names of variables for which splits shall be calculated |
grid_points |
number of points used for response path |
Value
A named list with splits for selected variables
Note
This function is a copy of calculate_varaible_split()
from ingredients
package with small change.
Author(s)
Przemyslaw Biecek
Calculated empirical density and weight based on variable split.
Description
This function calculate an empirical density of raw data based on variable split from Ceteris Paribus profiles. Then calculated weight for values generated by DALEX::predict_profile()
, DALEX::individual_profile()
or ingredients::ceteris_paribus()
.
Usage
calculate_weight(profiles, data, variable_split)
Arguments
profiles |
|
data |
|
variable_split |
list generated by |
Value
Return an weight based on empirical density.
Examples
library("DALEX", warn.conflicts = FALSE, quietly = TRUE)
data(apartments)
split <- vivo::calculate_variable_split(apartments,
variables = colnames(apartments),
grid_points = 101)
library("randomForest", warn.conflicts = FALSE, quietly = TRUE)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface +
floor + no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:5],
y = apartmentsTest$m2.price)
new_apartment <- data.frame(construction.year = 1998, surface = 88, floor = 2L, no.rooms = 3)
profiles <- predict_profile(explainer_rf, new_apartment)
library("vivo")
calculate_weight(profiles, data = apartments[, 2:5], variable_split = split)
Global Variable Importance measure based on Partial Dependence profiles.
Description
This function calculate global importance measure.
Usage
global_variable_importance(profiles)
Arguments
profiles |
|
Value
A data.frame
of the class global_variable_importance
.
It's a data.frame
with calculated global variable importance measure.
Examples
library("DALEX")
data(apartments)
library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface +
floor + no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:5],
y = apartmentsTest$m2.price)
profiles <- model_profile(explainer_rf)
library("vivo")
global_variable_importance(profiles)
Local Variable Importance measure based on Ceteris Paribus profiles.
Description
This function calculate local importance measure in eight variants. We obtain eight variants measure through the possible options of three parameters such as absolute_deviation
, point
and density
.
Usage
local_variable_importance(
profiles,
data,
absolute_deviation = TRUE,
point = TRUE,
density = TRUE,
grid_points = 101
)
Arguments
profiles |
|
data |
|
absolute_deviation |
logical parameter, if |
point |
logical parameter, if |
density |
logical parameter, if |
grid_points |
maximum number of points for profile calculations, the default values is 101, the same as in |
Value
A data.frame
of the class local_variable_importance
.
It's a data.frame
with calculated local variable importance measure.
Examples
library("DALEX")
data(apartments)
library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface +
floor + no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:5],
y = apartmentsTest$m2.price)
new_apartment <- data.frame(construction.year = 1998, surface = 88, floor = 2L, no.rooms = 3)
profiles <- predict_profile(explainer_rf, new_apartment)
library("vivo")
local_variable_importance(profiles, apartments[,2:5],
absolute_deviation = TRUE, point = TRUE, density = TRUE)
local_variable_importance(profiles, apartments[,2:5],
absolute_deviation = TRUE, point = TRUE, density = FALSE)
local_variable_importance(profiles, apartments[,2:5],
absolute_deviation = TRUE, point = FALSE, density = TRUE)
Plot Global Variable Importance measure
Description
Function plot.global_importance plots global importance measure based on Partial Dependence profiles.
Usage
## S3 method for class 'global_importance'
plot(x, ..., variables = NULL, type = NULL, title = "Variable importance")
Arguments
x |
object returned from |
... |
other object returned from |
variables |
if not |
type |
a character. How variables shall be plotted? Either "bars" (default) or "lines". |
title |
the plot's title, by default |
Value
a ggplot2 object
Examples
library("DALEX")
data(apartments)
library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface +
floor + no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:5],
y = apartmentsTest$m2.price)
profiles <- model_profile(explainer_rf)
library("vivo")
measure <- global_variable_importance(profiles)
plot(measure)
Plot Local Variable Importance measure
Description
Function plot.local_importance plots local importance measure based on Ceteris Paribus profiles.
Usage
## S3 method for class 'local_importance'
plot(
x,
...,
variables = NULL,
color = NULL,
type = NULL,
title = "Local variable importance"
)
Arguments
x |
object returned from |
... |
other object returned from |
variables |
if not |
color |
a character. How to aggregated measure? Either "_label_method_" or "_label_model_". |
type |
a character. How variables shall be plotted? Either "bars" (default) or "lines". |
title |
the plot's title, by default |
Value
a ggplot2 object
Examples
library("DALEX")
data(apartments)
library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface +
floor + no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:5],
y = apartmentsTest$m2.price)
new_apartment <- data.frame(construction.year = 1998, surface = 88, floor = 2L, no.rooms = 3)
profiles <- predict_profile(explainer_rf, new_apartment)
library("vivo")
measure1 <- local_variable_importance(profiles, apartments[,2:5],
absolute_deviation = TRUE, point = TRUE, density = FALSE)
plot(measure1)
measure2 <- local_variable_importance(profiles, apartments[,2:5],
absolute_deviation = TRUE, point = TRUE, density = TRUE)
plot(measure1, measure2, color = "_label_method_", type = "lines")