An intelligent teaching assistant based on LLMs to help interpret
statistical model outputs in R.
EnTraineR builds audience-aware prompts (beginner, applied,
advanced) that never invent numbers: it passes verbatim
outputs from R and instructs how to explain them.
Works out of the box to produce high-quality prompts.
Optionally, you can connect your own LLM backend (via your functions built on top of trainer_core_generate_or_return()).
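The idea of passing verbatim output, rather than retyped numbers, can be sketched in base R. The helper below, `build_prompt_sketch()`, is hypothetical and is not EnTraineR's implementation; it only illustrates capturing printed output with `capture.output()` and wrapping it in audience-specific instructions.

```r
# Hypothetical sketch, not EnTraineR's actual code: wrap verbatim R output
# in an audience-aware instruction block.
build_prompt_sketch <- function(obj,
                                audience = c("beginner", "applied", "advanced")) {
  audience <- match.arg(audience)
  # Capture exactly what R prints, so no number is retyped by hand
  verbatim <- paste(capture.output(print(obj)), collapse = "\n")
  style <- switch(audience,
    beginner = "Explain in plain language for a beginner.",
    applied  = "Focus on decisions and practical implications.",
    advanced = "Be technical but concise, with appropriate cautions."
  )
  paste(
    style,
    "Do not invent numbers; use only what appears in the output below.",
    "--- R output ---",
    verbatim,
    sep = "\n"
  )
}

set.seed(1)
tt <- t.test(rnorm(20, 0.1), mu = 0)
prompt <- build_prompt_sketch(tt, audience = "beginner")
cat(prompt)
```

Because the numbers reach the prompt only through `capture.output()`, whatever the model is asked to explain is what R actually printed.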
From GitHub:
# install.packages("remotes")
remotes::install_github("Sebastien-Le/EnTraineR")

Optional but recommended packages for examples:
- FactoMineR, SensoMineR (model objects used in examples)
- stringr (to squish multi-line intros)
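Since stringr is optional, a base R stand-in for squishing a multi-line introduction may be handy. The function name `squish_base()` is an assumption made for this sketch; it approximates `stringr::str_squish()` by trimming the ends and collapsing internal whitespace.

```r
# Base R stand-in for stringr::str_squish(): trim the ends and collapse
# any run of whitespace (spaces, tabs, newlines) to a single space.
squish_base <- function(x) {
  gsub("[[:space:]]+", " ", trimws(x))
}

intro <- "Six chocolates have been evaluated
          by a sensory panel."
squish_base(intro)
```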
- AovSum objects, with F-tests and T-tests
- FactoMineR::LinearModel objects, including model selection notes

- beginner: plain-language teaching focus
- applied: decisions and practical implications
- advanced: technical but concise, with appropriate cautions

gemini_generate() sends your prompt to Google Gemini (Generative Language API) and returns the text reply.

The package ships 3 small datasets for teaching:
deforestation
Air and water temperatures before/after riparian deforestation.
Variables: Temp_water, Temp_air,
Deforestation (BEFORE/AFTER).
ham
Sensory descriptors for 21 hams and an Overall liking
score.
Useful for multiple regression demonstrations.
poussin
Chick weights by brooding Temperature (T1/T2/T3) and
Gender (Female/Male).
Useful for two-factor ANOVA examples.
These datasets are the intellectual property of L’Institut Agro Rennes Angers and are used for the “Statistical Approach” course module.
library(EnTraineR)
data(deforestation); str(deforestation)
data(ham); summary(ham)
data(poussin); with(poussin, table(Temperature, Gender))

# install.packages("SensoMineR")
library(SensoMineR)
data(chocolates)
# Build AovSum (example similar to chocolates::Granular ~ Product*Panelist)
res <- AovSum(Granular ~ Product*Panelist, data = sensochoc)
intro <- "Six chocolates have been evaluated by a sensory panel,
according to a sensory attribute: granular.
The panel has been trained according to this attribute
and panellists should be reproducible when rating this attribute."
intro <- gsub("\n", " ", intro)
intro <- stringr::str_squish(intro)
p <- trainer_AovSum(
  aovsum_obj = res,
  audience = "applied",
  t_test = c("Product", "Panelist"), # filter T-test section
  introduction = intro
)
cat(p) # a ready-to-use prompt for an LLM or for teaching

# install.packages("FactoMineR"); install.packages("stringr")
library(FactoMineR)
intro_ham <- "Can we predict ham overall liking from its sensory profile?"
intro_ham <- stringr::str_squish(gsub("\n", " ", intro_ham))
fit <- LinearModel(`Overall liking` ~ ., data = ham, selection = "bic")
pr <- trainer_LinearModel(
  lm_obj = fit,
  introduction = intro_ham,
  audience = "advanced"
)
cat(pr)

Another linear model with interaction and a categorical factor:
fit2 <- LinearModel(Temp_water ~ Temp_air * Deforestation,
                    data = deforestation, selection = "none")
pr2 <- trainer_LinearModel(
  lm_obj = fit2,
  introduction = "Effect of deforestation on the air-water temperature link.",
  audience = "beginner"
)
cat(pr2)

t-test:
tt <- t.test(rnorm(20, 0.1), mu = 0)
cat(trainer_t_test(tt, audience = "beginner"))

Variance F-test:
vt <- var.test(rnorm(25, sd = 1.0), rnorm(30, sd = 1.3))
cat(trainer_var_test(vt, audience = "applied"))

Proportion test:
pt <- prop.test(x = c(42, 35), n = c(100, 90))
cat(trainer_prop_test(pt, audience = "advanced", summary_only = TRUE))

Correlation test:
set.seed(1)
x <- rnorm(30); y <- 0.5 * x + rnorm(30, sd = 0.8)
ct <- cor.test(x, y, method = "pearson")
cat(trainer_cor_test(ct, audience = "applied"))

Chi-squared test:
m <- matrix(c(10, 20, 30, 40), nrow = 2)
cx <- chisq.test(m, correct = TRUE)
cat(trainer_chisq_test(cx, audience = "beginner"))

gemini_generate() lets you send a prompt to Google Gemini and get the response back as text.
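Since the API key is supplied through the `GEMINI_API_KEY` environment variable, a call is likely to fail when that variable is empty. A minimal base R guard, with the hypothetical helper name `has_gemini_key()`, could check this up front without ever printing the key:

```r
# Hypothetical helper: report whether an API key is set,
# without printing the key itself.
has_gemini_key <- function(var = "GEMINI_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_gemini_key()) {
  message("GEMINI_API_KEY is not set; add it to .Renviron or call Sys.setenv().")
}
```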
# 1) Set your API key once per session (or in .Renviron)
Sys.setenv(GEMINI_API_KEY = "your_key_here")
# 2) Send a prompt
txt <- gemini_generate(
  prompt = "Say hello in one short sentence.",
  model = "gemini-2.5-flash", # accepts "gemini-2.5-flash" or "models/gemini-2.5-flash"
  temperature = 0.2,
  user_agent = "EnTraineR/0.9.0 (https://github.com/Sebastien-Le/EnTraineR)"
)
cat(txt)

All prompts emphasize: do not invent numbers; use only what appears in the printed output.
By default, trainers return a prompt string (i.e.,
generate = FALSE).
If you have a generator backend, you can pass
generate = TRUE and an llm_model name;
implement your own trainer_core_generate_or_return() to
call your LLM API.
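The "return the prompt by default, generate on request" behaviour described in these notes can be sketched as follows. This is an illustration under stated assumptions, not EnTraineR's code: `generate_or_return_sketch()` and `echo_backend()` are hypothetical names, and the real signature of trainer_core_generate_or_return() may differ.

```r
# Hypothetical sketch of a "generate or return" switch. The names
# generate_or_return_sketch() and echo_backend() are illustrative only
# and do not reflect EnTraineR's real signatures.
generate_or_return_sketch <- function(prompt, generate = FALSE,
                                      llm_model = NULL, backend = NULL) {
  if (!isTRUE(generate)) {
    return(prompt)  # default: hand back the prompt string untouched
  }
  if (!is.function(backend)) {
    stop("generate = TRUE requires a backend function.")
  }
  backend(prompt = prompt, model = llm_model)
}

# Stand-in backend that makes no network call; replace with your own API client
echo_backend <- function(prompt, model) {
  paste0("[", model, "] would receive ", nchar(prompt), " characters")
}

p <- "Explain this t-test output."
generate_or_return_sketch(p)
generate_or_return_sketch(p, generate = TRUE,
                          llm_model = "my-model", backend = echo_backend)
```

Keeping the backend as a plain function argument means the prompt-building code never needs to know which LLM service, if any, sits behind it.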
Issues and pull requests are welcome. Please:
- Keep code ASCII and roxygen2-ready.
- Add tests and examples where relevant.
- Follow the audience style guidelines.
See the DESCRIPTION file for license terms.
If EnTraineR helps your teaching or analyses, starring the
repo is appreciated.
Thanks to the R community and the authors of FactoMineR
and SensoMineR for inspiring teaching tools and example
datasets used in demonstrations.