Type: Package
Title: Example Use of 'mlpack' from C++ via R
Version: 0.0.1
Date: 2025-09-14
Description: A Minimal Example Package which demonstrates 'mlpack' use via C++ Code from R.
URL: https://github.com/eddelbuettel/rcppmlpack-examples
BugReports: https://github.com/eddelbuettel/rcppmlpack-examples/issues
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Suggests: tinytest
Depends: R (≥ 3.5.0)
Imports: Rcpp (≥ 1.1.0)
LinkingTo: Rcpp, RcppArmadillo (≥ 15.0.2-1), RcppEnsmallen, mlpack (≥ 4.6.3)
Encoding: UTF-8
RoxygenNote: 7.3.3
NeedsCompilation: yes
Packaged: 2025-09-14 23:36:11 UTC; edd
Author: Dirk Eddelbuettel [aut, cre], Authors of mlpack [aut], Constantinos Giachalis [ctb]
Maintainer: Dirk Eddelbuettel <edd@debian.org>
Repository: CRAN
Date/Publication: 2025-09-21 13:30:02 UTC
Example Use of 'mlpack' from C++ via R
Description
A Minimal Example Package which demonstrates 'mlpack' use via C++ Code from R.
Package Content
Index of help topics:
covertype_small             Covertype data subset used for classification
kMeans                      Run a k-means clustering analysis
linearRegression            Run a linear regression with optional ridge regression
loanData                    Loan data subset used for default prediction
loanDefaultPrediction       loanDefaultPrediction
randomForest                Run a Random Forest classification
rcppmlpackexamples-package  Example Use of 'mlpack' from C++ via R
Maintainer
Dirk Eddelbuettel <edd@debian.org>
Author(s)
Dirk Eddelbuettel [aut, cre], Authors of mlpack [aut], Constantinos Giachalis [ctb]
Covertype data subset used for classification
Description
A subset of the UCI machine learning data set ‘covertype’ describing forest cover type in seven different classes. This smaller subset contains 100,000 observations and 55 variables. The first 54 variables are explanatory (i.e. “features”), with the last providing the dependent variable (“labels”). The data is in the ‘wide’ 55 x 100,000 format used by mlpack. The dependent variable has been transformed to the range zero to six by subtracting one from the values found in the data file.
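The ‘wide’ layout and the label shift described above can be sketched in base R. This is a minimal illustration on hypothetical toy data (the object names `tall`, `wide`, `features` and `labels` are made up here and three features stand in for the 54), not the code used to prepare the packaged data set.

```r
# Hypothetical toy data in the usual R 'tall' layout: one row per
# observation, three feature columns and a label coded 1..7.
set.seed(42)
tall <- data.frame(f1 = rnorm(5), f2 = rnorm(5), f3 = rnorm(5),
                   label = c(1L, 7L, 3L, 2L, 5L))

# Shift the labels so that they start at zero
tall$label <- tall$label - 1L

# Transpose to the 'wide' layout: one column per observation
wide <- t(as.matrix(tall))
dim(wide)                         # 4 rows (3 features + label), 5 columns

# Split into features and labels, the last row holding the labels
features <- wide[-nrow(wide), , drop = FALSE]
labels   <- wide[nrow(wide), ]
range(labels)
```

With the packaged data the same indexing applies with row 55 as the label row, as in the randomForest example.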
Details
The original source of the data is the US Forest Service, and the complete file is part of the UC Irvine machine learning data repository.
Source
https://www.mlpack.org/datasets/covertype-small.csv.gz
References
https://archive.ics.uci.edu/dataset/31/covertype
Run a k-means clustering analysis
Description
Run a k-means clustering analysis, returning a list of cluster assignments
Usage
kMeans(data, clusters)
Arguments
data      A matrix of data values
clusters  An integer specifying the number of clusters
Details
This function performs a k-means clustering analysis on the given data set.
Value
A list with cluster assignments
Examples
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kMeans(x, 2)
data(trees, package="datasets")
cl2 <- kMeans(t(trees),3)
Run a linear regression with optional ridge regression
Description
Run a linear regression (with optional ridge regression)
Usage
linearRegression(matX, vecY, lambda = 0, intercept = TRUE)
Arguments
matX       A matrix of explanatory variables (‘predictors’) in standard R format (i.e. ‘tall and skinny’), to be transposed internally to MLPACK format (i.e. ‘short and wide’).
vecY       A vector of dependent variables (‘responses’)
lambda     An optional ridge parameter, defaults to zero
intercept  An optional boolean switch for including an intercept, defaults to true.
Details
This function performs a linear regression, and serves as a simple test case for accessing an MLPACK function.
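The role of the ridge parameter can be illustrated with the textbook closed-form ridge estimator in base R. This is a minimal sketch and not the package's or mlpack's implementation: `ridgeFitted` is a hypothetical helper, and leaving the intercept unpenalized is an assumption made here for illustration only.

```r
# Hypothetical helper, not part of the package: fitted values from the
# closed-form ridge estimator (X'X + lambda * I)^{-1} X'y.
# Assumption: the intercept column is left unpenalized.
ridgeFitted <- function(X, y, lambda = 0, intercept = TRUE) {
    if (intercept) X <- cbind(1, X)           # prepend a column of ones
    penalty <- diag(lambda, ncol(X))          # lambda on the diagonal
    if (intercept) penalty[1, 1] <- 0         # do not shrink the intercept
    beta <- solve(crossprod(X) + penalty, crossprod(X, y))
    drop(X %*% beta)                          # fitted values as a vector
}

data("trees", package = "datasets")
X <- with(trees, cbind(log(Girth), log(Height)))
y <- with(trees, log(Volume))

# With lambda = 0 this reduces to ordinary least squares
all.equal(unname(fitted(lm(y ~ X))), ridgeFitted(X, y))

# A positive lambda shrinks the slope coefficients and changes the fit
summary(ridgeFitted(X, y, lambda = 10) - ridgeFitted(X, y))
```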
Value
A vector with fitted values
Examples
suppressMessages(library(utils))
data("trees", package="datasets")
X <- with(trees, cbind(log(Girth), log(Height)))
y <- with(trees, log(Volume))
lmfit <- lm(y ~ X)
# summary(fitted(lmfit))
mlfit <- linearRegression(X, y)
# summary(mlfit)
all.equal(unname(fitted(lmfit)), as.vector(mlfit))
Loan data subset used for default prediction
Description
A four column data set containing a binary variable ‘Employed’ (with zero denoting unemployment and one employment), a numeric variable ‘Bank Balance’, a numeric variable ‘Annual Salary’ and a binary target variable ‘Defaulted?’ (with zero denoting loan repayment and one denoting default).
Details
The original source of the data is not documented by mlpack.
Source
https://datasets.mlpack.org/LoanDefault.csv
References
https://archive.ics.uci.edu/dataset/31/covertype
loanDefaultPrediction
Description
Predict loan default using a decision tree model
Usage
loanDefaultPrediction(loanDataFeatures, loanDataTargets, pct = 0.25)
Arguments
loanDataFeatures  A matrix of dimension 3 by N, i.e. transposed relative to what R uses, with the three explanatory variables
loanDataTargets   A vector of (integer-valued) binary values denoting loan repayment or default
pct               A numeric variable with the percentage of data to be used for testing, defaults to 25%
Details
This function performs a loan default prediction, using three variables (employment, bank balance and annual salary) to predict loan repayment or default.
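The effect of holding back a share `pct` of the observations for testing can be sketched in base R. The helper `holdoutSplit` below is hypothetical and only mimics the idea on toy data in the 3 by N layout; how the package itself performs the split is not documented here, so the random sampling is an assumption.

```r
# Hypothetical helper, not part of the package: random holdout split for
# data in the 'wide' layout (one column per observation).
holdoutSplit <- function(features, targets, pct = 0.25) {
    n <- ncol(features)
    isTest <- seq_len(n) %in% sample.int(n, size = round(pct * n))
    list(trainX = features[, !isTest, drop = FALSE],
         trainY = targets[!isTest],
         testX  = features[, isTest, drop = FALSE],
         testY  = targets[isTest])
}

set.seed(123)
feat <- matrix(rnorm(3 * 20), nrow = 3)   # toy 3 x 20 feature matrix
targ <- rbinom(20, 1, 0.3)                # toy binary targets
sp <- holdoutSplit(feat, targ, 0.25)
ncol(sp$testX)                            # 5 of the 20 observations

# Accuracy is the share of matching labels, shown here for a trivial
# 'always repay' prediction purely for illustration
mean(rep(0, length(sp$testY)) == sp$testY)
```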
Value
A list object with predictions, probabilities, accuracy and a report matrix
Examples
data(loanData)
res <- loanDefaultPrediction(t(as.matrix(loanData[,-4])), # col 1 to 3, transposed
                             loanData[, 4],               # col 4 is the target
                             0.25)                        # retain 25% for testing
str(res)
res$report
Run a Random Forest classification
Description
Run a Random Forest Classifier
Usage
randomForest(dataset, labels, pct = 0.3, nclasses = 7L, ntrees = 10L)
Arguments
dataset   A matrix of explanatory variables, i.e. “features”
labels    A vector of the dependent variable as integer values, i.e. “labels”
pct       A numeric value for the percentage of data to be retained for the test set
nclasses  An integer value for the number of distinct label values (classes)
ntrees    An integer value for the number of trees
Details
This function performs a Random Forest classification on a subset of the standard ‘covertype’ data set
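Since the labels in covertype_small are coded from zero to six and nclasses defaults to seven, a quick consistency check before calling the classifier can be useful. The helper `checkLabels` below is a hypothetical sketch in base R; that labels must lie in the range 0 to nclasses - 1 is an assumption drawn from the data set description, not a documented requirement of this function.

```r
# Hypothetical helper, not part of the package: check that labels are
# whole numbers in the range [0, nclasses - 1].
checkLabels <- function(labels, nclasses) {
    all(labels == round(labels)) &&
        min(labels) >= 0 &&
        max(labels) <= nclasses - 1
}

checkLabels(c(0, 3, 6), 7L)   # TRUE
checkLabels(c(1, 4, 7), 7L)   # FALSE: raw 1..7 coding still needs the shift
```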
Value
A list object
See Also
covertype_small
Examples
data(covertype_small) # see help(covertype_small)
res <- randomForest(covertype_small[-55,],  # features (already transposed)
                    covertype_small[55,],   # labels now in [0, 6] range
                    0.3)                    # percentage used for testing
str(res)  # accuracy varies as the method is randomized and no seed is set here