Type: Package
Title: Automating Choosing Statistical Tests
Version: 0.1.2
Maintainer: Wouter Zeevat <wouterzeevat@gmail.com>
Description: Automatically selects and runs the most appropriate statistical test for your data, returning clear, easy-to-read results. Ideal for all experience levels.
License: GPL-3
Encoding: UTF-8
URL: https://github.com/wouterzeevat/automatedtests
BugReports: https://github.com/wouterzeevat/automatedtests/issues
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Suggests: knitr, rmarkdown
Imports: R6, nnet, nortest, stats, DescTools
Depends: R (≥ 4.0)
NeedsCompilation: no
Packaged: 2025-06-16 17:31:41 UTC; seaba
Author: Wouter Zeevat [aut, cre]
Repository: CRAN
Date/Publication: 2025-06-16 17:50:02 UTC
AutomatedTest class
Description
The AutomatedTest class represents a result of a statistical test. It contains attributes such as the p-value, degrees of freedom, and more.
Methods
Public methods
Method new()
Initialize an instance of the AutomatedTest class
Usage
AutomatedTest$new(data, identifiers, compare_to = NULL, paired = FALSE)
Arguments
data
A dataframe containing the data for the test.
identifiers
A vector with the identifiers.
compare_to
Numeric value to compare against in one-sample tests. Default is NULL.
paired
Logical; if TRUE, the test will be performed as paired if applicable. Default is FALSE.
Method get_data()
Get the data used in the test
Usage
AutomatedTest$get_data()
Returns
A dataframe with all features
Method is_paired()
Shows whether the data is paired. If multiple rows share the same identifier, the data contains repeated samples (tidy data) and is treated as paired.
Usage
AutomatedTest$is_paired()
Returns
Whether the data is paired (TRUE/FALSE).
Method get_identifiers()
Get the identifiers used for the data
Usage
AutomatedTest$get_identifiers()
Returns
Returns the identifiers
Method get_compare_to()
Get the comparison value for one-sample tests
Usage
AutomatedTest$get_compare_to()
Returns
A numeric value for comparison
Method set_compare_co()
Updates the compare_to value. This method is public because the comparison value can change depending on the type of test, and automatical_test() needs to be able to call it.
Usage
AutomatedTest$set_compare_co(compare_to)
Arguments
compare_to
Numeric value to compare to.
Returns
Updated object with comparison value set.
Method get_datatypes()
Get the data types of the features in the object
Usage
AutomatedTest$get_datatypes()
Returns
A list of data types (e.g., Quantitative or Qualitative)
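As a rough illustration of what a Quantitative / Qualitative labelling can look like, the sketch below classifies the columns of a data frame with base R only. The helper name classify_columns_sketch and the rule "numeric means Quantitative, everything else means Qualitative" are assumptions made for this example; the package's own rule may differ.

```r
# Hypothetical helper, not part of the package: label each column of a
# data frame as "Quantitative" (numeric) or "Qualitative" (anything else).
classify_columns_sketch <- function(df) {
  vapply(
    df,
    function(col) if (is.numeric(col)) "Quantitative" else "Qualitative",
    character(1)
  )
}

types <- classify_columns_sketch(iris)
print(types)
```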
Method get_parametric_list()
Get the parametric test results of the features
Usage
AutomatedTest$get_parametric_list()
Returns
A list of parametric test results
Method is_parametric()
Check if the data meets parametric assumptions
Usage
AutomatedTest$is_parametric()
Returns
TRUE if parametric assumptions are met, otherwise FALSE
Method get_test()
Get the statistical test that was chosen
Usage
AutomatedTest$get_test()
Returns
The name of the statistical test
Method get_result()
Get the result of selected statistical test
Usage
AutomatedTest$get_result()
Returns
The result of the statistical test
Method get_strength()
Get the strength(s) of selected statistical test.
Usage
AutomatedTest$get_strength()
Returns
A named numeric value indicating the strength of the result. The type and meaning depend on the test used:
- coefficient
Effect size and direction of predictors in regression
- r
Correlation strength and direction
- mean difference
Difference in group means
- statistic
Test statistic measuring group difference or association
- F statistic
Ratio of variances across groups
- proportion
Estimated success rate in the sample
- non-existent
No interpretable strength measure available
Method is_significant()
Whether the test results are significant or not.
Usage
AutomatedTest$is_significant()
Returns
TRUE / FALSE depending on the significance of the test.
Method print()
Print a summary of the test object
Usage
AutomatedTest$print()
Method clone()
The objects of this class are cloneable with this method.
Usage
AutomatedTest$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
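The accessor methods above are typically called on an object returned by automatical_test(). The following is a minimal sketch of that workflow, assuming the package is installed under the name automatedtests (taken from the repository URL). It reuses the iris call from the package's own example and makes no claim about which test is selected or what values are printed.

```r
# Guarded so the script still runs when the package is not installed.
if (requireNamespace("automatedtests", quietly = TRUE)) {
  library(automatedtests)

  # Same call as Example 1 in the automatical_test() documentation.
  test <- automatical_test(iris$Species, iris$Sepal.Length)

  print(test$get_test())        # name of the chosen test
  print(test$get_datatypes())   # data type per feature
  print(test$is_parametric())   # whether parametric assumptions are met
  print(test$is_significant())  # TRUE / FALSE
  print(test$get_strength())    # named numeric strength measure
}
```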
Automatically Run a Statistical Test
Description
Automatically chooses the best-fitting statistical test for your data and returns an easily readable AutomatedTest object, built from either a data frame or individual vectors. This object contains the executed test together with all statistics and properties.
Usage
automatical_test(..., compare_to = NULL, identifiers = FALSE, paired = FALSE)
Arguments
...
Either a single data frame or multiple equal-length vectors representing columns of data.
compare_to
A numeric value to compare against during a one-sample test.
If the data is categorical, the value will default to
identifiers
Logical; if TRUE, the first column/vector is treated as identifiers and excluded from testing.
paired
Logical; if TRUE, the test will be performed as paired if applicable, regardless of whether identifiers are provided. This applies to paired tests like McNemar's or the Cochran Q test.
Details
The automatical_test function automatically selects and runs the most fitting statistical test based on the data provided.
It can accept data as either a single data frame or multiple individual vectors, provided the vectors are of equal length.
If identifiers is set to TRUE, the first column will be treated as identifiers and excluded from the test, which supports tidy data.
When a multiple group test is selected (i.e., more than two groups, columns, or variables are used), the first non-identifier column will be used as the grouping or target variable, meaning all other variables will be tested against it.
The paired parameter can be used to force paired testing for supported tests (such as McNemar's test or Cochran's Q test), even if identifiers are not explicitly included in the input.
To override the default comparison value in one-sample tests, set compare_to.
Once the test has been executed, call $get_result() on the resulting object to get more detailed information about the test's execution, including a summary of the test used and all statistics.
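To make the idea of a paired test on categorical data concrete, here is a small base R sketch of McNemar's test on two binary measurements taken on the same ten subjects. It calls stats::mcnemar.test directly and does not show how the package runs the test internally; the data are made up for illustration.

```r
# Made-up paired binary outcomes: the i-th entries of `before` and `after`
# belong to the same subject.
lv <- c("no", "yes")
before <- factor(c("yes", "yes", "no", "no", "yes",
                   "no", "yes", "yes", "no", "yes"), levels = lv)
after  <- factor(c("yes", "no", "no", "yes", "yes",
                   "yes", "yes", "no", "yes", "yes"), levels = lv)

# McNemar's test works on the 2x2 table of paired outcomes and only uses
# the discordant cells (subjects whose answer changed).
paired_table <- table(before, after)
mcnemar_result <- mcnemar.test(paired_table)
print(paired_table)
print(mcnemar_result)
```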
Supported tests:
ID | Test |
1 | One-proportion test |
2 | Chi-square goodness-of-fit test |
3 | One-sample Student's t-test |
4 | One-sample Wilcoxon test |
5 | Multiple linear regression |
6 | Binary logistic regression |
7 | Multinomial logistic regression |
8 | Pearson correlation |
9 | Spearman's rank correlation |
10 | Cochran's Q test |
11 | McNemar's test |
12 | Fisher's exact test |
13 | Chi-square test of independence |
14 | Student's t-test for independent samples |
15 | Welch's t-test for independent samples |
16 | Mann-Whitney U test |
17 | Student's t-test for paired samples |
18 | Wilcoxon signed-rank test |
19 | One-way ANOVA |
20 | Welch's ANOVA |
21 | Repeated measures ANOVA |
22 | Kruskal-Wallis test |
23 | Friedman test |
Value
An object of class AutomatedTest.
The object contains the results of the statistical test performed on the data.
You can use the method $get_result() to obtain more detailed information about the execution of the test.
Author(s)
Wouter Zeevat
See Also
AutomatedTest for the class used by this function.
Examples
library(automatedtests)
# Example 1: Using individual vectors
test1 <- automatical_test(iris$Species, iris$Sepal.Length, identifiers = FALSE)
# Example 2: Forcing a paired test on two measurements of the same subjects
before <- c(200, 220, 215, 205, 210)
after <- c(202, 225, 220, 210, 215)
test2 <- automatical_test(before, after, paired = TRUE)
# Retrieve more detailed information about the test
# test1$get_result()
Internal: Check if a numeric vector follows a normal distribution
Description
This function checks whether a numeric vector is approximately normally distributed,
using the Shapiro-Wilk test for small samples (n < 5000) and the Anderson-Darling test
for larger ones. If the input is not numeric, the function returns NULL.
Usage
check_parametric(data)
Arguments
data
A numeric vector to test for normality.
Value
A list containing:
- test
Name of the test used ("Shapiro-Wilk Test" or "Anderson-Darling Test")
- statistic
The test statistic
- p_value
The p-value from the test
- result
Logical; TRUE if p > 0.05 (assumed normal), FALSE otherwise
Returns NULL if input is not numeric.
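The behaviour described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the package's source: the function name check_normality_sketch is hypothetical, missing values are dropped before testing (an assumption), and the large-sample branch needs the nortest package for nortest::ad.test.

```r
# Hypothetical sketch of a normality check that switches test by sample size.
check_normality_sketch <- function(data) {
  if (!is.numeric(data)) {
    return(NULL)                      # non-numeric input: nothing to test
  }
  x <- data[!is.na(data)]             # assumption: ignore missing values

  if (length(x) < 5000) {
    res <- stats::shapiro.test(x)     # needs at least 3 values
    test_name <- "Shapiro-Wilk Test"
  } else {
    res <- nortest::ad.test(x)        # requires the nortest package
    test_name <- "Anderson-Darling Test"
  }

  list(
    test = test_name,
    statistic = unname(res$statistic),
    p_value = res$p.value,
    result = res$p.value > 0.05       # TRUE means "assumed normal"
  )
}

# Evenly spaced normal quantiles: a sample that is normal by construction.
normal_check <- check_normality_sketch(qnorm(ppoints(50)))
str(normal_check)
```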
Get the strength of a test. The kind of value differs per test, and the returned value is named after what it represents.
Description
This function takes a 'test_object' that contains the result of a statistical test and returns the main coefficient, estimate, or test statistic as a numeric value. It supports various tests such as t-tests, ANOVAs, regressions, and correlations.
Usage
get_strength_from_test(test_object)
Arguments
test_object
An object containing a statistical test result and metadata, expected to have methods 'get_result()' and 'get_test()'.
Value
A named numeric value indicating the strength of the result. The type and meaning depend on the test used:
- coefficient
Effect size and direction of predictors in regression
- r
Correlation strength and direction
- mean difference
Difference in group means
- statistic
Test statistic measuring group difference or association
- F statistic
Ratio of variances across groups
- proportion
Estimated success rate in the sample
- non-existent
No interpretable strength measure available
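To show how a named strength value could be pulled out of a standard result object, the sketch below works on base R 'htest' objects only. The function strength_sketch and its decision rules are assumptions for illustration; the package's real function takes a test object and covers more cases (regressions, ANOVAs, proportions).

```r
# Hypothetical sketch: derive a named strength value from an 'htest' result.
strength_sketch <- function(result) {
  stopifnot(inherits(result, "htest"))
  est <- result$estimate

  # Correlation tests report a single estimate named cor, rho or tau.
  if (!is.null(est) && names(est)[1] %in% c("cor", "rho", "tau")) {
    return(c(r = unname(est[1])))
  }
  # Two-sample t-tests report both group means; use their difference.
  # (Other two-estimate results would be mislabelled by this simple rule.)
  if (length(est) == 2) {
    return(c("mean difference" = unname(est[1] - est[2])))
  }
  # Otherwise fall back to the test statistic, if there is one.
  if (!is.null(result$statistic)) {
    return(c(statistic = unname(result$statistic[1])))
  }
  c("non-existent" = NA_real_)
}

s_cor <- strength_sketch(cor.test(mtcars$mpg, mtcars$wt))
s_t   <- strength_sketch(t.test(extra ~ group, data = sleep))
s_kw  <- strength_sketch(kruskal.test(Sepal.Length ~ Species, data = iris))
print(s_cor); print(s_t); print(s_kw)
```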
Internal: Returns the result of a statistical test based on a string identifier
Description
This internal function selects and runs a statistical test using data from a test object, based on the name of the test specified. It supports a wide variety of tests including t-tests, chi-square tests, ANOVA, correlation tests, regression models, and more.
Usage
get_test_from_string(test_object)
Arguments
test_object
An object containing data, identifiers, datatypes, and test selection.
Value
The result of the selected statistical test. Typically, this is a test object with class 'htest', 'aov', 'lm', or similar.
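Dispatching on a test name can be sketched with switch(). The example below is a simplified stand-in that takes two numeric vectors rather than a test object. The function run_test_sketch is hypothetical, and using the names from the supported-tests table as dispatch keys is an assumption about the internal strings.

```r
# Hypothetical sketch: run a base R test chosen by its name.
run_test_sketch <- function(test_name, x, y) {
  switch(
    test_name,
    "Pearson correlation" =
      cor.test(x, y, method = "pearson"),
    "Spearman's rank correlation" =
      cor.test(x, y, method = "spearman"),
    "Student's t-test for independent samples" =
      t.test(x, y, var.equal = TRUE),
    "Welch's t-test for independent samples" =
      t.test(x, y, var.equal = FALSE),
    "Mann-Whitney U test" =
      wilcox.test(x, y),
    stop("Unsupported test: ", test_name)   # default branch
  )
}

group_a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.4)
group_b <- c(6.3, 6.8, 7.1, 6.5, 7.4, 6.9)
welch <- run_test_sketch("Welch's t-test for independent samples",
                         group_a, group_b)
print(welch)
```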
Pick the appropriate test for multiple variables (Internal Function)
Description
Pick the appropriate test for multiple variables (Internal Function)
Usage
pick_multiple_variable_test(test_object)
Arguments
test_object
An object containing the data, types, and metadata needed for test selection.
Value
A character string with the name of the appropriate regression or classification model.
Pick the appropriate test for one variable (Internal Function)
Description
Pick the appropriate test for one variable (Internal Function)
Usage
pick_one_variable_test(test_object)
Arguments
test_object
An object containing the data, data types, and comparison value.
Value
A character string with the name of the appropriate one-sample statistical test.
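For quantitative data, one plausible way to choose between the two one-sample location tests in the supported list is a normality check. The sketch below assumes that rule (Shapiro-Wilk p > 0.05 leads to the t-test, otherwise the Wilcoxon test). The function pick_one_sample_sketch is hypothetical, it ignores the qualitative cases, and it runs the test as well as naming it, so it should not be read as the package's actual logic.

```r
# Hypothetical sketch: choose and run a one-sample test for numeric data.
pick_one_sample_sketch <- function(x, compare_to) {
  stopifnot(is.numeric(x), is.numeric(compare_to), length(compare_to) == 1)

  looks_normal <- shapiro.test(x)$p.value > 0.05
  if (looks_normal) {
    list(test = "One-sample Student's t-test",
         result = t.test(x, mu = compare_to))
  } else {
    list(test = "One-sample Wilcoxon test",
         result = wilcox.test(x, mu = compare_to))
  }
}

# A sample that is normal by construction, and a strongly right-skewed one.
normal_sample <- 10 + qnorm(ppoints(30))
skewed_sample <- exp(2 * qnorm(ppoints(30)))

pick_normal <- pick_one_sample_sketch(normal_sample, compare_to = 9.5)
pick_skewed <- pick_one_sample_sketch(skewed_sample, compare_to = 0.5)
print(pick_normal$test)
print(pick_skewed$test)
```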
Check if a dataframe is parametric (Internal Function)
Description
Check if a dataframe is parametric (Internal Function)
Usage
pick_test(test_object)
Arguments
test_object
The data to check (vector of integers).
Value
TRUE if data is normalized, FALSE otherwise.
Pick the appropriate test for two variables (Internal Function)
Description
Pick the appropriate test for two variables (Internal Function)
Usage
pick_two_variable_test(test_object)
Arguments
test_object
An object containing the data, types, and metadata needed for test selection.
Value
A character string with the name of the appropriate statistical test.