Type: Package
Title: Automating Choosing Statistical Tests
Version: 0.1.2
Maintainer: Wouter Zeevat <wouterzeevat@gmail.com>
Description: Automatically selects and runs the most appropriate statistical test for your data, returning clear, easy-to-read results. Ideal for all experience levels.
License: GPL-3
Encoding: UTF-8
URL: https://github.com/wouterzeevat/automatedtests
BugReports: https://github.com/wouterzeevat/automatedtests/issues
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Suggests: knitr, rmarkdown
Imports: R6, nnet, nortest, stats, DescTools
Depends: R (≥ 4.0)
NeedsCompilation: no
Packaged: 2025-06-16 17:31:41 UTC; seaba
Author: Wouter Zeevat [aut, cre]
Repository: CRAN
Date/Publication: 2025-06-16 17:50:02 UTC

AutomatedTest class

Description

The AutomatedTest class represents a result of a statistical test. It contains attributes such as the p-value, degrees of freedom, and more.

Methods

Public methods


Method new()

Initialize an instance of the AutomatedTest class

Usage
AutomatedTest$new(data, identifiers, compare_to = NULL, paired = FALSE)
Arguments
data

A data frame containing the data for the test.

identifiers

A vector with the identifiers.

compare_to

Numeric value to compare against in one-sample tests. Default is NULL.

paired

Logical; if TRUE, the test will be performed as paired if applicable. Default is FALSE.


Method get_data()

Get the data used in the test

Usage
AutomatedTest$get_data()
Returns

A data frame containing all features.


Method is_paired()

Shows whether the data is paired. If multiple rows share the same identifier, the data contains repeated samples per subject (tidy data), which makes it paired.

Usage
AutomatedTest$is_paired()
Returns

Whether the data is paired (TRUE/FALSE).
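A minimal sketch of the pairing rule described above, assuming the identifiers are available as a plain vector: the data counts as paired when at least one identifier occurs on more than one row. The helper `has_repeated_ids()` is hypothetical and is not part of the package.

```r
# Hypothetical helper: TRUE when any identifier appears on more than one row,
# i.e. the same subject was measured repeatedly (tidy/long format).
has_repeated_ids <- function(ids) {
  any(duplicated(ids))
}

long_ids <- c("s1", "s1", "s2", "s2", "s3", "s3")  # each subject measured twice
wide_ids <- c("s1", "s2", "s3")                    # one row per subject

has_repeated_ids(long_ids)  # TRUE
has_repeated_ids(wide_ids)  # FALSE
```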


Method get_identifiers()

Get the identifiers used for the data.

Usage
AutomatedTest$get_identifiers()
Returns

The identifiers.


Method get_compare_to()

Get the comparison value for one-sample tests

Usage
AutomatedTest$get_compare_to()
Returns

A numeric value for comparison


Method set_compare_co()

Updates the compare_to value. This method is public because the comparison value can change depending on the type of test, and automatical_test() must be able to call it.

Usage
AutomatedTest$set_compare_co(compare_to)
Arguments
compare_to

Numeric value to compare to.

Returns

The updated object with the comparison value set.


Method get_datatypes()

Get the data types of the features in the object

Usage
AutomatedTest$get_datatypes()
Returns

A list of data types (e.g., Quantitative or Qualitative)
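As a rough illustration of the Quantitative/Qualitative split, the sketch below labels numeric columns as quantitative and all other columns (factors, characters, logicals) as qualitative. That rule, and the helper name `classify_columns()`, are assumptions rather than the package's documented behavior.

```r
# Hypothetical sketch: label each column of a data frame by data type.
# Assumption: numeric means Quantitative, everything else Qualitative.
classify_columns <- function(df) {
  vapply(df, function(col) {
    if (is.numeric(col)) "Quantitative" else "Qualitative"
  }, character(1))
}

types <- classify_columns(iris)
types
```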


Method get_parametric_list()

Get the results of the parametric assumption (normality) checks for the features

Usage
AutomatedTest$get_parametric_list()
Returns

A list of parametric assumption check results


Method is_parametric()

Check if the data meets parametric assumptions

Usage
AutomatedTest$is_parametric()
Returns

TRUE if parametric assumptions are met, otherwise FALSE


Method get_test()

Get the statistical test that was chosen

Usage
AutomatedTest$get_test()
Returns

The name of the statistical test


Method get_result()

Get the result of the selected statistical test

Usage
AutomatedTest$get_result()
Returns

The result of the statistical test


Method get_strength()

Get the strength(s) of the selected statistical test.

Usage
AutomatedTest$get_strength()
Returns

A named numeric value indicating the strength of the result. The type and meaning depend on the test used:

coefficient

Effect size and direction of predictors in regression

r

Correlation strength and direction

mean difference

Difference in group means

statistic

Test statistic measuring group difference or association

F statistic

Ratio of variances across groups

proportion

Estimated success rate in the sample

non-existent

No interpretable strength measure available


Method is_significant()

Whether the test results are significant or not.

Usage
AutomatedTest$is_significant()
Returns

TRUE / FALSE depending on the significance of the test.
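The threshold used by is_significant() is not stated here. The sketch below assumes the conventional alpha of 0.05 and a result that exposes a `p.value` element, as base R `htest` objects do; `is_significant_sketch()` is a hypothetical name.

```r
# Sketch: significance as a comparison of the p-value against alpha.
# alpha = 0.05 is an assumption; the package's threshold is not stated here.
is_significant_sketch <- function(result, alpha = 0.05) {
  result$p.value < alpha
}

clear_effect <- t.test(iris$Sepal.Length, mu = 0)
no_effect    <- t.test(iris$Sepal.Length, mu = mean(iris$Sepal.Length))

is_significant_sketch(clear_effect)  # TRUE
is_significant_sketch(no_effect)     # FALSE
```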


Method print()

Print a summary of the test object

Usage
AutomatedTest$print()

Method clone()

The objects of this class are cloneable with this method.

Usage
AutomatedTest$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Automatically Run a Statistical Test

Description

Automatically chooses the best-fitting statistical test for your data, given either a data frame or individual vectors, and returns an easy-to-read AutomatedTest object. This object contains the executed test together with all of its statistics and properties.

Usage

  automatical_test(..., compare_to = NULL, identifiers = FALSE, paired = FALSE)

Arguments

...

Either a single data frame or multiple equal-length vectors representing columns of data.

compare_to

A numeric value to compare against during a one-sample test. If the data is categorical, the value will default to 1/k, where k is the number of categories, assuming a uniform distribution. If numeric, the default will be 0.

identifiers

Logical; if TRUE, the first column/vector is treated as identifiers and excluded from testing.

paired

Logical; if TRUE, the test will be performed as paired if applicable, regardless of whether identifiers are provided. This applies to paired tests like McNemar's or the Cochran Q test.
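The compare_to defaults can be written out directly. This sketch follows the documented rule (1/k for categorical data, 0 for numeric data) and, as an assumption, counts k as the number of distinct observed values; `default_compare_to()` is hypothetical. It then runs base R's `chisq.test()` as a goodness-of-fit test against that uniform default.

```r
# Hypothetical helper reproducing the documented compare_to defaults.
default_compare_to <- function(x) {
  if (is.numeric(x)) {
    0                         # numeric data: compare against 0
  } else {
    k <- length(unique(x))    # number of observed categories (assumption)
    1 / k                     # uniform share per category
  }
}

default_compare_to(iris$Sepal.Length)  # 0
default_compare_to(iris$Species)       # 1/3, since there are 3 species

# Goodness-of-fit against the uniform default; iris has 50 rows per species,
# so the observed counts match the expected counts.
gof <- chisq.test(table(iris$Species), p = rep(default_compare_to(iris$Species), 3))
gof$statistic
```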

Details

The automatical_test function automatically selects and runs the best-fitting statistical test based on the data provided. It accepts either a single data frame or multiple individual vectors, provided the vectors are of equal length.

If identifiers is set to TRUE, the first column is treated as identifiers and excluded from the test, which supports tidy data.

When a multiple group test is selected (i.e., more than two groups, columns, or variables are used), the first non-identifier column will be used as the grouping or target variable, meaning all other variables will be tested against it.
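The column roles described in the two paragraphs above can be sketched with plain data frame indexing. The helper `split_roles()` is hypothetical; it only mirrors the documented rules that an identifier column comes first and that the first remaining column is the target.

```r
# Hypothetical helper: separate identifiers, target, and remaining variables.
split_roles <- function(df, identifiers = FALSE) {
  ids <- NULL
  if (identifiers) {
    ids <- df[[1]]              # first column holds the identifiers
    df <- df[-1]                # exclude it from testing
  }
  list(
    ids = ids,
    target = names(df)[1],      # first non-identifier column is the target
    others = names(df)[-1]      # all other variables are tested against it
  )
}

df <- data.frame(id = 1:150, iris)
roles <- split_roles(df, identifiers = TRUE)
roles$target   # "Sepal.Length"
roles$others
```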

The paired parameter can be used to force paired testing for supported tests (such as McNemar's test or Cochran's Q), even if identifiers are not explicitly included in the input.
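To illustrate what pairing changes, the before/after values from the Examples can be passed to base R's `t.test()` with and without `paired = TRUE`. This calls base R directly, not the package, and is only meant to show the effect of treating rows as matched measurements.

```r
before <- c(200, 220, 215, 205, 210)
after  <- c(202, 225, 220, 210, 215)

# Paired: tests the mean of the per-row differences (after - before)
paired   <- t.test(after, before, paired = TRUE)
# Unpaired: treats the two columns as independent samples
unpaired <- t.test(after, before, paired = FALSE)

c(paired = paired$p.value, unpaired = unpaired$p.value)
```

The paired test works on the differences (2, 5, 5, 5, 5), whose small spread yields a much smaller p-value than the unpaired comparison of the same numbers.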

To override the default comparison value used in one-sample tests, set compare_to explicitly.

Once the test has been executed, you can use the method $get_result() on the resulting object to get more detailed information about the test's execution, including a summary of the test used and all statistics.

Supported tests:

ID Test
1 One-proportion test
2 Chi-square goodness-of-fit test
3 One-sample Student's t-test
4 One-sample Wilcoxon test
5 Multiple linear regression
6 Binary logistic regression
7 Multinomial logistic regression
8 Pearson correlation
9 Spearman's rank correlation
10 Cochran's Q test
11 McNemar's test
12 Fisher's exact test
13 Chi-square test of independence
14 Student's t-test for independent samples
15 Welch's t-test for independent samples
16 Mann-Whitney U test
17 Student's t-test for paired samples
18 Wilcoxon signed-rank test
19 One-way ANOVA
20 Welch's ANOVA
21 Repeated measures ANOVA
22 Kruskal-Wallis test
23 Friedman test

Value

An object of class AutomatedTest. The object contains the results of the statistical test performed on the data. You can use the method $get_result() to obtain more detailed information about the execution of the test.

Author(s)

Wouter Zeevat

See Also

AutomatedTest for the class used by this function.

Examples

  # Example 1: Using individual vectors
  test1 <- automatical_test(iris$Species, iris$Sepal.Length, identifiers = FALSE)

  # Example 2: Forcing a paired test on a data frame
  before <- c(200, 220, 215, 205, 210)
  after <- c(202, 225, 220, 210, 215)
  paired_data <- data.frame(before, after)
  test2 <- automatical_test(paired_data, paired = TRUE)

  # Retrieve more detailed information about the tests
  test1$get_result()
  test2$get_test()

Internal: Check if a numeric vector follows a normal distribution

Description

This function checks whether a numeric vector is approximately normally distributed, using the Shapiro-Wilk test for small samples (n < 5000) and the Anderson-Darling test for larger ones. If the input is not numeric, the function returns NULL.

Usage

check_parametric(data)

Arguments

data

A numeric vector to test for normality.

Value

A list containing:

test

Name of the test used ("Shapiro-Wilk Test" or "Anderson-Darling Test")

statistic

The test statistic

p_value

The p-value from the test

result

Logical; TRUE if p > 0.05 (assumed normal), FALSE otherwise

Returns NULL if input is not numeric.
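The selection rule above can be sketched as follows. `shapiro.test()` is base R, and `nortest::ad.test()` comes from the nortest package listed in Imports. The function name `check_normality_sketch()` and the exact list layout are assumptions modelled on the documented return value, not the package's code.

```r
# Sketch of the documented normality check: Shapiro-Wilk for n < 5000,
# Anderson-Darling (nortest package) otherwise; NULL for non-numeric input.
check_normality_sketch <- function(data) {
  if (!is.numeric(data)) return(NULL)
  if (length(data) < 5000) {
    res <- stats::shapiro.test(data)
    name <- "Shapiro-Wilk Test"
  } else {
    if (!requireNamespace("nortest", quietly = TRUE)) stop("package 'nortest' is required")
    res <- nortest::ad.test(data)
    name <- "Anderson-Darling Test"
  }
  list(
    test = name,
    statistic = unname(res$statistic),
    p_value = res$p.value,
    result = res$p.value > 0.05   # TRUE: normality is not rejected
  )
}

set.seed(1)
check_normality_sketch(rnorm(100))$test   # "Shapiro-Wilk Test"
check_normality_sketch(iris$Species)      # NULL (not numeric)
```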


Returns the strength of a test. The kind of value differs from test to test, and the returned value is named to indicate what it represents. The possible types are listed under Value.

Description

This function takes a 'test_object' that contains the result of a statistical test and returns the main coefficient, estimate, or test statistic as a numeric value. It supports various tests such as t-tests, ANOVAs, regressions, and correlations.

Usage

get_strength_from_test(test_object)

Arguments

test_object

An object containing a statistical test result and metadata, expected to have methods 'get_result()' and 'get_test()'.

Value

A named numeric value indicating the strength of the result. The type and meaning depend on the test used:

coefficient

Effect size and direction of predictors in regression

r

Correlation strength and direction

mean difference

Difference in group means

statistic

Test statistic measuring group difference or association

F statistic

Ratio of variances across groups

proportion

Estimated success rate in the sample

non-existent

No interpretable strength measure available
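A minimal sketch of how a named strength value could be pulled from base R `htest` results. The mapping shown (the correlation estimate for correlation tests, the difference of means for two-sample t-tests, the test statistic otherwise) is an assumption for illustration; `strength_sketch()` is hypothetical.

```r
# Hypothetical sketch: pick one named strength value out of a base R htest object.
strength_sketch <- function(result) {
  method <- result$method
  if (grepl("correlation", method, fixed = TRUE)) {
    c(r = unname(result$estimate))
  } else if (grepl("t-test", method, fixed = TRUE) && length(result$estimate) == 2) {
    # Two-sample t-tests report both group means; use their difference
    c(`mean difference` = unname(result$estimate[1] - result$estimate[2]))
  } else if (!is.null(result$statistic)) {
    c(statistic = unname(result$statistic))
  } else {
    c(`non-existent` = NA_real_)
  }
}

strength_sketch(cor.test(iris$Sepal.Length, iris$Petal.Length))
strength_sketch(t.test(iris$Sepal.Length, iris$Petal.Length))
strength_sketch(chisq.test(table(mtcars$am, mtcars$vs)))
```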


Internal: Returns the result of a statistical test based on a string identifier

Description

This internal function selects and runs a statistical test using data from a test object, based on the name of the test specified. It supports a wide variety of tests including t-tests, chi-square tests, ANOVA, correlation tests, regression models, and more.

Usage

get_test_from_string(test_object)

Arguments

test_object

An object containing data, identifiers, datatypes, and test selection.

Value

The result of the selected statistical test. Typically, this is a test object with class 'htest', 'aov', 'lm', or similar.
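A sketch of string-based dispatch using `switch()`, limited to a few of the supported tests and using only base R functions. The test names match the table of supported tests; the two-variable input layout and the helper name `run_test_by_name()` are assumptions.

```r
# Hypothetical sketch: map a test name to the base R call that runs it.
# x is the first (grouping/target) variable, y the second.
run_test_by_name <- function(test_name, x, y) {
  switch(test_name,
    "Pearson correlation"                      = cor.test(x, y, method = "pearson"),
    "Spearman's rank correlation"              = cor.test(x, y, method = "spearman", exact = FALSE),
    "Student's t-test for independent samples" = t.test(y ~ x, var.equal = TRUE),
    "Welch's t-test for independent samples"   = t.test(y ~ x, var.equal = FALSE),
    "Mann-Whitney U test"                      = wilcox.test(y ~ x),
    "Kruskal-Wallis test"                      = kruskal.test(y ~ x),
    stop("unsupported test: ", test_name)     # default branch
  )
}

res <- run_test_by_name("Kruskal-Wallis test", iris$Species, iris$Sepal.Length)
class(res)  # "htest"
```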


Pick the appropriate test for multiple variables (Internal Function)

Description

Pick the appropriate test for multiple variables (Internal Function)

Usage

pick_multiple_variable_test(test_object)

Arguments

test_object

An object containing the data, types, and metadata needed for test selection.

Value

A character string with the name of the appropriate regression or classification model.
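The three models in the supported-tests table map naturally onto the type of the target (first) column. The rule below (numeric target: linear regression; two-level target: binary logistic regression; more levels: multinomial) is an assumption consistent with those test names, not taken from the package source; `pick_model_sketch()` is hypothetical.

```r
# Hypothetical sketch: choose a model name from the type of the target column.
pick_model_sketch <- function(target) {
  if (is.numeric(target)) {
    "Multiple linear regression"
  } else if (length(unique(target)) == 2) {
    "Binary logistic regression"
  } else {
    "Multinomial logistic regression"
  }
}

pick_model_sketch(iris$Sepal.Length)   # numeric target
pick_model_sketch(factor(mtcars$am))   # two classes
pick_model_sketch(iris$Species)        # three classes

# Fitting the numeric case with base R's lm(): target against all other columns
fit <- lm(Sepal.Length ~ ., data = iris[, 1:4])
coef(fit)
```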


Pick the appropriate test for one variable (Internal Function)

Description

Pick the appropriate test for one variable (Internal Function)

Usage

pick_one_variable_test(test_object)

Arguments

test_object

An object containing the data, data types, and comparison value.

Value

A character string with the name of the appropriate one-sample statistical test.
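One plausible decision rule over the four one-sample tests in the supported-tests table, assuming two-category data uses the proportion test and a Shapiro-Wilk p-value above 0.05 selects the t-test over the Wilcoxon test. This ordering is an assumption, and `pick_one_sample_sketch()` is hypothetical.

```r
# Hypothetical sketch: one plausible rule for choosing among the one-sample tests.
pick_one_sample_sketch <- function(x) {
  if (!is.numeric(x)) {
    # Categorical data: proportion test for two categories, goodness-of-fit otherwise
    if (length(unique(x)) == 2) "One-proportion test" else "Chi-square goodness-of-fit test"
  } else if (shapiro.test(x)$p.value > 0.05) {
    "One-sample Student's t-test"   # normality not rejected
  } else {
    "One-sample Wilcoxon test"      # normality rejected
  }
}

pick_one_sample_sketch(factor(mtcars$am))    # two categories
pick_one_sample_sketch(iris$Species)         # three categories
pick_one_sample_sketch(qnorm(ppoints(50)))   # ideal normal quantiles
pick_one_sample_sketch(iris$Petal.Length)    # strongly bimodal
```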


Check if a data frame is parametric (Internal Function)

Description

Check if a data frame is parametric (Internal Function)

Usage

pick_test(test_object)

Arguments

test_object

The data to check (vector of integers).

Value

TRUE if the data is normally distributed, FALSE otherwise.


Pick the appropriate test for two variables (Internal Function)

Description

Pick the appropriate test for two variables (Internal Function)

Usage

pick_two_variable_test(test_object)

Arguments

test_object

An object containing the data, types, and metadata needed for test selection.

Value

A character string with the name of the appropriate statistical test.
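For two numeric variables, the supported-tests table lists Pearson and Spearman correlation. A sketch of one plausible choice, assuming Pearson is used only when both variables pass a Shapiro-Wilk check (p > 0.05) in the spirit of check_parametric(); the rule and `pick_correlation_sketch()` are assumptions.

```r
# Hypothetical sketch: choose a correlation test for two numeric variables.
pick_correlation_sketch <- function(x, y) {
  both_normal <- shapiro.test(x)$p.value > 0.05 && shapiro.test(y)$p.value > 0.05
  if (both_normal) "Pearson correlation" else "Spearman's rank correlation"
}

x <- iris$Petal.Length
y <- iris$Sepal.Length
choice <- pick_correlation_sketch(x, y)
method <- if (choice == "Pearson correlation") "pearson" else "spearman"
# exact = FALSE avoids the exact-p-value warning that ties trigger for Spearman
cor.test(x, y, method = method, exact = FALSE)
```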