Help for package automatedtests

Type: Package
Title: Automating Choosing Statistical Tests
Version: 0.1.2
Maintainer: Wouter Zeevat <wouterzeevat@gmail.com>
Description: Automatically selects and runs the most appropriate statistical test for your data, returning clear, easy-to-read results. Ideal for all experience levels.
License: GPL-3
Encoding: UTF-8
URL: https://github.com/wouterzeevat/automatedtests
BugReports: https://github.com/wouterzeevat/automatedtests/issues
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Suggests: knitr, rmarkdown
Imports: R6, nnet, nortest, stats, DescTools
Depends: R (≥ 4.0)
NeedsCompilation:
Packaged: 2025-06-16 17:31:41 UTC; seaba
Author: Wouter Zeevat [aut, cre]
Repository: CRAN
Date/Publication: 2025-06-16 17:50:02 UTC

AutomatedTest class

Description

The AutomatedTest class represents a result of a statistical test. It contains attributes such as the p-value, degrees of freedom, and more.

Methods

Method `new()`

Initialize an instance of the AutomatedTest class

Usage

AutomatedTest$new(data, identifiers, compare_to = NULL, paired = FALSE)

Arguments

data: A dataframe containing the data for the test.
identifiers: A vector with the identifiers.
compare_to: Numeric value to compare against in one-sample tests. Default is NULL.
paired: Logical; if TRUE, the test will be performed as paired if applicable. Default is FALSE.

Method `get_data()`

Get the data used in the test

Usage

AutomatedTest$get_data()

Returns

A dataframe with all features

Method `is_paired()`

Reports whether the data is paired. In tidy (long-format) data, multiple rows sharing the same identifier indicate repeated samples, which makes the data paired.

Usage

AutomatedTest$is_paired()

Returns

Whether the data is paired (TRUE/FALSE).
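
The idea of repeated identifiers can be illustrated with a minimal base R sketch. It assumes, without confirmation from the package source, that pairing is inferred from duplicated identifiers in long-format data; the helper has_repeated_ids and the example data are hypothetical.

```r
# Hypothetical helper: TRUE when any identifier occurs on more than one row,
# which is how long-format (tidy) repeated measurements look.
has_repeated_ids <- function(ids) {
  anyDuplicated(ids) > 0
}

# Each subject appears twice (before/after), so the rows form pairs.
long_data <- data.frame(
  subject = c("a", "b", "c", "a", "b", "c"),
  moment  = rep(c("before", "after"), each = 3),
  value   = c(200, 220, 215, 202, 225, 220)
)

paired_layout   <- has_repeated_ids(long_data$subject)  # repeated subjects
unpaired_layout <- has_repeated_ids(c("a", "b", "c"))   # every id is unique
```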

Method `get_identifiers()`

Get the identifiers used for the data

Usage

AutomatedTest$get_identifiers()

Returns

Returns the identifiers

Method `get_compare_to()`

Get the comparison value for one-sample tests

Usage

AutomatedTest$get_compare_to()

Returns

A numeric value for comparison

Method `set_compare_co()`

Updates the compare_to value. The method is public because the comparison value can change depending on the type of test, and automatical_test() must be able to call it.

Usage

AutomatedTest$set_compare_co(compare_to)

Arguments

compare_to: Numeric value to compare to.

Returns

Updated object with comparison value set.

Method `get_datatypes()`

Get the data types of the features in the object

Usage

AutomatedTest$get_datatypes()

Returns

A list of data types (e.g., Quantitative or Qualitative)

Method `get_parametric_list()`

Get the parametric test results of the features

Usage

AutomatedTest$get_parametric_list()

Returns

A list of parametric test results

Method `is_parametric()`

Check if the data meets parametric assumptions

Usage

AutomatedTest$is_parametric()

Returns

TRUE if parametric assumptions are met, otherwise FALSE

Method `get_test()`

Get the statistical test that was chosen

Usage

AutomatedTest$get_test()

Returns

The name of the statistical test

Method `get_result()`

Get the result of the selected statistical test

Usage

AutomatedTest$get_result()

Returns

The result of the statistical test

Method `get_strength()`

Get the strength(s) of the selected statistical test.

Usage

AutomatedTest$get_strength()

Returns

A named numeric value indicating the strength of the result. The type and meaning depend on the test used:

coefficient: Effect size and direction of predictors in regression
r: Correlation strength and direction
mean difference: Difference in group means
statistic: Test statistic measuring group difference or association
F statistic: Ratio of variances across groups
proportion: Estimated success rate in the sample
non-existent: No interpretable strength measure available

Method `is_significant()`

Whether the test results are significant or not.

Usage

AutomatedTest$is_significant()

Returns

TRUE / FALSE depending on the significance of the test.

Method `print()`

Print a summary of the test object

Usage

AutomatedTest$print()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

AutomatedTest$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Automatically Run a Statistical Test

Description

Automatically chooses the best-fitting statistical test for your data, supplied as either a data frame or individual vectors, and returns an easily readable AutomatedTest object. This object contains the executed test together with all statistics and properties.

Usage

  automatical_test(..., compare_to = NULL, identifiers = FALSE, paired = FALSE)

Arguments

...

Either a single data frame or multiple equal-length vectors representing columns of data.

compare_to

A numeric value to compare against during a one-sample test. If the data is categorical, the value will default to 1/k, where k is the number of categories, assuming a uniform distribution. If numeric, the default will be 0.

identifiers

Logical; if TRUE, the first column/vector is treated as identifiers and excluded from testing.

paired

Logical; if TRUE, the test will be performed as paired if applicable, regardless of whether identifiers are provided. This applies to paired tests like McNemar's or the Cochran Q test.

Details

The automatical_test function automatically selects and runs the most fitting statistical test based on the data provided. It can accept data as either a single data frame or multiple individual vectors, provided the vectors are of equal length.

If identifiers is set to TRUE, the first column will be treated as identifiers and excluded from the test, supporting TIDY data.

When a multiple group test is selected (i.e., more than two groups, columns, or variables are used), the first non-identifier column will be used as the grouping or target variable, meaning all other variables will be tested against it.
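
The column roles described above can be sketched in base R. This is an illustration under stated assumptions, not the package's own code: the data frame df, the variable names, and the use of lm() as a stand-in for the multiple linear regression in the supported-tests list are all choices made for this example.

```r
# Toy data: the first column is an identifier, the second is the target.
df <- data.frame(
  id  = seq_len(nrow(mtcars)),
  mpg = mtcars$mpg,
  wt  = mtcars$wt,
  hp  = mtcars$hp
)

# With identifiers = TRUE the first column would be excluded ...
vars <- df[, -1]

# ... and the first remaining column acts as the target variable.
target     <- names(vars)[1]
predictors <- names(vars)[-1]

# Build the formula `mpg ~ wt + hp` and fit it with base R.
model_formula <- reformulate(predictors, response = target)
fit <- lm(model_formula, data = vars)
coef(fit)
```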

The paired parameter can be used to force paired testing for supported tests (such as McNemar's test or Cochran's Q), even if identifiers are not explicitly included in the input.

To override the default comparison value in a one-sample test, set compare_to explicitly.
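
What the documented compare_to defaults mean in practice can be sketched with base R stand-ins. The calls below use chisq.test() and t.test() from stats directly rather than the package, and the value 5.8 is an arbitrary example; the package equivalent would presumably be a call such as automatical_test(iris$Sepal.Length, compare_to = 5.8).

```r
# Categorical case: with k categories the default comparison value is 1/k,
# i.e. a uniform distribution over the categories.
species_counts <- table(iris$Species)
k <- length(species_counts)
default_p <- rep(1 / k, k)
gof <- chisq.test(species_counts, p = default_p)

# Numeric case: the default comparison value is 0.
default_t <- t.test(iris$Sepal.Length, mu = 0)

# Overriding the default: test whether the mean differs from 5.8 instead.
custom_t <- t.test(iris$Sepal.Length, mu = 5.8)
```

Testing a strictly positive measurement against 0 is almost always significant, which is why an explicit comparison value is usually the more informative choice for numeric data.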

Once the test has been executed, you can use the method $get_result() on the resulting object to get more detailed information about the test's execution, including a summary of the test used and all statistics.

Supported tests:

ID	Test
1	One-proportion test
2	Chi-square goodness-of-fit test
3	One-sample Student's t-test
4	One-sample Wilcoxon test
5	Multiple linear regression
6	Binary logistic regression
7	Multinomial logistic regression
8	Pearson correlation
9	Spearman's rank correlation
10	Cochran's Q test
11	McNemar's test
12	Fisher's exact test
13	Chi-square test of independence
14	Student's t-test for independent samples
15	Welch's t-test for independent samples
16	Mann-Whitney U test
17	Student's t-test for paired samples
18	Wilcoxon signed-rank test
19	One-way ANOVA
20	Welch's ANOVA
21	Repeated measures ANOVA
22	Kruskal-Wallis test
23	Friedman test

Value

An object of class AutomatedTest. The object contains the results of the statistical test performed on the data. You can use the method $get_result() to obtain more detailed information about the execution of the test.
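
A short usage sketch of the returned object, using only the methods documented for the AutomatedTest class. The requireNamespace() guard is an addition for this example so that the snippet does nothing when the package is not installed; no output is shown because it depends on the test the package selects.

```r
# Check once whether the package is available.
pkg_available <- requireNamespace("automatedtests", quietly = TRUE)

if (pkg_available) {
  test <- automatedtests::automatical_test(iris$Species, iris$Sepal.Length)

  print(test$get_test())        # name of the chosen test
  print(test$is_parametric())   # whether parametric assumptions were met
  print(test$is_significant())  # TRUE / FALSE
  print(test$get_strength())    # named numeric strength measure
  print(test$get_result())      # full result of the underlying test
}
```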

Author(s)

Wouter Zeevat

Examples

  # Example 1: Using individual vectors
  test1 <- automatical_test(iris$Species, iris$Sepal.Length, identifiers = FALSE)

  # Example 2: Forcing a paired test
  before <- c(200, 220, 215, 205, 210)
  after <- c(202, 225, 220, 210, 215)
  paired_data <- data.frame(before, after)
  test2 <- automatical_test(before, after, paired = TRUE)

  # Retrieve more detailed information about the test
  # test1$get_result()

Internal: Check if a numeric vector follows a normal distribution

Description

This function checks whether a numeric vector is approximately normally distributed, using the Shapiro-Wilk test for small samples (n < 5000) and the Anderson-Darling test for larger ones. If the input is not numeric, the function returns NULL.

Usage

check_parametric(data)

Arguments

data

A numeric vector to test for normality.

Value

A list containing:

test: Name of the test used ("Shapiro-Wilk Test" or "Anderson-Darling Test")
statistic: The test statistic
p_value: The p-value from the test
result: Logical; TRUE if p > 0.05 (assumed normal), FALSE otherwise

Returns NULL if input is not numeric.
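
The documented behaviour can be approximated with the following sketch. It is a hypothetical re-implementation, not the package's code: the function name check_normality_sketch and the alpha argument are assumptions, and the package's own check_parametric() may differ in details. The large-sample branch calls ad.test() from the nortest package, which is listed under Imports.

```r
# Hypothetical re-implementation of the documented behaviour.
check_normality_sketch <- function(data, alpha = 0.05) {
  # Non-numeric input: return NULL, as documented.
  if (!is.numeric(data)) {
    return(NULL)
  }

  if (length(data) < 5000) {
    # shapiro.test() itself requires between 3 and 5000 values.
    res  <- stats::shapiro.test(data)
    name <- "Shapiro-Wilk Test"
  } else {
    # Only reached for large samples; needs the nortest package.
    res  <- nortest::ad.test(data)
    name <- "Anderson-Darling Test"
  }

  list(
    test      = name,
    statistic = unname(res$statistic),
    p_value   = res$p.value,
    result    = res$p.value > alpha  # TRUE means "assumed normal"
  )
}

# A two-valued sample is clearly not normal.
two_point   <- c(rep(0, 50), rep(100, 50))
normality   <- check_normality_sketch(two_point)
not_numeric <- check_normality_sketch(letters)
```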

Internal: Returns the strength of a test, together with the kind of value it represents

Description

This function takes a 'test_object' that contains the result of a statistical test and returns the main coefficient, estimate, or test statistic as a numeric value. It supports various tests such as t-tests, ANOVAs, regressions, and correlations.

Usage

get_strength_from_test(test_object)

Arguments

test_object

An object containing a statistical test result and metadata, expected to have methods 'get_result()' and 'get_test()'.

Value

A named numeric value indicating the strength of the result. The type and meaning depend on the test used:

coefficient: Effect size and direction of predictors in regression
r: Correlation strength and direction
mean difference: Difference in group means
statistic: Test statistic measuring group difference or association
F statistic: Ratio of variances across groups
proportion: Estimated success rate in the sample
non-existent: No interpretable strength measure available
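
To make a few of these strength types concrete, the sketch below pulls comparable values out of base R result objects. It is not the package's extraction logic; the variable names and the choice of which element to read from each result are assumptions made for illustration.

```r
# Pearson correlation: the estimate is r (strength and direction).
cor_res    <- cor.test(mtcars$mpg, mtcars$wt)
r_strength <- c(r = unname(cor_res$estimate))

# Two-sample t-test: the strength is the difference in group means.
t_res     <- t.test(mpg ~ am, data = mtcars)
mean_diff <- c(
  "mean difference" = unname(t_res$estimate[1] - t_res$estimate[2])
)

# One-way ANOVA: the F statistic from the ANOVA table.
aov_res    <- aov(Sepal.Length ~ Species, data = iris)
f_strength <- c("F statistic" = summary(aov_res)[[1]][["F value"]][1])
```

Each value is a named numeric of length one, matching the shape described above: the name says what kind of measure it is, and the number carries the size and, where applicable, the direction.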

Internal: Returns the result of a statistical test based on a string identifier

Description

This internal function selects and runs a statistical test using data from a test object, based on the name of the test specified. It supports a wide variety of tests including t-tests, chi-square tests, ANOVA, correlation tests, regression models, and more.

Usage

get_test_from_string(test_object)

Arguments

test_object

An object containing data, identifiers, datatypes, and test selection.

Value

The result of the selected statistical test. Typically, this is a test object with class 'htest', 'aov', 'lm', or similar.

Pick the appropriate test for multiple variables (Internal Function)

Description

Pick the appropriate test for multiple variables (Internal Function)

Usage

pick_multiple_variable_test(test_object)

Arguments

test_object

An object containing the data, types, and metadata needed for test selection.

Value

A character string with the name of the appropriate regression or classification model.

Pick the appropriate test for one variable (Internal Function)

Description

Pick the appropriate test for one variable (Internal Function)

Usage

pick_one_variable_test(test_object)

Arguments

test_object

An object containing the data, data types, and comparison value.

Value

A character string with the name of the appropriate one-sample statistical test.

Check if a dataframe is parametric (Internal Function)

Description

Check if a dataframe is parametric (Internal Function)

Usage

pick_test(test_object)

Arguments

test_object

The data to check (vector of integers).

Value

TRUE if data is normalized, FALSE otherwise.

Pick the appropriate test for two variables (Internal Function)

Description

Pick the appropriate test for two variables (Internal Function)

Usage

pick_two_variable_test(test_object)

Arguments

test_object

An object containing the data, types, and metadata needed for test selection.

Value

A character string with the name of the appropriate statistical test.
