stddiff.spark: Calculate the Standardized Difference for Numeric, Binary and Category Variables in Apache Spark

Provides functions to compute standardized differences for numeric, binary, and categorical variables on Apache Spark DataFrames using 'sparklyr'. The implementation mirrors the methods of the 'stddiff' package but operates on distributed data. For the reference implementation, see Zhicheng Du and Yuantao Hao (2022) <doi:10.32614/CRAN.package.stddiff>.
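
For orientation, the standardized difference is conventionally defined as sketched below. These are the standard textbook definitions, given here on the assumption that the package follows them; they are not transcribed from its source code, and the reference manual remains the authority on what each function returns. The notation is introduced here for illustration only: subscripts 1 and 0 denote the two groups being compared, \bar{x} and s^2 are the sample mean and variance, \hat{p} is a sample proportion, and K is the number of levels of a categorical variable.

```latex
% Numeric variable: difference in group means over the pooled standard deviation.
d_{\text{numeric}} = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\left(s_1^2 + s_0^2\right)/2}}

% Binary variable: difference in proportions over the pooled binomial standard deviation.
d_{\text{binary}} = \frac{\hat{p}_1 - \hat{p}_0}
  {\sqrt{\left[\hat{p}_1(1-\hat{p}_1) + \hat{p}_0(1-\hat{p}_0)\right]/2}}

% Categorical variable with K levels: a Mahalanobis-type distance between the
% vectors of level proportions (one level is dropped, leaving levels 2..K).
d_{\text{categorical}} = \sqrt{(T - C)^{\top} S^{-1} (T - C)},
\qquad
T = (\hat{p}_{1,2}, \dots, \hat{p}_{1,K})^{\top},
\quad
C = (\hat{p}_{0,2}, \dots, \hat{p}_{0,K})^{\top}

% Entries of the (K-1) x (K-1) pooled covariance matrix S.
S_{kl} =
\begin{cases}
  \left[\hat{p}_{1,k}(1-\hat{p}_{1,k}) + \hat{p}_{0,k}(1-\hat{p}_{0,k})\right]/2, & k = l \\
  -\left[\hat{p}_{1,k}\,\hat{p}_{1,l} + \hat{p}_{0,k}\,\hat{p}_{0,l}\right]/2, & k \neq l
\end{cases}
```

As a quick check of the numeric case: group means of 52 and 50 with a standard deviation of 10 in both groups give d = 2 / 10 = 0.2.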

Version: 1.0
Depends: R (≥ 4.1.0)
Imports: dplyr (≥ 1.1.0), tidyr (≥ 1.3.0), sparklyr (≥ 1.8.0)
Suggests: stddiff (≥ 2.1.0), testthat (≥ 3.0.0), withr
Published: 2026-01-15
DOI: 10.32614/CRAN.package.stddiff.spark (may not be active yet)
Author: Alicja Januszkiewicz [aut, cre, cph]
Maintainer: Alicja Januszkiewicz <cran.alicja.januszkiewicz at gmail.com>
BugReports: https://github.com/alicja-januszkiewicz/stddiff.spark/issues
License: GPL (≥ 3)
URL: https://github.com/alicja-januszkiewicz/stddiff.spark
NeedsCompilation: no
SystemRequirements: Apache Spark (tested with 3.4.4)
Materials: README, NEWS
CRAN checks: stddiff.spark results

Documentation:

Reference manual: stddiff.spark.html, stddiff.spark.pdf

Downloads:

Package source: stddiff.spark_1.0.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): not available, r-oldrel (arm64): not available, r-release (x86_64): not available, r-oldrel (x86_64): not available

Linking:

Please use the canonical form https://CRAN.R-project.org/package=stddiff.spark to link to this page.