% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/diagnose.R
\name{diagnose_numeric}
\alias{diagnose_numeric}
\alias{diagnose_numeric.data.frame}
\title{Diagnose data quality of numerical variables}
\usage{
diagnose_numeric(.data, ...)

\method{diagnose_numeric}{data.frame}(.data, ...)
}
\arguments{
\item{.data}{a data.frame or a \code{\link{tbl_df}}.}

\item{...}{one or more unquoted expressions separated by commas.
You can treat variable names like they are positions.
Positive values select variables; negative values drop variables.
If the first expression is negative, diagnose_numeric() automatically
starts with all variables.
These arguments are automatically quoted and evaluated in a context where column names
represent column positions.
They support unquoting and splicing.}
}
\value{
An object of class tbl_df.
}
\description{
diagnose_numeric() produces information
for diagnosing the quality of numerical data.
}
\details{
The diagnosis calculates statistics that can be
used to understand the distribution of numerical data.
min, Q1, mean, median, Q3, and max can be used to estimate the distribution
of the data. If the count of zero or negative values is large, the data
should be checked for errors. If the number of outliers is large, a strategy for
removing or replacing them is needed.
}
\section{Numerical diagnostic information}{

The information derived from the numerical data diagnosis is as follows.

\itemize{
\item variables : variable names
\item min : minimum
\item Q1 : 25th percentile
\item mean : arithmetic average
\item median : median (50th percentile)
\item Q3 : 75th percentile
\item max : maximum
\item zero : count of zero values
\item minus : count of negative values
\item outlier : count of outliers
}

See vignette("diagonosis") for an introduction to these concepts.
}

\examples{
# Generate data for the example
carseats <- ISLR::Carseats
carseats[sample(seq(NROW(carseats)), 20), "Income"] <- NA
carseats[sample(seq(NROW(carseats)), 5), "Urban"] <- NA

# Diagnosis of numerical variables
diagnose_numeric(carseats)

# Select the variable to diagnose
diagnose_numeric(carseats, Sales, Income)
diagnose_numeric(carseats, -Sales, -Income)
diagnose_numeric(carseats, "Sales", "Income")
diagnose_numeric(carseats, 5)

# Using pipes ---------------------------------
library(dplyr)

# Diagnosis of all numerical variables
carseats \%>\%
  diagnose_numeric()
# Positive values select variables
carseats \%>\%
  diagnose_numeric(Sales, Income)
# Negative values drop variables
carseats \%>\%
  diagnose_numeric(-Sales, -Income)
# Positive positions select variables
carseats \%>\%
  diagnose_numeric(5)
# Negative positions drop variables
carseats \%>\%
  diagnose_numeric(-1, -5)

# Using pipes & dplyr -------------------------
# Keep only the variables that contain at least one zero value
carseats \%>\%
  diagnose_numeric()  \%>\%
  filter(zero > 0)
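
# Reproducing the statistics by hand ----------
# A rough base-R sketch of the columns listed under
# "Numerical diagnostic information", computed for a single variable.
# Assumptions (not confirmed by this page): missing values are dropped,
# quantiles use the defaults of quantile(), and outliers follow the
# boxplot rule of boxplot.stats(). The vector x and the name manual_stats
# are made up for illustration.
x <- c(-2, 0, 0, 1, 2, 3, 4, 50, NA)

manual_stats <- data.frame(
  min     = min(x, na.rm = TRUE),
  Q1      = unname(quantile(x, 0.25, na.rm = TRUE)),
  mean    = mean(x, na.rm = TRUE),
  median  = median(x, na.rm = TRUE),
  Q3      = unname(quantile(x, 0.75, na.rm = TRUE)),
  max     = max(x, na.rm = TRUE),
  zero    = sum(x == 0, na.rm = TRUE),          # count of zero values
  minus   = sum(x < 0, na.rm = TRUE),           # count of negative values
  outlier = length(boxplot.stats(x)$out)        # count of boxplot outliers
)
manual_stats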
}
\seealso{
\code{\link{diagnose_numeric.tbl_dbi}}, \code{\link{diagnose.data.frame}}, \code{\link{diagnose_category.data.frame}}, \code{\link{diagnose_outlier.data.frame}}.
}
