% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bind_tf_idf2.R
\name{bind_tf_idf2}
\alias{bind_tf_idf2}
\title{Bind the term frequency and inverse document frequency}
\usage{
bind_tf_idf2(
  tbl,
  term = "token",
  document = "doc_id",
  n = "n",
  tf = c("tf", "tf2", "tf3"),
  idf = c("idf", "idf2", "idf3", "idf4"),
  norm = FALSE,
  rmecab_compat = TRUE
)
}
\arguments{
\item{tbl}{A tidy text dataset.}

\item{term}{Column containing terms, given as a string or symbol.}

\item{document}{Column containing document IDs, given as a string or symbol.}

\item{n}{Column containing document-term counts, given as a string or symbol.}

\item{tf}{Method for computing term frequency.}

\item{idf}{Method for computing inverse document frequency.}

\item{norm}{Logical; if \code{TRUE}, the raw term counts are normalized
by dividing them by their L2 norm before IDF values are computed.}

\item{rmecab_compat}{Logical; if \code{TRUE}, values are computed
so that they are compatible with 'RMeCab'.
Note that 'RMeCab' always computes IDF values from term frequencies
rather than from raw term counts, so TF-IDF values may be
affected by term frequency twice.}
}
\value{
A data.frame.
}
\description{
Calculates and binds the term frequency, inverse document frequency,
and TF-IDF of the dataset.
This function experimentally supports the 3 types of term frequency
and 4 types of inverse document frequency
that are implemented in the 'RMeCab' package.
}
\details{
The type of term frequency is selected with the \code{tf} argument:
\itemize{
\item \code{tf} is the term frequency (not the raw count of terms).
\item \code{tf2} is the base-10 logarithmic term frequency.
\item \code{tf3} is the binary-weighted term frequency.
}

The type of inverse document frequency is selected with the \code{idf} argument:
\itemize{
\item \code{idf} is the smoothed base-2 inverse document frequency.
'Smoothed' here just means that 1 is added after the logarithm is taken.
\item \code{idf2} is the global frequency IDF.
\item \code{idf3} is the base-2 probabilistic IDF.
\item \code{idf4} is the global entropy, which is not actually an IDF.
}
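
As a rough illustration of the default \code{tf} and \code{idf} options,
the following base R sketch computes them by hand on a small hypothetical
table. It assumes the textbook definitions (a count divided by the total
count of its document, and a base-2 logarithm of the inverse document
frequency plus 1), so it is not the implementation of this function, and
it does not reproduce the 'RMeCab'-compatible behaviour of
\code{rmecab_compat = TRUE}.

\preformatted{
# Hypothetical document-term counts, one row per document-term pair.
counts <- data.frame(
  doc_id = c(1, 1, 2, 2, 3),
  token = c("a", "b", "a", "c", "a"),
  n = c(2, 1, 1, 3, 1)
)
# Assumed term frequency: count divided by the document's total count.
counts$tf <- counts$n / ave(counts$n, counts$doc_id, FUN = sum)
# Document frequency: number of documents that contain each term.
n_docs <- length(unique(counts$doc_id))
doc_freq <- ave(counts$n, counts$token, FUN = length)
# Assumed smoothed IDF: base-2 logarithm, then add 1.
counts$idf <- log2(n_docs / doc_freq) + 1
counts$tf_idf <- counts$tf * counts$idf
}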
}
\examples{
\dontrun{
df <- tokenize(
  data.frame(
    doc_id = seq_along(audubon::polano[5:8]),
    text = audubon::polano[5:8]
  )
) |>
  dplyr::group_by(doc_id) |>
  dplyr::count(token) |>
  dplyr::ungroup()
bind_tf_idf2(df)
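# A hedged sketch of switching weighting schemes. The option names come
# from the usage section; this particular combination is only illustrative.
bind_tf_idf2(df, tf = "tf2", idf = "idf3")
# Assumption: L2 normalization can be combined with the default schemes,
# and rmecab_compat = FALSE turns off the 'RMeCab'-compatible computation.
bind_tf_idf2(df, norm = TRUE, rmecab_compat = FALSE)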
}
}
