% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ngram.R
\name{ngram}
\alias{ngram}
\title{Get n-gram frequencies}
\usage{
ngram(
  phrases,
  corpus = "eng_2019",
  year_start = 1800,
  year_end = 2020,
  smoothing = 3,
  case_ins = FALSE,
  aggregate = FALSE,
  count = FALSE,
  drop_corpus = FALSE,
  drop_parent = FALSE,
  drop_all = FALSE,
  type = FALSE
)
}
\arguments{
\item{phrases}{vector of phrases, with a maximum of 12 items}

\item{corpus}{Google corpus to search (see Details for possible values)}

\item{year_start}{start year, default is 1800. Data available back to 1500.}

\item{year_end}{end year, default is 2020}

\item{smoothing}{smoothing parameter, default is 3}

\item{case_ins}{Logical indicating whether to force a case insensitive search.
Default is \code{FALSE}.}

\item{aggregate}{Sum up the frequencies for ngrams associated with wildcard
or case insensitive searches. Default is \code{FALSE}.}

\item{count}{Default is \code{FALSE}.}

\item{drop_corpus}{When a corpus is specified directly with the ngram
(e.g. \code{dog:eng_fiction_2012}), specifies whether the corpus should be
dropped from the phrase column of the results. Default is \code{FALSE}.}

\item{drop_parent}{Drop the parent phrase associated with a wildcard
or case-insensitive search. Default is \code{FALSE}.}

\item{drop_all}{Delete the suffix "(All)" from aggregated case-insensitive
searches. Default is \code{FALSE}.}

\item{type}{Include the Google return type (e.g. NGRAM, NGRAM_COLLECTION,
EXPANSION) from result set. Default is \code{FALSE}.}
}
\description{
\code{ngram} downloads data from the Google Ngram Viewer website and
returns it in a dataframe.
}
\details{
Google generated three datasets drawn from digitised books in the Google
 Books collection. The first was generated in July 2009, the second in
 July 2012 and the third in 2019. Google is expected to update these
 datasets as book scanning continues.

 This function provides the annual frequency of words or phrases, known
 as n-grams, in a sub-collection or "corpus" taken from the Google Books
 collection. The search across the corpus is case-sensitive. For a
 case-insensitive search use \code{\link{ngrami}}.
 
 Note that the \code{tag} option is no longer available. Tags should be
 specified directly in the ngram string (see examples).

Below is a list of available corpora.
\tabular{ll}{
\bold{Corpus} \tab \bold{Corpus Name}\cr
eng_us_2019\tab American English 2019\cr
eng_us_2012\tab American English 2012\cr
eng_us_2009\tab American English 2009\cr
eng_gb_2019\tab British English 2019\cr
eng_gb_2012\tab British English 2012\cr
eng_gb_2009\tab British English 2009\cr
chi_sim_2019\tab Chinese 2019\cr
chi_sim_2012\tab Chinese 2012\cr
chi_sim_2009\tab Chinese 2009\cr
eng_2019\tab English 2019\cr
eng_2012\tab English 2012\cr
eng_2009\tab English 2009\cr
eng_fiction_2019\tab English Fiction 2019\cr
eng_fiction_2012\tab English Fiction 2012\cr
eng_fiction_2009\tab English Fiction 2009\cr
eng_1m_2009\tab Google One Million\cr
fre_2019\tab French 2019\cr
fre_2012\tab French 2012\cr
fre_2009\tab French 2009\cr
ger_2019\tab German 2019\cr
ger_2012\tab German 2012\cr
ger_2009\tab German 2009\cr
heb_2019\tab Hebrew 2019\cr
heb_2012\tab Hebrew 2012\cr
heb_2009\tab Hebrew 2009\cr
spa_2019\tab Spanish 2019\cr
spa_2012\tab Spanish 2012\cr
spa_2009\tab Spanish 2009\cr
rus_2019\tab Russian 2019\cr
rus_2012\tab Russian 2012\cr
rus_2009\tab Russian 2009\cr
ita_2019\tab Italian 2019\cr
ita_2012\tab Italian 2012\cr
}

The Google One Million corpus (\code{eng_1m_2009}) is a sub-collection of
Google Books. All of its books are in English, with dates ranging from
1500 to 2008. No more than about 6,000 books were chosen from any one year,
which means that all of the scanned books from early years are present,
while books from later years are randomly sampled. The random samples
reflect the subject distributions for the year (so there are more
computer books in 2000 than in 1980).

See \url{http://books.google.com/ngrams/info} for the full Ngram syntax.
}
\examples{
\donttest{ngram(c("mouse", "rat"), year_start = 1950)
ngram(c("blue_ADJ", "red_ADJ"))
ngram(c("_START_ President Roosevelt", "_START_ President Truman"), year_start = 1920)
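# Illustrative sketch only: the calls below combine arguments documented
# above and are not taken from the package authors. They query the live
# Ngram Viewer, so results depend on network access and the current data.
# Restrict the search to one corpus from the table in Details.
ngram(c("mouse", "rat"), corpus = "eng_fiction_2019", year_start = 1950)
# Specify the corpus directly in the ngram string instead.
ngram("dog:eng_fiction_2012", year_start = 1950)
# Case-insensitive search, with the matching variants summed into one series.
ngram("mouse", case_ins = TRUE, aggregate = TRUE, year_start = 1950)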
}
}
