% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tbl_store.R
\name{tbl_store}
\alias{tbl_store}
\title{Define a store of tables with table-prep formulas: a table store}
\usage{
tbl_store(..., .list = list2(...), .init = NULL)
}
\arguments{
\item{...}{Expressions that contain table-prep formulas and table names for
data retrieval. Two-sided formulas (e.g, \verb{<LHS> ~ <RHS>}) are to be used,
where the left-hand side is an identifier and the right-hand contains a
statement that obtains a table (i.e., the table-prep formula). If the LHS
is omitted then an identifier will be generated for you.}

\item{.list}{Allows for the use of a list as an input alternative to \code{...}.}

\item{.init}{We can optionally provide an initialization statement (in a
one-sided formula) that should be executed whenever \emph{any} of tables in the
table store are obtained. This is useful, for instance, for including a
\code{library()} call that can be executed before any table-prep formulas in
\code{...}.}
}
\value{
A \code{tbl_store} object that contains table-prep formulas.
}
\description{
It can be useful to set up all the data sources you need and just draw from
them when necessary. This upfront configuration with \code{tbl_store()} lets us
define the methods for obtaining tabular data from mixed sources (e.g.,
database tables, tables generated from flat files, etc.) and provide
identifiers for these data preparation procedures.

What results from this work is a convenient way to materialize tables with
\code{\link[=tbl_get]{tbl_get()}}. We can also get any table-prep formula from the table store
with \code{\link[=tbl_source]{tbl_source()}}. The content of a table-prep formulas can involve reading
a table from a location, or, it can involve data transformation. One can
imagine scenarios where we might (1) procure several mutated variations of
the same source table, (2) generate a table using disparate data sources, or
(3) filter the rows of a database table according to the system time. Another
nice aspect of organizing table-prep formulas in a single object is supplying
it to the \code{tbl} argument of \code{\link[=create_agent]{create_agent()}} or \code{\link[=create_informant]{create_informant()}} via \code{$}
notation (e.g, \verb{create_agent(tbl = <tbl_store>$<name>)}) or with
\code{\link[=tbl_source]{tbl_source()}} (e.g.,
\verb{create_agent(tbl = ~ tbl_source("<name>", <tbl_store>))}).
}
\section{YAML}{


A \strong{pointblank} table store can be written to YAML with \code{\link[=yaml_write]{yaml_write()}} and
the resulting YAML can be used in several ways. The ideal scenario is to have
pointblank agents and informants also in YAML form. This way the agent and
informant can refer to the table store YAML (via \code{\link[=tbl_source]{tbl_source()}}), and, the
processing of both agents and informants can be performed with
\code{\link[=yaml_agent_interrogate]{yaml_agent_interrogate()}} and \code{\link[=yaml_informant_incorporate]{yaml_informant_incorporate()}}. With the
following R code, a table store with two table-prep formulas is generated and
written to YAML (if no filename is given then the YAML is written to
\code{"tbl_store.yml"}).

R statement for generating the \code{"tbl_store.yml"} file:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{tbl_store(
  tbl_duckdb ~ db_tbl(small_table, dbname = ":memory:", dbtype = "duckdb"),
  sml_table_high ~ small_table \%>\% dplyr::filter(f == "high"),
  .init = ~ library(tidyverse)
) \%>\%
  yaml_write()
}\if{html}{\out{</div>}}

YAML representation (\code{"tbl_store.yml"}):

\if{html}{\out{<div class="sourceCode yaml">}}\preformatted{type: tbl_store
tbls:
  tbl_duckdb: ~ db_tbl(small_table, dbname = ":memory:", dbtype = "duckdb")
  sml_table_high: ~ small_table \%>\% dplyr::filter(f == "high")
init: ~library(tidyverse)
}\if{html}{\out{</div>}}

This is useful when you want to get fresh pulls of prepared data from a
source materialized in an R session (with the \code{\link[=tbl_get]{tbl_get()}} function. For
example, the \code{sml_table_high} table can be obtained by using
\code{tbl_get("sml_table_high", "tbl_store.yml")}. To get an agent to check this
prepared data periodically, then the following example with \code{\link[=tbl_source]{tbl_source()}}
will be useful:

R code to generate agent that checks \code{sml_table_high} and writing the agent
to YAML:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{create_agent(
  tbl = ~ tbl_source("sml_table_high", "tbl_store.yml"),
  label = "An example that uses a table store.",
  actions = action_levels(warn_at = 0.10)
) \%>\%
  col_exists(c(date, date_time)) \%>\%
  write_yaml()
}\if{html}{\out{</div>}}

The YAML representation (\code{"agent-sml_table_high.yml"}):

\if{html}{\out{<div class="sourceCode yaml">}}\preformatted{tbl: ~ tbl_source("sml_table_high", "tbl_store.yml")
tbl_name: sml_table_high
label: An example that uses a table store.
actions:
  warn_fraction: 0.1
locale: en
steps:
  - col_exists:
    columns: c(date, date_time)
}\if{html}{\out{</div>}}

Now, whenever the \code{sml_table_high} table needs to be validated, it can be
done with \code{\link[=yaml_agent_interrogate]{yaml_agent_interrogate()}} (e.g.,
\code{yaml_agent_interrogate("agent-sml_table_high.yml")}).
}

\section{Examples}{

\subsection{Creating an in-memory table store and adding table-prep formulas}{

The table store provides a way to get the tables we need fairly easily. Think
of an identifier for the table you'd like and then provide the code necessary
to obtain that table. Then repeat as many times as you like!

Here we'll define two tables that can be materialized later: \code{tbl_duckdb} (an
in-memory DuckDB database table with \strong{pointblank}'s \code{small_table} dataset)
and \code{sml_table_high} (a filtered version of \code{tbl_duckdb}):

\if{html}{\out{<div class="sourceCode r">}}\preformatted{store_1 <-
  tbl_store(
    tbl_duckdb ~
      db_tbl(
        pointblank::small_table,
        dbname = ":memory:",
        dbtype = "duckdb"
      ),
    sml_table_high ~
      db_tbl(
        pointblank::small_table,
        dbname = ":memory:",
        dbtype = "duckdb"
      ) \%>\%
      dplyr::filter(f == "high")
  )
}\if{html}{\out{</div>}}

We can see what's in the table store \code{store_1} by printing it out:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{store_1
}\if{html}{\out{</div>}}

\preformatted{## -- The `table_store` table-prep formulas
## 1 tbl_duckdb // ~ db_tbl(pointblank::small_table, dbname = ":memory:",
## dbtype = "duckdb")
## 2 sml_table_high // ~ db_tbl(pointblank::small_table, dbname = ":memory:",
## dbtype = "duckdb") \%>\% dplyr::filter(f == "high")
## ----}

It's good to check that the tables can be obtained without error. We can do
this with the \code{\link[=tbl_get]{tbl_get()}} function. With that function, we need to supply the
given name of the table-prep formula (in quotes) and the table store object.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{tbl_get(tbl = "tbl_duckdb", store = store_1)
}\if{html}{\out{</div>}}

\preformatted{## # Source:   table<pointblank::small_table> [?? x 8]
## # Database: duckdb_connection
##    date_time           date           a b             c      d e     f
##    <dttm>              <date>     <int> <chr>     <dbl>  <dbl> <lgl> <chr>
##  1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3  3423. TRUE  high
##  2 2016-01-04 00:32:00 2016-01-04     3 5-egh-163     8 10000. TRUE  low
##  3 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3  2343. TRUE  high
##  4 2016-01-06 17:23:00 2016-01-06     2 5-jdo-903    NA  3892. FALSE mid
##  5 2016-01-09 12:36:00 2016-01-09     8 3-ldm-038     7   284. TRUE  low
##  6 2016-01-11 06:15:00 2016-01-11     4 2-dhe-923     4  3291. TRUE  mid
##  7 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3   843. TRUE  high
##  8 2016-01-17 11:27:00 2016-01-17     4 5-boe-639     2  1036. FALSE low
##  9 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## 10 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## # … with more rows}



\if{html}{\out{<div class="sourceCode r">}}\preformatted{tbl_get(tbl = "sml_table_high", store = store_1)
}\if{html}{\out{</div>}}

\preformatted{## # Source:   lazy query [?? x 8]
## # Database: duckdb_connection
##   date_time           date           a b             c     d e     f
##   <dttm>              <date>     <int> <chr>     <dbl> <dbl> <lgl> <chr>
## 1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3 3423. TRUE  high
## 2 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3 2343. TRUE  high
## 3 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3  843. TRUE  high
## 4 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9  838. FALSE high
## 5 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9  838. FALSE high
## 6 2016-01-30 11:23:00 2016-01-30     1 3-dka-303    NA 2230. TRUE  high}

We can shorten the \code{tbl_store()} statement with some syntax that
\strong{pointblank} provides. The \code{sml_table_high} table-prep is simply a
transformation of \code{tbl_duckdb}, so, we can use \code{{{ tbl_duckdb }}} in place of
the repeated statement. Additionally, we can provide a \code{library()} call to
the \code{.init} argument of \code{tbl_store()} so that \strong{dplyr} is available (thus
allowing us to use \code{filter(...)} instead of \code{dplyr::filter(...)}). Here is
the revised \code{tbl_store()} call:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{store_2 <-
  tbl_store(
    tbl_duckdb ~
      db_tbl(
        pointblank::small_table,
        dbname = ":memory:",
        dbtype = "duckdb"
      ),
    sml_table_high ~
      \{\{ tbl_duckdb \}\} \%>\%
      filter(f == "high"),
    .init = ~ library(tidyverse)
  )
}\if{html}{\out{</div>}}

Printing the table store \code{store_2} now shows that we used an \code{.init}
statement:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{store_2
}\if{html}{\out{</div>}}

\preformatted{## -- The `table_store` table-prep formulas
## 1 tbl_duckdb // ~ db_tbl(pointblank::small_table, dbname = ":memory:",
## dbtype = "duckdb")
## 2 sml_table_high // ~ \{\{tbl_duckdb\}\} \%>\% filter(f == "high")
## ----
## INIT // ~library(tidyverse)
## ----}

Checking again with \code{\link[=tbl_get]{tbl_get()}} should provide the same tables as before:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{tbl_get(tbl = "tbl_duckdb", store = store_2)
}\if{html}{\out{</div>}}

\preformatted{## # Source:   table<pointblank::small_table> [?? x 8]
## # Database: duckdb_connection
##    date_time           date           a b             c      d e     f
##    <dttm>              <date>     <int> <chr>     <dbl>  <dbl> <lgl> <chr>
##  1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3  3423. TRUE  high
##  2 2016-01-04 00:32:00 2016-01-04     3 5-egh-163     8 10000. TRUE  low
##  3 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3  2343. TRUE  high
##  4 2016-01-06 17:23:00 2016-01-06     2 5-jdo-903    NA  3892. FALSE mid
##  5 2016-01-09 12:36:00 2016-01-09     8 3-ldm-038     7   284. TRUE  low
##  6 2016-01-11 06:15:00 2016-01-11     4 2-dhe-923     4  3291. TRUE  mid
##  7 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3   843. TRUE  high
##  8 2016-01-17 11:27:00 2016-01-17     4 5-boe-639     2  1036. FALSE low
##  9 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## 10 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## # … with more rows}



\if{html}{\out{<div class="sourceCode r">}}\preformatted{tbl_get(tbl = "sml_table_high", store = store_2)
}\if{html}{\out{</div>}}

\preformatted{## # Source:   lazy query [?? x 8]
## # Database: duckdb_connection
##   date_time           date           a b             c     d e     f
##   <dttm>              <date>     <int> <chr>     <dbl> <dbl> <lgl> <chr>
## 1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3 3423. TRUE  high
## 2 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3 2343. TRUE  high
## 3 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3  843. TRUE  high
## 4 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9  838. FALSE high
## 5 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9  838. FALSE high
## 6 2016-01-30 11:23:00 2016-01-30     1 3-dka-303    NA 2230. TRUE  high}
}

\subsection{Using a table store in a data validation workflow}{

Define a \code{tbl_store} object by adding table-prep formulas inside the
\code{\link[=tbl_store]{tbl_store()}} call.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{store_3 <-
  tbl_store(
    small_table_duck ~ db_tbl(
      table = small_table,
      dbname = ":memory:",
      dbtype = "duckdb"
    ),
    ~ db_tbl(
      table = "rna",
      dbname = "pfmegrnargs",
      dbtype = "postgres",
      host = "hh-pgsql-public.ebi.ac.uk",
      port = 5432,
      user = I("reader"),
      password = I("NWDMCE5xdipIjRrp")
    ),
    all_revenue ~ db_tbl(
      table = file_tbl(
        file = from_github(
          file = "sj_all_revenue_large.rds",
          repo = "rich-iannone/intendo",
          subdir = "data-large"
        )
      ),
      dbname = ":memory:",
      dbtype = "duckdb"
    ),
    sml_table ~ pointblank::small_table
  )
}\if{html}{\out{</div>}}

Let's get a summary of what's in the table store \code{store_3} through printing:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{store_3
}\if{html}{\out{</div>}}

\preformatted{## -- The `table_store` table-prep formulas
## 1 small_table_duck // ~ db_tbl(table = small_table, dbname = ":memory:",
## dbtype = "duckdb")
## 2 rna // ~db_tbl(table = "rna", dbname = "pfmegrnargs", dbtype =
## "postgres", host = "hh-pgsql-public.ebi.ac.uk", port = 5432, user =
## I("reader"), password = I("NWDMCE5xdipIjRrp"))
## 3 all_revenue // ~ db_tbl(table = file_tbl(file = from_github(file =
## "sj_all_revenue_large.rds", repo = "rich-iannone/intendo", subdir =
## "data-large")), dbname = ":memory:", dbtype = "duckdb")
## 4 sml_table // ~ pointblank::small_table
## ----}

Once this object is available, you can check that the table of interest is
produced to your specification with the \code{\link[=tbl_get]{tbl_get()}} function.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{tbl_get(
  tbl = "small_table_duck",
  store = store_3
)
}\if{html}{\out{</div>}}

\preformatted{## # Source:   table<small_table> [?? x 8]
## # Database: duckdb_connection
##    date_time           date           a b             c      d e     f
##    <dttm>              <date>     <int> <chr>     <dbl>  <dbl> <lgl> <chr>
##  1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3  3423. TRUE  high
##  2 2016-01-04 00:32:00 2016-01-04     3 5-egh-163     8 10000. TRUE  low
##  3 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3  2343. TRUE  high
##  4 2016-01-06 17:23:00 2016-01-06     2 5-jdo-903    NA  3892. FALSE mid
##  5 2016-01-09 12:36:00 2016-01-09     8 3-ldm-038     7   284. TRUE  low
##  6 2016-01-11 06:15:00 2016-01-11     4 2-dhe-923     4  3291. TRUE  mid
##  7 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3   843. TRUE  high
##  8 2016-01-17 11:27:00 2016-01-17     4 5-boe-639     2  1036. FALSE low
##  9 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## 10 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## # … with more rows}


Another way to get the same table materialized is by using \code{$} to get the
entry of choice for \code{\link[=tbl_get]{tbl_get()}}.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{store_3$small_table_duck \%>\% tbl_get()
}\if{html}{\out{</div>}}

\preformatted{## # Source:   table<small_table> [?? x 8]
## # Database: duckdb_connection
##    date_time           date           a b             c      d e     f
##    <dttm>              <date>     <int> <chr>     <dbl>  <dbl> <lgl> <chr>
##  1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3  3423. TRUE  high
##  2 2016-01-04 00:32:00 2016-01-04     3 5-egh-163     8 10000. TRUE  low
##  3 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3  2343. TRUE  high
##  4 2016-01-06 17:23:00 2016-01-06     2 5-jdo-903    NA  3892. FALSE mid
##  5 2016-01-09 12:36:00 2016-01-09     8 3-ldm-038     7   284. TRUE  low
##  6 2016-01-11 06:15:00 2016-01-11     4 2-dhe-923     4  3291. TRUE  mid
##  7 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3   843. TRUE  high
##  8 2016-01-17 11:27:00 2016-01-17     4 5-boe-639     2  1036. FALSE low
##  9 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## 10 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
## # … with more rows}


Creating an agent is easy when all table-prep formulas are encapsulated in a
\code{tbl_store} object. Use \code{$} notation to pass the appropriate procedure for
reading a table to the \code{tbl} argument.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{agent_1 <-
  create_agent(
    tbl = store_3$small_table_duck
  )
}\if{html}{\out{</div>}}

There are other ways to use the table store to assign a target table to an
agent, like using the \code{\link[=tbl_source]{tbl_source()}} function (which extracts the table-prep
formula from the table store).

\if{html}{\out{<div class="sourceCode r">}}\preformatted{agent_2 <-
  create_agent(
    tbl = ~ tbl_source(
      tbl = "small_table_duck",
      store = store_3
      )
  )
}\if{html}{\out{</div>}}
}

\subsection{Writing a table store to a YAML file}{

The table store can be moved to YAML with \code{yaml_write} and the \code{\link[=tbl_source]{tbl_source()}}
call could then refer to that on-disk table store. Let's do that YAML
conversion.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{yaml_write(store_3)
}\if{html}{\out{</div>}}

The above writes the \code{tbl_store.yml} file (by not providing a \code{filename} this
default filename is chosen).

It can be convenient to read table-prep formulas from a YAML file that's a
table store. To achieve this, we can modify the \code{\link[=tbl_source]{tbl_source()}} statement in
the \code{\link[=create_agent]{create_agent()}} call so that \code{store} refers to the on-disk YAML file.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{agent_3 <-
  create_agent(
    tbl = ~ tbl_source(
      tbl = "small_table_duck",
      store = "tbl_store.yml"
    )
  )
}\if{html}{\out{</div>}}
}
}

\section{Function ID}{

1-8
}

\seealso{
Other Planning and Prep: 
\code{\link{action_levels}()},
\code{\link{create_agent}()},
\code{\link{create_informant}()},
\code{\link{db_tbl}()},
\code{\link{draft_validation}()},
\code{\link{file_tbl}()},
\code{\link{scan_data}()},
\code{\link{tbl_get}()},
\code{\link{tbl_source}()},
\code{\link{validate_rmd}()}
}
\concept{Planning and Prep}
