Type: | Package |
Title: | Handle Bitfields to Record Meta Data |
Version: | 0.6.1 |
Description: | Record algorithmic and analytic meta data along a workflow to store that in a bitfield, which can be published alongside any (modelled) data products. |
URL: | https://github.com/bitfloat/bitfield, https://bitfloat.github.io/bitfield/ |
BugReports: | https://github.com/bitfloat/bitfield/issues |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | base64enc, checkmate, codetools, crayon, dplyr, gh, gitcreds, glue, httr, methods, purrr, rlang, stringr, terra, tibble, tidyr, tidyselect, yaml |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
Depends: | R (≥ 4.1.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-04-29 11:57:55 UTC; steff |
Author: | Steffen Ehrmann |
Maintainer: | Steffen Ehrmann <steffen.ehrmann@posteo.de> |
Repository: | CRAN |
Date/Publication: | 2025-05-01 10:30:08 UTC |
bitfield: Handle Bitfields to record Meta Data
Description
The bitfield package provides tools to record analytic and algorithmic meta data or just any ordinary values to store in a bitfield. A bitfield can accompany any (modelled) dataset and can give insight into data quality, provenance, and intermediate values, or can be used to store various output values per observation in a highly compressed form.
Details
The general workflow consists of defining a registry with
bf_registry
, mapping tests to bit-flags with
bf_map
, to encode this with bf_encode
into an
integer value that can be stored and published, or decoded (with
bf_decode
) and re-used in a downstream application. Additional
bit-flag protocols can be defined (with bf_protocol
) and shared
as standard with the community via bf_standards
.
Author(s)
Maintainer, Author: Steffen Ehrmann steffen.ehrmann@posteo.de
See Also
Github project: https://github.com/bitfloat/bitfield
Report bugs: https://github.com/bitfloat/bitfield/issues
Identify packages to custom functions
Description
Identify packages to custom functions
Usage
.getDependencies(fun)
Arguments
fun |
|
Value
vector of packages that are required to run the function.
Determine encoding
Description
Determine encoding
Usage
.makeEncoding(var, type, ...)
Arguments
var |
the variable for which to determine encoding. |
type |
the encoding type for which to determine encoding. |
... |
|
Details
Floating point values are encoded with three fields that can be readily stored as bit sequence. Any numeric value can be represented in scientific notation, for example, the decimal 923.52 can be represented as 9.2352 * 10^2. These decimal values can be transformed to binary values, which can then likewise be represented in scientific notation. Here, the 10 is replaced by a 2 (because we go from decimal to binary), for example the binary value 101011.101 can be represented as 1.01011101 * 2^5. This scientific notation can now be broken down into the three previously mentioned fields, one for the sign (positive or negative), one for the exponent and one for the remaining part, the mantissa (or significand). For background information on how these fields are processed, study for instance 'Floating Point' by Thomas Finley and check out https://float.exposed/ to play around with floating point encoding. Depending on the encoding needs, these three values can be adapted, for example increase the exponent to provide a wider range (i.e., smaller small and larger large values) or increase the mantissa to provide more precision (i.e., more decimal digits). In the scope of this package, these three values are documented with a tag of the form [x.y.z], with x = number of sign bits (either 0 or 1), y = number of exponent bits, and z number of mantissa bits.
When handling values that are not numeric, this package makes use of the same system, only that sign and exponent are set to 0, while the mantissa bits are set to either 1 (for binary responses [0.0.1]), or to whatever number of cases are required (i.e., for 8 cases with 3 required bits, resulting in the tag [0.0.3]).
Possible options (...
) of this function are
-
precision
: switch that determines the configuration of the floating point encoding. Possible values are"half"
[1.5.10],"bfloat16"
[1.8.7],"tensor19"
[1.8.10],"fp24"
[1.7.16],"pxr24"
[1.8.15],"single"
[1.8.23] and"double"
[1.11.52], -
fields
: list of custom values that control how many bits are allocated tosign
,exponent
andmantissa
for encoding the numeric values, -
range
: the ratio between the smallest and largest possible value to be reliably represented (modifies the exponent), -
decimals
: the number of decimal digits that should be represented reliably (modifies the mantissa).
Value
list of the encoding values for sign, exponent and mantissa, and an additional provenance term.
Extract values and metadata from terra::SpatRaster
Description
Extract values and metadata from terra::SpatRaster
Usage
.rast(x)
Arguments
x |
SpatRaster(1) |
Details
This function simply extracts the values from x
and appends
the raster metadata as attributes.
Value
the function that extracts values and metadata.
Make a binary value from an integer
Description
Make a binary value from an integer
Usage
.toBin(x, len = NULL, pad = TRUE, dec = FALSE)
Arguments
x |
|
len |
|
pad |
|
dec |
|
Make an integer from a binary value
Description
Make an integer from a binary value
Usage
.toDec(x)
Arguments
x |
|
Determine and write MD5 sum
Description
Determine and write MD5 sum
Usage
.updateMD5(x)
Arguments
x |
|
Details
This function follows this algorithm:
set the current MD5 checksum to NA_character_,
write the registry into the temporary directory,
calculate the checksum of this file and finally
store the checksum in the md5 slot of the registry.
This means that when comparing the MD5 checksum in this slot, one first has to set that value also to NA_character_, otherwise the two values won't coincide.
Value
this function is called for its side-effect of storing the MD5 checksum in the md5 slot of the registry.
Validate a bit-flag protocol
Description
Validate a bit-flag protocol
Usage
.validateProtocol(protocol)
Arguments
protocol |
the protocol to validate |
Value
the validated protocol
Validate a github token
Description
This function checks whether the user-provided token is valid for use with this package.
Usage
.validateToken(token)
Arguments
token |
|
Value
the validated user token
Decode (unpack) a bitfield
Description
This function takes an integer bitfield and the registry used to build it upstream to decode it into bit representation and thereby unpack the data stored in the bitfield.
Usage
bf_decode(x, registry, flags = NULL, sep = NULL, verbose = TRUE)
Arguments
x |
|
registry |
|
flags |
|
sep |
|
verbose |
|
Value
data.frame with the binary values of flags in the registry in columns.
Examples
# build registry
reg <- bf_map(protocol = "na", data = bf_tbl, x = commodity)
reg <- bf_map(protocol = "matches", data = bf_tbl, x = commodity, set = c("soybean", "maize"),
registry = reg)
reg
# encode the flags into a bitfield
field <- bf_encode(registry = reg)
field
# decode (somewhere downstream)
flags <- bf_decode(x = field, registry = reg, sep = "-")
flags
# more reader friendly
cbind(bf_tbl, bf_decode(x = field, registry = reg, verbose = FALSE))
Encode bit flags into a bitfield
Description
This function picks up the flags mentioned in a registry and encodes them as integer values.
Usage
bf_encode(registry)
Arguments
registry |
|
Value
data.frame of the same length as the input data. Depending on type and amount of bit flags, this can a table with any number of columns, each of which encodes a sequence of 32 bits into an integer.
Examples
reg <- bf_map(protocol = "na", data = bf_tbl, x = y)
field <- bf_encode(registry = reg)
Build a bit flag
Description
This function maps values from a dataset into bit flags that can be encoded into a bitfield.
Usage
bf_map(
protocol,
data,
...,
name = NULL,
pos = NULL,
na.val = NULL,
description = NULL,
registry = NULL
)
Arguments
protocol |
|
data |
the object to build bit flags for. |
... |
the protocol-specific arguments for building a bit flag, see Details. |
name |
|
pos |
|
na.val |
value, of the same encoding type as the flag, that needs to be
given, if the test for this flag results in |
description |
|
registry |
|
Details
protocol
can either be the name of an internal item
bf_pcl
, a newly built local protocol or one that has been
imported from the bitfield community standards repo on github. Any
protocol
has specific arguments, typically at least the name of the
column containing the variable values (x
). To make this function as
general as possible, all of these arguments are specified via the
...
argument of bf_map
. Internal
protocols are:
-
na
(x): test whether a variable containsNA
-values (boolean). -
nan
(x): test whether a variable containsNaN
-values (boolean). -
inf
(x): test whether a variable containsInf
-values (boolean). -
identical
(x, y): element-wise test whether values are identical across two variables (boolean). -
range
(x, min, max): test whether the values are within a given range (boolean). -
matches
(x, set): test whether the values match a given set (boolean). -
grepl
(x, pattern): test whether the values match a given pattern (boolean). -
case
(...): test whether values are part of given cases (enumeration). -
nChar
(x): count the number of characters of the values (unsigned integer). -
nInt
(x): count the number of integer digits of the values (unsigned integer). -
nDec
(x): count the decimal digits of the variable values (unsigned integer). -
integer
(x, ...): encode the integer values as bit-sequence (signed integer). -
numeric
(x, ...): encode the numeric value as floating-point bit-sequence (with an adapted precision) (floating-point).
Value
an (updated) object of class 'registry' with the additional flag defined here.
Notes
The console output of various classes (such as tibble) shows
decimals that are not present or rounds decimals that are present, even for
ordinary numeric vectors. R stores numeric values internally as
double-precision floating-point values (with 64 bits, where 52 bits encode
the mantissa), which corresponds to a decimal precision of ~16 digits
(log10(2^52)
). Hence, if a bit flag doesn't seem to coincide with
the values you see in the console, double check the values with
sprintf("%16f", values)
. If you use a larger decimal precision,
you'll see more digits, but those are not meaningful, as they result merely
from the binary-to-decimal conversion (check out
.makeEncoding
for additional information.
When testing for cases, they are evaluate in the order they have been
defined in. If an observation is part of two cases, it will thus have the
value of the last case it matches. The encoding type of cases is given as
enumeration, which means that the values can be either
integer
or factor
. Both are handled as if they were integers
internally, so even though an enumeration data type could in principle also
be a character
, this is possible within the scope of this package.
Bitflag protocols that extend the case protocol must thus always
result in integer values.
Examples
opr <- "identical"
# identify which arguments need to be given to call a test ...
formalArgs(bf_pcl[[opr]]$test)
# put the test together
bf_map(protocol = opr, data = bf_tbl, x = x, y = y, na.val = FALSE)
# some other examples of ...
# boolean encoding
bf_map(protocol = "matches", data = bf_tbl, x = commodity, set = c("soybean", "honey"))
bf_map(protocol = "range", data = bf_tbl, x = yield, min = 10.4, max = 11)
# enumeration encoding
bf_map(protocol = "case", data = bf_tbl,
yield >= 11, yield < 11 & yield > 9, yield < 9 & commodity == "maize")
# integer encoding
bf_map(protocol = "integer", data = bf_tbl, x = as.integer(year), na.val = 0L)
# floating-point encoding
bf_map(protocol = "numeric", data = bf_tbl, x = yield, decimals = 2)
Internal bit-flag protocols
Description
Internal bit-flag protocols
Usage
bf_pcl
Format
a list containing bit-flag protocols for the internal tests. Each
protocol is a list itself with the fields "name"
, "version"
,
"extends"
, "extends_note"
, "description"
,
"encoding_type"
, "bits"
, "requires"
, "test"
,
"data"
and "reference"
. For information on how they were set
up and how you can set up additional protocols, go to
bf_protocol
.
Define a new bit-flag protocol
Description
Define a new bit-flag protocol
Usage
bf_protocol(
name,
description,
test,
example,
type,
bits = NULL,
version = NULL,
extends = NULL,
note = NULL,
author = NULL
)
Arguments
name |
|
description |
|
test |
|
example |
|
type |
|
bits |
|
version |
|
extends |
|
note |
|
author |
|
Value
list containing bit-flag protocol
Examples
newFlag <- bf_protocol(name = "na",
description = "{x} contains NA-values{result}.",
test = function(x) is.na(x = x),
example = list(x = bf_tbl$commodity),
type = "bool")
Initiate a new registry
Description
Initiate a new registry
Usage
bf_registry(name = NULL, description = NULL)
Arguments
name |
|
description |
|
Value
an empty registry that captures some metadata of the bitfield, but doesn't contain any flags yet.
Examples
reg <- bf_registry(name = "currentWorkflow",
description = "this is to document my current workflow so
that I can share it with my colleagues
alongside a publication.")
Handle community standard protocols
Description
This function allows the user to list, pull or push bit-flag protocols to the bitfloat/standards repository on github
Usage
bf_standards(
protocol = NULL,
remote = NULL,
action = "list",
version = "latest",
change = NULL,
token = NULL
)
Arguments
protocol |
|
remote |
|
action |
|
version |
|
change |
|
token |
|
Details
Create a Personal Access Token in your github developer settings (or
by running usethis::create_github_token()
) and store it with
gitcreds::gitcreds_set()
. The token must have the scope 'repo' so
you can authenticate yourself to pull or push community standards, and will
only be accessible to your personal R session.
Value
description
Examples
## Not run:
# list all currently available standards
bf_standards()
## End(Not run)
Example table
Description
A 10 × 5 tibble with a range of example data to showcase functionality of this package.
Usage
bf_tbl
Format
object of class tibble
has two columns that indicate
coordinates, one column that indicates a crop that is grown there, one
column that indicates the yield of that crop there and one column that
indicates the year of harvest. All columns contain some sort of deviation
that may occur in data.
Bit registry class (S4) and methods
Description
A registry
stores metadata and flag configuration of a bitfield.
Slots
width
integerish(1)
how many bits is the bitfield wide.length
integerish(1)
how many observations are encoded in the bitfield.name
character(1)
short name of the bitfield.version
character(1)
automatically created version tag of the bitfield. This consists of the package version, the version of R and the date of creation of the bitfield.md5
character(1)
the MD5 checksum of the bitfield as determined with md5sum.description
character(1)
longer description of the bitfield.flags
list(.)
list of flags in the registry.
Print registry in the console
Description
Print registry in the console
Usage
## S4 method for signature 'registry'
show(object)
Arguments
object |
|
Details
This method produces an overview of the registry by printing a
header with information about the setup of the bitfield and a table with
one line for each flag in the bitfield. The table shows the start position
of each flag, the encoding type (see .makeEncoding
), the
bitfield operator type and the columns that are tested by the flag.