Title: | Support Many Languages in R |
Version: | 0.1.0 |
Description: | An object model for source text and translations. Find and extract translatable strings. Provide translations and seamlessly retrieve them at runtime. |
License: | MIT + file LICENSE |
URL: | https://transltr.ununoctium.dev |
BugReports: | https://github.com/jeanmathieupotvin/transltr/issues |
Encoding: | UTF-8 |
Language: | en |
RoxygenNote: | 7.3.2 |
Config/testthat/edition: | 3 |
Depends: | R (≥ 4.3) |
Imports: | digest, R6, stringi, utils, yaml |
Suggests: | covr, devtools, lifecycle, microbenchmark, pkgdown, testthat (≥ 3.0.0), usethis, withr |
Collate: | 'aaa.R' 'assert.R' 'class-location.R' 'class-text.R' 'class-translator.R' 'find-source-in-exprs.R' 'find-source.R' 'flat.R' 'hash.R' 'language.R' 'normalize.R' 'serialize.R' 'text-io.R' 'translator-io.R' 'transltr-package.R' 'utils-format-vector.R' 'utils-map.R' 'utils-nullish-op.R' 'utils-stop.R' 'utils-strings.R' 'uuid.R' 'zzz.R' |
NeedsCompilation: | no |
Packaged: | 2025-02-14 16:05:36 UTC; jmp |
Author: | Jean-Mathieu Potvin [aut, cre, cph],
Jérôme Lavoué |
Maintainer: | Jean-Mathieu Potvin <jeanmathieupotvin@ununoctium.dev> |
Repository: | CRAN |
Date/Publication: | 2025-02-14 16:40:02 UTC |
Support Many Languages in R
Description
An object model for source text and translations. Find and extract translatable strings. Provide translations and seamlessly retrieve them at runtime.
Introduction
R relies on GNU gettext to produce multi-lingual messages (if Native Language Support is enabled). This is well-designed software offering an extensive set of functionalities. It is ubiquitous and has withstood the test of time. It is not the objective of transltr to (fully) replace it.
Package transltr provides an alternative in-memory object model (and further functions) to easily inspect and manipulate source text and translations.
It does not change any aspect of the underlying locale.
It has its own data serialization formats for I/O purposes. Source text and translations can be exported to text formats that are sharable and easily modifiable, even by non-technical collaborators.
Its features are extensively documented (even internal ones).
It can always locate and extract translatable strings (no matter where they are in the source code).
Translatable source text is treated as a regular R object.
Getting Started
Write code as you normally would. Whenever a piece of text (literal character vectors) should be available in multiple languages, pass it to method Translator$translate(). You may also use your own function.
- Once you are ready to translate your project, call find_source(). This returns a Translator object.
- Export the Translator object with translator_write(). Fill in the underlying translation files.
- Import translations back into an R session with translator_read().
Current language and source language are respectively set with language_set() and language_source_set(). By default, the latter is set equal to "en" (English).
Bugs and Feedback
You may submit bugs, request features, and provide feedback by creating an issue on GitHub.
Acknowledgements
Warm thanks to Jérôme Lavoué, who supported and sponsored the first release of this project.
Author(s)
Maintainer: Jean-Mathieu Potvin jeanmathieupotvin@ununoctium.dev [copyright holder]
Other contributors:
Jérôme Lavoué jerome.lavoue@umontreal.ca (ORCID) [contributor, funder, reviewer]
See Also
The scattered and incomplete documentation of R's Native Language Support:
- tools::xgettext2pot(), tools::update_pkg_po(), tools::checkPoFiles(),
- Section 3 (Internationalization) of R Internals,
- Section 7 (Internationalization and Localization) of R Installation and Administration, and
- Section 1.8 (Internationalization) of Writing R Extensions.
The comprehensive technical documentation of GNU gettext.
Hashing Algorithms
Description
These algorithms map a character string to another character string of hexadecimal characters highly likely to be unique. The latter is used to uniquely identify a source text (and the underlying source language).
Usage
algorithms()
Details
Secure Hash Algorithm 1
Method sha1 corresponds to SHA-1 (Secure Hash Algorithm version 1), a cryptographic hashing function. While it is now superseded by more secure variants (SHA-256, SHA-512, etc.), it is still useful for non-sensitive purposes. It is fast, accidental collisions are extremely unlikely, and it can handle very large inputs. It emits strings of 40 hexadecimal characters.
Cumulative UTF-8 Sum
This method is experimental. Use with caution.
Method utf8 is a simple method derived from cumulative sums of UTF-8 code points (converted to integers). It is slightly faster than method sha1 for small inputs and emits hashes with a width proportional to the underlying input's length. It is used for testing purposes internally.
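As a rough illustration of the idea behind method utf8, the sketch below builds a hash-like string from cumulative sums of UTF-8 code points using base R only. Function utf8_cumsum_sketch() is hypothetical and is not the package's implementation. Rendering each partial sum in hexadecimal is an assumption made for this sketch.

```r
# Hypothetical helper (not part of transltr): a hash-like string
# derived from cumulative sums of UTF-8 code points.
utf8_cumsum_sketch <- function(x = "") {
  code_points <- utf8ToInt(enc2utf8(x))  # UTF-8 code points as integers
  sums <- cumsum(code_points)            # running totals

  # Assumption: each partial sum is rendered in hexadecimal, so the
  # width of the output grows with the length of the input.
  paste(sprintf("%x", sums), collapse = "")
}

utf8_cumsum_sketch("ab")  # "61c3" (97 is 0x61, 97 + 98 = 195 is 0xc3)
```

Unlike SHA-1, such a scheme has no fixed width and offers no cryptographic guarantee, which is consistent with restricting it to testing purposes.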
Find Source Text
Description
Find and extract source text that must be translated.
Usage
find_source(
path = ".",
encoding = "UTF-8",
verbose = getOption("transltr.verbose", TRUE),
tr = translator(),
interface = NULL
)
find_source_in_files(
paths = character(),
encoding = "UTF-8",
verbose = getOption("transltr.verbose", TRUE),
algorithm = algorithms(),
interface = NULL
)
Arguments
path |
A non-empty and non-NA character string. A path to a directory
containing R source scripts. All subdirectories are searched. Files that
do not have a |
encoding |
A non-empty and non-NA character string. The source character encoding. In almost all cases, this should be UTF-8. Other encodings are internally re-encoded to UTF-8 for portability. |
verbose |
A non-NA logical value. Should progress information be reported? |
tr |
A |
interface |
A |
paths |
A character vector of non-empty and non-NA values. A set of paths to R source scripts that must be searched. |
algorithm |
A non-empty and non-NA character string equal to |
Details
find_source() and find_source_in_files() look for calls to method Translator$translate() in R scripts and convert them to Text objects. The former further sets these resulting objects into a Translator object. See argument tr.
find_source() and find_source_in_files() work on a purely lexical basis. The source code is parsed but never evaluated (aside from extracted literal character vectors).
- The underlying Translator object is never evaluated and does not need to exist (placeholders may be used in the source code).
- Only literal character vectors can be passed to arguments of method Translator$translate().
Interfaces
In some cases, it may not be desirable to call method Translator$translate() directly. A custom function wrapping (interfacing) this method may always be used as long as it has the same signature as method Translator$translate(). In other words, it must minimally have two formal arguments: ... and source_lang.
Custom interfaces must be passed to find_source() and find_source_in_files() for extraction purposes. Since these functions work on a lexical basis, interfaces can be placeholders in the source code (non-existent bindings) at the time these functions are called. However, they must be bound to a function (ultimately) calling Translator$translate() at runtime.
Custom interfaces are passed to find_source() and find_source_in_files() as name or call objects in a variety of ways. The most straightforward way is to use base::quote(). See Examples below.
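The following base R sketch shows a few ways of building the name and call objects that reference an interface, together with a minimal wrapper. Function translate(), the binding tr, and the default value "en" are hypothetical and only serve the illustration. The wrapper is defined but never called here, so no Translator object is required to run the sketch.

```r
# Two equivalent ways to reference a hypothetical translate() function.
interface_1 <- quote(translate)
interface_2 <- as.name("translate")

# Two equivalent ways to reference a namespaced transltr::translate().
interface_3 <- quote(transltr::translate)
interface_4 <- call("::", as.name("transltr"), as.name("translate"))

identical(interface_1, interface_2)  # TRUE
identical(interface_3, interface_4)  # TRUE

# A hypothetical wrapper exposing the two required formal arguments.
# Assumption: `tr` is bound to a Translator object at runtime.
translate <- function(..., source_lang = "en") {
  tr$translate(..., source_lang = source_lang)
}
```

Because extraction is lexical, translate and transltr::translate are treated as two distinct interfaces even when they resolve to the same function.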
Methodology
find_source() and find_source_in_files() go through these steps to extract source text from a single R script.
1. It is read with text_read() and re-encoded to UTF-8 if necessary.
2. It is parsed with parse() and underlying tokens are extracted from parsed expressions with utils::getParseData().
3. Each expression (expr) token is converted to language objects with str2lang(). Parsing errors and invalid expressions are silently skipped.
4. Valid call objects stemming from step 3 are filtered with is_source().
5. Calls to method Translator$translate() or to interface stemming from step 4 are coerced to Text objects with as_text().
These steps are repeated for each R script. find_source() further merges all resulting Text objects into a coherent set with merge_texts() (identical source code is merged into single Text entities).
Extracted character vectors are always normalized for consistency (at step 5). See normalize() for more information.
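Steps 2 to 4 can be approximated with base R only. The sketch below is not the package's code: is_translate_call() is a hypothetical and much simpler stand-in for is_source(), and the source code is supplied as a character vector rather than read from a file.

```r
# Source code to inspect. It is parsed, never evaluated.
code <- c("tr$translate('Hello, world!')", "x <- 1 + 1")

# Step 2: parse and extract tokens (getParseData() needs keep.source).
parsed <- parse(text = code, keep.source = TRUE)
tokens <- utils::getParseData(parsed, includeText = TRUE)

# Step 3: convert expr tokens to language objects, skipping failures.
exprs <- tokens[tokens$token == "expr", ]
objects <- lapply(exprs$text, function(txt) {
  tryCatch(str2lang(txt), error = function(e) NULL)
})

# Step 4: hypothetical filter keeping calls of the form
# <anything>$translate(...).
is_translate_call <- function(x) {
  is.call(x) &&
    is.call(x[[1L]]) &&
    identical(x[[1L]][[1L]], as.name("$")) &&
    identical(x[[1L]][[3L]], as.name("translate"))
}

found <- Filter(is_translate_call, objects)
found[[1L]][[2L]]  # "Hello, world!"
```

Note that tr does not exist in the session running this sketch, which is the point of working on a lexical basis.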
Limitations
The current version of transltr can only handle literal character vectors. This means it cannot resolve non-trivial expressions that depend on a state. All values passed to argument ... of method Translator$translate() must yield character vectors (trivially).
Value
find_source() returns an R6 object of class Translator. If an existing Translator object is passed to tr, it is modified in place and returned.
find_source_in_files() returns a list of Text objects. It may contain duplicated elements, depending on the extracted contents.
See Also
Translator, Text, normalize(), translator_read(), translator_write(), base::quote(), base::call(), base::as.name()
Examples
# Create a directory containing dummy R scripts for illustration purposes.
temp_dir <- file.path(tempdir(TRUE), "find-source")
temp_files <- file.path(temp_dir, c("ex-script-1.R", "ex-script-2.R"))
dir.create(temp_dir, showWarnings = FALSE, recursive = TRUE)
cat(
"tr$translate('Hello, world!')",
"tr$translate('Farewell, world!')",
sep = "\n",
file = temp_files[[1L]])
cat(
"tr$translate('Hello, world!')",
"tr$translate('Farewell, world!')",
sep = "\n",
file = temp_files[[2L]])
# Extract calls to method Translator$translate().
find_source(temp_dir)
find_source_in_files(temp_files)
# Use custom functions.
# For illustration purposes, assume the package
# exports a hypothetical translate() function.
cat(
"translate('Hello, world!')",
"transltr::translate('Farewell, world!')",
sep = "\n",
file = temp_files[[1L]])
cat(
"translate('Hello, world!')",
"transltr::translate('Farewell, world!')",
sep = "\n",
file = temp_files[[2L]])
# Extract calls to translate() and transltr::translate().
# Since find_source() and find_source_in_files() work on
# a lexical basis, these are always considered to be two
# distinct functions. They also don't need to exist in the
# R session calling find_source() and find_source_in_files().
find_source(temp_dir, interface = quote(translate))
find_source_in_files(temp_files, interface = quote(transltr::translate))
Find Source Text in Expressions
Description
Find and extract source text that must be translated from a single file or a set of R expr tokens.
Arguments listed below are not explicitly validated for efficiency.
Usage
find_source_in_file(
path = "",
encoding = "UTF-8",
verbose = getOption("transltr.verbose", TRUE),
algorithm = algorithms(),
interface = NULL
)
find_source_in_exprs(
tokens = utils::getParseData(),
path = "",
algorithm = algorithms(),
interface = NULL
)
find_source_exprs(path = "", encoding = "UTF-8")
is_source(x, interface = NULL)
Arguments
path |
A non-empty and non-NA character string. A path to an R source script. |
encoding |
A non-empty and non-NA character string. The source character encoding. In almost all cases, this should be UTF-8. Other encodings are internally re-encoded to UTF-8 for portability. |
verbose |
A non-NA logical value. Should progress information be reported? |
algorithm |
A non-empty and non-NA character string equal to |
interface |
A |
tokens |
A |
x |
Any R object. |
Details
find_source_in_exprs() silently skips parsing errors. See find_source() for more information.
is_source() checks if an object conceptually represents a source text. This can either be
- a call to method Translator$translate() or
- a call to a custom function referenced by interface.
Calls to method Translator$translate() that include ... in their argument(s) are ignored. Such calls are part of the definition of a custom interface and should not be extracted.
Value
find_source_in_file() and find_source_in_exprs() return a list of Text objects. It may contain duplicated elements, depending on the extracted contents.
find_source_exprs() returns a subset of the output of utils::getParseData(). Only expr tokens are returned.
is_source() returns a logical value.
See Also
Text, find_source(), utils::getParseData()
Serialize Objects to Flat Strings
Description
Serialize R objects into textual sequences of unindented (flat) and identifiable sections. These are called FLAT (1.0) objects.
Usage
flat_serialize(x = list(), tag_sep = ": ", tag_empty = "")
flat_deserialize(string = "", tag_sep = ": ")
flat_tag(x = list(), tag_sep = ": ", tag_empty = "")
flat_format(x = list())
flat_example()
Arguments
x |
A list. It can be empty. |
tag_sep |
A non-empty and non-NA character string. The separator to use
when creating tags from names (recursively) extracted from |
tag_empty |
A non-NA character string. The value to use as a substitute for empty names. Positional indices are automatically appended to it to ensure tags are always unique. |
string |
A non-NA character string. It can be empty. Contents to deserialize. |
Details
The Flat format (Flat List As Text, or FLAT) is a minimal textual data serialization format optimized for R list objects. Elements are converted to character strings and organized into unindented sections identified by a tag. Call flat_example() for a valid example.
flat_serialize() serializes x into a FLAT object.
flat_deserialize() is the inverse operation: it converts a FLAT object back into a list. The latter has the same shape as the original one, but
- atomic vectors are not reconstituted (they are deserialized as elements of length 1), and
- all elements are also left as character strings.
The convention is to serialize an empty list to an empty character string.
Internal mechanisms
flat_tag() and flat_format() are called internally by flat_serialize(). Aside from debugging purposes, they should not be called outside of the former.
flat_tag() creates tags from names extracted from x and formats them. Tags may not be unique, depending on x's structure and names.
flat_format() recursively formats the elements of x as part of the serialization process. It
- converts NULL to the "NULL" character string,
- converts other elements to character strings using format() and
- replaces empty lists by a <empty list> constant treated as a placeholder.
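The tagging idea can be illustrated with a short base R sketch. Function make_tags_sketch() is hypothetical and does not reproduce flat_tag(): the exact layout of FLAT tags is not assumed here, only the principle of joining names found along the path to each element with tag_sep, and of substituting empty names by tag_empty followed by a positional index.

```r
# Hypothetical helper (not transltr's flat_tag()): build one tag per
# leaf element from the names found along the path leading to it.
make_tags_sketch <- function(
  x = list(),
  tag_sep = ": ",
  tag_empty = "",
  parent = NULL
) {
  if (!length(x)) {
    return(character())
  }

  nms <- names(x)
  if (is.null(nms)) {
    nms <- rep("", length(x))
  }

  # Substitute empty names: append the positional index to tag_empty.
  is_empty <- !nzchar(nms)
  nms[is_empty] <- paste0(tag_empty, which(is_empty))

  tags <- lapply(seq_along(x), function(i) {
    tag <- paste(c(parent, nms[[i]]), collapse = tag_sep)
    if (is.list(x[[i]]) && length(x[[i]])) {
      # Recurse into non-empty sublists, passing the current tag along.
      make_tags_sketch(x[[i]], tag_sep, tag_empty, tag)
    } else {
      tag
    }
  })

  unlist(tags, use.names = FALSE)
}

make_tags_sketch(list(a = 1, b = list(c = 2, 3)))
# "a" "b: c" "b: 2"
```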
Value
flat_serialize() returns a character string, possibly empty.
flat_deserialize() returns a named list, possibly empty. Its structure depends on the underlying tags.
flat_tag() returns a character vector, possibly empty.
flat_format() returns an unnamed list having the same shape as x. See Details.
flat_example() returns a character string (a serialized example), invisibly. It is used for its side-effect of printing an illustration of the format (with useful information).
Format Vectors
Description
Format atomic vectors, lists, and pairlists.
Usage
format_vector(
x = vector(),
label = NULL,
level = 0L,
indent = 1L,
fill_names = FALSE,
null = "<null>",
empty = "<empty>",
validate = TRUE
)
Arguments
x |
A vector of any atomic mode, a list, or a pairlist. It can be empty and it can contain NA values. |
label |
A |
level |
A non-NA integer value. The current depth, or current nesting level to use for indentation purposes. |
indent |
A non-NA integer value. The number of single space(s) to use
for each |
fill_names |
A non-NA logical value. Should |
null |
A non-empty and non-NA character string. The value to use to
represent |
empty |
A non-empty and non-NA character string. The value to use to
represent empty vectors, excluding |
validate |
A non-NA logical value. Should the arguments be validated before being used? This argument should be left as is. |
Details
format_vector() is an alternative to utils::str() that exposes a much simpler generic formatting interface and yields terser outputs of name/value pairs. Indentation is used for nested values.
format_vector() does not attempt to cover all R objects like utils::str(). Instead, it (merely) focuses on efficiently handling the types used by transltr. It is the low-level workhorse function of format.Translator(), format.Text(), and format.Location().
Value
A character vector, possibly trimmed by str_trim().
See Also
Other utility functions:
stops(), str_to(), vapply_1l()
Hashing
Description
Map an arbitrary character string to a shorter string of hexadecimal characters highly likely to be unique. It typically has a fixed width.
Arguments listed below are not validated for efficiency.
Usage
hash(lang = "", text = "", algorithm = "")
Arguments
lang |
A non-empty and non-NA character string. The underlying language. A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility. |
text |
A non-NA character string. It can be empty. |
algorithm |
A non-empty and non-NA character string equal to |
Details
Hashes generated by hash() uniquely identify the lang and text pair. Values passed to these arguments are concatenated with a colon character for hashing purposes.
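The concatenation step can be sketched as follows. Placing lang before text and hashing the resulting string as is (serialize = FALSE) are assumptions of this sketch. The call to digest::digest() is guarded because it requires package digest, which is listed in the Imports of transltr.

```r
lang <- "en"
text <- "Hello, world!"

# Concatenate both values with a colon character before hashing.
# Assumption: lang comes first.
key <- paste(lang, text, sep = ":")

# Assumption: the string itself is hashed, not its serialized form.
if (requireNamespace("digest", quietly = TRUE)) {
  sha1_hash <- digest::digest(key, algo = "sha1", serialize = FALSE)
  nchar(sha1_hash)  # 40 hexadecimal characters
}
```

Including the language in the hashed key means that the same text written in two different source languages yields two distinct identifiers.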
Value
hash() returns a character string, or NULL if algorithm is not supported.
See Also
Translator, Text, normalize(), algorithms()
Assertions
Description
These functions are a functional implementation of defensive programming.
is_*() functions check whether their argument meets certain criteria. assert_*() functions further throw an error message when at least one criterion is not met.
Arguments listed below are not explicitly validated for efficiency.
Usage
is_int(x, allow_empty = FALSE)
is_chr(x, allow_empty = FALSE)
is_lgl1(x)
is_int1(x)
is_chr1(x, allow_empty_string = FALSE)
is_list(x, allow_empty = FALSE)
is_between(x, min = -Inf, max = Inf)
is_named(x, allow_empty_names = FALSE, allow_na_names = FALSE)
is_match(x, choices = vector(), allow_partial = FALSE)
assert_int(
x,
allow_empty = FALSE,
throw_error = TRUE,
x_name = deparse(substitute(x))
)
assert_chr(
x,
allow_empty = FALSE,
throw_error = TRUE,
x_name = deparse(substitute(x))
)
assert_lgl1(x, throw_error = TRUE, x_name = deparse(substitute(x)))
assert_int1(x, throw_error = TRUE, x_name = deparse(substitute(x)))
assert_chr1(
x,
allow_empty_string = FALSE,
throw_error = TRUE,
x_name = deparse(substitute(x))
)
assert_list(
x,
allow_empty = FALSE,
throw_error = TRUE,
x_name = deparse(substitute(x))
)
assert_between(
x,
min = -Inf,
max = Inf,
throw_error = TRUE,
x_name = deparse(substitute(x))
)
assert_named(
x,
allow_empty_names = FALSE,
allow_na_names = FALSE,
throw_error = TRUE,
x_name = deparse(substitute(x))
)
assert_match(
x,
choices,
allow_partial = FALSE,
quote_values = FALSE,
throw_error = TRUE,
x_name = deparse(substitute(x))
)
assert_arg(x, quote_values = FALSE, throw_error = TRUE)
assert(x, ...)
## Default S3 method:
assert(x, ...)
Arguments
x |
Any R object. |
allow_empty |
A non-NA logical value. Should vectors of length 0 be considered as valid values? |
allow_empty_string |
A non-NA logical value. Should empty character strings be considered as valid values? |
min |
A non-NA numeric lower bound. It can be infinite. |
max |
A non-NA numeric upper bound. It can be infinite. |
allow_empty_names |
A non-NA logical value. Should empty character strings be considered as valid names? This is different from having no names at all. |
allow_na_names |
A non-NA logical value. Should NA values be considered as valid names? |
choices |
A non-empty vector of valid candidates. |
allow_partial |
A non-NA logical value. Should |
throw_error |
A non-NA logical value. Should an error be thrown? If so,
|
x_name |
A non-empty and non-NA character string. The name of |
quote_values |
A non-NA logical value. Passed as is to |
Details
Guard clauses tend to be verbose and recycled many times within a project. This makes it hard to keep error messages consistent over time. assert_*() functions encapsulate common guard clauses into simple semantic functions. This reduces code repetition and the number of required unit tests. See Examples below.
By convention, NA values are always disallowed.
assert_arg() is a partial refactoring of base::match.arg(). It relies on assert_match() internally and does not have an equivalent is_arg() function. It must be called within another function.
assert() is an S3 generic function that covers specific data structures. Classes (and underlying objects) that do not have an assert() method are considered to be valid by default.
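The pattern behind a pair of is_*() and assert_*() functions can be sketched in base R. Functions is_chr1_sketch(), assert_chr1_sketch(), and greet() are hypothetical, and the error message is invented for the illustration. They only mirror the documented signatures and return values.

```r
# Hypothetical predicate: a non-NA character string.
is_chr1_sketch <- function(x, allow_empty_string = FALSE) {
  is.character(x) &&
    length(x) == 1L &&
    !is.na(x) &&
    (allow_empty_string || nzchar(x))
}

# Hypothetical assertion wrapping the predicate in a guard clause.
assert_chr1_sketch <- function(
  x,
  allow_empty_string = FALSE,
  throw_error = TRUE,
  x_name = deparse(substitute(x))
) {
  if (is_chr1_sketch(x, allow_empty_string)) {
    return(character())
  }

  msg <- sprintf("'%s' must be a non-NA character string.", x_name)
  if (throw_error) {
    stop(msg, call. = FALSE)
  }

  msg
}

# The guard clause is reduced to a single semantic call.
greet <- function(name) {
  assert_chr1_sketch(name)
  paste("Hello,", name)
}

greet("R")  # "Hello, R"
```

Returning the message instead of throwing it (throw_error = FALSE) lets a caller accumulate several messages before reporting them together.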
Value
is_int(), is_chr(), is_lgl1(), is_int1(), is_chr1(), is_list(), is_between(), is_named(), and is_match() return a logical value.
assert(), assert_int(), assert_chr(), assert_lgl1(), assert_int1(), assert_chr1(), assert_list(), assert_between(), assert_named(), assert_match(), and assert_arg() return an empty character vector if x meets the underlying criteria and throw an error otherwise. If throw_error is FALSE, the error message is returned as a character vector. Unless otherwise stated, the latter is of length 1 (a character string).
assert.default() always returns an empty character vector.
Get or Set Language
Description
Get or set the current and source languages.
They are registered as environment variables named TRANSLTR_LANGUAGE and TRANSLTR_SOURCE_LANGUAGE.
Usage
language_set(lang = "en")
language_get()
language_source_set(lang = "en")
language_source_get()
Arguments
lang |
A non-empty and non-NA character string. The underlying language. A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility. |
Details
The language and the source language can always be temporarily changed. See argument lang of method Translator$translate() for more information.
The underlying locale is left as is. To change an R session's locale, use Sys.setlocale() or Sys.setLanguage() instead. See below for more information.
Value
language_set() and language_source_set() return NULL, invisibly. They are used for their side-effect of setting environment variables TRANSLTR_LANGUAGE and TRANSLTR_SOURCE_LANGUAGE, respectively.
language_get() returns a character string. It is the current value of environment variable TRANSLTR_LANGUAGE. It is empty if the latter is unset.
language_source_get() returns a character string. It is the current value of environment variable TRANSLTR_SOURCE_LANGUAGE. It returns "en" if the latter is unset.
Locales versus languages
A locale is a set of multiple low-level settings that relate to the user's language and region. The language itself is just one parameter among many others.
Modifying a locale on-the-fly can be considered risky in some situations. It may not be the optimal solution for merely changing textual representations of a program or an application at runtime, as it may introduce unintended changes and induce subtle bugs that are harder to fix.
Moreover, it makes sense for some applications and/or programs such as Shiny applications to decouple the front-end's current language (what users see) from the back-end's locale (what developers see). A UI may be displayed in a certain language while keeping logs and R internal messages, warnings, and errors as is.
Consequently, the language setting of transltr is purposely kept separate from the underlying locale and removes the complexity of having to support many of them. Users can always change both the locale and the language parameter of the package. See Examples.
Note
Environment variables are used because they can be shared among different processes. This matters when using parallel and/or concurrent R sessions. They can further be shared among direct and transitive dependencies (other packages that rely on transltr).
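The mechanism described above can be approximated with base R. The two functions below are hypothetical stand-ins, not the package's code. They only mirror the documented behaviour for the source language: a value stored in environment variable TRANSLTR_SOURCE_LANGUAGE, a default of "en" when it is unset, and NULL used to reset it.

```r
# Hypothetical setter: store the value, or unset it if lang is NULL.
set_source_language_sketch <- function(lang = "en") {
  if (is.null(lang)) {
    Sys.unsetenv("TRANSLTR_SOURCE_LANGUAGE")
  } else {
    Sys.setenv(TRANSLTR_SOURCE_LANGUAGE = lang)
  }

  invisible(NULL)
}

# Hypothetical getter: fall back to "en" if the variable is unset.
get_source_language_sketch <- function() {
  Sys.getenv("TRANSLTR_SOURCE_LANGUAGE", unset = "en")
}

set_source_language_sketch("fr")
get_source_language_sketch()  # "fr"

set_source_language_sketch(NULL)
get_source_language_sketch()  # "en"
```

Child processes started after the variable is set inherit it, which is what makes this approach suitable for parallel workers.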
Examples
# Change the language parameters (globally).
language_source_set("en")
language_set("fr")
language_source_get() ## Outputs "en"
language_get() ## Outputs "fr"
# Change both the language parameter and the locale.
# Note that while users control how languages are named
# for language_set(), they do not for Sys.setLanguage().
language_set("fr")
Sys.setLanguage("fr-CA")
# Reset settings.
language_source_set(NULL)
language_set(NULL)
# Source language has a default value.
language_source_get() ## Outputs "en"
Source Locations
Description
Structure and manipulate source locations. Class Location is a lighter alternative to srcfile() and other related functionalities.
Usage
location(path = tempfile(), line1 = 1L, col1 = 1L, line2 = 1L, col2 = 1L)
is_location(x)
## S3 method for class 'Location'
format(x, ...)
## S3 method for class 'Location'
print(x, ...)
## S3 method for class 'Location'
c(...)
merge_locations(...)
Arguments
path |
A non-empty and non-NA character string. The origin of the ranges. |
line1 , col1 |
A non-empty integer vector of non-NA values. The (inclusive) starting point(s) of what is being referenced. |
line2 , col2 |
A non-empty integer vector of non-NA values. The (inclusive) end(s) of what is being referenced. |
x |
Any R object. |
... |
Usage depends on the underlying function.
|
Details
A Location is a set of one or more line/column ranges referencing contents (like text or source code) within a common origin identified by an underlying path. The latter is generic and can be anything: a file on disk, on a network, a pointer, a binding, etc. What matters is the underlying context.
Location objects may refer to multiple distinct ranges for the same origin. This is why arguments line1, col1, line2 and col2 accept integer vectors (and not only scalar values).
Combining Location Objects
c() can only combine Location objects having the same path. In that case, the underlying ranges are combined into a set of non-duplicated range(s).
merge_locations() is a generalized version of c() that handles any number of Location objects having possibly different paths. It can be viewed as a vectorized version of c().
Value
location() and c() return a named list of length 5 and of S3 class Location containing the values of path, line1, col1, line2, and col2.
is_location() returns a logical value.
format() returns a character vector.
print() returns argument x invisibly.
merge_locations() returns a list of (combined) Location objects.
Examples
# Create Location objects.
loc1 <- location("file-a", 1L, 2L, 3L, 4L)
loc2 <- location("file-a", 5L, 6L, 7L, 8L)
loc3 <- location("file-c", c(9L, 10L), c(11L, 12L), c(13L, 14L), c(15L, 16L))
is_location(loc1) ## TRUE
print(loc1)
print(loc2)
print(loc3)
# Combine Location objects.
# c() throws an error if they do not have the same path.
c(loc1, loc2)
# Location objects with different paths can be merged.
# This groups Location objects according to their paths
# and calls c() on each group. It returns a list.
merge_locations(loc1, loc2, loc3)
# The path of a Location object can be whatever fits the context.
# Below is an example that references text in a character vector
# bound to variable x in the global environment.
x <- "This is a string and it is held in memory for some purpose."
location("<environment: R_GlobalEnv: x>", 1L, 11L, 1L, 16L)
Normalize Text
Description
Construct a standardized string from values passed to ...
Usage
normalize(...)
Arguments
... |
Any number of vectors containing atomic elements. Each vector is normalized as a paragraph.
|
Details
Input text can be written in a variety of ways using single-line and multi-line strings. Values passed to ... are normalized (to ensure their consistency) and collapsed to a single character string using the standard paragraph separator. The latter is defined as two newline characters ("\n\n").
- NA values and empty strings are discarded before reducing ... to a character string.
- Whitespace characters (tabs, newlines, and repeated spaces) are replaced by a single space. Paragraph separators are preserved.
- Leading or trailing whitespaces are stripped.
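These rules can be approximated with a few lines of base R. Function normalize_sketch() is hypothetical and is not the package's implementation. In particular, joining the elements of a single vector with one space is an assumption of this sketch.

```r
# Hypothetical helper (not transltr's normalize()).
normalize_sketch <- function(...) {
  paragraphs <- vapply(list(...), function(x) {
    x <- as.character(x)
    x <- x[!is.na(x) & nzchar(x)]  # discard NA values and empty strings
    x <- paste(x, collapse = " ")  # assumption: join elements with a space
    trimws(gsub("[[:space:]]+", " ", x))  # squeeze and strip whitespace
  }, character(1L))

  # Each vector is a paragraph. Empty paragraphs are dropped.
  paste(paragraphs[nzchar(paragraphs)], collapse = "\n\n")
}

normalize_sketch(
  "  Hello,\n   world!  ",
  c("Second", NA, "", "paragraph."))
# "Hello, world!\n\nSecond paragraph."
```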
Value
A character string, possibly empty.
Source Ranges
Description
Create, parse, and validate source ranges.
Usage
range_format(x = location())
range_parse(ranges = character())
range_is_parseable(ranges = character())
Arguments
x |
A |
ranges |
A character vector of non-NA and non-empty values. The ranges to extract pairs of indices (line, column) from. |
Details
Ranges are Ln <int>, Col <int> @ Ln <int>, Col <int> strings created on-the-fly from Location objects for outputting purposes.
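A round trip through this textual format can be sketched with base R. Functions range_format_sketch() and range_parse_sketch() are hypothetical. Unlike range_format(), the former takes integer vectors directly instead of a Location object so that the sketch stays self-contained.

```r
# Hypothetical helpers (not transltr's range_format() and range_parse()).
range_format_sketch <- function(line1 = 1L, col1 = 1L, line2 = 1L, col2 = 1L) {
  sprintf("Ln %d, Col %d @ Ln %d, Col %d", line1, col1, line2, col2)
}

range_parse_sketch <- function(ranges = character()) {
  pattern <- "^Ln ([0-9]+), Col ([0-9]+) @ Ln ([0-9]+), Col ([0-9]+)$"
  matches <- regmatches(ranges, regexec(pattern, ranges))

  # Drop the full match and keep the 4 captured indices.
  # Invalid ranges yield an empty integer vector.
  lapply(matches, function(m) as.integer(m[-1L]))
}

ranges <- range_format_sketch(c(1L, 5L), c(2L, 6L), c(3L, 7L), c(4L, 8L))
ranges
# "Ln 1, Col 2 @ Ln 3, Col 4" "Ln 5, Col 6 @ Ln 7, Col 8"

range_parse_sketch(ranges)
```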
Value
range_format() returns a character vector. It assumes that x is valid.
range_parse() returns a list having the same length as ranges. Each element is an integer vector containing 4 non-NA values (unless the underlying range is invalid).
range_is_parseable() returns a logical vector having the same length as ranges.
Serialize Objects
Description
Convert Translator objects, Text objects, and Location objects to a YAML object, or vice-versa.
Convert translations contained by a Translator object to a custom textual representation (a FLAT object), or vice-versa.
Usage
serialize(x, ...)
serialize_translations(tr = translator(), lang = "")
deserialize(string = "")
deserialize_translations(string = "", tr = NULL)
export_translations(tr = translator(), lang = "")
export(x, ...)
## S3 method for class 'Translator'
export(x, ...)
## S3 method for class 'Text'
export(x, id = uuid(), set_translations = FALSE, ...)
## S3 method for class 'Location'
export(x, id = uuid(), ...)
## S3 method for class 'ExportedTranslator'
assert(x, throw_error = TRUE, ...)
## S3 method for class 'ExportedText'
assert(x, throw_error = TRUE, ...)
## S3 method for class 'ExportedLocation'
assert(x, throw_error = TRUE, ...)
## S3 method for class 'ExportedTranslations'
assert(x, throw_error = TRUE, ...)
import(x, ...)
## S3 method for class 'ExportedTranslator'
import(x, ...)
## S3 method for class 'ExportedText'
import(x, ...)
## S3 method for class 'ExportedLocation'
import(x, ...)
## S3 method for class 'ExportedTranslations'
import(x, tr = NULL, ...)
## Default S3 method:
import(x, ...)
format_errors(errors = character(), id = uuid(), throw_error = TRUE)
Arguments
x |
Any R object. |
... |
Further arguments passed to, or from other methods. |
tr |
A This argument is |
lang |
A non-empty and non-NA character string. The underlying language. A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility. |
string |
A non-empty and non-NA character string. Contents to deserialize. |
id |
A non-empty and non-NA character string. A unique identifier for the underlying object. It is used for validation purposes. |
set_translations |
A non-NA logical value. Should translations be
included in the resulting |
throw_error |
A non-NA logical value. Should an error be thrown? If so,
|
errors |
A non-empty character vector of non-NA values. Error message(s) describing why object(s) are invalid. |
Details
The information contained within a Translator object is split by default. Unless set_translations is TRUE, translations are serialized independently from other fields. This is useful when creating Translator files and translations files.
While serialize() and serialize_translations() are distinct, they share a common design and perform the same thing, at least conceptually. The same is true for deserialize() and deserialize_translations(). These 4 functions are those that should be used in almost all circumstances.
Serialization
The data serialization process performed by serialize() and serialize_translations() is internally broken down into 2 steps: objects are first exported before being serialized.
export() and export_translations() are preserializing mechanisms that convert objects into transient objects that ease the conversion process. They are never returned to the user: serialize() and serialize_translations() immediately transform them into character strings.
serialize() returns a YAML object. serialize_translations() returns a FLAT object.
Deserialization
The data deserialization process performed by deserialize() and deserialize_translations() is internally broken down into 3 steps: objects are first deserialized, then validated and finally, imported.
deserialize() and deserialize_translations() are raw deserializer mechanisms: string is converted into an R named list that is presumed to be an exported object. deserialize() relies on YAML tags to infer the class of each object.
The contents of the transient objects are thoroughly checked with an assert() method (based on the underlying presumed class). Valid objects are imported back into an appropriate R object with import().
Custom fields and comments added by users to serialized objects are ignored.
Formatting errors
assert() methods accumulate error messages before returning, or throwing them. format_errors() is a helper function that eases this process. It exists to avoid repeating code in each method. There is no reason to call it outside of assert() methods.
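The accumulate-then-report pattern described above can be sketched in base R as follows. This is only an illustration: assert_sketch() and format_errors_sketch() are hypothetical names, the checks are arbitrary, and the layout of the error message is an assumption rather than the package's actual output.

```r
# Hypothetical helper: return accumulated messages, or throw them at once.
format_errors_sketch <- function(
    errors      = character(),
    id          = "<unknown>",
    throw_error = TRUE)
{
    if (length(errors) && throw_error) {
        # Assumed layout: a header line followed by one line per problem.
        msg <- paste0(
            sprintf("in object '%s':\n", id),
            paste0(" - ", errors, collapse = "\n"))
        stop(msg, call. = FALSE)
    }

    return(errors)
}

# Is v a non-empty and non-NA character string?
is_chr1 <- function(v) {
    is.character(v) && length(v) == 1L && !is.na(v) && nzchar(v)
}

# Hypothetical assert method: collect every problem before reporting.
assert_sketch <- function(x = list(), throw_error = TRUE) {
    errors <- character()

    if (!is_chr1(x$lang)) {
        errors <- c(errors, "'lang' must be a non-empty and non-NA character string.")
    }
    if (!is_chr1(x$text)) {
        errors <- c(errors, "'text' must be a non-empty and non-NA character string.")
    }

    format_errors_sketch(errors, id = "example", throw_error = throw_error)
}

# Both problems are reported together instead of stopping at the first one.
errs <- assert_sketch(list(lang = "", text = NA_character_), throw_error = FALSE)
print(errs)
```

Reporting all problems at once spares users from fixing a file one error at a time.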
Value
See other sections for further information.
serialize() and serialize_translations() return a character string.
export() returns a named list of S3 class
- ExportedTranslator if x is a Translator object,
- ExportedText if x is a Text object, or
- ExportedLocation if x is a Location object.
export_translations() returns an ExportedTranslations object.
deserialize() and import() return
- a Translator object if x is a valid ExportedTranslator object,
- a Text object if x is a valid ExportedText object, or
- a Location object if x is a valid ExportedLocation object.
deserialize_translations() and import.ExportedTranslations() return an ExportedTranslations object. They further register imported translations if a Translator object is passed to tr.
- Translations must correspond to an existing source text (a registered Text object). Otherwise, they are skipped.
- The value passed to tr is updated by reference and is not returned.
import.default() is used for its side-effect of throwing an error for unsupported objects.
assert.ExportedTranslator(), assert.ExportedText(), assert.ExportedLocation(), and assert.ExportedTranslations() return a character vector, possibly empty. If throw_error is TRUE, an error is thrown if an object is invalid.
format_errors() returns a character vector, and outputs its contents as an error if throw_error is TRUE.
Exported Objects
An exported object is a named list of S3 class ExportedTranslator, ExportedText, ExportedLocation, or ExportedTranslations. It always has a tag attribute whose value is equal to the super-class of x.
There are four main differences between an object and its exported counterpart.
- Field names are slightly more verbose.
- Source text is treated independently from translations.
- Unset fields are set equal to NULL (a ~ in YAML).
- Each object has an Identifier used to locate errors.
The correspondence between objects is self-explanatory.
- See class Translator for more information on class ExportedTranslator.
- See class Text for more information on class ExportedText.
- See class Location for more information on class ExportedLocation.
You may also explore provided examples below.
The ExportedTranslations Class
ExportedTranslations objects are created from a Translator object with export_translations(). Their purpose is to restructure translations by language. They are different from other exported objects because there is no corresponding Translations class.
An ExportedTranslations object is a named list of S3 class ExportedTranslations containing the following elements.
Identifier
The unique identifier of argument tr. See Translator$id for more information.
Language Code
The value of argument lang.
Language
The translation's language. See Translator$native_languages for more information.
Source Language
The source text's language. See Translator$source_langs for more information.
Translations
A named list containing further named lists. Each sublist contains two values:
Source Text
A non-empty and non-NA character string.
Translation
A non-empty and non-NA character string.
See Text$translations for more information.
Unavailable translations are automatically replaced by a placeholder that depends on whether they are exported or imported.
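To make the layout above more concrete, here is a hand-built list that mirrors the listed elements. It is a sketch only: it is not produced by export_translations(), all values are made up, the tag attribute is omitted, and naming each sublist after the hash of its source text is an assumption.

```r
# Hand-built stand-in mirroring the elements of an ExportedTranslations
# object. Values are illustrative and are not output from the package.
exported <- structure(
    list(
        Identifier        = "test-translator",
        `Language Code`   = "fr",
        Language          = "Français",
        `Source Language` = "English",
        Translations      = list(
            # Assumption: sublists are named after the source text's hash.
            `256e0d7` = list(
                `Source Text` = "Hello, world!",
                Translation   = "Bonjour, monde!"))),
    class = "ExportedTranslations")

# Elements whose names contain spaces must be accessed with [[ or backticks.
exported[["Language Code"]]
exported$Translations[["256e0d7"]][["Translation"]]
```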
Note
Dividing the serialization and deserialization processes into multiple steps helps keep the underlying functions short and easier to test.
See Also
Official YAML 1.1 specification, yaml::as.yaml(), yaml::yaml.load(), flat_serialize(), flat_deserialize(), translator_read(), translator_write(), translations_read(), translations_write()
Throw Errors
Description
stops() is equivalent to stop(..., call. = FALSE). It removes calls from error messages by default. These are rarely useful and confuse users more often than they help them.
stopf() is equivalent to stops(sprintf(fmt, ...)). It wraps base::sprintf() and stops() and is used to construct flexible error messages.
Usage
stops(...)
stopf(fmt = "", ...)
Arguments
... |
Further arguments respectively passed to |
fmt |
A character of length 1 passed as is to |
Value
Nothing. These functions are used for their side-effect of raising an error.
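The two equivalences stated in the Description can be written out directly in base R. The functions below are local re-implementations for illustration (the _sketch suffix marks them as hypothetical), not the package's source code.

```r
# Local re-implementations based on the equivalences stated above.
stops_sketch <- function(...) stop(..., call. = FALSE)
stopf_sketch <- function(fmt = "", ...) stops_sketch(sprintf(fmt, ...))

# Capture the error to inspect it instead of halting.
err <- tryCatch(
    stopf_sketch("'%s' must be a %s.", "lang", "character string"),
    error = function(e) e)

conditionMessage(err)        # The formatted message.
is.null(conditionCall(err))  # No call is attached to the condition.
```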
See Also
Other utility functions: format_vector(), str_to(), vapply_1l()
Character String Utilities
Description
str_to() converts an R object to a character string. It is a slightly more flexible alternative to base::toString().
str_trim() wraps base::strtrim() and further adds a ... suffix to each trimmed element.
str_wrap() wraps base::strwrap() and ensures a character string is returned.
Usage
str_to(x, ...)
## Default S3 method:
str_to(x, quote_values = FALSE, last_sep = ", or ", ...)
str_trim(x = character(), width = 80L)
str_wrap(x = character(), width = 80L)
Arguments
x |
Any R object for |
... |
Further arguments passed to, or from other methods. |
quote_values |
A non-NA logical value. Should elements of |
last_sep |
A non-empty and non-NA character string separating the last and penultimate elements. |
width |
A non-NA integer value. The target width for individual
elements of |
Details
str_to() concatenates all elements with ", ", except for the last one. See argument last_sep.
str_wrap() preserves existing paragraph separators ("\n\n").
Value
str_to() and str_wrap() return a character string.
str_trim() returns a character vector having the same length as x.
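The behavior described for str_to() and str_trim() can be approximated in base R. The functions below are hypothetical sketches: the quoting character used when quote_values is TRUE and the way the ... suffix counts toward width are assumptions, and the package's own implementations may differ.

```r
# Sketch of str_to(): join with ", " and use last_sep before the last element.
str_to_sketch <- function(x, quote_values = FALSE, last_sep = ", or ") {
    x <- as.character(x)
    if (quote_values) {
        x <- sprintf("'%s'", x)  # Assumption: single quotes are used.
    }

    n <- length(x)
    if (n < 2L) {
        return(paste0(x, collapse = ""))
    }

    paste0(paste0(x[-n], collapse = ", "), last_sep, x[[n]])
}

# Sketch of str_trim(): trim long elements and append a "..." suffix.
# Assumption: the suffix counts toward the target width.
str_trim_sketch <- function(x = character(), width = 80L) {
    too_long <- which(nchar(x, type = "width") > width)
    x[too_long] <- paste0(strtrim(x[too_long], width - 3L), "...")
    x
}

str_to_sketch(c("a", "b", "c"))
str_to_sketch(c("sha1", "utf8"), quote_values = TRUE)
str_trim_sketch(c("short", "a much longer character string"), width = 10L)
```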
See Also
Other utility functions: format_vector(), stops(), vapply_1l()
Source Text
Description
Structure source text and its translations.
Usage
text(..., source_lang = language_source_get(), algorithm = algorithms())
is_text(x)
## S3 method for class 'Text'
format(x, ...)
## S3 method for class 'Text'
print(x, ...)
## S3 method for class 'Text'
c(...)
merge_texts(..., algorithm = algorithms())
as_text(x, ...)
## S3 method for class 'call'
as_text(x, loc = location(), algorithm = algorithms(), ...)
Arguments
... |
Usage depends on the underlying function. |
source_lang |
A non-empty and non-NA character string. The language of the source text. A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1. Doing so maximizes portability and cross-compatibility between packages. |
algorithm |
A non-empty and non-NA character string equal to |
x |
Any R object. |
loc |
A |
Details
A Text object is a piece of source text that is extracted from R source scripts.
- It (typically) has one or more Locations within a project.
- It has zero or more translations.
The Text class structures this information and exposes a set of methods to manipulate it.
Combining Text Objects
c() can only combine Text objects having the same hash. This is equivalent to having the same algorithm, source_lang, and source_text. In that case, the underlying translations and Location objects are combined and a new object is returned. It throws an error if all Text objects are empty (they have no set source_lang).
merge_texts() is a generalized version of c() that handles any number of Text objects having possibly different hashes. It can be viewed as a vectorized version of c(). It silently ignores and drops all empty Text objects.
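The grouping idea behind merge_texts() (drop empty objects, group by hash, combine each group) can be illustrated without the package. In the sketch below, plain lists stand in for Text objects, an empty hash stands in for an empty object, and combine_sketch() is a hypothetical helper. It shows the general technique only, not the package's implementation.

```r
# Plain lists stand in for Text objects (hypothetical stand-ins).
txts <- list(
    list(hash = "h1", translations = c(en = "Hello, world!"),    paths = "a"),
    list(hash = "h1", translations = c(fr = "Bonjour, monde!"),  paths = "b"),
    list(hash = "h2", translations = c(en = "Farewell, world!"), paths = "a"),
    list(hash = "",   translations = character(),                paths = character()))

# Combine stand-ins sharing the same hash: translations and paths are
# pooled together without duplicating entries.
combine_sketch <- function(group) {
    translations <- unlist(lapply(group, `[[`, "translations"))
    list(
        hash         = group[[1L]]$hash,
        translations = translations[!duplicated(names(translations))],
        paths        = unique(unlist(lapply(group, `[[`, "paths"))))
}

# Drop empty entries, group by hash, and combine each group.
hashes <- vapply(txts, `[[`, NA_character_, "hash")
keep   <- nzchar(hashes)
merged <- lapply(split(txts[keep], hashes[keep]), combine_sketch)

names(merged)
merged$h1$translations
```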
Coercion
as_text() is an S3 generic function that attempts to coerce its argument into a suitable Text object. as_text.call() is the method used by find_source() to coerce a call object to a Text object. While this method can be called directly, doing so should be avoided most of the time. Users may extend as_text() by defining their own methods.
Value
text(), c(), and as_text() return an R6 object of class Text.
is_text() returns a logical value.
format() returns a character vector.
print() returns argument x invisibly.
merge_texts() returns a list of (combined) Text objects. It can be empty if all underlying Text objects are empty.
Active bindings
hash
A non-empty and non-NA character string. A reproducible hash generated from source_lang and source_text, and by using the algorithm specified by algorithm. It is used as a unique identifier for the underlying Text object.
This is a read-only field. It is automatically updated whenever fields source_lang and/or algorithm are updated.
algorithm
A non-empty and non-NA character string equal to "sha1", or "utf8". The algorithm to use when hashing source information for identification purposes.
source_lang
A non-empty and non-NA character string. The language of the source text.
A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1. Doing so maximizes portability and cross-compatibility between packages.
source_text
A non-empty and non-NA character string. The source text. This is a read-only field.
languages
A character vector. Registered language codes. This is a read-only field. Use methods below to update it.
translations
A named character vector. Registered translations of source_text, including the latter. Names correspond to languages. This is a read-only field. Use methods below to update it.
locations
A list of Location objects giving the location(s) of source_text in the underlying project. It can be empty. This is a read-only field. Use methods below to update it.
Methods
Public methods
Method new()
Create a Text
object.
Usage
Text$new(algorithm = algorithms())
Arguments
algorithm
A non-empty and non-NA character string equal to "sha1", or "utf8". The algorithm to use when hashing source information for identification purposes.
Returns
Examples
# Consider using text() instead.
txt <- Text$new()
Method get_translation()
Extract a translation, or the source text.
Usage
Text$get_translation(lang = "")
Arguments
lang
A non-empty and non-NA character string. The underlying language.
A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility.
Returns
A character string. NULL is returned if the requested translation is not available.
Examples
txt <- Text$new()
txt$set_translation("en", "Hello, world!")
txt$get_translation("en") ## Outputs "Hello, world!"
txt$get_translation("fr") ## Outputs NULL
Method set_translation()
Register a translation, or the source text.
Usage
Text$set_translation(lang = "", text = "")
Arguments
lang
A non-empty and non-NA character string. The underlying language.
A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility.
text
A non-empty and non-NA character string. A translation, or the source text.
Details
This method is also used to register source_lang and source_text before setting them as such. See Examples below.
Returns
A NULL, invisibly.
Examples
# Register a pair of source_lang and source_text.
txt <- Text$new()
txt$set_translation("en", "Hello, world!")
txt$source_lang <- "en"
Method set_translations()
Register one or more translations, and/or the source text.
Usage
Text$set_translations(...)
Arguments
...
Any number of named, non-empty, and non-NA character strings.
Details
This method can be viewed as a vectorized version of method set_translation().
Returns
A NULL, invisibly.
Examples
txt <- Text$new()
txt$set_translations(en = "Hello, world!", fr = "Bonjour, monde!")
Method set_locations()
Register one or more locations.
Usage
Text$set_locations(...)
Arguments
...
Any number of Location objects.
Details
This method calls merge_locations() to merge all values passed to ... together with previously registered Location objects. The underlying registered paths and/or ranges won't be duplicated.
Returns
A NULL, invisibly.
Examples
txt <- Text$new()
txt$set_locations(
location("a", 1L, 2L, 3L, 4L),
location("a", 1L, 2L, 3L, 4L),
location("b", 5L, 6L, 7L, 8L))
Method rm_translation()
Remove a registered translation.
Usage
Text$rm_translation(lang = "")
Arguments
lang
A non-empty and non-NA character string identifying a translation to be removed.
Details
You cannot remove lang when it is registered as the current source_lang. You must update source_lang before doing so.
Returns
A NULL, invisibly.
Examples
txt <- Text$new()
txt$set_translations(en = "Hello, world!", fr = "Bonjour, monde!")
txt$source_lang <- "en"
# Remove source_lang and source_text.
txt$source_lang <- "fr"
txt$rm_translation("en")
Method rm_location()
Remove a registered location.
Usage
Text$rm_location(path = "")
Arguments
path
A non-empty and non-NA character string identifying a Location object to be removed.
Returns
A NULL, invisibly.
Examples
txt <- Text$new()
txt$set_locations(
location("a", 1L, 2L, 3L, 4L),
location("b", 5L, 6L, 7L, 8L))
txt$rm_location("a")
Examples
# Set source language.
language_source_set("en")
# Create Text objects.
txt1 <- text(
location("a", 1L, 2L, 3L, 4L),
location("a", 1L, 2L, 3L, 4L),
location("b", 5L, 6L, 7L, 8L),
location("c", c(9L, 10L), c(11L, 12L), c(13L, 14L), c(15L, 16L)),
en = "Hello, world!",
fr = "Bonjour, monde!",
es = "¡Hola, mundo!")
txt2 <- text(
location("a", 1L, 2L, 3L, 4L),
en = "Hello, world!",
fr = "Bonjour, monde!",
es = "¡Hola, mundo!")
txt3 <- text(
source_lang = "fr2",
location("a", 5L, 6L, 7L, 8L),
en = "Hello, world!",
fr2 = "Bonjour le monde!",
es = "¡Hola, mundo!")
is_text(txt1)
# Text objects have a specific format.
# print() calls format() internally, as expected.
print(txt1)
print(txt2)
print(txt3)
# Combine Text objects.
# c() throws an error if they do not have the same
# hash (same source_text, source_lang, and algorithm).
c(txt1, txt2)
# Text objects with different hashes can be merged.
# This groups Text objects according to their hashes
# and calls c() on each group. It returns a list.
merge_texts(txt1, txt2, txt3)
# Objects can be coerced to a Text object with as_text(). Below is an
# example for call objects. This is for illustration purposes only,
# and the latter should not be used. This method is used internally by
# find_source().
cl <- str2lang("translate('Hello, world!')")
loc <- location("example in class-text", 2L, 32L, 2L, 68L)
as_text(cl, loc)
## ------------------------------------------------
## Method `Text$new`
## ------------------------------------------------
# Consider using text() instead.
txt <- Text$new()
## ------------------------------------------------
## Method `Text$get_translation`
## ------------------------------------------------
txt <- Text$new()
txt$set_translation("en", "Hello, world!")
txt$get_translation("en") ## Outputs "Hello, world!"
txt$get_translation("fr") ## Outputs NULL
## ------------------------------------------------
## Method `Text$set_translation`
## ------------------------------------------------
# Register a pair of source_lang and source_text.
txt <- Text$new()
txt$set_translation("en", "Hello, world!")
txt$source_lang <- "en"
## ------------------------------------------------
## Method `Text$set_translations`
## ------------------------------------------------
txt <- Text$new()
txt$set_translations(en = "Hello, world!", fr = "Bonjour, monde!")
## ------------------------------------------------
## Method `Text$set_locations`
## ------------------------------------------------
txt <- Text$new()
txt$set_locations(
location("a", 1L, 2L, 3L, 4L),
location("a", 1L, 2L, 3L, 4L),
location("b", 5L, 6L, 7L, 8L))
## ------------------------------------------------
## Method `Text$rm_translation`
## ------------------------------------------------
txt <- Text$new()
txt$set_translations(en = "Hello, world!", fr = "Bonjour, monde!")
txt$source_lang <- "en"
# Remove source_lang and source_text.
txt$source_lang <- "fr"
txt$rm_translation("en")
## ------------------------------------------------
## Method `Text$rm_location`
## ------------------------------------------------
txt <- Text$new()
txt$set_locations(
location("a", 1L, 2L, 3L, 4L),
location("b", 5L, 6L, 7L, 8L))
txt$rm_location("a")
Read and Write Text
Description
text_read() and text_write() respectively wrap base::readLines() and base::writeLines(). They further validate their arguments, normalize file paths and re-encode inputs to UTF-8 before reading and writing.
Usage
text_read(path = "", encoding = "UTF-8")
text_write(x = character(), path = "", encoding = "UTF-8")
Arguments
path |
A non-empty and non-NA character string. A path to a file to read text from, or write text to. |
encoding |
A non-empty and non-NA character string. The source character encoding. In almost all cases, this should be UTF-8. Other encodings are internally re-encoded to UTF-8 for portability. |
x |
A character vector. Lines of text to write. Its current encoding is
given by |
Value
text_read() returns a character vector.
text_write() returns NULL, invisibly.
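The wrapping described above (normalize the path, re-encode to UTF-8, then delegate to base::readLines() or base::writeLines()) could look roughly like the following base R sketch. The _sketch functions are hypothetical, skip argument validation, and only approximate what the package does.

```r
# Sketch of a reader: read raw lines, then re-encode them to UTF-8.
text_read_sketch <- function(path = "", encoding = "UTF-8") {
    path  <- normalizePath(path, mustWork = TRUE)
    lines <- readLines(path, warn = FALSE)
    iconv(lines, from = encoding, to = "UTF-8")
}

# Sketch of a writer: re-encode to UTF-8, then write bytes as they are.
text_write_sketch <- function(x = character(), path = "", encoding = "UTF-8") {
    path <- normalizePath(path, mustWork = FALSE)
    x    <- iconv(x, from = encoding, to = "UTF-8")
    writeLines(x, path, useBytes = TRUE)
    invisible(NULL)
}

# Round trip through a temporary file.
lines <- c("Hello, world!", "\u00a1Hola, mundo!")
path  <- tempfile(fileext = ".txt")

text_write_sketch(lines, path)
out <- text_read_sketch(path)
Encoding(out)
```

Writing with useBytes = TRUE avoids a second, locale-dependent conversion once the text is already UTF-8.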
See Also
readLines(), writeLines(), iconv()
Source Text and Translations
Description
Structure and manipulate the source text of a project and its translations.
Usage
translator(..., id = uuid(), algorithm = algorithms())
is_translator(x)
## S3 method for class 'Translator'
format(x, ...)
## S3 method for class 'Translator'
print(x, ...)
Arguments
... |
Usage depends on the underlying function.
|
id |
A non-empty and non-NA character string. A globally unique
identifier for the |
algorithm |
A non-empty and non-NA character string equal to |
x |
Any R object. |
Details
A Translator object encapsulates the source text of a project (or any other context) and all related translations. Under the hood, Translator objects are collections of Text objects. These do most of the work. They are treated as lower-level components and in typical situations, users rarely interact with them.
Translator objects can be saved and exported with translator_write(). They can be imported back into an R session with translator_read().
Value
translator() returns an R6 object of class Translator.
is_translator() returns a logical value.
format() returns a character vector.
print() returns argument x invisibly.
Active bindings
id
A non-empty and non-NA character string. A globally unique identifier for the underlying object. Beware of plausible collisions when using user-defined values.
algorithm
A non-empty and non-NA character string equal to "sha1", or "utf8". The algorithm to use when hashing source information for identification purposes.
hashes
A character vector of non-empty and non-NA values, or NULL. The set of all hash values exposed by registered Text objects. If there is none, hashes is NULL. This is a read-only field updated whenever field algorithm is updated.
source_texts
A character vector of non-empty and non-NA values, or NULL. The set of all registered source texts. If there is none, source_texts is NULL. This is a read-only field.
source_langs
A character vector of non-empty and non-NA values, or NULL. The set of all registered source languages. This is a read-only field.
- If there is none, source_langs is NULL.
- If there is one unique value, source_langs is an unnamed character string.
- Otherwise, it is a named character vector.
languages
A character vector of non-empty and non-NA values, or NULL. The set of all registered languages (codes). If there is none, languages is NULL. This is a read-only field.
native_languages
A named character vector of non-empty and non-NA values, or NULL. A map (bijection) of languages (codes) to native language names. Names are codes and values are native languages. If there is none, native_languages is NULL.
While users retain full control over native_languages, it is best to use well-known schemes such as IETF BCP 47, or ISO 639-1. Doing so maximizes portability and cross-compatibility between packages.
Update this field with method $set_native_languages(). See below for more information.
Methods
Public methods
Method new()
Create a Translator
object.
Usage
Translator$new(id = uuid(), algorithm = algorithms())
Arguments
id
A non-empty and non-NA character string. A globally unique identifier for the Translator object. Beware of collisions when using user-defined values.
algorithm
A non-empty and non-NA character string equal to "sha1", or "utf8". The algorithm to use when hashing source information for identification purposes.
Returns
An R6 object of class Translator.
Examples
# Consider using translator() instead.
tr <- Translator$new()
Method translate()
Translate source text.
Usage
Translator$translate( ..., lang = language_get(), source_lang = language_source_get() )
Arguments
...
Any number of vectors containing atomic elements. Each vector is normalized as a paragraph.
Elements are coerced to character values.
NA values and empty strings are discarded.
Multi-line strings are supported and encouraged. Blank lines (two or more newline characters) are interpreted as paragraph separators.
lang
A non-empty and non-NA character string. The underlying language.
A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility.
source_lang
A non-empty and non-NA character string. The language of the source text. See argument
lang
for more information.
Details
See normalize() for further details on how ... is normalized.
Returns
A character string. If there is no corresponding translation, the value passed to method $set_default_value() is returned. NULL is returned by default.
Examples
tr <- Translator$new()
tr$set_text(en = "Hello, world!", fr = "Bonjour, monde!")
tr$translate("Hello, world!", lang = "en") ## Outputs "Hello, world!"
tr$translate("Hello, world!", lang = "fr") ## Outputs "Bonjour, monde!"
Method get_translation()
Extract a translation or a source text.
Usage
Translator$get_translation(hash = "", lang = "")
Arguments
hash
A non-empty and non-NA character string. The unique identifier of the requested source text.
lang
A non-empty and non-NA character string. The underlying language.
A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility.
Returns
A character string. If there is no corresponding translation, the value passed to method $set_default_value() is returned. NULL is returned by default.
Examples
tr <- Translator$new()
tr$set_text(en = "Hello, world!")
# Consider using translate() instead.
tr$get_translation("256e0d7", "en") ## Outputs "Hello, world!"
Method get_text()
Extract a source text and its translations.
Usage
Translator$get_text(hash = "")
Arguments
hash
A non-empty and non-NA character string. The unique identifier of the requested source text.
Returns
A Text object, or NULL.
Examples
tr <- Translator$new()
tr$set_text(en = "Hello, world!")
tr$get_text("256e0d7")
Method set_text()
Register a source text.
Usage
Translator$set_text(..., source_lang = language_source_get())
Arguments
Returns
A NULL, invisibly.
Examples
tr <- Translator$new()
tr$set_text(en = "Hello, world!", location())
Method set_texts()
Register one or more source texts.
Usage
Translator$set_texts(...)
Arguments
...
Any number of
Text
objects.
Details
This method calls merge_texts() to merge all values passed to ... together with previously registered Text objects. The underlying registered source texts, translations, and Location objects won't be duplicated.
Returns
A NULL, invisibly.
Examples
# Set source language.
language_source_set("en")
tr <- Translator$new()
# Create Text objects.
txt1 <- text(
location("a", 1L, 2L, 3L, 4L),
en = "Hello, world!",
fr = "Bonjour, monde!")
txt2 <- text(
location("b", 5L, 6L, 7L, 8L),
en = "Farewell, world!",
fr = "Au revoir, monde!")
tr$set_texts(txt1, txt2)
Method rm_text()
Remove a registered source text.
Usage
Translator$rm_text(hash = "")
Arguments
hash
A non-empty and non-NA character string identifying the source text to remove.
Returns
A NULL, invisibly.
Examples
tr <- Translator$new()
tr$set_text(en = "Hello, world!")
tr$rm_text("256e0d7")
Method set_native_languages()
Map a language code to a native language name.
Usage
Translator$set_native_languages(...)
Arguments
...
Any number of named, non-empty, and non-NA character strings. Names are codes and values are native languages. See field native_languages for more information.
Returns
A NULL, invisibly.
Examples
tr <- Translator$new()
tr$set_native_languages(en = "English", fr = "Français")
# Remove existing entries.
tr$set_native_languages(fr = NULL)
Method set_default_value()
Register a default value to return when there is no corresponding translation for the requested language.
Usage
Translator$set_default_value(value = NULL)
Arguments
value
A NULL or a non-NA character string. It can be empty. The former is returned by default.
Details
This modifies what methods $translate() and $get_translation() return when there is no translation for lang.
Returns
A NULL, invisibly.
Examples
tr <- Translator$new()
tr$set_default_value("<unavailable>")
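The interplay between $set_default_value() and $translate() can be sketched as follows. This relies only on functions and methods documented in this manual, and it is guarded so that it does nothing when the package is not installed. Passing source_lang explicitly is a precaution taken for this sketch, not a requirement stated by the package.

```r
if (requireNamespace("transltr", quietly = TRUE)) {
    tr <- transltr::translator(id = "default-value-sketch")
    tr$set_text(
        en          = "Hello, world!",
        fr          = "Bonjour, monde!",
        source_lang = "en")

    # Returned whenever a requested translation is unavailable.
    tr$set_default_value("<unavailable>")

    # "fr" is registered, but "es" is not.
    available <- tr$translate("Hello, world!", lang = "fr", source_lang = "en")
    fallback  <- tr$translate("Hello, world!", lang = "es", source_lang = "en")

    print(c(available, fallback))
}
```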
See Also
find_source(), translator_read(), translator_write()
Examples
# Set source language.
language_source_set("en")
# Create a Translator object.
# This would normally be done automatically
# by find_source(), or translator_read().
tr <- translator(
id = "test-translator",
en = "English",
es = "Español",
fr = "Français",
text(
location("a", 1L, 2L, 3L, 4L),
en = "Hello, world!",
fr = "Bonjour, monde!"),
text(
location("b", 1L, 2L, 3L, 4L),
en = "Farewell, world!",
fr = "Au revoir, monde!"))
is_translator(tr)
# Translator objects have a specific format.
# print() calls format() internally, as expected.
print(tr)
## ------------------------------------------------
## Method `Translator$new`
## ------------------------------------------------
# Consider using translator() instead.
tr <- Translator$new()
## ------------------------------------------------
## Method `Translator$translate`
## ------------------------------------------------
tr <- Translator$new()
tr$set_text(en = "Hello, world!", fr = "Bonjour, monde!")
tr$translate("Hello, world!", lang = "en") ## Outputs "Hello, world!"
tr$translate("Hello, world!", lang = "fr") ## Outputs "Bonjour, monde!"
## ------------------------------------------------
## Method `Translator$get_translation`
## ------------------------------------------------
tr <- Translator$new()
tr$set_text(en = "Hello, world!")
# Consider using translate() instead.
tr$get_translation("256e0d7", "en") ## Outputs "Hello, world!"
## ------------------------------------------------
## Method `Translator$get_text`
## ------------------------------------------------
tr <- Translator$new()
tr$set_text(en = "Hello, world!")
tr$get_text("256e0d7")
## ------------------------------------------------
## Method `Translator$set_text`
## ------------------------------------------------
tr <- Translator$new()
tr$set_text(en = "Hello, world!", location())
## ------------------------------------------------
## Method `Translator$set_texts`
## ------------------------------------------------
# Set source language.
language_source_set("en")
tr <- Translator$new()
# Create Text objects.
txt1 <- text(
location("a", 1L, 2L, 3L, 4L),
en = "Hello, world!",
fr = "Bonjour, monde!")
txt2 <- text(
location("b", 5L, 6L, 7L, 8L),
en = "Farewell, world!",
fr = "Au revoir, monde!")
tr$set_texts(txt1, txt2)
## ------------------------------------------------
## Method `Translator$rm_text`
## ------------------------------------------------
tr <- Translator$new()
tr$set_text(en = "Hello, world!")
tr$rm_text("256e0d7")
## ------------------------------------------------
## Method `Translator$set_native_languages`
## ------------------------------------------------
tr <- Translator$new()
tr$set_native_languages(en = "English", fr = "Français")
# Remove existing entries.
tr$set_native_languages(fr = NULL)
## ------------------------------------------------
## Method `Translator$set_default_value`
## ------------------------------------------------
tr <- Translator$new()
tr$set_default_value("<unavailable>")
Read and Write Translations
Description
Export Translator objects to text files and import such files back into R as Translator objects.
Usage
translator_read(
path = getOption("transltr.path"),
encoding = "UTF-8",
verbose = getOption("transltr.verbose", TRUE),
translations = TRUE
)
translator_write(
tr = translator(),
path = getOption("transltr.path"),
overwrite = FALSE,
verbose = getOption("transltr.verbose", TRUE),
translations = TRUE
)
translations_read(path = "", encoding = "UTF-8", tr = NULL)
translations_write(tr = translator(), path = "", lang = "")
translations_paths(
tr = translator(),
parent_dir = dirname(getOption("transltr.path"))
)
Arguments
path |
A non-empty and non-NA character string. A path to a file to read from, or write to.
See Details for more information. |
encoding |
A non-empty and non-NA character string. The source character encoding. In almost all cases, this should be UTF-8. Other encodings are internally re-encoded to UTF-8 for portability. |
verbose |
A non-NA logical value. Should progress information be reported? |
translations |
A non-NA logical value. Should translations files also
be read, or written along with |
tr |
A This argument is |
overwrite |
A non-NA logical value. Should existing files be
overwritten? If such files are detected and |
lang |
A non-empty and non-NA character string. The underlying language. A language is usually a code (of two or three letters) for a native language name. While users retain full control over codes, it is best to use language codes stemming from well-known schemes such as IETF BCP 47, or ISO 639-1 to maximize portability and cross-compatibility. |
parent_dir |
A non-empty and non-NA character string. A path to a parent directory. |
Details
The information contained within a Translator
object is
split: translations are reorganized by language and exported independently
from other fields.
translator_write()
creates two types of file: a single Translator file,
and zero, or more translations files. These are plain text files that can
be inspected and modified using a wide variety of tools and systems. They
target different audiences:
the Translator file is useful to developers, and
translations files are meant to be shared with non-technical collaborators such as translators.
translator_read()
first reads a Translator file and creates a
Translator
object from it. It then calls
translations_paths()
to list expected translations files (that should
normally be stored alongside the Translator file), attempts to read them,
and registers successfully imported translations.
There are two requirements.
All files must be stored in the same directory. By default, this is set equal to
inst/transltr/
(see getOption("transltr.path")
).
Filenames of translations files are standardized and must correspond to languages (language codes, see
lang
).
The inner workings of the serialization process are thoroughly described in
serialize()
.
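The two requirements above can be illustrated with a minimal base R sketch. It assumes that a translations file is named after its language code with a .txt extension, as the el.txt file used in the Examples suggests. The function sketch_translations_paths() is hypothetical and is not part of transltr; the actual translations_paths() takes a Translator object and may behave differently.

```r
# Hypothetical helper (not part of transltr): build the expected paths of
# translations files from language codes and a shared parent directory.
# Assumption: files are named "<lang>.txt", as suggested by the Examples.
sketch_translations_paths <- function(langs, parent_dir) {
    paths <- file.path(parent_dir, paste0(langs, ".txt"))

    # Return a named character vector, one entry per language code.
    names(paths) <- langs
    paths
}

sketch_translations_paths(c("es", "fr"), "inst/transltr")
```

Because every path shares the same parent directory, moving the Translator file and its translations files together keeps them discoverable.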
Translator file
A Translator file contains a YAML (1.1)
representation of a Translator
object stripped of all
its translations except those that are registered as source text.
Translations files
A translations file contains a FLAT representation of a set of translations sharing the same target language. This format attempts to be as simple as possible for non-technical collaborators.
Value
translator_read()
returns an R6
object of class
Translator
.
translator_write()
returns NULL
, invisibly. It is used for its
side-effects of
creating a Translator file at the location given by
path
, and creating further translations file(s) in the same directory if
translations
is TRUE
.
translations_read()
returns an S3 object of class
ExportedTranslations
.
translations_write()
returns NULL
, invisibly.
translations_paths()
returns a named character vector.
Examples
# Set source language.
language_source_set("en")
# Create a path to a temporary Translator file.
temp_path <- tempfile(pattern = "translator_", fileext = ".yml")
temp_dir <- dirname(temp_path) ## tempdir() could also be used
# Create a Translator object.
# This would normally be done by find_source(), or translator_read().
tr <- translator(
    id = "test-translator",
    en = "English",
    es = "Español",
    fr = "Français",
    text(
        en = "Hello, world!",
        fr = "Bonjour, monde!"),
    text(
        en = "Farewell, world!",
        fr = "Au revoir, monde!"))
# Export it. This creates 3 files: 1 Translator file, and 2 translations
# files because two non-source languages are registered. The file for
# language "es" contains placeholders and must be completed.
translator_write(tr, temp_path)
translator_read(temp_path)
# Translations can be read individually.
translations_files <- translations_paths(tr, temp_dir)
translations_read(translations_files[["es"]])
translations_read(translations_files[["fr"]])
# This is rarely useful, but translations can also be exported individually.
# You may use this to add a new language, as long as it has an entry in the
# underlying Translator object (or file).
tr$set_native_languages(el = "Greek")
translations_files <- translations_paths(tr, temp_dir)
translations_write(tr, translations_files[["el"]], "el")
translations_read(file.path(temp_dir, "el.txt"))
Universally Unique Identifiers
Description
Generate a random UUID (Universally Unique Identifier) that complies with RFC 4122. Such a value is also known as a version 4 UUID.
Usage
uuid()
uuid_raw()
uuid_is(x)
Arguments
x |
An R object. |
Details
uuid()
calls uuid_raw()
and formats its output accordingly.
Pseudo-random bytes are generated with sample()
whenever uuid_raw()
is called. This is most likely done before runtime when
Translator
objects are created. uuid_raw()
samples values
in the [0, 255]
range with replacement and converts them to raw
values. The user must ensure that the underlying seed is appropriate when
generating UUIDs. See set.seed()
for more information.
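The process described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: sketch_uuid_raw() and sketch_uuid() are hypothetical names, and the way the version and variant bits are set here is an assumption based on what RFC 4122 requires of version 4 UUIDs.

```r
# Hypothetical sketch (not transltr's code): 16 pseudo-random bytes with
# the version 4 and variant 1 bits set as RFC 4122 requires.
sketch_uuid_raw <- function() {
    # Sample 16 values in the [0, 255] range with replacement.
    bytes <- sample(0:255, 16L, replace = TRUE)

    # Version 4: the four highest bits of the 7th byte must be 0100.
    bytes[7L] <- bitwOr(bitwAnd(bytes[7L], 15L), 64L)

    # Variant 1: the two highest bits of the 9th byte must be 10.
    bytes[9L] <- bitwOr(bitwAnd(bytes[9L], 63L), 128L)

    as.raw(bytes)
}

# Format 16 raw bytes as 32 hexadecimal characters in 8-4-4-4-12 groups.
sketch_uuid <- function(bytes = sketch_uuid_raw()) {
    hex <- as.character(bytes)  # two lowercase hexadecimal digits per byte
    paste(
        paste(hex[1:4],   collapse = ""),
        paste(hex[5:6],   collapse = ""),
        paste(hex[7:8],   collapse = ""),
        paste(hex[9:10],  collapse = ""),
        paste(hex[11:16], collapse = ""),
        sep = "-")
}

set.seed(1L)
sketch_uuid()
```

Since the bytes come from sample(), calling set.seed() with the same value before each call yields the same identifier. This is why the seed must be chosen with care.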
Value
uuid()
returns a character string containing exactly 36
characters: 32 hexadecimal characters and 4 hyphens (used as separators).
uuid_raw()
returns a raw vector of length 16.
uuid_is()
returns a logical vector having the same length as x
. It
checks whether its elements are valid version 4 (variant 1) UUIDs or
not. It returns FALSE
for any other kind of UUID.
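A check of this kind could be written with a regular expression, as in the sketch below. The function sketch_uuid_is() is hypothetical and is not the package's implementation. Accepting uppercase hexadecimal digits and returning one FALSE per element for non-character input are assumptions made for this illustration.

```r
# Hypothetical sketch (not transltr's code): test whether strings look like
# version 4 (variant 1) UUIDs.
sketch_uuid_is <- function(x) {
    # Assumption: anything that is not a character vector is never valid.
    if (!is.character(x)) {
        return(rep(FALSE, length(x)))
    }

    # The third group must start with 4 (version 4).
    # The fourth group must start with 8, 9, a, or b (variant 1).
    pattern <- paste0(
        "^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-",
        "[89ab][0-9a-f]{3}-[0-9a-f]{12}$")

    # grepl() returns FALSE for NA elements.
    grepl(pattern, x, ignore.case = TRUE)
}

sketch_uuid_is(c(
    "123e4567-e89b-42d3-a456-426614174000",   # version 4, variant 1
    "123e4567-e89b-12d3-a456-426614174000"))  # version 1
```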
Note
UUIDs are designed to be globally unique (collisions are extremely unlikely) and are sometimes called GUIDs (Globally Unique Identifiers). There are several UUID versions with slightly different purposes.
Package transltr
uses random identifiers (version 4/variant 1,
also known as DCE 1.1, ISO/IEC 11578:1996).
Examples
uuid()
uuid_raw()
uuid_is(uuid()) ## TRUE
uuid_is(uuid_raw()) ## FALSE, uuid_raw() does not return a string.
Apply Wrappers
Description
These functions wrap a function of the apply()
family, and enforce various values for convenience. Arguments are
passed as is to an apply()
function.
Usage
vapply_1l(x, fun, ...)
vapply_1i(x, fun, ...)
vapply_1c(x, fun, ...)
map(fun, ..., more = list())
Arguments
x |
See argument |
fun |
|
... |
Further optional arguments passed to |
more |
See argument |
Value
vapply_1l()
,
vapply_1i()
, and
vapply_1c()
respectively return a logical, an integer, and a character
vector having the same length as x
. Names are always discarded.
map()
returns a list having the same length as the longest element passed
to ...
.
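Wrappers with this behaviour could look like the sketch below. It is written under the assumption that they delegate to vapply() and mapply(); the sketch_ prefixed functions are hypothetical and are not the package's definitions.

```r
# Hypothetical sketches (not transltr's code). Each vapply() wrapper fixes
# the expected type and length (1) of every value and discards names.
sketch_vapply_1l <- function(x, fun, ...) {
    vapply(x, fun, FUN.VALUE = NA, ..., USE.NAMES = FALSE)
}

sketch_vapply_1i <- function(x, fun, ...) {
    vapply(x, fun, FUN.VALUE = NA_integer_, ..., USE.NAMES = FALSE)
}

sketch_vapply_1c <- function(x, fun, ...) {
    vapply(x, fun, FUN.VALUE = NA_character_, ..., USE.NAMES = FALSE)
}

# Assumption: map() wraps mapply() and always returns a list.
sketch_map <- function(fun, ..., more = list()) {
    mapply(fun, ..., MoreArgs = more, SIMPLIFY = FALSE)
}

sketch_vapply_1l(c(a = 1, b = 2), function(v) v > 1)
sketch_vapply_1i(c("x", "yy"), nchar)
sketch_map(function(x, y, z) x + y + z, 1:3, 4:6, more = list(z = 10))
```

Fixing the expected type up front makes a mismatch fail immediately with an error from vapply() instead of silently producing a vector of another type.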
See Also
Other utility functions:
format_vector()
,
stops()
,
str_to()