Title: | Create Regular Expressions Easily |
Version: | 0.1.1 |
Description: | Build regular expressions using grammar and functionality inspired by https://github.com/VerbalExpressions. Usage of the %>% is encouraged to build expressions in a chain-like fashion. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 2.10) |
Imports: | magrittr |
Suggests: | testthat, knitr, rmarkdown, covr, stringr |
VignetteBuilder: | knitr |
URL: | https://github.com/VerbalExpressions/RVerbalExpressions |
BugReports: | https://github.com/VerbalExpressions/RVerbalExpressions/issues |
NeedsCompilation: | no |
Packaged: | 2024-03-20 16:36:14 UTC; tlittlef |
Author: | Tyler Littlefield |
Maintainer: | Tyler Littlefield <tylurp1@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-03-20 22:20:05 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Constructs a Verbal Expression
Description
Add this to the beginning of every verbal expression chain. This
simply returns an empty character vector so that the next step in the chain
can provide a value without explicitly writing value = "blah".
Usage
rx()
Examples
rx()
# this
rx() %>%
rx_find("cat") %>%
rx_anything() %>%
rx_find("dog")
# instead of
rx_find(value = "cat") %>%
rx_anything() %>%
rx_find("dog")
Match alphanumeric characters.
Description
Matches both letters (case insensitive) and numbers (a through z and 0 through 9).
Usage
rx_alnum(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Examples
rx_alnum()
rx_alnum(inverse = TRUE)
# create an expression
x <- rx_alnum()
# create input
string <- "Apple 1!"
# extract match
regmatches(string, gregexpr(x, string))
Match alphabetic characters.
Description
Matches letters (case insensitive) only.
Usage
rx_alpha(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Examples
rx_alpha()
rx_alpha(inverse = TRUE)
# create an expression
x <- rx_alpha()
# create input
string <- "Apple 1!"
# extract match
regmatches(string, gregexpr(x, string))
Match any of these characters exactly once.
Description
Constructs a character class, sometimes called a character set. With this particular expression, you can tell the regex engine to match only one out of several characters. It does this by simply placing the characters you want to match between square brackets.
Usage
rx_any_of(.data = NULL, value)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Characters to match, any one of which may appear |
References
Character class: https://www.regular-expressions.info/charclass.html
Examples
rx_any_of(value = "abc")
# create an expression
x <- rx_any_of(value = "abc")
grepl(x, "c") # should be true
grepl(x, "d") # should be false
y <- rx() %>%
rx_find("gr") %>%
rx_any_of("ae") %>%
rx_find("y")
regmatches("gray", regexec(y, "gray"))[[1]]
regmatches("grey", regexec(y, "grey"))[[1]]
Match any character(s) any (including zero) number of times.
Description
This expression will match everything except line breaks using
the dot and the star. The dot . is a metacharacter and the star * is a
quantifier. When combined, the expression is considered greedy because it
will match everything (except line breaks) 0 or more times.
Usage
rx_anything(.data = NULL, mode = "greedy")
Arguments
.data |
Expression to append, typically pulled from the pipe |
mode |
Matching mode ("greedy", the default, or "lazy") |
References
Dot: https://www.regular-expressions.info/dot.html
Star Quantifier: https://www.regular-expressions.info/repeat.html
Greedy and Lazy Quantifiers: https://www.regular-expressions.info/repeat.html#greedy
Examples
rx_anything()
rx_anything(mode = "lazy")
x <- rx() %>%
rx_start_of_line() %>%
rx_anything() %>%
rx_end_of_line()
grepl(x, "anything!") # this should be true
grepl(rx_anything(), "") # this should be true
grepl(rx_something(), "") # this should be false
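The difference between the greedy and lazy modes can be illustrated in base R without the package. This is a minimal sketch that assumes lazy mode corresponds to the usual lazy quantifier (a ? placed after the star); the literal patterns below are written by hand and are not taken from the package source.

```r
# Hand-written patterns; assumed to mirror greedy and lazy matching.
text <- "<b>bold</b>"

greedy_pattern <- "<.*>"   # star takes as much as possible
lazy_pattern   <- "<.*?>"  # star followed by ? takes as little as possible

greedy_match <- regmatches(text, regexpr(greedy_pattern, text, perl = TRUE))
lazy_match   <- regmatches(text, regexpr(lazy_pattern, text, perl = TRUE))

greedy_match  # "<b>bold</b>"
lazy_match    # "<b>"
```

The greedy form runs to the last closing bracket in the input, while the lazy form stops at the first one.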
Match any character(s) except these any (including zero) number of times.
Description
This expression will match everything except whatever characters
the user specifies in the value parameter. It does this by adding a
caret symbol ^ at the beginning of a character set []. Typing
a caret after the opening square bracket negates the character class. The
result is that the character class matches any character that is not in the
character class. Unlike the dot, negated character classes also match
(invisible) line break characters. If you don't want a negated character
class to match line breaks, you need to include the line break characters in
the class.
Usage
rx_anything_but(.data = NULL, value, mode = "greedy")
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Characters to not match |
mode |
Matching mode ("greedy", the default, or "lazy") |
References
Character Class: https://www.regular-expressions.info/charclass.html
Examples
rx_anything_but(value = "abc")
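The line break behavior described above can be checked in base R. This is a sketch using hand-written patterns (assumed to resemble a negated class followed by a star); perl = TRUE is used so that the dot follows the PCRE rule of not matching a newline.

```r
# Input with a line break before the excluded characters.
text <- "xyz\nabc"

# The dot stops at the line break.
dot_match <- regmatches(text, regexpr(".*", text, perl = TRUE))

# The negated class [^abc] also consumes the line break.
class_match <- regmatches(text, regexpr("[^abc]*", text, perl = TRUE))

# Adding \n to the class keeps the match on the first line.
no_break_match <- regmatches(text, regexpr("[^abc\n]*", text, perl = TRUE))

dot_match       # "xyz"
class_match     # "xyz\n"
no_break_match  # "xyz"
```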
Negative lookaround functions
Description
This function facilitates matching by providing negative assurances for surrounding symbols/groups of symbols. It allows for building expressions that are dependent on context of occurrence.
Usage
rx_avoid_prefix(.data = NULL, value)
rx_avoid_suffix(.data = NULL, value)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Exact expression to match |
Examples
# matches any number of digits, but not preceded by "USD"
rx() %>%
rx_avoid_prefix('USD') %>%
rx_digit() %>%
rx_one_or_more()
# matches a digit, but not followed by " dollars"
rx() %>%
rx_digit() %>%
rx_avoid_suffix(' dollars')
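The two examples above can be approximated in base R with hand-written lookaround patterns. This is a sketch only: the patterns assume the functions correspond to a negative lookbehind (?<!...) and a negative lookahead (?!...), and perl = TRUE is needed for lookarounds in base R.

```r
# Negative lookbehind: a digit not preceded by "USD".
amounts <- c("USD5", "EUR5")
not_usd <- grepl("(?<!USD)\\d", amounts, perl = TRUE)

# Negative lookahead: a digit not followed by " dollars".
prices <- c("5 dollars", "5 euros")
not_dollars <- grepl("\\d(?! dollars)", prices, perl = TRUE)

not_usd      # FALSE TRUE
not_dollars  # FALSE TRUE
```

Lookarounds test the surrounding context without consuming it, so only the digit itself becomes part of the match.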
Begin a capture group.
Description
Begin a capture group.
Usage
rx_begin_capture(.data = NULL)
Arguments
.data |
Expression to append, typically pulled from the pipe |
Details
Capture groups are used to extract data from within the regular expression match for further processing.
Match a digit (0–9).
Description
The function rx_digit() looks for digits with the following
expression: \d and matches a single digit. Plural version matches
specified number of digits n (equivalent to rx_digit() %>% rx_count(n)).
Usage
rx_digit(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Examples
rx_digit()
rx_digit(inverse = TRUE)
# create an expression
x <- rx_digit()
# create input
string <- "1 apple"
# extract match
regmatches(string, regexpr(x, string))
Alternatively, match either expression.
Description
Expression to match instead. If both expressions exist, both
will be returned. This just adds the vertical bar |, often called an
alternator, which allows the user to find this or that, or both!
Usage
rx_either_of(.data, ...)
Arguments
.data |
Expression to append, typically pulled from the pipe |
... |
A character vector |
Examples
x <- rx() %>%
rx_either_of("cat", "dog") %>%
rx_space() %>%
rx_find("food")
string <- c("dog food", "cat food", "fish food")
grep(x, string, value = TRUE)
End a capture group.
Description
End a capture group.
Usage
rx_end_capture(.data = NULL)
Arguments
.data |
Expression to append, typically pulled from the pipe |
Details
Capture groups are used to extract data from within the regular expression match for further processing.
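To show what a capture group gives you once the expression is built, here is a base R sketch with a hand-written pattern. It assumes the begin and end functions correspond to an opening and a closing parenthesis; the pattern and input are illustrative only.

```r
# Two capture groups: a four-digit year and a two-digit month.
pattern <- "([0-9]{4})-([0-9]{2})"
text <- "released 2024-03"

# regexec() reports the full match followed by each captured group.
parts <- regmatches(text, regexec(pattern, text))[[1]]

parts  # "2024-03" "2024" "03"
```

The first element is the whole match; the remaining elements are the groups in the order their opening parentheses appear.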
Match the expression only if it extends to the end of the line.
Description
Control whether to match the expression only if it extends to
the end of the line. Basically, append a $ to the end of the
expression. The dollar sign is an anchor: it matches a position rather
than a character, so it can be used to "anchor" the regex match at a
certain position. In this case the dollar sign matches right after the last
character in the string.
Usage
rx_end_of_line(.data = NULL, enable = TRUE)
Arguments
.data |
Expression to match, typically pulled from the pipe |
enable |
Whether to enable this behavior, defaults to TRUE |
References
Anchors: https://www.regular-expressions.info/anchors.html
Examples
rx_end_of_line(enable = TRUE)
rx_end_of_line(enable = FALSE)
rx_end_of_line("abc", enable = TRUE)
# create expression
x <- rx() %>%
rx_start_of_line(FALSE) %>%
rx_find("apple") %>%
rx_end_of_line()
grepl(x, "apples") # should be false
grepl(x, "apple") # should be true
Match an expression.
Description
Identify a specific pattern exactly.
Usage
rx_find(.data = NULL, value)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Exact expression to match |
References
Capturing group: https://www.regular-expressions.info/brackets.html
Stack Overflow: https://stackoverflow.com/questions/3512471
Examples
rx_find(value = "apple")
# create expression
x <- rx_find(value = "apples")
grepl(x, "apple") # should be false
grepl(x, "apples") # should be true
Match a line break.
Description
This expression looks for line breaks, both Unix and Windows style by using the appropriate non printable characters.
Usage
rx_line_break(.data = NULL)
Arguments
.data |
Expression to append, typically pulled from the pipe |
References
Unix style: https://codepoints.net/U+000A
Windows style: https://codepoints.net/U+000D
Non printable character: https://www.regular-expressions.info/nonprint.html
Examples
rx_line_break()
# create an expression
x <- rx_line_break()
# create input
string <- "foo\nbar"
# extract match
regmatches(string, regexpr(x, string))
Match lower case letters.
Description
Matches lower case letters only.
Usage
rx_lowercase(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Examples
rx_lowercase()
rx_lowercase(inverse = TRUE)
# create an expression
x <- rx_lowercase()
y <- rx_lowercase(inverse = TRUE)
# create input
string <- "Apple 1!"
# extract match
regmatches(string, gregexpr(x, string))
regmatches(string, gregexpr(y, string))
Optionally match an expression.
Description
This expression uses a quantifier ? to optionally
match things. Specifically, the question mark makes the preceding token in
the regular expression optional.
Usage
rx_maybe(.data = NULL, value)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Expression to optionally match |
References
Quantifiers: https://www.regular-expressions.info/optional.html
Examples
rx_maybe(value = "abc")
# create expression
x <- rx() %>%
rx_start_of_line() %>%
rx_maybe("abc") %>%
rx_end_of_line(enable = FALSE)
grepl(x, "xyz") # should be true
Match the previous group any number of times.
Description
Match the previous group any number of times.
Usage
rx_multiple(.data = NULL, value = NULL, min = NULL, max = NULL)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Item to match |
min |
Minimum number of times it should be present |
max |
Maximum number of times it should be present |
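A bounded repetition of a group can be sketched in base R as follows. How the package assembles the expression from value, min and max is not shown in this manual, so the pattern below is a hand-written assumption using the standard counted quantifier {min,max}.

```r
# The group "ab" repeated at least 2 and at most 3 times,
# anchored so the whole input must consist of repetitions.
pattern <- "^(ab){2,3}$"
inputs <- c("ab", "abab", "ababab", "abababab")

within_bounds <- grepl(pattern, inputs)

within_bounds  # FALSE TRUE TRUE FALSE
```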
Match the previous stuff zero or many times.
Description
This function simply adds a * to the end of the expression.
Usage
rx_none_or_more(.data = NULL, mode = "greedy")
Arguments
.data |
Expression to append, typically pulled from the pipe |
mode |
Matching mode ("greedy", the default, or "lazy") |
Examples
rx_none_or_more()
# create an expression
x <- rx() %>%
rx_find("a") %>%
rx_none_or_more()
# create input
input <- "aaa"
# extract match
regmatches(input, regexpr(x, input))
Ensure that the parameter does not follow.
Description
This expression uses a negative lookahead to ensure the
value given does not follow the previous verbal expression;
perl = TRUE is required. For example, if you were to look for the
letter q but not the letter u, you might translate this to
"find the letter q every time the letter u does not come after it".
Usage
rx_not(.data = NULL, value)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Value to ensure absence of |
References
Negative lookahead: https://www.regular-expressions.info/lookaround.html
Examples
rx_not(value = "FEB-28")
# construct expression
x <- rx() %>%
rx_start_of_line() %>%
rx_find('FEB-29') %>%
rx_not("FEB-28")
# create a string
string <- c("FEB-29-2017", "FEB-28-2017")
# extract matches, perl = TRUE is required for negative lookahead
regmatches(string, regexpr(x, string, perl = TRUE))
# another example
rx() %>%
rx_find("q") %>%
rx_not("u") %>%
grepl(x = c("qu", "qa", "qq", "q", "q u"), perl = TRUE)
Match the previous stuff one or more times.
Description
This function simply adds a + to the end of the expression.
Usage
rx_one_or_more(.data = NULL, mode = "greedy")
Arguments
.data |
Expression to append, typically pulled from the pipe |
mode |
Matching mode ("greedy", the default, or "lazy") |
Examples
rx_one_or_more()
# create an expression
x <- rx() %>%
rx_find("a") %>%
rx_one_or_more()
# create input
input <- "aaa"
# extract match
regmatches(input, regexpr(x, input))
Match punctuation characters.
Description
Matches punctuation characters only:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
Usage
rx_punctuation(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Examples
rx_punctuation()
rx_punctuation(inverse = TRUE)
# create an expression
x <- rx_punctuation()
# create input
string <- 'Apple 1!'
# extract match
regmatches(string, gregexpr(x, string))
# don't extract punctuation
y <- rx_punctuation(inverse = TRUE)
regmatches(string, gregexpr(y, string))
Match any character within the range defined by the parameters.
Description
The value parameter is interpreted as pairs. For example,
rx_range(value = c('a', 'z', '0', '9')) is interpreted to mean any
character within the ranges a–z (ASCII 97–122) or 0–9 (ASCII 48–57). The method
expects an even number of parameters; unpaired parameters are ignored.
Usage
rx_range(.data = NULL, value)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Range of characters. The method expects an even number of parameters; unpaired parameters are ignored. |
Examples
rx_range(value = c('1', '3'))
# create an expression
x <- rx_range(value = c('1', '3'))
grepl(x, "2") # should be true
grepl(x, "4") # should be false
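The pairing rule described above can be sketched with a small helper. build_range() is a hypothetical function written for illustration, not part of the package; it assumes at least one complete pair and that the range endpoints need no escaping.

```r
# Hypothetical helper: turn c(start1, end1, start2, end2, ...) into
# a character class, dropping a trailing unpaired element.
build_range <- function(value) {
  n_pairs <- length(value) %/% 2
  stopifnot(n_pairs >= 1)                    # at least one full pair
  starts <- value[2 * seq_len(n_pairs) - 1]  # odd positions
  ends   <- value[2 * seq_len(n_pairs)]      # even positions
  paste0("[", paste0(starts, "-", ends, collapse = ""), "]")
}

paired   <- build_range(c("a", "z", "0", "9"))
unpaired <- build_range(c("a", "z", "0"))

paired    # "[a-z0-9]"
unpaired  # "[a-z]"
```

The resulting string can be passed directly to grepl() or regmatches() like any other pattern.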
Positive lookaround functions
Description
This function facilitates matching by providing assurances for surrounding symbols/groups of symbols. It allows for building expressions that are dependent on context of occurrence.
Usage
rx_seek_prefix(.data = NULL, value)
rx_seek_suffix(.data = NULL, value)
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Exact expression to match |
Examples
# this will match anything between square brackets
rx() %>%
rx_seek_prefix("[") %>%
rx_anything("lazy") %>%
rx_seek_suffix(']')
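The bracket example above can be approximated in base R with a hand-written pattern. This sketch assumes the prefix and suffix functions correspond to a positive lookbehind (?<=...) and a positive lookahead (?=...) around an escaped bracket; perl = TRUE is required for lookarounds.

```r
text <- "see [this] and [that]"

# Lazy body: each bracketed section is matched separately.
lazy_pattern <- "(?<=\\[).*?(?=\\])"
lazy_matches <- regmatches(text, gregexpr(lazy_pattern, text, perl = TRUE))[[1]]

# Greedy body: the match runs from the first "[" to the last "]".
greedy_pattern <- "(?<=\\[).*(?=\\])"
greedy_match <- regmatches(text, regexpr(greedy_pattern, text, perl = TRUE))

lazy_matches  # "this" "that"
greedy_match  # "this] and [that"
```

The brackets themselves are excluded from the result because lookarounds only assert context.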
Match any character(s) at least once.
Description
This expression is almost identical to rx_anything()
with one major exception: a + is used instead of a *. This
means rx_something() expects something whereas
rx_anything() expects anything including... nothing!
Usage
rx_something(.data = NULL, mode = "greedy")
Arguments
.data |
Expression to append, typically pulled from the pipe |
mode |
Matching mode ("greedy", the default, or "lazy") |
References
Metacharacters: https://www.regular-expressions.info/characters.html#special
Greedy and Lazy Quantifiers: https://www.regular-expressions.info/repeat.html#greedy
Examples
rx_something()
# construct an expression
x <- rx_something()
grepl(x, "something!") # this should be true
grepl(x, "") # this should be false
grepl(rx_anything(), "") # this should be true
Match any character(s) except these at least once.
Description
This expression is almost identical to rx_anything_but()
with one major exception: a + is used instead of a *. This
means rx_something_but() expects something whereas
rx_anything_but() expects anything including... nothing!
Usage
rx_something_but(.data = NULL, value, mode = "greedy")
Arguments
.data |
Expression to append, typically pulled from the pipe |
value |
Characters to not match |
mode |
Matching mode ("greedy", the default, or "lazy") |
References
Metacharacters: https://www.regular-expressions.info/characters.html#special
Greedy and Lazy Quantifiers: https://www.regular-expressions.info/repeat.html#greedy
Examples
rx_something_but(value = "abc")
# create an expression
x <- rx_something_but(value = "python")
grepl(x, "R") # should be true
grepl(x, "py") # should be false
Match a space character.
Description
Matches a space character.
Usage
rx_space(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Examples
# match space, default
rx_space()
# don't match space
rx_space(inverse = TRUE)
# create an expression
x <- rx_space()
# create input
string <- "1 apple\t"
# extract match
regmatches(string, regexpr(x, string))
# extract no whitespace by inverting behavior
y <- rx_space(inverse = TRUE)
regmatches(string, gregexpr(y, string))
Match the expression only if it appears from beginning of line.
Description
Control whether to match the expression only if it appears from the beginning of the line.
Usage
rx_start_of_line(.data = NULL, enable = TRUE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
enable |
Whether to enable this behavior, defaults to TRUE |
Examples
rx_start_of_line(enable = TRUE)
rx_start_of_line(enable = FALSE)
# create expression
x <- rx() %>%
rx_start_of_line() %>%
rx_find("apple")
grepl(x, "pineapple") # should be false
grepl(x, "apple") # should be true
Match a tab character.
Description
Match a tab character.
Usage
rx_tab(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Details
This function looks for tabs with the following expression: \t
Tab character: https://codepoints.net/U+0009
Examples
rx_tab()
rx_tab(inverse = TRUE)
# create an expression
x <- rx_tab()
# create input
string <- "foo\tbar"
# extract match
regmatches(string, regexpr(x, string))
Match upper case letters.
Description
Matches upper case letters only.
Usage
rx_uppercase(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Examples
rx_uppercase()
rx_uppercase(inverse = TRUE)
# create an expression
x <- rx_uppercase()
y <- rx_uppercase(inverse = TRUE)
# create input
string <- "Apple 1!"
# extract match
regmatches(string, gregexpr(x, string))
regmatches(string, gregexpr(y, string))
Match a whitespace character.
Description
Match a whitespace character.
Usage
rx_whitespace(.data = NULL, inverse = FALSE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
inverse |
Invert match behavior, defaults to FALSE |
Details
Match a whitespace character (one of space, tab, carriage return, new line, vertical tab and form feed).
carriage return: https://codepoints.net/U+000D
new line: https://codepoints.net/U+000A
vertical tab: https://codepoints.net/U+000B
form feed: https://codepoints.net/U+000C
Examples
# match whitespace, default
rx_whitespace()
# don't match whitespace
rx_whitespace(inverse = TRUE)
# create an expression
x <- rx_whitespace()
# create input
string <- "1 apple"
# extract match
regmatches(string, regexpr(x, string))
# extract no whitespace by inverting behavior
y <- rx_whitespace(inverse = TRUE)
regmatches(string, gregexpr(y, string))
Control case-insensitive matching.
Description
Control case-insensitive matching.
Usage
rx_with_any_case(.data = NULL, enable = TRUE)
Arguments
.data |
Expression to append, typically pulled from the pipe |
enable |
Whether to enable this behavior |
Details
Equivalent to adding or removing the i modifier.
Examples
rx_with_any_case()
# case insensitive
x <- rx() %>%
rx_find("abc") %>%
rx_with_any_case()
# case sensitive
y <- rx() %>%
rx_find("abc") %>%
rx_with_any_case(enable = FALSE)
grepl(x, "ABC") # should be true
grepl(y, "ABC") # should be false
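For comparison, base R offers two ways to get case-insensitive matching with a plain pattern. This sketch does not show how the package encodes the modifier; it only illustrates the effect of the i modifier using the ignore.case argument and the inline (?i) flag under perl = TRUE.

```r
# Case-sensitive by default.
sensitive <- grepl("abc", "ABC")

# Case-insensitive via the ignore.case argument.
insensitive_arg <- grepl("abc", "ABC", ignore.case = TRUE)

# Case-insensitive via the inline (?i) flag, PCRE engine.
insensitive_inline <- grepl("(?i)abc", "ABC", perl = TRUE)

sensitive           # FALSE
insensitive_arg     # TRUE
insensitive_inline  # TRUE
```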
Match a word.
Description
Match a word: a string of word characters (a–z, A–Z, 0–9 or _).
This function looks for words with the following expression: \w+
Usage
rx_word(.data = NULL)
Arguments
.data |
Expression to append, typically pulled from the pipe |
Examples
rx_word()
# create an expression
x <- rx_word()
# create inputs
string1 <- "foo_bar"
string2 <- "foo-bar"
# extract matches
regmatches(string1, regexpr(x, string1))
regmatches(string2, regexpr(x, string2)) # doesn't match -
Match a word character.
Description
Match a word character (a–z, A–Z, 0–9 or _).
Usage
rx_word_char(.data = NULL)
Arguments
.data |
Expression to append, typically pulled from the pipe |
Examples
rx_word_char()
# Same as rx_word()
x <- rx_word_char() %>%
rx_one_or_more()
Find beginning or end of a word.
Description
Match the beginning or end of a word: a string consisting of word characters (a–z, A–Z, 0–9 or _).
Usage
rx_word_edge(.data = NULL)
Arguments
.data |
Expression to append, typically pulled from the pipe |
Examples
rx_word_edge()
x <- rx() %>%
rx_word_edge() %>%
rx_alpha() %>%
rx_one_or_more() %>%
rx_word_edge()
# create inputs
string1 <- "foobar"
string2 <- "foo 23a bar"
# matches 'foobar'
regmatches(string1, regexpr(x, string1))
# matches 'foo' and 'bar' separately
regmatches(string2, gregexpr(x, string2))
Escape characters considered special by regex engines
Description
Takes a string and escapes all characters considered special by
the regex engine. This is used internally when you add a string to the
value parameter in most of the available functions. It is exported
and usable externally for users that want to escape all special characters
in their desired match. The following special characters are escaped:
. | * ? + ( ) { } ^ $ \ : = [ ]
Usage
sanitize(x)
Arguments
x |
String to sanitize |
Examples
sanitize("^")
sanitize("^+")
sanitize("^+?")
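The escaping described above can be sketched in base R. escape_special() is a hypothetical stand-in written for illustration and is not the package's implementation; it assumes that prefixing each listed character with a backslash is the intended result.

```r
# Hypothetical helper: put a backslash in front of each of
# . | * ? + ( ) { } ^ $ \ : = [ ]
escape_special <- function(x) {
  gsub("([.|*?+(){}^$\\\\:=\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

escaped <- escape_special("^+?")
escaped  # "\\^\\+\\?" when printed, i.e. the characters \^\+\?

# An escaped string matches itself literally, not as a pattern.
literal_hits <- grepl(escape_special("a.b"), c("a.b", "axb"), perl = TRUE)
literal_hits  # TRUE FALSE
```

Without escaping, the dot in "a.b" would match any character and both inputs would be reported as matches.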