Help for package LexFindR

Title:

Find Related Items and Lexical Dimensions in a Lexicon

Version:

1.1.0

Date:

2024-6-15

Description:

Implements code to identify lexical competitors in a given list of words. We include many of the standard competitor types used in spoken word recognition research, such as functions to find cohorts, neighbors, and rhymes, amongst many others. The package includes documentation for using a variety of lexicon files, including those with form codes made up of multiple letters (i.e., phoneme codes) and also basic orthographies. Importantly, the code makes use of multiple CPU cores and vectorization when possible, making it extremely fast and able to handle large lexicons. Additionally, the package contains documentation for users to easily write new functions, allowing researchers to examine other relationships within a lexicon. Preprint: https://osf.io/preprints/psyarxiv/8dyru/. Open access: <doi:10.3758/s13428-021-01667-6>. Citation: Li, Z., Crinnion, A.M. & Magnuson, J.S. (2021). <doi:10.3758/s13428-021-01667-6>.

License:

GPL (≥ 3)

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.1

Suggests:

tidyverse, knitr, rmarkdown, testthat, future.apply, tictoc

VignetteBuilder:

knitr

Depends:

R (≥ 3.5.0)

URL:

https://github.com/maglab-uconn/LexFindR

BugReports:

https://github.com/maglab-uconn/LexFindR/issues

NeedsCompilation:

Packaged:

2024-06-16 13:56:53 UTC; zbli

Author:

ZhaoBin Li [aut, cre], Anne Marie Crinnion [aut], James S. Magnuson [aut, cph]

Maintainer:

ZhaoBin Li <li_zhaobin@icloud.com>

Repository:

CRAN

Date/Publication:

2024-06-16 14:40:01 UTC

LexFindR: Find Related Items and Lexical Dimensions in a Lexicon

Description

Implements code to identify lexical competitors in a given list of words. We include many of the standard competitor types used in spoken word recognition research, such as functions to find cohorts, neighbors, and rhymes, amongst many others. The package includes documentation for using a variety of lexicon files, including those with form codes made up of multiple letters (i.e., phoneme codes) and also basic orthographies. Importantly, the code makes use of multiple CPU cores and vectorization when possible, making it extremely fast and able to handle large lexicons. Additionally, the package contains documentation for users to easily write new functions, allowing researchers to examine other relationships within a lexicon. Preprint: https://osf.io/preprints/psyarxiv/8dyru/. Open access: doi:10.3758/s13428-021-01667-6. Citation: Li, Z., Crinnion, A.M. & Magnuson, J.S. (2021). doi:10.3758/s13428-021-01667-6.

Author(s)

Maintainer: ZhaoBin Li li_zhaobin@icloud.com

Authors:

Anne Marie Crinnion anne.crinnion@uconn.edu
James S. Magnuson james.magnuson@uconn.edu [copyright holder]

Escape sep and target in regex

Description

Escape sep and target in regex

Usage

add_escape(target, sep, split = TRUE)

Arguments

target

Character string containing a target word

sep

Separator in target and lexicon

split

Whether to split the target based on sep and escape every element in the character list

Value

escaped sep and target
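
Escaping matters because separators such as "." are regular-expression metacharacters. The base R sketch below is not the body of add_escape; it only illustrates the problem the function addresses, assuming a hypothetical lexicon entry that uses "." as its separator.

```r
# Hypothetical entry using "." as the separator (an assumption for illustration).
entry <- "AA.R.K"

# Unescaped, "." is a regex wildcard, so every character is treated as a separator.
naive <- strsplit(entry, ".")[[1]]

# Escaping the separator makes the regex match a literal dot.
escaped <- strsplit(entry, "\\.")[[1]]

print(naive)
print(escaped)
```

The unescaped split returns only empty strings, while the escaped split recovers the three phoneme codes.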

Check input and raise warnings as needed

Description

Check input and raise warnings as needed

Usage

check_input(target, lexicon, sep)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

sep

Separator in target and lexicon

Value

the indexes of the competitors in the lexical database

Get cohort competitors

Description

Cohorts overlap in onset phoneme(s).

Usage

get_cohorts(
  target,
  lexicon,
  sep = " ",
  form = FALSE,
  count = FALSE,
  overlap = 2
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

overlap

(get_cohorts only) Integer specifying the number of onset phonemes to overlap for matching with the target word

Value

the indexes of the competitors in the lexical database

Examples

get_cohorts("AA R K", c("AA R K", "AA R T", "B AA B"))
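
The onset-overlap definition above can be illustrated in base R. This is a minimal sketch of the definition only, not the package's implementation; sketch_cohorts is a hypothetical name, and treating items shorter than overlap as non-matches is an assumption.

```r
# Hypothetical helper, not part of LexFindR: flag items sharing the first
# `overlap` phonemes with the target.
sketch_cohorts <- function(target, lexicon, sep = " ", overlap = 2) {
  onset <- function(x) {
    phonemes <- strsplit(x, sep, fixed = TRUE)[[1]]
    # Assumption: items shorter than `overlap` never match.
    if (length(phonemes) < overlap) return(NA_character_)
    paste(phonemes[seq_len(overlap)], collapse = sep)
  }
  target_onset <- onset(target)
  lexicon_onsets <- vapply(lexicon, onset, character(1), USE.NAMES = FALSE)
  which(lexicon_onsets == target_onset)
}

lexicon <- c("AA R K", "AA R T", "B AA B")
idx <- sketch_cohorts("AA R K", lexicon)
idx           # positions in the lexicon
lexicon[idx]  # the matching forms
length(idx)   # the number of matches
```

The last three lines mirror the three kinds of result that the form and count arguments are documented to select: indexes, words, or a count.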

Get CohortsPrime

Description

Cohorts that are not neighbors

Usage

get_cohortsP(
  target,
  lexicon,
  neighbors = "das",
  sep = " ",
  form = FALSE,
  count = FALSE
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

neighbors

(get_neighbors only) Character vector specifying the type of neighbor to return. Return the delete, add, substitute neighbors of the target when 'd', 'a', and/or 's' is in neighbors respectively

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_cohortsP("AA R K", c("AA R K", "AA R", "B AA B"), neighbors = "das")

Get embedding competitors

Description

Embedding competitors are items embedded in the target.

Usage

get_embeds_in_target(target, lexicon, sep = " ", form = FALSE, count = FALSE)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_embeds_in_target("AA R K", c("AA R K", "AA R", "B AA B"))
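
One way to picture embedding is as a whole-phoneme substring search. The base R sketch below is an illustration under that assumption, not the package's code; sketch_embeds_in_target is a hypothetical name.

```r
# Hypothetical helper, not part of LexFindR: find lexicon items whose phoneme
# sequence occurs contiguously inside the target.
sketch_embeds_in_target <- function(target, lexicon, sep = " ") {
  # Pad with the separator so matches can only start and end on phoneme
  # boundaries (for example, "A R" must not match inside "AA R").
  padded_target <- paste0(sep, target, sep)
  hits <- vapply(
    lexicon,
    function(item) grepl(paste0(sep, item, sep), padded_target, fixed = TRUE),
    logical(1),
    USE.NAMES = FALSE
  )
  which(hits)
}

embedded <- sketch_embeds_in_target("AA R K", c("AA R K", "AA R", "B AA B"))
embedded
```

In this sketch the target counts as embedded in itself, so the first two items are returned.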

Get embeds-in-target PRIME

Description

Items embedded in the target which are not cohorts or neighbors

Usage

get_embeds_in_targetP(
  target,
  lexicon,
  neighbors = "das",
  sep = " ",
  form = FALSE,
  count = FALSE
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

neighbors

(get_neighbors only) Character vector specifying the type of neighbor to return. Return the delete, add, substitute neighbors of the target when 'd', 'a', and/or 's' is in neighbors respectively

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_embeds_in_targetP("B AA R K IY", c("AA R K", "AA R", "AA R K IY", "B AA R"))

Get the log Frequency Weight (FW) of a competitor set

Description

Get the log Frequency Weight (FW) of a competitor set

Usage

get_fw(competitors_freq, pad = 0)

Arguments

competitors_freq

Numeric vector containing the frequencies of competitors (including itself)

pad

Value to add to frequencies before taking log; if your minimum frequency is 0, consider adding a value between 1 and 2; if your minimum frequency is between 0 and 1, consider adding 1

Value

log FW

Examples

get_fw(c(10, 50), pad = 1)
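
The guidance on pad follows from how logarithms behave at and below 1. The short base R sketch below uses hypothetical frequencies and does not call the package; it only shows the values that padding is meant to avoid.

```r
freqs <- c(0, 0.5, 10)  # hypothetical frequencies, including a zero

unpadded <- log(freqs)      # the zero becomes -Inf; values below 1 become negative
padded <- log(freqs + 1)    # with a pad of 1, every log value is finite and non-negative

unpadded
padded
```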

Get the log Frequency Weighted Competitor Probability (FWCP)

Description

Get the log Frequency Weighted Competitor Probability (FWCP)

Usage

get_fwcp(target_freq, competitors_freq, pad = 0, add_target = FALSE)

Arguments

target_freq

Frequency of target word

competitors_freq

Numeric vector containing the frequencies of competitors (including itself)

pad

Value to add to frequencies before taking log; if your minimum frequency is 0, consider adding a value between 1 and 2; if your minimum frequency is between 0 and 1, consider adding 1

add_target

Boolean; set to TRUE if you want the target frequency added to the denominator; only do this if the target is not already included in the competitor set (e.g., if the target is in the lexicon, it will be captured as its own neighbor, its own cohort, etc.)

Value

log FWCP

Examples

get_fwcp(100, c(10, 50), pad = 1)
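
The role of add_target can be sketched in base R. The formula used here is an assumption that this page does not confirm: FWCP is taken to be the padded log frequency of the target divided by the summed padded log frequencies of the competitor set. sketch_fwcp is a hypothetical name, not a package function.

```r
# Hypothetical sketch, NOT LexFindR's code. Assumption: FWCP is the padded log
# frequency of the target divided by the summed padded log frequencies of the
# competitor set.
sketch_fwcp <- function(target_freq, competitors_freq, pad = 0, add_target = FALSE) {
  if (add_target) {
    # Only needed when the target is not already among the competitors.
    competitors_freq <- c(competitors_freq, target_freq)
  }
  log(target_freq + pad) / sum(log(competitors_freq + pad))
}

# Target (frequency 100) is absent from the competitor set, so add it.
with_target <- sketch_fwcp(100, c(10, 50), pad = 1, add_target = TRUE)
# Target already included among the competitors, so leave add_target = FALSE.
already_in <- sketch_fwcp(100, c(10, 50, 100), pad = 1)
all.equal(with_target, already_in)
```

Under this assumption the two calls agree, which is why add_target should stay FALSE whenever the competitor set already contains the target.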

Get competitors in spoken word recognition

Description

Get competitors in spoken word recognition

Usage

get_generic(target, lexicon, neighbors, sep, form, count, overlap, mismatch)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

neighbors

(get_neighbors only) Character vector specifying the type of neighbor to return. Return the delete, add, substitute neighbors of the target when 'd', 'a', and/or 's' is in neighbors respectively

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

overlap

(get_cohorts only) Integer specifying the number of onset phonemes to overlap for matching with the target word

mismatch

(get_rhymes only) Integer specifying the number of onset phonemes to mismatch for matching with the target word

Value

the indexes of the competitors in the lexical database

Get homophones

Description

Homophones are items whose form is identical to that of the target.

Usage

get_homoforms(target, lexicon, sep = " ", form = FALSE, count = FALSE)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_homoforms("AA R K", c("AA R K", "AA R", "B AA B"))

Get phonological neighbors

Description

Phonological neighbors are items that can be converted to the target by a single add, delete, or substitute operation.

Usage

get_neighbors(
  target,
  lexicon,
  neighbors = "das",
  sep = " ",
  form = FALSE,
  count = FALSE
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

neighbors

(get_neighbors only) Character vector specifying the type of neighbor to return. Return the delete, add, substitute neighbors of the target when 'd', 'a', and/or 's' is in neighbors respectively

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_neighbors("AA R K", c("AA R K", "AA R", "B AA B"), "d")
get_neighbors("AA R K", c("AA R K", "AA R", "B AA B"), "da")
get_neighbors("AA R K", c("AA R K", "AA R", "B AA B"), "das")
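
The one-operation rule can be sketched in base R on phoneme vectors. This is an illustration of the definition, not the package's implementation; both helper names are hypothetical. The sketch counts an item identical to the target as a neighbor, in line with the note under get_fwcp that a target in the lexicon is captured as its own neighbor.

```r
# Hypothetical helpers, not part of LexFindR.

# TRUE if `short` equals `long` with exactly one phoneme removed.
is_one_deletion <- function(long, short) {
  if (length(long) != length(short) + 1) return(FALSE)
  any(vapply(
    seq_along(long),
    function(i) identical(long[-i], short),
    logical(1)
  ))
}

sketch_neighbors <- function(target, lexicon, sep = " ") {
  tgt <- strsplit(target, sep, fixed = TRUE)[[1]]
  is_neighbor <- vapply(
    strsplit(lexicon, sep, fixed = TRUE),
    function(item) {
      if (length(item) == length(tgt)) {
        sum(item != tgt) <= 1          # substitution, or the target itself
      } else {
        is_one_deletion(tgt, item) ||  # item is the target minus one phoneme
          is_one_deletion(item, tgt)   # item is the target plus one phoneme
      }
    },
    logical(1)
  )
  which(is_neighbor)
}

neighbor_idx <- sketch_neighbors("AA R K", c("AA R K", "AA R", "B AA B"))
neighbor_idx
```

Here "AA R" qualifies because it is the target with one phoneme deleted, while "B AA B" differs in all three positions.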

Get NeighborsPrime

Description

Neighbors which are not cohorts or rhymes

Usage

get_neighborsP(
  target,
  lexicon,
  neighbors = "das",
  sep = " ",
  form = FALSE,
  count = FALSE
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

neighbors

(get_neighbors only) Character vector specifying the type of neighbor to return. Return the delete, add, substitute neighbors of the target when 'd', 'a', and/or 's' is in neighbors respectively

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_neighborsP("AA R K", c("AA R K", "AA R", "B AA B"), neighbors = "das")

Get nohorts

Description

Items which are both cohorts and neighbors

Usage

get_nohorts(
  target,
  lexicon,
  neighbors = "das",
  sep = " ",
  form = FALSE,
  count = FALSE
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

neighbors

(get_neighbors only) Character vector specifying the type of neighbor to return. Return the delete, add, substitute neighbors of the target when 'd', 'a', and/or 's' is in neighbors respectively

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_nohorts("AA R K", c("AA R K", "AA R", "B AA B"), neighbors = "das")
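
Because the default return value is a vector of lexicon indexes, combined categories such as nohorts can be thought of as set operations on those indexes. The sketch below uses hypothetical index vectors in place of real search results and is not the package's code.

```r
# Hypothetical index vectors, standing in for the results of a cohort search
# and a neighbor search over the same lexicon.
cohort_idx <- c(1L, 2L, 5L)
neighbor_idx <- c(2L, 3L, 5L)

nohort_idx <- intersect(cohort_idx, neighbor_idx)     # both cohorts and neighbors
cohort_only_idx <- setdiff(cohort_idx, neighbor_idx)  # cohorts that are not neighbors

nohort_idx
cohort_only_idx
```

The second result corresponds to the description of get_cohortsP, "Cohorts that are not neighbors".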

Return lexical items as the appropriate type

Description

Return lexical items as the appropriate type

Usage

get_return(idx, lexicon, form, count)

Arguments

lexicon

Character vector containing the lexical database

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the appropriate type

Get rhyme competitors

Description

Rhymes overlap in all except onset phoneme(s)

Usage

get_rhymes(
  target,
  lexicon,
  sep = " ",
  form = FALSE,
  count = FALSE,
  mismatch = 1
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

mismatch

(get_rhymes only) Integer specifying the number of onset phonemes to mismatch for matching with the target word

Value

the indexes of the competitors in the lexical database

Examples

get_rhymes("AA R K", c("AA R K", "B AA R K", "B AA B"))

Get embedded competitors

Description

Embedded competitors are items in which the target is embedded.

Usage

get_target_embeds_in(target, lexicon, sep = " ", form = FALSE, count = FALSE)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_target_embeds_in("AA R K", c("AA R K", "B AA R K", "B AA B"))

Get target-embeds-in PRIME

Description

Items the target embeds into which are not cohorts or neighbors

Usage

get_target_embeds_inP(
  target,
  lexicon,
  neighbors = "das",
  sep = " ",
  form = FALSE,
  count = FALSE
)

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

neighbors

(get_neighbors only) Character vector specifying the type of neighbor to return. Return the delete, add, substitute neighbors of the target when 'd', 'a', and/or 's' is in neighbors respectively

sep

Separator in target and lexicon

form

Whether to return words in lexicon

count

Whether to return count of words

Value

the indexes of the competitors in the lexical database

Examples

get_target_embeds_inP("B AA R K", c("AA R K", "AA R", "B AA R K IY", "B AA R"))

Get phonological uniqueness point

Description

Phonological uniqueness point is the index at which the target becomes unique in the lexicon

Usage

get_uniqpt(target, lexicon, sep = " ")

Arguments

target

Character string containing a target word

lexicon

Character vector containing the lexical database

sep

Separator in target and lexicon

Value

If the target is not unique, length + 1; otherwise, the index at which the target becomes unique in the lexicon

Examples

get_uniqpt("AA R K", c("AA R", "B AA B", "B AA R K"))
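
One reading of the uniqueness point can be sketched in base R. This is an interpretation of the description above, not the package's implementation; sketch_uniqpt is a hypothetical name, and ignoring lexicon items identical to the target is an assumption.

```r
# Hypothetical helper, not part of LexFindR. Assumption: items identical to the
# target are ignored when checking uniqueness.
sketch_uniqpt <- function(target, lexicon, sep = " ") {
  tgt <- strsplit(target, sep, fixed = TRUE)[[1]]
  others <- strsplit(lexicon[lexicon != target], sep, fixed = TRUE)
  for (i in seq_along(tgt)) {
    shares_prefix <- vapply(
      others,
      function(w) length(w) >= i && identical(w[seq_len(i)], tgt[seq_len(i)]),
      logical(1)
    )
    if (!any(shares_prefix)) return(i)  # no other item shares the first i phonemes
  }
  length(tgt) + 1  # the target never becomes unique
}

unique_at <- sketch_uniqpt("AA R K", c("AA R", "B AA B", "B AA R K"))
never_unique <- sketch_uniqpt("AA R", c("AA R K", "B AA B"))
unique_at
never_unique
```

In the first call the sketch stops at the third phoneme, where "AA R" can no longer match. In the second call "AA R" is a prefix of "AA R K", so the sketch falls through to length + 1.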

Lemmalex dictionary

Description

Lemmalex is primarily based on the SUBTLEXus subtitle corpus (based on American subtitles with 51 million items in total), reduced to lemmas using a copyrighted database (Francis and Kučera, 1982). The pronunciations are taken from the CMU Pronouncing Dictionary.

Usage

lemmalex

Format

An object of class tbl_df (inherits from tbl, data.frame) with 17750 rows and 3 columns.

Details

Reference: Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977-990.

Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Brown University Press.

CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict

A table with 20,293 rows and 3 variables:

Item: SUBTLEXus dictionary reduced to lemmas
Frequency: Number of times the item appeared in the SUBTLEXus corpus
Pronunciation: ARPAbet transcription according to CMU

...

Source

https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus

slex ARPAbet

Description

TRACE slex lexicon translated by Nenadić and Tucker into ARPAbet pronunciation

Usage

slex

Format

An object of class data.table (inherits from data.frame) with 212 rows and 3 columns.

Details

TRACE slex lexicon with frequencies: McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1-86.

ARPAbet transcription: Nenadić, F., & Tucker, B. V. (2020). Computational modelling of an auditory lexical decision experiment using jTRACE and TISK. Language, Cognition and Neuroscience, 1-29.

A table with 212 rows and 2 variables:

Item: TRACE slex transcription
Pronunciation: ARPAbet transcription

...

Source

https://era.library.ualberta.ca/items/61319cc6-436a-428c-b960-545bdc9bd5d3
