Title: | Harmonise and Integrate Heterogeneous Areal Data |
Description: | Many relevant applications in the environmental and socioeconomic sciences use areal data, such as biodiversity checklists, agricultural statistics, or socioeconomic surveys. For applications that surpass the spatial, temporal or thematic scope of any single data source, data must be integrated from several heterogeneous sources. Inconsistent concepts, definitions, or messy data tables make this a tedious and error-prone process. 'arealDB' tackles those problems and helps the user to build a harmonised database of areal data. Read the paper at Ehrmann, Seppelt & Meyer (2020) <doi:10.1016/j.envsoft.2020.104799>. |
Version: | 0.9.4 |
URL: | https://github.com/luckinet/arealDB |
BugReports: | https://github.com/luckinet/arealDB/issues |
Depends: | R (≥ 3.5.0) |
Imports: | archive, beepr, checkmate, dplyr, fuzzyjoin, magrittr, ontologics, progress, purrr, readr, rlang, rmapshaper, stringr, sf, tabshiftr, tibble, tidyr, tidyselect |
Suggests: | testthat, knitr, rmarkdown, bookdown, covr |
Language: | en-gb |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-01-20 13:26:14 UTC; se87kuhe |
Author: | Steffen Ehrmann |
Maintainer: | Steffen Ehrmann <steffen.ehrmann@posteo.de> |
Repository: | CRAN |
Date/Publication: | 2025-01-20 13:40:05 UTC |
Edit matches manually in a csv-table
Description
Allows the user to match concepts with an already existing ontology, without actually writing into the ontology, but instead storing the resulting matching table as csv.
Usage
.editMatches(
new,
topLevel,
source = NULL,
ontology = NULL,
matchDir = NULL,
stringdist = TRUE,
verbose = TRUE,
beep = NULL
)
Arguments
new |
|
topLevel |
|
source |
|
ontology |
|
matchDir |
|
stringdist |
|
verbose |
|
beep |
|
Details
In order to match new concepts into an already existing ontology, it may become necessary to carry out manual matches of the new concepts with already harmonised concepts, for example, when the new concepts are described with terms that are not yet in the ontology. This function puts together a table in which the user edits matches by hand. With the argument verbose = TRUE, detailed information about the edit process is shown to the user. After defining matches, and even if not all necessary matches are finished, the function stores a specific "matching table" with the name match_SOURCE.csv in the respective directory (matchDir), from where work can be picked up and continued at another time.
Fuzzy matching is carried out and matches with 0, 1 or 2 differing characters are presented in a respective column.
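The idea of presenting candidates with 0, 1 or 2 differing characters can be sketched in base R. This is only an illustration under stated assumptions: it uses utils::adist() (generalised Levenshtein distance) rather than whatever the package uses internally, and the terms and the column names dist_0, dist_1 and dist_2 are made up for the example.

```r
# hypothetical new terms and already harmonised ontology labels
new_terms  <- c("barly", "maize", "wheet", "sorghum")
harmonised <- c("barley", "maize", "wheat", "rye")

# edit distance between every new term (rows) and every label (columns)
d <- utils::adist(new_terms, harmonised)
dimnames(d) <- list(new_terms, harmonised)

# for each distance 0, 1 and 2, collect the candidate labels per new term
candidates <- lapply(0:2, function(k) {
  unname(apply(d, 1, function(row) paste(harmonised[row == k], collapse = " | ")))
})
names(candidates) <- paste0("dist_", 0:2)

# one row per new term, one column per distance class
fuzzy <- data.frame(new = new_terms, candidates, stringsAsFactors = FALSE)
print(fuzzy)
```

Terms without any candidate within two edits (here "sorghum") end up with empty cells and would have to be matched by hand.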
Value
A table that contains all new matches or, if all of the new concepts were already in the ontology, a table of the already successful matches.
Get the column types of a tibble
Description
(internal function not for user interaction)
Usage
.getColTypes(input = NULL)
Arguments
input |
data.frame |
Match target terms with an ontology
Description
This function takes a table and replaces the values of various columns with harmonised values listed in the project-specific gazetteer.
Usage
.matchOntology(
table = NULL,
columns = NULL,
dataseries = NULL,
ontology = NULL,
beep = NULL,
colsAsClass = TRUE,
groupMatches = FALSE,
stringdist = TRUE,
strictMatch = FALSE,
verbose = FALSE
)
Arguments
table |
|
columns |
|
dataseries |
|
ontology |
|
beep |
|
colsAsClass |
|
groupMatches |
|
stringdist |
|
strictMatch |
|
verbose |
|
Value
Returns a table that resembles the input table where the target columns were translated according to the provided ontology.
Update an ontology
Description
This function takes a (spatial) table and updates all territorial concepts in the provided gazetteer.
Usage
.updateOntology(
table = NULL,
threshold = NULL,
dataseries = NULL,
ontology = NULL
)
Arguments
table |
|
threshold |
|
dataseries |
|
ontology |
onto |
Value
called for its side-effect of updating a gazetteer
Archive the data from an areal database
Description
Archive the data from an areal database
Usage
adb_archive(pattern = NULL, variables = NULL, compress = FALSE, outPath = NULL)
Arguments
pattern |
|
variables |
|
compress |
|
outPath |
|
Details
This function prepares and packages the data into an archivable form. The archive contains GeoPackage files for geometries and csv files for all tables, such as inventory, matching and thematic data tables.
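A rough sketch of what packaging tables as csv files into a compressed archive can look like, using base R only. The directory name, the table contents and the use of utils::tar() are assumptions for illustration and do not describe how adb_archive() works internally.

```r
# hypothetical archive folder with one inventory table stored as csv
archive_dir <- file.path(tempdir(), "sketch_archive")
dir.create(archive_dir, showWarnings = FALSE)
inventory <- data.frame(datID = 1L, name = "madeUp")
write.csv(inventory, file.path(archive_dir, "inv_dataseries.csv"), row.names = FALSE)

# pack the folder into a gzip-compressed tarball (paths relative to tempdir())
old_wd <- setwd(tempdir())
utils::tar("sketch_archive.tar.gz", files = "sketch_archive", compression = "gzip")
setwd(old_wd)

# list the archive contents to confirm what was packaged
contents <- utils::untar(file.path(tempdir(), "sketch_archive.tar.gz"), list = TRUE)
print(contents)
```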
Value
no return value, called for the side-effect of creating a database archive.
Backup the current state of an areal database
Description
Backup the current state of an areal database
Usage
adb_backup()
Details
This function creates a tag that is composed of the version and the date, appends it to all stage3 files (tables and geometries), the inventory and the ontology/gazetteer files and stores them in the backup folder of the current areal database.
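How a version-and-date tag might be appended to file names can be sketched as follows. The tag format (version, underscore, date as YYYYMMDD) and the folder name "backup" are assumptions made for this example; the actual format used by adb_backup() is not specified here.

```r
# assumed ingredients of the tag
version     <- "1.0.0"
backup_date <- as.Date("2025-01-20")
tag <- paste0(version, "_", format(backup_date, "%Y%m%d"))

# files that would be backed up (paths as used in the package examples)
files <- c("tables/stage3/Estonia.rds",
           "geometries/stage3/Estonia.gpkg",
           "inv_tables.csv")

# insert the tag between the file name and its extension
tagged <- file.path(
  "backup",
  paste0(tools::file_path_sans_ext(basename(files)), "_", tag, ".", tools::file_ext(files))
)
print(tagged)
```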
Value
No return value, called for the side effect of saving the inventory, the stage3 files and modified ontology/gazetteer into the backup directory.
Diagnose database contents
Description
Work in progress, not yet usable.
Usage
adb_diagnose(
territory = NULL,
concept = NULL,
variable = NULL,
level = NULL,
year = NULL
)
Arguments
territory |
description |
concept |
description |
variable |
description |
level |
description |
year |
description |
Build an example areal database
Description
This function helps to set up an example database up to a certain step.
Usage
adb_example(path = NULL, until = NULL, verbose = FALSE)
Arguments
path |
|
until |
|
verbose |
|
Details
Setting up a database with an R-based tool can appear cumbersome, overly complex and thus intimidating. By creating an example database, this function allows interested users to learn step by step how to build a database of areal data. Moreover, all functions in this package provide verbose information and ask for information that would be missing or lead to an inconsistent database, before a failure renders hours of work useless.
Value
No return value, called for the side effect of creating an example database at the specified path.
Examples
if(dev.interactive()){
# to build the full example database
adb_example(path = paste0(tempdir(), "/newDB"))
# to make the example database until a certain step
adb_example(path = paste0(tempdir(), "/newDB"), until = "regDataseries")
}
Initiate an areal database
Description
Initiate a geospatial database or register a database that exists at the root path.
Usage
adb_init(
root,
version,
author,
licence,
ontology,
gazetteer = NULL,
top = NULL,
staged = TRUE
)
Arguments
root |
|
version |
|
author |
|
licence |
|
ontology |
|
gazetteer |
|
top |
|
staged |
|
Details
This is the first function that is run in a project, as it initiates the areal database by creating the default sub-directories and initial inventory tables. When a database has already been set up, this function is used to register that path in the options of the current R session.
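The two roles described above (creating sub-directories and registering the root path in the session options) can be mimicked in a few lines of base R. This is a sketch only: the two sub-directories are taken from paths that appear in the examples of this manual, the full default layout is not shown, and the option is restored afterwards so that a real session is not affected.

```r
# hypothetical root of a sketch database
root <- file.path(tempdir(), "sketchDB")

# create a couple of sub-directories (only a subset of a real layout)
sub_dirs <- file.path(root, c("tables/stage3", "geometries/stage3"))
for (d in sub_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# register the path in the options of the current session ...
old_opts <- options(adb_path = root)
registered <- getOption("adb_path")

# ... and restore the previous state so this sketch leaves no trace
options(old_opts)
```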
Value
No return value, called for the side effect of creating the directory structure of the new areal database and tables that contain the database metadata.
Examples
adb_init(root = paste0(tempdir(), "/newDB"),
version = "1.0.0", licence = "CC-BY-4.0",
author = list(cre = "Gordon Freeman", aut = "Alyx Vance", ctb = "The G-Man"),
gazetteer = paste0(tempdir(), "/newDB/territories.rds"),
top = "al1",
ontology = list(var = paste0(tempdir(), "/newDB/ontology.rds")))
getOption("adb_path"); getOption("gazetteer_path")
Load the inventory of the currently active areal database
Description
Load the inventory of the currently active areal database
Usage
adb_inventory(type = NULL)
Arguments
type |
|
Value
returns the table selected in type
Load the metadata from an areal database
Description
Load the metadata from an areal database
Usage
adb_metadata()
Load the currently active ontology
Description
Load the currently active ontology
Usage
adb_ontology(..., type = "ontology")
Arguments
... |
combination of column name in the ontology and value to filter
that column by to build a tree of the concepts nested into it; see
|
type |
|
Value
returns a tidy table of an ontology or gazetteer that is used in an areal database.
Extract database contents
Description
Extract database contents
Usage
adb_querry(
territory = NULL,
concept = NULL,
variable = NULL,
level = NULL,
year = NULL
)
Arguments
territory |
'character(.) |
concept |
description |
variable |
description |
level |
description |
year |
description |
Value
returns ...
Examples
if(dev.interactive()){
adb_example(path = paste0(tempdir(), "/newDB"))
adb_querry(territory = list(al1 = "a_nation"),
concept = list(commodity = "barley"),
variable = "harvested")
}
Reset an areal database to its unfilled state
Description
Reset an areal database to its unfilled state
Usage
adb_reset(what = "all")
Arguments
what |
|
Value
no return value, called for its side effect of reorganising an areal database into a state where no reg* or norm* functions have been run
Restore the database from a backup
Description
Restore the database from a backup
Usage
adb_restore(version = NULL, date = NULL)
Arguments
version |
'character(1) |
date |
|
Details
This function searches for files that have the version and date tag, as it was defined in a previous run of adb_backup, to restore them to their original folders. This function overwrites by default, so use with care.
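Searching a backup folder for files that carry a given version and date tag can be sketched like this. The tag format and file names are assumptions for illustration, not the package's actual naming scheme.

```r
# hypothetical backup folder with files from two different backups
backup_dir <- file.path(tempdir(), "sketch_backup")
dir.create(backup_dir, showWarnings = FALSE)
invisible(file.create(file.path(backup_dir, c(
  "Estonia_1.0.0_20250120.rds",
  "Estonia_1.0.1_20250301.rds",
  "inv_tables_1.0.0_20250120.csv"
))))

# the backup to restore
version <- "1.0.0"
date    <- "20250120"
tag     <- paste0("_", version, "_", date)

# keep only files that carry exactly this tag (literal, not regex, matching)
all_files <- list.files(backup_dir)
hits <- all_files[grepl(paste0(tag, "."), all_files, fixed = TRUE)]

# strip the tag to recover the original file names
restore_names <- sub(tag, "", hits, fixed = TRUE)
print(data.frame(backup = hits, restored = restore_names))
```

Because a restore overwrites existing files, printing such a table before copying anything is a cheap way to check what would be replaced.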
Value
No return value, called for the side effect of restoring files that were previously stored in a backup.
Load the schemas of the currently active areal database
Description
Load the schemas of the currently active areal database
Usage
adb_schemas(pattern = NULL)
Arguments
pattern |
|
Value
returns a list of schema descriptions
Load the translation tables of the currently active areal database
Description
Load the translation tables of the currently active areal database
Usage
adb_translations(type = NULL, dataseries = NULL)
Arguments
type |
|
dataseries |
|
Value
returns the selected translation table
Normalise geometries
Description
Harmonise and integrate geometries into a standardised format
Usage
normGeometry(
input = NULL,
pattern = NULL,
query = NULL,
thresh = 10,
beep = NULL,
simplify = FALSE,
stringdist = TRUE,
strictMatch = FALSE,
verbose = FALSE
)
Arguments
input |
|
pattern |
|
query |
|
thresh |
|
beep |
|
simplify |
|
stringdist |
|
strictMatch |
|
verbose |
|
Details
To normalise geometries, this function proceeds as follows:
Read in input and extract initial metadata from the file name.
In case filters are set, the new geometry is filtered by those.
The territorial names are matched with the gazetteer to harmonise new territorial names (at this step, the function might ask the user to edit the file 'matching.csv' to align new names with already harmonised names).
Loop through every nation potentially included in the file that shall be processed and carry out the following steps:
In case the geometries are provided as a list of simple feature POLYGONS, they are dissolved into a single MULTIPOLYGON per main polygon.
In case the nation to which a geometry belongs has not yet been created at stage three, the following steps are carried out:
Store the current geometry as basis of the respective level (the user needs to make sure that all following levels of the same dataseries are perfectly nested into those parent territories, for example by using the GADM dataset).
In case the nation to which the geometry belongs has already been created, the following steps are carried out:
Check whether the new geometries have the same coordinate reference system as the already existing database and re-project the new geometries if this is not the case.
Check whether all new geometries are already exactly matched spatially and stop if that is the case.
Check whether the new geometries are all within the already defined parents, and save those that are not as a new geometry.
Calculate spatial overlap and distinguish the geometries into those that overlap with more and those with less than thresh.
For all units that do match, copy gazID from the geometries they overlap.
For all units that do not match, rebuild metadata and a new gazID.
Store the processed geometry at stage three.
Move the geometry to the folder '/processed', if it is fully processed.
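The overlap step can be illustrated with axis-aligned rectangles in base R. This is a simplified stand-in: real geometries would be intersected with a spatial library, and both the interpretation of thresh as a percentage and the definition of overlap (share of the new unit covered by the existing unit) are assumptions of this sketch.

```r
# area of a rectangle given as c(xmin, ymin, xmax, ymax)
rect_area <- function(r) unname((r["xmax"] - r["xmin"]) * (r["ymax"] - r["ymin"]))

# percentage of the new rectangle that is covered by the existing one
overlap_pct <- function(new, old) {
  w <- min(new["xmax"], old["xmax"]) - max(new["xmin"], old["xmin"])
  h <- min(new["ymax"], old["ymax"]) - max(new["ymin"], old["ymin"])
  100 * max(w, 0) * max(h, 0) / rect_area(new)
}

# one existing unit and three hypothetical new units
existing  <- c(xmin = 0, ymin = 0, xmax = 10, ymax = 10)
new_units <- list(
  same      = c(xmin = 0,   ymin = 0, xmax = 10,   ymax = 10),
  shifted   = c(xmin = 8,   ymin = 0, xmax = 18,   ymax = 10),
  neighbour = c(xmin = 9.5, ymin = 0, xmax = 19.5, ymax = 10)
)

thresh  <- 10
overlap <- vapply(new_units, overlap_pct, numeric(1), old = existing)

# split the new units by whether they overlap more or less than thresh
above <- names(overlap)[overlap > thresh]
below <- names(overlap)[overlap <= thresh]
```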
Value
This function harmonises and integrates so far unprocessed geometries at stage two into stage three of the geospatial database. It produces for each main polygon (e.g. nation) in the registered geometries a spatial file of the specified file-type.
See Also
Other normalise functions: normTable()
Examples
if(dev.interactive()){
library(sf)
# build the example database
adb_example(until = "regGeometry", path = tempdir())
# normalise all geometries ...
normGeometry(pattern = "estonia")
# ... and check the result
st_layers(paste0(tempdir(), "/geometries/stage3/Estonia.gpkg"))
output <- st_read(paste0(tempdir(), "/geometries/stage3/Estonia.gpkg"))
}
Normalise data tables
Description
Harmonise and integrate data tables into a standardised format
Usage
normTable(
input = NULL,
pattern = NULL,
query = NULL,
ontoMatch = NULL,
beep = NULL,
verbose = FALSE
)
Arguments
input |
|
pattern |
|
query |
|
ontoMatch |
|
beep |
|
verbose |
|
Details
To normalise data tables, this function proceeds as follows:
Read in input and extract initial metadata from the file name.
Employ the function tabshiftr::reorganise() to reshape input according to the respective schema description.
The territorial names are matched with the gazetteer to harmonise new territorial names (at this step, the function might ask the user to edit the file 'matching.csv' to align new names with already harmonised names).
Harmonise territorial unit names.
Store the processed data table at stage three.
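The sequence of reshaping, harmonising names and storing one table per nation can be sketched in base R. This is not how normTable() is implemented: the manual reshaping stands in for tabshiftr::reorganise(), the lookup vector stands in for the gazetteer, and all names and values are made up.

```r
# a messy input table with years spread across columns
raw <- data.frame(territory = c("Eesti", "Latvija"),
                  `2000` = c(10, 20),
                  `2001` = c(12, 22),
                  check.names = FALSE)

# reshape into one row per territory and year
year_cols <- setdiff(names(raw), "territory")
tidy <- do.call(rbind, lapply(year_cols, function(y) {
  data.frame(territory = raw$territory, year = as.integer(y), harvested = raw[[y]])
}))

# harmonise territorial names via a lookup (stand-in for the gazetteer)
lookup <- c(Eesti = "Estonia", Latvija = "Latvia")
tidy$al1 <- unname(lookup[tidy$territory])

# one table per nation, as would be stored at stage three
stage3 <- split(tidy, tidy$al1)
print(stage3$Estonia)
```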
Value
This function harmonises and integrates so far unprocessed data tables at stage two into stage three of the areal database. It produces for each main polygon (e.g. nation) in the registered data tables a file that includes all thematic areal data.
See Also
Other normalise functions: normGeometry()
Examples
if(dev.interactive()){
# build the example database
adb_example(until = "normGeometry", path = tempdir())
# normalise all available data tables ...
normTable()
# ... and check the result
output <- readRDS(paste0(tempdir(), "/tables/stage3/Estonia.rds"))
}
Register a new dataseries
Description
This function registers a new dataseries of either geometries or areal data into the geospatial database. The entry contains the name and relevant meta-data of a dataseries to enable provenance tracking and reproducibility.
Usage
regDataseries(
name = NULL,
description = NULL,
homepage = NULL,
version = NULL,
licence_link = NULL,
reference = NULL,
notes = NULL,
overwrite = FALSE
)
Arguments
name |
|
description |
|
homepage |
|
version |
|
licence_link |
|
reference |
|
notes |
|
overwrite |
|
Value
Returns a tibble of the new entry that is appended to 'inv_dataseries.csv'.
See Also
Other register functions: regGeometry(), regTable()
Examples
if(dev.interactive()){
# start the example database
adb_example(until = "match_gazetteer", path = tempdir())
regDataseries(name = "gadm",
description = "Database of Global Administrative Areas",
version = "3.6",
homepage = "https://gadm.org/index.html",
licence_link = "https://gadm.org/license.html")
}
Register a new geometry entry
Description
This function registers a new geometry of territorial units into the geospatial database.
Usage
regGeometry(
...,
subset = NULL,
gSeries = NULL,
label = NULL,
ancillary = NULL,
layer = NULL,
archive = NULL,
archiveLink = NULL,
downloadDate = NULL,
updateFrequency = NULL,
notes = NULL,
overwrite = FALSE
)
Arguments
... |
|
subset |
|
gSeries |
|
label |
|
ancillary |
|
layer |
|
archive |
|
archiveLink |
|
downloadDate |
|
updateFrequency |
|
notes |
|
overwrite |
|
Details
When processing geometries to which areal data shall be linked, carry out the following steps:
Determine the main territory (such as a nation, or any other polygon), a subset (if applicable), the dataseries of the geometry and the ontology label, and provide them as arguments to this function.
Run the function.
Export the shapefile with the following properties:
Format: GeoPackage
File name: What is provided as message by this function
CRS: EPSG:4326 - WGS 84
Make sure that 'all fields are exported'
Confirm that you have saved the file.
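When the export is done from R rather than from a GIS, the properties listed above could be met roughly as follows. This sketch assumes the sf package is installed (it is skipped otherwise); the geometry, the attribute value and the file name are hypothetical, and in practice the file name is the one reported by regGeometry().

```r
if (requireNamespace("sf", quietly = TRUE)) {
  # a made-up square in Web Mercator (EPSG:3857) with one attribute column
  square <- sf::st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000),
                                      c(0, 1000), c(0, 0))))
  geom <- sf::st_sf(NAME_0 = "a_nation",
                    geometry = sf::st_sfc(square, crs = 3857))

  # re-project to EPSG:4326 (WGS 84) before exporting
  geom_wgs84 <- sf::st_transform(geom, crs = 4326)

  # write all fields to a GeoPackage; the file name is a placeholder
  out_file <- file.path(tempdir(), "sketch_geom.gpkg")
  sf::st_write(geom_wgs84, dsn = out_file, layer = "sketch_geom",
               delete_dsn = TRUE, quiet = TRUE)
}
```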
Value
Returns a tibble of the entry that is appended to 'inv_geometries.csv'.
See Also
Other register functions: regDataseries(), regTable()
Examples
if(dev.interactive()){
# build the example database
adb_example(until = "regDataseries", path = tempdir())
# The GADM dataset comes as *.7z archive
regGeometry(gSeries = "gadm",
label = list(al1 = "NAME_0"),
layer = "example_geom1",
archive = "example_geom.7z|example_geom1.gpkg",
archiveLink = "https://gadm.org/",
nextUpdate = "2019-10-01",
updateFrequency = "quarterly")
# The second administrative level in GADM contains names in the columns
# NAME_0 and NAME_1
regGeometry(gSeries = "gadm",
label = list(al1 = "NAME_0", al2 = "NAME_1"),
ancillary = list(name_lcl = "VARNAME_1", code = "GID_1", type = "TYPE_1"),
layer = "example_geom2",
archive = "example_geom.7z|example_geom2.gpkg",
archiveLink = "https://gadm.org/",
nextUpdate = "2019-10-01",
updateFrequency = "quarterly")
}
Register a new areal data table
Description
This function registers a new areal data table into the geospatial database.
Usage
regTable(
...,
subset = NULL,
dSeries = NULL,
gSeries = NULL,
label = NULL,
begin = NULL,
end = NULL,
schema = NULL,
archive = NULL,
archiveLink = NULL,
downloadDate = NULL,
updateFrequency = NULL,
metadataLink = NULL,
metadataPath = NULL,
notes = NULL,
diagnose = FALSE,
overwrite = FALSE
)
Arguments
... |
|
subset |
|
dSeries |
|
gSeries |
|
label |
|
begin |
|
end |
|
schema |
|
archive |
|
archiveLink |
|
downloadDate |
|
updateFrequency |
|
metadataLink |
|
metadataPath |
|
notes |
|
diagnose |
|
overwrite |
|
Details
When processing areal data tables, carry out the following steps:
Determine the main territory (such as a nation, or any other polygon), a subset (if applicable), the ontology label and the dataseries of the areal data and of the geometry, and provide them as arguments to this function.
Provide a begin and end date for the areal data.
Run the function.
(Re)Save the table with the following properties:
Format: csv
Encoding: UTF-8
File name: What is provided as message by this function
Make sure that the file is not modified or reshaped. This will happen during data normalisation via the schema description, which expects the original table.
Confirm that you have saved the file.
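If the table is re-saved from R, the properties above translate to something like the following base R call. The table contents and the file name are made up; the real file name is the one reported by regTable(). Note that the table is written as it is, without any reshaping.

```r
# a made-up table with non-ASCII territory names, left in its original shape
tab <- data.frame(territory = c("P\u00f5lva", "J\u00f5geva"),
                  year      = c(2000, 2000),
                  harvested = c(10, 12))

# save as csv with UTF-8 encoding, without adding a row-name column
out_file <- file.path(tempdir(), "sketch_table.csv")
write.csv(tab, out_file, row.names = FALSE, fileEncoding = "UTF-8")

# read it back to confirm that the structure is unchanged
back <- read.csv(out_file, fileEncoding = "UTF-8")
```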
Every areal data dataseries (dSeries) may come as a slight permutation of a particular table arrangement. The function normTable expects internally a schema description (a list that describes the position of the data components) for each data table, which is saved as paste0("meta_", dSeries, TAB_NUMBER). See package tabshiftr.
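The naming rule for schema descriptions can be made concrete with a small sketch. The table number and the placeholder list are assumptions for illustration; a real entry would be a tabshiftr schema description.

```r
# name under which the schema of the first table of a dataseries is saved
dSeries    <- "madeUp"
tab_number <- 1
schema_name <- paste0("meta_", dSeries, tab_number)

# store a placeholder under that name in a list of schemas
schemas <- list()
schemas[[schema_name]] <- list(note = "placeholder for a schema description")
print(names(schemas))
```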
Value
Returns a tibble of the entry that is appended to 'inv_tables.csv' in case update = TRUE.
See Also
Other register functions: regDataseries(), regGeometry()
Examples
if(dev.interactive()){
# build the example database
adb_example(until = "regGeometry", path = tempdir())
# the schema description for this table
library(tabshiftr)
schema_madeUp <-
setIDVar(name = "al1", columns = 1) %>%
setIDVar(name = "year", columns = 2) %>%
setIDVar(name = "commodities", columns = 3) %>%
setObsVar(name = "harvested",
factor = 1, columns = 4) %>%
setObsVar(name = "production",
factor = 1, columns = 5)
regTable(nation = "Estonia",
subset = "barleyMaize",
label = "al1",
dSeries = "madeUp",
gSeries = "gadm",
begin = 1990,
end = 2017,
schema = schema_madeUp,
archive = "example_table.7z|example_table1.csv",
archiveLink = "...",
nextUpdate = "2024-10-01",
updateFrequency = "quarterly",
metadataLink = "...",
metadataPath = "my/local/path")
}
Example gazetteer
Description
An ontology of territory names (gazetteer)
Usage
territories
Format
object of class onto for the example territories used in adb_example.