| Title: | Download Infectious Disease Data from 'SurvStat' (Robert Koch Institute) |
| Version: | 0.1.2 |
| Description: | Provides an interface to the 'SurvStat' web service from the Robert Koch Institute (https://tools.rki.de/SurvStat/SurvStatWebService.svc) allowing downloads of disease time series stratified by pathogen type and subtype, age, and geography from notifiable disease reports in Germany. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3.9007 |
| Suggests: | knitr, rmarkdown, ggplot2, testthat |
| VignetteBuilder: | knitr |
| Imports: | dplyr, magrittr, xml2, stringr, tibble, httr, curl, whisker, fs, purrr, tidyr, cli, locfit, rlang, sf |
| Depends: | R (≥ 3.5) |
| LazyData: | true |
| Language: | en-GB |
| LazyDataCompression: | xz |
| URL: | https://bristol-vaccine-centre.github.io/rsurvstat/index.html, https://github.com/bristol-vaccine-centre/rsurvstat, https://bristol-vaccine-centre.github.io/rsurvstat/ |
| BugReports: | https://github.com/bristol-vaccine-centre/rsurvstat/issues |
| Config/Needs/build: | terminological/pkgtools, robchallen/roxygen2 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-12 13:18:52 UTC; vp22681 |
| Author: | Robert Challen |
| Maintainer: | Robert Challen <rob.challen@bristol.ac.uk> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-17 11:50:02 UTC |
Survstat option accessor
Description
Survstat options are values that may have children.
Usage
## S3 method for class 'survstat_option'
x$y
Arguments
x |
the options |
y |
the item |
Value
the value of the list item or an error if it does not exist
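The strict lookup described under Value (return the item, or raise an error if it does not exist) can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the class name `demo_option` and the constructor `new_demo_option()` are hypothetical.

```r
# Hypothetical constructor: a value plus named children, stored as a list.
new_demo_option <- function(value, children = list()) {
  structure(c(list(.value = value), children), class = "demo_option")
}

# S3 method for `$`: unlike a plain list, a missing name is an error
# rather than a silent NULL. `y` arrives as a character string.
`$.demo_option` <- function(x, y) {
  items <- unclass(x)
  if (!y %in% names(items)) {
    stop("no such item: ", y, call. = FALSE)
  }
  items[[y]]
}

opt <- new_demo_option("root", list(child = "leaf"))
child_value <- opt$child
missing_error <- tryCatch(opt$nope, error = function(e) conditionMessage(e))
```

Failing loudly on an unknown name makes typos in option names visible immediately instead of propagating a `NULL` into a later query.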
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Support for auto suggests on survstat_options
Description
Support for auto suggests on survstat_options
Usage
## S3 method for class 'survstat_option'
.DollarNames(x, pattern)
Arguments
x |
a |
pattern |
a matching pattern |
Value
the names of the children
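As a rough sketch of how such a method can work, the example below defines a method for the `utils::.DollarNames` generic on a hypothetical class `demo_option` (not a class from this package) and filters the child names by the supplied pattern. The method is called directly here so the example does not depend on an interactive console.

```r
# Hypothetical object used only for this illustration.
opt <- structure(
  list(alpha = 1, alpine = 2, beta = 3),
  class = "demo_option"
)

# Method for the utils::.DollarNames generic: return the child names
# that match the regular expression in `pattern`.
.DollarNames.demo_option <- function(x, pattern = "") {
  grep(pattern, names(unclass(x)), value = TRUE)
}

suggestions <- .DollarNames.demo_option(opt, "^al")
```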
Check for supported curl version
Description
Check for supported curl version
Usage
.check_curl()
Value
a logical value, possibly accompanied by a warning
Unit tests
.check_curl()
Convert a nested dataframe to a multilevel list
Description
Convert a nested dataframe to a multilevel list
Usage
.df_to_list_of_lists(df, ...)
Arguments
df |
a nested dataframe |
... |
Named arguments passed on to
|
Value
a list of lists
Unit tests
iris_list = .df_to_list_of_lists(datasets::iris)
# TODO: iris_list has lost Petal.Length as it is interpreting Petal.Width as a
# nested item and it overwrites Petal.Length rather than merging with it.
testthat::expect_equal(
iris_list[[1]]$Species,
iris$Species[[1]]
)
mtcars_nest = datasets::mtcars %>%
dplyr::mutate(name = rownames(.)) %>%
tidyr::nest(details = -c(cyl,gear))
mtcars_list = mtcars_nest %>% .df_to_list_of_lists()
mtcars_unnest = mtcars_list
testthat::expect_equal(
mtcars_list[[1]]$details[[1]]$name,
mtcars_nest$details[[1]]$name[[1]]
)
Convert a multilevel list to a nested dataframe
Description
Convert a multilevel list to a nested dataframe
Usage
.list_of_lists_to_df(lst, ...)
Arguments
lst |
a multilevel list |
... |
Named arguments passed on to
|
Value
a dataframe with each sublist nested as a dataframe
Unit tests
iris_list = .df_to_list_of_lists(iris, .fix=FALSE)
iris2 = .list_of_lists_to_df(iris_list, .fix=FALSE)
testthat::expect_equal(datasets::iris, as.data.frame(iris2))
mtcars_nest = datasets::mtcars %>%
dplyr::mutate(name = rownames(.)) %>%
tidyr::nest(details = -c(cyl,gear))
mtcars_list = mtcars_nest %>% .df_to_list_of_lists()
mtcars_nest2 = mtcars_list %>% .list_of_lists_to_df()
testthat::expect_equal(
mtcars_nest2$details[[2]],
mtcars_nest$details[[2]]
)
# test unequal length vector column is mapped to list of vectors
# and multiply named nests are treated as rows
testlist = list(
row = list(a=1:5, b="x"),
row = list(a=2:4, b="y"),
row = list(a=3, b="z")
)
testdf = testlist %>% .list_of_lists_to_df()
testthat::expect_equal(testdf$b, c("x", "y", "z"))
testthat::expect_equal(testdf$a[[2]], 2:4)
Transform a nested dataframe to / from a row by row list
Description
Data frames are column lists, which may contain nested data frames. This function
transforms a data frame into a row-based list of named sub-lists with one entry
per data frame column (a row_list). Alternatively, it converts a row_list back
to a nested data frame.
Usage
.transpose(x, ..., .fix = ".")
Arguments
x |
a |
... |
not used |
.fix |
collapse or expand names in redundant multi-level |
Value
either a dataframe or a list of class row_list representing the
dataframe as a list of named lists.
Unit tests
# create a test nested data frame:
mtcars_nest = datasets::mtcars %>%
dplyr::mutate(name = rownames(.)) %>%
tidyr::nest(by_carb = -c(cyl,gear,carb)) %>%
tidyr::nest(by_cyl_and_gear = -c(cyl,gear))
mtcars_list = mtcars_nest %>% .transpose()
mtcars_nest2 = mtcars_list %>% .transpose()
testthat::expect_equal(mtcars_nest, mtcars_nest2)
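The column-list to row-list idea can be illustrated in base R for the simplest case of a flat data frame. This is only a sketch under that assumption: the helper names `df_to_rows()` and `rows_to_df()` are hypothetical, and nested data frames, which the function above supports, are not handled here.

```r
# Column list -> row list: one named sub-list per data frame row.
df_to_rows <- function(df) {
  lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))
}

# Row list -> data frame: bind one single-row data frame per sub-list.
rows_to_df <- function(rows) {
  out <- do.call(
    rbind,
    lapply(rows, function(r) as.data.frame(r, stringsAsFactors = FALSE))
  )
  rownames(out) <- NULL
  out
}

df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)
rows <- df_to_rows(df)
df2 <- rows_to_df(rows)
```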
Tree printing method for list objects. This is an interactive function.
Description
Tree printing method for list objects. This is an interactive function.
Usage
.tree(x, max_levels = 6, ..., verbose = TRUE)
Arguments
x |
A list |
max_levels |
The maximum number of levels to show |
... |
Additional arguments:
|
verbose |
print output to the console (the default) |
Value
The hierarchy as a string, called for side effects
A Berlin outline sf map
Description
A Berlin outline sf map
Usage
data(BerlinMap)
Format
An sf dataframe containing the following columns:
Name (character) - the Name column
1 row
The CountyKey71Map dataset
Description
This matches the CountyKey71 dimension in SurvStat. These are the 400
Stadtkreise and Landkreise administrative regions in Germany, plus 12
Berlin boroughs (Bezirke) which replace the Berlin Kreis (Id: 11000).
The boroughs have sequential Ids from [11001] to [11012].
Usage
data(CountyKey71Map)
Format
An sf dataframe containing the following columns:
Id - the full SurvStat identifier for this region (includes hierarchical information)
ComponentId - the id of the most granular geographical unit (which can be used to link out to other data sets)
HierarchyId - the id of the geographical unit type
Name - the name of the region
Any grouping allowed.
411 rows
The FedStateKey71Map dataset.
Description
This matches the FedStateKey71 dimension in SurvStat. These are the 16
federal states in Germany.
Usage
data(FedStateKey71Map)
Format
An sf dataframe containing the following columns:
Id - the full SurvStat identifier for this region (includes hierarchical information)
ComponentId - the id of the most granular geographical unit (which can be used to link out to other data sets)
HierarchyId - the id of the geographical unit type
Name - the name of the region
16 rows
The NutsKey71Map dataset
Description
This matches the NutsKey71 dimension in SurvStat. These are the 38 NUTS2
level administrative regions in Germany.
Usage
data(NutsKey71Map)
Format
An sf dataframe containing the following columns:
Id - the full SurvStat identifier for this region (includes hierarchical information)
ComponentId - the id of the most granular geographical unit (which can be used to link out to other data sets)
HierarchyId - the id of the geographical unit type
Name - the name of the region
38 rows
SurvStat age group list
Description
single_year
children_coarse: from 0, 15, 20, 25, 30, 40, 50, 60, 70, 80 years
children_medium: from 0, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80 years
children_fine: from 0, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80 years
five_year: from 0, 1, 5, 10, 15, 20, … , 75, 80 years
zero_fifteen: from 0, 15+ years
zero_fifteen_sixty: from 0, 15, 60+ years
zero_one_4_20_40_60_80: from 0, 4, 20, 40, 60, 80+ years
Usage
age_groups
Format
An object of class list of length 8.
References
https://survstat.rki.de/Content/Query/Create.aspx
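Each age grouping above is defined by the lower bounds of its bands. As an illustration of what that means in practice, the sketch below assigns individual ages to bands using the children_coarse lower bounds listed in the description. The helper `label_age()` and its label format are hypothetical and are not the labels returned by SurvStat.

```r
# Lower bounds of the children_coarse grouping, as listed above.
lower <- c(0, 15, 20, 25, 30, 40, 50, 60, 70, 80)

# Hypothetical helper: map ages to a band label such as "15-19" or "80+".
label_age <- function(age, lower) {
  idx <- findInterval(age, lower)      # index of the band each age falls in
  upper <- c(lower[-1] - 1, Inf)       # inclusive upper bound of each band
  ifelse(
    is.infinite(upper[idx]),
    paste0(lower[idx], "+"),
    paste0(lower[idx], "-", upper[idx])
  )
}

bands <- label_age(c(3, 15, 42, 95), lower)
```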
Delete all cached SurvStat requests
Description
This function is only intended to be used interactively. The cache can be
controlled with set_cache_settings()
Usage
cache_clear(confirm = utils::askYesNo("Are you sure?"))
Arguments
confirm |
can be set to TRUE to make the function non-interactive. |
Value
nothing. called for side effects
Examples
cache_clear( confirm = interactive() )
Commands supported by the SurvStat service
Description
Not all services support all 3 methods.
The 3 different resolution levels of the geospatial data
Usage
commands
return_measures
geography_resolution
Format
An object of class list of length 8.
An object of class list of length 3.
An object of class list of length 3.
References
https://survstat.rki.de/Content/Query/Create.aspx
Data sources in the SurvStat service
Description
Data sources in the SurvStat service
Usage
cubes
Format
An object of class list of length 3.
References
https://survstat.rki.de/Content/Query/Create.aspx
SurvStat disease list
Description
Supported diseases:
Acinetobacter (key: Acinetobacter-Infektion oder –Kolonisation)
Adenovirus (key: Adenovirus (andere Form, Meldepflichtig gemäß Landesmeldeverordnung))
Amoebiasis (key: Amoebiasis)
Anthrax (key: Milzbrand)
Arbovirus (key: Arbovirus-Erkrankung)
Astrovirus (key: Astrovirus-Infektion)
Bornavirus (key: Bornavirus)
Botulism (key: Botulismus)
Brucellosis (key: Brucellose)
CJD (key: CJK)
CJD, variant (key: vCJK)
COVID-19 (key: COVID-19)
Campylobacter (key: Campylobacter-Enteritis)
Candida auris (invasive) (key: Candida auris, invasive Infektion)
Chickenpox (key: Windpocken)
Chickenpox (state) (key: Windpocken (Meldepflicht gemäß Landesmeldeverordnung))
Chickungunya (key: Chikungunya-Fieber)
Chlamydia Trachomatis (key: Chlamydia-trachomatis-Infektion)
Cholera (key: Cholera)
Clostridium difficile / mild (key: Clostridium difficile, nicht schwerer Verlauf)
Clostridium difficile / moderate (key: Clostridium difficile, schwerer Verlauf)
Cryptosporidiosis (key: Kryptosporidiose)
Cytomegalovirus (key: Cytomegalie)
Dengue (key: Denguefieber)
Diptheria (key: Diphtherie)
E. Coli, enteritis (key: E.-coli-Enteritis)
E. Coli, enterohemorrhagic (key: EHEC-Erkrankung)
Ebola (key: Ebolafieber)
Echinococcosis (key: Echinokokkose)
Enterobacteria colonisation (key: Enterobacteriaceae-Infektion oder –Kolonisation)
Enterovirus (key: Enterovirus)
Gas gangrene (key: Gasbrand)
Gastroenteritis (other) (key: Weitere bedrohliche Krankheit (gastro))
Giardia (key: Giardiasis)
Gonorrhoea (key: Gonorrhoe)
Group B Streptococcus (key: Gruppe-B-Streptokokken)
HIV (key: HIV-Infektion)
Haemolytic-uraemic syndrome (key: HUS (Hämolytisch-urämisches Syndrom), enteropathisch)
Haemophilus influenza, invasive (key: Haemophilus influenzae, invasive Erkrankung)
Hand foot mouth disease (key: Hand-Fuß-Mund-Krankheit)
Hantavirus (key: Hantavirus-Erkrankung)
Head lice (key: Kopflausbefall)
Hepatitis (general) (key: Hepatitis (allgemein))
Hepatitis A (key: Hepatitis A)
Hepatitis B (key: Hepatitis B)
Hepatitis C (key: Hepatitis C)
Hepatitis D (key: Hepatitis D)
Hepatitis E (key: Hepatitis E)
Hepatitis non A-E (key: Hepatitis Non A-E)
Herpes Zoster (key: Herpes Zoster)
Influenza, seasonal (key: Influenza, saisonal)
Influenza, zoonotic (key: Influenza, zoonotisch)
Keratoconjunctivitis (IfSG) (key: Keratokunjunktivitis (Meldepflicht gemäß IfSG))
Keratoconjunctivitis (state) (key: Keratokunjunktivitis (Meldepflicht gemäß Landesmeldeverordnung))
Lassa fever (key: Lassafieber)
Legionalla (key: Legionellose)
Leprousy (key: Lepra)
Leptospirosis (key: Leptospirose)
Listeriosis (key: Listeriose)
Lyme Disease (key: Borreliose)
MERS (key: Middle East Respiratory Syndrome)
MRSA, invasive (key: MRSA, invasive Infektion)
Malaria (IfSG) (key: Malaria (§7(3) IfSG))
Malaria (state) (key: Malaria, Länderverordnung)
Marburg virus (key: Marburgfieber)
Measles (key: Masern)
Meningitis (other) (key: Meningitis, andere)
Meningococcal, invasive (key: Meningokokken, invasive Erkrankung)
Mpox (key: Affenpocken)
Mpox (key: Affenpocken)
Mumps (IfSG) (key: Mumps (Meldepflicht gemäß IfSG))
Mumps (state) (key: Mumps (Meldepflicht gemäß Landesmeldeverordnung))
Mycoplasma (key: Mycoplasma)
Norovirus (key: Norovirus-Gastroenteritis)
Orthinovirus (key: Ornithose)
Orthopox (key: Orthopocken)
Parainfluenze (key: Parainfluenza)
Paratyphus (key: Paratyphus)
Plague (key: Pest)
Pneumococcus (IfSG) (key: Pneumokokken (Meldepflicht gemäß IfSG))
Pneumococcus (state) (key: Pneumokokken (Meldepflicht gemäß Landesverordnung))
Poliomyelitis (key: Poliomyelitis)
Q-fever (key: Q-Fieber)
RSV (IfSG) (key: RSV (Meldepflicht gemäß IfSG))
RSV (state) (key: RSV (Meldepflicht gemäß Landesmeldeverordnung))
Rabies (confirmed) (key: Tollwut)
Rabies (suspected) (key: Tollwutexpositionsverdacht)
Relapsing fever (key: Läuserückfallfieber)
Ringworm (key: Ringelröteln)
Rotavirus gastroenteritis (key: Rotavirus-Gastroenteritis)
Rubella (key: Röteln, postnatal)
Rubella (state) (key: Röteln (Meldepflicht gemäß Landesmeldeverordnung))
Rubella, congenital (key: Röteln, konnatal)
SARS (key: SARS)
Salmonellosis (key: Salmonellose)
Scabies (key: Krätzmilbenbefall)
Scarlet fever (key: Scharlach)
Sepsis (other) (key: Weitere bedrohliche Krankheit)
Shigellosis (key: Shigellose)
Smallpox (key: Pocken)
Subacute Sclerosing Panencephalitis (key: Subakute Sklerosierende Panenzephalitis)
Syphilis (key: Syphilis)
Tetanus (key: Tetanus)
Tick bourne encephalitis (key: FSME (Frühsommer-Meningoenzephalitis))
Toxoplasmosis (key: Toxoplasmose)
Toxoplasmosis, congenital (key: Toxoplasmose, konnatal)
Trichinellosis (key: Trichinellose)
Tuberculosis (key: Tuberkulose)
Tulareamia (key: Tularämie)
Typhoid (key: Fleckfieber)
Typhoid, abdominal (key: Typhus abdominalis)
Typhus/Paratyphus (key: Typhus/Paratyphus)
Varicella, congenital (key: Fetales (kongenitales) Varizellensyndrom)
Vibria (key: Vibrionen)
Viral haemmorhagic fever (key: Virale hämorrhagische Fieber)
West Nile Virus (key: West-Nil-Virus)
Whooping cough (IfSG) (key: Keuchhusten (Meldepflicht gemäß IfSG))
Whooping cough (state) (key: Keuchhusten (Meldepflicht gemäß Landesmeldeverordnung))
Yellow fever (key: Gelbfieber)
Yersinia (key: Yersiniose)
Zika (key: Zikavirus-Erkrankung)
Usage
diseases
Format
An object of class list of length 121.
References
https://survstat.rki.de/Content/Query/Create.aspx
Infer and fit a population model from SurvStat output
Description
SurvStat can be queried for count or incidence. From the combination of
these metrics queried across the whole range of disease notifications for any
given year we can infer a stratified population size, that SurvStat is using
to calculate its incidence. This is simply modelled with a local polynomial
over time to allow us to fill in weekly population denominators.
Usage
fit_population(count_df, .progress = TRUE)
infer_population(
age_group = NULL,
geography = NULL,
years = NULL,
.progress = TRUE
)
Arguments
count_df |
a dataframe from the output of |
.progress |
by default a progress bar is shown, which may be important
if many downloads are needed to fulfil the request. It can be disabled
by setting this to |
age_group |
(optional) the age group of interest as a |
geography |
(optional) one of |
years |
(optional) a vector of years to limit the response to. This may
be useful to limit the size of returned pages in the event the |
Value
the count_df dataframe with an additional population column
a dataframe with geography, age grouping, year and population columns
Functions
infer_population(): Query SurvStat for data to impute a population denominator
Examples
# snapshot:
get_snapshot(
disease = diseases$`COVID-19`,
geography = "state",
season=2024
) %>%
fit_population() %>%
dplyr::glimpse()
# timeseries
# A weekly population estimate is inferred from the yearly data:
get_timeseries(
diseases$`COVID-19`,
measure = "Count",
age_group = age_groups$children_coarse
) %>%
fit_population() %>%
dplyr::glimpse()
infer_population(years=2020:2025) %>% dplyr::glimpse()
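The inference step described above can be sketched with toy numbers. The example assumes that incidence is expressed as cases per 100,000 population, so a denominator can be recovered as count / incidence * 100000. All values are invented for illustration, and a global quadratic fitted with `stats::lm()` is used as a simple stand-in for the local polynomial mentioned in the description.

```r
# Toy yearly data (invented for illustration, not SurvStat output).
yearly <- data.frame(
  year = 2020:2024,
  count = c(500, 606, 714, 824, 936),
  true_population = c(1000000, 1010000, 1020000, 1030000, 1040000)
)

# Assumption: incidence is cases per 100,000 population.
yearly$incidence <- yearly$count / yearly$true_population * 1e5

# Invert the incidence definition to recover the denominator.
yearly$population <- yearly$count / yearly$incidence * 1e5

# Stand-in smoother: a global quadratic in time, used to estimate the
# population at points between the yearly values (here mid-2022).
fit <- stats::lm(population ~ poly(year, 2), data = yearly)
mid_2022 <- unname(stats::predict(fit, newdata = data.frame(year = 2022.5)))
```

Count and incidence must come from the same stratum (age group, geography and year) for the ratio to be meaningful, and strata with zero incidence cannot be inverted this way.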
Retrieve data from the SurvStat web service relating to a single time period.
Description
This function gets a snapshot of disease count or incidence data
from the Robert Koch Institute SurvStat web service, based on either a whole
epidemiological season or an individual week within a season. Seasons are
whole years starting either at the beginning of the calendar year, at week 27
or at week 40.
Usage
get_snapshot(
disease = NULL,
measure = c("Count", "Incidence"),
...,
season,
season_week = NULL,
season_start = 1,
age_group = NULL,
age_range = c(0, Inf),
disease_subtype = FALSE,
geography = NULL,
.progress = TRUE
)
Arguments
disease |
the disease of interest as a |
measure |
one of |
... |
not used, must be empty. |
season |
the start year of the season in which the snapshot is taken |
season_week |
the start week within the season of the snapshot. If missing then the whole season is used |
season_start |
the week of the calendar year in which the season starts
this can be one of |
age_group |
(optional) the age group of interest as a |
age_range |
(optional) a length 2 vector with the minimum and maximum ages to consider |
disease_subtype |
if |
geography |
(optional) a geographical breakdown. This can be given as a
character where it must be one of |
.progress |
by default a progress bar is shown, which may be important
if many downloads are needed to fulfil the request. It can be disabled
by setting this to |
Details
The snapshot can be stratified by any combination of age, geography, disease,
disease subtype. Queries to SurvStat are cached and paged, but
multidimensional extracts can still require a large number of downloads.
Value
a data frame with at least year (the start of the epidemiological
season) and start_week (the calendar week in which the epidemiological
season starts), and one of count or incidence columns. Most likely it
will also have disease_name and disease_code columns, and some of
age_name, age_code, age_low, age_high, geo_code, geo_name,
disease_subtype_code, disease_subtype_name depending on options.
Examples
get_snapshot(
diseases$`COVID-19`,
measure = "Count",
season = 2024,
age_group = age_groups$children_coarse
)
get_snapshot(
diseases$`COVID-19`,
measure = "Count",
age_group = age_groups$children_coarse,
season = 2024,
geography = rsurvstat::FedStateKey71Map[1:10,]
)
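To make the season convention concrete, the sketch below maps a calendar year and week to the start year of the season that contains it, for seasons beginning at week 1, 27 or 40 as described above. The helper `season_start_year()` is hypothetical and only illustrates the convention; it is not how the package builds its queries.

```r
# Hypothetical helper: which season (identified by its start year) does a
# given calendar week belong to?
season_start_year <- function(year, week, season_start = 1) {
  stopifnot(season_start %in% c(1, 27, 40))
  # Weeks before the season start belong to the season that began in the
  # previous calendar year.
  ifelse(week >= season_start, year, year - 1)
}

before_start <- season_start_year(2024, 39, season_start = 40)
at_start <- season_start_year(2024, 40, season_start = 40)
calendar <- season_start_year(2024, 10, season_start = 1)
```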
Retrieve time series data from the SurvStat web service.
Description
This function gets a weekly timeseries of disease count or incidence data
from the Robert Koch Institute SurvStat web service. The timeseries can be
stratified by any combination of age, geography, disease, disease subtype.
Queries to SurvStat are cached and paged, but multidimensional
extracts can still require a large number of downloads.
Usage
get_timeseries(
disease = NULL,
measure = c("Count", "Incidence"),
...,
age_group = NULL,
age_range = c(0, Inf),
disease_subtype = FALSE,
years = NULL,
geography = NULL,
trim_zeros = c("leading", "both", "none"),
.progress = TRUE
)
Arguments
disease |
the disease of interest as a |
measure |
one of |
... |
not used, must be empty. |
age_group |
(optional) the age group of interest as a |
age_range |
(optional) a length 2 vector with the minimum and maximum ages to consider |
disease_subtype |
if |
years |
(optional) a vector of years to limit the response to. This may
be useful to limit the size of returned pages in the event the |
geography |
(optional) a geographical breakdown. This can be given as a
character where it must be one of |
trim_zeros |
get rid of zero counts. Either "both" (from start and end), "leading" (from start only - the default) or "none". |
.progress |
by default a progress bar is shown, which may be important
if many downloads are needed to fulfil the request. It can be disabled
by setting this to |
Value
a data frame with at least date (weekly), and one of count or
incidence columns. Most likely it will also have disease_name and
disease_code columns, and some of age_name, age_code, age_low,
age_high, geo_code, geo_name, disease_subtype_code,
disease_subtype_name depending on options. The dataframe will be grouped
to make sure each group contains a single timeseries.
Examples
# age stratified
get_timeseries(
diseases$`COVID-19`,
measure = "Count",
age_group = age_groups$children_coarse
) %>% dplyr::glimpse()
# geographic
get_timeseries(
diseases$`COVID-19`,
measure = "Count",
geography = "state"
) %>% dplyr::glimpse()
# disease stratified, subset of years:
get_timeseries(
measure = "Count",
years = 2024
) %>% dplyr::glimpse()
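The effect of the three trim_zeros options can be illustrated on a plain vector of counts. This is a sketch of the documented behaviour only: the helper `trim_zero_counts()` is hypothetical, and leaving an all-zero series unchanged is a choice made here, not something the documentation specifies.

```r
# Hypothetical helper mirroring the "leading", "both" and "none" options.
trim_zero_counts <- function(counts, mode = c("leading", "both", "none")) {
  mode <- match.arg(mode)
  nonzero <- which(counts != 0)
  # Nothing to trim, or trimming disabled: return the input unchanged.
  if (mode == "none" || length(nonzero) == 0) {
    return(counts)
  }
  first <- min(nonzero)
  last <- if (mode == "both") max(nonzero) else length(counts)
  counts[first:last]
}

counts <- c(0, 0, 3, 0, 5, 0)
leading <- trim_zero_counts(counts, "leading")
both <- trim_zero_counts(counts, "both")
none <- trim_zero_counts(counts, "none")
```

Zeros inside the series are kept in every mode, since they are genuine observations between the first and last reported cases.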
Languages supported by the SurvStat service
Description
Languages supported by the SurvStat service
Usage
languages
Format
An object of class list of length 2.
References
https://survstat.rki.de/Content/Query/Create.aspx
Path to user cache directory
Description
This function uses R_USER_CACHE_DIR if set. Otherwise, it follows
platform conventions. Typical user cache directories are:
Mac OS X: ~/Library/Caches/<AppName>
Linux: ~/.cache/<AppName>
Win XP: C:\\Documents and Settings\\<username>\\Local Settings\\Application Data\\<AppAuthor>\\<AppName>\\Cache
Vista: C:\\Users\\<username>\\AppData\\Local\\<AppAuthor>\\<AppName>\\Cache
Usage
rappdirs_user_cache_dir(
appname = NULL,
appauthor = appname,
version = NULL,
opinion = TRUE,
expand = TRUE,
os = NULL
)
Arguments
appname |
is the name of application. If NULL, just the system directory is returned. |
appauthor |
(only required and used on Windows) is the name of the app author or distributing body for this application. Typically it is the owning company name. This falls back to app name. |
version |
is an optional version path element to append to the
path. You might want to use this if you want multiple versions
of your app to be able to run independently. If used, this
would typically be |
opinion |
(logical) Use |
expand |
If TRUE (the default) will expand the |
os |
Operating system whose conventions are used to construct the
requested directory. Possible values are "win", "mac", "unix". If |
Opinion
On Windows the only suggestion in the MSDN docs is that local settings go
in the CSIDL_LOCAL_APPDATA directory. This is identical to the
non-roaming app data dir. But apps typically put
cache data somewhere under this directory so rappdirs_user_cache_dir() appends
Cache to the CSIDL_LOCAL_APPDATA value, unless opinion = FALSE.
Unit tests
rappdirs_user_cache_dir("rappdirs")
See Also
tempdir() for a non-persistent temporary directory.
Set options for the rsurvstat cache
Description
By default successful requests to SurvStat are cached for 7 days to prevent
repeated querying of the service. This is stored in the usual R package cache
location by default (e.g. "~/.cache/rsurvstat" on mac / linux). Caching can
be switched off altogether.
Usage
set_cache_settings(..., active = NULL, dir = NULL, stale = NULL)
Arguments
... |
you can also submit the settings as a named list. |
active |
boolean (optional), set to FALSE to disable caching |
dir |
file path (optional), the location of the cache |
stale |
numeric (optional), the number of days before a cached item is considered out of date |
Value
the old cache settings as a list
Examples
old_settings = set_cache_settings(active = FALSE)
set_cache_settings(old_settings)
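The staleness rule described above (a cached item is out of date after a given number of days, 7 by default) can be sketched with file modification times. This is an illustrative assumption about how such a check might work: the helper `is_stale()` is hypothetical, and the package may track cache age differently.

```r
# Hypothetical helper: is a cached file missing or older than `stale_days`?
is_stale <- function(path, stale_days = 7, now = Sys.time()) {
  if (!file.exists(path)) {
    return(TRUE)
  }
  age_days <- as.numeric(difftime(now, file.mtime(path), units = "days"))
  age_days > stale_days
}

# A freshly written cache file is not stale.
cached <- tempfile(fileext = ".rds")
saveRDS(list(a = 1), cached)
fresh_is_stale <- is_stale(cached)

# Backdate the file by 8 days so it exceeds the 7 day limit.
Sys.setFileTime(cached, Sys.time() - 8 * 24 * 60 * 60)
old_is_stale <- is_stale(cached)

# A file that was never cached counts as stale.
missing_is_stale <- is_stale(tempfile())
```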