Type: | Package |
Title: | R Interface for Apache Sedona |
Version: | 1.7.2 |
Maintainer: | Apache Sedona <private@sedona.apache.org> |
Description: | R interface for 'Apache Sedona' based on 'sparklyr' (https://sedona.apache.org). |
License: | Apache License 2.0 |
URL: | https://github.com/apache/sedona/, https://sedona.apache.org/ |
BugReports: | https://github.com/apache/sedona/issues |
Depends: | R (≥ 3.2) |
Imports: | rlang, sparklyr (≥ 1.3), dbplyr (≥ 1.1.0), cli, lifecycle |
Suggests: | dplyr (≥ 0.7.2), knitr, rmarkdown |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
SystemRequirements: | 'Apache Spark' 3.x |
NeedsCompilation: | no |
Packaged: | 2025-06-08 00:53:06 UTC; jiayu |
Author: | Apache Sedona [aut, cre],
Jia Yu [ctb, cph],
Yitao Li |
Repository: | CRAN |
Date/Publication: | 2025-06-08 11:20:02 UTC |
apache.sedona: R Interface for Apache Sedona
Description
R interface for 'Apache Sedona' based on 'sparklyr' (https://sedona.apache.org).
Author(s)
Maintainer: Apache Sedona private@sedona.apache.org
Authors:
Yitao Li yitao@rstudio.com (ORCID) [copyright holder]
Other contributors:
Jia Yu jiayu@apache.org [contributor, copyright holder]
The Apache Software Foundation [copyright holder]
RStudio [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/apache/sedona/issues
Find the approximate total number of records within a Spatial RDD.
Description
Given a Sedona spatial RDD, find the (possibly approximated) number of total records within it.
Usage
approx_count(x)
Arguments
x |
A Sedona spatial RDD. |
Value
Approximate number of records within the SpatialRDD.
See Also
Other Spatial RDD aggregation routine:
minimum_bounding_box()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_shapefile_to_typed_rdd(
sc,
location = input_location, type = "polygon"
)
approx_cnt <- approx_count(rdd)
}
Perform a CRS transformation.
Description
Transform data within a spatial RDD from one coordinate reference system to another. This uses the lon/lat order since v1.5.0. Before, it used lat/lon
Usage
crs_transform(x, src_epsg_crs_code, dst_epsg_crs_code, strict = FALSE)
Arguments
x |
The spatial RDD to be processed. |
src_epsg_crs_code |
Coordinate reference system to transform from (e.g., "epsg:4326", "epsg:3857", etc). |
dst_epsg_crs_code |
Coordinate reference system to transform to. (e.g., "epsg:4326", "epsg:3857", etc). |
strict |
If FALSE (default), then ignore the "Bursa-Wolf Parameters Required" error. |
Value
The transformed SpatialRDD.
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_geojson_to_typed_rdd(
sc,
location = input_location, type = "polygon"
)
crs_transform(
rdd,
src_epsg_crs_code = "epsg:4326", dst_epsg_crs_code = "epsg:3857"
)
}
Find the minimal bounding box of a geometry.
Description
Given a Sedona spatial RDD, find the axis-aligned minimal bounding box of the geometry represented by the RDD.
Usage
minimum_bounding_box(x)
Arguments
x |
A Sedona spatial RDD. |
Value
A minimum bounding box object.
See Also
Other Spatial RDD aggregation routine:
approx_count()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_shapefile_to_typed_rdd(
sc,
location = input_location, type = "polygon"
)
boundary <- minimum_bounding_box(rdd)
}
Construct a bounding box object.
Description
Construct a axis-aligned rectangular bounding box object.
Usage
new_bounding_box(sc, min_x = -Inf, max_x = Inf, min_y = -Inf, max_y = Inf)
Arguments
sc |
The Spark connection. |
min_x |
Minimum x-value of the bounding box, can be +/- Inf. |
max_x |
Maximum x-value of the bounding box, can be +/- Inf. |
min_y |
Minimum y-value of the bounding box, can be +/- Inf. |
max_y |
Maximum y-value of the bounding box, can be +/- Inf. |
Value
A bounding box object.
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
bb <- new_bounding_box(sc, -1, 1, -1, 1)
Import data from a spatial RDD into a Spark Dataframe.
Description
Import data from a spatial RDD (possibly with non-spatial attributes) into a Spark Dataframe.
-
sdf_register
: method for sparklyr's sdf_register to handle Spatial RDD -
as.spark.dataframe
: lower level function with more fine-grained control on non-spatial columns
Usage
## S3 method for class 'spatial_rdd'
sdf_register(x, name = NULL)
as.spark.dataframe(x, non_spatial_cols = NULL, name = NULL)
Arguments
x |
A spatial RDD. |
name |
Name to assign to the resulting Spark temporary view. If unspecified, then a random name will be assigned. |
non_spatial_cols |
Column names for non-spatial attributes in the resulting Spark Dataframe. By default (NULL) it will import all field names if that property exists, in particular for shapefiles. |
Value
A Spark Dataframe containing the imported spatial data.
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_geojson_to_typed_rdd(
sc,
location = input_location,
type = "polygon"
)
sdf <- sdf_register(rdd)
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = input_location,
delimiter = ",",
type = "point",
first_spatial_col_index = 1L,
repartition = 5
)
sdf <- as.spark.dataframe(rdd, non_spatial_cols = c("attr1", "attr2"))
}
Apply a spatial partitioner to a Sedona spatial RDD.
Description
Given a Sedona spatial RDD, partition its content using a spatial partitioner.
Usage
sedona_apply_spatial_partitioner(
rdd,
partitioner = c("quadtree", "kdbtree"),
max_levels = NULL
)
Arguments
rdd |
The spatial RDD to be partitioned. |
partitioner |
The name of a grid type to use (currently "quadtree" and
"kdbtree" are supported) or an
|
max_levels |
Maximum number of levels in the partitioning tree data
structure. If NULL (default), then use the current number of partitions
within |
Value
A spatially partitioned SpatialRDD.
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = input_location,
delimiter = ",",
type = "point",
first_spatial_col_index = 1L
)
sedona_apply_spatial_partitioner(rdd, partitioner = "kdbtree")
}
Build an index on a Sedona spatial RDD.
Description
Given a Sedona spatial RDD, build the type of index specified on each of its partition(s).
Usage
sedona_build_index(
rdd,
type = c("quadtree", "rtree"),
index_spatial_partitions = TRUE
)
Arguments
rdd |
The spatial RDD to be indexed. |
type |
The type of index to build. Currently "quadtree" and "rtree" are supported. |
index_spatial_partitions |
If the RDD is already partitioned using a spatial partitioner, then index each spatial partition within the RDD instead of partitions within the raw RDD associated with the underlying spatial data source. Default: TRUE. Notice this option is irrelevant if the input RDD has not been partitioned using with a spatial partitioner yet. |
Value
A spatial index object.
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_shapefile_to_typed_rdd(
sc,
location = input_location,
type = "polygon"
)
sedona_build_index(rdd, type = "rtree")
}
Query the k nearest spatial objects.
Description
Given a spatial RDD, a query object x
, and an integer k, find the k
nearest spatial objects within the RDD from x
(distance between
x
and another geometrical object will be measured by the minimum
possible length of any line segment connecting those 2 objects).
Usage
sedona_knn_query(
rdd,
x,
k,
index_type = c("quadtree", "rtree"),
result_type = c("rdd", "sdf", "raw")
)
Arguments
rdd |
A Sedona spatial RDD. |
x |
The query object. |
k |
Number of nearest spatail objects to return. |
index_type |
Index to use to facilitate the KNN query. If NULL, then
do not build any additional spatial index on top of |
result_type |
Type of result to return.
If "rdd" (default), then the k nearest objects will be returned in a Sedona
spatial RDD.
If "sdf", then a Spark dataframe containing the k nearest objects will be
returned.
If "raw", then a list of k nearest objects will be returned. Each element
within this list will be a JVM object of type
|
Value
The KNN query result.
See Also
Other Sedona spatial query:
sedona_range_query()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
knn_query_pt_x <- -84.01
knn_query_pt_y <- 34.01
knn_query_pt_tbl <- sdf_sql(
sc,
sprintf(
"SELECT ST_GeomFromText(\"POINT(%f %f)\") AS `pt`",
knn_query_pt_x,
knn_query_pt_y
)
) %>%
collect()
knn_query_pt <- knn_query_pt_tbl$pt[[1]]
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_geojson_to_typed_rdd(
sc,
location = input_location,
type = "polygon"
)
knn_result_sdf <- sedona_knn_query(
rdd,
x = knn_query_pt, k = 3, index_type = "rtree", result_type = "sdf"
)
}
Execute a range query.
Description
Given a spatial RDD and a query object x
, find all spatial objects
within the RDD that are covered by x
or intersect x
.
Usage
sedona_range_query(
rdd,
x,
query_type = c("cover", "intersect"),
index_type = c("quadtree", "rtree"),
result_type = c("rdd", "sdf", "raw")
)
Arguments
rdd |
A Sedona spatial RDD. |
x |
The query object. |
query_type |
Type of spatial relationship involved in the query. Currently "cover" and "intersect" are supported. |
index_type |
Index to use to facilitate the KNN query. If NULL, then
do not build any additional spatial index on top of |
result_type |
Type of result to return.
If "rdd" (default), then the k nearest objects will be returned in a Sedona
spatial RDD.
If "sdf", then a Spark dataframe containing the k nearest objects will be
returned.
If "raw", then a list of k nearest objects will be returned. Each element
within this list will be a JVM object of type
|
Value
The range query result.
See Also
Other Sedona spatial query:
sedona_knn_query()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
range_query_min_x <- -87
range_query_max_x <- -50
range_query_min_y <- 34
range_query_max_y <- 54
geom_factory <- invoke_new(
sc,
"org.locationtech.jts.geom.GeometryFactory"
)
range_query_polygon <- invoke_new(
sc,
"org.locationtech.jts.geom.Envelope",
range_query_min_x,
range_query_max_x,
range_query_min_y,
range_query_max_y
) %>%
invoke(geom_factory, "toGeometry", .)
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_geojson_to_typed_rdd(
sc,
location = input_location,
type = "polygon"
)
range_query_result_sdf <- sedona_range_query(
rdd,
x = range_query_polygon,
query_type = "intersect",
index_type = "rtree",
result_type = "sdf"
)
}
Create a typed SpatialRDD from a delimiter-separated values data source.
Description
Create a typed SpatialRDD (namely, a PointRDD, a PolygonRDD, or a LineStringRDD) from a data source containing delimiter-separated values. The data source can contain spatial attributes (e.g., longitude and latidude) and other attributes. Currently only inputs with spatial attributes occupying a contiguous range of columns (i.e., [first_spatial_col_index, last_spatial_col_index]) are supported.
Usage
sedona_read_dsv_to_typed_rdd(
sc,
location,
delimiter = c(",", "\t", "?", "'", "\"", "_", "-", "%", "~", "|", ";"),
type = c("point", "polygon", "linestring"),
first_spatial_col_index = 0L,
last_spatial_col_index = NULL,
has_non_spatial_attrs = TRUE,
storage_level = "MEMORY_ONLY",
repartition = 1L
)
Arguments
sc |
A |
location |
Location of the data source. |
delimiter |
Delimiter within each record. Must be one of ',', '\t', '?', '\”, '"', '_', '-', '%', '~', '|', ';' |
type |
Type of the SpatialRDD (must be one of "point", "polygon", or "linestring". |
first_spatial_col_index |
Zero-based index of the left-most column containing spatial attributes (default: 0). |
last_spatial_col_index |
Zero-based index of the right-most column containing spatial attributes (default: NULL). Note last_spatial_col_index does not need to be specified when creating a PointRDD because it will automatically have the implied value of (first_spatial_col_index + 1). For all other types of RDDs, if last_spatial_col_index is unspecified, then it will assume the value of -1 (i.e., the last of all input columns). |
has_non_spatial_attrs |
Whether the input contains non-spatial attributes. |
storage_level |
Storage level of the RDD (default: MEMORY_ONLY). |
repartition |
The minimum number of partitions to have in the resulting RDD (default: 1). |
Value
A typed SpatialRDD.
See Also
Other Sedona RDD data interface functions:
sedona_read_geojson()
,
sedona_read_shapefile_to_typed_rdd()
,
sedona_save_spatial_rdd()
,
sedona_write_wkb()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your csv file
rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = input_location,
delimiter = ",",
type = "point",
first_spatial_col_index = 1L
)
}
Read geospatial data into a Spatial RDD
Description
Import spatial object from an external data source into a Sedona SpatialRDD.
-
sedona_read_shapefile
: from a shapefile -
sedona_read_geojson
: from a geojson file -
sedona_read_wkt
: from a geojson file -
sedona_read_wkb
: from a geojson file
Usage
sedona_read_geojson(
sc,
location,
allow_invalid_geometries = TRUE,
skip_syntactically_invalid_geometries = TRUE,
storage_level = "MEMORY_ONLY",
repartition = 1L
)
sedona_read_wkb(
sc,
location,
wkb_col_idx = 0L,
allow_invalid_geometries = TRUE,
skip_syntactically_invalid_geometries = TRUE,
storage_level = "MEMORY_ONLY",
repartition = 1L
)
sedona_read_wkt(
sc,
location,
wkt_col_idx = 0L,
allow_invalid_geometries = TRUE,
skip_syntactically_invalid_geometries = TRUE,
storage_level = "MEMORY_ONLY",
repartition = 1L
)
sedona_read_shapefile(sc, location, storage_level = "MEMORY_ONLY")
Arguments
sc |
A |
location |
Location of the data source. |
allow_invalid_geometries |
Whether to allow topology-invalid geometries to exist in the resulting RDD. |
skip_syntactically_invalid_geometries |
Whether to allows Sedona to automatically skip syntax-invalid geometries, rather than throwing errorings. |
storage_level |
Storage level of the RDD (default: MEMORY_ONLY). |
repartition |
The minimum number of partitions to have in the resulting RDD (default: 1). |
wkb_col_idx |
Zero-based index of column containing hex-encoded WKB data (default: 0). |
wkt_col_idx |
Zero-based index of column containing hex-encoded WKB data (default: 0). |
Value
A SpatialRDD.
See Also
Other Sedona RDD data interface functions:
sedona_read_dsv_to_typed_rdd()
,
sedona_read_shapefile_to_typed_rdd()
,
sedona_save_spatial_rdd()
,
sedona_write_wkb()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_geojson(sc, location = input_location)
}
(Deprecated) Create a typed SpatialRDD from a shapefile or geojson data source.
Description
Constructors of typed RDD (PointRDD, PolygonRDD, LineStringRDD) are soft deprecated, use non-types versions
Create a typed SpatialRDD (namely, a PointRDD, a PolygonRDD, or a LineStringRDD)
-
sedona_read_shapefile_to_typed_rdd
: from a shapefile data source -
sedona_read_geojson_to_typed_rdd
: from a GeoJSON data source
Usage
sedona_read_shapefile_to_typed_rdd(
sc,
location,
type = c("point", "polygon", "linestring"),
storage_level = "MEMORY_ONLY"
)
sedona_read_geojson_to_typed_rdd(
sc,
location,
type = c("point", "polygon", "linestring"),
has_non_spatial_attrs = TRUE,
storage_level = "MEMORY_ONLY",
repartition = 1L
)
Arguments
sc |
A |
location |
Location of the data source. |
type |
Type of the SpatialRDD (must be one of "point", "polygon", or "linestring". |
storage_level |
Storage level of the RDD (default: MEMORY_ONLY). |
has_non_spatial_attrs |
Whether the input contains non-spatial attributes. |
repartition |
The minimum number of partitions to have in the resulting RDD (default: 1). |
Value
A typed SpatialRDD.
See Also
Other Sedona RDD data interface functions:
sedona_read_dsv_to_typed_rdd()
,
sedona_read_geojson()
,
sedona_save_spatial_rdd()
,
sedona_write_wkb()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your shapefile
rdd <- sedona_read_shapefile_to_typed_rdd(
sc,
location = input_location, type = "polygon"
)
}
Visualize a Sedona spatial RDD using a choropleth map.
Description
Generate a choropleth map of a pair RDD assigning integral values to polygons.
Usage
sedona_render_choropleth_map(
pair_rdd,
resolution_x,
resolution_y,
output_location,
output_format = c("png", "gif", "svg"),
boundary = NULL,
color_of_variation = c("red", "green", "blue"),
base_color = c(0, 0, 0),
shade = TRUE,
reverse_coords = FALSE,
overlay = NULL,
browse = interactive()
)
Arguments
pair_rdd |
A pair RDD with Sedona Polygon objects being keys and java.lang.Long being values. |
resolution_x |
Resolution on the x-axis. |
resolution_y |
Resolution on the y-axis. |
output_location |
Location of the output image. This should be the desired path of the image file excluding extension in its file name. |
output_format |
File format of the output image. Currently "png", "gif", and "svg" formats are supported (default: "png"). |
boundary |
Only render data within the given rectangular boundary.
The |
color_of_variation |
Which color channel will vary depending on values of data points. Must be one of "red", "green", or "blue". Default: red. |
base_color |
Color of any data point with value 0. Must be a numeric vector of length 3 specifying values for red, green, and blue channels. Default: c(0, 0, 0). |
shade |
Whether data point with larger magnitude will be displayed with darker color. Default: TRUE. |
reverse_coords |
Whether to reverse spatial coordinates in the plot (default: FALSE). |
overlay |
A |
browse |
Whether to open the rendered image in a browser (default: interactive()). |
Value
No return value.
See Also
Other Sedona visualization routines:
sedona_render_heatmap()
,
sedona_render_scatter_plot()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
pt_input_location <- "/dev/null" # replace it with the path to your input file
pt_rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = pt_input_location,
type = "point",
first_spatial_col_index = 1
)
polygon_input_location <- "/dev/null" # replace it with the path to your input file
polygon_rdd <- sedona_read_geojson_to_typed_rdd(
sc,
location = polygon_input_location,
type = "polygon"
)
join_result_rdd <- sedona_spatial_join_count_by_key(
pt_rdd,
polygon_rdd,
join_type = "intersect",
partitioner = "quadtree"
)
sedona_render_choropleth_map(
join_result_rdd,
400,
200,
output_location = tempfile("choropleth-map-"),
boundary = c(-86.8, -86.6, 33.4, 33.6),
base_color = c(255, 255, 255)
)
}
Visualize a Sedona spatial RDD using a heatmap.
Description
Generate a heatmap of geometrical object(s) within a Sedona spatial RDD.
Usage
sedona_render_heatmap(
rdd,
resolution_x,
resolution_y,
output_location,
output_format = c("png", "gif", "svg"),
boundary = NULL,
blur_radius = 10L,
overlay = NULL,
browse = interactive()
)
Arguments
rdd |
A Sedona spatial RDD. |
resolution_x |
Resolution on the x-axis. |
resolution_y |
Resolution on the y-axis. |
output_location |
Location of the output image. This should be the desired path of the image file excluding extension in its file name. |
output_format |
File format of the output image. Currently "png", "gif", and "svg" formats are supported (default: "png"). |
boundary |
Only render data within the given rectangular boundary.
The |
blur_radius |
Controls the radius of a Gaussian blur in the resulting heatmap. |
overlay |
A |
browse |
Whether to open the rendered image in a browser (default: interactive()). |
Value
No return value.
See Also
Other Sedona visualization routines:
sedona_render_choropleth_map()
,
sedona_render_scatter_plot()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = input_location,
type = "point"
)
sedona_render_heatmap(
rdd,
resolution_x = 800,
resolution_y = 600,
output_location = tempfile("points-"),
output_format = "png",
boundary = c(-91, -84, 30, 35),
blur_radius = 10
)
}
Visualize a Sedona spatial RDD using a scatter plot.
Description
Generate a scatter plot of geometrical object(s) within a Sedona spatial RDD.
Usage
sedona_render_scatter_plot(
rdd,
resolution_x,
resolution_y,
output_location,
output_format = c("png", "gif", "svg"),
boundary = NULL,
color_of_variation = c("red", "green", "blue"),
base_color = c(0, 0, 0),
shade = TRUE,
reverse_coords = FALSE,
overlay = NULL,
browse = interactive()
)
Arguments
rdd |
A Sedona spatial RDD. |
resolution_x |
Resolution on the x-axis. |
resolution_y |
Resolution on the y-axis. |
output_location |
Location of the output image. This should be the desired path of the image file excluding extension in its file name. |
output_format |
File format of the output image. Currently "png", "gif", and "svg" formats are supported (default: "png"). |
boundary |
Only render data within the given rectangular boundary.
The |
color_of_variation |
Which color channel will vary depending on values of data points. Must be one of "red", "green", or "blue". Default: red. |
base_color |
Color of any data point with value 0. Must be a numeric vector of length 3 specifying values for red, green, and blue channels. Default: c(0, 0, 0). |
shade |
Whether data point with larger magnitude will be displayed with darker color. Default: TRUE. |
reverse_coords |
Whether to reverse spatial coordinates in the plot (default: FALSE). |
overlay |
A |
browse |
Whether to open the rendered image in a browser (default: interactive()). |
Value
No return value.
See Also
Other Sedona visualization routines:
sedona_render_choropleth_map()
,
sedona_render_heatmap()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = input_location,
type = "point"
)
sedona_render_scatter_plot(
rdd,
resolution_x = 800,
resolution_y = 600,
output_location = tempfile("points-"),
output_format = "png",
boundary = c(-91, -84, 30, 35)
)
}
Save a Spark dataframe containing exactly 1 spatial column into a file.
Description
Export serialized data from a Spark dataframe containing exactly 1 spatial column into a file.
Usage
sedona_save_spatial_rdd(
x,
spatial_col,
output_location,
output_format = c("wkb", "wkt", "geojson")
)
Arguments
x |
A Spark dataframe object in sparklyr or a dplyr expression representing a Spark SQL query. |
spatial_col |
The name of the spatial column. |
output_location |
Location of the output file. |
output_format |
Format of the output. |
Value
No return value.
See Also
Other Sedona RDD data interface functions:
sedona_read_dsv_to_typed_rdd()
,
sedona_read_geojson()
,
sedona_read_shapefile_to_typed_rdd()
,
sedona_write_wkb()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
tbl <- dplyr::tbl(
sc,
dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
)
sedona_save_spatial_rdd(
tbl %>% dplyr::mutate(id = 1),
spatial_col = "pt",
output_location = "/tmp/pts.wkb",
output_format = "wkb"
)
}
Perform a spatial join operation on two Sedona spatial RDDs.
Description
Given spatial_rdd
and query_window_rdd
, return a pair RDD containing all
pairs of geometrical elements (p, q) such that p is an element of
spatial_rdd
, q is an element of query_window_rdd
, and (p, q) satisfies
the spatial relation specified by join_type
.
Usage
sedona_spatial_join(
spatial_rdd,
query_window_rdd,
join_type = c("contain", "intersect"),
partitioner = c("quadtree", "kdbtree"),
index_type = c("quadtree", "rtree")
)
Arguments
spatial_rdd |
Spatial RDD containing geometries to be queried. |
query_window_rdd |
Spatial RDD containing the query window(s). |
join_type |
Type of the join query (must be either "contain" or
"intersect").
If |
partitioner |
Spatial partitioning to apply to both |
index_type |
Controls how |
Value
A spatial RDD containing the join result.
See Also
Other Sedona spatial join operator:
sedona_spatial_join_count_by_key()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = input_location,
delimiter = ",",
type = "point",
first_spatial_col_index = 1L
)
query_rdd_input_location <- "/dev/null" # replace it with the path to your input file
query_rdd <- sedona_read_shapefile_to_typed_rdd(
sc,
location = query_rdd_input_location,
type = "polygon"
)
join_result_rdd <- sedona_spatial_join(
rdd,
query_rdd,
join_type = "intersect",
partitioner = "quadtree"
)
}
Perform a spatial count-by-key operation based on two Sedona spatial RDDs.
Description
For each element p from spatial_rdd
, count the number of unique elements q
from query_window_rdd
such that (p, q) satisfies the spatial relation
specified by join_type
.
Usage
sedona_spatial_join_count_by_key(
spatial_rdd,
query_window_rdd,
join_type = c("contain", "intersect"),
partitioner = c("quadtree", "kdbtree"),
index_type = c("quadtree", "rtree")
)
Arguments
spatial_rdd |
Spatial RDD containing geometries to be queried. |
query_window_rdd |
Spatial RDD containing the query window(s). |
join_type |
Type of the join query (must be either "contain" or
"intersect").
If |
partitioner |
Spatial partitioning to apply to both |
index_type |
Controls how |
Value
A spatial RDD containing the join-count-by-key results.
See Also
Other Sedona spatial join operator:
sedona_spatial_join()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_dsv_to_typed_rdd(
sc,
location = input_location,
delimiter = ",",
type = "point",
first_spatial_col_index = 1L
)
query_rdd_input_location <- "/dev/null" # replace it with the path to your input file
query_rdd <- sedona_read_shapefile_to_typed_rdd(
sc,
location = query_rdd_input_location,
type = "polygon"
)
join_result_rdd <- sedona_spatial_join_count_by_key(
rdd,
query_rdd,
join_type = "intersect",
partitioner = "quadtree"
)
}
Spatial RDD aggregation routine
Description
Function extracting aggregate statistics from a Sedona spatial RDD.
Arguments
x |
A Sedona spatial RDD. |
Create a SpatialRDD from an external data source.
Description
Import spatial object from an external data source into a Sedona SpatialRDD.
Arguments
sc |
A |
location |
Location of the data source. |
type |
Type of the SpatialRDD (must be one of "point", "polygon", or "linestring". |
has_non_spatial_attrs |
Whether the input contains non-spatial attributes. |
storage_level |
Storage level of the RDD (default: MEMORY_ONLY). |
repartition |
The minimum number of partitions to have in the resulting RDD (default: 1). |
Visualization routine for Sedona spatial RDD.
Description
Generate a visual representation of geometrical object(s) within a Sedona spatial RDD.
Arguments
rdd |
A Sedona spatial RDD. |
resolution_x |
Resolution on the x-axis. |
resolution_y |
Resolution on the y-axis. |
output_location |
Location of the output image. This should be the desired path of the image file excluding extension in its file name. |
output_format |
File format of the output image. Currently "png", "gif", and "svg" formats are supported (default: "png"). |
boundary |
Only render data within the given rectangular boundary.
The |
color_of_variation |
Which color channel will vary depending on values of data points. Must be one of "red", "green", or "blue". Default: red. |
base_color |
Color of any data point with value 0. Must be a numeric vector of length 3 specifying values for red, green, and blue channels. Default: c(0, 0, 0). |
shade |
Whether data point with larger magnitude will be displayed with darker color. Default: TRUE. |
overlay |
A |
browse |
Whether to open the rendered image in a browser (default: interactive()). |
Write SpatialRDD into a file.
Description
Export serialized data from a Sedona SpatialRDD into a file.
-
sedona_write_wkb
: -
sedona_write_wkt
: -
sedona_write_geojson
:
Usage
sedona_write_wkb(x, output_location)
sedona_write_wkt(x, output_location)
sedona_write_geojson(x, output_location)
Arguments
x |
The SpatialRDD object. |
output_location |
Location of the output file. |
Value
No return value.
See Also
Other Sedona RDD data interface functions:
sedona_read_dsv_to_typed_rdd()
,
sedona_read_geojson()
,
sedona_read_shapefile_to_typed_rdd()
,
sedona_save_spatial_rdd()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- sedona_read_wkb(
sc,
location = input_location,
wkb_col_idx = 0L
)
sedona_write_wkb(rdd, "/tmp/wkb_output.tsv")
}
Read geospatial data into a Spark DataFrame.
Description
These functions are deprecated and will be removed in a future release. Sedona has
been implementing readers as spark DataFrame sources, so you can use spark_read_source
with the right sources ("shapefile", "geojson", "geoparquet") to read geospatial data.
Functions to read geospatial data from a variety of formats into Spark DataFrames.
-
spark_read_shapefile
: from a shapefile -
spark_read_geojson
: from a geojson file -
spark_read_geoparquet
: from a geoparquet file
Usage
spark_read_shapefile(sc, name = NULL, path = name, options = list(), ...)
spark_read_geojson(
sc,
name = NULL,
path = name,
options = list(),
repartition = 0,
memory = TRUE,
overwrite = TRUE
)
spark_read_geoparquet(
sc,
name = NULL,
path = name,
options = list(),
repartition = 0,
memory = TRUE,
overwrite = TRUE
)
Arguments
sc |
A |
name |
The name to assign to the newly generated table. |
path |
The path to the file. Needs to be accessible from the cluster. Supports the ‘"hdfs://"’, ‘"s3a://"’ and ‘"file://"’ protocols. |
options |
A list of strings with additional options. See https://spark.apache.org/docs/latest/sql-programming-guide.html#configuration. |
... |
Optional arguments; currently unused. |
repartition |
The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning. |
memory |
Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?) |
overwrite |
Boolean; overwrite the table with the given name if it already exists? |
Value
A tbl
See Also
Other Sedona DF data interface functions:
spark_write_geojson()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
input_location <- "/dev/null" # replace it with the path to your input file
rdd <- spark_read_shapefile(sc, location = input_location)
}
Write geospatial data from a Spark DataFrame.
Description
These functions are deprecated and will be removed in a future release. Sedona has
been implementing writers as spark DataFrame sources, so you can use spark_write_source
with the right sources ("shapefile", "geojson", "geoparquet") to write geospatial data.
Functions to write geospatial data into a variety of formats from Spark DataFrames.
-
spark_write_geojson
: to GeoJSON -
spark_write_geoparquet
: to GeoParquet -
spark_write_raster
: to raster tiles after using RS output functions (RS_AsXXX
)
Usage
spark_write_geojson(
x,
path,
mode = NULL,
options = list(),
partition_by = NULL,
...
)
spark_write_geoparquet(
x,
path,
mode = NULL,
options = list(),
partition_by = NULL,
...
)
spark_write_raster(
x,
path,
mode = NULL,
options = list(),
partition_by = NULL,
...
)
Arguments
x |
A Spark DataFrame or dplyr operation |
path |
The path to the file. Needs to be accessible from the cluster. Supports the ‘"hdfs://"’, ‘"s3a://"’ and ‘"file://"’ protocols. |
mode |
A For more details see also https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes for your version of Spark. |
options |
A list of strings with additional options. |
partition_by |
A |
... |
Optional arguments; currently unused. |
See Also
Other Sedona DF data interface functions:
spark_read_shapefile()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
tbl <- dplyr::tbl(
sc,
dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
)
spark_write_geojson(
tbl %>% dplyr::mutate(id = 1),
output_location = "/tmp/pts.geojson"
)
}
Spatial join operator
Description
R interface for a Sedona spatial join operator
Arguments
spatial_rdd |
Spatial RDD containing geometries to be queried. |
query_window_rdd |
Spatial RDD containing the query window(s). |
join_type |
Type of the join query (must be either "contain" or
"intersect").
If |
partitioner |
Spatial partitioning to apply to both |
index_type |
Controls how |
Execute a spatial query
Description
Given a spatial RDD, run a spatial query parameterized by a spatial object
x
.
Arguments
rdd |
A Sedona spatial RDD. |
x |
The query object. |
index_type |
Index to use to facilitate the KNN query. If NULL, then
do not build any additional spatial index on top of |
result_type |
Type of result to return.
If "rdd" (default), then the k nearest objects will be returned in a Sedona
spatial RDD.
If "sdf", then a Spark dataframe containing the k nearest objects will be
returned.
If "raw", then a list of k nearest objects will be returned. Each element
within this list will be a JVM object of type
|
Export a Spark SQL query with a spatial column into a Sedona spatial RDD.
Description
Given a Spark dataframe object or a dplyr expression encapsulating a Spark SQL query, build a Sedona spatial RDD that will encapsulate the same query or data source. The input should contain exactly one spatial column and all other non-spatial columns will be treated as custom user-defined attributes in the resulting spatial RDD.
Usage
to_spatial_rdd(x, spatial_col)
Arguments
x |
A Spark dataframe object in sparklyr or a dplyr expression representing a Spark SQL query. |
spatial_col |
The name of the spatial column. |
Value
A SpatialRDD encapsulating the query.
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
tbl <- dplyr::tbl(
sc,
dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
)
rdd <- to_spatial_rdd(tbl, "pt")
}