Title: | Streamline Population Genomic and Genetic Analyses |
Version: | 1.4.0 |
Description: | Estimate commonly used population genomic statistics and generate publication quality figures. 'PopGenHelpR' uses vcf, 'geno' (012), and csv files to generate output. |
URL: | https://kfarleigh.github.io/PopGenHelpR/ |
BugReports: | https://github.com/kfarleigh/PopGenHelpR/issues |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Imports: | dplyr, ggplot2, magrittr, methods, reshape2, rlang, scatterpie, stats, geodata, terra, ggspatial, spdep, sf, utils, vcfR |
Depends: | R (≥ 2.10) |
LazyData: | true |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-07-02 18:49:42 UTC; keaka |
Author: | Keaka Farleigh |
Maintainer: | Keaka Farleigh <keakafarleigh@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-02 19:00:02 UTC |
Plot an ancestry matrix for individuals and/or populations.
Description
Plot an ancestry matrix for individuals and/or populations.
Usage
Ancestry_barchart(
anc.mat,
pops,
K,
plot.type = "all",
col,
ind.order = NULL,
pop.order = NULL,
legend_pos = "right"
)
Arguments
anc.mat |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. The first column should be the names of each sample/population, followed by the estimated contribution of each cluster to that individual/pop. |
pops |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. The first two columns should indicate the sample name (first column) and the population that sample belongs to (second column). Other columns (e.g., latitude, longitude) can be present, but will not be used. |
K |
Numeric. The number of genetic clusters in your data set. Please contact the package authors if you need help determining this. |
plot.type |
Character string. Options are all, individual, and population. All is the default and is recommended; it plots a bar chart for both the individuals and the populations. |
col |
Character vector indicating the colors you wish to use for plotting. |
ind.order |
Character vector indicating the order to plot the individuals in the individual ancestry bar chart. |
pop.order |
Character vector indicating the order to plot the populations in the population ancestry bar chart. |
legend_pos |
Character. The desired position of the legend. The default is "right". Other options include "left", "top", "bottom", or "none", which removes the legend. Please see the ggplot2 documentation for all of the legend placement options. |
Value
A list containing your plots and the data frames used to generate the plots.
Author(s)
Keaka Farleigh
Examples
data(Q_dat)
Qmat <- Q_dat[[1]]
rownames(Qmat) <- Qmat[,1]
Loc <- Q_dat[[2]]
Test_all <- Ancestry_barchart(anc.mat = Qmat, pops = Loc, K = 5,
plot.type = 'all',col = c('#d73027', '#fc8d59', '#e0f3f8', '#91bfdb', '#4575b4'))
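The input layouts described under Arguments can be sketched with toy data frames. The sample names, cluster column names, and values below are hypothetical, and the population-level averaging with aggregate() is only one plausible way to summarise individual ancestry, not necessarily what Ancestry_barchart does internally.

```r
# Hypothetical toy inputs; names and values are made up for illustration.
# anc.mat layout: first column = sample name, then one column per cluster (K = 2 here).
anc_toy <- data.frame(
  Ind = c("s1", "s2", "s3", "s4"),
  Cluster1 = c(0.9, 0.8, 0.2, 0.1),
  Cluster2 = c(0.1, 0.2, 0.8, 0.9)
)

# pops layout: first column = sample name, second column = population.
pops_toy <- data.frame(
  Sample = c("s1", "s2", "s3", "s4"),
  Population = c("East", "East", "West", "West")
)

# Sanity check: each individual's ancestry coefficients should sum to 1.
stopifnot(all(abs(rowSums(anc_toy[, -1]) - 1) < 1e-8))

# Assumed summary: join ancestry to population labels, then average per population.
merged <- merge(pops_toy, anc_toy, by.x = "Sample", by.y = "Ind")
pop_anc <- aggregate(cbind(Cluster1, Cluster2) ~ Population,
                     data = merged, FUN = mean)
print(pop_anc)
```

Checking that rows sum to 1 before plotting is a cheap way to catch a misread csv or a shifted column.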
A function to estimate three measures of genetic differentiation using geno files, vcf files, or vcfR objects. Data is assumed to be bi-allelic.
Description
A function to estimate three measures of genetic differentiation using geno files, vcf files, or vcfR objects. Data is assumed to be bi-allelic.
Usage
Differentiation(
data,
pops,
statistic = "all",
missing_value = NA,
write = FALSE,
prefix = NULL,
population_col = NULL,
individual_col = NULL
)
Arguments
data |
Character. String indicating the name of the vcf file, geno file or vcfR object to be used in the analysis. The genotypes within the vcf should be separated by a "/" or "|". These normally indicate unphased and phased genotypes, respectively. Please reach out to PopGenHelpR authors if you have questions. |
pops |
Character. String indicating the name of the population assignment file or dataframe containing the population assignment information for each individual in the data. This file must be in the same order as the vcf file and include columns specifying the individual and the population that individual belongs to. The first column should contain individual names and the second column should indicate the population assignment of each individual. Alternatively, you can indicate the column containing the individual and population information using the individual_col and population_col arguments. |
statistic |
Character. String or vector indicating the statistic to calculate. Options are any of: all, all of the statistics; Fst, Weir and Cockerham (1984) Fst; NeisD, Nei's D statistic; JostsD, Jost's D. |
missing_value |
Character. String indicating missing data in the input data. It is assumed to be NA, but that may not be true (is likely not) in the case of geno files. |
write |
Boolean. Whether or not to write the output to files in the current working directory. There will be one or two files for each statistic. Files will be named based on their statistic such as Fst_perpop.csv. |
prefix |
Character. Optional argument. String that will be appended to file output. Please provide a prefix if write is set to TRUE. |
population_col |
Numeric. Optional argument (a number) indicating the column that contains the population assignment information. |
individual_col |
Numeric. Optional argument (a number) indicating the column that contains the individuals (i.e., sample name) in the data. |
Value
A list containing the estimated differentiation statistics. The per pop values are calculated by taking the average of the per locus estimates.
Author(s)
Keaka Farleigh
References
Fst:
Pembleton, L. W., Cogan, N. O., & Forster, J. W. (2013). StAMPP: An R package for calculation of genetic differentiation and structure of mixed‐ploidy level populations. Molecular Ecology Resources, 13(5), 946-952. doi:10.1111/1755-0998.12129
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 1358-1370.
Nei's D:
Nei, M. (1972). Genetic distance between populations. The American Naturalist, 106(949), 283-292. doi:10.1086/282771
Pembleton, L. W., Cogan, N. O., & Forster, J. W. (2013). StAMPP: An R package for calculation of genetic differentiation and structure of mixed‐ploidy level populations. Molecular Ecology Resources, 13(5), 946-952. doi:10.1111/1755-0998.12129
Jost's D:
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015–4026. doi:10.1111/j.1365-294X.2008.03887.x
Examples
data("HornedLizard_Pop")
data("HornedLizard_VCF")
Test <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, write = FALSE)
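As a worked illustration of one of the statistics listed above, the following base R sketch computes Nei's (1972) standard genetic distance between two populations from 012-coded bi-allelic genotypes. The toy genotype matrix and population labels are hypothetical, and the code applies the textbook formula directly; Differentiation may differ in details such as missing-data handling.

```r
# Hypothetical 012 genotypes: rows = individuals, columns = bi-allelic loci,
# each entry = number of copies of the alternate allele.
geno <- matrix(
  c(0, 1, 2,
    0, 2, 2,
    2, 1, 0,
    2, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3", "s4"), c("loc1", "loc2", "loc3"))
)
pop <- c("East", "East", "West", "West")

# Alternate-allele frequency per locus within each population.
p_east <- colMeans(geno[pop == "East", , drop = FALSE], na.rm = TRUE) / 2
p_west <- colMeans(geno[pop == "West", , drop = FALSE], na.rm = TRUE) / 2

# Nei's identity terms, averaged over loci (two alleles per locus).
J_x  <- mean(p_east^2 + (1 - p_east)^2)
J_y  <- mean(p_west^2 + (1 - p_west)^2)
J_xy <- mean(p_east * p_west + (1 - p_east) * (1 - p_west))

# Standard genetic distance: D = -ln(J_xy / sqrt(J_x * J_y)).
nei_D <- -log(J_xy / sqrt(J_x * J_y))
print(nei_D)
```

In this toy case two of the three loci are fixed for opposite alleles, so the distance is large; identical allele frequencies in both populations would give D = 0.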
A genetic differentiation matrix and locality information for each population. This data was generated by subsetting data of Farleigh et al., 2021.
Description
A symmetric matrix with estimated genetic differentiation (Fst) between 3 populations.
Usage
data(Fst_dat)
Format
A list with two elements:
- Fst_dat
Data frame with three rows and three columns
- Loc_dat
Data frame containing the locality information for each population
Source
Farleigh, K., Vladimirova, S. A., Blair, C., Bracken, J. T., Koochekian, N., Schield, D. R., ... & Jezkova, T. (2021). The effects of climate and demographic history in shaping genomic variation across populations of the Desert Horned Lizard (Phrynosoma platyrhinos). Molecular Ecology, 30(18), 4481-4496.
Examples
data(Fst_dat)
Fst <- Fst_dat[[1]]
Loc <- Fst_dat[[2]]
Test <- Network_map(dat = Fst, pops = Loc,
neighbors = 2,col = c('#4575b4', '#91bfdb', '#e0f3f8','#fd8d3c','#fc4e2a'),
statistic = "Fst", Lat_buffer = 1, Long_buffer = 1)
Fstat_plot <- Pairwise_heatmap(dat = Fst, statistic = 'FST')
A data frame of hypothetical heterozygosity data produced by Heterozygosity.
Description
Data frame containing 5 columns and 3 rows
Usage
data(Het_dat)
Format
A data frame with 5 columns and 3 rows:
- Heterozygosity
Estimated heterozygosity
- Pop
Population assignment
- Standard.Deviation
standard deviation
- Longitude
Longitude
- Latitude
Latitude
Source
Coordinates and population names taken from Farleigh, K., Vladimirova, S. A., Blair, C., Bracken, J. T., Koochekian, N., Schield, D. R., ... & Jezkova, T. (2021). The effects of climate and demographic history in shaping genomic variation across populations of the Desert Horned Lizard (Phrynosoma platyrhinos). Molecular Ecology, 30(18), 4481-4496.
Examples
data(Het_dat)
Test <- Point_map(Het_dat, statistic = "Heterozygosity")
A function to estimate seven measures of heterozygosity using geno files, vcf files, or vcfR objects. Data is assumed to be bi-allelic.
Description
A function to estimate seven measures of heterozygosity using geno files, vcf files, or vcfR objects. Data is assumed to be bi-allelic.
Usage
Heterozygosity(
data,
pops,
statistic = "all",
missing_value = NA,
write = FALSE,
prefix = NULL,
population_col = NULL,
individual_col = NULL
)
Arguments
data |
Character. String indicating the name of the vcf file, geno file or vcfR object to be used in the analysis. |
pops |
Character. String indicating the name of the population assignment file or dataframe containing the population assignment information for each individual in the data. This file must be in the same order as the vcf file and include columns specifying the individual and the population that individual belongs to. The first column should contain individual names and the second column should indicate the population assignment of each individual. Alternatively, you can indicate the column containing the individual and population information using the individual_col and population_col arguments. |
statistic |
Character. String or vector indicating the statistic to calculate. Options are any of: all, all of the statistics; Ho, observed heterozygosity; He, expected heterozygosity; PHt, proportion of heterozygous loci; Hs_exp, heterozygosity standardized by the average expected heterozygosity; Hs_obs, heterozygosity standardized by the average observed heterozygosity; IR, internal relatedness; HL, homozygosity by locus. |
missing_value |
Character. String indicating missing data in the input data. It is assumed to be NA, but that may not be true (is likely not) in the case of geno files. |
write |
Boolean. Whether or not to write the output to files in the current working directory. There will be one or two files for each statistic. Files will be named based on their statistic such as Ho_perpop.csv or Ho_perloc.csv. |
prefix |
Character. Optional argument. String that will be appended to file output. Please provide a prefix if write is set to TRUE. |
population_col |
Numeric. Optional argument (a number) indicating the column that contains the population assignment information. |
individual_col |
Numeric. Optional argument (a number) indicating the column that contains the individuals (i.e., sample name) in the data. |
Value
A list containing the estimated heterozygosity statistics. The per pop values are calculated by taking the average of the per locus estimates.
Author(s)
Keaka Farleigh
References
Expected (He) and observed heterozygosity (Ho):
Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press
Homozygosity by locus (HL) and internal relatedness (IR):
Alho, J. S., Välimäki, K., & Merilä, J. (2010). Rhh: an R extension for estimating multilocus heterozygosity and heterozygosity–heterozygosity correlation. Molecular Ecology Resources, 10(4), 720-722.
Amos, W., Worthington Wilmer, J., Fullard, K., Burg, T. M., Croxall, J. P., Bloch, D., & Coulson, T. (2001). The influence of parental relatedness on reproductive success. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1480), 2021-2027. doi:10.1098/rspb.2001.1751
Aparicio, J. M., Ortego, J., & Cordero, P. J. (2006). What should we weigh to estimate heterozygosity, alleles or loci? Molecular Ecology, 15(14), 4659-4665.
Heterozygosity standardized by expected (Hs_exp) and observed heterozygosity (Hs_obs):
Coltman, D. W., Pilkington, J. G., Smith, J. A., & Pemberton, J. M. (1999). Parasite‐mediated selection against inbred Soay sheep in a free‐living island population. Evolution, 53(4), 1259-1267. doi:10.1111/j.1558-5646.1999.tb04538.x
Examples
data("HornedLizard_Pop")
data("HornedLizard_VCF")
Test <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, write = FALSE)
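To make two of the listed statistics concrete, the sketch below computes per-locus observed heterozygosity (Ho) and expected heterozygosity (He, as 2pq) from a 012-coded genotype matrix and then averages over loci, mirroring the "average of the per locus estimates" described under Value. The toy matrix is hypothetical, all individuals are pooled into one group for brevity, and no small-sample correction is applied, so results from Heterozygosity may differ.

```r
# Hypothetical 012 genotypes: rows = individuals, columns = bi-allelic loci.
# NA marks a missing genotype.
geno <- matrix(
  c(0, 1, 2,
    1, 1, NA,
    2, 1, 0,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3", "s4"), c("loc1", "loc2", "loc3"))
)

# Observed heterozygosity per locus: share of non-missing genotypes coded 1.
Ho_perloc <- colMeans(geno == 1, na.rm = TRUE)

# Expected heterozygosity per locus under Hardy-Weinberg proportions: 2pq.
p <- colMeans(geno, na.rm = TRUE) / 2   # alternate-allele frequency
He_perloc <- 2 * p * (1 - p)

# Group-level values as the mean of the per-locus estimates.
Ho <- mean(Ho_perloc)
He <- mean(He_perloc)
print(c(Ho = Ho, He = He))
```

Setting na.rm = TRUE means each locus is evaluated on its non-missing genotypes only, which is one common convention.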
A population assignment data frame to be used in Heterozygosity and Differentiation.
Description
Data frame containing 4 columns and 72 rows
Usage
data(HornedLizard_Pop)
Format
A data frame with 4 columns and 72 rows:
- Sample
Sample Name
- Population
Population assignment according to sNMF results (see citation)
- Longitude
Longitude
- Latitude
Latitude
Source
Coordinates and population names taken from Farleigh, K., Vladimirova, S. A., Blair, C., Bracken, J. T., Koochekian, N., Schield, D. R., ... & Jezkova, T. (2021). The effects of climate and demographic history in shaping genomic variation across populations of the Desert Horned Lizard (Phrynosoma platyrhinos). Molecular Ecology, 30(18), 4481-4496.
Examples
data("HornedLizard_Pop")
data("HornedLizard_VCF")
Test <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, write = FALSE)
A vcfR object to be used in Heterozygosity and Differentiation.
Description
A vcfR object containing genotype and sample information for 72 individuals.
Usage
data(HornedLizard_VCF)
Format
A vcfR object
- vcfR object
A vcfR object containing genotype and sample information for 72 individuals.
Source
Farleigh, K., Vladimirova, S. A., Blair, C., Bracken, J. T., Koochekian, N., Schield, D. R., ... & Jezkova, T. (2021). The effects of climate and demographic history in shaping genomic variation across populations of the Desert Horned Lizard (Phrynosoma platyrhinos). Molecular Ecology, 30(18), 4481-4496.
Examples
data("HornedLizard_Pop")
data("HornedLizard_VCF")
Test <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, write = FALSE)
A function to map statistics (e.g., genetic differentiation) between points as a network on a map.
Description
A function to map statistics (e.g., genetic differentiation) between points as a network on a map.
Usage
Network_map(
dat,
pops,
neighbors,
col,
statistic = NULL,
breaks = NULL,
Lat_buffer = 1,
Long_buffer = 1,
Latitude_col = NULL,
Longitude_col = NULL,
country_code = NULL,
shapefile = NULL,
raster = NULL,
legend_pos = "none",
scale_bar = FALSE,
north_arrow = FALSE,
north_arrow_style = ggspatial::north_arrow_nautical(),
north_arrow_position = NULL,
shapefile_plot_position = NULL,
raster_plot_position = NULL,
shapefile_col = NULL,
shapefile_outline_col = NULL,
shp_outwidth = 1,
raster_col = c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c"),
interpolate_raster = NULL,
raster_breaks = NULL,
discrete_raster = NULL
)
Arguments
dat |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. If it is a csv, the 1st row should contain the individual/population names. The columns should also be named in this fashion. |
pops |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. The columns should be named Sample, containing the sample IDs; Population indicating the population assignment of the individual; Long, indicating the longitude of the sample; Lat, indicating the latitude of the sample. Alternatively, see the Longitude_col and Latitude_col arguments. |
neighbors |
Numeric or character. The number of neighbors to plot connections with, or the specific relationship that you want to visualize. Names should match those in the population assignment file and be separated by an underscore. To visualize the relationship between East and West, for example, set neighbors = "East_West". |
col |
Character vector indicating the colors you wish to use for plotting. |
statistic |
Character indicating the statistic being plotted. This will be used to title the legend. The legend title will be blank if left as NULL. |
breaks |
Numeric. The breaks used to generate the color ramp when plotting. The number of breaks should match the number of colors. |
Lat_buffer |
Numeric. A buffer to customize visualization. |
Long_buffer |
Numeric. A buffer to customize visualization. |
Latitude_col |
Numeric. The number of the column indicating the latitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Lat column. |
Longitude_col |
Numeric. The number of the column indicating the longitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Long column. |
country_code |
Character. A country code or vector of country codes from the R package geodata specifying the country that you want to plot administrative borders for (e.g, US states). You can determine the correct codes using geodata's |
shapefile |
Character. A file name, vector of file names of a shapefile(s) to plot on the map, or a spatvector object that is compatible with the R package terra. This should be used in conjunction with the shapefile_plot_position argument. |
raster |
Character. A file name or a spatraster object that is compatible with the terra R package. This should be used in conjunction with the raster_plot_position argument. |
legend_pos |
Character. The desired position of the legend. The default is "none", which removes the legend. Other options include "left", "right", "top" or "bottom". Please see the ggplot2 documentation for all of the legend placement options. |
scale_bar |
Boolean. Whether or not to add a scale bar. Note that maps with large areas or those that use unprojected spatial data (e.g., WGS 84) will generate a warning that the scale bar varies. |
north_arrow |
Boolean. Whether or not to add a north arrow. |
north_arrow_style |
Character. Which style of north arrow to add. See ggspatial documentation for more details. |
north_arrow_position |
Character. The position of the north arrow. See ggspatial documentation for more details. |
shapefile_plot_position |
Numeric. A number indicating which position to plot the shapefile in. The options are 1, which plots the shapefile on top of the base world map (under points and administrative boundaries), 2 which plots the shapefile on top of administrative boundaries (but under points), and 3, which plots the shapefile on top of everything. |
raster_plot_position |
Numeric. A number indicating which position to plot the raster in. The options are 1, which plots the raster on top of the base world map (under points and administrative boundaries), 2 which plots the raster on top of administrative boundaries (but under points), and 3, which plots the raster on top of everything. |
shapefile_col |
Character. A color or color vector indicating the color to fill the shapefile(s) with. Shapefiles will be colored alphabetically. |
shapefile_outline_col |
Character. A color indicating the outline color of the shapefile. |
shp_outwidth |
Numeric. The width of the shapefile outline. |
raster_col |
Character. A character vector indicating the colors used to visualize the raster. The function will separate your raster data into the same number of bins as there are colors. If you provide 5 colors, for example, there will be 5 bins. |
interpolate_raster |
Boolean. Whether or not to interpolate the raster. The default is to interpolate the raster. |
raster_breaks |
Numeric or Character vector. Values to be used as breaks for the raster surface. |
discrete_raster |
Boolean. Indicating whether or not the raster being supplied is discrete. |
Value
A list containing the map and the matrix used to plot the map.
Author(s)
Keaka Farleigh
Examples
data(Fst_dat)
Fst <- Fst_dat[[1]]
Loc <- Fst_dat[[2]]
Test <- Network_map(dat = Fst, pops = Loc,
neighbors = 2,col = c('#4575b4', '#91bfdb', '#e0f3f8','#fd8d3c','#fc4e2a'),
statistic = "Fst", Lat_buffer = 1, Long_buffer = 1)
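The pops layout, the buffers, and the underscore naming used by neighbors can be sketched with a toy locality table. The coordinates and population names below are hypothetical. Treating Lat_buffer and Long_buffer as padding added to each side of the coordinate range is an assumption about their effect, not taken from the package source, and the text does not state whether the order of names within a pair matters.

```r
# Hypothetical locality table using the column names described for pops.
loc_toy <- data.frame(
  Sample = c("s1", "s2", "s3"),
  Population = c("East", "West", "South"),
  Long = c(-114.0, -117.5, -115.2),
  Lat = c(38.0, 37.2, 35.1)
)

Lat_buffer <- 1
Long_buffer <- 1

# Assumed behaviour: widen the coordinate range by the buffer on each side.
xlim <- range(loc_toy$Long) + c(-Long_buffer, Long_buffer)
ylim <- range(loc_toy$Lat) + c(-Lat_buffer, Lat_buffer)

# Pair labels in the "PopA_PopB" style accepted by the neighbors argument.
pair_labels <- combn(loc_toy$Population, 2, FUN = paste, collapse = "_")

print(xlim)
print(ylim)
print(pair_labels)
```

Listing the pair labels first makes it easier to confirm that a label such as "East_West" matches the population names in the assignment table.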
A function to perform principal component analysis (PCA) on genetic data. Loci with missing data will be removed prior to PCA.
Description
A function to perform principal component analysis (PCA) on genetic data. Loci with missing data will be removed prior to PCA.
Usage
PCA(
data,
center = TRUE,
scale = FALSE,
missing_value = NA,
write = FALSE,
prefix = NULL
)
Arguments
data |
Character. String indicating the name of the vcf file, geno file or vcfR object to be used in the analysis. |
center |
Boolean. Whether or not to center the data before principal component analysis. |
scale |
Boolean. Whether or not to scale the data before principal component analysis. |
missing_value |
Character. String indicating missing data in the input data. It is assumed to be NA, but that may not be true (is likely not) in the case of geno files. |
write |
Boolean. Whether or not to write the output to files in the current working directory. There will be two files, one for the individual loadings and the other for the percent variance explained by each axis. |
prefix |
Character. Optional argument. String that will be appended to file output. Please provide a prefix if write is set to TRUE. |
Value
A list containing two elements: the loadings of individuals on each principal component and the variance explained by each principal component.
Author(s)
Keaka Farleigh
Examples
data("HornedLizard_VCF")
Test <- PCA(data = HornedLizard_VCF)
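The steps described above (drop loci with missing data, then centre without scaling) can be sketched in base R with prcomp. The toy 012 matrix is hypothetical, and using prcomp is an assumption made for illustration; the text does not say which routine PCA calls. What this page calls the loadings of individuals corresponds to the per-individual coordinates in pca$x here.

```r
# Hypothetical 012 genotypes: rows = individuals, columns = loci.
geno <- matrix(
  c(0, 1, 2, NA,
    0, 2, 2, 1,
    2, 1, 0, 1,
    2, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3", "s4"),
                  c("loc1", "loc2", "loc3", "loc4"))
)

# Remove every locus (column) that contains at least one missing genotype.
complete_loci <- colSums(is.na(geno)) == 0
geno_complete <- geno[, complete_loci, drop = FALSE]

# PCA with centring and no scaling, matching the documented defaults.
pca <- prcomp(geno_complete, center = TRUE, scale. = FALSE)

# Individual coordinates on each axis and percent variance per axis.
scores <- pca$x
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var, 2))
```

Because whole loci are dropped, a data set with scattered missing genotypes can lose many columns; checking ncol(geno_complete) before interpreting the axes is worthwhile.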
A function to plot a heatmap from a symmetric matrix.
Description
A function to plot a heatmap from a symmetric matrix.
Usage
Pairwise_heatmap(
dat,
statistic,
col = c("#abd9e9", "#2c7bb6", "#ffffbf", "#fdae61", "#d7191c"),
breaks = NULL
)
Arguments
dat |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. If it is a csv, the 1st row should contain the individual/population names. The columns should also be named in this fashion. |
statistic |
Character indicating the statistic represented in the matrix, this will be used to label the plot. |
col |
Character vector indicating the colors to be used in plotting. The vector should contain two colors, the first will be the low value, the second will be the high value. |
breaks |
Numeric. The breaks used to generate the color ramp when plotting. The number of breaks should match the number of colors. |
Value
A heatmap plot
Examples
data(Fst_dat)
Fst <- Fst_dat[[1]]
Fstat_plot <- Pairwise_heatmap(dat = Fst, statistic = 'FST')
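The kind of symmetric input that dat expects can be sketched as a small named matrix, together with a symmetry check and a long-form view of the unique pairs. The population names and values below are hypothetical, and the long-form conversion only illustrates how pairwise values are typically arranged for plotting; it is not taken from the package source.

```r
# Hypothetical symmetric pairwise matrix with matching row and column names.
pops <- c("East", "South", "West")
fst_toy <- matrix(
  c(0.00, 0.12, 0.30,
    0.12, 0.00, 0.21,
    0.30, 0.21, 0.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(pops, pops)
)

# A pairwise matrix should equal its own transpose.
stopifnot(isSymmetric(fst_toy))

# Long form: one row per unique pair, taken from the lower triangle.
idx <- which(lower.tri(fst_toy), arr.ind = TRUE)
fst_long <- data.frame(
  Pop1 = rownames(fst_toy)[idx[, "row"]],
  Pop2 = colnames(fst_toy)[idx[, "col"]],
  Value = fst_toy[idx]
)
print(fst_long)
```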
Plot a map of ancestry pie charts.
Description
Plot a map of ancestry pie charts.
Usage
Piechart_map(
anc.mat,
pops,
K,
plot.type = "all",
col,
piesize = 0.35,
Lat_buffer,
Long_buffer,
Latitude_col = NULL,
Longitude_col = NULL,
country_code = NULL,
shapefile = NULL,
legend_pos = "none",
scale_bar = FALSE,
north_arrow = FALSE,
north_arrow_style = ggspatial::north_arrow_nautical(),
north_arrow_position = NULL,
shapefile_plot_position = NULL,
shapefile_col = NULL,
shapefile_outline_col = NULL,
shp_outwidth = 1
)
Arguments
anc.mat |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. The first column should be the names of each sample/population, followed by the estimated contribution of each cluster to that individual/pop. |
pops |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. The columns should be named Sample, containing the sample IDs; Population indicating the population assignment of the individual, population and sample names must be the same type (i.e., both numeric or both characters); Long, indicating the longitude of the sample; Lat, indicating the latitude of the sample. Alternatively, see the Longitude_col and Latitude_col arguments. |
K |
Numeric. The number of genetic clusters in your data set. Please contact the package authors if you need help determining this. |
plot.type |
Character string. Options are all, individual, and population. All is the default and is recommended; it plots a pie chart map for both the individuals and the populations. |
col |
Character vector indicating the colors you wish to use for plotting. |
piesize |
Numeric. The radius of the pie chart for ancestry mapping. |
Lat_buffer |
Numeric. A buffer to customize visualization. |
Long_buffer |
Numeric. A buffer to customize visualization. |
Latitude_col |
Numeric. The number of the column indicating the latitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Lat column. |
Longitude_col |
Numeric. The number of the column indicating the longitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Long column. |
country_code |
Character. A country code or vector of country codes from the R package geodata specifying the country that you want to plot administrative borders for (e.g, US states). You can determine the correct codes using geodata's |
shapefile |
Character. A file name, vector of file names of a shapefile(s) to plot on the map, or a spatvector object that is compatible with the R package terra. This should be used in conjunction with the shapefile_plot_position argument. |
legend_pos |
Character. The desired position of the legend. The default is "none", which removes the legend. Other options include "left", "right", "top" or "bottom". Please see the ggplot2 documentation for all of the legend placement options. |
scale_bar |
Boolean. Whether or not to add a scale bar. Note that maps with large areas or those that use unprojected spatial data (e.g., WGS 84) will generate a warning that the scale bar varies. |
north_arrow |
Boolean. Whether or not to add a north arrow. |
north_arrow_style |
Character. Which style of north arrow to add. See ggspatial documentation for more details. |
north_arrow_position |
Character. The position of the north arrow. See ggspatial documentation for more details. |
shapefile_plot_position |
Numeric. A number indicating which position to plot the shapefile in. The options are 1, which plots the shapefile on top of the base world map (under points and administrative boundaries), 2 which plots the shapefile on top of administrative boundaries (but under points), and 3, which plots the shapefile on top of everything. |
shapefile_col |
Character. A color or color vector indicating the color to fill the shapefile(s) with. Similar to |
shapefile_outline_col |
Character. A color indicating the outline color of the shapefile. |
shp_outwidth |
Numeric. The width of the shapefile outline. |
Value
A list containing your plots and the data frames used to generate the plots.
Author(s)
Keaka Farleigh
Examples
data(Q_dat)
Qmat <- Q_dat[[1]]
rownames(Qmat) <- Qmat[,1]
Loc <- Q_dat[[2]]
Test_all <- Piechart_map(anc.mat = Qmat, pops = Loc, K = 5,
plot.type = 'all', col = c('#d73027', '#fc8d59', '#e0f3f8', '#91bfdb', '#4575b4'), piesize = 0.35,
Lat_buffer = 1, Long_buffer = 1)
A function to plot coordinates on a map.
Description
A function to plot coordinates on a map.
Usage
Plot_coordinates(
dat,
col = c("#A9A9A9", "#000000"),
size = 3,
Lat_buffer = 1,
Long_buffer = 1,
Latitude_col = NULL,
Longitude_col = NULL,
group = NULL,
group_col = NULL,
country_code = NULL,
shapefile = NULL,
raster = NULL,
legend_pos = "none",
scale_bar = FALSE,
north_arrow = FALSE,
north_arrow_style = ggspatial::north_arrow_nautical(),
north_arrow_position = NULL,
shapefile_plot_position = NULL,
raster_plot_position = NULL,
shapefile_col = NULL,
shapefile_outline_col = NULL,
shp_outwidth = 1,
raster_col = c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c"),
interpolate_raster = NULL,
raster_breaks = NULL,
discrete_raster = NULL
)
Arguments
dat |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. The coordinates of each row should be indicated by columns named Longitude and Latitude. Alternatively, see the Latitude_col and Longitude_col arguments. |
col |
Character vector indicating the colors you wish to use for plotting; two colors are allowed. The first color is the fill color and the second is the outline color. For example, for red points with a black outline, set col = c("#FF0000", "#000000"). |
size |
Numeric. The size of the points to plot. |
Lat_buffer |
Numeric. A buffer to customize visualization. This results in extra space in your map, so that your points are not cut off and so that the whole world is not plotted. |
Long_buffer |
Numeric. A buffer to customize visualization. This results in extra space in your map, so that your points are not cut off and so that the whole world is not plotted. |
Latitude_col |
Numeric. The number of the column indicating the latitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Latitude column. |
Longitude_col |
Numeric. The number of the column indicating the longitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Longitude column. |
group |
Character. The group that each point belongs to; this could be a species, population, etc. This is used in conjunction with the group_col parameter to fill each point in the group the same color. |
group_col |
Character. A color or color vector indicating the color to fill each point with on the map. The groups will be colored in alphabetical order. If your group_col = c("red","blue","purple") and groups = c("B","C","A"), for example, the points from group A will be red, group B will be blue, and group C will be purple. |
country_code |
Character. A country code or vector of country codes from the R package geodata specifying the country that you want to plot administrative borders for (e.g, US states). You can determine the correct codes using geodata's |
shapefile |
Character. A file name, vector of file names of a shapefile(s) to plot on the map, or a spatvector object that is compatible with the R package terra. This should be used in conjunction with the shapefile_plot_position argument. |
raster |
Character. A file name or a spatraster object that is compatible with the terra R package. This should be used in conjunction with the raster_plot_position argument. |
legend_pos |
Character. The desired position of the legend. The default is "none", which removes the legend. Other options include "left", "right", "top" or "bottom". Please see the ggplot2 documentation for all of the legend placement options. |
scale_bar |
Boolean. Whether or not to add a scale bar. Note that maps with large areas or those that use unprojected spatial data (e.g., WGS 84) will generate a warning that the scale bar varies. |
north_arrow |
Boolean. Whether or not to add a north arrow. |
north_arrow_style |
Character. Which style of north arrow to add. See ggspatial documentation for more details. |
north_arrow_position |
Character. The position of the north arrow. See ggspatial documentation for more details. |
shapefile_plot_position |
Numeric. A number indicating which position to plot the shapefile in. The options are 1, which plots the shapefile on top of the base world map (under points and administrative boundaries), 2 which plots the shapefile on top of administrative boundaries (but under points), and 3, which plots the shapefile on top of everything. |
raster_plot_position |
Numeric. A number indicating which position to plot the raster in. The options are 1, which plots the raster on top of the base world map (under points and administrative boundaries), 2 which plots the raster on top of administrative boundaries (but under points), and 3, which plots the raster on top of everything. |
shapefile_col |
Character. A color or color vector indicating the color to fill the shapefile(s) with. Similar to |
shapefile_outline_col |
Character. A color indicating the outline color of the shapefile. |
shp_outwidth |
Numeric. The width of the shapefile outline. |
raster_col |
Character. A character vector indicating the colors used to visualize the raster. The function will separate your raster data into the same number of bins as there are colors. If you provide 5 colors, for example, there will be 5 bins. |
interpolate_raster |
Boolean. Whether or not to interpolate the raster. The default is to interpolate the raster. |
raster_breaks |
Numeric or Character vector. Values to be used as breaks for the raster surface. |
discrete_raster |
Boolean. Indicating whether or not the raster being supplied is discrete. |
Value
A ggplot object.
Author(s)
Keaka Farleigh
Examples
data("HornedLizard_Pop")
Test <- Plot_coordinates(HornedLizard_Pop)
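The raster_col description above states that raster values are split into as many bins as there are colors. The base R sketch below illustrates that idea on hypothetical cell values. It assumes equal-width bins created with cut(), which may differ from how the package bins values internally.

```r
# A five-color palette, so five bins
raster_col <- c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c")

# Hypothetical raster cell values (e.g., elevation in meters)
set.seed(42)
cell_values <- runif(200, min = 0, max = 3000)

# Assumption: equal-width bins, one bin per color
bin_id <- cut(cell_values, breaks = length(raster_col), labels = FALSE)

# Map each cell to the color of its bin
cell_col <- raster_col[bin_id]

# Number of cells that fall in each bin
table(bin_id)
```

If specific cut points are needed instead of evenly spaced ones, the raster_breaks argument is the documented way to supply them.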
A function to map statistics as colored points on a map.
Description
A function to map statistics as colored points on a map.
Usage
Point_map(
dat,
statistic,
size = 3,
breaks = NULL,
col,
out.col = NULL,
Lat_buffer = 1,
Long_buffer = 1,
Latitude_col = NULL,
Longitude_col = NULL,
country_code = NULL,
shapefile = NULL,
raster = NULL,
legend_pos = "none",
scale_bar = FALSE,
north_arrow = FALSE,
north_arrow_style = ggspatial::north_arrow_nautical(),
north_arrow_position = NULL,
shapefile_plot_position = NULL,
raster_plot_position = NULL,
shapefile_col = NULL,
shapefile_outline_col = NULL,
shp_outwidth = 1,
raster_col = c("#2c7bb6", "#abd9e9", "#ffffbf", "#fdae61", "#d7191c"),
interpolate_raster = NULL,
raster_breaks = NULL,
discrete_raster = NULL
)
Arguments
dat |
Data frame or character string that supplies the input data. If it is a character string, the file should be a csv. The first column should be the statistic to be plotted. The coordinates of each row should be indicated by columns named Longitude and Latitude. Alternatively, see the Longitude_col and Latitude_col arguments. |
statistic |
Character string. The statistic to be plotted. |
size |
Numeric. The size of the points to plot. |
breaks |
Numeric. The breaks used to generate the color ramp when plotting. Users should supply 3 values if custom breaks are desired. |
col |
Character vector indicating the colors you wish to use for plotting. Three colors are allowed (low, mid, high): the first color will be the low color, the second the middle, and the third the high. |
out.col |
Character. A color for outlining points on the map. There will be no visible outline if left as NULL. |
Lat_buffer |
Numeric. A buffer to customize visualization. |
Long_buffer |
Numeric. A buffer to customize visualization. |
Latitude_col |
Numeric. The number of the column indicating the latitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Latitude column. |
Longitude_col |
Numeric. The number of the column indicating the longitude for each sample. If this is not null, PopGenHelpR will use this column instead of looking for the Longitude column. |
country_code |
Character. A country code or vector of country codes from the R package geodata specifying the country that you want to plot administrative borders for (e.g., US states). You can determine the correct codes using geodata's |
shapefile |
Character. A file name or vector of file names of the shapefile(s) to plot on the map, or a SpatVector object that is compatible with the R package terra. This should be used in conjunction with the shapefile_plot_position argument. |
raster |
Character. A file name or a SpatRaster object that is compatible with the terra R package. This should be used in conjunction with the raster_plot_position argument. |
legend_pos |
Character. The desired position of the legend. The default is "none", which removes the legend. Other options include "left", "right", "top" or "bottom". Please see the ggplot2 documentation for all of the legend placement options. |
scale_bar |
Boolean. Whether or not to add a scale bar. Note that maps with large areas or those that use unprojected spatial data (e.g., WGS 84) will generate a warning that the scale bar varies. |
north_arrow |
Boolean. Whether or not to add a north arrow. |
north_arrow_style |
Character. Which style of north arrow to add. See ggspatial documentation for more details. |
north_arrow_position |
Character. The position of the north arrow. See ggspatial documentation for more details. |
shapefile_plot_position |
Numeric. A number indicating which position to plot the shapefile in. The options are 1, which plots the shapefile on top of the base world map (under points and administrative boundaries); 2, which plots the shapefile on top of administrative boundaries (but under points); and 3, which plots the shapefile on top of everything. |
raster_plot_position |
Numeric. A number indicating which position to plot the raster in. The options are 1, which plots the raster on top of the base world map (under points and administrative boundaries); 2, which plots the raster on top of administrative boundaries (but under points); and 3, which plots the raster on top of everything. |
shapefile_col |
Character. A color or color vector indicating the color to fill the shapefile(s) with. Similar to |
shapefile_outline_col |
Character. A color indicating the outline color of the shapefile. |
shp_outwidth |
Numeric. The width of the shapefile outline. |
raster_col |
Character. A character vector indicating the colors used to visualize the raster. The function will separate your raster data into the same number of bins as there are colors. If you provide 5 colors, for example, there will be 5 bins. |
interpolate_raster |
Boolean. Whether or not to interpolate the raster. The default is to interpolate the raster. |
raster_breaks |
Numeric or Character vector. Values to be used as breaks for the raster surface. |
discrete_raster |
Boolean. Indicating whether or not the raster being supplied is discrete. |
Value
A list containing maps and the data frames used to generate them.
Author(s)
Keaka Farleigh
Examples
data(Het_dat)
Test <- Point_map(Het_dat, statistic = "Heterozygosity")
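The argument descriptions above ask for a statistic in the first column, coordinate columns named Longitude and Latitude, three break values, and three colors. The sketch below builds inputs of that shape in base R. The data frame, its values, and the choice of the range midpoint as the middle break are hypothetical and only illustrate the expected layout.

```r
# Hypothetical per-population heterozygosity with coordinates.
# The statistic is in the first column; coordinates use the expected names.
het <- data.frame(
  Heterozygosity = c(0.12, 0.18, 0.25, 0.31),
  Pop = c("North", "East", "South", "West"),
  Longitude = c(-112.1, -110.4, -111.7, -113.9),
  Latitude = c(36.5, 34.2, 32.8, 35.1)
)

# Three custom breaks (low, mid, high) spanning the observed range.
# Assumption: the midpoint of the range is a reasonable middle break.
stat_range <- range(het$Heterozygosity)
brks <- c(stat_range[1], mean(stat_range), stat_range[2])

# Three colors in low, mid, high order
cols <- c("#2c7bb6", "#ffffbf", "#d7191c")

# Basic checks against what the argument descriptions ask for
stopifnot(
  all(c("Longitude", "Latitude") %in% names(het)),
  length(brks) == 3,
  length(cols) == 3
)
```

With PopGenHelpR loaded, these objects could then be supplied as Point_map(het, statistic = "Heterozygosity", breaks = brks, col = cols).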
A function to estimate the number of private alleles in each population.
Description
A function to estimate the number of private alleles in each population.
Usage
Private.alleles(
data,
pops,
write = FALSE,
prefix = NULL,
population_col = NULL,
individual_col = NULL
)
Arguments
data |
Character. String indicating the name of the vcf file or vcfR object to be used in the analysis. |
pops |
Character. String indicating the name of the population assignment file or dataframe containing the population assignment information for each individual in the data. This file must be in the same order as the vcf file and include columns specifying the individual and the population that individual belongs to. The first column should contain individual names and the second column should indicate the population assignment of each individual. Alternatively, you can indicate the column containing the individual and population information using the individual_col and population_col arguments. |
write |
Boolean. Optional argument indicating whether or not to write the output to files in the current working directory. This will output two files: 1) the table of private allele counts per population (named prefix_PrivateAlleles_countperpop) and 2) metadata associated with the private alleles (named prefix_PrivateAlleles_metadata). As a best practice, please supply a prefix if you write files to your working directory. |
prefix |
Character. Optional argument indicating a string that will be prepended to the output file names. Please set a prefix if write is TRUE. |
population_col |
Numeric. Optional argument (a number) indicating the column that contains the population assignment information. |
individual_col |
Numeric. Optional argument (a number) indicating the column that contains the individuals (i.e., sample name) in the data. |
Value
A list containing the count of private alleles in each population and the metadata for those alleles. The metadata is a list that contains the private allele and locus name for each population.
Author(s)
Keaka Farleigh
Examples
data("HornedLizard_Pop")
data("HornedLizard_VCF")
Test <- Private.alleles(data = HornedLizard_VCF, pops = HornedLizard_Pop, write = FALSE)
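A private allele is one that is observed in only a single population. The base R sketch below counts private alleles on a toy biallelic genotype matrix to make that definition concrete. It is not the package's implementation; all sample names, populations, and genotypes are hypothetical, and missing data are not handled.

```r
# Toy genotypes: rows = individuals, columns = loci,
# values = number of copies of the alternate allele (0, 1, or 2)
geno <- matrix(c(0, 0, 1,
                 0, 1, 2,
                 0, 0, 0,
                 2, 0, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("locus", 1:3)))

# Population assignments: individual names first, population second
pops <- data.frame(Sample = paste0("ind", 1:4),
                   Population = c("A", "A", "B", "B"))

# The population table must be in the same order as the genotype data
stopifnot(identical(pops$Sample, rownames(geno)))

# Per population and locus: is each allele observed at least once?
by_pop <- split(as.data.frame(geno), pops$Population)
alt_present <- sapply(by_pop, function(g) colSums(g) > 0)
ref_present <- sapply(by_pop, function(g) colSums(2 - g) > 0)

# An allele is private when exactly one population carries it
private_alt <- alt_present & (rowSums(alt_present) == 1)
private_ref <- ref_present & (rowSums(ref_present) == 1)

# Count of private alleles per population
private_count <- colSums(private_alt) + colSums(private_ref)
private_count
```

In this toy data set, population A carries the alternate allele at locus2 and locus3 alone, and population B carries it at locus1 alone, so the counts are 2 and 1.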
A list representing a q-matrix and the locality information associated with the q-matrix
Description
List with two elements
Usage
data(Q_dat)
Format
A list with two elements:
- Qmat
A q-matrix with 6 columns and 30 rows. The first column lists the sample name and the remaining 5 represent the contribution of a genetic cluster to that individual's ancestry
- Loc_dat
The locality information for each individual in the q-matrix
...
Source
Data was generated by package authors.
Examples
data(Q_dat)
Qmat <- Q_dat[[1]]
rownames(Qmat) <- Qmat[,1]
Loc <- Q_dat[[2]]
Test_all <- Ancestry_barchart(anc.mat = Qmat, pops = Loc, K = 5,
plot.type = 'all', col = c('#d73027', '#fc8d59', '#e0f3f8', '#91bfdb', '#4575b4'))
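For readers preparing their own input, the sketch below builds a small q-matrix and locality table with the same layout as Q_dat (sample name first, then one column per cluster) and checks it in base R. The values are hypothetical, and averaging individual ancestry within populations is shown only as one plausible way to summarize populations, not as the package's method.

```r
# Hypothetical q-matrix for K = 3: sample name, then cluster contributions
qmat <- data.frame(Ind = paste0("ind", 1:4),
                   Cluster1 = c(0.7, 0.6, 0.1, 0.2),
                   Cluster2 = c(0.2, 0.3, 0.1, 0.2),
                   Cluster3 = c(0.1, 0.1, 0.8, 0.6))

# Hypothetical locality table: sample name first, population second
loc <- data.frame(Sample = qmat$Ind,
                  Population = c("A", "A", "B", "B"))

# K is the number of cluster columns
K <- ncol(qmat) - 1

# Each individual's ancestry proportions should sum to 1
row_totals <- rowSums(qmat[, -1])
stopifnot(isTRUE(all.equal(row_totals, rep(1, nrow(qmat)),
                           check.attributes = FALSE)))

# Assumption: summarize populations by the mean ancestry of their members
pop_q <- aggregate(qmat[, -1],
                   by = list(Population = loc$Population),
                   FUN = mean)
pop_q
```

A table like qmat and loc could then be passed to Ancestry_barchart as anc.mat and pops, with K set to the number of cluster columns and col given one color per cluster.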