Title: Conditional Copula Model for Crop Yield Forecasting
Version: 1.0.0
Description: Provides functions to model and forecast crop yields using a spatial temporal conditional copula approach. The package incorporates extreme weather covariates and Bayesian Structural Time Series models to analyze crop yield dependencies across multiple regions. Includes tools for fitting, simulating, and visualizing results. This method build upon established R packages, including 'Hofert' 'et' 'al'. (2025) <doi:10.32614/CRAN.package.copula>, 'Scott' (2024) <doi:10.32614/CRAN.package.bsts>, and 'Stephenson' 'et' 'al'. (2024) <doi:10.32614/CRAN.package.evd>.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: bsts, copula, evd, ggplot2, grDevices, rootSolve, stats, utils
Depends: R (≥ 4.0.0)
LazyData: true
LazyDataCompression: xz
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0),
VignetteBuilder: knitr
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-03-26 05:39:16 UTC; liyongkun
Author: Marie Michaelides [aut], Mélina Mailhot [aut], Yongkun Li [aut, cre]
Maintainer: Yongkun Li <yongkun.li@concordia.ca>
Repository: CRAN
Date/Publication: 2025-03-27 17:30:04 UTC

Compute Gumbel Copula Parameter from Kendall's Tau

Description

Computes the Gumbel-Hougaard copula dependence parameter based on Kendall's tau.

Usage

GH.theta(tau)

Arguments

tau

Numeric, Kendall's tau correlation coefficient.

Value

Numeric, estimated Gumbel copula parameter.


Compute Clayton Copula Parameter from Kendall's Tau

Description

Computes the Clayton copula dependence parameter based on Kendall's tau.

Usage

clayton.theta(tau)

Arguments

tau

Numeric, Kendall's tau correlation coefficient.

Value

Numeric, estimated Clayton copula parameter.


Supported copula types

Description

A list containing supported copula types.

Usage

copula_list

Format

A list of copula types.

copulas

"Gaussian" "Clayton" "Frank" "Gumbel" "Joe"


Data of the article "Probabilistic Crop Yields Forecasts With Spatio-Temporal Conditional Copula Using Extreme Weather Covariates"

Description

Contains crop yields and climate indices data of 24 CD regions in Ontario from 1950 to 2022

Usage

cropyields_covariates

Format

A data frame with 1752 rows and 38 variables:

time

chr: year from 1950-2022

CAR_CODE

num: 1-4

CAR

chr: Southern, Western, Central, Eastern Ontario

CD_CODE

num

CD

chr: 24 subregions

ID

chr

lat

num: latitude

lon

num: longitude

yield

num: wheat crop yield per census division, in bushel/acre

cdd

num: Annual maximum number of consecutive days with daily precipitation below 1mm (unit = days)

cddcold_18

num: Annual cooling degree days above 18C (unit = degree_days)

dlyfrzthw_tx0_tn

num: Annual number of days with a diurnal freeze-thaw cycle : tmax > 0 degc and tmin <= -1 degc

first_fall_frost

num: First day of year with temperature below 0 degc for at least 1 days

frost_days

num: Annual number of days with minimum daily temperature below 0C

ice_days

num: Annual number of days with maximum daily temperature below 0 degC

nr_cdd

num: The annual number of dry periods of 6 days and more, during which the maximal precipitation on a window of 6 days is under 1.0 mm

prcptot

num: Annual total precipitation (unit = mm)

r1mm

num: Annual number of days with daily precipitation over 1.0 mm/day

r10mm

num: Annual number of days with daily precipitation over 10.0 mm/day

r20mm

num: Annual number of days with daily precipitation over 20.0 mm/day

rx1day

num: Annual maximum 1-day total precipitation (unit = mm)

rx5day

num: Annual maximum 5-day total precipitation (unit = mm)

tg_mean

num: Annual mean of daily mean temperatures (unit = C degrees)

tn_mean

num: Annual mean of daily minimum temperatures (unit = C degrees)

tn_min

num: Annual minimum of daily minimum temperatures (unit = C degrees)

tnlt_-15

num: Annual number of days where daily minimum temperature is below -15 degC

tnlt_-25

num: Annual number of days where daily minimum temperature is below -25 degC

tr_18

num: Annual number of tropical nights : defined as days with minimum daily temperature above 18 degc

tr_20

num: Annual number of tropical nights : defined as days with minimum daily temperature above 20 degc

tr_22

num: Annual number of tropical nights : defined as days with minimum daily temperature above 22 degc

tx_max

num: Annual minimum of daily maximum temperature (unit = C degrees)

tx_mean

num: Annual mean of daily maximum temperature (unit = C degrees)

txgt_25

num: Annual number of days where daily maximum temperature exceeds 25 degC

txgt_27

num: Annual number of days where daily maximum temperature exceeds 27 degC

txgt_29

num: Annual number of days where daily maximum temperature exceeds 29 degC

txgt_30

num: Annual number of days where daily maximum temperature exceeds 30 degC

txgt_32

num: Annual number of days where daily maximum temperature exceeds 32 degC

Source

ClimateData.ca


Compute Dynamic Gaussian Copula Correlation Parameter (rho)

Description

Computes the time-varying correlation parameter (rho) for a Gaussian copula.

Usage

dynamic.rho(params, lagged_rho, X_t)

Arguments

params

Numeric vector of parameters: omega, alpha, and gamma coefficients.

lagged_rho

Numeric, the previous rho value.

X_t

Numeric vector or matrix of covariates at time t.

Value

Numeric, estimated dynamic Gaussian copula correlation.


Compute Dynamic Clayton Copula Parameter

Description

Computes the Clayton copula parameter dynamically based on lagged values and covariates.

Usage

dynamic.theta.clayton(params, lagged_theta, X_t)

Arguments

params

Numeric vector of parameters: omega, alpha, and gamma coefficients.

lagged_theta

Numeric, the previous theta value.

X_t

Numeric vector or matrix of covariates at time t.

Value

Numeric, estimated dynamic Clayton copula parameter.


Compute Dynamic Frank Copula Parameter

Description

Computes the Frank copula parameter dynamically based on lagged values and covariates.

Usage

dynamic.theta.frank(params, lagged_theta, X_t)

Arguments

params

Numeric vector of parameters: omega, alpha, and gamma coefficients.

lagged_theta

Numeric, the previous theta value.

X_t

Numeric vector or matrix of covariates at time t.

Value

Numeric, estimated dynamic Frank copula parameter.


Compute Dynamic Gumbel Copula Parameter

Description

Computes the Gumbel copula parameter dynamically based on lagged values and covariates.

Usage

dynamic.theta.gumbel(params, lagged_theta, X_t)

Arguments

params

Numeric vector of parameters: omega, alpha, and gamma coefficients.

lagged_theta

Numeric, the previous theta value.

X_t

Numeric vector or matrix of covariates at time t.

Value

Numeric, estimated dynamic Gumbel copula parameter.


Compute Dynamic Joe Copula Parameter

Description

Computes the Joe copula parameter dynamically based on lagged values and covariates.

Usage

dynamic.theta.joe(params, lagged_theta, X_t)

Arguments

params

Numeric vector of parameters: omega, alpha, and gamma coefficients.

lagged_theta

Numeric, the previous theta value.

X_t

Numeric vector or matrix of covariates at time t.

Value

Numeric, estimated dynamic Joe copula parameter.


Fit a Bayesian Structural Time Series (BSTS) Model

Description

Fits a BSTS model for a time series y, given a vector or matrix of covariates z.

Usage

fit_bsts(y, z, lags = 0, MCMC.iter = 5000)

Arguments

y

A numeric vector (time series response variable).

z

A numeric vector or matrix (covariates).

lags

Integer, number of lags for the autoregressive component.

MCMC.iter

Integer, number of MCMC iterations.

Value

A fitted BSTS model.


Compute Frank Copula Parameter from Kendall's Tau

Description

Computes the Frank copula dependence parameter based on Kendall's tau.

Usage

frank.theta(tau)

Arguments

tau

Numeric, Kendall's tau correlation coefficient.

Value

Numeric, estimated Frank copula parameter.


Initial Parameters for 2D Pseudo-Loglikelihood Estimation

Description

Initial Parameters for 2D Pseudo-Loglikelihood Estimation

Usage

init_params_full

Format

A numeric vector of length (2+M+4*D*M) where:

omega

Baseline autoregressive coefficient.

alpha

Parameter controlling variance.

gamma1, gamma2, gamma3

Coefficients related to external factors.

phi_gev

AR(1) coefficient for GEV.

sigma_mu

Std dev of innovations for AR(1) process for GEV.

sigma_gev

GEV scale parameter for GEV.

xi_gev

GEV shape parameter for GEV.


Initial Parameters for 2D Pseudo-Loglikelihood-Generalized Estimation

Description

Initial Parameters for 2D Pseudo-Loglikelihood-Generalized Estimation

Usage

init_params_full_G

Format

A numeric vector of length (2+M+4*D*M), structured as follows:

omega

Baseline autoregressive coefficient.

alpha

Parameter controlling variance.

gamma1, gamma2, gamma3

Coefficients related to external factors.

Climate variable parameters

For each climate variable in each region, the following parameters are included:


Initial Parameters for 2D Pseudo-Loglikelihood Estimation without GEV models for covariates

Description

Initial Parameters for 2D Pseudo-Loglikelihood Estimation without GEV models for covariates

Usage

init_params_noGEV

Format

A numeric vector of length (2+M) where:

omega

Baseline autoregressive coefficient.

alpha

Parameter controlling variance.

gamma1, gamma2, gamma3

Coefficients related to external factors.


Compute Joe Copula Parameter from Kendall's Tau

Description

Computes the Joe copula dependence parameter based on Kendall's tau.

Usage

joe.theta(tau)

Arguments

tau

Numeric, Kendall's tau correlation coefficient.

Value

Numeric, estimated Joe copula parameter.


Compute Log-Likelihood for a Generalized Dynamic Copula-GEV Model

Description

Computes the log-likelihood for a time-varying copula model combined with Generalized Extreme Value (GEV) margins.

Usage

log_likelihood_Generalized(params, U, Z, X, copula)

Arguments

params

Numeric vector of model parameters, including copula parameters (omega, alpha, gamma) and GEV distribution parameters.

U

Numeric matrix (n_train x D), pseudo-observations for the copula.

Z

Numeric array (n_train x D x M), observed data for each margin and sub-feature.

X

Numeric matrix (n_train x M), risk factors for the dynamic copula parameter.

copula

Character, specifying the copula type: "Clayton", "Frank", "Gumbel", "Joe", or "Gaussian".

Value

Numeric, negative log-likelihood value.

Examples

test_ll <- log_likelihood_Generalized(init_params_full_G,uu,
                      zz_train,xx_train,"Gaussian")


Generalized Log-Likelihood Function for 2D Copula-GEV Model

Description

Computes the negative log-likelihood of a 2-dimensional copula-GEV model, incorporating dynamic Generalized Extreme Value (GEV) parameters and a time-varying copula structure.

Usage

log_likelihood_generalized_2d(params, u1, u2, X_t, z1, z2, copula)

Arguments

params

Numeric vector, model parameters including copula and GEV parameters.

u1

Numeric vector (length n_train), pseudo-observations for margin 1.

u2

Numeric vector (length n_train), pseudo-observations for margin 2.

X_t

Numeric matrix (⁠n_train x M⁠), risk factors affecting copula parameters.

z1

Numeric matrix (⁠n_train x M⁠), observed data for margin 1.

z2

Numeric matrix (⁠n_train x M⁠), observed data for margin 2.

copula

Character, specifying the copula type: "Clayton", "Frank", "Gumbel", "Joe", or "Gaussian".

Value

The negative log-likelihood value for optimization.

Examples

test_ll_2d <-log_likelihood_generalized_2d(init_params_full,
                                  uu[,1],
                                  uu[,2],
                                  xx_train,
                                  zz_train[,1,],
                                  zz_train[,2,],
                                  "Gaussian")


Compute Log-Likelihood for a Generalized Dynamic Copula Model without GEV covariates

Description

Computes the log-likelihood for a time-varying copula model.

Usage

log_likelihood_noGEV(params, U, Z, X, copula)

Arguments

params

Numeric vector of model parameters, including copula parameters (omega, alpha, gamma).

U

Numeric matrix (n_train x D), pseudo-observations for the copula.

Z

Numeric array (n_train x D x M), observed data for each margin and sub-feature.

X

Numeric matrix (n_train x M), risk factors for the dynamic copula parameter.

copula

Character, specifying the copula type: "Clayton", "Frank", "Gumbel", "Joe", or "Gaussian".

Value

Numeric, negative log-likelihood value.

Examples

test_ll_noGEV <- log_likelihood_noGEV(init_params_noGEV,uu,
                      zz_train,x_train,"Gaussian")


list containing Dufferin and Wellington

Description

list containing Dufferin and Wellington

Usage

medoid_names

Format

An object of class list of length 2.


19

Description

19

Usage

n_test

Format

An object of class integer of length 1.


54

Description

54

Usage

n_train

Format

An object of class integer of length 1.


Plot Observed Data and BSTS Forecast

Description

Creates a plot of observed data, forecasted values, and confidence intervals.

Usage

plot_forecast(
  forecast,
  data_train,
  data_test,
  time,
  quant_high,
  quant_low,
  observed_col,
  forecast_col,
  title
)

Arguments

forecast

A matrix of BSTS forecast samples.

data_train

Numeric vector, training data.

data_test

Numeric vector, test data.

time

Numeric vector, representing time indices.

quant_high

Numeric, upper quantile for confidence interval.

quant_low

Numeric, lower quantile for confidence interval.

observed_col

Character, color for observed data.

forecast_col

Character, color for forecasted data.

title

Character, title of the plot.

Value

A ggplot2 object.


Compare Forecasts from Two Models

Description

Generates a time series plot comparing the forecasts from two models along with observed data.

Usage

plot_forecast_compare(
  forecast1,
  forecast2,
  data_train,
  data_test,
  time,
  quant_high,
  quant_low,
  col1,
  title
)

Arguments

forecast1

Numeric matrix, forecasted values from the first model (columns: time points).

forecast2

Numeric matrix, forecasted values from the second model (columns: time points).

data_train

Numeric vector, training data used for modeling.

data_test

Numeric vector, actual test data for evaluation.

time

Numeric vector, representing the time points corresponding to the data.

quant_high

Numeric, upper quantile (e.g., 0.9) for confidence interval.

quant_low

Numeric, lower quantile (e.g., 0.1) for confidence interval.

col1

Character, color for observed data lines.

title

Character, title for the plot.

Value

A ggplot2 object showing the forecast comparison.


Simulate Multivariate Crop Yield Data Using a Generalized Copula-BSTS Model Without GEV Covariates

Description

This function simulates multivariate crop yield data using a time-varying copula combined with Bayesian Structural Time Series (BSTS) models without GEV covariates for comparision.

Usage

simul.fun.noGEV(
  nsim = 100,
  n_train,
  n_test,
  copula,
  init_params,
  fn,
  U_train,
  Z_train,
  Z_test,
  X_train,
  X_test,
  Y_test,
  BSTS_list
)

Arguments

nsim

Integer, number of simulation replications.

n_train

Integer, number of training observations.

n_test

Integer, number of test observations.

copula

Character, specifying the copula type: "Clayton", "Frank", "Gumbel", "Joe", or "Gaussian".

init_params

Numeric vector, initial parameter values for optimization.

fn

Function, log-likelihood function for parameter estimation.

U_train

Numeric matrix (n_train x D), pseudo-observations for the copula.

Z_train

Numeric array (n_train x D x M), observed data for each margin and sub-feature.

Z_test

Numeric array (n_test x D x M), observed data for each margin and sub-feature.

X_train

Numeric matrix (n_train x M), risk factors for the dynamic copula parameter.

X_test

Numeric matrix (n_test x M), risk factors for the dynamic copula parameter.

Y_test

Numeric matrix (n_test x D), true future values for MSE calculation.

BSTS_list

List of length D, each element is a BSTS model for a different margin.

Value

A list containing:

optim_results

Results from the optimization process.

theta_sim

Simulated copula parameters across replications.

Y_sim

Simulated final BSTS-based forecasts.

MSE

Mean squared error for each simulation run.


A Special Case of simulation_generalized in 2 Dimensions

Description

A Special Case of simulation_generalized in 2 Dimensions

Usage

simul_fun_generalized_2d(
  nsim,
  n_train,
  n_test,
  copula,
  init_params,
  fn,
  u1,
  u2,
  z1_train,
  z2_train,
  X_t,
  y1_test,
  y2_test,
  BSTS_1,
  BSTS_2
)

Arguments

nsim

Integer, number of simulation replications.

n_train

Integer, number of training observations.

n_test

Integer, number of test observations.

copula

Character, specifying the copula type: "Clayton", "Frank", "Gumbel", "Joe", or "Gaussian".

init_params

Numeric vector, initial parameter values for optimization.

fn

Function, log-likelihood function for parameter estimation.

u1

Numeric vector (n_train), first pseudo-observation for the copula.

u2

Numeric vector (n_train), second pseudo-observation for the copula.

z1_train

Numeric matrix (n_train x M), observed data for the first margin.

z2_train

Numeric matrix (n_train x M), observed data for the second margin.

X_t

Numeric matrix (n_train x M), risk factors for the dynamic copula parameter.

y1_test

Numeric vector (n_test), true future values for the first response variable.

y2_test

Numeric vector (n_test), true future values for the second response variable.

BSTS_1

Fitted BSTS model for the first response variable.

BSTS_2

Fitted BSTS model for the second response variable.

Value

A list containing:

theta_simulated

Simulated copula parameters across replications.

y1_simulated

Simulated values for the first response variable.

y2_simulated

Simulated values for the second response variable.

MSE

Mean squared error for each simulation run.

optim_results

Results from the optimization process.


Simulate Multivariate Crop Yield Data Using a Generalized Copula-GEV-BSTS Model

Description

This function simulates multivariate crop yield data using a time-varying copula combined with Generalized Extreme Value (GEV) margins and Bayesian Structural Time Series (BSTS) models.

Usage

simulation_generalized(
  nsim = 100,
  n_train,
  n_test,
  copula,
  init_params,
  fn,
  U_train,
  Z_train,
  X,
  Y_test,
  BSTS_list
)

Arguments

nsim

Integer, number of simulation replications.

n_train

Integer, number of training observations.

n_test

Integer, number of test observations.

copula

Character, specifying the copula type: "Clayton", "Frank", "Gumbel", "Joe", or "Gaussian".

init_params

Numeric vector, initial parameter values for optimization.

fn

Function, log-likelihood function for parameter estimation.

U_train

Numeric matrix (n_train x D), pseudo-observations for the copula.

Z_train

Numeric array (n_train x D x M), observed data for each margin and sub-feature.

X

Numeric matrix (n_train x M), risk factors for the dynamic copula parameter.

Y_test

Numeric matrix (n_test x D), true future values for MSE calculation.

BSTS_list

List of length D, each element is a BSTS model for a different margin.

Value

A list containing:

optim_results

Results from the optimization process.

theta_sim

Simulated copula parameters across replications.

Y_sim

Simulated final BSTS-based forecasts.

MSE

Mean squared error for each simulation run.


1950-2022

Description

1950-2022

Usage

time_all

Format

An object of class character of length 73.


2004-2022

Description

2004-2022

Usage

time_test

Format

An object of class character of length 19.


1950-2003

Description

1950-2003

Usage

time_train

Format

An object of class character of length 54.


Pseudo-Observations of BSTS Residuals for Crop Yield Forecasting

Description

Pseudo-Observations of BSTS Residuals for Crop Yield Forecasting

Usage

uu

Format

A matrix with dimensions (n_train, D):

n_train

Number of time points used in the training set.

D

Number of regions analyzed (Dufferin, Wellington).

Source

Derived from residuals of BSTS models fitted to crop yield data.


Maximized Covariates Matrix for Crop Yield Forecasting

Description

Maximized Covariates Matrix for Crop Yield Forecasting

Usage

xx_all

Format

A three-dimensional array with dimensions (n_train+n_test, M):

n_train+n_test

Number of time points used in the training set.

M

Number of selected climate covariates used for modeling (cdd,frost_days,rx1day, tg_mean, txgt_25).

Source

Derived from historical climate data from ClimateData.ca.


Maximized Covariates Matrix for Crop Yield Forecasting

Description

Maximized Covariates Matrix for Crop Yield Forecasting

Usage

xx_test

Format

A three-dimensional array with dimensions (n_test, M):

n_test

Number of time points used in the testing set.

M

Number of selected climate covariates used for modeling (cdd,frost_days,rx1day, tg_mean, txgt_25).

Source

Derived from historical climate data from ClimateData.ca.


Maximized Covariates Matrix for Crop Yield Forecasting

Description

Maximized Covariates Matrix for Crop Yield Forecasting

Usage

xx_train

Format

A three-dimensional array with dimensions (n_train, M):

n_test

Number of time points used in the training set.

M

Number of selected climate covariates used for modeling (cdd,frost_days,rx1day, tg_mean, txgt_25).

Source

Derived from historical climate data from ClimateData.ca.


Crop Yield Data

Description

Crop Yield Data

Usage

yy_all

Format

A matrix with dimensions (n_train+n_test, D):

n_train+n_test

Number of time points used in the test set.

D

Number of regions analyzed (Dufferin, Wellington).

Source

Historical crop yield records from ClimateData.ca.


Crop Yield Data for Testing in BSTS Models

Description

Crop Yield Data for Testing in BSTS Models

Usage

yy_test

Format

A matrix with dimensions (n_train, D):

n_train

Number of time points used in the test set.

D

Number of regions analyzed (Dufferin, Wellington).

Source

Historical crop yield records from ClimateData.ca.


Crop Yield Data for Training in BSTS Models

Description

Crop Yield Data for Training in BSTS Models

Usage

yy_train

Format

A matrix with dimensions (n_test, D):

n_test

Number of time points used in the train set.

D

Number of regions analyzed (Dufferin, Wellington).

Source

Historical crop yield records from ClimateData.ca.


Standardized Covariates Array for Crop Yield Forecasting

Description

Standardized Covariates Array for Crop Yield Forecasting

Usage

zz_all

Format

A three-dimensional array with dimensions (n_train+n_test, D, M):

n_train+n_test

Number of time points used in the training set.

D

Number of regions analyzed (Dufferin, Wellington).

M

Number of selected climate covariates used for modeling (cdd,frost_days,rx1day, tg_mean, txgt_25).

Source

Derived from historical climate data.


Standardized Covariates Array for Crop Yield Forecasting

Description

Standardized Covariates Array for Crop Yield Forecasting

Usage

zz_test

Format

A three-dimensional array with dimensions (n_test, D, M):

n_test

Number of time points used in the testing set.

D

Number of regions analyzed (Dufferin, Wellington).

M

Number of selected climate covariates used for modeling (cdd,frost_days,rx1day, tg_mean, txgt_25).

Source

Derived from historical climate data.


Standardized Covariates Array for Crop Yield Forecasting

Description

Standardized Covariates Array for Crop Yield Forecasting

Usage

zz_train

Format

A three-dimensional array with dimensions (n_train, D, M):

n_test

Number of time points used in the training set.

D

Number of regions analyzed (Dufferin, Wellington).

M

Number of selected climate covariates used for modeling (cdd,frost_days,rx1day, tg_mean, txgt_25).

Source

Derived from historical climate data from ClimateData.ca.