Calibrates a model using the IMABC algorithm.

Usage

imabc(
  target_fun,
  priors = NULL,
  targets = NULL,
  N_start = 1,
  N_centers = 1,
  Center_n = 50,
  N_cov_points = 0,
  N_post = 100,
  sample_inflate = 1.5,
  max_iter = 1000,
  seed = NULL,
  latinHypercube = TRUE,
  backend_fun = NULL,
  output_directory = NULL,
  output_tag = "timestamp",
  previous_results_dir = NULL,
  previous_results_tag = NULL,
  verbose = TRUE,
  validate_run = TRUE
)

Arguments

target_fun

A function that generates target values given parameters (i.e., "the model"). Using define_target_function is strongly advised to ensure that the function takes in the correct values and correctly returns results.

priors

A priors object created using define_priors. This contains information regarding the parameters that are being calibrated. Ignored when starting from previous results.

targets

A targets object created using define_targets. This contains information regarding the target values which will be used to evaluate simulated parameters. Ignored when starting from previous results.

N_start

numeric(1). The number of draws to simulate for the first iteration.

N_centers

numeric(1). The number of centers to use for exploring the parameter space.

Center_n

numeric(1). The number of points to add around each center.

N_cov_points

numeric(1). The minimum number of points used to estimate the covariance matrix of valid parameters nearest each center point. The covariance matrix is used when simulating new parameter draws around the center. If 0 (default), uses 25*number of parameters.

N_post

numeric(1). The weighted sample size that must be achieved using valid parameter values for the algorithm to stop.

sample_inflate

numeric(1). When generating new results for a given center, how many additional samples should be simulated to ensure that enough valid (within range) parameter draws are simulated for the center.

max_iter

numeric(1). The maximum number of iterations to attempt.

seed

numeric(1). The seed to set for reproducibility.

latinHypercube

logical(1). Should the algorithm use a Latin hypercube to generate the first set of parameters?

backend_fun

function. For advanced users only. Lets the user evaluate the target function(s) using their own backend, i.e., simulate targets with an alternative parallel method. Only necessary if the backend method is not compatible with foreach. See details for requirements.

output_directory

character(1). Path to save results to. If NULL (default), no results are saved. If a path is provided, results are saved/updated every iteration. See details for more information.

output_tag

character(1). Tag to add to result file names. "timestamp" (default) is a special code that adds the time and date the code was executed.

previous_results_dir

Optional character(1). Path to results stored during a previous run. If the user wishes to restart a run that didn't complete the calibration, they can continue by using the outputs stored during the previous run.

previous_results_tag

Optional character(1). The tag that was added to the previous run output files.

verbose

logical(1). Prints out progress messages and additional information as the model works.

validate_run

logical(1). If this is TRUE and an output_directory is specified, the function will save all parameters generated by the model - even ones that were deemed invalid based on their simulated targets.

Value

A list with:

  • good_parm_draws - a data.table of valid parameters for the current target bounds

  • good_sim_target - a data.table of simulated target results from good_parm_draws parameters

  • good_target_dist - a data.table of distances based on simulated good target results

  • mean_cov - a data.frame of the means and covariances of parameters for iterations that had more good parameters than N_cov_points

  • priors - The prior object with empirical standard deviation from first N_start generated values

  • targets - The target object with updated bounds based on calibration

  • metaddata - Important info regarding the function inputs and current set of results including current_iteration (the last iteration that completed) and last_draw (the total number of draws simulated during execution)

If validate_run = TRUE, the list also includes:

  • all_iter_parm_draws - all parameters generated by the algorithm, even ones that result in target values outside of the current target bounds

  • all_iter_sim_target - all simulated target values from the parameters in all_iter_parm_draws

  • all_iter_target_dist - all distances based on simulated target results

Details

The user specifies the calibrated parameters, their prior distributions, calibration targets with initial and final acceptance intervals, and the function (i.e., the model) used to generate targets given calibrated parameters. The algorithm begins by drawing a sample of vectors from the parameter space based on the prior distributions. This initial sample can be drawn using a Latin hypercube. The algorithm identifies and retains parameter vectors whose generated targets fall within the current acceptance intervals. The algorithm iteratively updates this sample and narrows the acceptance intervals until either 1) it reaches the final acceptance intervals around each target and identifies the requested sample of parameter vectors that generate targets within these intervals, or 2) it completes the maximum number of iterations. The algorithm can be restarted to continue iterating.
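The idea of retaining draws inside acceptance intervals that narrow over time can be illustrated with a toy example in base R. This is only a sketch of plain rejection filtering under made-up assumptions: it omits the incremental mixture sampling around centers and the weighting that IMABC performs, and toy_model, keep_within, the priors, and the interval bounds are all hypothetical.

```r
set.seed(42)

# Toy model (hypothetical): one target computed from two parameters.
toy_model <- function(x1, x2) x1 + x2

# Draw parameter vectors from independent uniform priors.
n_draws <- 5000
draws <- data.frame(x1 = runif(n_draws, 0, 1), x2 = runif(n_draws, 0, 1))
draws$T1 <- toy_model(draws$x1, draws$x2)

# Keep only the draws whose simulated target lies inside [lower, upper].
keep_within <- function(d, lower, upper) {
  d[d$T1 >= lower & d$T1 <= upper, ]
}

# Start with a wide acceptance interval, then narrow it to the final one.
initial <- keep_within(draws, 0.5, 1.5)
final <- keep_within(initial, 0.9, 1.1)

# Number of retained parameter vectors at each stage.
c(initial = nrow(initial), final = nrow(final))
```

In the real algorithm, new draws are simulated around selected centers between narrowing steps, so the retained sample is replenished rather than only shrinking as it does here.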

A technical description of the imabc algorithm is provided in Rutter CM, Ozik J, DeYoreo M, Collier N. Microsimulation model calibration using incremental mixture approximate Bayesian computation. Ann. Appl. Stat. 13 (2019), no. 4, 2189-2212. doi:10.1214/19-AOAS1279. https://projecteuclid.org/euclid.aoas/1574910041.

The imabc package implements a small modification to the approach described in the 2019 AOAS paper. In the imabc package, the user specifies initial and final acceptance intervals directly. This approach is more flexible than the approach described in the paper and more easily incorporates asymmetric acceptance intervals.

N_cov_points relation to the number of parameters:

When the algorithm has enough quality draws, it estimates the covariance between parameters and uses these relationships to improve future simulations of parameters. However, this only works if the covariance matrix is not singular. When a covariance matrix is singular, imabc replaces it with an independent covariance matrix (a diagonal matrix of the parameter variances) to avoid calculation errors. Setting N_cov_points below the number of parameters will produce a singular covariance matrix. The algorithm can still run, but it will be less efficient and may not be able to calibrate completely.
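The fallback described above can be illustrated in base R. This is a minimal sketch, not the imabc implementation: the helper name safe_cov and the rank-based singularity check are assumptions made for illustration.

```r
# Hypothetical helper (not part of imabc): return the sample covariance of
# `draws`, or a diagonal matrix of the variances when it is singular.
safe_cov <- function(draws) {
  sigma <- cov(draws)
  # ASSUMPTION: singularity is detected by a rank below the dimension.
  if (qr(sigma)$rank < ncol(sigma)) {
    sigma <- diag(diag(sigma), nrow = ncol(sigma))
    dimnames(sigma) <- list(colnames(draws), colnames(draws))
  }
  sigma
}

set.seed(1)
parm_names <- c("x1", "x2", "x3")

# Three parameters but only two draws: the sample covariance is singular,
# so the diagonal (independent) matrix is returned.
few_draws <- matrix(runif(6), nrow = 2, dimnames = list(NULL, parm_names))
safe_cov(few_draws)

# With many more draws than parameters the full covariance is kept.
many_draws <- matrix(runif(300), nrow = 100, dimnames = list(NULL, parm_names))
safe_cov(many_draws)
```

The diagonal matrix discards the correlations between parameters, which is why new draws around a center become less well targeted when too few points are available.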

Custom Backend Function:

The primary run handler takes each row from the simulated draws and provides the appropriate information to the target_fun function as inputs. This includes pulling the parameter values as a named vector, pulling a unique seed generated for each set of parameters, as well as passing the current priors and targets objects. This is done using the foreach function from the foreach package. This allows the user to register their own preferred parallel backend before running the imabc function so long as it is compatible with foreach. If the user does not provide a parallel backend, foreach will run the analysis in sequence by default and provide a warning indicating such the first time the imabc function is run within a session.

However, since not all parallel backends are compatible with this method, we have provided a way for the user to add their own run handling method. To use this feature, the user must create a function that meets a couple of requirements.

The first requirement is that the backend function have inputs in the following order: the data.table of all parameters to be evaluated, the names of all the parameters being calibrated, the target function to be used for evaluating parameters, and a list that includes the priors object and the targets object. The user can name these inputs whatever they prefer, but the correct order and number of inputs will be expected (i.e., the user must create a function with four inputs, the first will be the parameter data.table, and so on). The user can use any information passed through these inputs. This includes unique seed values passed as a column of the parameter data.table (called "seed"), and the current targets and priors objects passed in the fourth input. The priors and targets objects are named priors and targets respectively in the fourth input list.

The last requirement is that the returned object be a data.table of simulated target values. Each row represents a set of results from the target_fun for a given set of parameters and each column represents a target value based on the targets object. If the final output of the custom backend returns a data.table with column names identical to the target names, the order of the columns will be verified by imabc. If the final output of the backend does not include column names that match the target names, the user must ensure that they are in the same order as the targets object. If they are not in the appropriate order, information may be attached to the wrong target and lead to errors.

Do not use the custom backend unless you are confident you understand what is expected of the run handler. To get a better understanding of what is being done, run View(imabc:::run_handler) in the console to see how the backend_fun is being used.
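The two requirements above can be sketched as a simple sequential backend. This is an illustration under stated assumptions, not the package's own handler: the names my_backend and toy_target_fun are hypothetical, and the way target_fun is called here (a named parameter vector followed by seed, targets, and priors arguments) is an assumption based on the description above that should be checked against imabc:::run_handler. The data.table package is required.

```r
library(data.table)

# Hypothetical sequential backend. The four inputs follow the order
# described above; their names are arbitrary.
my_backend <- function(parms_dt, parm_names, target_fun, other_inputs) {
  results <- lapply(seq_len(nrow(parms_dt)), function(row_id) {
    # Parameter values for this draw as a named vector.
    parm_vec <- unlist(parms_dt[row_id, parm_names, with = FALSE])
    # ASSUMPTION: target_fun accepts the named vector plus seed, targets,
    # and priors arguments.
    sim <- target_fun(
      parm_vec,
      seed = parms_dt$seed[row_id],
      targets = other_inputs$targets,
      priors = other_inputs$priors
    )
    as.list(sim)
  })
  # One row per parameter set, one column per target.
  rbindlist(results)
}

# Toy stand-in for a target function, used only to exercise the backend.
toy_target_fun <- function(parms, seed, targets, priors) {
  c(T1 = parms[["x1"]] + parms[["x2"]], T2 = parms[["x1"]] * parms[["x2"]])
}

parms <- data.table(x1 = c(0.2, 0.4), x2 = c(0.5, 0.5), seed = c(1L, 2L))
sim_targets <- my_backend(
  parms, c("x1", "x2"), toy_target_fun,
  list(priors = NULL, targets = NULL)
)
print(sim_targets)
```

Because toy_target_fun returns a named vector, the resulting columns carry the target names, which is the case in which imabc can verify the column order. A real backend would replace the lapply loop with its own parallel mechanism.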

Output Files:

If an output directory is specified, files are saved for each of the objects returned by the function. They are named as follows:

  • Good_SimulatedParameters_tag.csv = good_parm_draws

  • Good_SimulatedTargets_tag.csv = good_sim_target

  • Good_SimulatedDistances_tag.csv = good_target_dist

  • MeanCovariance_tag.csv = mean_cov

  • CurrentPriors_tag.csv = priors

  • CurrentTargets_tag.csv = targets

  • RunMetadata_tag.csv = metaddata

If validate_run = TRUE, the following files are also saved:

  • SimulatedParameters_tag.csv = all_iter_parm_draws

  • SimulatedTargets_tag.csv = all_iter_sim_target

  • SimulatedDistances_tag.csv = all_iter_target_dist
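Following the naming pattern above, a saved file can be located and read back with base R. This is a sketch only: the helper imabc_output_path and the tag "run1" are hypothetical, and a small stand-in file is written first so that the example runs without a real calibration.

```r
# Hypothetical helper (not part of imabc): build the path of one output
# file from the directory, the file prefix, and the tag.
imabc_output_path <- function(output_directory, prefix, tag) {
  file.path(output_directory, paste0(prefix, "_", tag, ".csv"))
}

out_dir <- tempdir()
tag <- "run1"  # made-up tag, standing in for the output_tag of a real run

path <- imabc_output_path(out_dir, "Good_SimulatedParameters", tag)

# Stand-in file so the example is self-contained; a real run would have
# written this file itself.
write.csv(data.frame(x1 = c(0.3, 0.4), x2 = c(0.5, 0.6)), path,
          row.names = FALSE)

good_parm_draws <- read.csv(path)
str(good_parm_draws)
```

When output_tag is left at "timestamp", the tag portion of the file name is the date and time of the run, so the tag has to be read from the file names in the output directory.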