Title: | Generalized Convergence Diagnostics for Difficult MCMC Algorithms |
---|---|
Description: | Trace plots and convergence diagnostics for Markov Chain Monte Carlo (MCMC) algorithms on highly multivariate or unordered spaces. Methods outlined in a forthcoming paper. |
Authors: | Luke Duttweiler [aut, cre, cph] |
Maintainer: | Luke Duttweiler <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.3.9000 |
Built: | 2025-03-10 04:49:37 UTC |
Source: | https://github.com/lukeduttweiler/genmcmcdiag |
Results from a Bayesian Network Metropolis-Hastings algorithm run on simulated data. Included for examples.
bnMCMCResults
A list with 5 elements, each representing a different MCMC chain. Each element is a list of data.frames describing a partition arrangement of a Bayesian Network.
Luke Duttweiler
For an MCMC draw D_x from a DPMM, let Z_x be the vector of Z-scores of the observations, each computed relative to that observation's current group, and let A_x be the 0/1 adjacency matrix where A_x[i, j] = 1 if observations i and j are in the same group in draw D_x and 0 otherwise (so the diagonal is always 1s). Then we define the DPMM distance between D_x and D_y as:
dpmmDistance(x, y)
x | List with elements 'Zscore' and 'Adj' |
y | List with elements 'Zscore' and 'Adj', both of same dimensions as in x. |
Numeric, DPMM distance between x and y.
For speed, there is no error handling when x and y do not have the same dimensions. The function will still fail if either 'Zscore' or 'Adj' is missing.
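The inputs described above can be assembled from a vector of observations and a vector of group labels. The following is a minimal base R sketch, not the package's own code: the helper name `makeDpmmDraw` is hypothetical, and it assumes each Z-score is computed from the mean and standard deviation of the observation's current group, with a Z-score of 0 assigned where the group standard deviation is undefined or zero.

```r
# Hypothetical helper: build a list with 'Zscore' and 'Adj' from one draw.
makeDpmmDraw <- function(obs, groups) {
  # Group-wise mean and standard deviation, repeated per observation.
  grpMean <- ave(obs, groups, FUN = mean)
  grpSd <- ave(obs, groups,
               FUN = function(v) if (length(v) > 1) sd(v) else NA_real_)
  z <- (obs - grpMean) / grpSd
  z[is.na(z)] <- 0  # assumption: undefined Z-scores are set to 0
  # Adj[i, j] is 1 when observations i and j share a group, else 0.
  adj <- outer(groups, groups, FUN = "==") * 1
  list(Zscore = z, Adj = adj)
}

obs <- c(1.0, 1.2, 0.8, 5.0, 5.4)
groups <- c(1, 1, 1, 2, 2)
draw <- makeDpmmDraw(obs, groups)
draw$Adj
```

Two such lists built from draws on the same observations have matching dimensions, which is the condition the note above asks the caller to guarantee.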
Uses generic formulas and a rough timing to estimate how long it will take to evaluate the TS algorithm on a set of unique draws with the tsTransform function.
estimateTsTime(distance, draw1, draw2, N)
distance | Function with two parameters x,y. Used to calculate the distance between draw1 and draw2. |
draw1 | Object that works as an argument for distance() |
draw2 | Different object that works as an argument for distance() |
N | Number of unique draws for which the user is interested in evaluating the time to completion for the TS algorithm |
Data.frame with 1 row and 2 columns. Entry one gives the standard completion time, entry two gives the completion time if the fuzzy approximation is used.
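The idea of extrapolating from a rough timing can be illustrated with a short base R sketch. This is not the formula the package uses, which is not stated here: the function name `roughPairwiseTime` and the `reps` parameter are hypothetical, and the sketch assumes the cost is dominated by the N(N-1)/2 pairwise distance evaluations, ignoring the nearest neighbor search and the fuzzy approximation.

```r
# Hypothetical sketch: time one distance call and scale to all pairs.
roughPairwiseTime <- function(distance, draw1, draw2, N, reps = 100) {
  # Average the elapsed time over several calls to smooth out noise.
  elapsed <- system.time(
    for (i in seq_len(reps)) distance(draw1, draw2)
  )[["elapsed"]]
  perCall <- elapsed / reps
  nPairs <- N * (N - 1) / 2  # number of unordered pairs among N draws
  data.frame(perCallSeconds = perCall, totalSeconds = perCall * nPairs)
}

res <- roughPairwiseTime(function(x, y) sqrt(sum((x - y)^2)),
                         c(0, 0), c(1, 1), N = 1000)
res
```

Because the pair count grows quadratically in N, doubling the number of unique draws roughly quadruples this estimate.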
Simple function to return the Euclidean distance between two objects. Acts elementwise.
eucDist(x, y)
x | Numeric vector or matrix. |
y | Numeric vector or matrix of same dimensions as x. |
Numeric, elementwise Euclidean distance between x and y.
For speed, there is no error handling when x and y do not have the same dimensions. Take care!
eucDist(c(0,0), c(1,1))
Helpful mini function to fit the nearest neighbor (NN) algorithm given a set and defined distance
fitNN(uniqueDraws, uniqueLabels, distance, minDist)
uniqueDraws | List of unique values that make up the set on which we are using the NN algorithm |
uniqueLabels | List of unique labels associated 1-1 with the unique values |
distance | Function with arguments x,y that returns a distance defined on the given values |
minDist | Minimum possible distance between two points that aren't equivalent. May be ignored, but specifying it where possible may speed up the algorithm. |
List. tsSolution gives the ordered labels, tsValues gives the ordered values, tsDiffs is a vector of distances between consecutive values in tsValues
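A nearest neighbor tour of this kind can be sketched in a few lines of base R. The function below is an illustration, not the package's implementation: the name `nnTour` is hypothetical, it assumes the tour starts at the first value, and it assumes minDist is used as an early-stopping threshold (once a candidate at distance minDist or less is found, no closer candidate can exist). The returned element names mirror the ones described above.

```r
# Hypothetical sketch of a greedy nearest neighbor (NN) tour.
nnTour <- function(values, labels, distance, minDist = 0) {
  n <- length(values)
  visited <- logical(n)
  tour <- integer(n)
  diffs <- numeric(max(n - 1, 0))
  current <- 1L            # assumption: start from the first value
  visited[current] <- TRUE
  tour[1] <- current
  for (step in seq_len(n - 1)) {
    best <- NA_integer_
    bestDist <- Inf
    for (j in which(!visited)) {
      d <- distance(values[[current]], values[[j]])
      if (d < bestDist) {
        best <- j
        bestDist <- d
      }
      # Nothing can be closer than minDist, so stop searching early.
      if (bestDist <= minDist) break
    }
    visited[best] <- TRUE
    tour[step + 1] <- best
    diffs[step] <- bestDist
    current <- best
  }
  list(tsSolution = labels[tour], tsValues = values[tour], tsDiffs = diffs)
}

values <- list(0, 10, 1, 9, 2)
labels <- c("a", "b", "c", "d", "e")
res <- nnTour(values, labels, function(x, y) abs(x - y), minDist = 1)
res$tsSolution  # "a" "c" "e" "d" "b"
res$tsDiffs     # 1 1 7 1
```

The greedy tour is not guaranteed to be the shortest one; it trades optimality for a search that needs only pairwise distance evaluations.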
This function generates generalized diagnostics for Markov Chain Monte Carlo (MCMC) draws, transforming the draws if specified, and evaluating selected diagnostics.
genDiagnostic( mhDraws, proximityMap = c("standard", "ts", "lanfear"), diagnostics = c("traceplot", "ess", "psrf"), distance = NULL, verbose = FALSE, ... )
mhDraws | A list of MCMC draws, where each element is an ordered list or numeric vector representing the output of a single MCMC chain. |
proximityMap | Method (called a proximity-map) for transforming the MCMC draws. Options include 'standard', 'ts', 'lanfear', or a custom function. See details. |
diagnostics | A character vector or list of diagnostic functions to be evaluated. Options include 'traceplot', 'ess', 'psrf', or custom functions. See details. |
distance | Function for evaluating distance between MCMC draws if required by 'proximityMap'. This should be a pairwise distance function that operates on elements of the chains from mhDraws. Note that the lanfear and ts proximityMaps ALWAYS require a distance function. |
verbose | If TRUE, informative messages are displayed. |
... | Arguments passed on to |
Built-in proximity-maps can be called with the appropriate character string in the 'proximityMap' argument. For details on a particular proximity-map use ?lanfearTransform or ?tsTransform; the standard proximity-map induces no transformation. Custom proximity-map functions may be added as well. A custom function must accept a list of mcmcChain type objects and return a list of data.frames with columns val (the transformed draw) and t (the MCMC chain order). Each element in the list is the transformed MCMC chain corresponding to the input.
Built-in diagnostics can be called with the appropriate character string in the 'diagnostics' argument. Current diagnostic options are 'traceplot' for traceplots, 'ess' for Effective Sample Size, and 'psrf' for the Gelman-Rubin Potential Scale Reduction Factor. Additional custom diagnostic functions may be written. These functions should act on a list of data.frames output from a transform function and should return a relatively small data.frame whose first row name is the name of the diagnostic.
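The contracts described above for custom functions can be illustrated with a small base R sketch. Both function names (`absTransform`, `chainMeans`) are hypothetical and serve only as examples of the expected input and output shapes. The sketch assumes the transformed column is named val, as stated above; the built-in diagnostics document their input columns as val.1, ..., val.k, so the exact naming the package expects should be checked before use.

```r
# Hypothetical custom proximity-map: distance of each univariate draw from 0.
# Takes a list of chains, returns a list of data.frames with columns val and t.
absTransform <- function(mhDraws, ...) {
  lapply(mhDraws, function(chain) {
    data.frame(val = abs(chain), t = seq_along(chain))
  })
}

# Hypothetical custom diagnostic: mean of the transformed draws, per chain.
# Returns a one-row data.frame whose row name is the diagnostic's name.
chainMeans <- function(transformed, ...) {
  means <- vapply(transformed, function(df) mean(df$val), numeric(1))
  out <- as.data.frame(t(means))
  names(out) <- paste0("Chain", seq_along(means))
  rownames(out) <- "chainMean"
  out
}

set.seed(1)
chains <- list(rnorm(100), rnorm(100, mean = 3))
transformed <- absTransform(chains)
result <- chainMeans(transformed)
result
```

Going by the argument descriptions, functions of this shape would presumably be supplied as `proximityMap = absTransform` and `diagnostics = list(chainMeans)`.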
An object of class 'mcmcDiag', containing evaluated diagnostics, transformed draws, and function call details.
# Example using the standard traceplot
tstS <- genDiagnostic(uniMCMCResults)
tstS

# Example using the 'lanfear' traceplot
tstL <- genDiagnostic(uniMCMCResults, proximityMap = 'lanfear', distance = eucDist, reference = 0)
tstL

# Example using Bayesian network sample data, with the 'lanfear' proximityMap
tstBN1 <- genDiagnostic(bnMCMCResults, proximityMap = 'lanfear', distance = partitionDist)
tstBN1
Simple function to return the Hamming distance between two objects. Acts elementwise.
hammingDist(x, y)
x | Binary vector or matrix |
y | Binary vector or matrix of same dimensions as x. |
Numeric, elementwise Hamming distance between x and y.
For speed, there is no error handling when x and y do not have the same dimensions. The function also does not check that x and y are binary. Take care!
x <- matrix(c(1,0, 0,0), nrow = 2, byrow = TRUE)
y <- diag(1,2)
hammingDist(x, y)
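For readers without the package installed, the count that a Hamming distance represents can be sketched in base R. This is an illustrative stand-in (the name `hammingSketch` is hypothetical), not the package's implementation, and it assumes the distance is the number of positions at which x and y differ.

```r
# Hypothetical stand-in: count the positions where x and y disagree.
hammingSketch <- function(x, y) sum(x != y)

x <- matrix(c(1, 0, 0, 0), nrow = 2, byrow = TRUE)
y <- diag(1, 2)
h <- hammingSketch(x, y)
h  # x and y differ only at position [2, 2], so this is 1
```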
Transforms a list of MCMC chains into a list of data.frames using the Lanfear transformation
lanfearTransform(mhDraws, distance, reference = NULL, ...)
mhDraws | List. Each element is a single chain from an MCMC algorithm. Each element should be a numeric vector (for univariate draws), or a list. |
distance | Distance function defined on the space of MCMC draws. Should operate pairwise on the elements of the given chains. See details. |
reference | Argument for proximityMap = 'lanfear'. Reference point for lanfearTransform (with exactly the same structure as each MCMC draw) against which draws are compared. If left NULL, a random point is selected from the given draws. See lanfearTransform details. |
... | Catches extra arguments. Not used. |
The Lanfear transformation works by specifying a reference point and then comparing each MCMC draw back to that reference point using a distance function. The function returns this distance value as the Lanfear transformation of each draw.
List of data.frames with columns 'val' which is the Lanfear transformation of each MCMC draw, and 't' which gives the within-chain ordering of the MCMC draws. Each data.frame is a separate chain.
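The transformation described above is simple enough to sketch in base R. The function below is an illustration under stated assumptions, not the package's code: the name `lanfearSketch` is hypothetical, and when no reference is given it assumes the random reference is drawn uniformly from all draws pooled across chains.

```r
# Hypothetical sketch of the Lanfear transformation.
lanfearSketch <- function(mhDraws, distance, reference = NULL) {
  if (is.null(reference)) {
    # Pool every draw from every chain and pick one at random.
    allDraws <- unlist(lapply(mhDraws, as.list), recursive = FALSE)
    reference <- allDraws[[sample.int(length(allDraws), 1)]]
  }
  lapply(mhDraws, function(chain) {
    # Each draw is replaced by its distance to the reference point.
    vals <- vapply(seq_along(chain),
                   function(i) distance(chain[[i]], reference),
                   numeric(1))
    data.frame(val = vals, t = seq_along(chain))
  })
}

chains <- list(c(0, 1, -2), c(3, -1, 0.5))
out <- lanfearSketch(chains, function(x, y) abs(x - y), reference = 0)
out[[1]]$val  # 0 1 2
out[[2]]$val  # 3.0 1.0 0.5
```

Draws that are far apart from each other can map to the same value if they are equally far from the reference, so the choice of reference affects what the resulting traceplot can reveal.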
Function to assign character labels to all unique objects in a list
listLabels(lst)
lst | A list of objects. Each object in the list should have the same general structure. |
A character vector of labels. Objects in lst that are identical will be assigned the same label.
Function to return the 'Partition' distance between two objects. Used for Bayesian Networks with the 'partition-MCMC' algorithm.
partitionDist(x, y)
x | Data.frame with columns node and partition |
y | Data.frame with columns node and partition. Same nrows as x. |
Numeric, Partition distance between x and y.
For speed, there is no error handling when x and y do not have the same dimensions. The function also does not check that x and y are data.frames of integers. Take care!
x <- bnMCMCResults[[1]][[1]]
y <- bnMCMCResults[[1]][[100]]
partitionDist(x, y)
Print method for mcmcDiag objects
## S3 method for class 'mcmcDiag'
print(x, ...)
x | Object of class mcmcDiag |
... | Kept for consistency with print. Does nothing. |
Invisible NULL, prints to console
print(genDiagnostic(uniMCMCResults))
Calculate the effective sample size, per chain and in total, of draws from an MCMC algorithm
sess(mhDraws, ...)
mhDraws | List of data.frames. Each data.frame represents a single chain. Data.frame columns for which ESS is calculated should be named val.1, ..., val.k |
... | Catches unnecessary additional arguments |
Data.frame with 1 Row and (# Chains + 1) Columns. Each entry gives the estimated ESS for the chain or sum of chains.
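To make the output shape concrete, here is a base R sketch of a per-chain and total effective sample size. The estimator the package uses is not stated here, so this is an assumption: the hypothetical function `essSketch` uses the textbook form n / (1 + 2 * sum of autocorrelations), truncating the sum at the first non-positive sample autocorrelation.

```r
# Hypothetical sketch: ESS from truncated sample autocorrelations.
essSketch <- function(x) {
  n <- length(x)
  # Sample autocorrelations at lags 1, ..., n - 1 (lag 0 is dropped).
  rho <- acf(x, lag.max = n - 1, plot = FALSE)$acf[-1]
  # Keep only the lags before the first non-positive autocorrelation.
  firstNonPos <- which(rho <= 0)[1]
  if (!is.na(firstNonPos)) rho <- rho[seq_len(firstNonPos - 1)]
  n / (1 + 2 * sum(rho))
}

set.seed(42)
chains <- list(
  as.numeric(arima.sim(list(ar = 0.9), n = 2000)),
  as.numeric(arima.sim(list(ar = 0.9), n = 2000))
)
perChain <- vapply(chains, essSketch, numeric(1))
ess <- c(perChain, sum(perChain))
names(ess) <- c(paste0("Chain", seq_along(chains)), "Sum")
essTable <- as.data.frame(t(ess))
essTable
```

The strongly autocorrelated chains simulated here yield an effective sample size far below the 2000 draws per chain, which is the situation this diagnostic is meant to flag.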
Calculate the Gelman-Rubin diagnostic of draws from an MCMC algorithm
spsrf(mhDraws, ...)
mhDraws | List of data.frames with two columns. Each data.frame represents a single chain. Column names should be val.1 (for values) and t (for chain iteration). |
... | Catches unnecessary additional arguments |
Data.frame with 1 row and 2 columns. First entry gives estimated psrf, second gives upper 95% limit for GR statistic.
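The quantity being estimated can be sketched with the classic Gelman-Rubin formula in base R. This is a simplified illustration, not the function's implementation: the name `psrfSketch` is hypothetical, it takes plain numeric vectors of equal length rather than data.frames, and it omits both the degrees-of-freedom correction and the upper 95% limit.

```r
# Hypothetical sketch of the uncorrected potential scale reduction factor.
psrfSketch <- function(chains) {
  n <- length(chains[[1]])                 # draws per chain (assumed equal)
  chainMeans <- vapply(chains, mean, numeric(1))
  chainVars <- vapply(chains, var, numeric(1))
  W <- mean(chainVars)                     # within-chain variance
  B <- n * var(chainMeans)                 # between-chain variance
  varPlus <- (n - 1) / n * W + B / n       # pooled variance estimate
  sqrt(varPlus / W)
}

set.seed(7)
good <- replicate(4, rnorm(1000), simplify = FALSE)
bad <- list(rnorm(1000, mean = 0), rnorm(1000, mean = 5))
psrfGood <- psrfSketch(good)  # close to 1: chains agree
psrfBad <- psrfSketch(bad)    # well above 1: chains disagree
c(psrfGood, psrfBad)
```

Values near 1 indicate that between-chain variation is small relative to within-chain variation; this is necessary but not sufficient for good mixing, since chains that all miss the same modes can also agree with one another.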
Transforms a list of MCMC chains into a list of dataframes with no modifications to values
standardTransform(mhDraws, ...)
mhDraws | A list of numeric vectors |
... | Not used. |
A list of data.frames with rows that represent MCMC draws. Each separate data.frame is a different chain. Data.frames have columns 'val' for the numeric draws, and 't' for the within-chain order of the draw. Currently, using the standard transformation on anything other than univariate draws is not supported.
Generate a traceplot of draws from a multi-chain MCMC
straceplot(mhDraws, method = NULL, ...)
mhDraws | List of data.frames with two columns. Each data.frame represents a single chain. Column names should be val.1 (for values) and t (for chain iteration). |
method | Character string. Name of the method used to generate the traceplot. Used to generate the title of the traceplot. |
... | Catches unused arguments |
ggplot2 plot object showing traceplot
Transforms a list of MCMC chains into a list of data.frames using the TS transformation
tsTransform( mhDraws, distance, minDist = 0, fuzzy = FALSE, fuzzyDist = 1, verbose = FALSE, ... )
mhDraws | List. Each element is a single chain from an MCMC algorithm. Each element should be a numeric vector (for univariate draws), or a list. |
distance | Distance function defined on the space of MCMC draws. Should operate pairwise on the elements of the given chains. See details. |
minDist | Numeric. Value which specifies the minimum possible distance for two draws which are not equal. See tsTransform details. |
fuzzy | Logical. If TRUE computes an approximate version of the TS algorithm. See tsTransform details. |
fuzzyDist | Numeric. Parameter for approximate version of the TS algorithm. See tsTransform details. |
verbose | Logical. If TRUE, function prints out information about approximate computation time. |
... | Catches extra arguments. Not used. |
The TS transformation sets up a traveling salesman (TS) problem by calculating the pairwise distances between the unique draws in mhDraws, and solves the resulting problem with the nearest neighbor (NN) algorithm.
minDist can be used to speed up the algorithm if it is known that when x != y then distance(x, y) >= minDist. Otherwise this should be ignored.
The fuzzy approximation of the algorithm works by splitting the unique draws into smaller sets, each containing at most 1% of all unique draws, fitting the NN algorithm within each set, and then fitting it again on the resulting 'end points' of the sets. Each set is created by randomly selecting a representative draw and then adding the closest draws with distance less than fuzzyDist to that set, until the set contains 1% of all unique draws. The fuzzy approximation can greatly reduce computation time, unless the specified fuzzyDist is too small.
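The set-building step of the fuzzy approximation can be sketched in base R. This is one reading of the description above, not the package's code: the function name `fuzzyGroups` and the `maxFrac` parameter are hypothetical, and the sketch covers only the grouping, not the NN fits within and between sets.

```r
# Hypothetical sketch: partition unique draws into small 'fuzzy' sets.
fuzzyGroups <- function(draws, distance, fuzzyDist, maxFrac = 0.01) {
  n <- length(draws)
  maxSize <- max(1, floor(maxFrac * n))  # at most 1% of the unique draws
  remaining <- seq_len(n)
  groups <- list()
  while (length(remaining) > 0) {
    # Pick a random representative among the draws not yet assigned.
    repIdx <- remaining[sample.int(length(remaining), 1)]
    others <- setdiff(remaining, repIdx)
    d <- vapply(others,
                function(j) distance(draws[[repIdx]], draws[[j]]),
                numeric(1))
    # Candidates closer than fuzzyDist, ordered from nearest to farthest.
    isClose <- d < fuzzyDist
    close <- others[isClose][order(d[isClose])]
    members <- c(repIdx, head(close, maxSize - 1))
    groups[[length(groups) + 1]] <- members
    remaining <- setdiff(remaining, members)
  }
  groups
}

set.seed(3)
draws <- as.list(runif(200))
groups <- fuzzyGroups(draws, function(x, y) abs(x - y), fuzzyDist = 0.05)
table(lengths(groups))
```

When fuzzyDist is too small, few draws fall within range of any representative, most sets end up with a single member, and the grouping saves little work, which matches the caveat above.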
List of data.frames with columns 'val' which is the TS transformation of each MCMC draw, and 't' which gives the within-chain ordering of the MCMC draws. Each data.frame is a separate chain.
Results from a univariate Metropolis-Hastings algorithm run on a tri-modal posterior. Although the standard traceplot and Gelman-Rubin diagnostic suggest good mixing, the chains are actually mixing poorly. Included for examples.
uniMCMCResults
A list with 7 elements, each representing a different MCMC chain. Each element is a numeric vector of length 2000
Luke Duttweiler