Title: | A Bayesian Hierarchical Model that Controls for Non-Adherence in Mobile Menstrual Cycle Tracking |
---|---|
Description: | Implements a Bayesian hierarchical model designed to identify skips in mobile menstrual cycle self-tracking on mobile apps. Future developments will allow for the inclusion of covariates affecting cycle mean and regularity, as well as extra information regarding tracking non-adherence. Main methods to be outlined in a forthcoming paper, with alternative models from Li et al. (2022) <doi:10.1093/jamia/ocab182>. |
Authors: | Luke Duttweiler [aut, cre, cph]
|
Maintainer: | Luke Duttweiler <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1.9000 |
Built: | 2025-02-26 06:18:11 UTC |
Source: | https://github.com/lukeduttweiler/skiptrack |
Gibbs Step Li - One MCMC step for the Li Model
gibbsStepLi(ijDat, iDat, kappa, gamma, alpha, beta, S, indFirst)
gibbsStepLi(ijDat, iDat, kappa, gamma, alpha, beta, S, indFirst)
ijDat |
A data.frame with parameters at the individual-observation level: Individual, ys, lambdais, piis, ss. |
iDat |
A data.frame with parameters at the individual level: Individual, lambdas, pis. |
kappa |
Fixed value of hyperparameter kappa. |
gamma |
Fixed value of hyperparameter gamma. |
alpha |
Fixed value of hyperparameter alpha. |
beta |
Fixed value of hyperparamter beta. |
S |
Fixed input value S. |
indFirst |
A logical vector indicating the first occurrence of each individual. |
A list containing one MCMC draws for each parameter. Elements are:
A data.frame with updated parameters at the individual-observation level: Individual, ys, lambdais, piis, ss.
A data.frame with updated parameters at the individual level: Individual, lambdas, pis.
Fixed value of hyperparameter kappa.
Fixed value of hyperparameter gamma.
Fixed value of hyperparameter alpha.
Fixed value of hyperparamter beta.
Fixed input value S.
A logical vector indicating the first occurrence of each individual.
Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
This function performs hyperparameter inference on a given dataset of individuals and their tracked cycles, assuming the model specified in Li et al. (2022). Default starting values for hyperparameters and optimization tuning parameters are those given in Li et al.
liInference( Y, cluster, S = 10, startingParams = c(kappa = 180, gamma = 6, alpha = 2, beta = 20) )
liInference( Y, cluster, S = 10, startingParams = c(kappa = 180, gamma = 6, alpha = 2, beta = 20) )
Y |
A vector of observed cycle lengths. |
cluster |
A vector indicating the individual cluster/group membership for each observation Y. |
S |
Maximum number of possible skipped cycles (see Li et al. for details). |
startingParams |
A vector of starting values for hyperparameters (default values from Li et al.). |
A list containing the results of hyperparameter inference.
Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
This function calculates a Monte Carlo estimate of the negative marginal log-likelihood of the given hyperparameters for the generative model from Li et al. (2022). It samples M instances of the parameters from the given distributions and averages the the likelihoods, giving a marginal likelihood for the hyperparameters.
likVec( pars = c(kappa = 180, gamma = 6, alpha = 2, beta = 20), S = 10, M = 1000, cycleDat, verbose = FALSE, ... )
likVec( pars = c(kappa = 180, gamma = 6, alpha = 2, beta = 20), S = 10, M = 1000, cycleDat, verbose = FALSE, ... )
pars |
Named numeric vector of hyperparameters containing the elements: kappa, gamma, alpha, beta. NOTE: MUST BE IN CORRECT ORDER.
|
S |
Integer, maximum number of allowed skips in the model. |
M |
Integer specifying the number of Monte Carlo iterations. |
cycleDat |
Data frame containing information about individuals and their tracked cycles. |
verbose |
Logical with default FALSE. If true, prints extra info while running. |
... |
Does nothing. |
Numeric value representing the Monte Carlo estimate of the negative marginal log-likelihood.
Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
This function performs inference on cycle length data, assuming the model from Li et al. (2022). It is important to note that Li et al. does not actually use this algorithm as they target a particular analytic posterior predictive distribution, and solve directly. However, we are targeting a different posterior and thus use this MCMC to perform inference.
liMCMC( Y, cluster, S, hyperparams = c(kappa = 180, gamma = 6, alpha = 2, beta = 20), initialParams = list(pi = c(1/3, 1/3, 1/3), lambdais = rep(30, length(unique(cycleDat$Individual))), piis = rep(0.2, length(unique(cycleDat$Individual))), ss = sample(0:S, nrow(cycleDat), replace = TRUE)), reps = 1000, ... )
liMCMC( Y, cluster, S, hyperparams = c(kappa = 180, gamma = 6, alpha = 2, beta = 20), initialParams = list(pi = c(1/3, 1/3, 1/3), lambdais = rep(30, length(unique(cycleDat$Individual))), piis = rep(0.2, length(unique(cycleDat$Individual))), ss = sample(0:S, nrow(cycleDat), replace = TRUE)), reps = 1000, ... )
Y |
A vector of observed cycle lengths. |
cluster |
A vector indicating the individual cluster/group membership for each observation Y. |
S |
Integer. The maximum number of skips to consider possible. |
hyperparams |
Named numeric vector of hyperparameters containing the elements: kappa, gamma, alpha, beta. NOTE: MUST BE IN CORRECT ORDER.
|
initialParams |
A list of initial parameter values for the MCMC algorithm. Default values are provided for pi, lambdais, piis, ss. |
reps |
The number of MCMC iterations (steps) to perform. Default is 1000. |
... |
For catching unused arguments (like li = TRUE) |
A list containing the MCMC draws for each parameter at each iteration. Each element in the list is itself a list containing:
A data.frame with updated parameters at the individual-observation level: Individual, ys, lambdais, piis, ss.
A data.frame with updated parameters at the individual level: Individual, lambdas, pis.
Fixed value of hyperparameter kappa.
Fixed value of hyperparameter gamma.
Fixed value of hyperparameter alpha.
Fixed value of hyperparamter beta.
Fixed input value S.
A logical vector indicating the first occurrence of each individual.
Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
This function generates synthetic data for user tracked menstrual cycles for a single individual using the li model. For this model Beta0 = log(30), and Gamma0 doesn't really make sense.
liSim(i, skipProb, maxCycles, trueBetas, trueGammas = NULL, avgCyclesPer)
liSim(i, skipProb, maxCycles, trueBetas, trueGammas = NULL, avgCyclesPer)
i |
Individual identifier. Character, numeric or integer. |
skipProb |
Vector, ignored for this model. |
maxCycles |
Integer, Maximum possible number of true cycles per tracked cycle. |
trueBetas |
Optional. True values for generated mean regression coefficients. |
trueGammas |
NULL, left for consistency. Will throw error if specified. |
avgCyclesPer |
Average number of cycles contributed by each individual. Actual number is drawn from Poisson for each person. Default is 7. |
Individual identifiers.
Tracked cycles.
Number of true values.
Individual's probability of skipping tracking a cycle
Individual's mean values.
Beta0 true value.
NA
1
Covariate matrix for Mean, where N is the length of trueBetas.
This function generates synthetic data for user tracked menstrual cycles for a single individual using the mixture model. For this model Beta_0 is set to log(30) and Gamma_0 is set to 15, although for the skipTrack model this lacks interpretation.
mixSim(i, skipProb, maxCycles, trueBetas, trueGammas, overlap, avgCyclesPer)
mixSim(i, skipProb, maxCycles, trueBetas, trueGammas, overlap, avgCyclesPer)
i |
Individual identifier. Character, numeric, or integer. |
skipProb |
Vector of probabilities for the number of true cycles per tracked cycle. For example, (.7, .2, .1) means that 70% of observed cycles will contain one true cycle, 20% will contain 2 true cycles, and 10% will contain 3 true cycles. |
maxCycles |
Maximum number of true cycles per tracked cycle. |
trueBetas |
Optional. True values for generated mean regression coefficients. |
trueGammas |
Optional. True values for the generated precision regression coefficients. |
overlap |
Optional. Number of (non-intercept) columns shared between X and Z. Columns are shared from left to right. |
avgCyclesPer |
Average number of cycles contributed by each individual. Actual number is drawn from Poisson for each person. Default is 7. |
Individual identifiers.
Tracked cycles.
Number of true values.
Individual's mean values.
Beta0 true value.
Gamma0 true value.
Covariate matrix for Mean, where N is the length of trueBetas.
Covariate matrix for precision, where M is the length of trueGammas.
Plot skipTrack.model objects
## S3 method for class 'skipTrack.model' plot(x, ...)
## S3 method for class 'skipTrack.model' plot(x, ...)
x |
skipTrack.model object from the function skipTrack.fit |
... |
Needed for S3 consistency |
Invisible NULL. Prints plots from skipTrack.visualize
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) plot(modFit)
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) plot(modFit)
In our model mui follows a normal distribution with mean Xi^T %*% beta and precision rho. Additionally we assume that beta follows a mvnormal prior with mean 0 and precision (rho_Beta) * I.This function draws from the posterior distribution of beta under these assumptions.
postBeta(rhoBeta = 0.01, rho, Xi, muI)
postBeta(rhoBeta = 0.01, rho, Xi, muI)
rhoBeta |
A scalar representing the prior precision parameter for beta. |
rho |
A scalar representing the precision parameter. |
Xi |
A matrix of covariates, where each row represents an individual and each column represents a covariate. |
muI |
A vector where each element is the mean for individual i. |
This function assumes that Xi
is a (num Individuals) x (dimension of beta) matrix of covariates.
A vector representing a draw from the posterior distribution of beta parameters.
In our model the data are drawn from LogN(mu_i + log(c_ij), tau_i) The prior for c_ij is a categorical prior with category probabilities pi1, ..., pik, and c_ij can take values 1, ..., k where k is the length of pi. This function samples from the full conditional posterior of all c_ijs, given vectors of equal length yijs, muis, tauis
postCij(yijs, pi, muis, tauis)
postCij(yijs, pi, muis, tauis)
yijs |
Numeric Vector, cycle lengths |
pi |
Numeric vector, must sum to 1. Sampled probabilities for c_ijs |
muis |
Numeric vector, log of sampled mean for all individuals yijs |
tauis |
Numeric vector > 0, sampled precision for all individuals yijs |
Integer vector
Our model assumes that tau_i ~ Gamma(alpha_i, phi) where alpha_i/phi = theta_i and g(theta_i) = Z_i^T Gamma, where g is the log-link function. Because of this GLM formulation we cannot simply draw from a posterior here and instead use a Metropolis-Hastings step with proposal distribution propGamma ~ MVNormal(currentGamma, rhoGamma*I)
postGamma(taui, Zi, currentGamma, phi = 1, rhoGamma = 1000)
postGamma(taui, Zi, currentGamma, phi = 1, rhoGamma = 1000)
taui |
A vector of length num_Individuals representing precision parameters for individuals. |
Zi |
A matrix of covariates, where each row represents an individual and each column represents a covariate. |
currentGamma |
A vector (or matrix with 1 row) representing the current Gamma value. |
phi |
A scalar, the prior rate for tau_i. |
rhoGamma |
A scalar representing the proposal distribution precision parameter. |
This draw step assumes a log-link function for the Gamma GLM that we are fitting.
A list containing the new Gamma value and the corresponding thetai values.
This function calculates a random draw from the full conditional posterior distribution for lambda_i in the Li algorithm, given the observed values y_ij, the indicators s_ij, and the prior hyperparameters priorK and priorG.
postLambdai(yij, sij, priorK, priorG)
postLambdai(yij, sij, priorK, priorG)
yij |
Vector of observed values for individual i. |
sij |
Vector of cycle skip indicators for individual i. |
priorK |
Prior hyperparameter kappa. |
priorG |
Prior hyperparameter gamma. |
A random draw from the posterior distribution of lambda_i.
In our model the data are drawn from LogN(mu_i + log(c_ij), tau_i). The prior for mu_i is given as N(x_i^T %*% beta, rho). This function draws from the conditional posterior of mu_i.
postMui(yij, cij, taui, xib, rho)
postMui(yij, cij, taui, xib, rho)
yij |
Numeric vector, cycle lengths for a single individual |
cij |
Positive Integer vector, a sampled vector of length(yij) where the corresponding values in cij indicate a sampled number of TRUE cycles in each cycle length given by yij |
taui |
Numeric > 0, A sampled precision for the yijs |
xib |
Numeric, result of multiplying x_i^T %*% beta (single value, not vector) |
rho |
Numeric > 0, sampled prior precision of mu_i |
Additionally, note that in order to vectorize the remainder of the MCMC algorithm this function returns the sampled value repeated for length(yij)
Numeric vector, repeated sampled value of length(yij)
In our model the data are drawn from LogN(mu_i + log(c_ij), tau_i). The prior for tau_i is given as Gamma(thetaiphi, phi). This function uses a MH step to draw a new sample of phi. Proposal distribution is Gamma(currentPhirhoPhi, rhoPhi). Note that we parameterize with RATE, not SCALE.
postPhi(taui, thetai, currentPhi, rhoPhi = 1000)
postPhi(taui, thetai, currentPhi, rhoPhi = 1000)
taui |
Numeric vector, individuals precisions. |
thetai |
Numeric vector. individuals precisions means (estimate) |
currentPhi |
Previous draw of phi |
rhoPhi |
Proposal rate for gamma distribution that draws proposal for phi, default is 1000. |
Numeric, new draw of phi
In our model the data are drawn from LogN(mu_i + log(c_ij), tau_i) The prior for c_ij is a categorical prior with category probabilities pi1, ..., pik, and c_ij can take values 1, ..., k where k is the length of pi. This function samples from the posterior of pi = pi1, ..., pik, assuming that pi follows Dirichlet(priorAlphas)
postPi(ci, priorAlphas)
postPi(ci, priorAlphas)
ci |
Integer vector, all of the sampled cij values for all individuals |
priorAlphas |
Numeric vector, prior dirichlet parameters for pi |
Numeric vector
This performs a Metropolis-Hastings draw for pi_i, assuming s_ij follows a truncated geometric distribution with parameters pi_i and S. The proposal distribution for pi_i is Beta(alpha, beta).
postPii(sij, currentPii, priorA, priorB, S)
postPii(sij, currentPii, priorA, priorB, S)
sij |
Vector of cycle skip indicators for individual i |
currentPii |
Current value of pi_i |
priorA |
Hyperparameter alpha. |
priorB |
Hyperparameter beta. |
S |
Maximum number of skips allowed in algorithm |
Draw for pi_i, repeated for the number of observations from individual i
In our model the data are drawn from LogN(mu_i + log(c_ij), tau_i). The prior for mu_i is given as N(mu, rho). This function draws from the conditional posterior of rho, given that the prior on rho is a uniform prior on the standard deviation.
postRho(muI, xib)
postRho(muI, xib)
muI |
Numeric vector, log of individuals mean values. |
xib |
Numeric vector, result of X %*% Beta, same length as muI. |
Numeric
This function calculates a random draw from the full conditional posterior distribution for s_ij in the Li algorithm, given the observed values yijs, the parameter pi, the lambda_i value, and the truncation level S.
postSij(yijs, pii, lambdai, S)
postSij(yijs, pii, lambdai, S)
yijs |
Vector of observed values for s_ij. |
pii |
Probability parameter pi. |
lambdai |
Value of lambda_i. |
S |
Truncation level. |
A random draw from the posterior distribution of s_ij.
In our model the data are drawn from LogN(mu_i + log(c_ij), tau_i). The prior for tau_i is given as Gamma(thetai*phi, phi). This function draws from the conditional posterior of tau_i. Note that we parameterize with RATE, not SCALE.
postTaui(yij, cij, mui, thetai, phi = 1)
postTaui(yij, cij, mui, thetai, phi = 1)
yij |
Numeric vector, cycle lengths for a single individual |
cij |
Positive Integer vector, a sampled vector of length(yij) where the corresponding values in cij indicate a sampled number of TRUE cycles in each cycle length given by yij |
mui |
Numeric, log of sampled mean of this individual's yijs |
thetai |
Numeric, mean of prior (gamma) distribution on taui |
phi |
Numeric, rate for Taui prior |
Additionally, note that in order to vectorize the remainder of the MCMC algorithm this function returns the sampled value repeated for length(yij)
Numeric vector, repeated sampled value of length(yij)
Print skipTrack.model to console
## S3 method for class 'skipTrack.model' print(x, ...)
## S3 method for class 'skipTrack.model' print(x, ...)
x |
skipTrack.model object from the function skipTrack.fit |
... |
Needed for S3 consistency |
Invisible NULL. Prints info about skipTrack.model object
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) print(modFit)
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) print(modFit)
This function performs a single step of the Markov Chain Monte Carlo (MCMC) algorithm to update parameters in a hierarchical model used for identifying skips in menstrual cycle tracking.
sampleStep( ijDat, iDat, rho, pi, Xi, Zi, Beta, Gamma, priorAlphas, indFirst, rhoBeta, rhoGamma, phi, rhoPhi, fixedSkips )
sampleStep( ijDat, iDat, rho, pi, Xi, Zi, Beta, Gamma, priorAlphas, indFirst, rhoBeta, rhoGamma, phi, rhoPhi, fixedSkips )
ijDat |
A data.frame with individual-observation level parameters: Individual, ys, cijs, muis, tauis. |
iDat |
A data.frame with individual level parameters: Individual, mus, taus, thetas. |
rho |
Updated value of the global parameter rho. |
pi |
Updated value of the global parameter pi. |
Xi |
A matrix (numIndividuals x length(Beta)) of covariates for cycle length mean. Default is a vector of 1's. NOTE THE DIFFERENCE (from skipTrack.MCMC) IN EXPECTED DIMENSION OF X |
Zi |
A matrix (numIndividuals x length(Gamma)) of covariates for cycle length precision. Default is a vector of 1's. NOTE THE DIFFERENCE (from skipTrack.MCMC) IN EXPECTED DIMENSION OF Z |
Beta |
Matrix (1 x length(Beta)) of coefficients for cycle length mean. |
Gamma |
Matrix of (1 x length(Gamma)) coefficients for cycle length precision. |
priorAlphas |
Vector of prior alpha values for updating pi. |
indFirst |
A logical vector indicating the first occurrence of each individual. |
rhoBeta |
Updated value of the global parameter rhoBeta. |
rhoGamma |
Value of the proposal parameter rhoGamma. |
phi |
Value of the parameter phi. |
rhoPhi |
Value of the proposal parameter rhoPhi. |
fixedSkips |
Logical. If TRUE cycle skip information (cijs) is not updated in sample steps and the inputs are instead assumed to be true. |
A list containing updated parameters after performing a single MCMC step. The list includes:
A data.frame with updated parameters at the individual-observation level: Individual, ys, cijs, muis, tauis.
A data.frame with updated parameters at the individual level: Individual, mus, taus, thetas.
Updated value of the global parameter rho.
Updated value of the global parameter pi.
Matrix of covariates for cycle length mean.
Matrix of covariates for cycle length precision.
Updated matrix of coefficients for cycle length mean.
Updated matrix of coefficients for cycle length precision.
Vector of prior alpha values for updating pi.
A logical vector indicating the first occurrence of each individual.
Hyperprior parameter rhoBeta, used to update Beta.
Value of the proposal parameter rhoGamma.
Updated value of the parameter phi.
Value of the proposal parameter rhoPhi.
Logical. Indicates if skips were fixed.
Takes model results from skipTrack.fit and uses genMCMCDiag to get generalized mcmc diagnostics
skipTrack.diagnostics( stFit, param = c("rho", "phi", "Betas", "Gammas", "muis", "tauis", "cijs"), proximityMap = NULL, ... )
skipTrack.diagnostics( stFit, param = c("rho", "phi", "Betas", "Gammas", "muis", "tauis", "cijs"), proximityMap = NULL, ... )
stFit |
A list of MCMC results from the skipTrack.fit function. |
param |
A character string specifying the parameter for which diagnostics are to be calculated. Must be one of: 'rho', 'phi', 'Betas', 'Gammas', 'muis', 'tauis', or 'cijs'. |
proximityMap |
An optional parameter specifying the proximity-map for calculating diagnostics. See package genMCMCDiag for details. Default is NULL. |
... |
Arguments passed on to
|
If the parameter is 'rho' or 'phi' (the univariate parameters), the function extracts the specified parameter from the MCMC results and calculates diagnostics using the genDiagnostic function with the standard proximityMap. If the parameter is any of the other available options, the function extracts the corresponding values and calculates diagnostics using the genDiagnostic function with the specified or default proximityMap ('lanfear') and hammingDist as the distance function.
Details on the genDiagnostic function can be found in the genMCMCDiag package.
A mcmcDiag object of MCMC diagnostics for the specified parameter
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) #Get diagnostics for cijs skipTrack.diagnostics(modFit, 'cijs')
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) #Get diagnostics for cijs skipTrack.diagnostics(modFit, 'cijs')
This function fits the model using multiple instances of skipTrack.MCMC, either in parallel or sequentially.
skipTrack.fit( Y, cluster, X = matrix(1, nrow = length(cluster)), Z = matrix(1, nrow = length(cluster)), numSkips = 10, reps = 1000, chains, useParallel = FALSE, ... )
skipTrack.fit( Y, cluster, X = matrix(1, nrow = length(cluster)), Z = matrix(1, nrow = length(cluster)), numSkips = 10, reps = 1000, chains, useParallel = FALSE, ... )
Y |
A vector of observed cycle lengths. |
cluster |
A vector indicating the individual cluster/group membership for each observation Y. |
X |
A matrix (length(Y) x length(Beta)) of covariates for cycle length mean. Default is a vector of 1's. |
Z |
A matrix (length(Y) x length(Gamma)) of covariates for cycle length precision. Default is a vector of 1's. |
numSkips |
The maximum number of skips to allow. Default is 10. |
reps |
The number of MCMC iterations (steps) to perform. Default is 1000. |
chains |
Number of chains to run. |
useParallel |
Logical, indicating whether to use parallel processing, as supported by doParallel. Default is FALSE. |
... |
Arguments passed on to
|
A list containing the results of skipTrack.MCMC for each chain.
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) modFit
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) modFit
This function runs a single Markov Chain Monte Carlo (MCMC) chain to update parameters in the skipTrack hierarchical model.
skipTrack.MCMC( Y, cluster, X = matrix(1, nrow = length(cluster)), Z = matrix(1, nrow = length(cluster)), numSkips = 10, reps = 1000, fixedSkips = FALSE, initialParams = list(pi = rep(1/(numSkips + 1), numSkips + 1), muis = rep(log(30), length(unique(cluster))), tauis = rep(5, length(unique(cluster))), rho = 1, cijs = sample(1:3, length(Y), replace = TRUE), alphas = rep(1, numSkips + 1), Beta = matrix(rep(0, ncol(as.matrix(X))), 1), Gamma = matrix(rep(0, ncol(as.matrix(Z))), 1), rhoBeta = 0.01, rhoGamma = 1000, phi = 0.01, rhoPhi = 1000), verbose = FALSE )
skipTrack.MCMC( Y, cluster, X = matrix(1, nrow = length(cluster)), Z = matrix(1, nrow = length(cluster)), numSkips = 10, reps = 1000, fixedSkips = FALSE, initialParams = list(pi = rep(1/(numSkips + 1), numSkips + 1), muis = rep(log(30), length(unique(cluster))), tauis = rep(5, length(unique(cluster))), rho = 1, cijs = sample(1:3, length(Y), replace = TRUE), alphas = rep(1, numSkips + 1), Beta = matrix(rep(0, ncol(as.matrix(X))), 1), Gamma = matrix(rep(0, ncol(as.matrix(Z))), 1), rhoBeta = 0.01, rhoGamma = 1000, phi = 0.01, rhoPhi = 1000), verbose = FALSE )
Y |
A vector of observed cycle lengths. |
cluster |
A vector indicating the individual cluster/group membership for each observation Y. |
X |
A matrix (length(Y) x length(Beta)) of covariates for cycle length mean. Default is a vector of 1's. |
Z |
A matrix (length(Y) x length(Gamma)) of covariates for cycle length precision. Default is a vector of 1's. |
numSkips |
The maximum number of skips to allow. Default is 10. |
reps |
The number of MCMC iterations (steps) to perform. Default is 1000. |
fixedSkips |
If TRUE cycle skip information (cijs) is not updated in sample steps and the inputs are instead assumed to be true. |
initialParams |
A list of initial parameter values for the MCMC algorithm. Default values are provided for pi, muis, tauis, rho, cijs, alphas, Beta, Gamma, phi, rhoBeta, rhoGamma, and rhoPhi. |
verbose |
logical. If true progress bars and additional info are printed to the console. |
A list containing the MCMC draws for each parameter at each iteration. Each element in the list is itself a list containing:
A data.frame with updated parameters at the individual-observation level: Individual, ys, cijs, muis, tauis.
A data.frame with updated parameters at the individual level: Individual, mus, taus, thetas.
Updated value of the global parameter rho.
Updated value of the global parameter pi.
Matrix of covariates for cycle length mean.
Matrix of covariates for cycle length precision.
Updated matrix of coefficients for cycle length mean.
Updated matrix of coefficients for cycle length precision.
Vector of prior alpha values for updating pi.
A logical vector indicating the first occurrence of each individual.
Hyperprior parameter rhoBeta, used to update Beta.
Value of the proposal parameter rhoGamma.
Updated value of the parameter phi.
Value of the proposal parameter rhoPhi.
Logical. Indicates if skips were fixed.
This function calculates inference results on Betas, Gammas, and cijs based on the provided MCMC results. It returns summaries such as credible intervals for Betas, Gammas, wald-type confidence intervals for cijs, and Gelman-Rubin PSRF diagnostics for all 3. Note that true values and converage are included in the output if trueVals is supplied, but otherwise not.
skipTrack.results(stFit, trueVals = NULL, burnIn = 750)
skipTrack.results(stFit, trueVals = NULL, burnIn = 750)
stFit |
Object result of skipTrack.fit function. |
trueVals |
Optional named list containing true values for Betas, Gammas, and cijs. (Also can use output of skipTrack.simulate) |
burnIn |
Number of MCMC iterations to discard as burn-in per chain. |
A list containing the following elements:
Betas |
data.frame with 95% credible intervals and (if trueVals is supplied) true values for Betas and Coverage tag. |
Gammas |
data.frame with 95% credible intervals and (if trueVals is supplied) true values for Gammas and Coverage tag. |
cijs |
data.frame with Wald-type 95% confidence intervals and (if trueVals is supplied) true values for cijs and Coverage tags. |
Diagnostics |
data.frame with ess and gelman-rubin diagnostics from genMCMCDiag package, for parameter sets 'Betas', 'Gammas' and 'cijs'. |
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) modFit # If using simulated data (which includes access to ground truth): # skipTrack.results(modFit, trueVals = simDat, burnIn = 25) #Recommended burnIn with real data is at least 750 # # If not using simulated data: # skipTrack.results(modFit, burnIn = 25) #Recommended burnIn with real data is at least 750
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) modFit # If using simulated data (which includes access to ground truth): # skipTrack.results(modFit, trueVals = simDat, burnIn = 25) #Recommended burnIn with real data is at least 750 # # If not using simulated data: # skipTrack.results(modFit, burnIn = 25) #Recommended burnIn with real data is at least 750
This function generates synthetic data for user-tracked menstrual cycles given a generative model, skip probabilities, maximum cycles and covariates (depending on the model). It supports built-in models ('skipTrack', 'li', 'mixture') and custom models written as functions.
skipTrack.simulate( n, model = c("skipTrack", "li", "mixture"), skipProb = NULL, maxCycles = length(skipProb), trueBetas = NULL, trueGammas = NULL, overlap = 0, avgCyclesPer = 7 )
skipTrack.simulate( n, model = c("skipTrack", "li", "mixture"), skipProb = NULL, maxCycles = length(skipProb), trueBetas = NULL, trueGammas = NULL, overlap = 0, avgCyclesPer = 7 )
n |
Number of individuals to simulate data for. |
model |
model for data simulation. Can be a character ('skipTrack', 'li', 'mixture') or a custom function. |
skipProb |
Vector of probabilities for number of true cycles per tracked cycle. For example, (.7, .2, .1) means that 70% of observed cycles will contain one true cycle, 20% will contain 2 true cycles and 10% will contain 3 true cycles. Default is NULL. If model == 'li', skipProb values are set and user input will be ignored. |
maxCycles |
Maximum number of cycles for generating skip cycles. Default is the length of skipProb. If model == 'li', this must be specified; if model == 'skipTrack' or 'mixture', leave as default. |
trueBetas |
Optional. True values for the mean regression coefficients (not counting intercept which is automatic based on the model). |
trueGammas |
Optional. True values for the precision regression coefficients (not counting intercept which is automatic based on the model). Precision covariates not available for model == 'li'. |
overlap |
Optional. Number of (non-intercept) columns shared between X and Z. Columns are shared from left to right. |
avgCyclesPer |
Average number of cycles contributed by each individual. Actual number is drawn from Poisson for each person. Default is 7. |
A list containing:
Tracked cycles from the simulated data.
Individual identifiers from the simulated data.
Covariate matrix for Betas (mean cycle length).
Covariate matrix for Gammas (regularity).
True beta coefficients.
True gamma coefficients.
Number of true cycles in each tracked cycle. Order matches Y.
Subset of the simulated data containing individual level information. For 'skipTrack' - individual mean and precision for log(cycle lengths), for 'li' - individual mean for cycle lengths, for 'mixture' - individual mean for cycle lengths
Li, Kathy, et al. "A predictive model for next cycle start date that accounts for adherence in menstrual self-tracking." Journal of the American Medical Informatics Association 29.1 (2022): 3-11.
# Example simulation from the SkipTrack model resultSt <- skipTrack.simulate(1000, model = 'skipTrack', skipProb = c(.7, .2, .1)) hist(resultSt$Y, breaks = 5:200) # Example simulation from the Li model resultLi <- skipTrack.simulate(1000, model = 'li', maxCycles = 3) hist(resultLi$Y, breaks = 5:200) #Example simulation from the mixture model resultMix <- skipTrack.simulate(1000, model = 'mixture', skipProb = c(.7, .2, .1)) hist(resultMix$Y, breaks = 5:200)
# Example simulation from the SkipTrack model resultSt <- skipTrack.simulate(1000, model = 'skipTrack', skipProb = c(.7, .2, .1)) hist(resultSt$Y, breaks = 5:200) # Example simulation from the Li model resultLi <- skipTrack.simulate(1000, model = 'li', maxCycles = 3) hist(resultLi$Y, breaks = 5:200) #Example simulation from the mixture model resultMix <- skipTrack.simulate(1000, model = 'mixture', skipProb = c(.7, .2, .1)) hist(resultMix$Y, breaks = 5:200)
This function takes results from skipTrack.fit and produces several helpful visualizations.
skipTrack.visualize(stFit)
skipTrack.visualize(stFit)
stFit |
A list containing MCMC results obtained from skipTrack.fit. |
A list of three ggplot2 objects:
cijOverLength - Scatter plot of estimated Cij values against reported cycle length.
cijOverTaus - Scatter plot of estimated Cij values against estimated individual precisions, colored by cycle length.
cijDens - Density plot of Y values overlayed with a density plot of Y values separated by estimated cij value.
skipTrack.fit
for generating MCMC results.
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) #Visualize results skipTrack.visualize(modFit)
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) #Visualize results skipTrack.visualize(modFit)
Report skipTrack.model structure to console
## S3 method for class 'skipTrack.model' str(object, ...)
## S3 method for class 'skipTrack.model' str(object, ...)
object |
skipTrack.model object from the function skipTrack.fit |
... |
To match other str calls |
Invisible NULL. Prints structure of skipTrack.model object
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) str(modFit)
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) str(modFit)
This function generates synthetic data for user tracked menstrual cycles for a single individual. For this model Beta_0 = log(30), Gamma_0 = 5.5, and phi = .01.
stSim(i, skipProb, maxCycles, trueBetas, trueGammas, overlap, avgCyclesPer)
stSim(i, skipProb, maxCycles, trueBetas, trueGammas, overlap, avgCyclesPer)
i |
Individual identifier. Character, numeric or integer. |
skipProb |
Vector of probabilities for number of true cycles per tracked cycle. For example, (.7, .2, .1) means that 70% of observed cycles will contain one true cycle, 20% will contain 2 true cycles and 10% will contain 3 true cycles. |
maxCycles |
Maximum number of true cycles per tracked cycle. Ignored for this model. |
trueBetas |
Optional. True values for the mean regression coefficients (not counting intercept which is automatic based on the model). |
trueGammas |
Optional. True values for the precision regression coefficients (not counting intercept which is automatic based on the model). |
overlap |
Optional. Number of (non-intercept) columns shared between X and Z. Columns are shared from left to right. |
avgCyclesPer |
Average number of cycles contributed by each individual. Actual number is drawn from Poisson for each person. Default is 7. |
Individual identifiers.
Tracked cycles.
Number of true values.
Individual's mean of log(Y).
Individual's precision of log(Y)
Beta0 true value.
Gamma0 true value.
Covariate matrix for Mean, where N is the length of trueBetas.
Covariate matrix for precision, where M is the length of trueGammas.
Report skipTrack.model results to console
## S3 method for class 'skipTrack.model' summary(object, ...)
## S3 method for class 'skipTrack.model' summary(object, ...)
object |
skipTrack.model object from the function skipTrack.fit |
... |
arguments passed to skipTrack.results |
Invisible skipTrack.results. Prints info about skipTrack.model object
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) summary(modFit, burnIn = 25) #Recommended burnIn with real data is at least 750
#Simulated data simDat <- skipTrack.simulate(n = 100, skipProb = c(.7, .2, .1)) #Run model fit (should typically run with much more than 50 reps) modFit <- skipTrack.fit(Y = simDat$Y, cluster = simDat$cluster, chains = 2, reps = 50) summary(modFit, burnIn = 25) #Recommended burnIn with real data is at least 750