hcmm_impute {MixedDataImpute} | R Documentation |
Imputations are generated using nonparametric Bayesian joint models (specifically the hierarchically coupled mixture model with local dependence described in Murray and Reiter (2015); see citation("MixedDataImpute") or http://arxiv.org/abs/1410.0438).
hcmm_impute(X, Y, kz, kx, ky, hyperpar = NULL, num.impute, num.burnin, num.skip, thin.trace = -1, status = 50)
X |
A data frame of categorical variables (as factors) |
Y |
A matrix or data frame of continuous variables |
kz |
Number of top-level clusters |
kx |
Number of X-model clusters |
ky |
Number of Y-model clusters |
hyperpar |
A list of hyperparameter values (see |
num.impute |
Number of imputations |
num.burnin |
Number of MCMC burn-in iterations |
num.skip |
Number of MCMC iterations between saved imputations |
thin.trace |
If negative, only save the num.impute datasets. If positive, save summaries of the model state every thin.trace iterations. |
status |
Interval at which to print status messages |
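As a minimal sketch of preparing inputs in the shape the arguments above describe (categorical variables as factors in X, continuous variables in Y), the following base R code splits a small made-up data set. The data frame `raw`, its column names, and the helper `split_mixed` are hypothetical and not part of MixedDataImpute.

```r
# Hypothetical raw data: character/logical columns are treated as categorical,
# numeric columns as continuous.
raw <- data.frame(
  sex    = c("F", "M", NA, "F"),
  owner  = c(TRUE, FALSE, TRUE, NA),
  income = c(1200, NA, 560, 3100),
  age    = c(34, 51, 28, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a mixed data frame into an X part (all factors)
# and a Y part (all numeric), keeping missing values as NA.
split_mixed <- function(dat) {
  is_num <- vapply(dat, is.numeric, logical(1))
  X <- dat[, !is_num, drop = FALSE]
  X[] <- lapply(X, factor)   # levels are taken from the observed values
  Y <- dat[, is_num, drop = FALSE]
  list(X = X, Y = Y)
}

parts <- split_mixed(raw)
str(parts$X)
colMeans(is.na(parts$Y))     # share of missing values per continuous column
```

The resulting `parts$X` and `parts$Y` could then be passed as the X and Y arguments, assuming the remaining arguments are chosen as in the example at the end of this page.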
A list with three elements:
imputations
A list of length num.impute. Each element is an imputed dataset.
trace
MCMC output (currently the component sizes for the three mixture indices)
model
An interface to the C++ object containing the current state
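To illustrate how the imputations element might be inspected, the sketch below uses a stand-in list built by hand rather than real hcmm_impute output. The objects `obs` and `imp`, the helper `fill_once`, the column names, and the random fill-in values are all hypothetical; the only thing assumed about the return value is what is stated above, namely that imputations is a list of completed datasets.

```r
set.seed(1)

# Hypothetical observed data with missing values.
obs <- data.frame(y = c(2.1, NA, 3.4, NA, 1.8),
                  g = factor(c("a", "b", NA, "a", "b")))

# Stand-in for model-based imputation: fill missing cells with random draws.
# Real imputations would come from the fitted model instead.
fill_once <- function(dat) {
  miss_y <- is.na(dat$y)
  dat$y[miss_y] <- rnorm(sum(miss_y), mean(dat$y, na.rm = TRUE), 0.5)
  miss_g <- is.na(dat$g)
  dat$g[miss_g] <- sample(levels(dat$g), sum(miss_g), replace = TRUE)
  dat
}
imp <- list(imputations = replicate(5, fill_once(obs), simplify = FALSE))

# Every completed dataset should be free of missing values.
stopifnot(!vapply(imp$imputations, function(d) any(is.na(d)), logical(1)))

# Imputed values of y in the originally missing rows, one column per imputation.
miss_y <- is.na(obs$y)
imputed_y <- sapply(imp$imputations, function(d) d$y[miss_y])
apply(imputed_y, 1, sd)   # between-imputation spread for each missing cell
```

A spread near zero across imputations for a given cell would suggest the imputed value is almost fully determined by the observed data; a large spread reflects more imputation uncertainty.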
## Not run:
library(MixedDataImpute)
library(mice) # For the functions implementing combining rules

data(sipp08)

set.seed(1)
n = 1000
s = sample(1:nrow(sipp08), n)

Y = sipp08[s,1:2]
Y[,1] = log(Y[,1]+1)
X = sipp08[s,-c(1:2,9)] # Also removes occ code, which has ~23 levels

# MCAR with probability 0.2, for illustration purposes (not matching the paper)
Y[runif(n)<0.2,1] = NA
Y[runif(n)<0.2,2] = NA
for(j in 1:ncol(X)) X[runif(n)<0.2,j] = NA

kz = 15
ky = 60
kx = 90

num.impute = 5
num.burnin = 10000
num.skip = 1000
thin.trace = 10

imp = hcmm_impute(X, Y, kz=kz, kx=kx, ky=ky,
                  num.impute=num.impute, num.burnin=num.burnin,
                  num.skip=num.skip, thin.trace=thin.trace)

# Example of getting MI estimates for a regression, using the
# pooling functions in mice
form = total_earnings~age+I(age^2) + sex*I(own_kid!=0)
fits = lapply(imp$imputations, function(dat) lm(form, data=dat))
pooled_ests = pool(as.mira(fits))
summary(pooled_ests)

# original, complete data estimates for comparison
# (same log(x + 1) transform as applied to Y above)
comdat = sipp08[s,]
comdat[,1] = log(comdat[,1]+1)
summary(lm(form, data=comdat))

# true population values for comparison
pop = sipp08
pop[,1] = log(pop[,1]+1)
summary(lm(form, data=pop))
## End(Not run)
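The pooling step in the example relies on mice's implementation of the combining rules. As a simplified illustration of what such rules compute, the sketch below applies Rubin's rules for a single coefficient in base R. The helper `pool_rubin` and the numbers in `est` and `se` are made up for illustration; the sketch covers only the pooled estimate and total variance and does not reproduce everything mice reports (for example, degrees-of-freedom adjustments).

```r
# Hypothetical helper: combine m point estimates and their standard errors.
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)                # pooled point estimate
  ubar <- mean(se^2)               # average within-imputation variance
  b    <- var(est)                 # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b   # total variance
  c(estimate = qbar, std.error = sqrt(tvar))
}

# Made-up estimates and standard errors for one coefficient from m = 5 fits.
est <- c(0.52, 0.48, 0.55, 0.50, 0.45)
se  <- c(0.10, 0.11, 0.10, 0.12, 0.10)
pool_rubin(est, se)
```

With the `fits` list from the example, `est` could be obtained as `sapply(fits, function(f) coef(f)[2])` and `se` as `sapply(fits, function(f) summary(f)$coefficients[2, "Std. Error"])`, assuming the second coefficient is the one of interest.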