mglasso {cglasso} | R Documentation |
The ‘mglasso’ function is used to fit an l1-penalized Gaussian graphical model with missing-at-random data.
mglasso(X, weights, pendiag = FALSE, nrho = 50L, rho.min.ratio, rho, maxR2, maxit_em = 1.0e+3, thr_em = 1.0e-4, maxit_bcd = 1.0e+4, thr_bcd = 1.0e-4, trace = 0L)
X |
the (n x p)-dimensional matrix used to fit the model. |
weights |
an optional symmetric matrix of non-negative weights. This matrix can be used to specify the unpenalized partial correlation coefficients (‘weights[i, j] = 0’) or the structural zeros in the precision matrix (‘weights[i, j] = +Inf’); see the Examples section below. |
pendiag |
flag used to specify if the diagonal elements of the concentration matrix are penalized (‘pendiag = TRUE’) or unpenalized (‘pendiag = FALSE’). Default is ‘pendiag = FALSE’. |
nrho |
the integer specifying the number of tuning parameters used to fit the mglasso model. Default is ‘nrho = 50L’. |
rho.min.ratio |
the smallest value for the tuning parameter rho, as a fraction of the smallest tuning parameter for which all the estimated partial correlation coefficients are zero. The default depends on the sample size ‘n’ relative to the number of variables ‘p’: if ‘p < n’, the default is ‘1.0E-4’, otherwise ‘1.0E-2’ is used. A very small value of ‘rho.min.ratio’ will lead to a saturated fitted model in the ‘p < n’ case. |
rho |
optional argument. A user-supplied sequence of values for the tuning parameter rho. |
maxR2 |
a value in the interval [0, 1] specifying the largest value of the pseudo R-squared measure (see Section Details). The regularization path is stopped when R2 exceeds ‘maxR2’. |
maxit_em |
maximum number of iterations of the EM algorithm. Default is ‘1.0E+3’. |
thr_em |
threshold for the convergence of the EM algorithm. Default is ‘1.0E-4’. |
maxit_bcd |
maximum number of iterations of the glasso algorithm. Default is ‘1.0E+4’. |
thr_bcd |
threshold for the convergence of the glasso algorithm. Default is ‘1.0E-4’. |
trace |
integer for printing out information as iterations proceed. Default is ‘trace = 0L’. |
The missglasso estimator (Städler and Bühlmann, 2012) is an extension of the classical graphical lasso (glasso) estimator (Yuan and Lin, 2007), developed to fit a sparse Gaussian graphical model under the assumption that data are missing-at-random.
The ‘mglasso’ function fits the model using the following EM algorithm:
Step | Description |
1. | let {hat{mu}^{rho}_{ini}, hat{Tht}^{rho}_{ini}} be initial estimates; |
2. | E-step: |
 | use the expected values of the conditional normal distribution to impute the missing data; |
 | let X^{rho} be the completed data and S^{rho} the corresponding empirical covariance matrix; |
3. | M-step: |
 | let hat{mu}_h^{rho} = sum_i x^{rho}_{ih} / n; |
 | compute hat{Tht}^{rho} using S^{rho} and the glasso algorithm (Friedman and others, 2008); |
4. | repeat steps 2. and 3. until a convergence criterion is met. |
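The E-step imputation in step 2. can be sketched for a single incomplete row as follows. This is a toy illustration with made-up values, not the internal code of ‘mglasso’: missing entries are replaced by their conditional expectation under the current normal estimate, given the observed entries.

```r
# Sketch of the E-step imputation for one incomplete row (toy values):
# E[x_m | x_o] = mu_m + Sgm_{mo} Sgm_{oo}^{-1} (x_o - mu_o)
p <- 4L
mu <- rep(0, p)
Sgm <- matrix(0.3, p, p); diag(Sgm) <- 1   # a toy covariance matrix
x <- c(1.2, NA, -0.5, NA)                  # one row with missing values
m <- is.na(x)                              # indices of the missing entries
x[m] <- mu[m] + Sgm[m, !m] %*% solve(Sgm[!m, !m], x[!m] - mu[!m])
x                                          # completed row, no NA left
```

After every row has been completed in this way, the empirical covariance matrix of the completed data is passed to the M-step.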
To avoid overfitting the model, we use the following pseudo R-squared measure:

R2 = 1 - ||S^{rho} - Sgm^{rho}||_F / ||S^{rho_max} - Sgm^{rho_max}||_F,

where || . ||_F denotes the Frobenius norm and rho_max denotes the smallest value of the tuning parameter for which all the estimated partial correlation coefficients are zero. By straightforward algebra, it is easy to show that the proposed pseudo R-squared belongs to the closed interval [0, 1]: R2 = 0 when the tuning parameter is equal to rho_max and R2 = 1 when rho = 0. The regularization path is stopped when R2 exceeds the threshold specified by ‘maxR2’.
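As a quick numerical illustration of this measure (the matrices below are made up for the example and are not output of ‘mglasso’):

```r
# Toy illustration of the pseudo R-squared:
# R2 = 1 - ||S^rho - Sgm^rho||_F / ||S^rho_max - Sgm^rho_max||_F
frob <- function(A) sqrt(sum(A^2))            # Frobenius norm
S_rho   <- matrix(c(1, 0.4, 0.4, 1), 2L, 2L)  # made-up empirical covariance
Sgm_rho <- matrix(c(1, 0.3, 0.3, 1), 2L, 2L)  # made-up fitted covariance
S_max   <- matrix(c(1, 0.5, 0.5, 1), 2L, 2L)  # empirical covariance at rho_max
Sgm_max <- diag(2L)                           # at rho_max the fit is diagonal
R2 <- 1 - frob(S_rho - Sgm_rho) / frob(S_max - Sgm_max)
R2                                            # 0.8: closer to 1 as the fit improves
```

The numerator shrinks as the fitted covariance approaches the empirical one, so R2 moves from 0 (at rho_max) towards 1 (at rho = 0) along the path.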
‘mglasso’ returns an object with S3 class “mglasso”, i.e., a list containing the following components:
call |
the call that produced this object. |
X |
the original matrix used to fit the missglasso model. |
weights |
the weights used to fit the missglasso model. |
pendiag |
the flag specifying if the diagonal elements of the precision matrix are penalized. |
nrho |
the number of fitted missglasso models. |
rho.min.ratio |
the scale factor used to compute the smallest value of the tuning parameter. |
rho |
the ‘nrho’-dimensional vector reporting the values of the tuning parameter used to fit the missglasso models. |
maxR2 |
the threshold value used to stop the regularization path. |
maxit_em |
the maximum number of iterations of the EM algorithm. |
thr_em |
the threshold for the convergence of the EM algorithm. |
maxit_bcd |
the maximum number of iterations of the glasso algorithm. |
thr_bcd |
the threshold for the convergence of the glasso algorithm. |
Xipt |
an array of dimension n x p x nrho; ‘Xipt[, , k]’ is the completed data matrix of the missglasso model fitted using ‘rho[k]’. |
S |
an array of dimension p x p x nrho; ‘S[, , k]’ is the empirical covariance matrix of the completed data of the model fitted using ‘rho[k]’. |
mu |
a matrix of dimension p x nrho. The kth column is the estimate of the expected values of the missglasso model fitted using ‘rho[k]’. |
Sgm |
an array of dimension p x p x nrho; ‘Sgm[, , k]’ is the estimate of the covariance matrix of the model fitted using ‘rho[k]’. |
Tht |
an array of dimension p x p x nrho; ‘Tht[, , k]’ is the estimate of the precision matrix of the model fitted using ‘rho[k]’. |
Adj |
an array of dimension p x p x nrho; ‘Adj[, , k]’ is the adjacency matrix associated with ‘Tht[, , k]’. |
df |
the ‘nrho’-dimensional vector reporting the number of non-zero partial correlation coefficients. |
R2 |
the ‘nrho’-dimensional vector reporting the values of the measure R2 described in the Details section. |
ncomp |
the ‘nrho’-dimensional vector reporting the number of connected components (for internal purposes only). |
Ck |
the (p x nrho)-dimensional matrix encoding the connected components (for internal purposes only). |
pk |
the (p x nrho)-dimensional matrix reporting the number of vertices per connected component (for internal purposes only). |
nit |
the (nrho x 2)-dimensional matrix reporting the number of iterations. |
conv |
a description of the error that has occurred. |
subrout |
the name of the Fortran subroutine where the error has occurred (for internal debug only). |
trace |
the integer used for printing out information. |
Luigi Augugliaro (luigi.augugliaro@unipa.it)
Friedman, J.H., Hastie, T., and Tibshirani, R. (2008) <DOI:10.1093/biostatistics/kxm045>. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.
Städler, N., and Bühlmann, P. (2012) <DOI:10.1007/s11222-010-9219-7>. Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing 22, 219–235.
Yuan, M., and Lin, Y. (2007) <DOI:10.1093/biomet/asm018>. Model selection and estimation in the Gaussian graphical model. Biometrika 94, 19–35.
‘glasso’, ‘to_graph’, ‘mle’ and the method functions ‘summary’, ‘coef’, ‘plot’, ‘aic’, ‘bic’ and ‘ebic’.
library("cglasso")
set.seed(123)
p <- 5L
n <- 100L
mu <- rep(0L, p)
Tht <- diag(p)
diag(Tht[-1L, -p]) <- diag(Tht[-p, -1L]) <- 0.3
Sgm <- solve(Tht)
X <- MASS::mvrnorm(n = n, mu = mu, Sigma = Sgm)
X <- as.vector(X)
id.na <- sample.int(n = n * p, size = n * p * 0.05)
X[id.na] <- NA
dim(X) <- c(n, p)
out <- mglasso(X = X)
out

# in this example we use the argument 'weights' to specify
# the unpenalized partial correlation coefficients and the
# structural zeros in the precision matrix
w <- rep(1, p * p)
dim(w) <- c(p, p)
# specifying the unpenalized partial correlation coefficients
diag(w) <- diag(w[-1L, -p]) <- diag(w[-p, -1L]) <- 0
# specifying the structural zeros
w[1L, 4L:5L] <- w[4L:5L, 1L] <- +Inf
w[2L, 5L] <- w[5L, 2L] <- +Inf
w
out <- mglasso(X = X, weights = w)
# checking structural zeros
out$Tht[, , out$nrho][w == +Inf]
# checking stationarity conditions of the MLE estimators
# (the unpenalized partial correlation coefficients)
(out$Sgm[, , out$nrho] - out$S[, , out$nrho])[w == 0]