cox_cure_net {intsurv} | R Documentation |
For right-censored data, fit a regularized Cox cure rate model with an elastic-net penalty following Masud et al. (2018) and Zou and Hastie (2005). For right-censored data with uncertain event status, fit the regularized Cox cure model proposed by Wang et al. (2019+). Without regularization, the model reduces to the regular Cox cure rate model (Kuk and Chen, 1992; Sy and Taylor, 2000).
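For orientation, the mixture cure model and the elastic-net penalty referred to above can be written as follows. This is a sketch in the usual notation of Kuk and Chen (1992) and Zou and Hastie (2005); the symbols (x, z, beta, gamma, pi, S_0) and the exact scaling of the penalty are assumptions made for illustration and may differ from the package's internal parameterization.

```latex
\begin{aligned}
S_{\mathrm{pop}}(t \mid x, z) &= 1 - \pi(z) + \pi(z)\, S_u(t \mid x), \\
\pi(z) &= \frac{\exp(\gamma_0 + z^\top \gamma)}{1 + \exp(\gamma_0 + z^\top \gamma)}, \\
S_u(t \mid x) &= S_0(t)^{\exp(x^\top \beta)}, \\
P_{\lambda,\alpha}(\theta) &= \lambda \left\{ \alpha \lVert \theta \rVert_1
  + \frac{1 - \alpha}{2} \lVert \theta \rVert_2^2 \right\}.
\end{aligned}
```

Here \pi(z) is taken to be the probability of being susceptible (not cured), S_u is the Cox survival function of susceptible subjects, and the penalty is applied separately to \beta (with surv_lambda and surv_alpha) and to \gamma (with cure_lambda and cure_alpha). Setting \alpha = 1 gives the lasso penalty and \alpha = 0 the ridge penalty.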
cox_cure_net(surv_formula, cure_formula, time, event,
             data, subset, contrasts = NULL,
             surv_lambda, surv_alpha = 1, surv_nlambda = 10,
             surv_lambda_min_ratio = 1e-1, surv_l1_penalty_factor,
             cure_lambda, cure_alpha = 1, cure_nlambda = 10,
             cure_lambda_min_ratio = 1e-1, cure_l1_penalty_factor,
             surv_start, cure_start,
             surv_standardize = TRUE, cure_standardize = TRUE,
             em_max_iter = 200, em_rel_tol = 1e-5,
             surv_max_iter = 10, surv_rel_tol = 1e-5,
             cure_max_iter = 10, cure_rel_tol = 1e-5,
             tail_completion = c("zero", "exp", "zero-tau"),
             tail_tau = NULL, pmin = 1e-5,
             early_stop = TRUE, verbose = FALSE, ...)

cox_cure_net.fit(surv_x, cure_x, time, event, cure_intercept = TRUE,
                 surv_lambda, surv_alpha = 1, surv_nlambda = 10,
                 surv_lambda_min_ratio = 1e-1, surv_l1_penalty_factor,
                 cure_lambda, cure_alpha = 1, cure_nlambda = 10,
                 cure_lambda_min_ratio = 1e-1, cure_l1_penalty_factor,
                 surv_start, cure_start,
                 surv_standardize = TRUE, cure_standardize = TRUE,
                 em_max_iter = 200, em_rel_tol = 1e-5,
                 surv_max_iter = 10, surv_rel_tol = 1e-5,
                 cure_max_iter = 10, cure_rel_tol = 1e-5,
                 tail_completion = c("zero", "exp", "zero-tau"),
                 tail_tau = NULL, pmin = 1e-5,
                 early_stop = TRUE, verbose = FALSE, ...)
surv_formula |
A formula object starting with |
cure_formula |
A formula object starting with |
time |
A numeric vector for the observed survival times. |
event |
A numeric vector for the event indicators. |
data |
An optional data frame, list, or environment that contains the
covariates and response variables ( |
subset |
An optional logical vector specifying a subset of observations to be used in the fitting process. |
contrasts |
An optional list, whose entries are values (numeric
matrices or character strings naming functions) to be used as
replacement values for the contrasts replacement function and whose
names are the names of columns of data containing factors. See
|
surv_lambda |
A numeric vector of non-negative values giving the tuning parameter sequence for the survival model part. |
surv_alpha |
A number between 0 and 1 for tuning the elastic-net penalty for the survival model part. If it is one, the elastic-net penalty reduces to the lasso penalty. If it is zero, the ridge penalty is used. |
surv_nlambda |
A positive number specifying the number of
|
surv_lambda_min_ratio |
The ratio of the minimum |
surv_l1_penalty_factor |
A numeric vector of positive numbers giving the penalty factors (or weights) on the L1-norm of the coefficient estimate vector in the survival model part. The penalty is applied to the coefficient estimate divided by the specified weights. The specified weights are re-scaled internally so that they sum to the number of coefficients. If left unspecified, all weights are set to one. |
cure_lambda |
A numeric vector of non-negative values giving the tuning parameter sequence for the cure model part. |
cure_alpha |
A number between 0 and 1 for tuning the elastic-net penalty for the cure model part. If it is one, the elastic-net penalty reduces to the lasso penalty. If it is zero, the ridge penalty is used. |
cure_nlambda |
A positive number specifying the number of
|
cure_lambda_min_ratio |
The ratio of the minimum |
cure_l1_penalty_factor |
A numeric vector of positive numbers giving the penalty factors (or weights) on the L1-norm of the coefficient estimate vector in the cure model part. The penalty is applied to the coefficient estimate divided by the specified weights. The specified weights are re-scaled internally so that they sum to the number of coefficients. If left unspecified, all weights are set to one. |
surv_start |
An optional numeric vector representing the starting values for the Cox model component. If not specified, the starting values will be obtained from fitting a regular Cox model to events only. |
cure_start |
An optional numeric vector representing the starting values for the logistic model component. If not specified, the starting values will be obtained from fitting a regular logistic model to the non-missing event indicators. |
surv_standardize |
A logical value specifying whether to standardize
the covariates for the survival model part. If |
cure_standardize |
A logical value specifying whether to standardize
the covariates for the cure rate model part. If |
em_max_iter |
A positive integer specifying the maximum number of
iterations of the EM algorithm. The default value is 200. |
em_rel_tol |
A positive number specifying the tolerance that determines
the convergence of the EM algorithm in terms of the convergence of the
covariate coefficient estimates. The tolerance is compared with the
relative change between estimates from two consecutive iterations, which
is measured by the ratio of the L1-norm of their difference to the sum of
their L1-norms. The default value is 1e-5. |
surv_max_iter |
A positive integer specifying the maximum number of
iterations of the M-step routine related to the survival model component.
The default value is 10. |
surv_rel_tol |
A positive number specifying the tolerance that
determines the convergence of the M-step related to the survival model
component in terms of the convergence of the covariate coefficient
estimates. The tolerance is compared with the relative change between
estimates from two consecutive iterations, which is measured by the ratio
of the L1-norm of their difference to the sum of their L1-norms. The
default value is 1e-5. |
cure_max_iter |
A positive integer specifying the maximum number of
iterations of the M-step routine related to the cure rate model component.
The default value is 10. |
cure_rel_tol |
A positive number specifying the tolerance that
determines the convergence of the M-step related to the cure rate model
component in terms of the convergence of the covariate coefficient
estimates. The tolerance is compared with the relative change between
estimates from two consecutive iterations, which is measured by the ratio
of the L1-norm of their difference to the sum of their L1-norms. The
default value is 1e-5. |
tail_completion |
A character string specifying the tail completion
method for the conditional survival function. The available methods are
"zero", "exp", and "zero-tau". |
tail_tau |
A number specifying the time of zero-tail
completion. It will be used only if |
pmin |
A number specifying the minimum value of probabilities
for the sake of numerical stability. The default value is 1e-5. |
early_stop |
A logical value specifying whether to stop the iteration
once the negative log-likelihood unexpectedly increases, which may
suggest convergence of the likelihood, or indicate numerical issues or
implementation bugs. The default value is TRUE. |
verbose |
A logical value. If |
... |
Other arguments for future usage. A warning will be thrown if any invalid argument is specified. |
surv_x |
A numeric matrix for the design matrix of the survival model component. |
cure_x |
A numeric matrix for the design matrix of the cure rate model
component. The design matrix should exclude an intercept term unless the
model to be fitted includes only the intercept term. In that case, we
also need to set |
cure_intercept |
A logical value specifying whether to add an intercept
term to the cure rate model component. If |
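The relative-change criterion described for em_rel_tol, surv_rel_tol, and cure_rel_tol, and the re-scaling described for the L1 penalty factors, can be sketched in a few lines of R. The helper names rel_l1_change() and rescale_penalty_factor() are hypothetical and are not functions exported by intsurv; returning zero when both estimate vectors are all zeros is an assumption made here to avoid dividing by zero.

```r
## Relative change between two consecutive coefficient estimates:
## L1-norm of the difference divided by the sum of the two L1-norms.
## (hypothetical helper, not part of intsurv)
rel_l1_change <- function(beta_new, beta_old) {
    denom <- sum(abs(beta_new)) + sum(abs(beta_old))
    ## assumption: treat two all-zero vectors as "no change"
    if (denom == 0) return(0)
    sum(abs(beta_new - beta_old)) / denom
}

## Re-scale positive penalty factors so that they sum to the number
## of coefficients. (hypothetical helper, not part of intsurv)
rescale_penalty_factor <- function(w) {
    stopifnot(is.numeric(w), all(w > 0))
    w * length(w) / sum(w)
}

## an iteration would be declared converged once the relative change
## falls below the tolerance, e.g., em_rel_tol = 1e-5
beta_old <- c(0.50, 0, -1.20)
beta_new <- c(0.50, 0, -1.20 + 1e-7)
rel_l1_change(beta_new, beta_old) < 1e-5

## weights of 1, 2, and 3 are re-scaled to sum to 3
rescale_penalty_factor(c(1, 2, 3))
```

By construction the relative change lies between 0 and 1, so a single tolerance has the same meaning regardless of the scale of the coefficients.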
The model estimation procedure follows the expectation-maximization (EM) algorithm. The variable selection procedure through elastic-net regularization is based on cyclic coordinate descent and the majorization-minimization (MM) algorithm.
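To illustrate the idea of cyclic coordinate descent under an elastic-net penalty, the following is a minimal sketch for a penalized least-squares problem. It is not the routine used inside the package, which works on the Cox and logistic components within the EM and MM steps; the function names soft_threshold() and enet_cd(), and the objective (1 / (2n)) * RSS + lambda * (alpha * L1 + (1 - alpha) / 2 * squared L2), are assumptions chosen for illustration.

```r
## soft-thresholding operator used by the lasso part of the penalty
soft_threshold <- function(z, gamma) {
    sign(z) * pmax(abs(z) - gamma, 0)
}

## cyclic coordinate descent for elastic-net penalized least squares
## (illustrative sketch only; not the intsurv implementation)
enet_cd <- function(x, y, lambda, alpha = 1,
                    max_iter = 100, rel_tol = 1e-5) {
    n <- nrow(x)
    p <- ncol(x)
    beta <- numeric(p)
    for (iter in seq_len(max_iter)) {
        beta_old <- beta
        for (j in seq_len(p)) {
            ## partial residual excluding the j-th covariate
            r_j <- y - x[, -j, drop = FALSE] %*% beta[-j]
            z_j <- sum(x[, j] * r_j) / n
            v_j <- sum(x[, j]^2) / n
            ## closed-form update of the j-th coefficient
            beta[j] <- soft_threshold(z_j, lambda * alpha) /
                (v_j + lambda * (1 - alpha))
        }
        ## relative change in L1-norm between consecutive sweeps
        denom <- sum(abs(beta)) + sum(abs(beta_old))
        if (denom == 0 || sum(abs(beta - beta_old)) / denom < rel_tol)
            break
    }
    beta
}

## toy usage: only the first covariate has a non-zero effect
set.seed(1)
x <- scale(matrix(rnorm(50 * 4), nrow = 50, ncol = 4))
y <- 2 * x[, 1] + rnorm(50, sd = 0.1)
y <- y - mean(y)
enet_cd(x, y, lambda = 0.1, alpha = 1)
```

Each coordinate update has a closed form, which is why coordinate descent is a common choice for lasso-type penalties; a sufficiently large lambda shrinks every coefficient exactly to zero.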
A cox_cure_net object for the regular Cox cure rate model, or a cox_cure_net_uncer object for the Cox cure rate model with uncertain events.
Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531–541.
Masud, A., Tu, W., & Yu, Z. (2018). Variable selection for mixture and promotion time cure rate models. Statistical Methods in Medical Research, 27(7), 2185–2199.
Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187–201.
Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227–236.
Wang, W., Chen, K., Luo, C., & Yan, J. (2019+). Cox Cure Model with Uncertain Event Status with application to a Suicide Risk Study. Work in progress.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
cox_cure for the regular Cox cure rate model.
library(intsurv)

### regularized Cox cure rate model ==================================
## simulate a toy right-censored data with a cure fraction
set.seed(123)
n_obs <- 100
p <- 10
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 5), rep(1, 5))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
                    max_censor = 10, survMat = x_mat,
                    survCoef = surv_beta, cureCoef = cure_beta,
                    b0 = 0.5, p1 = 1, p2 = 1, p3 = 1)

## model-fitting from given design matrices
fit1 <- cox_cure_net.fit(x_mat, x_mat, dat$obs_time, dat$obs_event,
                         surv_nlambda = 10, cure_nlambda = 10,
                         surv_alpha = 0.8, cure_alpha = 0.8)

## model-fitting from given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(surv_fm, cure_fm, data = dat,
                     time = obs_time, event = obs_event,
                     surv_alpha = 0.5, cure_alpha = 0.5)

## summary of BIC's
BIC(fit1)
BIC(fit2)

## list of coefficient estimates based on BIC
coef(fit1)
coef(fit2)

### regularized Cox cure model with uncertain event status ===========
## simulate a toy data
set.seed(123)
n_obs <- 100
p <- 10
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 5), rep(1, 5))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
                    max_censor = 10, survMat = x_mat,
                    survCoef = surv_beta, cureCoef = cure_beta,
                    b0 = 0.5, p1 = 0.95, p2 = 0.95, p3 = 0.95)

## model-fitting from given design matrices
fit1 <- cox_cure_net.fit(x_mat, x_mat, dat$obs_time, dat$obs_event,
                         surv_nlambda = 5, cure_nlambda = 5,
                         surv_alpha = 0.8, cure_alpha = 0.8)

## model-fitting from given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(surv_fm, cure_fm, data = dat,
                     time = obs_time, event = obs_event,
                     surv_nlambda = 5, cure_nlambda = 5,
                     surv_alpha = 0.5, cure_alpha = 0.5)

## summary of BIC's
BIC(fit1)
BIC(fit2)

## list of coefficient estimates based on BIC
coef(fit1)
coef(fit2)