select_predictors {textTinyR}    R Documentation
Exclude highly correlated predictors
select_predictors(response_vector, predictors_matrix, response_lower_thresh = 0.1, predictors_upper_thresh = 0.75, threads = 1, verbose = FALSE)
response_vector | a numeric vector whose length equals the number of rows of predictors_matrix
predictors_matrix | a numeric matrix whose number of rows equals the length of response_vector
response_lower_thresh | a numeric value. Predictors whose correlation with the response is greater than response_lower_thresh are kept.
predictors_upper_thresh | a numeric value. Predictors whose correlation with the other predictors is less than predictors_upper_thresh are kept.
threads | a numeric value specifying the number of cores to use in parallel
verbose | either TRUE or FALSE. If TRUE, information is printed in the R session.
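To make the two thresholds concrete, the sketch below computes the two sets of correlations they are compared against, on hypothetical toy data: the correlation of each predictor with the response (response_lower_thresh) and the pairwise correlations between predictors (predictors_upper_thresh). It assumes Pearson correlation as returned by stats::cor(); this is an illustration, not a description of how textTinyR computes correlations internally.

```r
# Hypothetical toy data: 50 observations, 3 predictors.
set.seed(42)
y = runif(50)
X = cbind(a = y + rnorm(50, sd = 0.1),   # strongly related to y
          b = runif(50),                 # unrelated noise
          c = y + rnorm(50, sd = 0.1))   # near-duplicate of column a

# The inputs must agree in size: one response value per row of the matrix.
stopifnot(is.numeric(y), is.matrix(X), length(y) == nrow(X))

# Compared with response_lower_thresh: correlation of each predictor with y.
resp_cor = drop(cor(X, y))

# Compared with predictors_upper_thresh: pairwise correlations between predictors.
pred_cor = cor(X)

print(round(resp_cor, 2))
print(round(pred_cor, 2))
```

Here columns a and c are both close to y and to each other, so a threshold on pred_cor would keep only one of them, while column b is weakly related to y and is the candidate for the response_lower_thresh filter.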
The function works as follows. First, the correlation of each predictor with the response is computed and the predictors are sorted by that correlation in decreasing order. Then predictors whose correlation with another predictor is higher than the predictors_upper_thresh value are removed iteratively, favoring the predictors that are more correlated with the response variable. If the response_lower_thresh value is greater than 0.0, only predictors whose correlation with the response is higher than or equal to response_lower_thresh are kept; the others are excluded. The function returns the column indices of the selected predictors and is useful when the predictors suffer from multicollinearity.
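The procedure described above can be sketched in plain R as a greedy selection. This is a hypothetical re-implementation for illustration only (the name select_predictors_sketch is made up, and textTinyR's own C++ code may differ, for example in whether absolute correlations are used or how ties are broken). It assumes Pearson correlation via stats::cor() and compares the signed values, as the text describes.

```r
# Hypothetical re-implementation for illustration; NOT textTinyR's code.
select_predictors_sketch = function(response_vector, predictors_matrix,
                                    response_lower_thresh = 0.1,
                                    predictors_upper_thresh = 0.75) {
  stopifnot(length(response_vector) == nrow(predictors_matrix))

  # Step 1: correlation of every predictor with the response (Pearson, assumed).
  resp_cor = drop(cor(predictors_matrix, response_vector))

  # Step 2: visit predictors from most to least correlated with the response.
  ord = order(resp_cor, decreasing = TRUE)

  # Step 3: keep a predictor only if its correlation with every predictor
  # already kept is below the upper threshold, so that within a group of
  # correlated predictors the one closest to the response wins.
  pred_cor = cor(predictors_matrix)
  kept = integer(0)
  for (j in ord) {
    if (length(kept) == 0 || all(pred_cor[j, kept] < predictors_upper_thresh)) {
      kept = c(kept, j)
    }
  }

  # Step 4: if the lower threshold is positive, drop weakly related predictors.
  if (response_lower_thresh > 0) {
    kept = kept[resp_cor[kept] >= response_lower_thresh]
  }
  kept  # column indices, in decreasing order of correlation with the response
}

# Usage on hypothetical toy data: columns 1 and 2 are near-copies of the response.
set.seed(1)
y = runif(100)
X = cbind(y + rnorm(100, sd = 0.05), y + rnorm(100, sd = 0.05), runif(100))
idx = select_predictors_sketch(y, X)
print(idx)
```

Applying the lower-threshold filter last, as the text does, gives the same result as filtering first: predictors are visited in decreasing order of correlation with the response, so a predictor that fails the lower threshold can only block predictors that would fail it too.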
a vector of column indices of predictors_matrix, one for each selected predictor
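library(textTinyR)

# a random response and a matrix of five correlated predictors (powers of col)
set.seed(1)
resp = runif(100)

set.seed(2)
col = runif(100)
matr = matrix(c(col, col^4, col^6, col^8, col^10), nrow = 100, ncol = 5)

# column indices of the predictors that are kept
out = select_predictors(resp, matr, predictors_upper_thresh = 0.75)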
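To see why columns are dropped in the example above, the correlations that the thresholds are compared against can be inspected with base R alone. The sketch below recreates the example data with the same seeds and assumes Pearson correlation from stats::cor(). Because resp is drawn independently of col, its correlations with the columns are expected to be small, which is worth comparing against the default response_lower_thresh = 0.1.

```r
# Recreate the example data (same seeds as the example) using base R only.
set.seed(1)
resp = runif(100)
set.seed(2)
col = runif(100)
matr = matrix(c(col, col^4, col^6, col^8, col^10), nrow = 100, ncol = 5)

# Correlations among the predictors, compared with predictors_upper_thresh.
pred_cor = cor(matr)
print(round(pred_cor, 2))

# Correlation of each predictor with the response, compared with response_lower_thresh.
resp_cor = drop(cor(matr, resp))
print(round(resp_cor, 2))
```

The indices returned in out can then be used to subset the original matrix, for example matr[, out, drop = FALSE].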