Vivian Shih
2011-Oct-19 02:24 UTC
[R] Sparse covariance estimation (via glasso) shrinking to a "nonzero" constant
I've only been using R on and off for 9 months and started using the glasso package for sparse covariance estimation. I know the concept is to shrink some of the elements of the covariance matrix to zero. However, say I have a dataset that I know has some underlying "baseline" covariance/correlation (say, a value of 0.3), how can I change or incorporate that into to the code? Basically instead of "let's restrict some of these elements are 0 and others have different nonzero values", I'd like it to be "let's restrict some of these elements are 0.3 and others are different or higher/lower than this 0.3". Does that make any sense? Here is the glasso code if it helps:> library(glasso) > glassofunction (s, rho, zero = NULL, thr = 1e-04, maxit = 10000, approx = FALSE, penalize.diagonal = TRUE, start = c("cold", "warm"), w.init = NULL, wi.init = NULL, trace = FALSE) { n = nrow(s) BIG = 1e+10 if (!is.matrix(rho) & length(rho) != 1 & length(rho) != nrow(s)) { stop("Wrong number of elements in rho") } if (length(rho) == 1 & rho == 0) { cat("With rho=0, there may be convergence problems if the input matrix s\n is not of full rank", fill = TRUE) } if (is.vector(rho)) { rho = matrix(sqrt(rho)) %*% sqrt(rho) } if (length(rho) == 1) { rho = matrix(rho, ncol = n, nrow = n) } if (!is.null(zero)) { if (!is.matrix(zero)) { zero = matrix(zero, nrow = TRUE) } for (k in 1:nrow(zero)) { i = zero[k, 1] j = zero[k, 2] rho[i, j] = BIG rho[j, i] = BIG } } start.type = match.arg(start) if (start.type == "cold") { is = 0 ww = xx = matrix(0, nrow = n, ncol = n) } if (start.type == "warm") { is = 1 if (is.null(w.init) | is.null(wi.init)) { stop("Warm start specified: w.init and wi.init must be non-null") } ww = w.init xx = wi.init } itrace = 1 * trace ipen = 1 * (penalize.diagonal) ia = 1 * approx mode(rho) = "double" mode(s) = "double" mode(ww) = "double" mode(xx) = "double" mode(n) = "integer" mode(maxit) = "integer" mode(ia) = "integer" mode(itrace) = "integer" mode(ipen) = "integer" mode(is) = "integer" mode(thr) = "double" junk <- .Fortran("glasso", n, s, rho, ia, is, itrace, ipen, thr, maxit = maxit, ww = ww, xx = xx, niter = integer(1), del = double(1), ierr = integer(1), PACKAGE = "glasso") ww = matrix(junk$ww, ncol = n) xx = matrix(junk$xx, ncol = n) if (junk$ierr != 0) { stop("memory allocation error") } critfun = function(Sigmahati, s, rho, penalize.diagonal = TRUE) { d = det(Sigmahati) temp = Sigmahati if (!penalize.diagonal) { diag(temp) = 0 } val = -log(d) + sum(diag(s %*% Sigmahati)) + sum(abs(rho * temp)) return(val) } crit = critfun(xx, s, rho, penalize.diagonal) return(list(w = ww, wi = xx, loglik = -(n/2) * crit, errflag = junk$ierr, approx = approx, del = junk$del, niter = junk$niter)) } If anyone can give any suggestions, I'd greatly appreciate it. Thanks! Vivian
Joshua Wiley
2011-Oct-19 03:31 UTC
[R] Sparse covariance estimation (via glasso) shrinking to a "nonzero" constant
Hi Vivian, This may be naive given the method (I am unfamiliar with glasso), but what about simple subtraction? If it restricts to 0, you believe you have .3, then just: obs - .3 and restrict to 0 again? Here is a little example (assuming .3 correlation, but using glasso with the covariance matrix so subtraction is marginally more complex): dat <- lapply(list((dat <- prcomp(matrix(rnorm(1000), 200))$x), dat %*% chol(matrix(c(1, rep(c(rep(.3, 5), 1), 4)), 5, 5))), cov) i <- sqrt(1/diag(dat[[2]])) v <- dat[[2]] - matrix(c(0, rep(c(rep(.3, 5), 0), 4)), 5, 5)/(i * rep(i, each = 5)) require(glasso) Reduce(`-`, res <- list(true = glasso(dat[[1]], .01)$w, fake glasso(v, .01)$w)) Again this may not make any sense at all given how the methods actually work. Cheers, Josh On Tue, Oct 18, 2011 at 7:24 PM, Vivian Shih <vivs at ucla.edu> wrote:> I've only been using R on and off for 9 months and started using the glasso > package for sparse covariance estimation. I know the concept is to shrink > some of the elements of the covariance matrix to zero. > > However, say I have a dataset that I know has some underlying "baseline" > covariance/correlation (say, a value of 0.3), how can I change or > incorporate that into to the code? Basically instead of "let's restrict some > of these elements are 0 and others have different nonzero values", I'd like > it to be "let's restrict some of these elements are 0.3 and others are > different or higher/lower than this 0.3". Does that make any sense? > > > > Here is the glasso code if it helps: >> >> library(glasso) >> glasso > > function (s, rho, zero = NULL, thr = 1e-04, maxit = 10000, approx = FALSE, > ? ?penalize.diagonal = TRUE, start = c("cold", "warm"), w.init = NULL, > ? ?wi.init = NULL, trace = FALSE) > { > ? ?n = nrow(s) > ? ?BIG = 1e+10 > ? ?if (!is.matrix(rho) & length(rho) != 1 & length(rho) != nrow(s)) { > ? ? ? ?stop("Wrong number of elements in rho") > ? ?} > ? ?if (length(rho) == 1 & rho == 0) { > ? ? ? ?cat("With rho=0, there may be convergence problems if the input > matrix s\n is not of full rank", > ? ? ? ? ? ?fill = TRUE) > ? ?} > ? ?if (is.vector(rho)) { > ? ? ? ?rho = matrix(sqrt(rho)) %*% sqrt(rho) > ? ?} > ? ?if (length(rho) == 1) { > ? ? ? ?rho = matrix(rho, ncol = n, nrow = n) > ? ?} > ? ?if (!is.null(zero)) { > ? ? ? ?if (!is.matrix(zero)) { > ? ? ? ? ? ?zero = matrix(zero, nrow = TRUE) > ? ? ? ?} > ? ? ? ?for (k in 1:nrow(zero)) { > ? ? ? ? ? ?i = zero[k, 1] > ? ? ? ? ? ?j = zero[k, 2] > ? ? ? ? ? ?rho[i, j] = BIG > ? ? ? ? ? ?rho[j, i] = BIG > ? ? ? ?} > ? ?} > ? ?start.type = match.arg(start) > ? ?if (start.type == "cold") { > ? ? ? ?is = 0 > ? ? ? ?ww = xx = matrix(0, nrow = n, ncol = n) > ? ?} > ? ?if (start.type == "warm") { > ? ? ? ?is = 1 > ? ? ? ?if (is.null(w.init) | is.null(wi.init)) { > ? ? ? ? ? ?stop("Warm start specified: w.init and wi.init must be non-null") > ? ? ? ?} > ? ? ? ?ww = w.init > ? ? ? ?xx = wi.init > ? ?} > ? ?itrace = 1 * trace > ? ?ipen = 1 * (penalize.diagonal) > ? ?ia = 1 * approx > ? ?mode(rho) = "double" > ? ?mode(s) = "double" > ? ?mode(ww) = "double" > ? ?mode(xx) = "double" > ? ?mode(n) = "integer" > ? ?mode(maxit) = "integer" > ? ?mode(ia) = "integer" > ? ?mode(itrace) = "integer" > ? ?mode(ipen) = "integer" > ? ?mode(is) = "integer" > ? ?mode(thr) = "double" > ? ?junk <- .Fortran("glasso", n, s, rho, ia, is, itrace, ipen, > ? ? ? ?thr, maxit = maxit, ww = ww, xx = xx, niter = integer(1), > ? ? ? ?del = double(1), ierr = integer(1), PACKAGE = "glasso") > ? ?ww = matrix(junk$ww, ncol = n) > ? ?xx = matrix(junk$xx, ncol = n) > ? ?if (junk$ierr != 0) { > ? ? ? ?stop("memory allocation error") > ? ?} > ? ?critfun = function(Sigmahati, s, rho, penalize.diagonal = TRUE) { > ? ? ? ?d = det(Sigmahati) > ? ? ? ?temp = Sigmahati > ? ? ? ?if (!penalize.diagonal) { > ? ? ? ? ? ?diag(temp) = 0 > ? ? ? ?} > ? ? ? ?val = -log(d) + sum(diag(s %*% Sigmahati)) + sum(abs(rho * > ? ? ? ? ? ?temp)) > ? ? ? ?return(val) > ? ?} > ? ?crit = critfun(xx, s, rho, penalize.diagonal) > ? ?return(list(w = ww, wi = xx, loglik = -(n/2) * crit, errflag = junk$ierr, > ? ? ? ?approx = approx, del = junk$del, niter = junk$niter)) > } > > > > If anyone can give any suggestions, I'd greatly appreciate it. Thanks! > > > > > Vivian > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/