Aaron J. Mackey
2004-Jul-06 12:17 UTC
[R] vectorizing sapply() code (Modified by Aaron J. Mackey)
[ Not sure why, but the first two times I sent this it never seemed to go through; apologies if you're seeing this thrice ... ] I have some fully functional code that I'm guessing can be done better/quicker with some savvy R vector tricks; any help to make this run a bit faster would be greatly appreciated; I'm particularly stuck on how to calculate using "row-wise" vectors without iterating explicitly over the dataframe or table ... library(stats4); d <- data.frame( ix=c(0,1,2,3,4,5,6,7), ct=c(253987, 9596, 18680, 2630, 8224, 3590, 5534, 18937), A=c( 0, 1, 0, 1, 0, 1, 0, 1), B=c( 0, 0, 1, 1, 0, 0, 1, 1), C=c( 0, 0, 0, 0, 1, 1, 1, 1) ); ct <- round(logb(length(d$ix), 2)) ll <- function( th=0.5, a1=log(0.5), a2=log(0.5), a3=log(0.5), b1=log(0.5), b2=log(0.5), b3=log(0.5) ) { a <- exp(sapply(1:ct, function (x) { get(paste("a", x, sep="")) })); b <- exp(sapply(1:ct, function (x) { get(paste("b", x, sep="")) })); -sum( d$ct * log( sapply( d$ix, function (ix, th, a, b) { x <- d[ix+1,3:(ct+2)] (th * prod((b ^ (1-x)) * ((1-b) ^ x ))) + ((1-th) * prod((a ^ x ) * ((1-a) ^ (1-x)))) }, th, a, b ) ) ); } ml <- mle(ll, lower=c(0+1e-5, rep(log(0+1e-8), 2*ct)), upper=c(1-1e-5, rep(log(1-1e-8), 2*ct)), method="L-BFGS-B" ); For those interested in the math, this is the MLE procedure to estimate the false positive/false negative rates (a and b) of three diagnostic (A, B and C) tests that have the observed performance recapitulated in dataframe "d", but no "gold standard" (sometimes called "latent class analysis", or LCA). Thanks for any help, -Aaron
Aaron J. Mackey
2004-Jul-06 12:17 UTC
[R] vectorizing sapply() code (Modified by Aaron J. Mackey)
[ Not sure why, but the first time I sent this it never seemed to go through; apologies if you're seeing this twice ... ] I have some fully functional code that I'm guessing can be done better/quicker with some savvy R vector tricks; any help to make this run a bit faster would be greatly appreciated; I'm particularly stuck on how to calculate using "row-wise" vectors without iterating explicitly over the dataframe or table ... library(stats4); d <- data.frame( ix=c(0,1,2,3,4,5,6,7), ct=c(253987, 9596, 18680, 2630, 8224, 3590, 5534, 18937), A=c( 0, 1, 0, 1, 0, 1, 0, 1), B=c( 0, 0, 1, 1, 0, 0, 1, 1), C=c( 0, 0, 0, 0, 1, 1, 1, 1) ); ct <- round(logb(length(d$ix), 2)) ll <- function( th=0.5, a1=log(0.5), a2=log(0.5), a3=log(0.5), b1=log(0.5), b2=log(0.5), b3=log(0.5) ) { a <- exp(sapply(1:ct, function (x) { get(paste("a", x, sep="")) })); b <- exp(sapply(1:ct, function (x) { get(paste("b", x, sep="")) })); -sum( d$ct * log( sapply( d$ix, function (ix, th, a, b) { x <- d[ix+1,3:(ct+2)] (th * prod((b ^ (1-x)) * ((1-b) ^ x ))) + ((1-th) * prod((a ^ x ) * ((1-a) ^ (1-x)))) }, th, a, b ) ) ); } ml <- mle(ll, lower=c(0+1e-5, rep(log(0+1e-8), 2*ct)), upper=c(1-1e-5, rep(log(1-1e-8), 2*ct)), method="L-BFGS-B" ); For those interested in the math, this is the MLE procedure to estimate the false positive/false negative rates (a and b) of three diagnostic (A, B and C) tests that have the observed performance recapitulated in dataframe "d", but no "gold standard" (sometimes called "latent class analysis", or LCA). Thanks for any help, -Aaron