Benilton Carvalho
2007-Jan-19 23:17 UTC
[R] Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays
Hi Everyone, Given the scenario I have, I was wondering if anyone would be able to give me a hind on how to get the results from hubers() in a more efficient way. I have an outcome on an array [N x S x D]. I also have a factor (levels 1,2,3) stored on a matrix N x S. My objective is to get "mu" and "sigma" for each of the N rows (outcome) stratified by the factor (levels 1, 2 and 3) for each of the D "levels", but using MASS:hubers(). Ideally the final result would be an array [N x D x 3 x 2]. The following toy example demonstrates what I want to do, and I'd like to improve the performance when working on my case, where S=400 and N > 200000 Thank you very much for any suggestion. benilton ## begin toy example set.seed(1) N <- 100 S <- 5 D <- 2 outcome <- array(rnorm(N*S*D), dim=c(N, S, D)) classes <- matrix(sample(c(1:3, NA), N*S, rep=T), ncol=S) results <- array(NA, dim=c(N, D, 3, 2)) library(MASS) myHubers <- function(x) if (length(x)>1) as.numeric(hubers(x)) else c(NA, NA) for (n in 1:N) for (d in 1:D){ tmp <- outcome[n,,d] grp <- classes[n,] results[n, d,,] <- t(sapply(split(tmp, factor(grp, levels=1:3)), myHubers)) } ## end -- Benilton Carvalho PhD Candidate Department of Biostatistics Johns Hopkins University
Prof Brian Ripley
2007-Jan-20 07:08 UTC
[R] Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays
The usual advice would seem to apply here: - profile (the real application, not a toy example) to see where the bottlenecks are. - rewrite those in C. It is quite possible that hubers would benefit in your problem by being rewritten in C. However, there is a reason why it was not. It is a univariate location estimator, and using multiple univariate location estimators is not a robust multivariate location estimator. So in so far as I understand your problem sketch, you would be better off with a multivariate estimator for your N outcomes. There may be problems which need the application of very large numbers of univariate robust estimators, but they are not commonplace. On Fri, 19 Jan 2007, Benilton Carvalho wrote:> Hi Everyone, > > Given the scenario I have, I was wondering if anyone would be able to > give me a hind on how to get the results from hubers() in a more > efficient way. > > I have an outcome on an array [N x S x D]. > > I also have a factor (levels 1,2,3) stored on a matrix N x S. > > My objective is to get "mu" and "sigma" for each of the N rows > (outcome) stratified by the factor (levels 1, 2 and 3) for each of > the D "levels", but using MASS:hubers(). > > Ideally the final result would be an array [N x D x 3 x 2]. > > The following toy example demonstrates what I want to do, and I'd > like to improve the performance when working on my case, where S=400 > and N > 200000 > > Thank you very much for any suggestion. > > benilton > > ## begin toy example > set.seed(1) > N <- 100 > S <- 5 > D <- 2 > > outcome <- array(rnorm(N*S*D), dim=c(N, S, D)) > classes <- matrix(sample(c(1:3, NA), N*S, rep=T), ncol=S) > > results <- array(NA, dim=c(N, D, 3, 2)) > > library(MASS) > myHubers <- function(x) > if (length(x)>1) as.numeric(hubers(x)) else c(NA, NA) > > for (n in 1:N) > for (d in 1:D){ > tmp <- outcome[n,,d] > grp <- classes[n,] > results[n, d,,] <- t(sapply(split(tmp, factor(grp, levels=1:3)), > myHubers)) > } > ## end > > -- > Benilton Carvalho > PhD Candidate > Department of Biostatistics > Johns Hopkins University > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595