Markus Schmidberger
2008-Sep-11 13:49 UTC
[R] different results from summarization by loop and sum or rowMeans function
Hi,

I found different results calculating the rowMeans by the function rowMeans() and by a simple for-loop. The differences are very low. But after this calculation I start some optimization algorithms (BFGS or CG), and there I get huge differences (from the small changes in the start values; I changed nothing else in the code).
How can I avoid these differences between sum-loops and sum-functions?

Attached is a small test code using data from Bioconductor.

Best
Markus


library(affy)
data(affybatch.example)
mat <- exprs(affybatch.example)[1:100,1:3]
mat <- exp(1)*mat
mat <- asinh(mat)

rowM1 <- rowMeans(mat)

t=rep(0,100) # vector of zeros
for(i in 1:100){
  for(j in 1:3)
    t[i] <- t[i] + mat[i,j]
}
rowM2 <- t/3

m1 <- mat - rowM1
m2 <- mat - rowM2

print(m1-m2)

sessionInfo()
R version 2.7.1 (2008-06-23)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0
[4] Biobase_2.0.1

--
Dipl.-Tech. Math. Markus Schmidberger

Ludwig-Maximilians-Universität München
IBE - Institut für medizinische Informationsverarbeitung,
Biometrie und Epidemiologie
Marchioninistr. 15, D-81377 Muenchen
URL: http://ibe.web.med.uni-muenchen.de
Mail: Markus.Schmidberger [at] ibe.med.uni-muenchen.de
Tel: +49 (089) 7095 - 4599
jim holtman
2008-Sep-11 13:59 UTC
[R] different results from summarization by loop and sum or rowMeans function
How low is "very low"? This is probably answered by FAQ 7.31.

On Thu, Sep 11, 2008 at 9:49 AM, Markus Schmidberger <schmidb at ibe.med.uni-muenchen.de> wrote:
> I found different results calculating the rowMeans by the function
> rowMeans() and a simple for-loop. The differences are very low.
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
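
R FAQ 7.31 concerns the limits of floating-point representation. The following is only a minimal sketch of that effect, assuming IEEE 754 double arithmetic; the values 0.1, 0.2 and 0.3 are illustrative and do not come from the thread. It shows that the order of additions can change the last bits of a sum, and that such results are better compared with a tolerance than with `==`.

```r
# Floating-point addition is not associative: the grouping of the
# additions can change the last bits of the result.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

# Exact comparison is fragile; on typical IEEE 754 double platforms
# this prints FALSE.
print(a == b)

# The discrepancy is on the order of the machine epsilon.
print(abs(a - b))
print(.Machine$double.eps)

# all.equal() compares with a relative tolerance (about 1.5e-8 by default),
# so it treats the two sums as equal.
print(isTRUE(all.equal(a, b)))
```

A loop that accumulates a sum term by term and a compiled routine that accumulates differently are in the same situation as `a` and `b` here: both are valid roundings of the same exact quantity.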
Prof Brian Ripley
2008-Sep-11 14:24 UTC
[R] different results from summarization by loop and sum or rowMeans function
On Thu, 11 Sep 2008, Markus Schmidberger wrote:

> Hi,
>
> I found different results calculating the rowMeans by the function rowMeans()
> and a simple for-loop. The differences are very low. But after this

Indeed, but the C code (rowMeans) is likely to be more accurate as it uses an extended-precision accumulator.

> calculation I will start some optimization algorithms (BFGS or CG) and there
> I get huge differences (from the small changes in the beginning or start
> values, I changed nothing else in the code).
> How can I avoid these differences between sum-loops and sum-functions?

You cannot. What you can do is work on making what you do with these inputs numerically stable: unless you do so, your end results will have very little value. (For example, are you finding different local minima, in which case you need to decide how to treat that possibility?)

I suggest reading an introductory book on Numerical Analysis, or

Monahan, J. F. (2001) Numerical Methods of Statistics. Cambridge: Cambridge University Press. Chapter 2.

or

Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007) Numerical Recipes: The Art of Scientific Computing. Third Edition. Cambridge: Cambridge University Press. Section 1.1 (I think).

> Attached a small testcode using data from Bioconductor.
> [...]

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
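
The two points made in this reply can be checked with a short script. This is a sketch under stated assumptions rather than a reproduction of the original run: the matrix is simulated with `runif()` instead of being taken from `affybatch.example`, and the objective `obj` is a hypothetical, well-behaved quadratic standing in for the real optimization problem. It first measures how far the loop and `rowMeans()` actually disagree, then perturbs the start values of `optim()` by a similarly tiny amount to see whether the result moves.

```r
set.seed(1)

# Simulated stand-in for the expression matrix (assumption: the real data
# are not needed to see the effect).
mat <- asinh(exp(1) * matrix(runif(300, min = 50, max = 5000),
                             nrow = 100, ncol = 3))

# Row means from compiled code.
rowM1 <- rowMeans(mat)

# Row means from an explicit loop that accumulates in plain doubles.
rowM2 <- numeric(nrow(mat))
for (i in seq_len(nrow(mat))) {
  for (j in seq_len(ncol(mat))) {
    rowM2[i] <- rowM2[i] + mat[i, j]
  }
}
rowM2 <- rowM2 / ncol(mat)

# Largest disagreement: expected to be zero or near machine precision.
print(max(abs(rowM1 - rowM2)))
print(isTRUE(all.equal(rowM1, rowM2)))

# Sensitivity check: a stable problem should give essentially the same
# optimum when the start values are perturbed by rounding-sized amounts.
obj <- function(p) sum((p - c(1, 2))^2)   # hypothetical objective
start <- c(0, 0)
fit_a <- optim(start, obj, method = "BFGS")
fit_b <- optim(start + 1e-12, obj, method = "BFGS")

print(fit_a$par)
print(fit_b$par)
print(isTRUE(all.equal(fit_a$par, fit_b$par, tolerance = 1e-6)))
```

If the same perturbation test on the real objective gives clearly different optima, the cause is more likely an ill-conditioned problem or several local minima than the choice between a loop and `rowMeans()`.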