Markus Schmidberger
2008-Sep-11  13:49 UTC
[R] different results from summarization by loop and sum or rowMeans function
Hi,
I get different results when calculating row means with the function
rowMeans() and with a simple for-loop. The differences are very low. But
after this calculation I start some optimization algorithms (BFGS
or CG), and there I get huge differences (caused only by the small changes
in the start values; I changed nothing else in the code).
How can I avoid these differences between sum-loops and sum-functions?
Attached is a small test script using data from Bioconductor.
Best
Markus
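library(affy)
data(affybatch.example)
# Take a 100 x 3 block of probe intensities and transform it
mat <- exprs(affybatch.example)[1:100, 1:3]
mat <- exp(1) * mat
mat <- asinh(mat)
# Row means via the built-in function
rowM1 <- rowMeans(mat)
# Row means via an explicit loop
t <- rep(0, 100)  # vector of zeros
for (i in 1:100) {
  for (j in 1:3)
    t[i] <- t[i] + mat[i, j]
}
rowM2 <- t / 3
# Center the matrix with each set of row means and compare
m1 <- mat - rowM1
m2 <- mat - rowM2
print(m1 - m2)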
sessionInfo()
R version 2.7.1 (2008-06-23)
i386-pc-mingw32
locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods 
[8] base    
other attached packages:
[1] affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0       
[4] Biobase_2.0.1      
-- 
Dipl.-Tech. Math. Markus Schmidberger
Ludwig-Maximilians-Universität München
IBE - Institut für medizinische Informationsverarbeitung,
Biometrie und Epidemiologie
Marchioninistr. 15, D-81377 Muenchen
URL: http://ibe.web.med.uni-muenchen.de 
Mail: Markus.Schmidberger [at] ibe.med.uni-muenchen.de
Tel: +49 (089) 7095 - 4599
jim holtman
2008-Sep-11  13:59 UTC
[R] different results from summarization by loop and sum or rowMeans function
How low is "very low"? This is probably answered by FAQ 7.31.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
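FAQ 7.31 covers why floating-point results that are mathematically equal can differ in their last bits. The following is only a minimal sketch of that effect, not the original test case: it assumes a synthetic matrix (generated with runif) in place of affybatch.example so that it runs without Bioconductor, and it compares results with all.equal(), which uses a tolerance, rather than with exact equality.

```r
# Floating-point addition is not associative: regrouping the same three
# numbers changes the last bit of the result.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)
print(a == b)                   # FALSE with IEEE 754 doubles
print(isTRUE(all.equal(a, b)))  # TRUE: equal up to a small tolerance

# Synthetic stand-in for the expression matrix (assumption: these values
# are NOT the affy data; any positive numeric matrix serves here).
set.seed(1)
mat <- asinh(exp(1) * matrix(runif(300, 50, 5000), nrow = 100, ncol = 3))

# Row means via the built-in function.
rowM1 <- rowMeans(mat)

# The same row means accumulated in an explicit R loop.
rowM2 <- numeric(nrow(mat))
for (i in seq_len(nrow(mat))) {
  for (j in seq_len(ncol(mat))) {
    rowM2[i] <- rowM2[i] + mat[i, j]
  }
}
rowM2 <- rowM2 / ncol(mat)

# Largest absolute discrepancy, to be read against machine epsilon.
max_diff <- max(abs(rowM1 - rowM2))
print(max_diff)
print(.Machine$double.eps)

# Compare with a tolerance instead of testing exact equality.
print(isTRUE(all.equal(rowM1, rowM2)))
```

Whether max_diff is exactly zero depends on the platform and the R build; the point is that any discrepancy should be on the order of rounding error for values of this size.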
Prof Brian Ripley
2008-Sep-11  14:24 UTC
[R] different results from summarization by loop and sum or rowMeans function
On Thu, 11 Sep 2008, Markus Schmidberger wrote:

> I found different results calculating the rowMeans by the function rowMeans()
> and a simple for-loop. The differences are very low.

Indeed, but the C code (rowMeans) is likely to be more accurate as it uses an extended-precision accumulator.

> But after this calculation I will start some optimization algorithms (BFGS
> or CG) and there I get huge differences (from the small changes in the
> beginning or start values, I changed nothing else in the code).
> How can I avoid these differences between sum-loops and sum-functions?

You cannot. What you can do is work on making what you do with these inputs numerically stable: unless you do so, your end results will have very little value. (For example, are you finding different local minima, in which case you need to decide how to treat that possibility?)

I suggest reading an introductory book on Numerical Analysis, or

Monahan, J. F. (2001) Numerical Methods of Statistics. Cambridge: Cambridge University Press. Chapter 2.

or

Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007) Numerical Recipes. The Art of Scientific Computing. Third Edition. Cambridge: Cambridge University Press. Section 1.1 (I think).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
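One way to probe the stability question raised in this reply is to rerun the optimizer from several start values and from a start perturbed by about the size of rounding error. The sketch below is illustrative only: the objective f and the helper run_bfgs are hypothetical and are not the model from the thread; only optim() with method = "BFGS" is taken from the original question.

```r
# Hypothetical one-dimensional objective with several local minima
# (an assumption for illustration; it is not the model from the thread).
f <- function(x) sin(3 * x) + 0.1 * x^2

# Helper (hypothetical name): run BFGS from one start value and return
# the start, the parameter and the objective value at the reported optimum.
run_bfgs <- function(start) {
  fit <- optim(par = start, fn = f, method = "BFGS")
  c(start = start, par = fit$par, value = fit$value)
}

# 1. Several well-separated starts: do they reach different local minima?
starts <- c(-3, -1, 1, 3)
multi <- t(sapply(starts, run_bfgs))
print(multi)

# 2. One start, perturbed by roughly the size of rounding error:
#    a stable procedure should return essentially the same optimum.
base_fit <- run_bfgs(1)
pert_fit <- run_bfgs(1 + 1e-12)
par_shift <- abs(base_fit["par"] - pert_fit["par"])
print(par_shift)
```

If a perturbation this small moves the reported optimum noticeably, the difference between rowMeans() and the loop is only exposing an instability in the optimization step, which is where the effort should go.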