Hi all,

I am getting inconsistent results from the mean function in the ffbase package and cannot figure out what is wrong.

I have a simulated ff vector with 1000000000 numbers in it and want to calculate its mean, but the results are quite different.

With the mean() function in the ffbase package, the mean is 152.6858.
With R's mean(), or by adding up the sums of chunks directly, I get 667.5595.

Any ideas? Thank you in advance!

Bayes Chen

# F1 is an ffdf, F1$X1 is an ff vector
> length(F1$X1)
[1] 1000000000

# Use the mean() function in the ffbase package
> mean(F1$X1)
[1] 152.6858

> X2 = F1$X1[]   # X2 is now a non-ff vector
> length(X2)
[1] 1000000000
> mean(X2)       # R's original mean function for ordinary vectors
[1] 667.5595

# Calculate the sum, and then the mean, by chunks
> chunks = chunk(F1$X1, by=5000000)
> sumx = 0
> for (i in chunks) {
+   sumx = sumx + sum(F1$X1[i])
+ }
> sumx/length(F1$X1)
[1] 667.5595

----------------------------------- below are some other trials

> X2 = F1$X1[1:1000000]
> mean(X2)
[1] 59.43149
> mean(as.ff(X2))
[1] 59.43149

> X2 = F1$X1[1:100000000]
> mean(X2)
[1] 59.41978
> mean(as.ff(X2))
[1] 59.42128

> X2 = F1$X1[1:500000000]
> mean(X2)
[1] 60.53615
> mean(as.ff(X2))
[1] 57.72168

> X2 = F1$X1[1:750000000]
> mean(X2)
[1] 59.37562
> mean(as.ff(X2))
[1] 57.81179

> X2 = F1$X1[1:900000000]
> mean(X2)
[1] 57.0867
> mean(as.ff(X2))
[1] 57.44862

> X3 = F1$X1[900000000:1000000000]
> mean(X3)
[1] 6161.814
> mean(as.ff(X3))
[1] 6161.797
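For readers who want to try the chunk-by-chunk cross-check from the post without the ff package, here is a minimal base R sketch. The helper name `chunked_mean` and the small test vector are illustrative assumptions, not part of the original post, and this is not how ffbase computes its mean.

```r
# Hypothetical helper: mean of an ordinary vector computed by summing
# fixed-size chunks, mirroring the loop in the post but in base R only.
chunked_mean <- function(x, by = 5000000) {
  n <- length(x)
  if (n == 0) return(NaN)
  total <- 0
  for (start in seq(1, n, by = by)) {
    end <- min(start + by - 1, n)        # last chunk may be shorter
    total <- total + sum(x[start:end])   # accumulate in double precision
  }
  total / n
}

# Small assumed test vector (the post does not say how the data were simulated)
set.seed(1)
x <- runif(1e6, min = 0, max = 100)

# Compare the chunked result with R's built-in mean()
print(all.equal(chunked_mean(x, by = 1e5), mean(x)))
```

On an ordinary vector the two results should agree to within floating-point tolerance, so a large gap like the one reported points at something other than the chunking itself.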
Milan Bouchet-Valat
2013-Aug-02 19:52 UTC
[R] problem about mean function in ffbase package
On Thursday, 1 August 2013 at 00:10 +0800, Chaos Chen wrote:
> Hi all,
>
> I experienced some unmatched result using mean function in ffbase package
> and cannot figure out what's wrong.
>
> I have a simulated ff vector with 1000000000 numbers inside and want to
> calculate its mean. But the results are quite different.
>
> With mean( ) function in ffbase package, the mean is 152.6858.
> But with R's mean( ) or adding sum from chunks directly, I got 667.5595
>
> any idea ? Thank you in advance!

Could you provide a fully reproducible example with a shorter vector (I cannot create such a large vector on my box)? Use set.seed() so that runif() gives exactly the same values.

From quick tests here, the problem does not appear.

Regards

> Bayes Chen
>
> # F1 is an ffdf , F1$X1 is an ff vector
> > length(F1$X1)
> [1] 1000000000
>
> # Use mean() function in ffbase package
> > mean(F1$X1)
> [1] 152.6858
>
> > X2 = F1$X1[] #X2 is now an non-ff vector
> > length(X2)
> [1] 1000000000
> > mean(X2) # R's original mean function for ordinary vectors
> [1] 667.5595
>
> # calculate sum and then mean by chunks
> > chunks = chunk(F1$X1, by=5000000)
> > sumx = 0
> > for (i in chunks) {
> + sumx = sumx + sum(F1$X1[i])
> + }
> > sumx/length(F1$X1)
> [1] 667.5595
>
> ----------------------------------- below are some other trials
> > X2 = F1$X1[1:1000000]
> > mean(X2)
> [1] 59.43149
> > mean(as.ff(X2))
> [1] 59.43149
>
> > X2 = F1$X1[1:100000000]
> > mean(X2)
> [1] 59.41978
> > mean(as.ff(X2))
> [1] 59.42128
>
> > X2 = F1$X1[1:500000000]
> > mean(X2)
> [1] 60.53615
> > mean(as.ff(X2))
> [1] 57.72168
>
> > X2 = F1$X1[1:750000000]
> > mean(X2)
> [1] 59.37562
> > mean(as.ff(X2))
> [1] 57.81179
>
> > X2 = F1$X1[1:900000000]
> > mean(X2)
> [1] 57.0867
> > mean(as.ff(X2))
> [1] 57.44862
>
> > X3 = F1$X1[900000000:1000000000]
> > mean(X3)
> [1] 6161.814
> > mean(as.ff(X3))
> [1] 6161.797
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
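A reproducible example of the kind requested above might look like the following sketch. The vector length, the seed, and the use of plain runif() are assumptions made for illustration; the thread does not say how the original data were simulated. The ff/ffbase comparison runs only if both packages are installed.

```r
# Fixed seed so that runif() produces the same values on every machine
set.seed(42)

# A much shorter vector than the 1e9 elements in the report (assumed size)
n <- 1e6
x <- runif(n)

# Reference value from R's mean() on an ordinary vector
base_mean <- mean(x)

if (requireNamespace("ff", quietly = TRUE) &&
    requireNamespace("ffbase", quietly = TRUE)) {
  library(ff)
  library(ffbase)
  fx <- as.ff(x)          # same data stored as an ff vector
  ff_mean <- mean(fx)     # dispatches to the method supplied by ffbase
  print(c(base = base_mean, ffbase = ff_mean))
  print(all.equal(base_mean, ff_mean))
} else {
  message("ff and/or ffbase not installed; skipping the comparison")
}
```

If the two means agree at this size, increasing `n` step by step is one way to find the length at which they start to diverge, which is the kind of detail that makes a report easier to act on.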