Paul Johnson
2012-Dec-12 18:14 UTC
[Rd] R-2.15.2 changes in computation speed. Numerical precision?
Speaking of optimization and speeding up R calculations... I mentioned last week I want to speed up calculation of generalized inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a souped up generalized inverse algorithm published by V. N. Katsikis, D. Pappas, Fast computing of theMoore-Penrose inverse matrix, Electronic Journal of Linear Algebra, 17(2008), 637-650. I was so delighted to see the computation time drop on my Debian system that I boasted to the WIndows users and gave them a test case. They answered back "there's no benefits, plus Windows is faster than Linux". That sent me off on a bit of a goose chase, but I think I'm beginning to understand the situation. I believe R-2.15.2 introduced a tighter requirement for precision, thus triggering longer-lasting calculations in many example scripts. Better algorithms can avoid some of that slowdown, as you see in this test case. Here is the test code you can run to see: http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R It downloads a data file from that same directory and then runs some multiple imputations with the Amelia package. Here's the output from my computer http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout That includes the profile of the calculations that depend on the ordinary generalized inverse algorithm based on svd and the new one. See? The KP algorithm is faster. And just as accurate as Amelia:::mpinv or MASS::ginv (for details on that, please review my notes in http://pj.freefaculty.org/scraps/profile/qrginv.R). So I asked WIndows users for more detailed feedback, including sessionInfo(), and I noticed that my proposed algorithm is not faster on Windows--WITH OLD R! Here's the script output with R-2.15.0, shows no speedup from the KPginv algorithm http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout On the same machine, I updated to R-2.15.2, and we see the same speedup from the KPginv algorithm http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout After that, I realized it is an R version change, not an OS difference, I was a bit relieved. What causes the difference in this case? In the Amelia code, they try to avoid doing the generalized inverse by using the ordinary solve(), and if that fails, then they do the generalized inverse. In R 2.15.0, the near singularity of the matrix is ignored, but not in R 2.15.2. The ordinary solve is failing almost all the time, thus triggering the use of the svd based generalized inverse. Which is slower. The Katsikis and Pappas 2008 algorithm is the fastest one I've found after translating from Matlab to R. It is not so universally applicable as svd based methods, it will fail if there are linearly dependent columns. However, it does tolerate columns of all zeros, which seems to be the problem case in the particular application I am testing. I tried very hard to get the newer algorithm described here to go as fast, but it is way way slower, at least in the implementations I tried: ## KPP ## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias. "An improved method for ## the computation of the Moore Penrose inverse matrix," Applied ## Mathematics and Computation, 2011 The notes on that are in the qrginv.R file linked above. The fact that I can't make that newer KPP algorithm go faster, although the authors show it can go faster in Matlab, leads me to a bunch of other questions and possibly the need to implement all of this in C with LAPACK or EIGEN or something like that, but at this point, I've got to return to my normal job. If somebody is good at R's .Call interface and can make a pure C implementation of KPP. I think the key thing is that with R-2.15.2, there is an svd-related bottleneck in the multiple imputation algorithms in Amelia. The replacement version of the function Amelia:::mpinv does reclaim a 30% time saving, while generating imputations that are identical, so far as i can tell. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu
Uwe Ligges
2012-Dec-13 09:33 UTC
[Rd] R-2.15.2 changes in computation speed. Numerical precision?
Long message, but as far as I can see, this is not about base R but the contributed package Amelia: Please discuss possible improvements with its maintainer. Best, Uwe Ligges On 12.12.2012 19:14, Paul Johnson wrote:> Speaking of optimization and speeding up R calculations... > > I mentioned last week I want to speed up calculation of generalized > inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a > souped up generalized inverse algorithm published by > > V. N. Katsikis, D. Pappas, Fast computing of theMoore-Penrose inverse > matrix, Electronic Journal of Linear Algebra, > 17(2008), 637-650. > > I was so delighted to see the computation time drop on my Debian > system that I boasted to the WIndows users and gave them a test case. > They answered back "there's no benefits, plus Windows is faster than > Linux". > > That sent me off on a bit of a goose chase, but I think I'm beginning > to understand the situation. I believe R-2.15.2 introduced a tighter > requirement for precision, thus triggering longer-lasting calculations > in many example scripts. Better algorithms can avoid some of that > slowdown, as you see in this test case. > > Here is the test code you can run to see: > > http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R > > It downloads a data file from that same directory and then runs some > multiple imputations with the Amelia package. > > Here's the output from my computer > > http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout > > That includes the profile of the calculations that depend on the > ordinary generalized inverse algorithm based on svd and the new one. > > See? The KP algorithm is faster. And just as accurate as > Amelia:::mpinv or MASS::ginv (for details on that, please review my > notes in http://pj.freefaculty.org/scraps/profile/qrginv.R). > > So I asked WIndows users for more detailed feedback, including > sessionInfo(), and I noticed that my proposed algorithm is not faster > on Windows--WITH OLD R! > > Here's the script output with R-2.15.0, shows no speedup from the > KPginv algorithm > > http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout > > On the same machine, I updated to R-2.15.2, and we see the same > speedup from the KPginv algorithm > > http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout > > After that, I realized it is an R version change, not an OS > difference, I was a bit relieved. > > What causes the difference in this case? In the Amelia code, they try > to avoid doing the generalized inverse by using the ordinary solve(), > and if that fails, then they do the generalized inverse. In R 2.15.0, > the near singularity of the matrix is ignored, but not in R 2.15.2. > The ordinary solve is failing almost all the time, thus triggering the > use of the svd based generalized inverse. Which is slower. > > The Katsikis and Pappas 2008 algorithm is the fastest one I've found > after translating from Matlab to R. It is not so universally > applicable as svd based methods, it will fail if there are linearly > dependent columns. However, it does tolerate columns of all zeros, > which seems to be the problem case in the particular application I am > testing. > > I tried very hard to get the newer algorithm described here to go as > fast, but it is way way slower, at least in the implementations I > tried: > ## KPP > ## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias. "An > improved method for > ## the computation of the Moore Penrose inverse matrix," Applied > ## Mathematics and Computation, 2011 > > The notes on that are in the qrginv.R file linked above. > > The fact that I can't make that newer KPP algorithm go faster, > although the authors show it can go faster in Matlab, leads me to a > bunch of other questions and possibly the need to implement all of > this in C with LAPACK or EIGEN or something like that, but at this > point, I've got to return to my normal job. If somebody is good at > R's .Call interface and can make a pure C implementation of KPP. > > I think the key thing is that with R-2.15.2, there is an svd-related > bottleneck in the multiple imputation algorithms in Amelia. The > replacement version of the function Amelia:::mpinv does reclaim a 30% > time saving, while generating imputations that are identical, so far > as i can tell. > > pj > > >
Uwe Ligges
2012-Dec-14 17:19 UTC
[Rd] R-2.15.2 changes in computation speed. Numerical precision?
On 14.12.2012 18:11, Dirk Eddelbuettel wrote:> > On 14 December 2012 at 18:07, Uwe Ligges wrote: > | without overhead of packages. The CRAN check times of > 4000 packages > | are typically a good indicator, and they are a bit slower for R-2.15.2Please do not quote only parts of my sentences, that one was continued: "but not that it is generally worrying (since we also run more checks)." Thanks, Uwe> And sadly less so when you force us to turn tests off. > Dirk >
Dirk Eddelbuettel
2012-Dec-14 17:32 UTC
[Rd] R-2.15.2 changes in computation speed. Numerical precision?
On 14 December 2012 at 18:19, Uwe Ligges wrote: | | | On 14.12.2012 18:11, Dirk Eddelbuettel wrote: | > | > On 14 December 2012 at 18:07, Uwe Ligges wrote: | > | without overhead of packages. The CRAN check times of > 4000 packages | > | are typically a good indicator, and they are a bit slower for R-2.15.2 | | Please do not quote only parts of my sentences, that one was continued: | | "but not that it is generally worrying (since we also run more checks)." I understand the resource constraint, but the fact remains that on the CRAN side most-visible package I am involved with now runs _way fewer_ tests than it used to, making these very comparisons across time a little tricky. Rock, meet hard place. In an ideal world we had way more tests, and comparisons among them. But time is, alas, finite... Dirk -- Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
Possibly Parallel Threads
- Understanding svd usage and its necessity in generalized inverse calculation
- Debian packaging and openblas related crash when profiling in R
- device "mismatch", coordinates trouble with X11 and pdf devices
- portableParalleSeeds Package violation, CRAN exception?
- shade overlapping portions of circles (or other shapes)