ripley at stats.ox.ac.uk
2007-May-01 14:02 UTC
[Rd] (PR#9623) qr.coef: permutes dimnames; inserts NA; promises
On Thu, 19 Apr 2007, brech at delphioutpost.com wrote:> Full_Name: Christian Brechbuehler > Version: 2.4.1 Patched (2007-03-25 r40917) > OS: Linux 2.6.15-27-adm64-xeon; Ubuntu 6.06.1 LTS > Submission from: (NULL) (24.61.47.236) > > > Splus and R have different ideas about what qr.coef(qr()) should return, > which is fine... but I believe that R has a bug in that it is not > internally consistent, and another separate bug in the documentation.I agree with the bug in the dimnames, and and have corrected that in 2.5.0 patched. But I think the rest stems from the following misunderstanding:> in math, zero times anything is zero, but in R, NA times > anything (even zero) is NA. This seems somewhat inconvenient.That just is not true. In 'math' it would be true for the real line, but R is working with the computer version of the extended real line. In 'math' 0 * Inf is not zero, and an NA value could be infinite. Stemming from this, what R is reporting is that certain columns are not used in the calculation. There is a difference between computing a coefficient as zero, and excluding that column from the calculation (for which R uses NA, because what happens if it were included is unknown). You are assuming that you can find a linear predictor by multiplying a matrix by the coefficients. That is not true in R (especially for linear models), and it is not said to be so in ?qr.coef.> In particular, on this problem, Splus generates a permuted solution > vector: > > Splus 6.2.1: > > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero", > "one"))) > > A > zero one > [1,] 0 1 > [2,] 0 1 > [3,] 0 1 > > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y")) > > y > y > [1,] 6 > [2,] 7 > [3,] 8 > > x <- qr.coef(qr(A), y) > > x > y > one 7 > zero 0 > > > > To my taste, the answer from Splus is unexpected; I was looking for > this: > > > x[qr(A)$pivot,,drop=F] > y > zero 0 > one 7 > > But at least the answer is internally consistent: the values and the > row names were scrambled in corresponding ways. > > However, R seems to helpfully un-permute the data values, but forgets > to un-permute the dimnames, which seems broken. > > R version 2.4.1 Patched (2007-03-25 r40917) > > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero", > "one"))) > > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y")) > > x <- qr.coef(qr(A), y) > > x > y > one NA > zero 7 > > I think the answer from qr.coef() should look more like this: > > > x.wanted > y > zero NA > one 7 > > > > In addition, R uses NA to fill in rather than zero. NA for this > problem above might mean "undefined": any value will do. But given > the application of this function, where the solution might be > multiplied by the matrix, any NA will cause that to turn into an NA > result... in math, zero times anything is zero, but in R, NA times > anything (even zero) is NA. This seems somewhat inconvenient. So I'd really > like > R to return this: > > > x > y > zero 0 > one 7 > > More on the scrambling; I see in dqrsl.f that: > > c if pivoting was requested in dqrdc, the j-th > c component of b will be associated with column jpvt(j) > c of the original matrix x that was input into dqrdc.) > > which I think is referring to the helpful un-permuting, but I think > qr.coef() in R needs to correspondingly un-permute the dimnames as > well? > > Code from qr.coef: > if (!is.null(nam <- colnames(qr$qr))) > rownames(coef) <- nam > Proposed fix: That second line could be changed to > rownames(coef)[qr$pivot] <- nam > > > Another issue (either with the documentation or the code) is as follows. > In R, help(qr) promises: > > 'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve' > solves systems of equations via the QR decomposition: if 'a' is a > QR decomposition it is the same as 'solve.qr', but if 'a' is a > rectangular matrix the QR decomposition is computed first. Either > will handle over- and under-determined systems, providing a > minimal-length solution or a least-squares fit if appropriate. > > So unlike Splus, R promises a minimal-length solution, but I don't > think it delivers that. For example, let's get the minimal-length > solution for the following under-determined system: > > x1 + 2*x2 == 5 > > qr.solve would fail due to the singular matrix, so use qr.coef: > > > qr.coef(qr(t(1:2)),c(5)) > > R: > [1] 5 NA > > Splus: > [1] 5 0 > > 1*5 + 2*0 does equal 5, so modulo NA, this is a solution, but it is > NOT minimal-length. OTOH, svd gives the right answer in both Splus and > R: > > > d <- svd(t(1:2)); c(5) %*% d$u %*% diag(1/d$d, length(d$d)) %*% t(d$v) > [,1] [,2] > [1,] 1 2 > > This *is* minimum length, and 1*1 + 2*2 == 5. So either qr.coef(qr()) > should deliver that, or the documentation should clarify when a > minimal-length solution is delivered? > > And revisiting the NA issue... in this problem, for the qr.coef() > result to be a solution, that 2nd value MUST be zero, no other values > will work. So the NA seems wrong here, more clearly than in the other > example. > > Finally, a simpler test case that shows the same issues with a 2x1 > example instead of 2x3; any fix to qr.coef() should handle this as > well: > >> qr.coef(qr(matrix(0:1,1,dimnames=list(NULL, c("zero","one")))),5) > one zero > NA 5 > > I think that should return: > > zero one > 0 5 > > or at least: > > zero one > NA 5 > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
brech at delphioutpost.com
2007-May-03 20:07 UTC
[Rd] (PR#9623) qr.coef: permutes dimnames; inserts NA; promises
> From: Prof Brian Ripley <ripley at stats.ox.ac.uk> > Date: Tue, 1 May 2007 15:01:51 +0100 (BST) > > On Thu, 19 Apr 2007, brech at delphioutpost.com wrote: > > > Full_Name: Christian Brechbuehler > > Version: 2.4.1 Patched (2007-03-25 r40917) > > OS: Linux 2.6.15-27-adm64-xeon; Ubuntu 6.06.1 LTS > > Submission from: (NULL) (24.61.47.236) > > > > > > I believe that R has a bug in that it is not > > internally consistent, and another separate bug in the documentation. > > I agree with the bug in the dimnames, and and have corrected that in 2.5.0 > patched.I see it in svn: if(!is.null(nam <- colnames(qr$qr))) - rownames(coef) <- nam + if(k < p) rownames(coef)[qr$pivot] <- nam + else rownames(coef) <- nam + Thank you!> But I think the rest stems from the following misunderstanding: > > > in math, zero times anything is zero, but in R, NA times > > anything (even zero) is NA. This seems somewhat inconvenient. > > That just is not true.OK, I accept that.> Stemming from this, what R is reporting is that certain columns are not > used in the calculation.OK, I understand qr.coef indicates with NA that dcrdc2 decided to exclude the corresponding columns (because of linear dependency). I still think the documentation is misleading -- trimming to the essence:> > help(qr): > > > > 'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve' > > solves systems of equations via the QR decomposition: if 'a' is a > > QR decomposition it is the same as 'solve.qr', but if 'a' is a > > rectangular matrix the QR decomposition is computed first. Either > > will handle over- and under-determined systems, providing a > > minimal-length solution or a least-squares fit if appropriate.(A) 'qr.solve' and 'solve.qr' will NOT handle under-determined systems. They both perform this check: if (a$rank != nc) stop("singular matrix 'a' in solve") But 'qr.coef', which they call when all is well, does. (B) Help promises a minimal-length solution, but QR does not deliver that.> > > qr.coef(qr(t(1:2)), 5) > > > > [1] 5 NA> > 1*5 + 2*0 does equal 5, so this is a solution, but it is > > NOT minimal-length. OTOH > > > > > d <- svd(t(1:2)); 5 %*% d$u %*% (d$d^-1 * t(d$v)) > > [,1] [,2] > > [1,] 1 2 > > > > This *is* minimum length, and 1*1 + 2*2 == 5. > > > > The documentation should clarify when minimal-length solution is > > provided.Maybe the phrase "a minimal-length solution or" should be removed? /Christian Brechbuehler
Possibly Parallel Threads
- qr.coef: permutes dimnames; inserts NA; promises minimum-length (PR#9623)
- Bug on qr.coef when qr is created by a zero matrix with colnames and all y equals zero
- follow-up on qr.coef bug (PR#8478)
- bug in qr.coef() and (therefore) in qr.solve (PR#8476)
- qr.coef and complex numbers - still busted for non-square case? (PR#13305)