Apologies if this is this too obscure for R-help. In package splines, ns(x,,knots,intercept=TRUE) produces an n by K+2 matrix N, the values of K+2 basis functions for the natural splines with K (internal) knots, evaluated at x. It does this by first generating an n by K+4 matrix B of unconstrained splines, then postmultiplying B by H, a K+4 by K+2 representation of the nullspace of C (2 by K+4), which contains the 2nd derivatives of the unconstrained splines evaluated at the boundary knots. E.g. see Hastie and Tibshirani, Generalized Additive Models, exercise 2.5, p36. The QR decomposition is used to get H. This can produce basis functions which, while technically correct (they span the K+2 dim space of natural splines), can be counterintuitive. E.g. equally spaced knots symmetrically placed between the data extremes can produce very asymmetric arrangements, with N(K+2) not the mirror image of N(1), for example, and considerable loss of sparseness. This approach works for any basis B, but for B-splines, the second derivatives are zero for all the unconstrained basis functions, apart from 3 at each end. All that is required is to combine these 3 so that the contributions to the 2nd derivatives cancel. In other words, we only need to find the null space of two 2 by 3 matrices, rather than a 2 by K+4. If the left-most internal knots are t(1) and t(2), and the left-hand boundary knot is t(0), we can replace B(1...3) with B(1)+B(2)+B(3) and [t(2)-t(0)]*B(2) + [t(1)+t(2)-2*t(0)]*B(1), (for example), and similarly at the right-hand end. This seems simpler and more elegant than brute force QR on the full matrix of derivatives. But I may have missed some reason why it can't be used. Perhaps it doesn't work when intercept=FALSE? =====================================I.White ICAPB, University of Edinburgh Ashworth Laboratories, West Mains Road Edinburgh EH9 3JT Fax: 0131 650 6564 Tel: 0131 650 5490 E-mail: iwhite at staffmail.ed.ac.uk
On Thu, 8 May 2003 15:07:56 +0100 (BST) iwhite at staffmail.ed.ac.uk wrote:> Apologies if this is this too obscure for R-help. > > In package splines, ns(x,,knots,intercept=TRUE) produces an n by K+2 > matrix N, the values of K+2 basis functions for the natural splines with K > (internal) knots, evaluated at x. It does this by first generating an > n by K+4 matrix B of unconstrained splines, then postmultiplying B by > H, a K+4 by K+2 representation of the nullspace of C (2 by K+4), which > contains the 2nd derivatives of the unconstrained splines evaluated at > the boundary knots. E.g. see Hastie and Tibshirani, Generalized Additive > Models, exercise 2.5, p36. The QR decomposition is used to get H. > > This can produce basis functions which, while technically correct (they > span the K+2 dim space of natural splines), can be counterintuitive. > E.g. equally spaced knots symmetrically placed between the data extremes > can produce very asymmetric arrangements, with N(K+2) not the mirror image > of N(1), for example, and considerable loss of sparseness. > > This approach works for any basis B, but for B-splines, the second > derivatives are zero for all the unconstrained basis functions, apart > from 3 at each end. All that is required is to combine these 3 so that > the contributions to the 2nd derivatives cancel. In other words, we > only need to find the null space of two 2 by 3 matrices, rather than a > 2 by K+4. If the left-most internal knots are t(1) and t(2), and the > left-hand boundary knot is t(0), we can replace B(1...3) with > > B(1)+B(2)+B(3) and [t(2)-t(0)]*B(2) + [t(1)+t(2)-2*t(0)]*B(1), > > (for example), and similarly at the right-hand end. > > This seems simpler and more elegant than brute force QR on the full > matrix of derivatives. But I may have missed some reason why it can't be > used. Perhaps it doesn't work when intercept=FALSE? > > =====================================> I.White > ICAPB, University of Edinburgh > Ashworth Laboratories, West Mains Road > Edinburgh EH9 3JT > Fax: 0131 650 6564 Tel: 0131 650 5490 > E-mail: iwhite at staffmail.ed.ac.uk > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-helpThis is also a vote for using the truncated power basis which is extremely simple and exceptionally fast for large datasets (see the rcspline.eval function in the Hmisc package). With modern matrix arithmetic (as in S), the collinearity of the bases produced by these simple regression splines is a moot point. --- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
Indeed truncated power functions (TPF) are very useful to compute B-splines. But why take chances with their bad numerical condition? Simple differences will turn the TPF basis into a B-spline basis. See the functions at the end. Try B = bbase(1:100, 0, 101, 10, 3) matplot(B, type = 'b') to see it work. Here I use equally spaced knots (as I always do, adding a penalty to tune smoothness and avoid singularity problems). But with divided differences this scheme will work for arbitrary knots (if no knots are the same). Paul Eilers Department of Medical Statistics Leiden University Medical Centre ======= R functions ================== tpower <- function(x, t, p) { # Truncated p-th power function (x - t) ^ p * (x > t) } bbase <- function(x, xl, xr, ndx, deg){ # Construct B-spline basis from differences of truncated power functions # Input # x: x for which to compute basis # xl, xr: left and right boundary # nseg: number of B-spline segments between xl and xr # deg: degree (of the polynomial segments) of the B-spline # # Paul Eilers, 2003 dx <- (xr - xl) / ndx knots <- seq(xl - deg * dx, xr + deg * dx, by = dx) P <- outer(x, knots, tpower, deg) n <- dim(P)[2] D <- diff(diag(n), differences = deg + 1) / (gamma(deg + 1) * dx ^ deg) B <- (-1) ^ (deg + 1) * P %*% t(D) B } ========================================= On Thu, 8 May 2003 15:07:56 +0100 (BST) iwhite at staffmail.ed.ac.uk wrote:> Apologies if this is this too obscure for R-help. > > In package splines, ns(x,,knots,intercept=TRUE) produces an n by K+2 > matrix N, the values of K+2 basis functions for the natural splines with K > (internal) knots, evaluated at x. It does this by first generating an > n by K+4 matrix B of unconstrained splines, then postmultiplying B by > H, a K+4 by K+2 representation of the nullspace of C (2 by K+4), which > contains the 2nd derivatives of the unconstrained splines evaluated at > the boundary knots. E.g. see Hastie and Tibshirani, Generalized Additive > Models, exercise 2.5, p36. The QR decomposition is used to get H. > > This can produce basis functions which, while technically correct (they > span the K+2 dim space of natural splines), can be counterintuitive. > E.g. equally spaced knots symmetrically placed between the data extremes > can produce very asymmetric arrangements, with N(K+2) not the mirror image > of N(1), for example, and considerable loss of sparseness. > > This approach works for any basis B, but for B-splines, the second > derivatives are zero for all the unconstrained basis functions, apart > from 3 at each end. All that is required is to combine these 3 so that > the contributions to the 2nd derivatives cancel. In other words, we > only need to find the null space of two 2 by 3 matrices, rather than a > 2 by K+4. If the left-most internal knots are t(1) and t(2), and the > left-hand boundary knot is t(0), we can replace B(1...3) with > > B(1)+B(2)+B(3) and [t(2)-t(0)]*B(2) + [t(1)+t(2)-2*t(0)]*B(1), > > (for example), and similarly at the right-hand end. > > This seems simpler and more elegant than brute force QR on the full > matrix of derivatives. But I may have missed some reason why it can't be > used. Perhaps it doesn't work when intercept=FALSE? > > =====================================> I.White > ICAPB, University of Edinburgh > Ashworth Laboratories, West Mains Road > Edinburgh EH9 3JT > Fax: 0131 650 6564 Tel: 0131 650 5490 > E-mail: iwhite at staffmail.ed.ac.uk > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-helpThis is also a vote for using the truncated power basis which is extremely simple and exceptionally fast for large datasets (see the rcspline.eval function in the Hmisc package). With modern matrix arithmetic (as in S), the collinearity of the bases produced by these simple regression splines is a moot point. --- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat ______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help