Clive Nicholas
2020-Nov-03 01:14 UTC
[R] Query on constrained regressions using -mgcv- and -pcls-
Hello all, I'll level with you: I'm puzzled! How is it that this constrained regression routine using -pcls- runs satisfactorily (courtesy of Tian Zheng): library(mgcv) options(digits=3) x.1=rnorm(100, 0, 1) x.2=rnorm(100, 0, 1) x.3=rnorm(100, 0, 1) x.4=rnorm(100, 0, 1) y=1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01) x.mat=cbind(rep(1, length(y)), x.1, x.2, x.3, x.4) ls.print(lsfit(x.mat, y, intercept=FALSE)) M=list(y=y, w=rep(1, length(y)), X=x.mat, C=matrix(0,0,0), p=rep(1, ncol(x.mat)), off=array(0,0), S=list(), sp=array(0,0), Ain=diag(ncol(x.mat)), bin=rep(0, ncol(x.mat)) ) pcls(M) Residual Standard Error=0.0095 R-Square=1 F-statistic (df=5, 95)=314735 p-value=0 Estimate Std.Err t-value Pr(>|t|) 1.000 0.0010 1043.9 0 x.1 0.501 0.0010 512.6 0 x.2 -0.202 0.0009 -231.6 0 x.3 0.298 0.0010 297.8 0 x.4 0.103 0.0011 94.8 0 but this one does not for a panel dataset: set.seed(02102020) N=500 M=10 rater=rep(1:M, each = N) lead_n=as.factor(rep(1:N,M)) a=rep(rnorm(N),M) z=rep(round(25+2*rnorm(N)+.2*a)) x=a+rnorm(N*M) y=.5*x+5*a-.5*z+2*rnorm(N*M) x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M) model=lm(y~x+x_cl+z) summary(model) y=1+1.5*x+4.6*x_cl-0.5*z x.mat=cbind(rep(1,length(y)),x,x_cl,z) ls.print(lsfit(x.mat,y,intercept=FALSE)) Residual Standard Error=0 R-Square=1 F-statistic (df=4, 4996)=5.06e+30 p-value=0 Estimate Std.Err t-value Pr(>|t|) 1.0 0 2.89e+13 0 x 0.8 0 2.71e+14 0 x_cl 4.6 0 1.18e+15 0 z -0.5 0 -3.63e+14 0 ? There shouldn't be anything wrong with the second set of data, unless I've missed something obvious (that constraints don't work for panel data? Seems unlikely to me)! Also: (1) I'm ultimately looking just to constrain ONE coefficient whilst allowing the other coefficients to be unconstrained (I tried this with the first dataset by setting y=1+0.5*x.1-x.2+x.3+x.4 in the call, but got similar-looking output to what I got in the second dataset); and (2) it would be really useful to have the call to -pcls(M)- produce more informative output (SEs, t-values, fit stats, etc). Many thanks in anticipation of your expert help and being told what a clueless berk I am, Clive -- Clive Nicholas "My colleagues in the social sciences talk a great deal about methodology. I prefer to call it style." -- Freeman J. Dyson [[alternative HTML version deleted]]
Clive Nicholas
2020-Nov-03 05:28 UTC
[R] Query on constrained regressions using -mgcv- and -pcls-
As an addendum / erratum to my original post, the second block of code should read for completeness: set.seed(02102020) N=500 M=10 rater=rep(1:M, each = N) lead_n=as.factor(rep(1:N,M)) a=rep(rnorm(N),M) z=rep(round(25+2*rnorm(N)+.2*a)) x=a+rnorm(N*M) y=.5*x+5*a-.5*z+2*rnorm(N*M) x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M) model=lm(y~x+x_cl+z) summary(model) y=1+1.5*x+4.6*x_cl-0.5*z x.mat=cbind(rep(1,length(y)),x,x_cl,z) ls.print(lsfit(x.mat,y,intercept=FALSE)) M=list(y=y, w=rep(1, length(y)), X=x.mat, C=matrix(0,0,0), p=rep(1, ncol(x.mat)), off=array(0,0), S=list(), sp=array(0,0), Ain=diag(ncol(x.mat)), bin=rep(0, ncol(x.mat)) ) pcls(M) However, all my questions stand. Ta, Clive On Tue, 3 Nov 2020 at 01:14, Clive Nicholas <clivelists at googlemail.com> wrote:> Hello all, > > I'll level with you: I'm puzzled! > > How is it that this constrained regression routine using -pcls- runs > satisfactorily (courtesy of Tian Zheng): > > library(mgcv) > options(digits=3) > x.1=rnorm(100, 0, 1) > x.2=rnorm(100, 0, 1) > x.3=rnorm(100, 0, 1) > x.4=rnorm(100, 0, 1) > y=1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01) > x.mat=cbind(rep(1, length(y)), x.1, x.2, x.3, x.4) > ls.print(lsfit(x.mat, y, intercept=FALSE)) > M=list(y=y, > w=rep(1, length(y)), > X=x.mat, > C=matrix(0,0,0), > p=rep(1, ncol(x.mat)), > off=array(0,0), > S=list(), > sp=array(0,0), > Ain=diag(ncol(x.mat)), > bin=rep(0, ncol(x.mat)) ) > pcls(M) > Residual Standard Error=0.0095 > R-Square=1 > F-statistic (df=5, 95)=314735 > p-value=0 > > Estimate Std.Err t-value Pr(>|t|) > 1.000 0.0010 1043.9 0 > x.1 0.501 0.0010 512.6 0 > x.2 -0.202 0.0009 -231.6 0 > x.3 0.298 0.0010 297.8 0 > x.4 0.103 0.0011 94.8 0 > > but this one does not for a panel dataset: > > set.seed(02102020) > N=500 > M=10 > rater=rep(1:M, each = N) > lead_n=as.factor(rep(1:N,M)) > a=rep(rnorm(N),M) > z=rep(round(25+2*rnorm(N)+.2*a)) > x=a+rnorm(N*M) > y=.5*x+5*a-.5*z+2*rnorm(N*M) > x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M) > model=lm(y~x+x_cl+z) > summary(model) > y=1+1.5*x+4.6*x_cl-0.5*z > x.mat=cbind(rep(1,length(y)),x,x_cl,z) > ls.print(lsfit(x.mat,y,intercept=FALSE)) > > Residual Standard Error=0 > R-Square=1 > F-statistic (df=4, 4996)=5.06e+30 > p-value=0 > > Estimate Std.Err t-value Pr(>|t|) > 1.0 0 2.89e+13 0 > x 0.8 0 2.71e+14 0 > x_cl 4.6 0 1.18e+15 0 > z -0.5 0 -3.63e+14 0 > > ? > > There shouldn't be anything wrong with the second set of data, unless I've > missed something obvious (that constraints don't work for panel data? Seems > unlikely to me)! > > Also: > > (1) I'm ultimately looking just to constrain ONE coefficient whilst > allowing the other coefficients to be unconstrained (I tried this with the > first dataset by setting > > y=1+0.5*x.1-x.2+x.3+x.4 > > in the call, but got similar-looking output to what I got in the second > dataset); and > > (2) it would be really useful to have the call to -pcls(M)- produce more > informative output (SEs, t-values, fit stats, etc). > > Many thanks in anticipation of your expert help and being told what a > clueless berk I am, > Clive > > -- > Clive Nicholas > > "My colleagues in the social sciences talk a great deal about methodology. > I prefer to call it style." -- Freeman J. Dyson >-- Clive Nicholas "My colleagues in the social sciences talk a great deal about methodology. I prefer to call it style." -- Freeman J. Dyson [[alternative HTML version deleted]]
Bert Gunter
2020-Nov-03 06:18 UTC
[R] Query on constrained regressions using -mgcv- and -pcls-
Warning: I did *not* attempt to follow your query(original or addendum) in detail. But as you have not yet received a reply, it may be because your post seems mostly about statistical issues, which are generally off topic here. This list is primarily about R programming issues. If statistical issues are your primary focus, SO may be a better place to post: https://stats.stackexchange.com/ Otherwise, I guess you'll just have to continue waiting. Incidentally, suggestions for improvements in nonstandard packages should generally be sent to the package maintainer (?maintainer) rather than posted here. Maintainers may not even check this list. Finally, this is a plain text list. HTML posts often get mangled by the server. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Nov 2, 2020 at 9:28 PM Clive Nicholas via R-help < r-help at r-project.org> wrote:> As an addendum / erratum to my original post, the second block of code > should read for completeness: > > set.seed(02102020) > N=500 > M=10 > rater=rep(1:M, each = N) > lead_n=as.factor(rep(1:N,M)) > a=rep(rnorm(N),M) > z=rep(round(25+2*rnorm(N)+.2*a)) > x=a+rnorm(N*M) > y=.5*x+5*a-.5*z+2*rnorm(N*M) > x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M) > model=lm(y~x+x_cl+z) > summary(model) > y=1+1.5*x+4.6*x_cl-0.5*z > x.mat=cbind(rep(1,length(y)),x,x_cl,z) > ls.print(lsfit(x.mat,y,intercept=FALSE)) > M=list(y=y, > w=rep(1, length(y)), > X=x.mat, > C=matrix(0,0,0), > p=rep(1, ncol(x.mat)), > off=array(0,0), > S=list(), > sp=array(0,0), > Ain=diag(ncol(x.mat)), > bin=rep(0, ncol(x.mat)) ) > pcls(M) > > However, all my questions stand. > > Ta, Clive > > On Tue, 3 Nov 2020 at 01:14, Clive Nicholas <clivelists at googlemail.com> > wrote: > > > Hello all, > > > > I'll level with you: I'm puzzled! > > > > How is it that this constrained regression routine using -pcls- runs > > satisfactorily (courtesy of Tian Zheng): > > > > library(mgcv) > > options(digits=3) > > x.1=rnorm(100, 0, 1) > > x.2=rnorm(100, 0, 1) > > x.3=rnorm(100, 0, 1) > > x.4=rnorm(100, 0, 1) > > y=1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01) > > x.mat=cbind(rep(1, length(y)), x.1, x.2, x.3, x.4) > > ls.print(lsfit(x.mat, y, intercept=FALSE)) > > M=list(y=y, > > w=rep(1, length(y)), > > X=x.mat, > > C=matrix(0,0,0), > > p=rep(1, ncol(x.mat)), > > off=array(0,0), > > S=list(), > > sp=array(0,0), > > Ain=diag(ncol(x.mat)), > > bin=rep(0, ncol(x.mat)) ) > > pcls(M) > > Residual Standard Error=0.0095 > > R-Square=1 > > F-statistic (df=5, 95)=314735 > > p-value=0 > > > > Estimate Std.Err t-value Pr(>|t|) > > 1.000 0.0010 1043.9 0 > > x.1 0.501 0.0010 512.6 0 > > x.2 -0.202 0.0009 -231.6 0 > > x.3 0.298 0.0010 297.8 0 > > x.4 0.103 0.0011 94.8 0 > > > > but this one does not for a panel dataset: > > > > set.seed(02102020) > > N=500 > > M=10 > > rater=rep(1:M, each = N) > > lead_n=as.factor(rep(1:N,M)) > > a=rep(rnorm(N),M) > > z=rep(round(25+2*rnorm(N)+.2*a)) > > x=a+rnorm(N*M) > > y=.5*x+5*a-.5*z+2*rnorm(N*M) > > x_cl=rep(aggregate(x,list(lead_n) mean)[,2],M) > > model=lm(y~x+x_cl+z) > > summary(model) > > y=1+1.5*x+4.6*x_cl-0.5*z > > x.mat=cbind(rep(1,length(y)),x,x_cl,z) > > ls.print(lsfit(x.mat,y,intercept=FALSE)) > > > > Residual Standard Error=0 > > R-Square=1 > > F-statistic (df=4, 4996)=5.06e+30 > > p-value=0 > > > > Estimate Std.Err t-value Pr(>|t|) > > 1.0 0 2.89e+13 0 > > x 0.8 0 2.71e+14 0 > > x_cl 4.6 0 1.18e+15 0 > > z -0.5 0 -3.63e+14 0 > > > > ? > > > > There shouldn't be anything wrong with the second set of data, unless > I've > > missed something obvious (that constraints don't work for panel data? > Seems > > unlikely to me)! > > > > Also: > > > > (1) I'm ultimately looking just to constrain ONE coefficient whilst > > allowing the other coefficients to be unconstrained (I tried this with > the > > first dataset by setting > > > > y=1+0.5*x.1-x.2+x.3+x.4 > > > > in the call, but got similar-looking output to what I got in the second > > dataset); and > > > > (2) it would be really useful to have the call to -pcls(M)- produce more > > informative output (SEs, t-values, fit stats, etc). > > > > Many thanks in anticipation of your expert help and being told what a > > clueless berk I am, > > Clive > > > > -- > > Clive Nicholas > > > > "My colleagues in the social sciences talk a great deal about > methodology. > > I prefer to call it style." -- Freeman J. Dyson > > > > > -- > Clive Nicholas > > "My colleagues in the social sciences talk a great deal about methodology. > I prefer to call it style." -- Freeman J. Dyson > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Possibly Parallel Threads
- Use pcls in "mgcv" package to achieve constrained cubic spline
- use pcls to solve least square fitting with constraints
- mgcv: Impose monotonicity constraint on single or more smooth terms
- linear model with equality and inequality (redundant) constraints
- non-negative least-squares