Gathurst
2011-Dec-05 20:57 UTC
[R] Summary coefficients give NA values because of singularities
Hello,

I have a data set which I am using to find a model with the most significant parameters included and, most importantly, their p-values. The full model is of the form:

sad[,1] ~ b_1 sad[,2] + b_2 sad[,3] + b_3 sad[,4] + b_4 sad[,5] + b_5 sad[,6] + b_6 sad[,7] + b_7 sad[,8] + b_8 sad[,9] + b_9 sad[,10],

where the 9 variables on the right hand side are all indicator variables. The thing I don't understand is the line 'sad[, 10] NA NA NA NA' as a result of 'Coefficients: (1 not defined because of singularities)'.

I think the output is taking sad[,10] as the intercept, based on previous attempts at figuring my issue out, which I find a bit weird considering sad[,10] is either 0 or 1. How do I produce the correct output showing all p-values?

My code and output are as follows:

sad<-matrix(1,ncol=11,nrow=486)
sad[,c(1:10)]<-d[,2][-357]
sad[,1]<-d[,29][-357]
sad[,2][sad[,2]!=1]<-0
sad[,3][sad[,3]!=2]<-0
sad[,4][sad[,4]!=3]<-0
sad[,5][sad[,5]!=4]<-0
sad[,6][sad[,6]!=5]<-0
sad[,7][sad[,7]!=6]<-0
sad[,8][sad[,8]!=7]<-0
sad[,9][sad[,9]!=8]<-0
sad[,10][sad[,10]!=9]<-0
sad[,2][sad[,2]==1]<-1
sad[,3][sad[,3]==2]<-1
sad[,4][sad[,4]==3]<-1
sad[,5][sad[,5]==4]<-1
sad[,6][sad[,6]==5]<-1
sad[,7][sad[,7]==6]<-1
sad[,8][sad[,8]==7]<-1
sad[,9][sad[,9]==8]<-1
sad[,10][sad[,10]==9]<-1
sad

summary(lm(sad[,1]~sad[,2]+sad[,3]
+sad[,4]+sad[,5]+sad[,6]
+sad[,7]+sad[,8]+sad[,9]+sad[,10]))

Call:
lm(formula = sad[, 1] ~ sad[, 2] + sad[, 3] + sad[, 4] + sad[,
    5] + sad[, 6] + sad[, 7] + sad[, 8] + sad[, 9] + sad[, 10])

Residuals:
    Min      1Q  Median      3Q     Max
-3.3191 -0.3893  0.0519  0.7436  1.0519

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.34091    0.14495  29.947   <2e-16 ***
sad[, 2]    -0.16142    0.18128  -0.890   0.3737
sad[, 3]    -0.23221    0.20275  -1.145   0.2527
sad[, 4]     0.17832    0.19695   0.905   0.3657
sad[, 5]     0.06450    0.21447   0.301   0.7638
sad[, 6]    -0.15909    0.18713  -0.850   0.3957
sad[, 7]    -0.39286    0.18171  -2.162   0.0311 *
sad[, 8]    -0.08450    0.21146  -0.400   0.6896
sad[, 9]    -0.02176    0.20170  -0.108   0.9141
sad[, 10]         NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9615 on 477 degrees of freedom
Multiple R-squared: 0.02984, Adjusted R-squared: 0.01357
F-statistic: 1.834 on 8 and 477 DF, p-value: 0.06869

Thanks in advance.

--
View this message in context: http://r.789695.n4.nabble.com/Summary-coefficients-give-NA-values-because-of-singularities-tp4162113p4162113.html
Sent from the R help mailing list archive at Nabble.com.
Uwe Ligges
2011-Dec-06 13:15 UTC
[R] Summary coefficients give NA values because of singularities
On 05.12.2011 21:57, Gathurst wrote:
> Hello,
>
> I have a data set which I am using to find a model with the most significant
> parameters included and most importantly, the p-values. The full model is
> of the form:
> sad[,1]~b_1 sad[,2]+b_2 sad[,3]+b_3 sad[,4]+b_4 sad[,5]+b_5 sad[,6]+b_6
> sad[,7]+b_7 sad[,8]+b_8 sad[,9]+b_9 sad[,10],
> where the 9 variables on the right hand side are all indicator variables.
> The thing I don't understand is the line 'sad[, 10] NA NA NA NA'
> as a result of 'Coefficients: (1 not defined because of singularities)'.
>
> I think the output is taking sad[,10] as the intercept, based on previous
> attempts at figuring my issue out, which I find a bit weird considering
> sad[,10] is either 0 or 1. How do I produce the correct output showing all
> p-values?

You cannot: sad[,10] is either collinear with one or more of the other variables or is constant.

Uwe Ligges
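To illustrate the reply above: one plausible reading, assuming d[,2] only takes the values 1 to 9, is that the nine indicator columns always sum to 1 in every row. That sum is exactly the intercept column, so lm() cannot estimate all ten coefficients and reports the last one as NA. The sketch below uses simulated stand-in data (g, y and X are hypothetical names, not the poster's d) to reproduce the NA row, inspect it with alias(), and fit two parameterisations that are estimable.

```r
# Hypothetical stand-in data: the poster's `d` is not available.
set.seed(1)
n <- 486
g <- sample(1:9, n, replace = TRUE)   # plays the role of d[, 2][-357]
y <- rnorm(n, mean = 4)               # plays the role of d[, 29][-357]

# One 0/1 indicator column per level, as in the original post.
X <- sapply(1:9, function(k) as.numeric(g == k))
colnames(X) <- paste0("g", 1:9)

# Every row has exactly one indicator set, so the columns add up to the
# intercept column of ones and the design matrix is rank deficient.
stopifnot(all(rowSums(X) == 1))

# Intercept plus all nine indicators: one coefficient cannot be estimated.
fit_all <- lm(y ~ X)
print(coef(fit_all))    # the last indicator (Xg9) is NA
print(alias(fit_all))   # shows Xg9 as a linear combination of the others

# Option 1: let R build the dummies; level 1 becomes the reference level.
fit_factor <- lm(y ~ factor(g))
print(summary(fit_factor))

# Option 2: drop the intercept so each coefficient is a group mean.
fit_means <- lm(y ~ 0 + factor(g))
print(summary(fit_means))
```

With the intercept kept, the reference level is absorbed into the intercept and the other coefficients are differences from that level. Without the intercept, each coefficient is a group mean, and its p-value tests that mean against zero rather than against another group, so the two summaries answer different questions.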