asdir
2010-Aug-12 14:35 UTC
[R] Regression Error: Otherwise good variable causes singularity. Why?
This command

    cdmoutcome <- glm(log(value) ~ factor(year)
                      + log(gdppcpppconst) + log(gdppcpppconstAII)
                      + log(co2eemisspc) + log(co2eemisspcAII)
                      + log(dist)
                      + fdiboth
                      + odapartnertohost
                      + corrupt
                      + log(infraindex)
                      + litrate
                      + africa
                      + imr,
                      data = cdmdata2, subset = zero == 1,
                      gaussian(link = "identity"))

results in this table

    Coefficients: (1 not defined because of singularities)
                            Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)             1.216e+01   5.771e+01    0.211    0.8332
    factor(year)2006       -1.403e+00   5.777e-01   -2.429    0.0157 *
    factor(year)2007       -2.799e-01   7.901e-01   -0.354    0.7234
    log(gdppcpppconst)      2.762e-01   5.517e+00    0.050    0.9601
    log(gdppcpppconstAII)  -1.344e-01   9.025e-01   -0.149    0.8817
    log(co2eemisspc)        5.655e+00   2.903e+00    1.948    0.0523 .
    log(co2eemisspcAII)    -1.411e-01   4.245e-01   -0.332    0.7399
    log(dist)              -2.938e-01   4.023e-01   -0.730    0.4658
    fdiboth                 1.326e-04   1.133e-04    1.171    0.2425
    odapartnertohost        2.319e-03   1.437e-03    1.613    0.1078
    corrupt                 1.875e+00   3.313e+00    0.566    0.5718
    log(infraindex)         4.783e+00   1.091e+01    0.438    0.6615
    litrate0.47            -2.485e+01   3.190e+01   -0.779    0.4365
    litrate0.499           -1.657e+01   2.591e+01   -0.639    0.5230
    litrate0.523           -2.440e+01   3.427e+01   -0.712    0.4769
    litrate0.528           -9.184e+00   1.379e+01   -0.666    0.5060
    litrate0.595           -2.309e+01   2.776e+01   -0.832    0.4062
    litrate0.66            -1.451e+01   2.734e+01   -0.531    0.5961
    litrate0.675           -1.707e+01   2.813e+01   -0.607    0.5444
    litrate0.68            -6.346e+00   1.063e+01   -0.597    0.5509
    litrate0.699            2.717e+00   3.541e+00    0.768    0.4434
    litrate0.706           -1.960e+01   2.933e+01   -0.668    0.5046
    litrate0.714           -2.586e+01   4.002e+01   -0.646    0.5186
    litrate0.736            5.641e+00   1.561e+01    0.361    0.7181
    litrate0.743           -2.692e+01   4.253e+01   -0.633    0.5273
    litrate0.762           -2.208e+01   3.100e+01   -0.712    0.4767
    litrate0.802           -2.325e+01   3.766e+01   -0.617    0.5375
    litrate0.847           -2.620e+01   3.948e+01   -0.664    0.5075
    litrate0.86            -3.576e+01   4.950e+01   -0.722    0.4707
    litrate0.864           -4.482e+01   6.274e+01   -0.714    0.4755
    litrate0.872           -1.946e+01   2.715e+01   -0.717    0.4739
    litrate0.877           -2.710e+01   3.702e+01   -0.732    0.4646
    litrate0.879           -3.460e+01   5.147e+01   -0.672    0.5020
    litrate0.886           -3.276e+01   4.860e+01   -0.674    0.5008
    litrate0.889           -4.120e+01   5.755e+01   -0.716    0.4746
    litrate0.904           -2.282e+01   2.985e+01   -0.764    0.4453
    litrate0.91            -3.478e+01   5.037e+01   -0.691    0.4904
    litrate0.923           -1.762e+01   2.551e+01   -0.691    0.4902
    litrate0.925           -2.445e+01   3.611e+01   -0.677    0.4990
    litrate0.926           -2.995e+01   4.565e+01   -0.656    0.5123
    litrate0.928           -2.839e+01   3.933e+01   -0.722    0.4710
    litrate0.937           -2.571e+01   3.795e+01   -0.677    0.4986
    litrate0.94            -2.109e+01   3.051e+01   -0.691    0.4900
    litrate0.959           -2.078e+01   2.895e+01   -0.718    0.4735
    litrate0.96            -3.403e+01   4.798e+01   -0.709    0.4787
    litrate0.962           -4.084e+01   5.755e+01   -0.710    0.4785
    litrate0.971           -3.743e+01   5.247e+01   -0.713    0.4761
    litrate0.98            -3.709e+01   5.170e+01   -0.717    0.4737
    litrate0.986           -2.663e+01   4.437e+01   -0.600    0.5488
    litrate0.991           -3.045e+01   4.166e+01   -0.731    0.4654
    litrate1               -2.732e+01   4.459e+01   -0.613    0.5405
    africa                         NA          NA       NA        NA
    imr                     2.160e+00   9.357e-01    2.309    0.0216 *

although it should result in something similar to this:

    Coefficients: (1 not defined because of singularities)
                            Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)             1.216e+01   5.771e+01    0.211    0.8332
    factor(year)2006       -1.403e+00   5.777e-01   -2.429    0.0157 *
    factor(year)2007       -2.799e-01   7.901e-01   -0.354    0.7234
    log(gdppcpppconst)      2.762e-01   5.517e+00    0.050    0.9601
    log(gdppcpppconstAII)  -1.344e-01   9.025e-01   -0.149    0.8817
    log(co2eemisspc)        5.655e+00   2.903e+00    1.948    0.0523 .
    log(co2eemisspcAII)    -1.411e-01   4.245e-01   -0.332    0.7399
    log(dist)              -2.938e-01   4.023e-01   -0.730    0.4658
    fdiboth                 1.326e-04   1.133e-04    1.171    0.2425
    odapartnertohost        2.319e-03   1.437e-03    1.613    0.1078
    corrupt                 1.875e+00   3.313e+00    0.566    0.5718
    log(infraindex)         4.783e+00   1.091e+01    0.438    0.6615
    litrate                -2.485e+01   3.190e+01   -0.779    0.4365
    africa                 -2.732e+01   4.459e+01   -0.613    0.5405
    imr                     2.160e+00   9.357e-01    2.309    0.0216 *

In fact, if I don't use the litrate variable, the regression runs just fine. If I use the variable in a different regression, it also works fine. I just can't find the point where it turns ugly.

I tested the litrate variable for everything I know to test for: the structure is numerical and it does not contain any missing values. It has the same length as every other variable in the set and is a continuous variable with values between 0 and 1.

Does anyone have an idea?

--
View this message in context: http://r.789695.n4.nabble.com/Regression-Error-Otherwise-good-variable-causes-singularity-Why-tp2322780p2322780.html
Sent from the R help mailing list archive at Nabble.com.
JLucke at ria.buffalo.edu
2010-Aug-12 14:59 UTC
[R] Regression Error: Otherwise good variable causes singularity. Why?
There appears to be a problem in both regressions: a singularity is also reported in the second regression analysis. It appears that the litrate variable is treated as a factor in the first analysis and as continuous in the second. There also appears to be collinearity between the litrate variable and the africa variable. Look at lm.influence (a function in the stats package) for regression diagnostics.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
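The collinearity mentioned above can be reproduced on a small scale. The following is only a sketch on made-up data (the data frame `toy`, its values, and the response `y` are hypothetical and not taken from cdmdata2): when every distinct litrate value belongs to a single group, the africa dummy is an exact sum of litrate dummies, so its coefficient is reported as NA once litrate is treated as a factor.

```r
# Hypothetical toy data: six distinct literacy rates, three rows each.
# None of these values come from the data discussed in this thread.
set.seed(1)
toy <- data.frame(
  litrate = rep(c(0.47, 0.66, 0.802, 0.91, 0.96, 1), each = 3),
  africa  = rep(c(1, 1, 1, 0, 0, 0), each = 3)
)
toy$y <- rnorm(nrow(toy))

# litrate as a factor: one dummy per distinct value. africa is then a
# linear combination of the intercept and those dummies, so it is aliased.
fit_factor <- lm(y ~ factor(litrate) + africa, data = toy)

# litrate as a number: a single slope, and africa can be estimated.
fit_numeric <- lm(y ~ litrate + africa, data = toy)

is.na(coef(fit_factor)[["africa"]])   # TRUE: dropped because of the singularity
is.na(coef(fit_numeric)[["africa"]])  # FALSE
```

Calling `alias(fit_factor)` on such a fit lists which terms are linearly dependent on the others, which may help locate the source of a "not defined because of singularities" note.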
David Winsemius
2010-Aug-12 15:28 UTC
[R] Regression Error: Otherwise good variable causes singularity. Why?
On Aug 12, 2010, at 10:35 AM, asdir wrote:

> This command
> [...]
> results in this table
>
> Coefficients: (1 not defined because of singularities)
> [...]
> log(infraindex)         4.783e+00   1.091e+01    0.438    0.6615

You have probably created litrate as a factor without realizing it. That can easily happen if you just use read.table and one of the values cannot be gracefully interpreted as a numeric. Either read in with stringsAsFactors=FALSE or as.is=TRUE and then coerce it to numeric.

Or, if you want to fix an existing factor f%^&-up, then the FAQ tells you to use something like:

    cdmdata2$f_ed_variable <- as.numeric(as.character(cdmdata2$f_ed_variable))

> litrate0.47            -2.485e+01   3.190e+01   -0.779    0.4365
> litrate0.499           -1.657e+01   2.591e+01   -0.639    0.5230
> [...]

David Winsemius, MD
West Hartford, CT
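To illustrate why the conversion goes through as.character first, here is a minimal sketch. The vector `litrate_f` and its entries are hypothetical; it only mimics a numeric column that ended up as a factor because one entry could not be parsed as a number.

```r
# Hypothetical column: numbers stored as a factor because of one bad entry.
litrate_f <- factor(c("0.47", "0.91", "n/a", "0.66"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers printed in the labels.
codes <- as.numeric(litrate_f)

# Converting via character recovers the values; the unparseable entry
# becomes NA (R would normally warn about this, suppressed here).
values <- suppressWarnings(as.numeric(as.character(litrate_f)))

is.factor(litrate_f)  # TRUE
codes                 # 1 3 4 2
values                # 0.47 0.91   NA 0.66
```

Checking `is.factor()` (or `str()`) on the column actually used in the model, after any subsetting or merging steps, is a quick way to confirm whether this is what happened.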
asdir
2010-Aug-12 16:03 UTC
[R] Regression Error: Otherwise good variable causes singularity. Why?
@JLucke: As for the africa variable: I took it out of the model, so that we can exclude both the variable itself and collinearity between africa and litrate as causes of the litrate problem. This also removed the singularity remark at the top. However, the problem of litrate being treated as a factor with many levels remains. Just to clarify: the second results table is fictional, to show where I was headed with my regression. Anyway, thanks for the quick answer.

@David: Thanks for the pointer. It was in fact a bad variable, but I created it myself. I changed the data set halfway through my calculations and thought I had adjusted everything. It turns out that I forgot to adjust the set length, which is reset between the two steps of my Heckman procedure. In any case: thanks for the quick and helpful reply. :-)

--
View this message in context: http://r.789695.n4.nabble.com/Regression-Error-Otherwise-good-variable-causes-singularity-Why-tp2322780p2322925.html
Sent from the R help mailing list archive at Nabble.com.