Michael Artz
2016-Mar-10 16:08 UTC
[R] Prediction from a rank deficient fit may be misleading
Hi all,

I have the following error:

> resultVector <- predict(logitregressmodel, dataset1, type='response')
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading

I have seen on the internet that collinearity in the data may be causing this. How can I be sure?

Thanks
David Winsemius
2016-Mar-10 22:05 UTC
[R] Prediction from a rank deficient fit may be misleading
> On Mar 10, 2016, at 8:08 AM, Michael Artz <michaeleartz at gmail.com> wrote:
>
> Hi all,
> I have the following error -
>> resultVector <- predict(logitregressmodel, dataset1, type='response')
> Warning message:
> In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
>   prediction from a rank-deficient fit may be misleading

It wasn't an R error. It was an R warning. Was the `summary` output on logitregressmodel informative? Does the resultVector look sensible given its inputs?

> I have seen on internet that there may be some collinearity in the data and
> this is causing that. How can I be sure?

Do some diagnostics. Start by looking carefully at the output of summary(logitregressmodel), and perhaps summary(dataset1) if it was the original input to the modeling functions. Then you could move on to looking at cross-correlations on things you think are continuous, crosstabs on factor variables, and the condition number of the full data matrix.

Lots of stuff turns up on a search for "detecting collinearity condition number in r".

David Winsemius
Alameda, CA, USA
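The diagnostics suggested above could be sketched roughly as follows. This is only an illustration on made-up data: the data frame `toy` and the variables `x1`, `x2`, `x3` are hypothetical and stand in for the poster's `dataset1`, which is not shown in the thread. The sketch uses base R only (`glm`, `model.matrix`, `alias`, `cor`, `kappa`).

```r
# Hypothetical toy data (not from the thread): x3 is an exact linear
# combination of x1 and x2, so the design matrix is rank deficient.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + 2 * x2
y  <- rbinom(n, 1, plogis(0.5 * x1 - 0.3 * x2))
toy <- data.frame(y, x1, x2, x3)

fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = toy)

# 1. Coefficients that could not be estimated are reported as NA.
aliased <- names(coef(fit))[is.na(coef(fit))]

# 2. Compare the rank of the fit with the number of design-matrix columns.
X <- model.matrix(fit)
rank_deficit <- ncol(X) - fit$rank

# 3. alias() shows how each dropped term depends on the retained terms.
al <- alias(fit)

# 4. Pairwise correlations among the continuous predictors.
cors <- cor(toy[, c("x1", "x2", "x3")])

# 5. Condition number of the design matrix; a very large value points to
#    (near-)collinearity.
cond <- kappa(X, exact = TRUE)

print(aliased)
print(rank_deficit)
print(al$Complete)
print(round(cors, 2))
print(cond)
```

Note that the pairwise correlations in step 4 need not be close to 1 even when an exact dependence exists, because the dependence here involves three variables at once. That is one reason to look at the rank, `alias()`, and the condition number as well.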
Michael Artz
2016-Mar-10 22:21 UTC
[R] Prediction from a rank deficient fit may be misleading
Here are the results of the logistic regression model. Is it because of the NA values?

Call:
glm(formula = TARGET_A ~ Contract + Dependents + DeviceProtection +
    gender + InternetService + MonthlyCharges + MultipleLines +
    OnlineBackup + OnlineSecurity + PaperlessBilling + Partner +
    PaymentMethod + PhoneService + SeniorCitizen + StreamingMovies +
    StreamingTV + TechSupport + tenure + TotalCharges,
    family = binomial(link = "logit"), data = churn_training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8943  -0.6867  -0.2863   0.7378   3.4259

Coefficients: (7 not defined because of singularities)
                                       Estimate Std. Error z value Pr(>|z|)
(Intercept)                           1.0664928  1.7195494   0.620   0.5351
ContractOne year                     -0.6874005  0.1314227  -5.230 1.69e-07 ***
ContractTwo year                     -1.2775385  0.2101193  -6.080 1.20e-09 ***
DependentsYes                        -0.1485301  0.1095348  -1.356   0.1751
DeviceProtectionNo internet service  -1.5547306  0.9661837  -1.609   0.1076
DeviceProtectionYes                   0.0459115  0.2114253   0.217   0.8281
genderMale                           -0.0350970  0.0776896  -0.452   0.6514
InternetServiceFiber optic            1.4800374  0.9545398   1.551   0.1210
InternetServiceNo                            NA         NA      NA       NA
MonthlyCharges                       -0.0324614  0.0379646  -0.855   0.3925
MultipleLinesNo phone service         0.0808745  0.7736359   0.105   0.9167
MultipleLinesYes                      0.3990450  0.2131343   1.872   0.0612 .
OnlineBackupNo internet service              NA         NA      NA       NA
OnlineBackupYes                      -0.0328892  0.2081145  -0.158   0.8744
OnlineSecurityNo internet service            NA         NA      NA       NA
OnlineSecurityYes                    -0.2760602  0.2132917  -1.294   0.1956
PaperlessBillingYes                   0.3509944  0.0890884   3.940 8.15e-05 ***
PartnerYes                            0.0306815  0.0940650   0.326   0.7443
PaymentMethodCredit card (automatic) -0.0710923  0.1377252  -0.516   0.6057
PaymentMethodElectronic check         0.3074078  0.1137939   2.701   0.0069 **
PaymentMethodMailed check            -0.0201076  0.1377539  -0.146   0.8839
PhoneServiceYes                              NA         NA      NA       NA
SeniorCitizen                         0.1856454  0.1023527   1.814   0.0697 .
StreamingMoviesNo internet service           NA         NA      NA       NA
StreamingMoviesYes                    0.5260087  0.3899615   1.349   0.1774
StreamingTVNo internet service               NA         NA      NA       NA
StreamingTVYes                        0.4781321  0.3905777   1.224   0.2209
TechSupportNo internet service               NA         NA      NA       NA
TechSupportYes                       -0.2511197  0.2181612  -1.151   0.2497
tenure                               -0.0702813  0.0077113  -9.114  < 2e-16 ***
TotalCharges                          0.0004276  0.0000874   4.892 9.97e-07 ***
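On the question about the NA values: NA rows in a glm coefficient table mark coefficients that could not be estimated (the "7 not defined because of singularities" line), not missing observations. A plausible cause here, though the data are not shown in the thread, is that a level such as "No internet service" appears in several add-on columns exactly when InternetService is "No", so the corresponding dummy columns are identical. The sketch below reproduces that pattern on made-up data. The data frame `toy` and its contents are hypothetical, and only the column names are borrowed from the output above.

```r
# Hypothetical toy data (not the poster's churn_training): OnlineBackup is
# "No internet service" exactly when InternetService is "No".
set.seed(42)
n <- 300
internet <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))
backup <- factor(ifelse(internet == "No",
                        "No internet service",
                        sample(c("No", "Yes"), n, replace = TRUE)))
churn <- rbinom(n, 1, 0.3)
toy <- data.frame(churn, InternetService = internet, OnlineBackup = backup)

# A crosstab makes the overlap visible: the "No internet service" column is
# non-zero only in the InternetService == "No" row.
xt <- table(toy$InternetService, toy$OnlineBackup)
print(xt)

# The redundant dummy variable is dropped and its coefficient reported as NA.
fit <- glm(churn ~ InternetService + OnlineBackup,
           family = binomial(link = "logit"), data = toy)
na_coefs <- names(coef(fit))[is.na(coef(fit))]
print(na_coefs)
```

Which of two identical dummy columns ends up as NA depends on the order of terms in the formula: the column that appears later is the one dropped. In the output above, DeviceProtection precedes InternetService, which would be consistent with InternetServiceNo being NA while DeviceProtectionNo internet service is estimated.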