Michael Artz
2016-Mar-10 16:08 UTC
[R] Prediction from a rank deficient fit may be misleading
Hi all,

I have the following error:

> resultVector <- predict(logitregressmodel, dataset1, type='response')
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading

I have seen on the internet that there may be some collinearity in the data and that this is causing the warning. How can I be sure?

Thanks
David Winsemius
2016-Mar-10 22:05 UTC
[R] Prediction from a rank deficient fit may be misleading
> On Mar 10, 2016, at 8:08 AM, Michael Artz <michaeleartz at gmail.com> wrote:
>
> HI all,
> I have the following error -
>> resultVector <- predict(logitregressmodel, dataset1, type='response')
> Warning message:
> In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
> prediction from a rank-deficient fit may be misleading

It wasn't an R error. It was an R warning. Was the `summary` output on logitregressmodel informative? Does the resultVector look sensible given its inputs?

> I have seen on internet that there may be some collinearity in the data and
> this is causing that. How can I be sure?

Do some diagnostics. Start by looking carefully at the output of summary(logitregressmodel), and perhaps summary(dataset1) if it was the original input to the modeling functions. Then you could move on to looking at cross-correlations on things you think are continuous, crosstabs on factor variables, and the condition number of the full data matrix.

Lots of stuff turns up on a search for "detecting collinearity condition number in r".

> Thanks
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
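The diagnostics suggested above (cross-correlations, crosstabs, and a condition number) might look roughly like the sketch below. It is only an illustration: the data frame `toy` is simulated, the column names merely echo the thread, and the assumption that "No internet service" in one factor coincides exactly with `InternetService == "No"` is built in by hand, not taken from the poster's data.

```r
# Hypothetical toy data: names mimic the thread, values are simulated.
set.seed(1)
n <- 200
internet <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))
# Assumption: OnlineBackup is "No internet service" exactly when internet is "No".
backup <- factor(ifelse(internet == "No", "No internet service",
                        sample(c("No", "Yes"), n, replace = TRUE)))
toy <- data.frame(InternetService = internet,
                  OnlineBackup    = backup,
                  MonthlyCharges  = rnorm(n, mean = 65, sd = 30),
                  tenure          = sample(1:72, n, replace = TRUE))

# 1. Cross-correlations among the continuous predictors.
print(cor(toy[, c("MonthlyCharges", "tenure")]))

# 2. Crosstab of two factors: a level of one factor that occurs only
#    with a single level of the other points to redundant dummy columns.
xt <- table(toy$InternetService, toy$OnlineBackup)
print(xt)

# 3. Rank of the design matrix versus its number of columns.
X <- model.matrix(~ InternetService + OnlineBackup + MonthlyCharges + tenure,
                  data = toy)
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")

# 4. Condition number from the singular values; a huge (or infinite)
#    ratio means the columns are linearly dependent or nearly so.
sv <- svd(X)$d
cond_number <- max(sv) / min(sv)
cat("condition number:", cond_number, "\n")
```

If the rank is smaller than the number of columns, at least one column is an exact linear combination of the others, which is the situation the warning refers to. Near-collinearity among continuous variables will not reduce the rank, but it will still inflate the condition number.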
Michael Artz
2016-Mar-10 22:21 UTC
[R] Prediction from a rank deficient fit may be misleading
Here are the results of the logistic regression model. Is it because of the
NA values?
Call:
glm(formula = TARGET_A ~ Contract + Dependents + DeviceProtection +
    gender + InternetService + MonthlyCharges + MultipleLines +
    OnlineBackup + OnlineSecurity + PaperlessBilling + Partner +
    PaymentMethod + PhoneService + SeniorCitizen + StreamingMovies +
    StreamingTV + TechSupport + tenure + TotalCharges,
    family = binomial(link = "logit"), data = churn_training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8943  -0.6867  -0.2863   0.7378   3.4259

Coefficients: (7 not defined because of singularities)
                                       Estimate Std. Error z value Pr(>|z|)
(Intercept)                           1.0664928  1.7195494   0.620   0.5351
ContractOne year                     -0.6874005  0.1314227  -5.230 1.69e-07 ***
ContractTwo year                     -1.2775385  0.2101193  -6.080 1.20e-09 ***
DependentsYes                        -0.1485301  0.1095348  -1.356   0.1751
DeviceProtectionNo internet service  -1.5547306  0.9661837  -1.609   0.1076
DeviceProtectionYes                   0.0459115  0.2114253   0.217   0.8281
genderMale                           -0.0350970  0.0776896  -0.452   0.6514
InternetServiceFiber optic            1.4800374  0.9545398   1.551   0.1210
InternetServiceNo                            NA         NA      NA       NA
MonthlyCharges                       -0.0324614  0.0379646  -0.855   0.3925
MultipleLinesNo phone service         0.0808745  0.7736359   0.105   0.9167
MultipleLinesYes                      0.3990450  0.2131343   1.872   0.0612 .
OnlineBackupNo internet service              NA         NA      NA       NA
OnlineBackupYes                      -0.0328892  0.2081145  -0.158   0.8744
OnlineSecurityNo internet service            NA         NA      NA       NA
OnlineSecurityYes                    -0.2760602  0.2132917  -1.294   0.1956
PaperlessBillingYes                   0.3509944  0.0890884   3.940 8.15e-05 ***
PartnerYes                            0.0306815  0.0940650   0.326   0.7443
PaymentMethodCredit card (automatic) -0.0710923  0.1377252  -0.516   0.6057
PaymentMethodElectronic check         0.3074078  0.1137939   2.701   0.0069 **
PaymentMethodMailed check            -0.0201076  0.1377539  -0.146   0.8839
PhoneServiceYes                              NA         NA      NA       NA
SeniorCitizen                         0.1856454  0.1023527   1.814   0.0697 .
StreamingMoviesNo internet service           NA         NA      NA       NA
StreamingMoviesYes                    0.5260087  0.3899615   1.349   0.1774
StreamingTVNo internet service               NA         NA      NA       NA
StreamingTVYes                        0.4781321  0.3905777   1.224   0.2209
TechSupportNo internet service               NA         NA      NA       NA
TechSupportYes                       -0.2511197  0.2181612  -1.151   0.2497
tenure                               -0.0702813  0.0077113  -9.114  < 2e-16 ***
TotalCharges                          0.0004276  0.0000874   4.892 9.97e-07 ***
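
In the coefficient table, six of the seven NA rows are an InternetService or "No internet service" level, which suggests (though the thread does not confirm it) that those dummy columns duplicate one another. The sketch below reproduces that pattern on simulated data and shows one way to list the aliased coefficients. Everything in it is hypothetical: `toy`, `churn`, and `OnlineBackup2` are illustrative names, and the exact dependence between the two factors is an assumption built into the simulation.

```r
# Hypothetical toy data; names echo the thread, values are simulated.
set.seed(42)
n <- 500
internet <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))
# Assumption: the backup factor is "No internet service" exactly when internet is "No".
backup <- factor(ifelse(internet == "No", "No internet service",
                        sample(c("No", "Yes"), n, replace = TRUE)))
tenure <- sample(1:72, n, replace = TRUE)
churn  <- rbinom(n, 1, plogis(0.5 - 0.05 * tenure + (internet == "Fiber optic")))
toy <- data.frame(churn, InternetService = internet,
                  OnlineBackup = backup, tenure)

fit_full <- glm(churn ~ InternetService + OnlineBackup + tenure,
                family = binomial(link = "logit"), data = toy)

# Coefficients reported as NA are the ones dropped for singularity.
na_coefs <- names(coef(fit_full))[is.na(coef(fit_full))]
print(na_coefs)

# alias() expresses each dropped column in terms of the retained ones.
print(alias(fit_full)$Complete)

# Collapsing the redundant level removes the singularity; the column
# space is unchanged, so the fitted probabilities stay the same.
toy$OnlineBackup2 <- factor(ifelse(toy$OnlineBackup == "Yes", "Yes", "No"))
fit_reduced <- glm(churn ~ InternetService + OnlineBackup2 + tenure,
                   family = binomial(link = "logit"), data = toy)
max_diff <- max(abs(fitted(fit_full) - fitted(fit_reduced)))
cat("largest difference in fitted probabilities:", max_diff, "\n")
```

Which of two duplicated columns ends up as NA depends on the order of terms in the formula: the later one is dropped. That would be consistent with the output above, where `DeviceProtectionNo internet service` has an estimate while `InternetServiceNo` and the later "No internet service" levels do not.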