Marcus Tullius
2012-Sep-05 07:40 UTC
[R] How to effectively remove Outliers from a binary logistic regression in R
Hallo there,

greetings from Germany.

I have a simple question for you.

I have run a binary logistic model, but there are lots of outliers distorting the real results.

I have tried to get rid of the outliers using the following commands:

    remove = -c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)
    MIGRATION.rebuild <- glm(MIGRATION, subset=remove)
    influence(MIGRATION.rebuild)
    influence.measures(MIGRATION.rebuild)

BUT it did not work.

My question is:

*Do you know a simple R-command which erases outliers and rebuilds the model without them?*

I am including my model below so that you may have an idea of how I am trying to do it.

Thanks in advance for your help.

Francisco M. da Rocha
Jim Lemon
2012-Sep-05 10:15 UTC
[R] How to effectively remove Outliers from a binary logistic regression in R
On 09/05/2012 05:40 PM, Marcus Tullius wrote:
> Hallo there,
>
> greetings from Germany.
>
> I have a simple question for you.
>
> I have run a binary logistic model, but there are lots of outliers distorting the real results.
>
> I have tried to get rid of the outliers using the following commands:
>
> remove = -c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)
> MIGRATION.rebuild<- glm(MIGRATION, subset=remove)
> influence(MIGRATION.rebuild)
> influence.measures(MIGRATION.rebuild)
>
> BUT it did not work.
>
> My question is:
>
> *Do you know a simple R-command which erases outliers and rebuilds the model without them?*
>
> I am including my model below so that you may have an idea of how I am trying to do it.
>

Hi Francisco,

Your model didn't make it to the help list, but I think that the problem is in your attempt to use the "subset" argument in glm. The vector is supposed to include the indices of the values that you _want_ in the analysis, and it looks like you are trying to remove the values that you _don't_ want. Say you have 2000 rows in your data frame in the model. The "subset" argument should look something like this:

    glm(MIGRATION, subset=!(1:2000 %in% c(56,303,365,391,512,746,859,940,1037,1042,1138,1355)))

Jim
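
For readers who want to try the "subset" approach end to end, here is a minimal sketch. The original model never reached the list, so the data frame `dat`, the variables `y` and `x`, and the simulated values are all made up for illustration. The sketch also assumes the model is refit from a formula and data, with the family given explicitly, rather than by passing the fitted object to glm. Only the row numbers come from the post.

```r
# Hypothetical stand-in data: the poster's real model and data are unknown.
set.seed(1)
n <- 2000
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x))

# Original binary logistic fit on all rows.
fit <- glm(y ~ x, family = binomial, data = dat)

# Row numbers taken from the post; everything else here is assumed.
drop_rows <- c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)

# Logical vector that is TRUE for the rows to keep.
keep <- !(seq_len(nrow(dat)) %in% drop_rows)

# Option 1: refit directly with the subset argument.
fit_direct <- glm(y ~ x, family = binomial, data = dat, subset = keep)

# Option 2: update() re-runs the original call with the extra argument,
# so the formula, family and data are carried over unchanged.
fit_rebuilt <- update(fit, subset = keep)

nobs(fit_rebuilt)  # number of observations used in the refit
```

Both options should give the same coefficients. update() saves retyping the model specification, which may be the closest thing to the "simple R-command" asked for. The diagnostics from the post, such as influence.measures(fit_rebuilt), can then be run on the refitted object.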