Hello,

I'm working on some fairly standard regression models (linear, logistic, and Poisson). Unfortunately, the data is rather messy.

A visual inspection, using either a histogram or a density plot, indicates some significant outliers. Summary statistics of the data indicate the same thing.

If I fit a linear regression in R using the "lm" command, I can then plot the model to look at residuals, etc.

I'm interested in refitting the model with N% of the highest-leverage points removed. (It's a large data set, and I want to fit "most" of the data.)

Is there a computational way to get the leverage for each data point? That way I can subset the data, skipping the N% of points with the highest leverage.

Thanks!

--
Noah Silverman, M.S., C.Phil
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095
See the hatvalues function. (It is part of the stats package that ships with R, so you do not need to load the car package to use it.) Also, I highly recommend John Fox's CAR book ("An R Companion to Applied Regression"). There's a new edition that came out a year or so ago.

On Fri, Jul 19, 2013 at 6:14 PM, Noah Silverman <noahsilverman@ucla.edu> wrote:

> Is there a computational way to get the leverage for each data point?
> That way I can subset the data skipping N% of the highest leverage ones.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
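For what it's worth, here is a minimal sketch of the workflow asked about: fit with lm, get one leverage value per observation from hatvalues, drop the highest N%, and refit. The data are simulated, and the names `dat`, `drop_frac`, `drop_idx`, and `fit_trimmed` are made up for illustration. It also assumes no rows are lost to missing values, so that the leverages line up with the rows of the data frame.

```r
# Simulated example data; all object names here are hypothetical.
set.seed(42)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$x[1:5] <- dat$x[1:5] + 10        # give five rows extreme predictor values
dat$y <- 1 + 2 * dat$x + rnorm(n)

fit <- lm(y ~ x, data = dat)

# hatvalues() comes from the stats package; no library() call is needed.
lev <- hatvalues(fit)                # one leverage value per observation

drop_frac <- 0.05                    # the "N%" from the question, as a fraction
n_drop <- ceiling(drop_frac * length(lev))

# Row positions of the n_drop largest leverages
# (assumes n_drop >= 1 and no rows dropped for missing values)
drop_idx <- order(lev, decreasing = TRUE)[seq_len(n_drop)]

# Refit on the remaining rows
fit_trimmed <- lm(y ~ x, data = dat[-drop_idx, ])
```

Comparing coef(fit) with coef(fit_trimmed) then shows how much the removed points were driving the estimates. Note that leverage depends only on the predictors, so a point with an unusual response but ordinary predictor values will not be caught this way.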
The help page for plot.lm mentions 'lm.influence', 'cooks.distance', and 'hatvalues' in its "See Also" section. Do any of those accomplish what you want?

On Fri, Jul 19, 2013 at 4:14 PM, Noah Silverman <noahsilverman@ucla.edu> wrote:

> Is there a computational way to get the leverage for each data point?
> That way I can subset the data skipping N% of the highest leverage ones.

--
Gregory (Greg) L. Snow Ph.D.
538280@gmail.com
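As a rough illustration of how those three functions relate, the sketch below applies them to a Poisson glm fit, since the original question also mentions logistic and Poisson models. The data are simulated and the names `dat`, `counts`, and `diag_tab` are hypothetical.

```r
# Simulated count data; all object names here are hypothetical.
set.seed(1)
n <- 150
dat <- data.frame(x = runif(n, 0, 2))
dat$counts <- rpois(n, lambda = exp(0.5 + 0.8 * dat$x))

fit <- glm(counts ~ x, family = poisson, data = dat)

lev  <- hatvalues(fit)        # leverage: diagonal of the hat matrix
cook <- cooks.distance(fit)   # combines leverage with residual size
infl <- lm.influence(fit)     # a list; infl$hat holds the same leverages

# Collect the diagnostics and list the highest-leverage rows first
diag_tab <- data.frame(leverage = lev, cooks_d = cook)
head(diag_tab[order(diag_tab$leverage, decreasing = TRUE), ])
```

hatvalues gives leverage directly, which is what the question asks for. cooks.distance may be the more useful ranking if the goal is to find points that actually change the fit, because it also accounts for how far the response is from its fitted value.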