I'm new to R and somewhat new to the world of stats. I got frustrated with Excel and found R. Enough of that already.

I'm trying to test and correct for heteroskedasticity.

I have data in a csv file that I load and store in a data frame.

> ds <- read.csv("book2.csv")
> df <- data.frame(ds)

I then perform an OLS regression:

> lmfit <- lm(df$y~df$x)

To test for heteroskedasticity, I run the BP test:

> bptest(lmfit)

        studentized Breusch-Pagan test

data:  lmfit
BP = 11.6768, df = 1, p-value = 0.0006329

From the above, if I'm interpreting this correctly, there is heteroskedasticity present. To correct for this, I need to calculate robust error terms. From my reading on this list, it seems like I need to use vcovHC.

> vcovHC(lmfit)
              (Intercept)          df$x
(Intercept)  1.057460e-03 -4.961118e-05
df$x        -4.961118e-05  2.378465e-06

I'm having a little bit of a hard time following the help pages. So is the first column the intercepts and the second column new standard errors?

Thanks,
mojo
On Jan 20, 2011, at 2:08 PM, Mojo wrote:

> > vcovHC(lmfit)
>               (Intercept)          df$x
> (Intercept)  1.057460e-03 -4.961118e-05
> df$x        -4.961118e-05  2.378465e-06
>
> I'm having a little bit of a hard time following the help pages. So
> is the first column the intercepts and the second column new
> standard errors?

No. It's a variance-covariance matrix: the diagonal elements are the variance estimates of the coefficients and the off-diagonal elements are their covariances. To get what you are expecting, the SEs of the coefficients (which are the square roots of the diagonal elements of a var-covar matrix), you would wrap sqrt(diag(.)) around that object.

> Thanks,
> mojo
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
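For readers following along, here is a minimal sketch of the sqrt(diag(.)) step described above. The data are simulated, since book2.csv is not available; the variable names `dat`, `V`, `robust_se` and `ols_se` are made up for illustration. It assumes the sandwich package is installed.

```r
# Sketch: robust standard errors from a heteroskedasticity-consistent
# covariance matrix. The data below are simulated (an assumption);
# they are not the poster's book2.csv.
library(sandwich)  # provides vcovHC()

set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)  # error spread grows with x
dat <- data.frame(x = x, y = y)

lmfit <- lm(y ~ x, data = dat)

# 2 x 2 matrix: variances on the diagonal, covariances off the diagonal
V <- vcovHC(lmfit)

# Robust standard errors are the square roots of the diagonal
robust_se <- sqrt(diag(V))

# Conventional OLS standard errors, for comparison
ols_se <- sqrt(diag(vcov(lmfit)))

print(cbind(ols_se, robust_se))
```

With a single regressor the result is a vector of two numbers, one for the intercept and one for the slope, which can be set side by side with the standard errors reported by summary(lmfit).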
On Thu, 20 Jan 2011, Mojo wrote:

> I then perform an OLS regression:
>
>> lmfit <- lm(df$y~df$x)

Just btw: lm(y ~ x, data = df) is somewhat easier to read and also easier to write when the formula involves more regressors.

> To test for heteroskedasticity, I run the BP test:
>
>> bptest(lmfit)
>
>         studentized Breusch-Pagan test
>
> data:  lmfit
> BP = 11.6768, df = 1, p-value = 0.0006329
>
> From the above, if I'm interpreting this correctly, there is
> heteroskedasticity present. To correct for this, I need to calculate robust
> error terms.

That is one option. Another one would be using WLS instead of OLS - or maybe FGLS. As the model just has one regressor, this might be possible and result in a more efficient estimate than OLS.

> From my reading on this list, it seems like I need to use vcovHC.

Yes, that's the function for it.

>> vcovHC(lmfit)
>               (Intercept)          df$x
> (Intercept)  1.057460e-03 -4.961118e-05
> df$x        -4.961118e-05  2.378465e-06
>
> I'm having a little bit of a hard time following the help pages.

Yes, the manual page is somewhat technical, but the first thing the "Details" section does is point you to some references that should be easier to read. I recommend starting with

  Zeileis A (2004), Econometric Computing with HC and HAC Covariance
  Matrix Estimators. _Journal of Statistical Software_, *11*(10), 1-17.
  URL http://www.jstatsoft.org/v11/i10/

It also has some worked examples.

> So is the first column the intercepts and the second column new standard
> errors?

As David pointed out, it's the full covariance matrix estimate.
hth,
Z
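To tie the suggestions in this thread together, here is a rough sketch of the whole workflow: the formula interface with a data argument, the Breusch-Pagan test, a coefficient table with robust standard errors, and a WLS fit. The data are simulated and the names `dat`, `fit_ols`, `bp`, `ct` and `fit_wls` are made up for illustration. The WLS weights assume the error variance is proportional to x^2, which happens to be true for this simulation but is only an assumption for real data. It assumes the lmtest and sandwich packages are installed.

```r
# Sketch of the workflow discussed in the thread, on simulated data
# (an assumption; the poster's book2.csv is not available).
library(lmtest)    # bptest(), coeftest()
library(sandwich)  # vcovHC()

set.seed(42)
n <- 200
dat <- data.frame(x = runif(n, 1, 10))
dat$y <- 2 + 0.5 * dat$x + rnorm(n, sd = 0.4 * dat$x)  # sd grows with x

# Formula interface with a data argument instead of df$y ~ df$x
fit_ols <- lm(y ~ x, data = dat)

# Studentized Breusch-Pagan test for heteroskedasticity
bp <- bptest(fit_ols)
print(bp)

# Coefficient table using the robust covariance matrix; the standard
# error column equals sqrt(diag(vcovHC(fit_ols)))
ct <- coeftest(fit_ols, vcov = vcovHC)
print(ct)

# WLS alternative, assuming Var(error) is proportional to x^2
# (illustrative choice of weights, not a general recommendation)
fit_wls <- lm(y ~ x, data = dat, weights = 1 / x^2)
print(summary(fit_wls)$coefficients)
```

Note that the robust approach leaves the OLS coefficient estimates unchanged and only replaces their standard errors, whereas WLS changes the estimates themselves.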