Hi there,

I am really new to statistics in R and statistics itself as well.

My situation: I ran a lot of OLS regressions with different independent variables (using the lm() function). After having done that, I know there is endogeneity due to omitted variables (or perhaps due to any other reasons). And here comes the Hausman test. I know this test is used to identify endogeneity. But what I am not sure about is:

"Can I use the Hausman test in a simple OLS regression or is it only possible in a 2SLS regression model?"
"And if it is possible to use it, how can I do it?"

Info about the data:

    data = lots of data :)

    x1 <- data$x1
    x2 <- data$x2
    x3 <- data$x3
    x4 <- data$x4
    y1 <- data$y1

    reg1 <- summary(lm(y1 ~ x1 + x2 + x3 + x4))

Thanks in advance for any support!

--
View this message in context: http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716.html
Sent from the R help mailing list archive at Nabble.com.

1. These are primarily statistics issues, not R issues. You should post on a statistical help list like stats.stackexchange.com, not here.

2. However, given your acknowledged statistical ignorance, you may be asking for trouble. I suggest you seek help from a local statistical expert to get you started. Then, depending on your statistical background, you may understand enough to drive safely on your own.

Also try at the R command prompt:

    install.packages("fortunes")
    library(fortunes)
    fortune("brain surgery")

Cheers,
Bert

On Sun, Oct 28, 2012 at 1:33 PM, fxen3k <f.sehardt at gmail.com> wrote:

snip

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Hi,

I can think of no reason a Hausman test could not be used for OLS: it is a comparison of vectors of coefficients from different models that are usually assumed to produce similar estimates under certain conditions. Dissimilarity is taken as indicative of a lack of some or all of the conditions required for the two models to yield similar parameters.

I suggest you look at the plm and systemfit packages. They have many functions for OLS, 2SLS, tests of endogeneity, etc. The plm (and maybe systemfit?) package also has a vignette, which is a good thing to read. It has a lot of useful information on the code and examples of comparing different types of models that you may find instructive.

Hope this helps,
Josh

On Sun, Oct 28, 2012 at 1:33 PM, fxen3k <f.sehardt@gmail.com> wrote:

snip

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

Hello.

Well said, Joshua. May I add that in an "OLS" context (which I take as also meaning: no panel structure) what you probably want to do is the so-called Durbin-Wu-Hausman test for endogeneity, as explained e.g. here:

http://kurt.schmidheiny.name/teaching/iv2up.pdf

See Section 11 for the idea, and Section 13 for the R implementation.

Best wishes,
Giovanni

---------- original message --------------------
Date: Sun, 28 Oct 2012 16:03:43 -0700
From: Joshua Wiley <jwiley.psych at gmail.com>
To: fxen3k <f.sehardt at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Hausman test in R

snip

----------- end original message -------------

Given my "acknowledged statistical ignorance", I tried to find a solution in this forum... And this is not primarily a statistical issue; it is an issue about the Hausman test in the R environment.

I cannot imagine that no one in this forum has ever done a Hausman test on OLS regressions.

I read the systemfit package documentation and found only this example, which refers to 2SLS and 3SLS regressions:

    library( "systemfit" )
    data( "Kmenta" )
    eqDemand <- consump ~ price + income
    eqSupply <- consump ~ price + farmPrice + trend
    inst <- ~ income + farmPrice + trend
    system <- list( demand = eqDemand, supply = eqSupply )
    ## perform the estimations
    fit2sls <- systemfit( system, "2SLS", inst = inst, data = Kmenta )
    fit3sls <- systemfit( system, "3SLS", inst = inst, data = Kmenta )
    ## perform the Hausman test
    h <- hausman.systemfit( fit2sls, fit3sls )
    print( h )

On 29 October 2012 16:56, fxen3k <f.sehardt at gmail.com> wrote:

snip

If we are talking about the same test, a Hausman test cannot be applied to OLS regressions on their own. As you have already been told, you must have two estimates of the same set of coefficients to do a Hausman test.

Suppose that you compute both OLS and IV estimates of a particular regression. You will get two estimates of the coefficients in the model. If the disturbances are not correlated with the explanatory variables (no endogeneity), the two sets of coefficients will be similar. If there is endogeneity, the coefficients will be different. The Hausman test is a test of the null that the coefficients are not different. If the null is not rejected, you will probably accept the OLS regression. If the null is rejected, you may consider the IV estimate.

A Hausman test is applicable in many other situations (fixed vs. random effects, etc.). You may have problems with the estimate of the covariance matrix used in the test because, due to numerical problems, the estimates of that matrix are not always positive definite.

Most intermediate-level econometrics textbooks will have a good account of the Hausman test. Greene (2012), Econometric Analysis, 7th edition, Prentice Hall, contains a comprehensive discussion of these matters which you might read. It is not easy, but if you master the basic concepts there, your questions about their implementation in R are likely to be answered on this forum.

Best Regards
John

--
John C Frain
Economics Department
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:frainj at tcd.ie
mailto:frainj at gmail.com
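
A minimal sketch of the OLS versus IV coefficient comparison, in base R on simulated data. Everything in it is an assumption made for illustration: the variable names (x, z, u), the data-generating process, a single endogenous regressor with a single instrument, and the use of the OLS residual variance as the common error-variance estimate. Using one common variance estimate and comparing only the coefficient on x is meant to avoid the covariance-matrix problems that can arise when the full difference matrix is inverted.

```r
# Simulated data (hypothetical): x is endogenous because it shares u with y
set.seed(1)
n <- 2000
z <- rnorm(n)                       # instrument, unrelated to u
u <- rnorm(n)                       # omitted variable
x <- 0.8 * z + u + rnorm(n)
y <- 1 + 2 * x + 3 * u + rnorm(n)   # true coefficient on x is 2

X <- cbind(1, x)                    # regressors (with intercept)
Z <- cbind(1, z)                    # instruments (intercept instruments itself)

# Two estimates of the same coefficients
b_ols <- solve(crossprod(X), crossprod(X, y))      # (X'X)^{-1} X'y
b_iv  <- solve(crossprod(Z, X), crossprod(Z, y))   # (Z'X)^{-1} Z'y

# Common error-variance estimate from OLS (an assumption; consistent under H0)
s2 <- sum((y - X %*% b_ols)^2) / (n - ncol(X))

V_ols <- s2 * solve(crossprod(X))
ZX    <- crossprod(Z, X)
V_iv  <- s2 * solve(ZX) %*% crossprod(Z) %*% solve(t(ZX))

# Hausman statistic for the coefficient on x, chi-squared with 1 df under H0
d <- b_iv[2] - b_ols[2]
H <- d^2 / (V_iv[2, 2] - V_ols[2, 2])
p_value <- pchisq(H, df = 1, lower.tail = FALSE)

c(ols = b_ols[2], iv = b_iv[2], H = H, p = p_value)
```

Because x is endogenous by construction here, the OLS and IV estimates should differ noticeably and the test should reject. With real data the result depends entirely on having a valid and reasonably strong instrument.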

Thanks for your answer, John!

Having read Wooldridge, Verbeek and Hausman himself, I tried to figure out how this whole Hausman test works, and whether endogeneity exists in my particular case.

So I did this:

    Y ~ X + Z + Rest + error term

[# this is the original regression, with Z = instrumental variable for X, X = potentially endogenous variable and Rest = more independent variables]

    Regression 1: X ~ Z + Rest + error term
    Regression 2: Y ~ X + Rest + residuals(Reg1) + error

[# I took the residuals from Regression 1 with Reg1_resid <- cbind(Reg1$resid)]

Finally, if the coefficient for the residuals is statistically significant, there is endogeneity.

Is this approach correct?

P.S.: My p-value is 0.1138...

Thanks for your help
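
The two regressions described above can be sketched in base R as follows. This is only an illustration on simulated data: the data frame dat, the column names (y, x, z, rest, v_hat) and the data-generating process are all hypothetical, and the sketch assumes that z is a valid instrument that does not itself appear in the equation for y.

```r
# Simulated data (hypothetical): x is endogenous through the omitted variable u
set.seed(42)
n <- 1000
rest <- rnorm(n)                          # an exogenous control
z    <- rnorm(n)                          # instrument for x
u    <- rnorm(n)                          # omitted variable
x    <- 0.7 * z + 0.5 * rest + u + rnorm(n)
y    <- 1 + 2 * x - rest + 2 * u + rnorm(n)
dat  <- data.frame(y, x, z, rest)

# Regression 1: potentially endogenous variable on instrument and controls
reg1 <- lm(x ~ z + rest, data = dat)
dat$v_hat <- residuals(reg1)

# Regression 2: original equation augmented with the first-stage residuals
reg2 <- lm(y ~ x + rest + v_hat, data = dat)

# The t-test on v_hat is the regression-based endogeneity test
p_vhat <- coef(summary(reg2))["v_hat", "Pr(>|t|)"]
coef(summary(reg2))["v_hat", ]
```

In this simulation x is endogenous by construction, so the coefficient on v_hat should come out clearly significant. With real data, a p-value above the usual thresholds, such as the 0.1138 reported above, would mean that exogeneity of X is not rejected, although that conclusion is only as good as the instrument.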