Achim Zeileis
2006-Jan-05 15:41 UTC
[R] Wald tests and Huberized variances (was: A comment about R:)
On Wed, 4 Jan 2006, Peter Muhlberger wrote:

One comment in advance: please use a more meaningful subject. I would have missed this mail if a colleague hadn't pointed me to it.

> I'm someone who from time to time comes to R to do applied stats for social
> science research.

[snip]

> I would also prefer not to have to work through a
> couple books on R or S+ to learn how to meet common needs in R. If R were

There are some overviews and pointers available for certain topics, so-called CRAN task views:

  http://CRAN.R-project.org/src/contrib/Views/

Currently, there is not yet a "SocialSciences" view (but John Fox is working on one). However, it might also be interesting for you to look at the "Econometrics" view, which has some remarks about Wald tests.

> Ex. 1) Wald tests of linear hypotheses after max. likelihood or even after
> a regression. "Wald" does not even appear in my standard R package on a
> search.

You might want to look at waldtest() and coeftest() in package lmtest. And you seem to have discovered linear.hypothesis() in package car. All three perform Wald tests, providing different means of specifying the hypothesis/alternative of the tests.

> There's no comment in the lm help or optim help about what function
> to use for hypothesis tests.

Well, the lm() man page does say:

  The functions 'summary' and 'anova' are used to obtain and print a
  summary and analysis of variance table of the results.

As for optim(), it is not that straightforward, because optim() does not know whether it maximizes a proper likelihood or not.

> I know that statisticians prefer likelihood
> ratio tests, but Wald tests are still useful and indeed crucial for
> first-pass analysis. After searching with Google for some time, I found
> several Wald functions in various contributed R packages I did not have
> installed. One confusion was which one would be relevant to my needs.
> This took some time to resolve.

Yes, this is a problem that is at least partly addressed by the CRAN task views.

> I concluded, perhaps on insufficient evidence,
> that package car's Wald test would be most helpful. To use it, however, one
> has to put together a matrix for the hypotheses, which can be arduous for a
> many-term regression or a complex hypothesis. In comparison, in Stata one
> simply states the hypothesis in symbolic terms.

waldtest() does the latter and is linked in the "See Also" section of linear.hypothesis().

> I also don't know for
> certain that this function in car will work or work properly w/ various
> kinds of output, say from lm or from optim.

The man page of linear.hypothesis() does say that there are methods for "lm" and "glm" objects (but not for results from optim).

> Ex. 2) Getting neat output of a regression with Huberized variance matrix.
> I frequently have to run regressions w/ robust variances. In Stata, one
> simply adds the word "robust" to the end of the command or
> "cluster(cluster.variable)" for a cluster-robust error. In R, there are two
> functions, robcov and hccm. I had to run tests to figure out what the
> relationship is between them and between them and Stata (robcov w/o cluster
> gives hccm's hc0; hccm's hc1 is equivalent to Stata's 'robust' w/o cluster;
> etc.).

This is rather clearly documented on the respective man pages. hccm() provides HC covariance matrices without clustering, as does vcovHC() in package sandwich. I plan to extend vcovHC() to also deal with clustered data, but I haven't gotten around to it yet.

> A single sentence in hccm's help saying something to the effect that
> statisticians prefer hc3 for most types of data might save me from having to
> scramble through the statistical literature to try to figure out which of
> these I should be using. A few sentences on what the differences are
> between these methods would be even better.

Yes and no.
I'll add some more comments about the different HC-type covariance matrices, but on the other hand, this is just software, which cannot replace an understanding of the underlying theory.

> Then, there's the problem of
> output. Given that hc1 or hc3 are preferred for non-clustered data, I'd
> need to be able to get regression output of the form summary(lm) out of
> hccm, for any practical use. Getting this, however, would require
> programming my own function.

Or using coeftest() from package lmtest, which is intended particularly for this.

> Huberized t-stats for regressions are
> commonplace needs, an R oriented a little toward more everyday needs would
> not require programming of such needs. Also, I'm not sure yet how well any
> of the existing functions handle missing data.

When fitting a linear model via lm(), you can specify a suitable na.action.

The released versions of lmtest and sandwich can deal with Wald tests and sandwich covariance matrix estimators for linear models. I've got development versions ready which make the functions fully object-oriented and thus applicable to "glm" or "survreg" objects (for censored/tobit regression) as well. I plan to release these soon; contact me if you want to have a devel snapshot.

Best wishes,
Z
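
For readers who want to try the coeftest()/waldtest() route described above, here is a minimal sketch. It assumes that reasonably current versions of lmtest and sandwich are installed. The simulated data, the names dat, x1, x2, fit and fit0, and the choice of HC1 are assumptions made purely for illustration; none of them come from the thread.

```r
library(lmtest)    # coeftest(), waldtest()
library(sandwich)  # vcovHC()

# Simulated data with heteroskedastic errors (illustrative assumption only).
set.seed(1)
n   <- 200
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- 1 + 0.5 * x1 + rnorm(n, sd = exp(0.5 * x1))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

fit  <- lm(y ~ x1 + x2, data = dat)  # unrestricted model
fit0 <- lm(y ~ x1, data = dat)       # restricted model: coefficient on x2 is zero

# summary(lm)-style coefficient table, but with HC1 standard errors.
robust_table <- coeftest(fit, vcov = vcovHC(fit, type = "HC1"))
print(robust_table)

# Wald test of H0: coefficient on x2 is zero, stated by comparing nested
# models rather than by building a hypothesis matrix. The vcov argument
# takes a function that is applied to the larger model.
wald <- waldtest(fit0, fit, vcov = function(m) vcovHC(m, type = "HC1"))
print(wald)

# The same Wald statistic by hand for this single restriction:
# squared estimate divided by its robust variance.
hc1       <- vcovHC(fit, type = "HC1")
W_by_hand <- as.numeric(coef(fit)["x2"]^2 / hc1["x2", "x2"])

# HC1 is HC0 rescaled by n / (n - k), where k is the number of coefficients.
hc0 <- vcovHC(fit, type = "HC0")
k   <- length(coef(fit))
stopifnot(isTRUE(all.equal(hc1, hc0 * n / (n - k))))
```

Swapping type = "HC1" for "HC3" in both calls gives the HC3 variant mentioned in the thread. Note that vcovHC() as used here does not handle clustering.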
Peter Muhlberger
2006-Jan-05 17:43 UTC
[R] Wald tests and Huberized variances (was: A comment about R:)
Hi Achim:

Your reply is tremendously helpful in addressing some of the outstanding questions I had about R. The 'econometrics view' materials look exactly like what I needed. Many thanks!

But, there is a second point here, which is how difficult it was for me, as someone just becoming more familiar w/ R's more basic capabilities (in the past I've focused on features like optim, sem), to find what seem to me like standard & key features I've taken for granted in other packages. I looked high & low in my existing installed packages for the standard version of R, I googled, I looked in the r-help archives, I looked through several manuals / introductions to R I had downloaded. I've asked questions about all of the points I raised in my email on this email list before. I believe I passed through the parent directory for the econometric view material at the website w/o realizing what it contained because I thought of "computational econometrics" as having to do w/ running Monte Carlo models of economic processes.

If R wants to bring in a wider audience, one thing that might help is a denser set of cross-references. For example, perhaps lm's help should mention the econometrics view materials as well as other places to look for tests and procedures people may want to do w/ lm. Another thought is that perhaps the standard R package help should allow people to find non-installed but commonly used contributed packages and perhaps their help page contents. A feature that would be very helpful for me is the capacity to search all the contents of help files, not just keywords that at times seem to miss what I'm trying to find.

Cheers,

Peter
bogdan romocea
2006-Jan-05 21:42 UTC
[R] Wald tests and Huberized variances (was: A comment about R:)
Peter Muhlberger wrote:

> But, there is a second point here, which is how difficult it
> was for me [...] to find what seem to me like standard & key
> features I've taken for granted in other packages.

There is another side to this. Don't consider only how difficult it was to find what you were looking for; also remember to be _glad_ that there are so many packages and features to choose from. IMHO, the benefit of having a lot of packages dwarfs all the efforts needed to locate the right ones.