Ridgeway, Greg
2006-Sep-18 20:04 UTC
[R] Propensity score modeling using machine learning methods. WAS: RE: LARS for generalized linear models
There may be benefits to having a machine learning method that explicitly targets covariate balance. We have experimented with optimizing the weights directly to obtain the best covariate balance, but got some strange solutions for simple cases that made us wary of such methods.

Machine learning methods that yield calibrated probability estimates should do well (e.g. those that optimize the logistic log-likelihood). Methods that only seek a decision boundary (SVM comes to mind) can give great classifiers but offer poor probability estimates, and then the propensity score weights are a mess.

We've had a lot of success in practice using gbm and selecting the number of iterations to optimize balance. You can try the ps() function in the twang package, which wraps up gbm and balance optimization in a single function. It's slow for large datasets but it gets the job done.

Including additional variables in a weighted regression is a great protective step. It can reduce both bias and variance and can produce "doubly robust" estimates of the treatment effect (see Bang & Robins 2005 for an example).

Greg

-----Original Message-----
From: Ravi Varadhan [mailto:rvaradhan at jhmi.edu]
Sent: Monday, September 18, 2006 12:38 PM
To: Ridgeway, Greg; r-help at stat.math.ethz.ch
Subject: Propensity score modeling using machine learning methods. WAS: RE: [R] LARS for generalized linear models

Thanks very much, Greg. I will certainly look at glmpath.

My goal is to develop (nearly) automatic and flexible procedures for estimating causal effects of risk factors in observational epidemiological studies. A major part of this is the development of a propensity score model (when the exposure is binary). I would like to use tools/approaches that can do this semi-automatically so that the resulting model has both low prediction error and good covariate balance.
I have read your paper (McCaffrey, Ridgeway and Morral 2004), which uses a gradient boosting machine (gbm) to build a logistic regression model for propensity score. I was wondering whether there are other tools that can also address this problem, for example, glmpath or MARS?

An important question is whether these "machine learning" methods, mainly focused on a good prediction rule, can also achieve a good covariate balance between the treatment groups, since "balance" is not explicitly built into the cost function. If there is significant imbalance, incorporating such covariates into the regression model for outcomes, and performing a weighted least squares analysis (with estimated propensity score as weights) should be reasonable. Am I right?

I would appreciate comments on these points.

Thanks very much.

Best,
Ravi.

----------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------

-----Original Message-----
From: Ridgeway, Greg [mailto:gregr at rand.org]
Sent: Monday, September 18, 2006 2:17 PM
To: r-help at stat.math.ethz.ch
Cc: Ravi Varadhan
Subject: Re: [R] LARS for generalized linear models

Check out Park & Hastie's glmpath package. They have a really clever analysis and implementation of a generalized least angle regression.

Greg

> On Fri, 2006-09-15 at 18:49 -0400, Ravi Varadhan wrote:
> > Is there an R implementation of least angle regression for binary response
> > modeling? I know that this question has been asked before, and I am also
> > aware of the "lasso2" package, but that only implements an L1 penalty, i.e.
> > the Lasso approach.
> >
> > Madigan and Ridgeway in their discussion of Efron et al (2004) describe a
> > LARS-type algorithm for generalized linear models. Has anyone implemented
> > this in R?
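
The idea of choosing among candidate propensity score models by covariate balance, as in the reply about selecting the number of gbm iterations, can be illustrated with a small sketch. This is not twang's or gbm's code; it is a minimal stand-alone Python example under stated assumptions: the helper names (`ate_weights`, `std_mean_diff`, `best_candidate`) are hypothetical, the weights are the usual inverse probability weights for the average treatment effect, and balance is measured as the largest absolute weighted standardized mean difference, scaled by the unweighted pooled standard deviation. The candidate propensity vectors stand in for fits at different numbers of boosting iterations.

```python
from math import sqrt
from statistics import variance


def ate_weights(treat, ps):
    """Inverse probability weights: 1/p for treated units, 1/(1-p) for controls."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treat, ps)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def std_mean_diff(x, treat, weights):
    """Weighted difference in group means, scaled by the unweighted pooled SD.

    Assumption: this scaling is one common convention, not the only one.
    A covariate that is constant within both groups would give a zero SD
    and is not handled here.
    """
    xt = [v for v, t in zip(x, treat) if t == 1]
    xc = [v for v, t in zip(x, treat) if t == 0]
    wt = [w for w, t in zip(weights, treat) if t == 1]
    wc = [w for w, t in zip(weights, treat) if t == 0]
    pooled_sd = sqrt((variance(xt) + variance(xc)) / 2.0)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd


def max_imbalance(covariates, treat, ps):
    """Largest absolute standardized mean difference across covariates."""
    weights = ate_weights(treat, ps)
    return max(abs(std_mean_diff(x, treat, weights)) for x in covariates)


def best_candidate(covariates, treat, candidates):
    """Return the index of the candidate with the best balance, plus all scores."""
    scores = [max_imbalance(covariates, treat, ps) for ps in candidates]
    return min(range(len(candidates)), key=scores.__getitem__), scores


# Toy data: one binary covariate that strongly predicts treatment.
x = [0, 0, 0, 0, 1, 1, 1, 1]
treat = [1, 0, 0, 0, 1, 1, 1, 0]

# Two hypothetical candidate propensity score vectors, e.g. an underfit
# model (constant 0.5) and one that tracks the covariate.
flat = [0.5] * 8
stratified = [0.25] * 4 + [0.75] * 4

best, scores = best_candidate([x], treat, [flat, stratified])
print("balance scores:", scores)
print("selected candidate:", best)
```

In this toy case the candidate that tracks the covariate removes the imbalance, while the constant propensity leaves the raw difference untouched. A real analysis would score many covariates and many candidate models the same way.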