Hi.
I am using the Matching package for propensity score matching. For each
treated unit, I want to find all control units whose propensity scores lie
within a certain distance from the treated unit. The sample code is as
follows:
> library(Matching)
> x <- rnorm(100000)
> y <- rnorm(100000)
> z <- rbinom(100000,1,0.002)
> logit.reg <- glm(z~x+y,family=binomial(link='logit'))
> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=1e-5)
According to the function definition
(http://sekhon.berkeley.edu/matching/Match.html):
"distance.tolerance: This is a scalar which is used to determine if
distances between two observations are different from zero. Values less than
distance.tolerance are deemed to be equal to zero. This option can be used
to perform a type of optimal matching"
Thus, for each treated unit I expected to get all control units whose
propensity score differs from the treated unit's by less than 1e-5. However,
the absolute differences between the propensity scores of the matched treated
and control units are distributed as follows:
> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 2.959e-07 5.849e-07 5.842e-07 8.741e-07 1.167e-06
The maximum difference is only 1.167e-6 instead of the 1e-5 I expected.
Similarly, when I set higher tolerances I get:
> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=2e-5)
> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 4.133e-07 8.208e-07 8.230e-07 1.232e-06 1.652e-06
> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=3e-5)
> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 5.051e-07 1.006e-06 1.008e-06 1.514e-06 2.022e-06
> match <- Match(Y=NULL,Tr=z,X=logit.reg$fitted,version='fast',ties=TRUE,M=1,distance.tolerance=4e-5)
> summary(abs(logit.reg$fitted[match$index.treated]-logit.reg$fitted[match$index.control]))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
7.453e-13 5.818e-07 1.162e-06 1.166e-06 1.750e-06 2.365e-06
So, although there are control units whose distance from a treated unit lies
between 1.167e-6 and 1e-5, for some reason the function does not select them;
the matched distances are cut off at 1.167e-6 even though the tolerance is set
to 1e-5. The same happens at higher tolerances: the maximum matched distance
stays far below the tolerance. I really hope someone can help me resolve
this.
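
To make the expectation concrete, here is a minimal base-R sketch of the
result I am after. It does not call Match at all, so it says nothing about
how distance.tolerance works internally. The helper name controls_within, the
seed, and the smaller sample size are my own choices for illustration only.

```r
# Base-R sketch (no Matching calls): for each treated unit, list the controls
# whose propensity score lies within `tol` of the treated unit's score.
set.seed(1)   # assumption: arbitrary seed, only for reproducibility
n  <- 20000   # assumption: smaller than the 100000 above, to keep it quick
x  <- rnorm(n)
y  <- rnorm(n)
z  <- rbinom(n, 1, 0.002)
ps <- fitted(glm(z ~ x + y, family = binomial(link = "logit")))

# Hypothetical helper: returns a list with one element per treated unit,
# holding the row indices of all controls within `tol` on the score scale.
controls_within <- function(ps, tr, tol) {
  treated  <- which(tr == 1)
  controls <- which(tr == 0)
  out <- lapply(treated, function(i) {
    controls[abs(ps[controls] - ps[i]) <= tol]
  })
  names(out) <- treated
  out
}

within_tol <- controls_within(ps, z, tol = 1e-5)

# Number of controls found within the tolerance, per treated unit.
summary(lengths(within_tol))
```

Comparing lengths(within_tol) against the number of controls that Match
returns for each treated unit would show directly how many controls inside
the raw 1e-5 band are left out.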
Thanks a lot!