For each element in w, I want to find a good match (subscript number) of an
element in x; x and w can be long. Instead of just finding the closest match, I
want to use weighted multinomial sampling (which I've already figured out
once I have the probabilities), where the probabilities come from the tricube
function of the absolute differences between donor and target values, normalized
to sum to one, with the maximum absolute difference as the scaling factor.
This is similar to the loess weighting function with f=1. Here's code that
works to get the probability matrix to use for sampling:
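z <- abs(outer(w, x, "-"))              # |w[i] - x[j]|: one row per target, one column per donor
s <- apply(z, 1, max)                   # row scale: maximum absolute difference
z <- (1 - sweep(z, 1, s, FUN='/')^3)^3  # tricube weights of the scaled differences
sums <- apply(z, 1, sum)
z <- sweep(z, 1, sums, FUN='/')         # normalize each row to sum to 1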
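In symbols, the code above computes the following for target i and donor j (the symbols d, s, T and p are introduced only for this summary and mirror the variable names):

```latex
\begin{aligned}
d_{ij} &= \lvert w_i - x_j \rvert, \qquad s_i = \max_k d_{ik}, \qquad T(u) = (1 - u^3)^3, \\
p_{ij} &= \frac{T(d_{ij}/s_i)}{\sum_k T(d_{ik}/s_i)}.
\end{aligned}
```

Because d_ij <= s_i, every ratio lies in [0, 1], so each tricube weight lies in [0, 1], and any donor at the maximum distance s_i always gets probability 0 (row 1, column 3 of the example).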
Example:
w <- c(1,2,3,7)
x <- c(0,1.5,3)
z <- abs(outer(w, x, "-"))
z
     [,1] [,2] [,3]
[1,]    1  0.5    2
[2,]    2  0.5    1
[3,]    3  1.5    0
[4,]    7  5.5    4
s <- apply(z, 1, max)
z <- (1 - sweep(z, 1, s, FUN='/')^3)^3
z
          [,1]      [,2]      [,3]
[1,] 0.6699219 0.9538536 0.0000000
[2,] 0.0000000 0.9538536 0.6699219
[3,] 0.0000000 0.6699219 1.0000000
[4,] 0.0000000 0.1365445 0.5381833
sums <- apply(z, 1, sum)
z <- sweep(z, 1, sums, FUN='/')
z # each row represents multinomial probabilities summing to 1
          [,1]      [,2]      [,3]
[1,] 0.4125705 0.5874295 0.0000000
[2,] 0.0000000 0.5874295 0.4125705
[3,] 0.0000000 0.4011696 0.5988304
[4,] 0.0000000 0.2023697 0.7976303
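A possibly faster, fully vectorized sketch of the same computation, followed by one draw per target from the resulting probabilities. It assumes w and x are plain numeric vectors without NAs; the helper names tricube_probs and sample_donors are hypothetical, not from any package. Two changes do most of the work: the row scale uses the fact that the largest |w[i] - x[j]| is always attained at min(x) or max(x), so no apply() pass over the whole matrix is needed, and rowSums() plus R's column-wise recycling replace apply(..., sum) and sweep(). The sampler is one common inverse-CDF approach, not necessarily the one the post already uses.

```r
# Sketch only: hypothetical helpers, not from any package.
# Assumes w and x are numeric vectors without NAs.

# Tricube donor-selection probabilities, one row per target w[i].
tricube_probs <- function(w, x) {
  d <- abs(outer(w, x, "-"))        # length(w) x length(x) absolute differences
  # max_j |w[i] - x[j]| is attained at min(x) or max(x), so the row scale
  # is computed in O(length(w)) instead of scanning the whole matrix.
  s <- pmax(abs(w - min(x)), abs(w - max(x)))
  # A length-nrow vector recycles down the columns, so row i is divided by s[i].
  p <- (1 - (d / s)^3)^3            # tricube weights in [0, 1]
  # Rows whose weights are all 0 (e.g. length(x) == 1) give NaN, as in the
  # original sweep() version.
  p / rowSums(p)                    # normalize each row to sum to 1
}

# One donor subscript per row of p, by inverse-CDF sampling.
sample_donors <- function(p) {
  m <- ncol(p)
  cum <- p
  for (j in seq_len(m)[-1]) cum[, j] <- cum[, j - 1] + p[, j]  # row-wise cumulative sums
  cum <- cum / cum[, m]             # force the last column to exactly 1
  u <- runif(nrow(p))               # runif() never returns exactly 0 or 1
  # Count how many cumulative probabilities u exceeds; donors with
  # probability 0 can never be selected.
  as.integer(1 + rowSums(u > cum))
}

w <- c(1, 2, 3, 7)
x <- c(0, 1.5, 3)
p <- tricube_probs(w, x)
idx <- sample_donors(p)
```

On the example above, p should match the normalized z up to floating-point tolerance. Memory use is still length(w) x length(x), so for very long vectors one might process w in chunks.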
The code is moderately fast. Does anyone know of a significantly faster method
or have any comments on the choice of weighting function for such sampling?
This will be used in the context of predictive mean matching for multiple
imputation. Thanks - Frank
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat