thr3ads.net - R help - [R] Finding pairs with least magnitude difference from mean [Feb 2011]

If this information is useful, please help other people find it:
Share via:

jctoll

2011-Feb-26 01:59 UTC

[R] Finding pairs with least magnitude difference from mean

Hi,

I have what I think is some kind of linear programming question.
Basically, what I want to figure out is if I have a vector of numbers,
> x <- rnorm(10)
> x [1] -0.44305959 -0.26707077  0.07121266  0.44123714 -1.10323616
-0.19712807  0.20679494 -0.98629992  0.97191659 -0.77561593
> mean(x)[1] -0.2081249

Using each number only once, I want to find the set of five pairs
where the magnitude of the differences between the mean(x) and each
pairs sum is least.
> y <- outer(x, x, "+") - (2 * mean(x))
> y             [,1]        [,2]        [,3]        [,4]       [,5]
 [,6]       [,7]       [,8]      [,9]       [,10]
 [1,] -0.46986936 -0.29388054  0.04440289  0.41442737 -1.1300459
-0.22393784  0.1799852 -1.0131097 0.9451068 -0.80242569
 [2,] -0.29388054 -0.11789173  0.22039171  0.59041619 -0.9540571
-0.04794902  0.3559740 -0.8371209 1.1210956 -0.62643688
 [3,]  0.04440289  0.22039171  0.55867514  0.92869962 -0.6157737
0.29033441  0.6942574 -0.4988374 1.4593791 -0.28815345
 [4,]  0.41442737  0.59041619  0.92869962  1.29872410 -0.2457492
0.66035889  1.0642819 -0.1288130 1.8294035  0.08187104
 [5,] -1.13004593 -0.95405711 -0.61577368 -0.24574920 -1.7902225
-0.88411441 -0.4801914 -1.6732863 0.2849302 -1.46260226
 [6,] -0.22393784 -0.04794902  0.29033441  0.66035889 -0.8841144
0.02199368  0.4259167 -0.7671782 1.1910383 -0.55649417
 [7,]  0.17998518  0.35597399  0.69425742  1.06428191 -0.4801914
0.42591670  0.8298397 -0.3632552 1.5949614 -0.15257116
 [8,] -1.01310969 -0.83712087 -0.49883744 -0.12881296 -1.6732863
-0.76717817 -0.3632552 -1.5563500 0.4018665 -1.34566603
 [9,]  0.94510682  1.12109563  1.45937907  1.82940355  0.2849302
1.19103834  1.5949614  0.4018665 2.3600830  0.61255048
[10,] -0.80242569 -0.62643688 -0.28815345  0.08187104 -1.4626023
-0.55649417 -0.1525712 -1.3456660 0.6125505 -1.13498203

With this matrix, if I put together a combination of pairs which uses
each number only once, the sum of the corresponding numbers is 0.

For example, compare the SD between this set of 5 pairs> y[10,1] + y[9,2] + y[8,3] + y[7,4] + y[6,5]
[1] 0> sum(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))[1] 5.551115e-17	# basically 0, I assume this is round-off
error> mean(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))[1] 1.111307e-17	# basically 0, I assume this is round-off
error> sd(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))[1] 1.007960

versus this hand-selected, possibly lowest SD combination of
pairs> sum(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))[1] -1.665335e-16	# basically 0, I assume this is round-off
error> mean(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))[1] -3.330669e-17	# basically 0, I assume this is round-off
error> sd(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))[1] 0.2367030

I believe that if I could test all the various five pair combinations,
the combination with the lowest SD of values from the table would give
me my answer.  I believe I have 3 questions regarding my problem.

1) How can I find all the 5 pair combinations of my 10 numbers so that
I can perform a brute force test of each set of combinations?  I
believe there are 45 different pairs (i.e. choose(10,2)). I found
combinations from the {Combinations} package but I can't figure out
how to get it to provide pairs.

2) Will my brute force strategy of testing the SD of each of these 5
pair combinations actually give me the answer I'm searching for?

3) Is there a better way of doing this?  Probably something to do with
real linear programming, rather than this method I've concocted.

Thanks for any help you can provide regarding my question.

Best regards,

James

Hans W Borchers

2011-Feb-26 11:43 UTC

head link

[R] Finding pairs with least magnitude difference from mean

> I have what I think is some kind of linear programming question.
> Basically, what I want to figure out is if I have a vector of numbers,
> 
> > x <- rnorm(10)
> > x
>  [1] -0.44305959 -0.26707077  0.07121266  0.44123714 -1.10323616
> -0.19712807  0.20679494 -0.98629992  0.97191659 -0.77561593
> 
> > mean(x)
> [1] -0.2081249
> 
> Using each number only once, I want to find the set of five pairs
> where the magnitude of the differences between the mean(x) and each
> pairs sum is least.
> 
> > y <- outer(x, x, "+") - (2 * mean(x))
> 
> With this matrix, if I put together a combination of pairs which uses
> each number only once, the sum of the corresponding numbers is 0.
> 
> For example, compare the SD between this set of 5 pairs
> > sd(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
> [1] 1.007960
> 
> versus this hand-selected, possibly lowest SD combination of pairs
> > sd(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
> [1] 0.2367030
Your selection is not bad, as only about 0.4% of all possible distinct
combinations have a smaller value -- the minimum is 0.1770076, for example
[10 7 9 5 8 4 6 2 3 1].

(1) combinat() from the 'combinations' package seems slow, try instead
the
    permutations() function from 'e1071'. 

(2) Yes, except your vector is getting much larger in which case brute force
    is no longer feasible.

(3) This is not a linear programming, but a combinatorial optimization task.
    You could try optim() with the SANN method, or some mixed-integer linear
    program (e.g., lpSolve, Rglpk, Rsymphony) by intelligently using binary
    variables to define the sets.

This does not mean that some specialized approach might not be more
appropriate.

--Hans Werner
> I believe that if I could test all the various five pair combinations,
> the combination with the lowest SD of values from the table would give
> me my answer.  I believe I have 3 questions regarding my problem.
> 
> 1) How can I find all the 5 pair combinations of my 10 numbers so that
> I can perform a brute force test of each set of combinations?  I
> believe there are 45 different pairs (i.e. choose(10,2)). I found
> combinations from the {Combinations} package but I can't figure out
> how to get it to provide pairs.
> 
> 2) Will my brute force strategy of testing the SD of each of these 5
> pair combinations actually give me the answer I'm searching for?
> 
> 3) Is there a better way of doing this?  Probably something to do with
> real linear programming, rather than this method I've concocted.
> 
> Thanks for any help you can provide regarding my question.
> 
> Best regards,
> 
> James
>

Seemingly Similar Threads

Search for more maybe matching threads

R help - Feb 2011 - Finding pairs with least magnitude difference from mean

[R] Finding pairs with least magnitude difference from mean

[R] Finding pairs with least magnitude difference from mean

Seemingly Similar Threads