On 10/16/06, Bj?rn Egert <begert at ipb-halle.de>
wrote:> On 10/8/06, Egert, Bjoern <begert at ipb-halle.de> wrote:
> >> Hello,
> >>
> >> Is there a way in R to construct an (error correcting) binary
code
> >> e.g. for an source alphabet containing integers from 1 to say
255
> >> with the property that each pair of distinct codewords of length
m
> >> is at Hamming distance exactly m/2 ?
> >>
> >> I was suggested to use so called simplex codes, which should be
> >> fairly standard, but I haven't found a direct way via R
packages
> >> to do so, that's why I ask whether there might be in
indirect way
> >> to solve this problem.
> >>
> >> Example:
> >> v1 =c(1,2,3,4)
> >> v2 =c(1,2,5,6)
> >> similarity(v1,v2)=0.5, (because 2 out of 4 elements are equal).
> >> Obviously, a binary representation of would yield a different
> >> similarity of:
> >> binary(v1) =001 010 011 100
> >> binary(v1) =001 010 101 110
> >> similarity(binary(v1),binary(v2))= 9/12
> >>
> >> Remark: The focus here is not on error correction, but rather
the
> >> binary encoding retaining similarity of the elements of vectors.
> >>
> >> Many thanks,
> >> Bjoern
> >
> > Bjoern,
> >
> > NB: I'm an R newbie and I only know a bit about error correcting
codes.
> >
> > I haven't seen any responses to your questions and I don't
know if you
> > still
> > have a need, but it is certainly possible to construct forward error
> > correction
> > codes with all the great math capability in R.
> >
> > It seems you want to generate code words that still have the original
> > bits
> > present. These are systematic codes and there are lots of them
available
> > to use. Many codes are specified by the code word length (n), number
> > of original data
> > bits in each code word (k), and the minimum Hamming distance of the
> > code words (d)
> > as a [n,k,d] code. Simplex Codes have these parameters: [2^k - 1, k,
> > 2^(k - 1)]. These
> > codes could be generated as a simple matrix multiply in R, but are you
> > sure that's what
> > you want? The code words will be quite long.
> >
> > Regards,
> > Richard Graham
>
>
> Hello,
>
> thank you.
>
> yes, basically, that's what I want.
> Just a binary encoding of an arbitrary integer value (or vector of
> integers)
> with the property that each pair of distinct integer values have an
> equal Hamming-
> distance (m/2), so as to be able to a similarity search
>
> I got the idea from: Gionis: "Efficient and Tunable Similar Set
> Retrieval" (Chap 3.2)
>
> regards
> Bjoern
>
>
Bjoern,
I read only the section of the paper you mention and I'll trust that
the stated properties of Simplex Codes are true. I haven't researched
or verified it.
[from http://magma.maths.usyd.edu.au/]
"Magma is a large, well-supported software package designed to solve
computationally hard problems in algebra, number theory, geometry and
combinatorics. It provides a mathematically rigorous environment for
computing with algebraic, number-theoretic, combinatoric and geometric
objects."
I don't understand a fraction of its capability but I still find it to
be very useful. In fact, they have an online calculator that will
give you the generator matrix you want. The online Magma calculator
is at:
http://magma.maths.usyd.edu.au/calc/
To calculate the generator matrix I think you are asking for, go to
the above URL and cut/paste the following command:
ExtendCode(SimplexCode(8));
Click "Evaluate" and the output window will contain a "[256, 8,
128]
Linear Code over GF(2)". You'll need to "massage" this a bit
to use
it as a matrix for R. I'd use Ruby to do this, but anything will do.
If you want to encode more/less than 8 bits, you can modify the above
argument to SimplexCode.
I used "ExtendCode" so that the codeword length == Dmin * 2
The Gionis claim I'll research or verify sometime is that _every_ pair
of Simplex Code words of length m have Hamming distance == m/2. If
you have a reference to a proof, I'd like to read it (like I said, I
only know a bit about ECC).
Good Luck with your work!
Richard Graham