thr3ads.net - llvm dev - [LLVMdev] How to deal with potentially unlimited count/length symbol names? [Jun 2013]

edA-qa mort-ora-y

2013-Jun-19 08:04 UTC

[LLVMdev] How to deal with potentially unlimited count/length symbol names?

In my language I have anonymous types (essentially tuples), and I have
generated functions (like constructors) which are unique for these
types. If the same type occurs in multiple modules, however, it should end
up with only one definition being linked. Thus I need a way to give them
the same name.

The problem is that if I derive the name from what the type contains, the
length of that name is essentially unbounded. So how does one generate
names?  I'm thinking of just using a long hash and hoping I don't get
accidental collisions. Surely there must be a better way?


Currently, since I'm only dealing with one module, it is very easy to
just assign unique numbers. But obviously this doesn't work with
multiple independent modules since they'd all need the same name. It
will ultimately have to work across libraries as well, so I can't just
create a registry of the type->id.
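A minimal sketch of the "derive the name from what the type contains" approach, in Python for illustration only. The `TupleType` class, the `mangle` and `constructor_symbol` functions, and the `_ctor_` prefix are all hypothetical. Every module computes the same string from the same type, with no registry, but the string grows with the size of the type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TupleType:
    """Hypothetical anonymous tuple type; elements are primitive type
    names (str) or nested TupleType values, in declaration order."""
    elements: tuple


def mangle(t) -> str:
    """Derive a name purely from the type's structure.

    Primitive names are length-prefixed ("i32" -> "3i32") so the
    concatenation stays unambiguous; a tuple is "T" + elements + "E".
    """
    if isinstance(t, str):
        return f"{len(t)}{t}"
    return "T" + "".join(mangle(e) for e in t.elements) + "E"


def constructor_symbol(t) -> str:
    # Hypothetical naming convention for the generated constructor.
    return "_ctor_" + mangle(t)


if __name__ == "__main__":
    pair = TupleType(("i32", "f64"))
    print(constructor_symbol(pair))  # _ctor_T3i323f64E
```

Nesting `pair` inside larger tuples repeats its full encoding each time, which is where the unbounded length comes from.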

-- 
edA-qa mort-ora-y
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Sign: Please digitally sign your emails.
Encrypt: I'm also happy to receive encrypted mail.

Eli Friedman

2013-Jun-19 17:23 UTC

On Wed, Jun 19, 2013 at 1:04 AM, edA-qa mort-ora-y <eda-qa at disemia.com> wrote:
> In my language I have anonymous types (essentially tuples), and I have
> generated functions (like constructors) which are unique for these
> types. If the same type occurs in multiple modules, however, it should end
> up with only one definition being linked. Thus I need a way to give them
> the same name.
>
> The problem is that if I derive the name from what the type contains, the
> length of that name is essentially unbounded. So how does one generate
> names?  I'm thinking of just using a long hash and hoping I don't get
> accidental collisions. Surely there must be a better way?
>
>
> Currently, since I'm only dealing with one module, it is very easy to
> just assign unique numbers. But obviously this doesn't work with
> multiple independent modules since they'd all need the same name. It
> will ultimately have to work across libraries as well, so I can't just
> create a registry of the type->id.
>

I think you've covered all the possible implementations.

In terms of just generating long names, LLVM and common platforms can
handle long names reasonably well because C++ often uses such names.  Also,
the Itanium C++ ABI has a scheme to compress repeated uses of the same type
which might be of interest; see
http://mentorembedded.github.io/cxx-abi/abi.html#mangling-compression .
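The Itanium scheme replaces a repeated component with a short back-reference (`S_`, then `S0_`, `S1_`, ... with base-36 digits). The sketch below is only a simplified analogue, not the Itanium grammar: it treats tuple types as the only substitution candidates, and `TupleType`, `seq_id` and `mangle_compressed` are hypothetical names. It is still deterministic, so separately compiled modules agree on the result.

```python
from dataclasses import dataclass

_BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


@dataclass(frozen=True)
class TupleType:
    """Hypothetical anonymous tuple type (elements: str names or TupleType)."""
    elements: tuple


def seq_id(index: int) -> str:
    """Back-reference for the index-th candidate: S_, S0_, ..., SZ_, S10_, ..."""
    if index == 0:
        return "S_"
    n, digits = index - 1, ""
    while True:
        digits = _BASE36[n % 36] + digits
        n //= 36
        if n == 0:
            return f"S{digits}_"


def mangle_compressed(t) -> str:
    """Structural name in which a repeated tuple type becomes a back-reference."""
    seen = {}  # tuple type -> candidate index, assigned once fully encoded

    def encode(node) -> str:
        if isinstance(node, str):
            return f"{len(node)}{node}"  # length-prefixed primitive name
        if node in seen:
            return seq_id(seen[node])
        text = "T" + "".join(encode(e) for e in node.elements) + "E"
        seen[node] = len(seen)
        return text

    return encode(t)


if __name__ == "__main__":
    pair = TupleType(("i32", "f64"))
    print(mangle_compressed(TupleType((pair, pair, pair))))  # TT3i323f64ES_S_E
```

For that example the compressed name is 16 characters, against 32 for the plain structural encoding; the saving grows with how often large components repeat.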

In terms of a registry, you might want to consider whether these helpers
actually need to be exposed across libraries.

-Eli

edA-qa mort-ora-y

2013-Jun-19 17:35 UTC

On 19/06/13 19:23, Eli Friedman wrote:
> I think you've covered all the possible implementations.
> 
> In terms of just generating long names, LLVM and common platforms can
> handle long names reasonably well because C++ often uses such names.
>  Also, the Itanium C++ ABI has a scheme to compress repeated uses of the
> same type which might be of interest; see
> http://mentorembedded.github.io/cxx-abi/abi.html#mangling-compression .
> 
> In terms of a registry, you might want to consider whether these helpers
> actually need to be exposed across libraries.

Annoyingly, the larger the type, the more important it is to share -- for
small types everything will just be inlined, so it doesn't matter.

Any idea what the limit on a name's length is? I'll try a compression-like
system as well, but I will likely have to truncate at some point (where
I can add a hash).
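One way to combine truncation with a hash, sketched under assumptions: keep the (possibly compressed) name when it fits, otherwise keep a readable prefix and append a hash of the *whole* name. `MAX_SYMBOL_LEN = 200` is an arbitrary illustrative cap chosen by the front end, not a documented platform limit; the `_H` marker and `bounded_symbol` are hypothetical, and BLAKE2b with a 16-byte (128-bit) digest is just one possible hash.

```python
import hashlib

MAX_SYMBOL_LEN = 200  # hypothetical front-end cap, not a platform limit
_DIGEST_HEX_LEN = 32  # 16-byte BLAKE2b digest written as hex
_MARKER = "_H"        # hypothetical marker between prefix and hash


def bounded_symbol(full_name: str, max_len: int = MAX_SYMBOL_LEN) -> str:
    """Return full_name unchanged if it fits; otherwise truncate it and
    append a hash of the entire name so long names sharing a prefix differ."""
    if max_len <= len(_MARKER) + _DIGEST_HEX_LEN:
        raise ValueError("max_len too small to hold the hash suffix")
    if len(full_name) <= max_len:
        return full_name
    digest = hashlib.blake2b(full_name.encode("utf-8"), digest_size=16).hexdigest()
    keep = max_len - len(_MARKER) - _DIGEST_HEX_LEN
    return full_name[:keep] + _MARKER + digest
```

Because the hash covers the full name rather than only the discarded tail, two long names with an identical prefix still map to different symbols, barring a hash collision.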

-- 
edA-qa mort-ora-y

Sean Silva

2013-Jun-19 18:45 UTC

On Wed, Jun 19, 2013 at 1:04 AM, edA-qa mort-ora-y <eda-qa at disemia.com> wrote:
>
> The problem is that if I derive the name from what the type contains, the
> length of that name is essentially unbounded. So how does one generate
> names?  I'm thinking of just using a long hash and hoping I don't get
> accidental collisions. Surely there must be a better way?
>

Just use a cryptographic hash (e.g. SHA1) to avoid the need to "hope" that
there are no collisions.
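With Python's standard `hashlib`, that suggestion looks roughly like the sketch below; the `_anon_` prefix and the `sha1_symbol` name are hypothetical. The symbol has a fixed length (prefix plus 40 hex digits) no matter how large the type is; the odds of an accidental collision depend on the 160-bit output size.

```python
import hashlib


def sha1_symbol(mangled: str, prefix: str = "_anon_") -> str:
    """Fixed-length symbol derived from the full structural name.

    SHA-1 yields a 160-bit digest, i.e. 40 hex characters, however long
    `mangled` is. `prefix` is a hypothetical naming convention.
    """
    return prefix + hashlib.sha1(mangled.encode("utf-8")).hexdigest()
```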

-- Sean Silva

Robinson, Paul

2013-Jun-19 22:39 UTC

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Sean Silva
> Sent: Wednesday, June 19, 2013 11:45 AM
> To: edA-qa mort-ora-y
> Cc: <llvmdev at cs.uiuc.edu>
> Subject: Re: [LLVMdev] How to deal with potentially unlimited count/length symbol names?
>
> On Wed, Jun 19, 2013 at 1:04 AM, edA-qa mort-ora-y <eda-qa at disemia.com> wrote:
>
> > The problem is that if I derive the name from what the type contains, the
> > length of that name is essentially unbounded. So how does one generate
> > names?  I'm thinking of just using a long hash and hoping I don't get
> > accidental collisions. Surely there must be a better way?
>
> Just use a cryptographic hash (e.g. SHA1) to avoid the need to "hope" that there are no collisions.
>
> -- Sean Silva

Cryptographic hashes don't guarantee you get no accidental collisions;
their goal is to make it super hard to produce a collision _on purpose_.
What you need is an algorithm designed for string inputs, with good
uniformity, and an adequate output size; there are many.
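As one well-known example of a non-cryptographic string hash, here is 64-bit FNV-1a (offset basis 0xcbf29ce484222325, prime 0x100000001b3). It is shown only to illustrate the category; whether a 64-bit output is adequate depends on how many distinct types you expect to name, and wider FNV variants exist. `fnv_symbol` and its `_anon_` prefix are hypothetical.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the state to 64 bits
    return h


def fnv_symbol(mangled: str, prefix: str = "_anon_") -> str:
    # Hypothetical naming helper: the 64-bit hash as 16 hex digits.
    return f"{prefix}{fnv1a_64(mangled.encode('utf-8')):016x}"
```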

Accidental collisions are essentially the Birthday Problem:
http://en.wikipedia.org/wiki/Birthday_problem
See particularly the end of the "Square approximation" section, which
specifically discusses the application to hashes, relating probability
of collision to bit-width and number of inputs.
For example, I worked out that with a 128-bit hash you need around
2^50 inputs before you get a collision probability of 1-in-a-billion.
(For comparison, a 64-bit hash gets you about 190,000 inputs for the
same collision probability.)
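The square approximation can be turned into a small calculator. This sketch assumes a uniformly distributed hash and uses n ~= sqrt(2 * 2^b * ln(1/(1 - p))) for the number of inputs n at which a b-bit hash reaches collision probability p; `inputs_for_collision_probability` is a hypothetical helper name.

```python
import math


def inputs_for_collision_probability(bits: int, p: float) -> float:
    """Approximate number of distinct inputs at which a uniformly
    distributed `bits`-bit hash has collision probability `p`.

    Uses n ~= sqrt(2 * 2**bits * ln(1 / (1 - p))); log1p keeps the
    result accurate for tiny p such as 1e-9.
    """
    return math.sqrt(-2.0 * (2.0 ** bits) * math.log1p(-p))


if __name__ == "__main__":
    for bits in (64, 128):
        n = inputs_for_collision_probability(bits, 1e-9)
        print(f"{bits}-bit hash: ~{n:.3g} inputs (~2^{math.log2(n):.1f})")
```

At p = 1e-9 this gives roughly 1.9 x 10^5 inputs for 64 bits and roughly 8.2 x 10^14 (a little under 2^50) for 128 bits.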

If you want a "name-brand" algorithm, I'd suggest MD5 over SHA-1.
It produces 128-bit output (versus 160-bit) and the fact that it is
"cryptographically broken" is irrelevant to your use-case.

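Along the same lines, an MD5-based name, sketched with the standard `hashlib` and `base64` modules. As an assumption about what a front end might want, the 16-byte digest is written in lowercase base 32 without padding: 26 characters instead of 32 hex digits, using only letters and the digits 2-7, so it is a valid identifier fragment. `md5_symbol` and the `_anon_` prefix are hypothetical.

```python
import base64
import hashlib


def md5_symbol(mangled: str, prefix: str = "_anon_") -> str:
    """128-bit MD5 of the structural name, as 26 lowercase base-32 characters.

    MD5 serves purely as a well-distributed 128-bit hash here; its
    cryptographic weaknesses do not matter for naming. `prefix` is a
    hypothetical naming convention.
    """
    digest = hashlib.md5(mangled.encode("utf-8")).digest()  # 16 bytes
    # b32encode pads 16 bytes out to 32 chars with six '='; strip them.
    return prefix + base64.b32encode(digest).decode("ascii").rstrip("=").lower()
```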
--paulr
