edA-qa mort-ora-y
2013-Jun-19 08:04 UTC
[LLVMdev] How to deal with potentially unlimited count/length symbol names?
In my language I have anonymous types (essentially tuples), and I have generated functions (like constructors) which are unique for these types. If the same type occurs in multiple modules however it should end up with only one definition being linked. Thus I need a way to give them the same name. The problem is that if I derive the name from what the type contains the length of that name is essential unbound. So how does one generate names? I'm thinking of just using a long hash and hoping I don't get accidental collisions. Surely there must be a better way? Currently, since I'm only dealing with one module, it is very easy to just assign unique numbers. But obviously this doesn't work with multiple independent modules since they'd all need the same name. It will ultimately have to work across libraries as well, so I can't just create a registry of the type->id. -- edA-qa mort-ora-y -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Sign: Please digitally sign your emails. Encrypt: I'm also happy to receive encrypted mail. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/251cf3e0/attachment.sig>
Eli Friedman
2013-Jun-19 17:23 UTC
[LLVMdev] How to deal with potentially unlimited count/length symbol names?
On Wed, Jun 19, 2013 at 1:04 AM, edA-qa mort-ora-y <eda-qa at disemia.com>wrote:> In my language I have anonymous types (essentially tuples), and I have > generated functions (like constructors) which are unique for these > types. If the same type occurs in multiple modules however it should end > up with only one definition being linked. Thus I need a way to give them > the same name. > > The problem is that if I derive the name from what the type contains the > length of that name is essential unbound. So how does one generate > names? I'm thinking of just using a long hash and hoping I don't get > accidental collisions. Surely there must be a better way? > > > Currently, since I'm only dealing with one module, it is very easy to > just assign unique numbers. But obviously this doesn't work with > multiple independent modules since they'd all need the same name. It > will ultimately have to work across libraries as well, so I can't just > create a registry of the type->id. > >I think you've covered all the possible implementations. In terms of just generating long names, LLVM and common platforms can handle long names reasonably well because C++ often uses such names. Also, the Itanium C++ ABI has a scheme to compress repeated uses of the same type which might be of interest; see http://mentorembedded.github.io/cxx-abi/abi.html#mangling-compression . In terms of a registry, you might want to consider whether these helpers actually need to be exposed across libraries. -Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/19f1f06a/attachment.html>
edA-qa mort-ora-y
2013-Jun-19 17:35 UTC
[LLVMdev] How to deal with potentially unlimited count/length symbol names?
On 19/06/13 19:23, Eli Friedman wrote:> I think you've covered all the possible implementations. > > In terms of just generating long names, LLVM and common platforms can > handle long names reasonably well because C++ often uses such names. > Also, the Itanium C++ ABI has a scheme to compress repeated uses of the > same type which might be of interest; see > http://mentorembedded.github.io/cxx-abi/abi.html#mangling-compression . > > In terms of a registry, you might want to consider whether these helpers > actually need to be exposed across libraries.Annoyingly, the larger the type the more important it is to share -- for small types everything will just be inlined so it doesn't matter. Any idea on what the limit of a name can be? I'll try a compression like system as well, but I will likely have to truncate at some point (where I can add a hash). -- edA-qa mort-ora-y -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Sign: Please digitally sign your emails. Encrypt: I'm also happy to receive encrypted mail. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 261 bytes Desc: OpenPGP digital signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/b7222e56/attachment.sig>
Sean Silva
2013-Jun-19 18:45 UTC
[LLVMdev] How to deal with potentially unlimited count/length symbol names?
On Wed, Jun 19, 2013 at 1:04 AM, edA-qa mort-ora-y <eda-qa at disemia.com>wrote:> > The problem is that if I derive the name from what the type contains the > length of that name is essential unbound. So how does one generate > names? I'm thinking of just using a long hash and hoping I don't get > accidental collisions. Surely there must be a better way? >Just a cryptographic hash (e.g. SHA1) to avoid the need to "hope" that there are no collisions. -- Sean Silva -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/ecee5cf2/attachment.html>
Robinson, Paul
2013-Jun-19 22:39 UTC
[LLVMdev] How to deal with potentially unlimited count/length symbol names?
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Sean Silva > Sent: Wednesday, June 19, 2013 11:45 AM > To: edA-qa mort-ora-y > Cc: <llvmdev at cs.uiuc.edu> > Subject: Re: [LLVMdev] How to deal with potentially unlimited count/length symbol names? > > On Wed, Jun 19, 2013 at 1:04 AM, edA-qa mort-ora-y <eda-qa at disemia.com> wrote: > > > The problem is that if I derive the name from what the type contains the > > length of that name is essential unbound. So how does one generate > > names? I'm thinking of just using a long hash and hoping I don't get > > accidental collisions. Surely there must be a better way? > > Just a cryptographic hash (e.g. SHA1) to avoid the need to "hope" that there are no collisions. > > -- Sean SilvaCryptographic hashes don't guarantee you get no accidental collisions; their goal is to make it super hard to produce a collision _on purpose_. What you need is an algorithm designed for string inputs, with good uniformity, and an adequate output size; there are many. Accidental collisions are essentially the Birthday Problem: http://en.wikipedia.org/wiki/Birthday_problem See particularly the end of the "Square approximation" section, which specifically discusses the application to hashes, relating probability of collision to bit-width and number of inputs. For example, I worked out that with a 128-bit hash you need around 2^50 inputs before you get a collision probability of 1-in-a-billion. (For comparison, a 64-bit hash gets you about 185,000 inputs for the same collision probability.) If you want a "name-brand" algorithm, I'd suggest MD5 over SHA-1. It produces 128-bit output (versus 160-bit) and the fact that it is "cryptographically broken" is irrelevant to your use-case. --paulr
Possibly Parallel Threads
- [LLVMdev] How to deal with potentially unlimited count/length symbol names?
- [LLVMdev] How to deal with potentially unlimited count/length symbol names?
- [LLVMdev] How to deal with potentially unlimited count/length symbol names?
- [LLVMdev] How to deal with potentially unlimited count/length symbol names?
- [LLVMdev] How to deal with potentially unlimited count/length symbol names?