Björn Pettersson A via llvm-dev
2017-Jul-13 17:25 UTC
[llvm-dev] RFC: Harvard architectures and default address spaces
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: 13 July 2017 16:01
> To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com>; David Chisnall <David.Chisnall at cl.cam.ac.uk>; Dylan McKay <me at dylanmckay.io>
> Cc: llvm-dev at lists.llvm.org; Carl Peto <carl.peto at me.com>
> Subject: Re: [llvm-dev] RFC: Harvard architectures and default address spaces
>
> On 07/13/2017 05:38 AM, Björn Pettersson A via llvm-dev wrote:
> > My experience of having the address space for functions (or function pointers) in the DataLayout is that when the .ll file is parsed we need to parse the DataLayout before any function declarations. That is needed because we want to attribute the functions with the correct address space (according to DataLayout) when inserting them in the symbol table. An alternative would be to update address space info for functions after having parsed the DataLayout.
> >
> > Is the DataLayout normally used when parsing the .ll file (or .bc)? Or would this be the first case of doing that?
> >
> > Is it guaranteed that DataLayout is specified/parsed before function declaration, or that the DataLayout specification is context sensitive and only is valid for the following declarations?
>
> The DataLayout is a required part of the .ll/.bc file. In the .ll file (*), it's the part of the module that looks like this:
>
>   target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>
> it is global to the entire module and always available.

My point was that the DataLayout isn't available inside the LLParser and the BitcodeReader until the point at which it has been parsed. So I would not say "always available".

Both LLParser and BitcodeReader are, for example, using getAddressSpace(), both directly and maybe also indirectly through different interfaces (perhaps not on function pointers though). My concern is that the incorrect address space might be used while parsing, and then it might be hard to find all the places to fix up at a later stage, after having parsed the DataLayout and found out what the default address space really is.

Alternatively, if there is some undocumented(?) rule that the DataLayout always comes before function declarations in the ll/bc file, then all functions can get the default address space attribute directly (as indicated by the DataLayout) when being parsed.

I think the whole idea from Dylan was to do this as a fixup after LLParser/BitcodeReader, i.e. not trying to look up a function's address space already when parsing the function declaration. So then the rule would be: do not use Pointer<Function>::getAddressSpace(), or PointerType::get() etc. during ll/bc parsing, because it might give the wrong result. Maybe it is possible to assert on that?

We could of course give functions some kind of undefined address space value when parsing ll/bc and adding functions to the symbol table. That might help us catch situations where someone tries to fetch the address space for a function pointer before the set-default-address-space-as-given-by-datalayout-on-all-functions pass has executed.

> (*) It is true that you can write tests without specifying one of these, but in such cases, you just get the builtin default. For all real cases you'll need to have a target-appropriate DataLayout string.
>
> > What if there are several address spaces for functions? Or is that a silly thing that no one ever will use? Having the address space specified in the DataLayout would be insufficient, since we would need to attribute the functions separately, right?
> >
> > I do not say that having the info in the DataLayout is a totally bad idea (since our out-of-tree target is using that trick), but I think it might impose some problems as well. And perhaps it isn't the most general solution.
>
> If different functions might be in different address spaces, you'll need some other mechanism to set the address space (as a single default won't suffice). You might use source-level function attributes, for example.
>
> -Hal
>
> > /Björn
> >
> >> -----Original Message-----
> >> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of David Chisnall via llvm-dev
> >> Sent: 12 July 2017 17:26
> >> To: Dylan McKay <me at dylanmckay.io>
> >> Cc: llvm-dev <llvm-dev at lists.llvm.org>; Carl Peto <carl.peto at me.com>
> >> Subject: Re: [llvm-dev] RFC: Harvard architectures and default address spaces
> >>
> >> On 11 Jul 2017, at 23:18, Dylan McKay via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>> Add this information to DataLayout and to use that information in relevant places.
> >>> This sounds like a much better/cleaner idea, thanks!
> >>
> >> I’d suggest taking a look at the alloca address space changes, which were recently added based on a cleaned-up version of our code. We have a similar issue (function and data pointers have the same representation for us, but casting requires different handling[1]) and have considered adding address spaces to functions.
> >>
> >> David
> >>
> >> [1] Probably not relevant for this discussion, but if anyone cares: in our world we have 128-bit fat pointers that contain base, bounds and permissions, and 64-bit pointers that are implicitly relative to one of two special fat pointer registers, one for code and one for data. We must therefore handle 64-bit to 128-bit pointer casts differently depending on whether we’re casting code or data pointers. We currently do this with some fairly ugly hacks, but being able to put all functions in a different AS would make this much easier for us.
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
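The "undefined address space value" idea from this message can be sketched as follows. This is a minimal illustration only, assuming a toy symbol table: ToyFunction, ToyModule, kUnresolvedAddrSpace and applyDefaultFunctionAddressSpace are hypothetical names invented here and are not LLVM classes or APIs. Functions start with a sentinel value, any query before the fixup fails loudly, and the fixup assigns the default once it is known.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sentinel meaning "address space not decided yet".
const unsigned kUnresolvedAddrSpace = ~0u;

// Toy stand-in for a function in the symbol table; not an LLVM class.
class ToyFunction {
public:
  explicit ToyFunction(std::string name,
                       unsigned addrSpace = kUnresolvedAddrSpace)
      : name_(std::move(name)), addrSpace_(addrSpace) {}

  const std::string &name() const { return name_; }

  bool hasResolvedAddressSpace() const {
    return addrSpace_ != kUnresolvedAddrSpace;
  }

  // Throws if queried before the fixup has run, which is exactly the
  // premature use the sentinel is meant to expose.
  unsigned addressSpace() const {
    if (!hasResolvedAddressSpace())
      throw std::logic_error("address space of '" + name_ +
                             "' queried before fixup");
    return addrSpace_;
  }

  void setAddressSpace(unsigned addrSpace) { addrSpace_ = addrSpace; }

private:
  std::string name_;
  unsigned addrSpace_;
};

// Toy module: functions are added while parsing; the default arrives later.
struct ToyModule {
  std::vector<ToyFunction> functions;
};

// The fixup step: every function still carrying the sentinel gets the
// default; functions with an explicit address space are left alone.
inline void applyDefaultFunctionAddressSpace(ToyModule &m,
                                             unsigned defaultAddrSpace) {
  for (ToyFunction &f : m.functions)
    if (!f.hasResolvedAddressSpace())
      f.setAddressSpace(defaultAddrSpace);
}
```

In this sketch a query between "parse" and "fixup" raises an exception rather than silently returning 0, so a premature use shows up at the call site instead of as a wrong address space later on.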
Hal Finkel via llvm-dev
2017-Jul-13 17:33 UTC
[llvm-dev] RFC: Harvard architectures and default address spaces
On 07/13/2017 12:25 PM, Björn Pettersson A wrote:
> My point was that the DataLayout isn't available inside the LLParser and the BitcodeReader, up until the point when it has been parsed. So I would not say "always available".
>
> Both LLParser and BitcodeReader is for example using getAddressSpace(), both directly and maybe also indirectly through different interfaces (perhaps not on function pointers though). My concern is that maybe the incorrect address space will be used while parsing, and then it might be hard to find all places to fixup at a later stage. when having parsed the DataLayout and finding out what the default address space really is.
> Or if there is some undocumented(?) rule that the DataLayout always comes before function declarations in the ll/bc file, then all functions can get the default address space attribute directly (as indicated by the DataLayout) when being parsed.
>
> I think the whole idea from Dylan was to do this as a fixup after LLParser/BitcodeReader. I.e not trying to lookup a functions address space already when parsing the function declaration. So then the rule would be - do not use Pointer<Function>::getAddressSpace(), or PointerType::get() etc. during ll/bc parsing because it might give the wrong result. Maybe it is possible to assert on that?
>
> We could of course give functions some kind of undefined address space value when parsing ll/bc and adding functions to the symbol table. That might help us catch situations when someone tries to fetch the address space for a function pointer before the set-default-address-space-as-given-by-datalayout-on-all-functions pass has executed.

I understand your point. We need to extend the syntax to enable tagging functions with address spaces directly (i.e. as a function attribute). DataLayout can't, as you noted, and shouldn't, be used to adjust defaults during parsing. The point of the defaults in DataLayout is to provide some way to supply correct defaults should optimizations need them.

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
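The division of labour described in this reply (explicit tags on functions, with the DataLayout only supplying a default to code that creates new functions) might look roughly like the sketch below. Everything here is an assumption made for illustration: ToyDataLayout, its defaultFunctionAddrSpace field, ToyModule and createFunction are hypothetical names, not LLVM's DataLayout or Module API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the piece of DataLayout under discussion.
struct ToyDataLayout {
  unsigned defaultFunctionAddrSpace = 0;
};

// Every function carries its own address space once it exists.
struct ToyFunction {
  std::string name;
  unsigned addrSpace;
};

class ToyModule {
public:
  explicit ToyModule(ToyDataLayout dl) : dl_(dl) {}

  // For a frontend or transform with no opinion: take the module default.
  ToyFunction createFunction(std::string name) {
    return createFunction(std::move(name), dl_.defaultFunctionAddrSpace);
  }

  // For callers that know the address space, e.g. a parser that just read
  // an explicit tag. The DataLayout is not consulted on this path.
  ToyFunction createFunction(std::string name, unsigned addrSpace) {
    functions_.push_back(ToyFunction{std::move(name), addrSpace});
    return functions_.back();
  }

  const std::vector<ToyFunction> &functions() const { return functions_; }

private:
  ToyDataLayout dl_;
  std::vector<ToyFunction> functions_;
};
```

The point of the two overloads is that only the first one depends on the DataLayout, so a parser that always uses the second never needs the DataLayout to be available while it reads function declarations.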
Dylan McKay via llvm-dev
2017-Jul-14 05:43 UTC
[llvm-dev] RFC: Harvard architectures and default address spaces
> My point was that the DataLayout isn't available inside the LLParser and the BitcodeReader, up until the point when it has been parsed. So I would not say "always available".

I have noticed this. Normally, the frontend compilation adds functions, and then emits some form of code. There are no issues with normal .o or .s compilation, but if we emit IR, we lose the address space information because it is not actually a part of the textual IR form. The standard LLParser process has no data layout available until the end of the process (because we create an empty temporary module for parsing). Because of both of these problems, the best way to implement this is to modify the textual IR form to support address space attributes on function declarations.

> I think the whole idea from Dylan was to do this as a fixup after LLParser/BitcodeReader

Originally, that was the only workable solution I could come up with, but I think the solution Hal suggested is better. As I understand it:

* New functions created by the frontend will default to the address space specified in the DataLayout
* When these functions get lowered to LL, we emit the address space on the function like 'addrspace 1' if it is not the default address space 0
* When parsing, all functions will either have no addrspace attribute and default to '0', or take the addrspace specified

This will let frontends specify explicit address spaces manually (which would support Mikael's and others' use cases), and also keep track of the address space throughout the pipeline so that we don't make any incorrect assumptions about the space.
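The print and parse rules in the list above can be sketched as a round trip. This is only an illustration under stated assumptions: the one-line "declare @name addrspace N" form is a toy syntax made up for this sketch (the message only says "like 'addrspace 1'"), and ToyFunctionDecl, printDecl and parseDecl are hypothetical names rather than LLVM's printer or LLParser.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy function declaration; not an LLVM class.
struct ToyFunctionDecl {
  std::string name;
  unsigned addrSpace;
};

// Printing rule: emit the address space only when it is not 0.
inline std::string printDecl(const ToyFunctionDecl &f) {
  std::ostringstream os;
  os << "declare @" << f.name;
  if (f.addrSpace != 0)
    os << " addrspace " << f.addrSpace;
  return os.str();
}

// Parsing rule: a missing annotation means address space 0, otherwise take
// the number given. Returns false on input this toy grammar rejects.
// Note that no data layout is needed at any point.
inline bool parseDecl(const std::string &line, ToyFunctionDecl &out) {
  std::istringstream is(line);
  std::string keyword, name;
  if (!(is >> keyword >> name) || keyword != "declare" ||
      name.size() < 2 || name[0] != '@')
    return false;

  ToyFunctionDecl result;
  result.name = name.substr(1);
  result.addrSpace = 0; // default when no annotation is present

  std::string tag;
  if (is >> tag) {
    if (tag != "addrspace" || !(is >> result.addrSpace))
      return false;
  }
  out = result;
  return true;
}
```

Because the address space travels with each declaration, the parsed result is the same whether or not the module's data layout has been read yet, which is the property the scheme is after.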