Hi Peter

Thank you for your helpful comments, especially on the RPi. Since my
use case is a lot simpler than compiling all of Clang, I can hopefully
take your experience as a good sign.

The RTOS that TI provides for the AM335x actually has a pretty complete
POSIX layer and other standard libraries. However, I am working without
any virtual memory subsystem, so there is no mmap. I was under the
impression that LLVM (ORC specifically) should be able to relocate code
to any memory location, so the lack of mmap shouldn't be a problem?

Kind regards
Brian

Orthogonal Devices
Tokyo, Japan
www.orthogonaldevices.com

On 27/06/2019 18:41, Peter Smith wrote:
> On Thu, 27 Jun 2019 at 09:50, Brian Clarkson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hello!
>>
>> Q1
>> Are there any resources or examples on embedding LLVM into an ARM-based
>> bare-metal application? Searching in this area only turns up
>> information on how to use LLVM to target bare-metal when I want to
>> compile LLVM for linking against a bare-metal application.
>>
> I'm not aware of any examples unfortunately. I suspect that this could
> be quite challenging depending on how rich an environment your RTOS
> offers. It is possible that LLVM depends on POSIX or POSIX-like OS
> calls for things like mmap and other file abstractions. I've not
> looked at this in any detail; it may be possible to strip these out
> with the right configuration options, for example thread support can
> be disabled. One possible approach would be to build LLVM for a Linux
> target and look at the dependencies. That might give you an idea of
> what you are up against.
>
>> Q2
>> Are there any memory usage benchmarks for LLVM across the common tasks
>> (especially loading bytecode, doing the optimization passes and finally
>> emitting machine code)? My target (embedded) system has only 1GB of RAM.
>>
> I don't have anything specific unfortunately. It is, or at least was
> possible a couple of years ago, for Clang to compile Clang on a 1GB
> Raspberry Pi. I'm assuming the plugins will be smaller than the IR
> generated by the largest Clang C++ file, but my Raspberry Pi wasn't
> doing anything else but compiling Clang.
>
>> Background:
>> I'm about to embark on an effort to integrate LLVM into my bare-metal
>> application (for AM335x, Cortex-A8, also known as BeagleBone Black).
>> The application area is sound synthesis and the reason for embedding
>> LLVM is to allow users to develop their own "plugins" on the desktop
>> (using a live coding approach) and then load them (as LLVM bytecode) on
>> the embedded device. LLVM would be responsible for generating
>> (optimized, and especially vectorized for NEON) machine code directly on
>> the embedded device and it would take care of the relocation and
>> run-time linking duties. This last task is very important because the
>> RTOS (Texas Instruments' SYS/BIOS) that I'm using does not have any
>> dynamic linking facilities. Sharing code in the form of LLVM bytecode
>> also seems to sidestep the complex task of setting up a cross-compiling
>> toolchain which is something that I would prefer not to have to force my
>> users to do. In fact, my goal is to have a live coding environment
>> provided as a desktop application (which might also embed Clang as well
>> as LLVM) that allows the user to rapidly and playfully build their sound
>> synthesis idea (in simple C/C++ at first, Faust later maybe) and then
>> save the algorithm as bytecode to be copied over to the AM335x-based
>> device.
>>
>> Thank you in advance for any help or pointers to resources that you can
>> provide!
>>
>> Kind regards
>> Brian
>>
>> --
>> Orthogonal Devices
>> Tokyo, Japan
>> www.orthogonaldevices.com
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
On Thu, 27 Jun 2019 at 13:02, Brian Clarkson <clarkson at orthogonaldevices.com> wrote:
> The RTOS that TI provides for the AM335x actually has pretty complete
> posix layer and other standard libraries. However, I am working without
> any virtual memory subsystem, so no mmap. However, I was under the
> impression that LLVM (ORC specifically) should be able to relocate code
> at any memory location so the lack of mmap shouldn't be a problem?

Apologies, I don't know a lot about ORC; most of my knowledge is on the
static linker side. I don't think mmap is a requirement, just that a
lot of the code may have been written assuming it was present.
Hopefully there are some other people on the list with more experience
of JITs who can help.

Thinking about the requirements in your earlier mail:

> - load relocatable (but highly optimized) machine code
> - relocate the machine code
> - export symbols from the loaded machine code (available exports are not known at compile-time)
> - import symbols into the loaded machine code (required imports are not known at compile-time)
> - finally, actually execute functions exported from the loaded machine code

It sounds like you would need some kind of dynamic loader to handle
the symbol resolution and perform relocation. If the communication is
just Kernel (for want of a better name for the main program) to
Module, and not Module to Module, then something like a PIE executable
for each module, with the symbols exported with --export-dynamic, could
work. This would result in only a small number of relocation types
that you would need to handle, with the majority being R_ARM_RELATIVE,
which is just the displacement from the static link address (usually
0), and R_ARM_ABS32 for those requiring the address of a symbol. The
major restriction of PIE is that there is a fixed offset between code
and data.

As I understand it, the Linux kernel uses something like ld -r for a
relocatable link, which essentially combines many relocatable objects
into a single one, and loads that. That means that a lot of
awkward-to-handle relocations, especially in Thumb, could be exposed.

Apologies, I couldn't easily find many examples in open source projects
or guides on how to write a dynamic linker. I have had some experience
with ARM's proprietary linker, which had several dynamic linking models
for more bare-metal systems
(http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0242a/index.html),
however I'm guessing you would prefer to stick to open source
components.

Peter
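For reference, the two relocation types Peter mentions can be handled by a very small loader loop. The following is only a sketch under stated assumptions, not code from any existing loader: it assumes the module was linked as a PIE at address 0 (so r_offset is directly an offset into the loaded image), that relocations use the ARM REL format with the addend stored in place, and that the main program supplies a table of resolved symbol addresses. The names Elf32RelEntry, apply_relocs and sym_addrs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the ELF for the Arm Architecture ABI. */
#define R_ARM_ABS32    2u
#define R_ARM_RELATIVE 23u

/* Same layout as an ELF32 REL entry (hypothetical local name). */
typedef struct {
    uint32_t r_offset; /* where to patch, relative to link address 0 */
    uint32_t r_info;   /* (symbol index << 8) | relocation type */
} Elf32RelEntry;

/*
 * Apply REL-style relocations to a module image that has been copied
 * to target address load_bias. sym_addrs[i] holds the already resolved
 * address of dynamic symbol i (with bit 0 set for Thumb functions).
 * Returns 0 on success, -1 on any unsupported or out-of-range entry.
 */
static int apply_relocs(uint8_t *image, size_t image_size,
                        uint32_t load_bias,
                        const Elf32RelEntry *rels, size_t nrels,
                        const uint32_t *sym_addrs, size_t nsyms)
{
    assert(image != NULL);

    for (size_t i = 0; i < nrels; i++) {
        uint32_t off  = rels[i].r_offset;
        uint32_t type = rels[i].r_info & 0xffu;
        uint32_t sym  = rels[i].r_info >> 8;
        uint32_t word;

        /* The patched word must lie entirely inside the image. */
        if (off > image_size || image_size - off < sizeof word)
            return -1;

        /* REL format: the addend is the word currently stored there. */
        memcpy(&word, image + off, sizeof word);

        switch (type) {
        case R_ARM_RELATIVE:
            /* Displacement from the static link address (0 here). */
            word += load_bias;
            break;
        case R_ARM_ABS32:
            /* Address of the symbol plus the in-place addend. */
            if (sym >= nsyms)
                return -1;
            word += sym_addrs[sym];
            break;
        default:
            /* Anything else is outside this sketch. */
            return -1;
        }

        memcpy(image + off, &word, sizeof word);
    }
    return 0;
}
```

A loader built on this would still need to parse the dynamic section to find the relocation and symbol tables, and on a Cortex-A8 it would need to perform cache maintenance on the loaded code before branching into it.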
With the LLVM ORC JIT you actually don't need to embed the JIT linker
in the remote process. ORC supports an RPC mechanism that allows the
JIT linker running on a host to query the remote process for the
relocated symbol addresses, and perform linking. It works great and
allows the JIT target process to be tiny.

The LLI ChildTarget tool inside LLVM does exactly what I'm describing
and requires a minimal amount of LLVM code.

-Chris

> On Jun 27, 2019, at 6:00 AM, Peter Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Apologies I don't know a lot about ORC, most of my knowledge is on the
> static linker side. I don't think mmap is a requirement, just that a
> lot of the code may have been written assuming it was present.
> Hopefully there are some other people on the list with more experience
> of JITs that can help.
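To make the split Chris describes more concrete, here is a rough sketch of what the target side could look like when the host does all compiling and linking: the device only has to hand out memory from a fixed arena (no mmap needed), report the addresses of its own exported symbols, and accept bytes that have already been relocated. This is purely illustrative and is not LLVM's actual RPC protocol or the ChildTarget code; every name here (request_t, reply_t, export_t, handle_request, the REQ_* kinds) is hypothetical, and the transport that delivers the requests is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request kinds; NOT LLVM's wire format. */
enum req_kind { REQ_RESERVE, REQ_LOOKUP, REQ_WRITE };

typedef struct {
    enum req_kind kind;
    size_t        size; /* RESERVE: bytes wanted; WRITE: bytes to copy */
    uintptr_t     addr; /* WRITE: destination inside the arena */
    const char   *name; /* LOOKUP: symbol the host wants resolved */
    const void   *data; /* WRITE: already relocated bytes from the host */
} request_t;

typedef struct { int ok; uintptr_t addr; } reply_t;

/* Symbols the main program makes available to loaded code. */
typedef struct { const char *name; uintptr_t addr; } export_t;

/* Fixed, 8-byte aligned arena standing in for mmap'd JIT memory. */
#define ARENA_SIZE 4096u
static uint64_t arena_storage[ARENA_SIZE / sizeof(uint64_t)];
static size_t   arena_used;

static reply_t handle_request(const request_t *req,
                              const export_t *exports, size_t nexports)
{
    reply_t   rep  = { 0, 0 };
    uintptr_t base = (uintptr_t)arena_storage;

    assert(req != NULL);

    switch (req->kind) {
    case REQ_RESERVE: {
        /* Bump allocation, rounded up to 8 bytes. */
        if (req->size > ARENA_SIZE - arena_used)
            break;
        size_t need = (req->size + 7u) & ~(size_t)7u;
        rep.addr = base + arena_used;
        rep.ok = 1;
        arena_used += need;
        break;
    }
    case REQ_LOOKUP:
        /* Linear search is enough for a small export table. */
        for (size_t i = 0; i < nexports; i++) {
            if (strcmp(exports[i].name, req->name) == 0) {
                rep.addr = exports[i].addr;
                rep.ok = 1;
                break;
            }
        }
        break;
    case REQ_WRITE: {
        /* Only accept writes that stay inside the arena. */
        if (req->addr < base)
            break;
        size_t off = (size_t)(req->addr - base);
        if (off > ARENA_SIZE || req->size > ARENA_SIZE - off)
            break;
        memcpy((uint8_t *)arena_storage + off, req->data, req->size);
        rep.addr = req->addr;
        rep.ok = 1;
        break;
    }
    }
    return rep;
}
```

In this arrangement the host would issue a RESERVE, link the object against the returned address and the LOOKUP results, then send the finished bytes with WRITE. A real agent would also need a request to call into the loaded code, and cache maintenance before doing so.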