Hello!

Q1
Are there any resources or examples on embedding LLVM into an ARM-based bare-metal application? Searching in this area only turns up information on how to use LLVM to target bare-metal when I want to compile LLVM for linking against a bare-metal application.

Q2
Are there any memory usage benchmarks for LLVM across the common tasks (especially loading bytecode, doing the optimization passes and finally emitting machine code)? My target (embedded) system has only 1GB of RAM.

Background:
I'm about to embark on an effort to integrate LLVM into my bare-metal application (for the AM335x, a Cortex-A8, also known as the BeagleBone Black). The application area is sound synthesis, and the reason for embedding LLVM is to allow users to develop their own "plugins" on the desktop (using a live coding approach) and then load them (as LLVM bytecode) on the embedded device. LLVM would be responsible for generating (optimized, and especially vectorized for NEON) machine code directly on the embedded device and it would take care of the relocation and run-time linking duties. This last task is very important because the RTOS (Texas Instruments' SYS/BIOS) that I'm using does not have any dynamic linking facilities. Sharing code in the form of LLVM bytecode also seems to sidestep the complex task of setting up a cross-compiling toolchain which is something that I would prefer not to have to force my users to do.

In fact, my goal is to have a live coding environment provided as a desktop application (which might also embed Clang as well as LLVM) that allows the user to rapidly and playfully build their sound synthesis idea (in simple C/C++ at first, maybe Faust later) and then save the algorithm as bytecode to be copied over to the AM335x-based device.

Thank you in advance for any help or pointers to resources that you can provide!

Kind regards
Brian

--
Orthogonal Devices
Tokyo, Japan
www.orthogonaldevices.com
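P.S. For concreteness, the device-side flow I have in mind is roughly the sketch below, written against ORC's LLJIT. The bitcode path and the "plugin_process" entry point are placeholders for whatever plugin ABI I end up defining, and I know the ORC APIs have been shifting between releases, so please read this as an illustration of intent rather than working code:

  #include "llvm/ExecutionEngine/JITSymbol.h"
  #include "llvm/ExecutionEngine/Orc/LLJIT.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IRReader/IRReader.h"
  #include "llvm/Support/Error.h"
  #include "llvm/Support/SourceMgr.h"
  #include "llvm/Support/TargetSelect.h"

  using namespace llvm;
  using namespace llvm::orc;

  // Hypothetical plugin entry point: renders one block of audio samples.
  using ProcessFn = void (*)(float *Buffer, unsigned Frames);

  static Expected<ProcessFn> loadPlugin(LLJIT &J, StringRef BitcodePath) {
    // Parse the bitcode the user copied over from the desktop app.
    ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());
    SMDiagnostic Diag;
    std::unique_ptr<Module> M =
        parseIRFile(BitcodePath, Diag, *TSCtx.getContext());
    if (!M)
      return createStringError(inconvertibleErrorCode(), Diag.getMessage());

    // Hand the module to the JIT; optimization and codegen happen on-device.
    if (Error E = J.addIRModule(ThreadSafeModule(std::move(M), TSCtx)))
      return std::move(E);

    auto Sym = J.lookup("plugin_process"); // placeholder symbol name
    if (!Sym)
      return Sym.takeError();
    return jitTargetAddressToPointer<ProcessFn>(Sym->getAddress());
  }

  int main() {
    InitializeNativeTarget();          // ARM is the "native" target on-device
    InitializeNativeTargetAsmPrinter();

    auto J = cantFail(LLJITBuilder().create());
    ProcessFn Process = cantFail(loadPlugin(*J, "plugin.bc"));

    float Buffer[64] = {};
    Process(Buffer, 64); // render one block
  }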
Hi Brian,

I'm afraid I can't answer your actual questions, but I do have a couple of comments on the background...

On Thu, 27 Jun 2019 at 09:50, Brian Clarkson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> LLVM would be responsible for generating
> (optimized, and especially vectorized for NEON) machine code directly on
> the embedded device and it would take care of the relocation and
> run-time linking duties.

That's a much smaller task than what you'd get from embedding all of LLVM. lldb is probably an example of a program with a similar problem to yours, and it gets by with just a pretty small stub of a "debugserver" on the device. It does all CodeGen and even prelinking on the host side, and then transfers binary data across.

The concept is called "remote JIT" in the LLVM codebase if you want to research it more.

I think the main advantage you'd get from embedding LLVM itself over a scheme like that would be a certain resilience to updating the RTOS on the device (it would cope with a function sliding around in memory even if the host is no longer available to recompile), but I bet there are simpler ways to do that. The API surface you need to control is probably pretty small.

> Sharing code in the form of LLVM bytecode
> also seems to sidestep the complex task of setting up a cross-compiling
> toolchain which is something that I would prefer not to have to force my
> users to do.

If you can produce bitcode on the host, you can produce an ARM binary without forcing the users to install extra stuff. The work involved would be pretty comparable to what you'd have to do on the RTOS side anyway (you're unlikely to be running GNU ld against system libraries on the RTOS), and made slightly easier by the host being more of a "normal" LLVM environment.

Cheers.

Tim.
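P.S. To give a feel for the scale of the host-side option: going from an in-memory Module to a relocatable ARM object is roughly the snippet below. It's only a sketch; the "armv7-none-eabi" / "cortex-a8" / "+neon" strings are my guesses for your board, and a couple of the header and enum names have moved around between LLVM releases.

  #include "llvm/IR/LegacyPassManager.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/FileSystem.h"
  #include "llvm/Support/TargetRegistry.h" // llvm/MC/TargetRegistry.h in newer trees
  #include "llvm/Support/TargetSelect.h"
  #include "llvm/Support/raw_ostream.h"
  #include "llvm/Target/TargetMachine.h"
  #include "llvm/Target/TargetOptions.h"

  using namespace llvm;

  // Emit a relocatable ARM ELF object from an already-parsed Module.
  static bool emitArmObject(Module &M, StringRef OutPath) {
    LLVMInitializeARMTargetInfo();
    LLVMInitializeARMTarget();
    LLVMInitializeARMTargetMC();
    LLVMInitializeARMAsmPrinter();

    std::string TripleName = "armv7-none-eabi";
    std::string Err;
    const Target *T = TargetRegistry::lookupTarget(TripleName, Err);
    if (!T)
      return false;

    TargetOptions Opts;
    std::unique_ptr<TargetMachine> TM(T->createTargetMachine(
        TripleName, "cortex-a8", "+neon", Opts, Reloc::Static));
    M.setTargetTriple(TripleName);
    M.setDataLayout(TM->createDataLayout());

    std::error_code EC;
    raw_fd_ostream Out(OutPath, EC, sys::fs::OF_None);
    if (EC)
      return false;

    legacy::PassManager PM;
    // CGFT_ObjectFile is spelled TargetMachine::CGFT_ObjectFile on older releases.
    if (TM->addPassesToEmitFile(PM, Out, nullptr, CGFT_ObjectFile))
      return false;
    PM.run(M);
    return true;
  }

Placing the resulting object's sections and applying relocations on the RTOS is then the part that's specific to your system, and that's comparable work whether codegen ran on the host or on the device.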
Hi Tim.

Thank you for taking the time to comment on the background! I will definitely study lldb and remote JIT for ideas.

I worry that I will not be able to pre-link on the host side because the host cannot(?) know the final memory layout of code on the client side, especially when there are multiple plugins being loaded in different combinations on the host and client. Is that an unfounded worry?

I suppose it is also possible to share relocatable machine code (ELF?) and only use client-side embedded LLVM for linking duties? Does that simplify things appreciably? I was under the impression that if I can compile and embed the LLVM linker, then embedding LLVM's codegen libraries would not be much extra work. Then I could allow users to use Faust (or any other frontend) to generate bytecode in addition to my "live coding" desktop application.

So many variables to consider... :-)

Kind regards
Brian Clarkson

Orthogonal Devices
Tokyo, Japan
www.orthogonaldevices.com
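P.S. By "only use client-side embedded LLVM for linking duties" I mean something like the sketch below running on the device, with the relocatable ELF object having been produced on the host. It's only a guess at the right pieces (RuntimeDyld plus a memory manager); "plugin_process" is a placeholder symbol, and the stock SectionMemoryManager could stand in for MemMgr on a desktop, though on the RTOS it would presumably have to be replaced since it ultimately wants mmap-style page allocation:

  #include "llvm/ExecutionEngine/RTDyldMemoryManager.h"
  #include "llvm/ExecutionEngine/RuntimeDyld.h"
  #include "llvm/Object/ObjectFile.h"
  #include "llvm/Support/Error.h"
  #include "llvm/Support/MemoryBuffer.h"

  using namespace llvm;

  // Load a relocatable object, let RuntimeDyld place it and resolve its
  // relocations, and return the plugin entry point. MemMgr owns the loaded
  // code, so it must outlive the returned pointer.
  static void *loadPluginObject(StringRef ObjPath,
                                RTDyldMemoryManager &MemMgr) {
    auto Buf = MemoryBuffer::getFile(ObjPath);
    if (!Buf)
      return nullptr;

    auto Obj =
        object::ObjectFile::createObjectFile((*Buf)->getMemBufferRef());
    if (!Obj) {
      consumeError(Obj.takeError());
      return nullptr;
    }

    // The memory manager doubles as the symbol resolver here.
    RuntimeDyld Dyld(MemMgr, MemMgr);
    Dyld.loadObject(**Obj);
    if (Dyld.hasError())
      return nullptr;

    Dyld.resolveRelocations();
    // Applies final memory permissions and flushes the instruction cache
    // through the memory manager.
    Dyld.finalizeWithMemoryManagerLocking();

    JITEvaluatedSymbol Sym = Dyld.getSymbol("plugin_process");
    return reinterpret_cast<void *>(static_cast<uintptr_t>(Sym.getAddress()));
  }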
On Thu, 27 Jun 2019 at 09:50, Brian Clarkson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hello!
>
> Q1
> Are there any resources or examples on embedding LLVM into an ARM-based
> bare-metal application? Searching in this area only turns up
> information on how to use LLVM to target bare-metal when I want to
> compile LLVM for linking against a bare-metal application.
>

I'm not aware of any examples unfortunately. I suspect that this could be quite challenging depending on how rich an environment your RTOS offers. It is possible that LLVM depends on POSIX or POSIX-like OS calls for things like mmap and other file abstractions. I've not looked at this in any detail, as it may be possible to strip these out with the right configuration options; for example, thread support can be disabled. One possible approach would be to build LLVM for a Linux target and look at the dependencies. That might give you an idea of what you are up against.

> Q2
> Are there any memory usage benchmarks for LLVM across the common tasks
> (especially loading bytecode, doing the optimization passes and finally
> emitting machine code)? My target (embedded) system has only 1GB of RAM.
>

I don't have anything specific unfortunately. It was possible, at least a couple of years ago, for Clang to compile Clang on a 1GB Raspberry Pi. I'm assuming the plugins will be smaller than the IR generated by the largest Clang C++ file, but my Raspberry Pi wasn't doing anything else but compiling Clang.
Hi Peter

Thank you for your helpful comments, especially on the RPi. Since my use case is a lot simpler than compiling all of Clang, I hopefully can take your experience as a good sign.

The RTOS that TI provides for the AM335x actually has a pretty complete POSIX layer and other standard libraries. However, I am working without any virtual memory subsystem, so there is no mmap. I was under the impression that LLVM (ORC specifically) should be able to relocate code to any memory location, so the lack of mmap shouldn't be a problem?

Kind regards
Brian

Orthogonal Devices
Tokyo, Japan
www.orthogonaldevices.com
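P.S. My (possibly naive) plan for the missing mmap is to give the loader a memory manager that carves sections out of a statically allocated pool, along these lines. A sketch only; I'm assuming the three hooks shown are what RuntimeDyld actually needs, and resolving calls back into the firmware (libm and friends) would presumably still mean overriding getSymbolAddress with a lookup table of my own:

  #include "llvm/ExecutionEngine/RTDyldMemoryManager.h"
  #include <cstdint>
  #include <string>

  using namespace llvm;

  // Hands out JIT'd sections from one fixed region instead of asking the OS
  // for fresh pages. No MMU, so code and data share the pool and no page
  // permissions are applied.
  class PoolMemoryManager : public RTDyldMemoryManager {
    uint8_t *Pool;
    size_t Capacity;
    size_t Used = 0;

    uint8_t *alloc(uintptr_t Size, unsigned Align) {
      if (Align == 0)
        Align = 1;
      // RuntimeDyld requests power-of-two alignments.
      size_t Base = (Used + Align - 1) & ~static_cast<size_t>(Align - 1);
      if (Base + Size > Capacity)
        return nullptr; // pool exhausted
      Used = Base + Size;
      return Pool + Base;
    }

  public:
    PoolMemoryManager(uint8_t *PoolBase, size_t PoolSize)
        : Pool(PoolBase), Capacity(PoolSize) {}

    uint8_t *allocateCodeSection(uintptr_t Size, unsigned Align,
                                 unsigned SectionID,
                                 StringRef SectionName) override {
      return alloc(Size, Align);
    }

    uint8_t *allocateDataSection(uintptr_t Size, unsigned Align,
                                 unsigned SectionID, StringRef SectionName,
                                 bool IsReadOnly) override {
      return alloc(Size, Align);
    }

    bool finalizeMemory(std::string *ErrMsg) override {
      // Nothing to remap; this is where the Cortex-A8 instruction cache
      // would need to be cleaned/invalidated for the freshly written code.
      return false; // false means success
    }
  };

Hooked up to the RuntimeDyld sketch from my earlier mail it would look roughly like this (the 4 MiB pool size is an arbitrary guess):

  static uint8_t PluginPool[4 * 1024 * 1024];
  PoolMemoryManager MemMgr(PluginPool, sizeof(PluginPool));
  void *Entry = loadPluginObject("plugin.o", MemMgr);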