Hello,

First of all, I'd like to point out that I am a newbie in this topic and this is more of a "would it work?" kind of question. I basically just came up with a difficult problem and decided to research it.

I recently tried to run Elder Scrolls: Daggerfall on an ARM netbook (Toshiba AC100) and failed, even after turning on the latest patches for "dynamic recompilation". I took a look at the code and here's what I found:

https://github.com/wjp/dosbox/blob/idados/src/cpu/core_dynrec/decoder.h#L34

The relevant macros and functions are defined here:

https://github.com/wjp/dosbox/blob/idados/src/cpu/core_dynrec/risc_armv4le-o3.h

So basically, it looks like there's code that translates instructions from x86 to a few other platforms in chunks of 32 opcodes. Since this code was too slow for me, I asked myself "how could it be sped up?" and assumed that perhaps LLVM could optimize it on the fly. So, here's the question: would it be feasible given the assumptions above? What I am thinking about is a system that would:

1. Generate LLVM IR code instead of native calls and JIT it on the fly,
2. Apply the optimizations that I know from Clang.

I saw this example on pastebin [1], and generating functions on the fly looks rather straightforward, but I am not sure it would be as easy if I wanted to translate machine code from one platform to another. Does LLVM have, or integrate with, any libraries that would make this practical? What would be the main challenges? Keep in mind that I would welcome even a partial answer.

Cheers,
d33tah

[1] http://pastebin.com/f2NSGZGR
A sequence of 32 instructions is not very likely to have many optimisation opportunities that LLVM can take advantage of. You may get a speedup from longer traces, though of course the LLVM JITing time is likely to be longer, so you’d want to make sure that it’s done in a separate thread. If you can get longer traces (and DOSBox already has the infrastructure for invalidating on self-modifying code) then you may be able to get some speedup.

There was a similar project to use LLVM in QEMU a few years ago that failed to provide a speedup.

David
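The two requirements in the reply above (compile in a separate thread, and invalidate translations on self-modifying code) can be sketched roughly as follows. This is only a toy illustration in Python, not DOSBox or LLVM code: `TranslationCache`, `compile_fn`, `interpret_fn` and `drain` are hypothetical names, and the "compiled" block is just a Python callable standing in for JITed code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TranslationCache:
    """Toy cache: interpret a block until a background compile finishes,
    and drop any translation whose guest address range gets written to."""

    def __init__(self, compile_fn, interpret_fn):
        self._compile = compile_fn        # hypothetical slow "JIT": (start, end) -> callable
        self._interpret = interpret_fn    # hypothetical fallback: (start, end, state) -> result
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()
        self._ready = {}                  # start -> (end, compiled callable)
        self._pending = {}                # start -> (end, Future)

    def run(self, start, end, state):
        fut = None
        with self._lock:
            entry = self._ready.get(start)
            if entry is None and start not in self._pending:
                # Queue the compile; the emulator thread does not wait for it.
                fut = self._pool.submit(self._compile, start, end)
                self._pending[start] = (end, fut)
        if fut is not None:
            # Registered outside the lock: the callback takes the lock itself.
            fut.add_done_callback(lambda f, s=start: self._publish(s, f))
        if entry is not None:
            return entry[1](state)
        return self._interpret(start, end, state)

    def _publish(self, start, fut):
        with self._lock:
            pending = self._pending.get(start)
            if pending is None or pending[1] is not fut:
                return                    # invalidated while compiling: discard
            del self._pending[start]
            if fut.exception() is None:
                self._ready[start] = (pending[0], fut.result())

    def invalidate_write(self, addr):
        """Guest wrote to `addr`: forget every block that covers it."""
        with self._lock:
            for table in (self._ready, self._pending):
                stale = [s for s, (e, _) in table.items() if s <= addr < e]
                for s in stale:
                    del table[s]

    def drain(self):
        """Block until all queued compiles are done (useful for demos only)."""
        self._pool.submit(lambda: None).result()
```

With a cache like this, the first execution of a block goes through the interpreter while the compile runs in the background; later executions pick up the compiled version, and a guest write into the block's address range sends execution back to the interpreter until a fresh compile completes.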
On 22.07.2015 at 15:21, David Chisnall wrote:
> A sequence of 32 instructions is not very likely to have many
> optimisation opportunities that LLVM can take advantage of.

I don't know the codebase, but perhaps it's as easy as increasing the number here and maybe adjusting some relevant buffers:

https://github.com/wjp/dosbox/blob/idados/src/cpu/core_dynrec.cpp#L212

> You may
> get a speedup from longer traces, though of course the LLVM JITing
> time is likely to be longer, so you’d want to make sure that it’s
> done in a separate thread. If you can get longer traces (and DOSBox
> has the infrastructure already for invalidating on self-modifying
> code) then you may be able to get some speedup.

Introducing another thread sounds like a difficult task, but it definitely makes sense and could be worth it...

> There was a similar project to use LLVM in QEMU a few years ago that
> failed to provide a speedup.
>
> David

Why did it fail?
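To make the "increase the number" idea concrete, here is a minimal sketch of how a per-block opcode cap limits what an optimizer gets to see. It is written in Python with a made-up instruction list, not the real x86 decoder; `form_block`, `BRANCH_OPS` and the mnemonics are hypothetical, and it assumes that a block ends at a branch, which may not match what DOSBox actually does.

```python
# Hypothetical set of opcodes that end a block (an assumption, not DOSBox's rule).
BRANCH_OPS = frozenset({"jmp", "jcc", "call", "ret"})


def form_block(code, start, max_opcodes=32):
    """Collect opcodes from index `start` until a branch is included
    or the cap of `max_opcodes` is reached, whichever comes first."""
    block = []
    for op in code[start:]:
        if len(block) >= max_opcodes:
            break                 # cap hit: block is cut in straight-line code
        block.append(op)
        if op in BRANCH_OPS:
            break                 # a branch ends the block
    return block


# 40 straight-line opcodes followed by a jump.
code = ["mov"] * 40 + ["jmp"] + ["add"] * 5
short = form_block(code, 0)                   # default cap of 32
long = form_block(code, 0, max_opcodes=128)   # larger cap reaches the branch
```

In this toy, the default cap splits the straight-line run in two, while the larger cap hands the whole run up to the jump to the translator in one piece. Whether that translates into a real speedup depends on the compile-time cost discussed above.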