Hi,

Thanks Volodya, Misha and Chris,

> > For example,
> >
> > Correct way:
> >
> >   Instruction *NewInst = new LoadInst(...);
> >   NewBB->getInstList().push_back(NewInst);
> >
> > What I need is just to put some junk data in the BB, not instructions.
> > At the assembly level, it looks like the following.
> >
> > A piece of correct code, obtained by disassembling the object code:
> >
> > :00000009 0533709283 add eax, 83927033
> > :0000000E 05A2B78135 add eax, 3581B7A2
> > :00000013 C1C819 ror eax, 19
> > :00000016 05E5167711 add eax, 117716E5
> > :0000001B 0542F7A8DC add eax, DCA8F742
> >
> >
> > :00000009 0533709283 add eax, 83927033
> > :0000000E 7878787878 ??? <<<<<< here is the illegal instruction.
> > :00000013 23232 ??? <<<<<<
> > :00000016 05E5167711 add eax, 117716E5
> > :0000001B 0542F7A8DC add eax, DCA8F742
> >
> > What I tried is to make *NewInst point to random memory (cast to an
> > Instruction pointer) and push_back it to the instList. But I failed
> > to do it.
> >
> >   Instruction *NewInst = ;
> >   NewBB->getInstList().push_back(NewInst);
> >
> > So I was wondering if it is allowed in LLVM or not, and if so, how
> > to do that?
>
> LLVM code must not have any dangling pointers, and hence, this is not
> valid LLVM.
>
> If you want to generate "invalid native code", the way I would suggest
> doing it is to create some LLVM instruction in the dead basic block that
> you can easily identify, such as:
>
> * create a new external function, do not define it
> * call it from the dead basic block
> * then, modify the native code generator for your chosen platform to
>   look for the call(s) to the fake external function and create some
>   "new instruction", i.e. one that's invalid for the real target but one
>   that gives you the bit pattern you want
> * you will want to add a new instruction definition to the .td file,
>   and then generate it in the instruction selector
>
> However, the question is what is your bigger goal? What you're doing
> here is hacking around the optimizers, trying to trick them to not
> delete the dead code. Perhaps there is another way to achieve your end
> goal, if you could tell us what the big picture is.

Let's say that, at the IR level, in the regular way, the following IR code

  %tmp.0 = getelementptr [10 x sbyte]* %str1, int 0, int 0 ; <sbyte*> [#uses=1]
  store sbyte 116, sbyte* %tmp.0
  %tmp.1 = getelementptr [10 x sbyte]* %str1, int 0, int 1 ; <sbyte*> [#uses=1]
  store sbyte 101, sbyte* %tmp.1
  %tmp.2 = getelementptr [10 x sbyte]* %str1, int 0, int 2 ; <sbyte*> [#uses=1]
  store sbyte 115, sbyte* %tmp.2
  %tmp.3 = getelementptr [10 x sbyte]* %str1, int 0, int 3 ; <sbyte*> [#uses=1]
  store sbyte 116, sbyte* %tmp.3

will be assembled to

  movb $116, 18(%esp)
  movb $101, 19(%esp)
  movb $115, 20(%esp)
  movb $116, 21(%esp)

But in a dummy BB, we'd like to put some meaningless or illegal code.
At the machine level, it looks like

  push %eax
  push %ecx
  pop %edx
  pusha
  sahf
  cltd
  das
  clc

All of them are legal one-byte x86 machine instructions. Since those
instructions have no chance of being executed, they will not affect the
original code.

I thought the above machine code cannot be inserted by using the
new Instruction(....) way, because that works at the IR level. So maybe we
can control the MachineInstr generator to generate the above code in the
dummy BB. By the way, the names of those dummy BBs include the string
"dummy", so we can identify which BBs are dummy at the IR level.

If there is a way to do that, I suppose it would go like the following:

1. generate some dummy BBs at the IR level (working on *.bc by writing a pass)
2. llc *.bc (generate machine code)
3. as -o *.o *.s (generate object file, or use gcc)
4. ld -o *.out *.o (generate executable file)

During step 2, we read the *.bc code, find the dummy BBs and put in some
meaningless machine code. Here we cannot put in illegal machine code,
otherwise step 3 fails.

So is it possible to insert arbitrary machine code into a BB? If so, how
could we change llc? I took a look at MachineInstr.c, CodeGenerator.c
etc., but I still don't know how to do it.

Here is something that may help to understand what I want to do. Some
virus writers code a virus in assembly and insert meaningless code into
it; because they work at the assembly level, that is easy to do. I don't
know whether I could do the same thing another way.

> This isn't going to work. The LLVM code always has to be well-defined.
> The way to get the machine code to contain garbage like this is to add an
> intrinsic, then have the code generator expand it to the garbage you want.

So we cannot use LLVM code to do this, but I am not clear about the
approach you mentioned.

Thanks
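
To make the one-byte claim in the message above concrete, here is a minimal C++ sketch that maps each listed filler instruction to its opcode byte. It assumes 32-bit x86 encodings (as far as I know, pusha and das are not valid in 64-bit mode), and the names fillerOpcodes and encodeFiller are hypothetical helpers for illustration, not part of LLVM. The flag-store instruction appears in the table under its standard mnemonic, sahf.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One-byte opcodes (32-bit x86, AT&T mnemonics) for the filler
// instructions listed in the message. Hypothetical helper, not an LLVM API.
inline const std::map<std::string, std::uint8_t> &fillerOpcodes() {
  static const std::map<std::string, std::uint8_t> table = {
      {"push %eax", 0x50}, {"push %ecx", 0x51}, {"pop %edx", 0x5A},
      {"pusha", 0x60},     {"sahf", 0x9E},      {"cltd", 0x99},
      {"das", 0x2F},       {"clc", 0xF8},
  };
  return table;
}

// Translate a sequence of mnemonics into raw bytes, one byte each.
// Throws std::out_of_range if a mnemonic is not in the table.
inline std::vector<std::uint8_t>
encodeFiller(const std::vector<std::string> &mnemonics) {
  std::vector<std::uint8_t> bytes;
  for (const auto &m : mnemonics)
    bytes.push_back(fillerOpcodes().at(m));
  return bytes;
}
```

Because every entry is a single byte, any sequence of these mnemonics gives a byte string of the same length as the sequence, which is what makes them convenient as filler for a block that never executes.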
Misha Brukman
2005-May-11 20:40 UTC
[LLVMdev] Re:RE: Question about inserting instructions
On Wed, May 11, 2005 at 01:30:29PM -0700, Qiuyu Zhang wrote:
[snip]
> push %eax
> das
> clc
>
> All of them are legal one-byte x86 machine instructions.
[snip]
> If there is a way to do that, I suppose it would go like the following:
>
> 1. generate some dummy BBs at the IR level (working on *.bc by writing a pass)
> 2. llc *.bc (generate machine code)
> 3. as -o *.o *.s (generate object file, or use gcc)
> 4. ld -o *.out *.o (generate executable file)
>
> During step 2, we read the *.bc code, find the dummy BBs and put in some
> meaningless machine code. Here we cannot put in illegal machine code,
> otherwise step 3 fails.

Yes, you are correct -- if you want to create illegal code you need to
not use the system as. What you need is the ability for llc to create
object files with native code directly, without using the system
assembler. I think someone is working on it, but I'm not sure as to the
status. Otherwise, you will just have some random one-byte
instructions.

> So is it possible to insert arbitrary machine code into a BB? If so, how
> could we change llc? I took a look at MachineInstr.c, CodeGenerator.c
> etc., but I still don't know how to do it.

The CodeEmitter would have to be enhanced to allow outputting standard
format object files that ld can process. If you are interested in doing
this, someone can point you in the right direction as to what needs to
be done.

-- 
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
Chris Lattner
2005-May-11 20:45 UTC
[LLVMdev] Re:RE: Question about inserting instructions
On Wed, 11 May 2005, Misha Brukman wrote:

>> During step 2, we read the *.bc code, find the dummy BBs and put in some
>> meaningless machine code. Here we cannot put in illegal machine code,
>> otherwise step 3 fails.
>
> Yes, you are correct -- if you want to create illegal code you need to
> not use the system as. What you need is the ability for llc to create
> object files with native code directly, without using the system
> assembler. I think someone is working on it, but I'm not sure as to the
> status. Otherwise, you will just have some random one-byte
> instructions.

Actually that's not true. You can make instructions with an asmstring of:

  ".byte 123\n .byte 56\n .byte 86"

and those bytes will get emitted to the code stream.

-Chris

>> If so, how could we change llc? I took a look at MachineInstr.c,
>> CodeGenerator.c etc., but I still don't know how to do it.
>
> The CodeEmitter would have to be enhanced to allow outputting standard
> format object files that ld can process. If you are interested in doing
> this, someone can point you in the right direction as to what needs to
> be done.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/
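
As a rough illustration of the asmstring idea in the message above, the following C++ sketch builds a string of .byte directives from a list of raw bytes. The function name makeByteAsmString is hypothetical and not an LLVM API, and the sketch only produces the text: attaching it to an instruction definition in the .td file is not shown. It puts a tab before each directive, which is a formatting choice rather than something the message prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Build assembler text that emits the given raw bytes, one ".byte"
// directive per line (decimal values, lines separated by '\n').
// Hypothetical helper for illustration; not part of LLVM.
inline std::string makeByteAsmString(const std::vector<std::uint8_t> &bytes) {
  std::string asmString;
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (i != 0)
      asmString += "\n";
    // Cast so the byte is printed as a number, not as a character.
    asmString += "\t.byte " + std::to_string(static_cast<unsigned>(bytes[i]));
  }
  return asmString;
}
```

Since the system assembler copies .byte data into the output as-is, the same text would pass through the as step regardless of whether the bytes decode to valid instructions, which is the point of the suggestion.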