
Generating Machine Code at Runtime
Okay, so my next attempt at learning how my computer works and how to speak machine language is the following C code fragment:
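#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*FuncPtr)(void);

int main(void)
{
    // Create a function (unsigned char, since bytes like 0xB8 don't fit in a signed char):
    unsigned char testFunc[] = { 0x90,                         // NOP (not really necessary...)
                                 0xB8, 0x10, 0x00, 0x00, 0x00, // MOVL $16,%eax
                                 0xC3 };                       // RET
    // Make a copy on the heap, OS doesn't like executing the stack.
    // (On systems that mark the heap non-executable too, this will still crash;
    // see the mprotect() discussion in the comments.)
    FuncPtr testFuncPtr = (FuncPtr) malloc( sizeof(testFunc) );
    memmove( (void*) testFuncPtr, testFunc, sizeof(testFunc) );
    printf("Before function.\n");
    int result = testFuncPtr();
    printf("Result %d\n", result);
    free( (void*) testFuncPtr );
    return 0;
}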
Basically, this stores the raw machine code of a function in an array of bytes. The first byte of each line is the opcode: 0x90 is a no-op, 0xB8 is a MOVL into the eax register (the next 4 bytes are the number to store, least significant byte first, in this case 16), and 0xC3 is the return instruction (I had to look up the opcodes in Intel's documentation).
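To make "the first byte is the opcode, the next four are the number" concrete, here is a minimal sketch of a decoder that understands only the three opcodes used in testFunc. The names disasm_one and disasm are hypothetical; a real x86 decoder also has to deal with prefixes, ModR/M bytes and hundreds of other opcodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one instruction from the tiny subset used in testFunc, print it in
   AT&T syntax, and return its length in bytes (0 if unknown or truncated). */
static size_t disasm_one(const unsigned char *p, size_t avail)
{
    assert(p != NULL && avail > 0);
    switch (p[0]) {
    case 0x90:
        printf("nop\n");
        return 1;
    case 0xC3:
        printf("ret\n");
        return 1;
    case 0xB8: {
        if (avail < 5)
            return 0;                      /* immediate is cut off */
        /* The 4-byte operand is stored least significant byte first. */
        uint32_t v = (uint32_t)p[1] | ((uint32_t)p[2] << 8)
                   | ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
        printf("movl $%ld, %%eax\n", (long)(int32_t)v);
        return 5;
    }
    default:
        return 0;                          /* not in our three-opcode subset */
    }
}

/* Walk a buffer instruction by instruction; returns how many were decoded. */
static size_t disasm(const unsigned char *code, size_t len)
{
    size_t off = 0, count = 0;
    while (off < len) {
        size_t n = disasm_one(code + off, len - off);
        if (n == 0)
            break;
        off += n;
        count++;
    }
    return count;
}
```

Run on the seven bytes of testFunc, it prints `nop`, `movl $16, %eax` and `ret`, one per line.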
One thing to watch out for here (at least on Mac OS X) is that you'll get a bad access error if you try to execute testFunc directly. That's because testFunc is on the stack, and the stack shouldn't contain executable code (it's a small safety measure). So instead, we simply malloc some memory on the heap and copy our code in there.
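Newer systems often mark the heap non-executable as well (see MegaByte's comments below), so malloc alone may not be enough. A minimal sketch of a more robust approach, assuming a POSIX system with mmap() and mprotect(): map fresh pages as writable, copy the code in, then switch them to read+execute. The helper names make_executable and free_executable are hypothetical. The seven testFunc bytes happen to mean the same thing in 32- and 64-bit x86 mode, so on either kind of x86 machine the call should return 16.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*FuncPtr)(void);

/* Copy machine code into freshly mapped pages and make them executable.
   Returns NULL on failure. The pages are writable while we copy and
   executable afterwards, never both at the same time. */
static FuncPtr make_executable(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return (FuncPtr)mem;
}

/* Release memory obtained from make_executable(). */
static void free_executable(FuncPtr fn, size_t len)
{
    munmap((void *)fn, len);
}
```

Because mmap() hands out whole pages, the start address is automatically page-aligned, which mprotect() requires.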
You may wonder why I'm using eax, of all registers, to store my number 16. Easy: by convention, an int return value (and most other 4-byte return values) goes in eax when a function returns. So this function essentially returns 16, which our printf() proves. Neat!
Intel's documentation describes the opcodes in a very complicated way, so instead I write some assembler code in which the instruction whose byte sequence I want to find out is surrounded by instructions whose byte sequences I already know (I like to use six NOPs, which are short and show up as 0x90 90 90 90 90 90). Then I compile that and use a hex editor to search for the known instructions; whatever lies between them must be my new one. Here's a small table of other operations you may find in a typical program and the byte sequences they turn into (NN stands for the bytes of an immediate value or offset, least significant byte first):
| Bytes | Instruction (AT&T syntax) |
| 0x50 | pushl %eax |
| 0x53 | pushl %ebx |
| 0x55 | pushl %ebp |
| 0x89 E5 | movl %esp, %ebp |
| 0x90 | nop |
| 0xB8 NN NN NN NN | movl $N, %eax |
| 0x68 NN NN NN NN | pushl $N |
| 0xE8 NN NN NN NN | call relativeOffsetNFromEndOfInstruction |
| 0x8B 1C 24 | movl (%esp), %ebx |
| 0x8D 83 NN NN NN NN | leal relativeOffsetToData(%ebx), %eax |
| 0x8D 85 NN NN NN NN | leal relativeOffsetToData(%ebp), %eax |
| 0x5B | popl %ebx |
| 0x83 C4 NN | addl $NN,%esp |
| 0x83 EC NN | subl $NN,%esp |
| 0x8B 00 | movl (%eax), %eax |
| 0x89 45 NN | movl %eax, NN(%ebp) |
| 0xC9 | leave |
| 0xC3 | ret |
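The NOP-sandwich trick described before the table can also be automated instead of searched for by hand. A minimal sketch, assuming the assembled bytes have already been loaded into memory (for example, read from the object file); find_between_nop_runs is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Locate the bytes of an unknown instruction assembled between two runs of
   six NOPs (0x90). On success, stores the start offset and length of the
   bytes in between and returns 1; returns 0 if either marker is missing.
   Assumes the unknown instruction does not itself end in 0x90 bytes. */
static int find_between_nop_runs(const unsigned char *bytes, size_t len,
                                 size_t *start, size_t *count)
{
    static const unsigned char marker[6] = { 0x90, 0x90, 0x90, 0x90, 0x90, 0x90 };
    const size_t m = sizeof marker;
    size_t first = 0, second;

    assert(start != NULL && count != NULL);
    while (first + m <= len && memcmp(bytes + first, marker, m) != 0)
        first++;
    if (first + m > len)
        return 0;                          /* no opening marker */
    second = first + m;
    while (second + m <= len && memcmp(bytes + second, marker, m) != 0)
        second++;
    if (second + m > len)
        return 0;                          /* no closing marker */
    *start = first + m;
    *count = second - (first + m);
    return 1;
}
```

For a buffer containing `pushl %ebp`, six NOPs, `movl (%esp), %ebx`, six NOPs and `ret`, the helper reports the three bytes 0x8B 1C 24 from the table.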
The code fragment above is essentially the core of a just-in-time compiler. For a real (ahead-of-time) compiler, instead of executing the code directly, we'd have to write it into a complete Mach-O file and link it with crt1.o.
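To make the JIT idea slightly more concrete, here is a sketch of the code-generation half of a toy JIT that compiles a sum of integer constants into machine code. The names compile_sum and put_imm32 are hypothetical, and `addl $imm32, %eax` (0x05 followed by the immediate) is not in the table above but is the standard short encoding of that instruction. The resulting buffer would be run exactly like testFunc, given executable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append a 32-bit immediate, least significant byte first. */
static unsigned char *put_imm32(unsigned char *p, int32_t value)
{
    uint32_t v = (uint32_t)value;
    for (int i = 0; i < 4; i++)
        *p++ = (unsigned char)(v >> (8 * i));
    return p;
}

/* Compile terms[0] + terms[1] + ... into
       movl $terms[0], %eax     B8 imm32
       addl $terms[i], %eax     05 imm32   (for each i >= 1)
       ret                      C3
   Returns the number of bytes written, or 0 if cap is too small. */
static size_t compile_sum(const int32_t *terms, size_t n,
                          unsigned char *out, size_t cap)
{
    assert(terms != NULL && out != NULL && n > 0);
    size_t needed = 5 * n + 1;
    if (cap < needed)
        return 0;
    unsigned char *p = out;
    *p++ = 0xB8;                           /* movl $imm32, %eax */
    p = put_imm32(p, terms[0]);
    for (size_t i = 1; i < n; i++) {
        *p++ = 0x05;                       /* addl $imm32, %eax */
        p = put_imm32(p, terms[i]);
    }
    *p++ = 0xC3;                           /* ret */
    return (size_t)(p - out);
}
```

A real JIT would do the same thing for a richer input language, which is mostly a matter of emitting more kinds of instructions and patching offsets.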
Update: In addition to the instructions for position-independent code (PIC), I've also added some more that are useful for passing structs as parameters on the stack.
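Several of the table entries fit together as the classic i386 PIC trick: `call` pushes its return address, a tiny thunk copies that address into %ebx with `movl (%esp), %ebx`, and data is then addressed relative to %ebx with `leal`. The sketch below only lays out the bytes of such a function; build_pic_getter and its exact layout are illustrative assumptions, and actually running it would require a 32-bit process and executable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at p, least significant byte first. */
static void put_le32(unsigned char *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(u >> (8 * i));
}

/* Lay out a 32-bit PIC function that returns the address of a 4-byte datum
   stored right after its code:
         pushl %ebx                       53
         call  thunk                      E8 rel32  (relative to end of call)
   base:                                  <- %ebx will hold this address
         leal  (data - base)(%ebx), %eax  8D 83 disp32
         popl  %ebx                       5B
         ret                              C3
   thunk:
         movl  (%esp), %ebx               8B 1C 24
         ret                              C3
   data: 4 bytes
   buf must hold at least 22 bytes; returns the number of bytes written. */
static size_t build_pic_getter(unsigned char *buf, int32_t datum)
{
    assert(buf != NULL);
    size_t p = 0;
    buf[p++] = 0x53;                              /* pushl %ebx */
    size_t call_at = p;
    p += 5;                                       /* call, patched below */
    size_t base = p;                              /* return address of the call */
    buf[p++] = 0x8D; buf[p++] = 0x83;             /* leal disp32(%ebx), %eax */
    size_t disp_at = p;
    p += 4;                                       /* disp32, patched below */
    buf[p++] = 0x5B;                              /* popl %ebx */
    buf[p++] = 0xC3;                              /* ret */
    size_t thunk = p;
    buf[p++] = 0x8B; buf[p++] = 0x1C; buf[p++] = 0x24; /* movl (%esp), %ebx */
    buf[p++] = 0xC3;                              /* ret */
    size_t data = p;
    put_le32(buf + p, datum);
    p += 4;

    /* Both offsets are relative to base, so the code works at any address. */
    buf[call_at] = 0xE8;
    put_le32(buf + call_at + 1, (int32_t)(thunk - base));
    put_le32(buf + disp_at, (int32_t)(data - base));
    return p;
}
```

Note that %ebx is saved and restored around the trick, which is why `pushl %ebx` and `popl %ebx` also show up in the table.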
Tyler Vano writes: Fascinating! Please continue this series, as I'm extremely interested in how this all works, as, I'm sure, are many other people. I'm also interested to hear an analysis of the binary format of the opcodes themselves. Knowing that 0x55 means pushl %ebp isn't nearly as interesting as knowing *why* 0x55 means pushl %ebp. ;)
Uli Kusterer replies: ★ @Tyler: I would love to know how the opcodes are created myself. Intel's description of this is kinda foggy, involving three different bytes that seem to get combined sometimes and sometimes not... I'm still investigating that part.
Peter Hosey writes: Even better, the IA-32 ISA reference that comes with OS X lists the opcode for every instruction. You can get there from Shark, and the underlying PDF files are at /Library/Application Support/Shark/Helpers/XYZ Help.app/Contents/Resources/XYZISA.pdf (for XYZ = {PowerPC,IA32,EM64T}).
Blake C. writes: Tyler: there are no fun bit fields within the primary opcode byte, unlike PPC. They're almost arbitrary. There are some nice bit fields in the ModR/M and SIB bytes, when they exist. The highest 2 bits in the ModR/M byte specify one of the 4 addressing modes, 3 of which can include an SIB byte (when the r/m field is 100, the encoding that would otherwise name ESP). The Intel manual does a good job of explaining everything, but at the end of the day, x86 machine code is even uglier than x86 assembly.
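A minimal sketch of the ModR/M layout described above, limited to 32-bit addressing and ignoring prefixes; the names decode_modrm, has_sib, push_opcode and pop_opcode are hypothetical. It also touches on Tyler's question: a handful of one-byte opcodes such as push and pop do carry the register number in their low three bits, which is why 0x55 is pushl %ebp (0x50 + 5) and 0x5B is popl %ebx (0x58 + 3).

```c
#include <assert.h>

/* Register numbers as used in ModR/M and SIB fields and "opcode + reg" forms. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

typedef struct {
    unsigned mod;  /* bits 7-6: addressing mode (3 = register operand) */
    unsigned reg;  /* bits 5-3: register, or opcode extension (/0 to /7) */
    unsigned rm;   /* bits 2-0: register or memory base */
} ModRM;

static ModRM decode_modrm(unsigned char b)
{
    ModRM m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

/* 32-bit addressing: an SIB byte follows when mod != 3 and rm == 4,
   the slot that would otherwise mean (%esp). */
static int has_sib(ModRM m)
{
    return m.mod != 3 && m.rm == ESP;
}

/* push and pop encode the register in the opcode's low three bits. */
static unsigned char push_opcode(unsigned reg)
{
    assert(reg < 8);
    return (unsigned char)(0x50 + reg);
}

static unsigned char pop_opcode(unsigned reg)
{
    assert(reg < 8);
    return (unsigned char)(0x58 + reg);
}
```

Applied to the table: in `0x89 E5` the byte 0xE5 splits into mod=3, reg=4 (%esp), rm=5 (%ebp); in `0x8B 1C 24` the byte 0x1C has mod=0 and rm=4, so the SIB byte 0x24 follows; and in `0x83 EC NN` the reg field of 0xEC is 5, the /5 extension that selects `sub`.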
Uli Kusterer replies: ★ @Blake: Well, ugly is in the eye of the beholder. It's already machine code, after all. Sure, the instructions won't line up nicely in a hex editor, and if you want to patch code at runtime like Wolf Rentzsch's mach_star stuff does, you're in for a shock, but in the end, it's all just bytes that need to be output.
I would call Intel assembly average, and PPC assembly sounds positively gorgeous to me :-)
David Chisnall writes: The ISA reference bundled with Shark is also available as a service. It's really great if you're having to wade through a lot of assembly, since you can just highlight the instruction, select the architecture from the Services menu, and jump directly to the definition.
Uli, as I recall, returning values in EAX is the MS-DOS calling convention. Most UNIXes pass and return values via the stack. Mach-O uses some hybrid with some very complicated rules about when things go on the stack and when they go in registers. If you're writing a JIT, then for the most part you can make up your own calling convention, but your entry or exit points might need to conform to the platform's calling convention. If you want to do this portably, you can write a little inline assembly shim that will jump into it. For your example, it would be something like this:
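static inline int asm_func_shim(FuncPtr asmFunc)
{
    int ret;
    // Indirect call through a register (AT&T syntax needs the '*'); the input is
    // operand %1, since operand %0 is the output bound to EAX.
    // ECX, EDX and the flags are caller-saved on i386, so mark them clobbered.
    __asm__ volatile ("call *%1"
                      : "=a" (ret)
                      : "r" (asmFunc)
                      : "ecx", "edx", "cc", "memory");
    return ret;
}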
The syntax might be slightly wrong here (I haven't tested it), but doing this would isolate you from any concerns about the target ABI. If you ran this on OS X, then the compiler would optimise the shim away completely. If you ran it somewhere where values were returned on the stack, then the shim would get the return value from the register. The one thing to watch out for is that different ABIs have different rules about which registers can be clobbered by functions. You might want to add the ones you use to the clobbered list in the inline assembly fragment, just to be sure, since that will make it the compiler's problem.
David Chisnall writes: I forgot to mention, if you're interested in run-time code generation from a practical standpoint, rather than as an academic exercise, you might want to look at GNU Lightning:
http://www.gnu.org/software/lightning/manual/lightning.html
It's quite easy to use, and lets you generate native (although not very well optimised) code at runtime for a number of architectures with a single code path. If you're on OS X, you could use it to create code for PowerPC and x86 without needing different code for each.
Randy Hollines writes: Here's another example... I downloaded the IA-32 programmer guides to obtain the opcode and operand values. There are some useful tables in the guide, e.g. Table B-13 (opcode values) and Table 2-2 (operand values). Cheers!
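#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

// 32-bit x86 (cdecl) only: the argument is read from 8(%ebp), so build with -m32.
int main(int argc, char** argv) {
    typedef int (*fun_ptr)(int*);
    // array of 32-bit values
    int values[] = {14, 16};
    // inline machine code (unsigned char: bytes above 0x7F don't fit in a signed char)
    unsigned char fun_bytes[] = {
        /* set up stack frame */
        0xff, 0xf5,             // pushl %ebp          (long form of 0x55)
        0x89, 0xe5,             // movl %esp, %ebp
        /* add/sub values */
        0x8b, 0x4d, 0x08,       // movl 8(%ebp), %ecx  - p[] -> c
        0x8b, 0x41, 0x00,       // movl 0(%ecx), %eax  - p[0] -> a
        // 0x03, 0x41, 0x04,    // addl 4(%ecx), %eax  - a + p[1] -> a (add)
        0x2b, 0x41, 0x04,       // subl 4(%ecx), %eax  - a - p[1] -> a (sub)
        /* tear down stack frame and return */
        0x8f, 0xc5,             // popl %ebp           (long form of 0x5D)
        0xc3                    // ret
    };
    const size_t size = sizeof(fun_bytes);
    // set up the function in pages we can make executable
    // (plain malloc'd memory is non-executable on NX systems)
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(mem, fun_bytes, size);
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) { perror("mprotect"); return 1; }
    fun_ptr rt_fun = (fun_ptr)mem;
    // execute function
    int result = rt_fun(values);
    printf("stream length: %zu, value: %d\n", size, result);
    munmap(mem, size);
    return 0;
}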
MegaByte writes: I've tried to compile and run the examples given but whenever I hit the function pointer call, the program segfaults. I'm attempting this on gcc 4.3.2 (Fedora 10 on a Pentium M). Does anybody know what might be wrong? Perhaps more robust data execution prevention? Are there any compiler flags that I should be aware of?
MegaByte writes: So it turns out that my problems were indeed due to DEP. I found two solutions: either turn off the NX bit for the entire executable with the "-z execstack" linker option, or use mprotect in conjunction with valloc to turn off the NX bit for just the generated code array, from within the code itself.
Randy Hollines writes: Executing code for amd64/x64 architectures... thanks to Aaron Kaluszka at Cal Berkeley
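// execute buffer for amd64/x64 (System V ABI: first argument in %rdi, result in %rax)
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/mman.h>
using namespace std;
typedef long (*jit_fun_ptr)(long v);
int main() {
    // unsigned char: values above 0x7F would be narrowing errors for char in C++11
    unsigned char buffer[] = {
        0x48, 0x81, 0xc7, 0x0a, 0x00, 0x00, 0x00, // addq $10, %rdi  (param + const)
        0x48, 0x89, 0xf8,                         // movq %rdi, %rax (set return register)
        0xc3                                      // ret
    };
    size_t total_size = sizeof(buffer);
    // valloc returns page-aligned memory, as mprotect requires
    void *mem = valloc(total_size);
    if (mem == NULL) { perror("valloc"); return 1; }
    memcpy(mem, buffer, total_size);
    if (mprotect(mem, total_size, PROT_READ | PROT_EXEC) != 0) { perror("mprotect"); return 1; }
    jit_fun_ptr jit_fun = (jit_fun_ptr)mem;
    cout << jit_fun(12L) << endl;   // 12 + 10
    return 0;
}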