Uli's Web Site

Generating Machine Code at Runtime

Okay, so my next attempt at learning how my computer works and how to speak machine language is the following C code fragment:

// Needs <stdio.h>, <stdlib.h> and <string.h>.
typedef int (*FuncPtr)(void);

// Create a function (unsigned char, since 0x90 etc. don't fit a signed char):
unsigned char   testFunc[] = { 0x90,                         // NOP (not really necessary...)
                               0xB8, 0x10, 0x00, 0x00, 0x00, // MOVL $16,%eax
                               0xC3 };                       // RET

// Make a copy on the heap, OS doesn't like executing the stack:
FuncPtr         testFuncPtr = (FuncPtr) malloc(sizeof(testFunc));
memmove( (void*) testFuncPtr, testFunc, sizeof(testFunc) );

printf("Before function.\n");
int result = (*testFuncPtr)();
printf("Result %d\n", result);

Basically, this stores the raw opcodes of a function in an array of chars. The first byte of each line is the opcode: 0x90 is No-Op, 0xB8 is a MOVL into the eax register (with the next 4 bytes being the number to store, least significant byte first, in this case 16), and 0xC3 is the return instruction (I had to look up the opcodes in Intel's documentation).

One thing to watch out for here (at least on Mac OS X) is that you'll get a bad access error if you try to execute testFunc directly. That's because testFunc is on the stack, and the stack shouldn't contain executable code (it's a small safety measure). So we simply malloc some memory on the heap and stuff our code in there.

You may wonder why I'm using eax of all registers to store my number 16 in. Easy: Because the convention is that an int return value (and most other 4-byte return values) goes in eax when a function returns. So, what this does is it essentially returns 16. Which our printf() proves. Neat!

Intel's documentation describes the opcodes in a very complicated way, so what I essentially do is write some assembler code and enclose the instruction whose byte sequence I want to find out between instructions whose byte sequence I already know (I like to use six nops, which are short and show up as 0x90 90 90 90 90 90). Then I compile that and use a hex editor to search for the known instructions; whatever is between them must be my new one. Here's a small table of other operations you may find in a typical program and the byte sequences they turn into:

Bytes                    Instruction
0x50                     pushl %eax
0x53                     pushl %ebx
0x55                     pushl %ebp
0x89 E5                  movl %esp, %ebp
0x90                     nop
0xB8 NN NN NN NN         movl $N, %eax
0x68 NN NN NN NN         pushl $N
0xE8 NN NN NN NN         call relativeOffsetNFromEndOfInstruction
0x8B 1C 24               movl (%esp), %ebx
0x8D 83 NN NN NN NN      leal relativeOffsetToData(%ebx), %eax
0x8D 85 NN NN NN NN      leal relativeOffsetToData(%ebp), %eax
0x5B                     popl %ebx
0x83 C4 NN               addl $NN,%esp
0x83 EC NN               subl $NN,%esp
0x8B 00                  movl (%eax), %eax
0x89 45 NN               movl %eax, NN(%ebp)
0xC9                     leave
0xC3                     ret

The code fragment above is essentially what one would need to create a just-in-time compiler. For a real compiler, instead of executing this directly, we'd have to write it to a complete Mach-O file and link it with crt1.o.

Update: On top of the instructions for position-independent code (PIC), I've also added some more that are useful for passing structs as parameters on the stack.

Reader Comments:
Tyler Vano writes:
Fascinating! Please continue this series, as I'm extremely interested in how this all works, as, I'm sure, are many other people. I'm also interested to hear an analysis on the binary format of the opcodes themselves. Knowing that 0x55 means pushl %ebp isn't nearly as interesting as knowing *why* 0x55 means pushl %ebp. ;)
Uli Kusterer replies:
@Tyler: I would love to know how the opcodes are created myself. Intel's description on this is kinda foggy, involving three different bytes that seem to get combined sometimes and sometimes not... I'm still investigating that part.
Tyler Vano writes:
Check this out: http://groups.google.com/group/alt.2600/attach/ec1d317a2d3e778b/x86_intro?part=2&hl=en
Peter Hosey writes:
Even better, the IA-32 ISA reference that comes with OS X lists the opcode for every instruction. You can get there from Shark, and the underlying PDF files are at /Library/Application Support/Shark/Helpers/XYZ Help.app/Contents/Resources/XYZISA.pdf (for XYZ = {PowerPC,IA32,EM64T}).
Blake C. writes:
Tyler - there are no fun bit fields within the primary opcode byte, unlike PPC. They're almost arbitrary. There are some nice bitfields in the ModR/M and SIB bytes, when they exist. The highest 2 bits in the ModR/M byte specify one of the 4 addressing modes, 3 of which can include an SIB byte (when the source register is ESP). The Intel manual does a good job of explaining everything, but at the end of the day, x86 machine code is even uglier than x86 assembly.
Uli Kusterer replies:
@Blake: Well, ugly is in the eye of the beholder. It's already machine code, after all. Sure, the instructions won't nicely line up in a hex editor and if you want to patch code at runtime like Wolf Rentzsch's mach_star stuff does, you're in for a shock, but in the end, it's all just bytes that need to be output. I would call Intel assembly average, and PPC assembly sounds positively gorgeous to me :-)
Uli Kusterer replies:
For anyone who wants to do more than a just-in-time compiler, here's a link to the docs for the MachO file format: http://developer.apple.com/documentation/DeveloperTools/Conceptual/MachOTopics/index.html That's what one needs to generate to create a Mac executable.
David Chisnall writes:
The ISA reference bundled with Shark is a service. It's really great if you're having to wade through a lot of assembly, since you can just highlight the instruction, select the architecture from the services menu, and jump directly to the definition.

Uli, as I recall, returning values in EAX is the MS-DOS calling convention. Most UNIXes pass and return values via the stack. Mach-O uses some hybrid with some very complicated rules about when things go on the stack and when they go in registers. If you're writing a JIT, then for the most part you can make up your own calling convention, but your entry or exit points might need to conform to the platform's calling convention. If you want to do this portably, you can write a little inline assembly shim that will jump into it. For your example, it would be something like this:

static inline int asm_func_shim(FuncPtr asmFunc)
{
    int ret;
    // Indirect call through the register holding asmFunc; result arrives in EAX.
    __asm("call *%1" : "=a" (ret) : "r" (asmFunc));
    return ret;
}

The syntax might be slightly wrong here (I haven't tested it), but doing this would isolate you from any concerns about the target ABI. If you ran this on OS X, then the compiler would optimise the shim away completely. If you ran it somewhere where values were returned on the stack, then the shim would get the return value from the register.

The one thing to watch out for is that different ABIs have different rules about which registers can be clobbered by functions. You might want to add the ones you use to the clobbered list in the inline assembly fragment, just to be sure, since that will make it the compiler's problem.
David Chisnall writes:
I forgot to mention, if you're interested in run-time code generation from a practical standpoint, rather than as an academic exercise, you might want to look at GNU Lightning: http://www.gnu.org/software/lightning/manual/lightning.html It's quite easy to use, and lets you generate native (although not very well optimised) code at runtime for a number of architectures with a single code path. If you're on OS X, you could use it to create code for PowerPC and x86 without needing different code for each.
Randy Hollines writes:
Here's another example... I downloaded the IA-32 programmer guides to obtain the opcode and operand values. There are some useful tables in the guide, e.g. table b-13 (opcode values) and table 2-2 (operand values). Cheers!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv) {
  typedef int (*fun_ptr)(int*);
  // array of 32-bit values
  int values[] = {14, 16};
  // inline machine code
  unsigned char fun_bytes[] = {
    /* set up stack frame */
    0xff, 0xf5,       // pushl %ebp
    0x89, 0xe5,       // movl %esp, %ebp
    /* add/sub values */
    0x8b, 0x4d, 0x08, // movl 8(%ebp), %ecx - p[] -> c
    0x8b, 0x41, 0x00, // movl (%ecx), %eax - p[0] -> a
    // 0x03, 0x41, 0x04, // addl 4(%ecx), %eax - a + p[1] -> a (add)
    0x2b, 0x41, 0x04, // subl 4(%ecx), %eax - a - p[1] -> a (sub)
    /* tear down stack frame and return */
    0x8f, 0xc5,       // popl %ebp
    0xc3              // ret
  };
  const int size = sizeof(fun_bytes);
  // copy the code to the heap and set up the function pointer
  fun_ptr rt_fun = (fun_ptr)malloc(size);
  memcpy((void*)rt_fun, fun_bytes, size);
  // execute function
  int result = (*rt_fun)(values);
  printf("?: %p\n", (void*)values);
  printf("stream length: %d, value: %d\n", size, result);
}
Objeck writes:
Hey folks, I'm in the process of implementing a JIT compiler for a Java/C# type language. If you're interested in how it's being implemented check out my Wiki page (http://code.google.com/p/objeck/wiki/ObjeckJit). Cheers!
MegaByte writes:
I've tried to compile and run the examples given but whenever I hit the function pointer call, the program segfaults. I'm attempting this on gcc 4.3.2 (Fedora 10 on a Pentium M). Does anybody know what might be wrong? Perhaps more robust data execution prevention? Are there any compiler flags that I should be aware of?
MegaByte writes:
So it turns out that my problems were indeed due to DEP. I found two solutions: either turn off the NX bit for the entire executable with "-z execstack" linker option, or use mprotect in conjunction with valloc to directly turn off the NX bit for the generated code array in the code itself.
Randy Hollines writes:
Executing code for amd64/x64 architectures... thanks to Aaron Kaluszka at Cal Berkeley

// execute buffer for amd64/x64
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

using namespace std;

typedef long (*jit_fun_ptr)(long v);

int main() {
  unsigned char buffer[] = {
    0x40, 0x81, 0xc7, 0x0a, 0x00, 0x00, 0x00, // param + const (addl $10, %edi)
    0x40, 0x89, 0xf8,                         // set return register (movl %edi, %eax)
    0xc3                                      // return
  };
  int total_size = sizeof(buffer);
  jit_fun_ptr jit_fun = (jit_fun_ptr)valloc(total_size);
  memcpy((void*)jit_fun, buffer, total_size);
  mprotect((void*)jit_fun, total_size, PROT_READ | PROT_EXEC);
  cout << jit_fun(12L) << endl;
  return 0;
}

 
Created: 2006-11-19 @611 Last change: 2010-07-29 @719
© Copyright 2003-2010 by M. Uli Kusterer, all rights reserved.