Intel assembler on Mac OS X

I've always wanted to learn another assembler, and with one of my colleagues being a real assembler guru, and the Intel reference books on my bookshelf, and the Intel switch just behind us, I thought this would be a good opportunity to finally get going with x86 assembler.

Now, assembler programming under Mac OS X isn't quite as well documented as one would wish. There's no tutorial that I could find (lots of tutorials for Linux and Windows, but none for Mac OS X yet). This won't be one either. Rather, this is a blog posting where I share what I found out about assembler on OS X, and it is probably only useful to someone who already knows some assembler, but just doesn't know Intel on Mac OS X. My main approach is to compile C source code into assembler source files using GCC. Then I can look at that code and find out which assembler instructions correspond to which C statement. If all of this turns out to be correct and I should happen to have loads of time on my hands, I may still go out there and turn this into a decent tutorial.

The basics are pretty simple

	.text						# start of code indicator.
.globl _main					# make the main function visible to the outside.
_main:							# actually label this spot as the start of our main function.
	pushl	%ebp				# save the base pointer to the stack.
	movl	%esp, %ebp			# copy the current stack pointer into the base pointer.
	subl	$8, %esp			# Balance the stack onto a 16-byte boundary.
	movl	$0, %eax			# Stuff 0 into EAX, which is where result values go.
	leave						# leave cleans up base and stack pointers again.
	ret							# returns to whoever called us.

Now, the underscore in front of "main" is a convention of the Mac OS X ABI: every C symbol gets a leading underscore at the assembler level, so just accept it. When you enter the _main function, the return address (i.e. the instruction where the program will continue after the function has finished, aka "back pointer") has already been pushed on the stack, taking up 4 bytes. We also save the base pointer (the point where our caller can find its parameters on the stack) to the stack, and then set it to the current stack pointer (which is where we will find our own parameters). That takes another 4 bytes, so we have used 8 bytes now. Since the stack should be aligned on a 16-byte boundary before you call another function, we subtract another 8 from the stack pointer, which pads out the stack (two "pushl $0" instructions would have the same effect). If we used any local variables, we would use this opportunity to subtract more for them.

Now comes the actual body of our function. What we do is simply return 0. This is done by stuffing 0 in the eax register.

Finally, we have the tail end of our function, which executes leave (which cleans up by restoring our caller's base pointer and stack pointer) and then ret, which pops the return address off the stack and continues execution there.
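
If it helps to see the padding arithmetic spelled out, here is a small C sketch of it. The helper name frame_padding is made up for this illustration and is not part of any system API. It assumes 4 bytes for the return address and 4 for the saved base pointer, as described above, and returns the smallest amount that restores 16-byte alignment. A compiler is free to reserve more, as long as the total stays a multiple of 16.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only: given the number of bytes a
   function wants below its saved base pointer (locals plus outgoing
   parameters), return the smallest value to subtract from ESP so that the
   stack is 16-byte aligned again before the next call. */
size_t frame_padding(size_t bytes_needed)
{
	const size_t return_address = 4;	/* pushed by the call that got us here */
	const size_t saved_ebp = 4;			/* pushed by our own pushl %ebp */
	size_t used = return_address + saved_ebp + bytes_needed;
	size_t rounded = (used + 15) & ~(size_t)15;	/* round up to a multiple of 16 */

	return rounded - (return_address + saved_ebp);
}
```

With no locals and no parameters, frame_padding(0) yields 8, which is the value in the subl $8, %esp instruction above.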

Calling a local function

Calling a function is fairly simple, as long as it's a local one right in the same file as ours. In that case, what you do is you first define that function:

	.text
.globl _doSomething				# Our doSomething function.
_doSomething:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	nop							# does nothing.
	leave
	ret
.globl _main
_main:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$24, %esp			# 8 to align, 16 for our 4-byte parameter and padding.
	movl	$3, (%esp)			# write our parameter at the top of the stack (i.e. the padding sits above it).
	call	_doSomething		# call doSomething.
	movl	$0, %eax
	leave
	ret

"nop" is a do-nothing instruction I just inserted here to show where doSomething's code would go. That's pretty easy. You just write the function, put the parameters on the stack and use call to jump to the function, and that will take care of pushing the return address and all that. The only tricky thing is passing the parameters. You have to pad first, and then push (or mov, in our case) the parameters in reverse order, so that #1 ends up at the lowest address, right where the stack pointer points, with #2 four bytes above it, etc. That's because otherwise the function being called would have to skip the padding. Well, could be worse.

Accessing parameters

To access any parameters, you address relative to the base pointer. The value right at the base pointer is your caller's saved base pointer, followed by the return address, so you need to skip 4 + 4 = 8 bytes to reach the first parameter. Yes, skip means add: since the stack starts at the end of memory and grows towards the beginning, and you subtract from the stack pointer to make the stack larger, you need to add to the stack pointer to find something on the stack. The same applies to our base pointer, of course:

	movl	12(%ebp), %eax	# get parameter 2 at offset 4 + 4 + 4
	addl	8(%ebp), %eax	# get parameter 1 at offset 4 + 4

This loads your second parameter into eax and then adds the first parameter to it, leaving the result in eax, where it's ready for use as a return value. Note the ##(foo) syntax, which adds the number ## to the pointer in register foo and accesses the memory at that address. This is register-relative addressing.

An added benefit of this is that you can actually pass more parameters to a function than it knows to handle, and it will just ignore the rest.
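
To make the offsets concrete, here is a toy model of that stack frame in C. It is only a sketch: the frame is a plain array rather than a real stack, and the names load32, add_params and call_add are invented for this example. The array holds the saved base pointer, the return address and then the parameters in order, so parameter 1 sits at byte offset 8 and parameter 2 at offset 12. A third parameter is passed as well, to show that the callee simply never looks at it.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value at a byte offset from a simulated base pointer. */
static int32_t load32(const uint8_t *ebp, size_t offset)
{
	int32_t value;
	memcpy(&value, ebp + offset, sizeof value);
	return value;
}

/* Mirrors:  movl 12(%ebp), %eax  followed by  addl 8(%ebp), %eax */
int32_t add_params(const uint8_t *ebp)
{
	int32_t eax = load32(ebp, 12);	/* parameter 2 at offset 4 + 4 + 4 */
	eax += load32(ebp, 8);			/* parameter 1 at offset 4 + 4 */
	return eax;
}

/* Builds the frame a caller would leave behind and "calls" add_params.
   The two zeros stand in for the saved EBP and the return address. */
int32_t call_add(int32_t p1, int32_t p2, int32_t extra)
{
	int32_t frame[5] = { 0, 0, p1, p2, extra };
	return add_params((const uint8_t *)frame);
}
```

Calling call_add(3, 4, 99) adds only the first two values. The 99 sits at offset 16, where add_params never reads.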

Fetching data

To access data (e.g. strings), it gets trickier. You declare data like the following:

	.cstring
myHelloWorld:
	.ascii "Hello World!\0"
	.text
.globl _main
_main:
. . .

So, you add a .cstring section above the function, and in that you declare a label and use the .ascii keyword to actually stash your string there. So far, so good, there's only one problem:

All data manipulation is done using absolute addresses. But we don't know at what position in memory our program will be loaded. Labels aren't absolute addresses, they get compiled into relative offsets from the start of our code. So, how do we find out at which absolute address our string myHelloWorld is? Well, the trick Mach-O uses is that it knows that our program will be loaded as one huge chunk. So, we know that the distance between any instruction in our code and our string will always stay the same.

So, if we could only get the address of one instruction in our code that has a label, we could calculate the absolute address of our string from that. Now, look above, at our function call code. Notice anything? Our return address is an absolute pointer to the next instruction after a function call. So, all we need to do to get our address is call a function. When GCC compiles C source code, it names this helper function ___i686.get_pc_thunk.bx, which is quite a mouthful. Let's just call it _nextInstructionAddress:

. . .
	call	_nextInstructionAddress
myAnchorPoint:
. . .

That's what we call somewhere at the start of our code to find our own address. Note how I cleverly already added a label myAnchorPoint, which labels the instruction whose address we'll get. Then we define that function somewhere (e.g. at the bottom):

. . .
_nextInstructionAddress:
	movl	(%esp), %ebx
	ret

We don't even bother aligning the stack or changing and restoring the base pointer. This simply peeks at the top item on the stack (the return address) and stashes that in register ebx. Then it returns (and obviously doesn't use leave because we pushed no base pointer that it could restore).

Once we have this address in ebx, we can do the following to get our string's address into a register, and from there onto the stack:

. . .
	leal	myHelloWorld-myAnchorPoint(%ebx), %eax
	movl	%eax, (%esp)
. . .

LEA means "Load Effective Address", i.e. take an address and stash it into a register. myHelloWorld-myAnchorPoint calculates the difference between our two labels, and thus tells us how far myHelloWorld is from myAnchorPoint. Since myHelloWorld is probably at the start of the program, e.g. at address 3 maybe, and myAnchorPoint further down, say at address 20, what we get is a negative value, e.g. -17. And xxx(%ebx) is how you tell the assembler that you want to add an offset to a register to get a memory address. ebx contains the address of myAnchorPoint, so what this does is subtract 17 from myAnchorPoint's absolute address, giving us the absolute address of myHelloWorld! Whooo! And this mess is called "position-independent code".

Now, the "l" suffix on leal says we are dealing with a "Long" (which is 32 bits, i.e. the size of a pointer on a 32-bit CPU), which gets stashed into register eax. And the movl instruction moves that long from our register into the top item on the stack, ready for use as a parameter to a function.
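
The address calculation can be sketched in C as plain integer arithmetic. This is a model, not real position-independent code: the offsets 3 and 20 are the made-up numbers from the explanation above, and the function names are invented for this sketch. The point it illustrates is that the displacement between the two labels is fixed when the code is assembled, so the result comes out right no matter where the image is loaded.

```c
#include <stdint.h>

/* Offsets of the two labels from the start of the program image.
   3 and 20 are the illustrative numbers from the text, nothing more. */
enum { MY_HELLO_WORLD = 3, MY_ANCHOR_POINT = 20 };

/* Models what the thunk leaves in EBX: the absolute address of
   myAnchorPoint once the image has been loaded at load_base. */
uintptr_t anchor_address(uintptr_t load_base)
{
	return load_base + MY_ANCHOR_POINT;
}

/* Models:  leal myHelloWorld-myAnchorPoint(%ebx), %eax */
uintptr_t string_address(uintptr_t ebx)
{
	intptr_t displacement = MY_HELLO_WORLD - MY_ANCHOR_POINT;	/* -17 */
	return ebx + (uintptr_t)displacement;
}
```

For a load address of 0x1000, anchor_address gives 0x1014 and string_address turns that into 0x1003. Any other load address shifts both values by the same amount.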

Calling a system function

Now, it'd be really nice if we could printf() or something, right? Well, trouble is, we don't know the address of printf(). But this time it's actually easy. We add a new section at the bottom of our code:

. . .
	.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
_printf_stub:
	.indirect_symbol _printf
	hlt ; hlt ; hlt ; hlt ; hlt
_getchar_stub:
	.indirect_symbol _getchar
	hlt ; hlt ; hlt ; hlt ; hlt

This is a new section named __IMPORT,__jump_table. It has the type symbol_stubs and the attributes self_modifying_code and pure_instructions. 5 is the size of each stub in bytes, and intentionally is the same as the number of hlt statements below.

This section is special, because when our code is loaded, the dynamic linker will look at it. It will see that there is an .indirect_symbol directive for a function named "_printf", and will look up that function. Then it will replace the five hlt instructions, each of which is one byte in size, with a five-byte instruction that jumps to that address (hence the self_modifying_code). We also added a label for each indirect symbol, which we name the same as the symbol, just with "_stub" appended.

So, to call printf, all you have to do now is push the string on the stack and then

	call	_printf_stub

Which will jump to _printf_stub and immediately continue to printf itself. And just to show you that you can have several such imported symbols, I've also included a stub for getchar. Now note that the system usually doesn't name these symbols "_foo_stub", but rather "L_foo$stub" (yes, a label name can contain dollar signs. You can even put the label in quotes and have spaces in it...). Same difference.
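
A loose C analogy for such a stub is a function pointer that starts out unusable and gets filled in before the first call. This is only an analogy under my own assumptions: the real mechanism patches machine code, not a pointer, and the names printf_stub, bind_stubs and call_printf_stub are invented here. Our code only ever goes through the stub and never names printf at the call site.

```c
#include <stddef.h>
#include <stdio.h>

typedef int (*printf_fn)(const char *, ...);

/* Stands in for the five hlt bytes: until it is bound, the stub leads nowhere. */
static printf_fn printf_stub = NULL;

/* Hypothetical stand-in for the dynamic linker's job at load time:
   look up the real function and point the stub at it. */
void bind_stubs(void)
{
	printf_stub = printf;
}

/* Models:  call _printf_stub
   Returns whatever printf returns, i.e. the number of characters written. */
int call_printf_stub(const char *s)
{
	return printf_stub("%s", s);
}
```

After bind_stubs() has run, call_printf_stub("Hello World!\n") prints the string and returns 13, the number of characters written.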

Okay, so that's how much I've guessed my way through it so far. Comments? Corrections?

PS - Thanks to John Kohr, Alexandre Colucci, Jonas Maebe, Eric Albert and Jordan Krushen, all of whom helped me figure this out one way or the other. Thanks, guys!

Update: Added mention of how to actually access parameters.

Reader Comments:
rentzsch writes:
Excellent write-up: very approachable. Well done!
Jan Patera writes:
Hi, Do you know if there is a way to make the XCode disassembly feature to use the MacroAssembler syntax, commonly used on Wintel? E.g. "mov ebp,esp" instead of "movl %esp, %ebp" -- Jan
Uli Kusterer replies:
@Wolf: Thanks, Jon! :-) Coming from you, that means a lot to me. @Jan: Jordan pointed me at http://www.niksula.hut.fi/~mtiihone/intel2gas/, which is a converter to convert Intel syntax (what a lot of tutorials on the web use) to AT&T syntax (what GCC uses and hence I use in this tutorial). Its features also mention it has preliminary support for the reverse, AT&T to Intel. Maybe that can help you? Let me know whether it works for you.
bxd writes:
@Jan: Xcode->Preferences->Debugging->Disassembly Style from "AT&T" to "Intel". If you're in bare gdb for some reason, it's "set disassembly-flavor intel" vs "att".
Blake C. writes:
"We also save the *back* pointer (the point where our caller can find its parameters on the stack) to the stack, and set it to the current stack pointer..." "Yes, since the stack starts at the end of memory and grows towards the beginning, and you subtract from the stack pointer to make it larger, you need to add to the *stack* pointer to find something in it." Should *back* and *stack* above instead say 'base'? Thanks for the writeup :)
Uli Kusterer replies:
Blake, you're right, that should have been "base" pointer in the first case. The second case is meant to be more general, but I've clarified it a bit since, after all, the example code actually messes with the base pointer. Thanks for catching those!
Marc Haisenko writes:
Good writeup! Very nice! This is exactly how it works on Linux as well. The "Calling a system function" is new to me and very interesting, have to try that on Linux as well. There's only one thing left which I want to figure out: how to make a system call. On Linux, you put the arguments in %eax, %ebx, ... (%eax contains the number of the system call). Then you call interrupt 0x80. E.g. to print a string on stdout, you do:

	movl	$4, %eax		# syscall: write
	movl	$1, %ebx		# file descriptor 1 (stdout)
	movl	$string, %ecx	# address of the string
	movl	$len, %edx		# length of the string
	int	$0x80

On FreeBSD, it seems that instead of passing the arguments via registers they get passed via the user stack and call a different interrupt. I wonder how it's done in Mac OS X.
Uli Kusterer replies:
Haven't played with that yet. It may be the same as BSD on OS X, or it may be different (because BSD doesn't use Mach)... I'd be interested in hearing what you find out.
Pete H writes:
To answer Marc's question, on Mac OS you use the same 0x80 interrupt. For example, to write a 42 byte string labeled 'output' to stdout, the code would look something like:

	pushl	$42			# length
	lea	output, %eax
	pushl	%eax		# string address
	pushl	$1			# file descriptor number
	mov	$4, %eax		# system call number
	push	%eax
	int	$0x80		# make the system call

You can see mach/i386/syscall_sw.h for the different software interrupts available, and sys/syscall.h for the different system call numbers.
David Liontooth writes:
A bunch of us are trying to compile transcode (http://www.transcoding.org) on OSX, both PPC and MacIntel. We're finding that inline assembly is not sufficiently supported for the code to compile, so we were hoping you could give us a pointer (this is being discussed on the transcode and macports mailing lists). Transcode 1.0.3 compiles on PPC but not MacIntel; transcode 1.1.0 causes problems with the asm code on both. A couple of guys working on Pre Make Kit (pmk, see http://sourceforge.net/tracker/index.php?func=detail&aid=1569713&group_id=94395&atid=607694) really push this and end up frustrated, concluding "Projects that mix assembly and C code must all fail on MacOS X Intel machines." They also say mplayer avoids the problem by using C embedded assembly. Could you comment on this whole situation? What's the most likely successful way out? Cheers, Dave
Uli Kusterer replies:
@David: Honestly, I don't know. I've only just started to learn this stuff. I've heard in various places that assembler isn't even portable between different assemblers (i.e. between two manufacturers' "assembler compilers" for the same CPU), but there has to be a way. Whether there are problems with embedded or in-line assembly? No idea, I didn't even know the distinction existed.
Amade writes:
I modified a simple helloworld program as described here and it is working pretty well. I even managed to add another .cstring and print it. But there is some part of the code not mentioned here without which it doesn't compile:

	.section __TEXT,__textcoal_nt,coalesced,pure_instructions
	.weak_definition _nextInstructionAddress
	.private_extern _nextInstructionAddress

Could you tell something about it?
adil writes:
Excellent work ! keep it going and you will be the first one to write a tutorial on the topic :)
ema writes:
So, basically every access to a global variable involves a function call to find out our current address. And every call to a shared library function results in another indirect call. This is disturbingly inefficient.
Uli Kusterer replies:
@ema: Not at all. You only have to get the anchor's address once. Then you can calculate any subsequent addresses using the value in EBX. Not to mention that a CALL instruction isn't the same as a function call in a higher-level language. It does not do all the stack set-up and tear-down that a real function call in a high-level language would do, nor does it pass any parameters. So, as long as you don't clobber EBX (or save it somewhere and restore it later), this is completely efficient. Would absolute addressing be faster? Sure, but it's only a few instructions, and the benefit of being able to dynamically load code at runtime without having to worry about address space collisions far outweighs the downside of those three additional instructions. Stop thinking like a high-level programmer, already, this is machine language. ;-) A typical function call takes dozens of instructions.
pete writes:
Hey Uli Thanks for the guide - nice and simple! Tschuess!
André writes:
I just wanted to point out that there is an Assembly reference included with the Xcode documentation set, namely an Introduction to Mac OS X Assembler Guide: http://developer.apple.com/DOCUMENTATION/DeveloperTools/Reference/Assembler/ASMIntroduction/chapter_1_section_1.html#//apple_ref/doc/uid/TP30000851-CH211-DontLinkElementID_14 Also the documentation for as, although sometimes cryptic: http://sourceware.org/binutils/docs/as/index.html And finally, there is the ".intel_syntax noprefix" directive which enables you to use Intel ASM syntax within the GNU Assembler. noprefix means that you may leave out the %-sign in front of register names, to make it fully compatible with the Intel syntax (earlier versions of GAS also had Intel syntax support but still required the usage of % to identify register names). To switch back to AT&T syntax use ".att_syntax prefix"
André writes:
Sorry, forgot to add there is also some info on Mac OS X IA-32 Assembly in the following Apple docs: IA-32 Function Calling Conventions: http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/IA32.html
Tony writes:
Hi, I am using the gcc translated code as guidance as well. But I am having trouble making calls to Pthread library functions. I keep getting 'illegal instruction' at run time. And the worst is that the debuggers (like gdb) cannot reveal at which line the program actually crashed. The file attached below is the program I use to experiment with the C library function calls. The program works if I remove the call to '_pthread_mutex_init'. The function declaration looks like this: _pthread_mutex_init(pointer to mutex object, NULL) I am very very confused by Mac OSX. Can you help me find the error here? Thanks.

	.cstring
	.align 2
termination_msg:
	.ascii "Program has terminated.\0"
	.text
.globl _main
_main:
	pushl	%ebp
	movl	%esp, %ebp
	#pushl	%esi
	pushl	%ebx
	call	___i686.get_pc_thunk.bx
"L12$pb":						# indirect addressing for Mac OSX
	leal	termination_msg-"L12$pb"(%ebx), %eax
	pushl	%eax
	call	L_printf$stub
	addl	$4, %esp
	pushl	$0					# NULL=$0
	leal	L_mtx$non_lazy_ptr-"L12$pb"(%ebx), %eax
	movl	(%eax), %eax		# addr(mtx) is in eax now
	pushl	%eax
	call	L_pthread_mutex_init$stub
	addl	$8, %esp
	#pushl	$0
	#call	exit
	popl	%ebx
	#popl	%esi
	movl	%ebp, %esp
	popl	%ebp
	ret
	.comm _mtx, 44
	.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_mtx$non_lazy_ptr:
	.indirect_symbol _mtx
	.long 0
	.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
L_pthread_mutex_init$stub:
	.indirect_symbol _pthread_mutex_init
	hlt ; hlt ; hlt ; hlt ; hlt
L_printf$stub:
	.indirect_symbol _printf
	hlt ; hlt ; hlt ; hlt ; hlt
#--------------------------------------------------------------
	.subsections_via_symbols
	.section __TEXT,__textcoal_nt,coalesced,pure_instructions
	.weak_definition ___i686.get_pc_thunk.bx
	.private_extern ___i686.get_pc_thunk.bx
___i686.get_pc_thunk.bx:
	movl	(%esp), %ebx
	ret
Ted Henry writes:
> the underscore in front of "main" is a convention in C, so just accept it Should "C" be "assembly"?
alex writes:
So here's what I keep getting... does anyone have any ideas?

	$ cat asmtest.s
		.text
	.globl _main
	_main:
		pushl	%ebp
		movl	%esp, %ebp
		subl	$8, %esp
		movl	$0, %eax
		leave
		ret
	$ as -o asmtest asmtest.s
	$ chmod +x asmtest
	$ ./asmtest
	-bash: ./asmtest: Malformed Mach-o file
jan writes:
Great site! Essential resource!
Cory B writes:
Here's a nice Hello World program in OS X assembly language for the GNU assembler. Much of it is the same as FreeBSD assembly language. Notice I didn't calculate the absolute address of msg. The program works anyway. Assemble and link like this:

	$ as -o hello.o hello.S
	$ ld -e _start -o hello hello.o

	.data						# data section, where we declare our constants
msg:	.ascii "Hello, world!\n"	# our message
len:	.long . - msg				# length of our message... the . means the current position pointer...
	.text						# text section, where we execute instructions
.globl _start					# make _start visible to the linker
_syscall:						# interrupt 80h, "calls" the kernel to perform a syscall
	int	$0x80
	ret
_start:							# our entry point
	pushl	len					# push the length onto the stack
	pushl	$msg				# push the (relative) address of msg onto the stack
	pushl	$1					# push 1 onto the stack (the file descriptor for stdout)
	movl	$4, %eax			# load 4 into eax (the number for the write() syscall)
	call	_syscall			# tell the kernel to write msg to stdout
	addl	$12, %esp			# clean the excess off the register
	pushl	len					# push the length onto the stack, to return it from exit()
	mov	$1, %eax				# load 1 into eax (the number for the exit() syscall)
	call	_syscall			# tell the kernel to exit our program
-- EOF --

Here's a version for nasm for comparison. It's a nearly one-to-one correspondence, so I didn't bother adding comments (assemble with `nasm -f macho hello.asm`, link the same as before):

section .data
msg	db "Hello, world!",0xa
len	equ $ - msg
section .text
global _start
_syscall:
	int 0x80
	ret
_start:
	push dword len
	push dword msg
	push dword 1
	mov eax, 4
	call _syscall
	add esp, 12
	push dword len
	mov eax, 1
	call _syscall
David writes:
Hi, I just started playing around with assembler under OS X too and wrote my first little proggy. It basicly does nothing but calls the system call 1 which ends the program. Had some difficulties to get it up and running until I realized I couldn't simply use the -arch x86_64 option for as and ld. So now I guess that I have a 32 bit program where I wanted a 64 bit program. Does anyone here know what is needed to create a 64 bit program?
hdx writes:
Hi, I'm trying to run my first assembly program in my new iMac core 2 duo, but I'm getting the following error:

	ld: could not find entry point "_start" (perhaps missing crt1.o) for inferred architecture i386

The code:

# My first Assembly program
.data
HelloWorldString:
	.ascii "Hello World\n"
.text
.globl _start
_start:
	# Load all the arguments for write ()
	movl $4, %eax
	movl $1, %ebx
	movl $HelloWorldString, %ecx
	movl $12, %edx
	int $0x80
	# Need to exit the program
	movl $1, %eax
	movl $0, %ebx
	int $0x80

The command:

	ld -e _start -o Hello Hello.o

Can anybody help?
me writes:
@hdx you have to link it first. use:

	as file.s -o file.o
	ld file.o -o file
Mo writes:
Linux will (usually) only use INT 0x80 as a fallback system-call mechanism. If I remember rightly, there's a page present in every process (I can't honestly recall whether it's managed by the libc or by the kernel-I think the kernel, though) which contains stubs for the system calls - on modern systems, this is SYSENTER instructions, whereas on old ones it'll be INT 0x80 (which is a lot slower than SYSENTER). This way, applications and libraries don't have to care what the system call calling convention (phew) is, which is important when it's changed a few times over the lifetime of the OS and backwards-compatibility needs to be preserved. Somebody with more knowledge than I would have to chime in about whether Mac OS X/Darwin exhibits any kind of comparable behaviour :)
Mo writes:
(In relation to my previous comment) - Apple deliberately doesn't publicise the system call mechanism, save for the Darwin sources, and similarly doesn't support static linking. I'm not sure if they've said as much, but presumably this is so that they can change the mechanism in future OS release (or even a minor update) without having to recompile anything except libSystem. @Ted Henry: It's actually an ABI convention. Mac OS X uses leading underscores, as do (most) DOS compilers and Win32. No idea if Win64 does, but I'd guess so. Quite a few embedded systems do too. I don't know what the rationale was - I've a feeling that the underscore was (at least once upon a time) used to denote global symbols.
al writes:
@hdx if you're having compile problems, the syntax is gcc youasmfile.as -o executablename I made the assumption to use "as" and "ld" but those turned up with bus errors and linking problems and such.
leffe writes:
I tried to asm a hello world example i found which was written for freeBSD on my intel mac with nasm knowing the similarities between osx an bsd syscalls. It worked, but only if i produced a mach-o obj file (obviously!), or else i would get the: "ld: could not find entry point "_start" (perhaps missing crt1.o) for inferred architecture i386" error. when the correct obj file is produced: "ld -e _start -o foo foo.o" should work
André writes:
Just wanted to update the URLs to the official Assembler and Calling Convention docs by Apple. Seems they both got a very recent update too if you look at the update dates: PPC/IA32 Assembler Intro and Reference: http://developer.apple.com/mac/library/documentation/DeveloperTools/Reference/Assembler LowLevel ABI Function Call Guide: http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/LowLevelABI As usual, just reading the docs provided by Apple can get you already quite far :) Cheers! André
Created: 2006-11-08 @761 Last change: 2010-03-12 @379
© Copyright 2003-2010 by M. Uli Kusterer, all rights reserved.