The Mac Hacker's Handbook

By Charles Miller

John Wiley & Sons

Copyright © 2009 Charles Miller
All right reserved.

ISBN: 978-0-470-39536-3

Chapter One

Mac OS X Architecture

This chapter begins by addressing many of the basics of a Mac OS X system. This includes the general architecture and the tools necessary to deal with the architecture. It then addresses some of the security improvements that come with version 10.5 "Leopard", the most recent version of Mac OS X. Many of these security topics will be discussed in great detail throughout this book.


Before we dive into the tools, techniques, and security of Mac OS X, we need to start by discussing how it is put together. To understand the details of Leopard, you need first to understand how it is built, from the ground up. As depicted in Figure 1-1, Mac OS X is built as a series of layers, including the XNU kernel and the Darwin operating system at the bottom, and the Aqua interface and graphical applications on the top. The important components will be discussed in the following sections.


The heart of Mac OS X is the XNU kernel. XNU is basically composed of a Mach core (covered in the next section) with supplementary features provided by Berkeley Software Distribution (BSD). Additionally, XNU is responsible for providing an environment for kernel drivers called the I/O Kit. We'll talk about each of these in more detail in upcoming sections. XNU is a Darwin package, so all of the source code is freely available. Therefore, it is completely possible to install the same kernel used by Mac OS X on any machine with supported hardware; however, as Figure 1-1 illustrates, there is much more to the user experience than just the kernel.

From a security researcher's perspective, Mac OS X feels just like a FreeBSD box with a pretty windowing system and a large number of custom applications. For the most part, applications written for BSD will compile and run without modification on Mac OS X. All the tools you are accustomed to using in BSD are available in Mac OS X. Nevertheless, the fact that the XNU kernel contains all the Mach code means that some day, when you have to dig deeper, you'll find many differences that may cause you problems and some you may be able to leverage for your own purposes. We'll discuss some of these important differences briefly; for more detailed coverage of these topics, see Mac OS X Internals: A Systems Approach (Addison-Wesley, 2006).


Mach, developed at Carnegie Mellon University by Rick Rashid and Avie Tevanian, originated as a UNIX-compatible operating system back in 1984. One of its primary design goals was to be a microkernel; that is, to minimize the amount of code running in the kernel and allow many typical kernel functions, such as file system, networking, and I/O, to run as user-level Mach tasks. In earlier Mach-based UNIX systems, the UNIX layer ran as a server in a separate task. However, in Mac OS X, Mach and the BSD code run in the same address space.

In XNU, Mach is responsible for many of the low-level operations you expect from a kernel, such as processor scheduling and multitasking and virtual-memory management.


The kernel also involves a large chunk of code derived from the FreeBSD code base. As mentioned earlier, this code runs as part of the kernel along with Mach and uses the same address space. The FreeBSD code within XNU may differ significantly from the original FreeBSD code, as changes had to be made for it to coexist with Mach. FreeBSD provides many of the remaining operations the kernel needs, including

* Processes * Signals * Basic security, such as users and groups * System call infrastructure * TCP/IP stack and sockets * Firewall and packet filtering

To get an idea of just how complicated the interaction between these two sets of code can be, consider the idea of the fundamental executing unit. In BSD the fundamental unit is the process. In Mach it is a Mach thread. The disparity is settled by each BSD-style process being associated with a Mach task consisting of exactly one Mach thread. When the BSD fork() system call is made, the BSD code in the kernel uses Mach calls to create a task and thread structure. Also, it is important to note that both the Mach and BSD layers have different security models. The Mach security model is based on port rights, and the BSD model is based on process ownership. Disparities between these two models have resulted in a number of local privilege-escalation vulnerabilities. Additionally, besides typical system cells, there are Mach traps that allow user-space programs to communicate with the kernel.

I/O Kit

I/O Kit is the open-source, object-oriented, device-driver framework in the XNU kernel and is responsible for the addition and management of dynamically loaded device drivers. These drivers allow for modular code to be added to the kernel dynamically for use with different hardware, for example. The available drivers are usually stored in the /System/Library/Extensions/ directory or a subdirectory. The command kextstat will list all the currently loaded drivers,

$ kextstat Index Refs Address Size Wired Name (Version) 1 1 0x0 0x0 0x0 (9.3.0) 2 55 0x0 0x0 0x0 (9.3.0) 3 3 0x0 0x0 0x0 (9.3.0) 4 74 0x0 0x0 0x0 (9.3.0) 5 79 0x0 0x0 0x0 (9.3.0) 6 72 0x0 0x0 0x0 (9.3.0) 7 39 0x0 0x0 0x0 (9.3.0) 8 1 0x0 0x0 0x0 (9.3.0) 9 1 0x0 0x0 0x0 (9.3.0) 10 1 0x0 0x0 0x0 (9.3.0) 11 1 0x0 0x0 0x0 (9.3.0) 12 31 0x0 0x0 0x0 (7.9.9) 13 1 0x0 0x0 0x0 (7.9.9) 14 1 0x0 0x0 0x0 (7.9.9) 15 1 0x0 0x0 0x0 (7.9.9) 16 1 0x0 0x0 0x0 (7.9.9) 17 17 0x2e2bc000 0x10000 0xf000 (2.4.1) <7 6 5 4> 18 10 0x2e2d2000 0x4000 0x3000 (1.2.0) <12> 19 3 0x2e321000 0x3d000 0x3c000 (1.2.1) <18 17 12 7 5 4> ...

Many of the entries in this list say they are loaded at address zero. This just means they are part of the kernel proper and aren't really device drivers-i.e., they cannot be unloaded. The first actual driver is number 17.

Besides kextstat, there are other functions you'll need to know for loading and unloading these drivers. Suppose you wanted to find and load the driver associated with the MS-DOS file system. First you can use the kextfind tool to find the correct driver.

$ kextfind -bundle-id -substring `msdos' /System/Library/Extensions/msdosfs.kext

Now that you know the name of the kext bundle to load, you can load it into the running kernel.

$ sudo kextload /System/Library/Extensions/msdosfs.kext kextload: /System/Library/Extensions/msdosfs.kext loaded successfully

It seemed to load properly. You can verify this and see where it was loaded.

$ kextstat | grep msdos 126 0 0x346d5000 0xc000 0xb000 (1.5.2) <7 6 5 2>

It is the 126th driver currently loaded. There are zero references to it (not surprising, since it wasn't loaded before we loaded it). It has been loaded at address 0x346d5000 and has size 0xc000. This driver occupies 0xb000 wired bytes of kernel memory. Next it lists the driver's name and version. It also lists the index of other kernel extensions that this driver refers to-in this case, looking at the full listing of kextstat, we see it refers to the "unsupported" mach, libkern, and bsd drivers. Finally, we can unload the driver.

$ sudo kextunload kextunload: unload kext /System/Library/Extensions/msdosfs.kext succeeded

Darwin and Friends

A kernel without applications isn't very useful. That is where Darwin comes in. Darwin is the non-Aqua, open-source core of Mac OS X. Basically it is all the parts of Mac OS X for which the source code is available. The code is made available in the form of a package that is easy to install. There are hundreds of available Darwin packages, such as X11, GCC, and other GNU tools. Darwin provides many of the applications you may already use in BSD or Linux for Mac OS X. Apple has spent significant time integrating these packages into their operating system so that everything behaves nicely and has a consistent look and feel when possible.

On the other hand, many familiar pieces of Mac OS X are not open source. The main missing piece to someone running just the Darwin code will be Aqua, the Mac OS X windowing and graphical-interface environment. Additionally, most of the common high-level applications, such as Safari, Mail, QuickTime, iChat, etc., are not open source (although some of their components are open source). Interestingly, these closed-source applications often rely on open-source software, for example, Safari relies on the WebKit project for HTML and JavaScript rendering. For perhaps this reason, you also typically have many more symbols in these applications when debugging than you would in a Windows environment.

Tools of the Trade

Many of the standard Linux/BSD tools work on Mac OS X, but not all of them. If you haven't already, it is important to install the Xcode package, which contains the system compiler (gcc) as well as many other tools, like the GNU debugger gdb. One of the most powerful tools that comes on Mac OS X is the object file displaying tool (otool). This tool fills the role of ldd, nm, objdump, and similar tools from Linux. For example, using otool you can use the -L option to get a list of the dynamically linked libraries needed by a binary.

$ otool -L /bin/ls /bin/ls: /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0) /usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.0.0) To get a disassembly listing, you can use the -tv option. $ otool -tv /bin/ps /bin/ps: (__TEXT,__text) section 00001bd0 pushl $0x00 00001bd2 movl %esp,%ebp 00001bd4 andl $0xf0,%esp 00001bd7 subl $0x10,%esp ...

You'll see many references to other uses for otool throughout this book.


You must be able to trace execution flow for processes. Before Leopard, this was the job of the ktrace command-line application. ktrace allows kernel trace logging for the specified process or command. For example, tracing the system calls of the ls command can be accomplished with

$ ktrace -tc ls

This will create a file called ktrace.out. To read this file, run the kdump command.

$ kdump 918 ktrace RET ktrace 0 918 ktrace CALL execve(0xbffff73c,0xbffffd14,0xbffffd1c) 918 ls RET execve 0 918 ls CALL issetugid 918 ls RET issetugid 0 918 ls CALL __sysctl(0xbffff7cc,0x2,0xbffff7d4,0xbffff7c8,0x8fe45a90,0xa) 918 ls RET __sysctl 0 918 ls CALL __sysctl(0xbffff7d4,0x2,0x8fe599bc,0xbffff878,0,0) 918 ls RET __sysctl 0 918 ls CALL __sysctl(0xbffff7cc,0x2,0xbffff7d4,0xbffff7c8,0x8fe45abc,0xd) 918 ls RET __sysctl 0 918 ls CALL __sysctl(0xbffff7d4,0x2,0x8fe599b8,0xbffff878,0,0) 918 ls RET __sysctl 0 ...

For more information, see the man page for ktrace.

In Leopard, ktrace is replaced by DTrace. DTrace is a kernel-level tracing mechanism. Throughout the kernel (and in some frameworks and applications) are special DTrace probes that can be activated. Instead of being an application with some command-line arguments, DTrace has an entire language, called D, to control its actions. DTrace is covered in detail in Chapter 4, "Tracing and Debugging," but we present a quick example here as an appetizer.

$ sudo dtrace -n `syscall:::entry {@[execname] = count()}' dtrace: description `syscall:::entry ` matched 427 probes ^C fseventsd 3 socketfilterfw 3 mysqld 6 httpd 8 pvsnatd 8 configd 11 DirectoryServic 14 Terminal 17 ntpd 21 WindowServer 27 mds 33 dtrace 38 llipd 60 SystemUIServer 69 launchd 182 nmblookup 288 smbclient 386 Finder 5232 Mail 5352

Here, this one line of D within the DTrace command keeps track of the number of system calls made by processes until the user hits Ctrl+C. The entire functionality of ktrace can be replicated with DTrace in just a few lines of D. Being able to peer inside processes can be very useful when bug hunting or reverse-engineering, but there will be more on those topics later in the book.


Objective-C is the programming language and runtime for the Cocoa API used extensively by most applications within Mac OS X. It is a superset of the C programming language, meaning that any C program will compile with an Objective-C compiler. The use of Objective-C has implications when applications are being reverse-engineered and exploited. More time will be spent on these topics in the corresponding chapters.

One of the most distinctive features of Objective-C is the way object-oriented programming is handled. Unlike in standard C++, in Objective-C, class methods are not called directly. Rather, they are sent a message. This architecture allows for dynamic binding; i.e., the selection of method implementation occurs at runtime, not at compile time. When a message is sent, a runtime function looks at the receiver and the method name in the message. It identifies the receiver's implementation of the method by the name and executes that method.

The following small example shows the syntactic differences between C++ and Objective-C from a source-code perspective.

#include @interface Integer : Object { int integer; } - (int) integer; - (id) integer: (int) _integer; @end

Here an interface is defined for the class Integer. An interface serves the role of a declaration. The hyphen character indicates the class's methods.

#import "Integer.h" @implementation Integer - (int) integer { return integer; } - (id) integer: (int) _integer { integer = _integer; } @end


Excerpted from The Mac Hacker's Handbook by Charles Miller Copyright © 2009 by Charles Miller. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.