0x01 launchd

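launchd is the first user-space process started by the kernel, and it is responsible for starting every other process in the system, directly or indirectly. It is the ancestor of all user-mode processes and also manages two kinds of background jobs: daemons and agents.

Daemons: background services that normally do not interact with the user, for example push notification handling, handling of external device attachment, and XPC services.

Agents: jobs that can interact with the user. Finder on the Mac and SpringBoard on iOS are examples; they are what we loosely think of as the desktop.

To see how launchd is created, first look at the XNU boot flow below.

XNU boot flow diagram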

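start (iOS): initializes the MSRs, sets up the physical page mappings, and installs the interrupt handlers
arm_init (iOS): initializes the platform in preparation for starting the kernel
machine_startup: parses command-line arguments and debug parameters
kernel_bootstrap: installs and initializes the Mach kernel subsystems, including inter-process communication, clocks, access policy, and process and thread scheduling.
kernel_bootstrap_thread: creates the idle thread, initializes the IOKit device driver framework, and initializes the shared modules that applications and dyld need at run time. If the kernel has MAC (Mandatory Access Control) policy enabled, MAC is also initialized here to keep the system secure.
bsd_init: does everything that remains on the kernel side and initializes the remaining subsystems: networking, file systems, pipes, memory caches, threads, processes, synchronization objects, permission policies, and so on. Once all of that is done, it executes /sbin/launchd to create the launchd process.

Now let's follow the initialization in the source to see how launchd actually gets started.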

void bsd_init(void) {
	......
    bsd_utaskbootstrap();
    ......
}

void bsd_utaskbootstrap(void) {
	thread_t thread;
	struct uthread *ut;

	// Clone the bootstrap process from the kernel process, without inheriting any task characteristics or memory from the kernel
	thread = cloneproc(TASK_NULL, COALITION_NULL, kernproc, FALSE, TRUE);

	/* Hold the reference as it will be dropped during shutdown */
	initproc = proc_find(1);				

	/*
	 * Since we aren't going back out the normal way to our parent,
	 * we have to drop the transition locks explicitly.
	 */
	proc_signalend(initproc, 0);
	proc_transend(initproc, 0);

	ut = (struct uthread *)get_bsdthread_info(thread);
	ut->uu_sigmask = 0;
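    // Call this on the newly created thread so that the task actually gets going.
    // It raises an asynchronous system trap (AST); Mach's AST handler treats this case specially and calls bsd_ast()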
	act_set_astbsd(thread);
	task_clear_return_wait(get_threadtask(thread));
}

void bsd_ast(thread_t thread) {
    ......
	if (!bsd_init_done) {
		bsd_init_done = 1;
		bsdinit_task();
	}
    ......
}

void bsdinit_task(void)
{
	proc_t p = current_proc();
	struct uthread *ut;
	thread_t thread;
	
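    // Name this first process, cloned from the kernel into user space, "init"
	process_name("init", p);
	// Creates a Mach kernel thread that runs ux_handler. ux_handler sets up a message loop that listens for exceptions, converts each exception it receives into a UNIX signal, and delivers it to the faulting thread.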
	ux_handler_init();

	thread = current_thread();
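    // By the time ux_handler_init() returns, ux_handler is already running on another thread and has registered ux_exception_port.
    // This call redirects all Mach exception messages to ux_exception_port.
    // Because every program is a descendant of launchd, they all inherit this exception port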
	(void) host_set_exception_ports(host_priv_self(),
					EXC_MASK_ALL & ~(EXC_MASK_RPC_ALERT),//pilotfish (shark) needs this port
					(mach_port_t) ux_exception_port,
					EXCEPTION_DEFAULT| MACH_EXCEPTION_CODES,
					0);

	ut = (uthread_t)get_bsdthread_info(thread);

    vm_init_before_launchd();


	bsd_init_kprintf("bsd_do_post - done");
	// Load launchd
	load_init_program(p);
	lock_trace = 1;
}

void load_init_program(proc_t p)
{
	uint32_t i;
	int error;
	vm_map_t map = current_map();
	mach_vm_offset_t scratch_addr = 0;
	mach_vm_size_t map_page_size = vm_map_page_size(map);

	(void) mach_vm_allocate_kernel(map, &scratch_addr, map_page_size, VM_FLAGS_ANYWHERE, VM_KERN_MEMORY_NONE);
    
    error = ENOENT;
    // Load the "init" program, which here means launchd
    // init_programs holds the paths of the candidate programs to run
	for (i = 0; i < sizeof(init_programs)/sizeof(init_programs[0]); i++) {
		printf("load_init_program: attempting to load %s\n", init_programs[i]);
        // Load the "init" program, i.e. launchd, into that first process cloned from the kernel
		error = load_init_program_at_path(p, (user_addr_t)scratch_addr, init_programs[i]);
		if (!error) {
			return;
		} else {
			printf("load_init_program: failed loading %s: errno %d\n", init_programs[i], error);
		}
	}

	panic("Process 1 exec of %s failed, errno %d", ((i == 0) ? "<null>" : init_programs[i-1]), error);
}

static int load_init_program_at_path(proc_t p, user_addr_t scratch_addr, const char* path)
{
    return execve(p, &init_exec_args, retval);
}

init_programs holds the candidate paths of the launchd binary:

static const char * init_programs[] = {
#if DEBUG
	"/usr/local/sbin/launchd.debug",
#endif
#if DEVELOPMENT || DEBUG
	"/usr/local/sbin/launchd.development",
#endif
	"/sbin/launchd",
};
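The fallback order in load_init_program can be illustrated with a small user-space sketch. This is not kernel code: `demo_exec` and `first_loadable` are hypothetical names, and `demo_exec` simply pretends that only the release binary exists, which is the usual situation on a non-development build.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's exec step: pretend that only
 * the release binary exists. Returns 0 on success, otherwise an errno. */
static int demo_exec(const char *path)
{
    return strcmp(path, "/sbin/launchd") == 0 ? 0 : ENOENT;
}

/* Try each candidate path in order and return the first one that loads.
 * Returns NULL when every candidate fails (the kernel panics instead). */
static const char *first_loadable(const char *const *paths, size_t count,
                                  int (*try_exec)(const char *))
{
    for (size_t i = 0; i < count; i++) {
        if (try_exec(paths[i]) == 0)
            return paths[i];
    }
    return NULL;
}
```

The debug and development paths are listed first so that a special build wins when it is present; otherwise the loop falls through to /sbin/launchd.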

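As we know, both iOS and macOS execute files in the Mach-O format, and launchd is no exception, so the steps that follow apply equally when any other process loads an app binary.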

int execve(proc_t p, struct execve_args *uap, int32_t *retval)
{
	struct __mac_execve_args muap;
	int err;

	memoryshot(VM_EXECVE, DBG_FUNC_NONE);

	muap.fname = uap->fname;
	muap.argp = uap->argp;
	muap.envp = uap->envp;
	muap.mac_p = USER_ADDR_NULL;
	err = __mac_execve(p, &muap, retval);

	return(err);
}

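0x02 Mach-O format

Mach-O is the executable file format of OS X and iOS, comparable to ELF on Android and PE on Windows. It is not limited to executables, though: dynamic libraries on iOS, for example, are also Mach-O files. The layout is shown below:

Mach-O format diagram

While a Mach-O file is being loaded, the kernel-side work is mostly basic process setup: allocating virtual memory, creating the main thread, and handling code signing and encryption. After the switch to user mode, the dynamic loader dyld takes over and continues processing the Mach-O, for example loading libraries and resolving symbols.

1. header

The header has the following layout: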

struct mach_header_64 {
	uint32_t	magic;		/* 0xfeedfacf means 64-bit, 0xfeedface means 32-bit */
	cpu_type_t	cputype;	/* CPU platform: arm, i386, ... */
	cpu_subtype_t	cpusubtype;	/* armv7, armv8, etc. */
	uint32_t	filetype;	/* file type, e.g. executable or dynamic library */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* total size of the load commands, in bytes */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved, currently unused */
};
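As a minimal sketch of how the magic field is used, the helper below classifies a buffer by its first four bytes. The magic constants are the ones from <mach-o/loader.h>, redefined locally so the snippet also builds on non-Apple systems; `macho_bits` is a hypothetical name, not a system API.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Magic values as defined in <mach-o/loader.h>. */
#define MH_MAGIC    0xfeedfaceu  /* 32-bit, host byte order */
#define MH_CIGAM    0xcefaedfeu  /* 32-bit, byte-swapped */
#define MH_MAGIC_64 0xfeedfacfu  /* 64-bit, host byte order */
#define MH_CIGAM_64 0xcffaedfeu  /* 64-bit, byte-swapped */

/* Hypothetical helper: returns 64 or 32 for a thin Mach-O image,
 * or 0 if the buffer does not start with a Mach-O magic. */
static int macho_bits(const void *buf, size_t len)
{
    uint32_t magic;

    if (len < sizeof(magic))
        return 0;
    memcpy(&magic, buf, sizeof(magic));   /* avoids unaligned reads */

    switch (magic) {
    case MH_MAGIC_64:
    case MH_CIGAM_64:
        return 64;
    case MH_MAGIC:
    case MH_CIGAM:
        return 32;
    default:
        return 0;
    }
}
```

The result decides which header struct applies: mach_header_64 carries the extra reserved field, so the load commands start 4 bytes later than in the 32-bit header.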

1.1 filetype

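The common Mach-O file types are:

MH_OBJECT

Object files, such as the .o files produced by compilation

Static library files, such as .a files (archives of object files)

MH_EXECUTE

Executable files; loosely, what we usually call the app binary, i.e. the file found after unpacking an ipa

MH_DYLIB

Dynamic libraries, such as a .dylib or the binary inside a .framework

MH_DYLINKER

The dynamic linker itself, i.e. dyld

MH_DSYM

Files that store the symbol information of a binary, commonly used for analyzing crash reports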

1.2 flags

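The common flags are:

MH_DYLDLINK

The file is input for the dynamic linker and cannot be statically link-edited again

MH_PIE

The main executable is loaded at a random address. Only meaningful when the file type is MH_EXECUTE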
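Both flags are single bits in the header's flags field, so they are tested with a mask. The sketch below uses the constant values from <mach-o/loader.h>; the two helper functions are hypothetical and only illustrate the rule stated above that MH_PIE matters for MH_EXECUTE files.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values as defined in <mach-o/loader.h>. */
#define MH_EXECUTE  0x2u        /* filetype: executable */
#define MH_DYLDLINK 0x4u        /* flag: input for the dynamic linker */
#define MH_PIE      0x200000u   /* flag: load main executable at a random address */

/* Hypothetical helper: true if the image asks to be dynamically linked. */
static bool needs_dyld(uint32_t flags)
{
    return (flags & MH_DYLDLINK) != 0;
}

/* Hypothetical helper: MH_PIE is only honoured for MH_EXECUTE files. */
static bool wants_random_base(uint32_t filetype, uint32_t flags)
{
    return filetype == MH_EXECUTE && (flags & MH_PIE) != 0;
}
```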

2. Load Commands

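The load commands describe the logical structure and layout of the file in virtual memory, so that the loader knows exactly how to set up and load the binary data. The structure is:

struct load_command {
	uint32_t cmd;		/* type of load command */
	uint32_t cmdsize;	/* total size of the command, in bytes */
};

The load commands immediately follow the mach_header, and their total size is stored in the header's sizeofcmds field. Every load command must begin with the two members cmd and cmdsize; on 64-bit architectures cmdsize must be a multiple of 8. cmd gives the type of the command. Common types are: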

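LC_SEGMENT (LC_SEGMENT_64)

Maps a (32-bit or 64-bit) segment of the file into the process address space. This covers the __text code section, constant data, Objective-C class information, and so on.

LC_LOAD_DYLINKER

Names the dynamic linker to launch, i.e. dyld

LC_UUID

A unique ID used to match a binary with its corresponding symbols

LC_THREAD

Starts a Mach thread without allocating a stack

LC_UNIXTHREAD

Starts a UNIX thread; now superseded by LC_MAIN

LC_CODE_SIGNATURE

Code signature; if the signature does not match the code, the process is killed

LC_ENCRYPTION_INFO

Encryption information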
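Because every command starts with cmd and cmdsize, the whole list can be walked without knowing each command's full layout. The following is a minimal sketch of such a walk over an in-memory copy of the load command area of a 64-bit image; `count_commands` is a hypothetical helper, and its validation is deliberately simpler than what the kernel does.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct load_command {
    uint32_t cmd;       /* type of load command */
    uint32_t cmdsize;   /* total size of the command, in bytes */
};

/* Hypothetical helper: counts the commands of type `wanted` among the
 * `ncmds` load commands stored in `cmds` (`sizeofcmds` bytes long).
 * Returns -1 if the command list is malformed. */
static int count_commands(const uint8_t *cmds, uint32_t ncmds,
                          uint32_t sizeofcmds, uint32_t wanted)
{
    size_t offset = 0;
    int found = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        /* There must be room for at least the common 8-byte prefix. */
        if (sizeofcmds - offset < sizeof(lc))
            return -1;
        memcpy(&lc, cmds + offset, sizeof(lc));

        /* 64-bit rule: cmdsize is a multiple of 8 and stays in bounds. */
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize % 8 != 0 ||
            lc.cmdsize > sizeofcmds - offset)
            return -1;

        if (lc.cmd == wanted)
            found++;
        offset += lc.cmdsize;   /* cmdsize is the stride to the next command */
    }
    return found;
}
```

The important design point is that cmdsize, not sizeof(struct load_command), is the stride: commands are variable-length, and an unknown cmd can be skipped safely as long as its size is sane.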

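An example of the load commands inside a real Mach-O file:

load_commands diagram

3. Universal Mach-O

Depending on the build configuration, we can produce a Mach-O file that contains a single architecture, such as armv7, or one that contains several. A file with multiple architectures is called a universal Mach-O, also known as a fat Mach-O. When a universal Mach-O is run, the loader picks the slice for the appropriate architecture and executes that code.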
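A fat file starts with a small big-endian table that says where each architecture's slice lives. The sketch below looks up a slice by CPU type, assuming the classic 32-bit fat_header and fat_arch layout (magic 0xcafebabe, then 20-byte entries). `fat_slice_offset` is a hypothetical helper; it takes the first exact match, whereas the kernel grades all candidates and picks the best one.

```c
#include <stdint.h>
#include <stddef.h>

#define FAT_MAGIC 0xcafebabeu   /* as defined in <mach-o/fat.h> */

/* Fat headers are always stored big-endian, whatever the host is. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: returns the file offset of the slice whose cputype
 * matches, or 0 if the buffer is not a fat file or has no such slice.
 * Entry layout: cputype, cpusubtype, offset, size, align (4 bytes each). */
static uint32_t fat_slice_offset(const uint8_t *buf, size_t len,
                                 uint32_t cputype)
{
    if (len < 8 || read_be32(buf) != FAT_MAGIC)
        return 0;

    uint32_t narch = read_be32(buf + 4);
    for (uint32_t i = 0; i < narch; i++) {
        size_t at = 8 + (size_t)i * 20;
        if (len < at + 20)
            return 0;                       /* truncated table */
        if (read_be32(buf + at) == cputype)
            return read_be32(buf + at + 8); /* the entry's offset field */
    }
    return 0;
}
```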

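0x03 Address Space Layout Randomization (ASLR)

If an application always started at the same fixed address in its process address space, the layout of memory would be highly predictable, which gives attackers a lot to work with. Most operating systems today therefore use a technique such as ASLR, which makes this kind of attack considerably harder.

Every time a process starts, its address space is randomized, i.e. shifted. This is implemented by the kernel sliding the Mach-O segments by a random offset. We will run into this technique again in the code below.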
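The arithmetic behind the slide is simple, and a rough sketch is shown here. It assumes the slide is a random number of pages, bounded by some maximum and then shifted into a byte offset; the function names, the bound, and the page shift are illustrative parameters rather than the kernel's actual interface.

```c
#include <stdint.h>

/* Hypothetical helper: turn a random value into a page-aligned slide.
 * `max_pages` bounds the slide and must be non-zero; `page_shift` is
 * log2 of the page size (12 for 4 KB pages, 14 for 16 KB pages). */
static uint64_t make_slide(uint64_t random_value, uint64_t max_pages,
                           unsigned page_shift)
{
    return (random_value % max_pages) << page_shift;
}

/* Every segment is moved by the same amount, so addresses inside the
 * image keep their relative distances. */
static uint64_t slid_address(uint64_t vmaddr, uint64_t slide)
{
    return vmaddr + slide;
}
```

Because the slide is a whole number of pages and is applied uniformly, page alignment and the relative layout of the segments are preserved; only the base changes from launch to launch.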

0x04 How dyld gets loaded

In UNIX, a process cannot be created from nothing; it can only be duplicated through the fork() system call.

int __mac_execve(proc_t p, struct __mac_execve_args *uap, int32_t *retval)
{
	char *bufp = NULL; 
	struct image_params *imgp;
	struct vnode_attr *vap;
	struct vnode_attr *origvap;
	int error;
	int is_64 = IS_64BIT_PROCESS(p);
	struct vfs_context context;
	struct uthread	*uthread;
	task_t new_task = NULL;
	boolean_t should_release_proc_ref = FALSE;
	boolean_t exec_done = FALSE;
	boolean_t in_vfexec = FALSE;
	void *inherit = NULL;
    
    context.vc_thread = current_thread();
	context.vc_ucred = kauth_cred_proc_ref(p);
    
    // Allocate one block of memory that holds imgp, vap, and origvap
    MALLOC(bufp, char *, (sizeof(*imgp) + sizeof(*vap) + sizeof(*origvap)), M_TEMP, M_WAITOK | M_ZERO);
	imgp = (struct image_params *) bufp;
	if (bufp == NULL) {
		error = ENOMEM;
		goto exit_with_error;
	}
	vap = (struct vnode_attr *) (bufp + sizeof(*imgp));
	origvap = (struct vnode_attr *) (bufp + sizeof(*imgp) + sizeof(*vap));
    
    // Initialize the image parameters
    imgp->ip_user_fname = uap->fname;
	imgp->ip_user_argv = uap->argp;
	imgp->ip_user_envv = uap->envp;
	imgp->ip_vattr = vap;
	imgp->ip_origvattr = origvap;
	imgp->ip_vfs_context = &context;
	imgp->ip_flags = (is_64 ? IMGPF_WAS_64BIT : IMGPF_NONE) | ((p->p_flag & P_DISABLE_ASLR) ? IMGPF_DISABLE_ASLR : IMGPF_NONE);
	imgp->ip_seg = (is_64 ? UIO_USERSPACE64 : UIO_USERSPACE32);
	imgp->ip_mac_return = 0;
	imgp->ip_cs_error = OS_REASON_NULL;
    
    uthread = get_bsdthread_info(current_thread());
	if (uthread->uu_flag & UT_VFORK) {
		imgp->ip_flags |= IMGPF_VFORK_EXEC;
		in_vfexec = TRUE;
    // Launching a program needs a new task and thread to be forked, so it takes this else branch
	} else {
		imgp->ip_flags |= IMGPF_EXEC;
        // Fork the child task and thread
        imgp->ip_new_thread = fork_create_child(current_task(),
					NULL, p, FALSE, p->p_flag & P_LP64, TRUE);
		/* task and thread ref returned by fork_create_child */
		if (imgp->ip_new_thread == NULL) {
			error = ENOMEM;
			goto exit_with_error;
		}

		new_task = get_threadtask(imgp->ip_new_thread);
		context.vc_thread = imgp->ip_new_thread;
	}
    
    // Parse and load the program image
    error = exec_activate_image(imgp);
    
    if (imgp->ip_new_thread != NULL) {
        new_task = get_threadtask(imgp->ip_new_thread);
	}

	if (!error && !in_vfexec) {
		p = proc_exec_switch_task(p, current_task(), new_task, imgp->ip_new_thread);
	
		should_release_proc_ref = TRUE;
	}

	kauth_cred_unref(&context.vc_ucred);
    
    if (!error) {
		task_bank_init(get_threadtask(imgp->ip_new_thread));
		proc_transend(p, 0);

		/* Sever any extant thread affinity */
		thread_affinity_exec(current_thread());

		/* Inherit task role from old task to new task for exec */
		if (!in_vfexec) {
			proc_inherit_task_role(get_threadtask(imgp->ip_new_thread), current_task());
		}

		thread_t main_thread = imgp->ip_new_thread;
		// Set up the process's main thread
		task_set_main_thread_qos(new_task, main_thread);
    }
    .......
}

static int exec_activate_image(struct image_params *imgp)
{
    ......
        // Call the loader function that matches the image format;
        // a fat binary, for example, has its own loader function
        for(i = 0; error == -1 && execsw[i].ex_imgact != NULL; i++) {

			error = (*execsw[i].ex_imgact)(imgp);
            ......
        }
    ......
}

The execsw table is defined as follows:

struct execsw {
	int (*ex_imgact)(struct image_params *);
	const char *ex_name;
} execsw[] = {
	{ exec_mach_imgact,		"Mach-o Binary" },
	{ exec_fat_imgact,		"Fat Binary" },
	{ exec_shell_imgact,		"Interpreter Script" },
	{ NULL, NULL}
};
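The dispatch pattern used by exec_activate_image can be sketched in user space as a table of activators that are tried in order until one stops returning -1. Everything below is illustrative: the `image` struct, the `act_*` functions, and `activate` are hypothetical, and each activator only checks a leading signature instead of really loading anything.

```c
#include <stddef.h>
#include <string.h>

struct image {
    const unsigned char *data;
    size_t len;
};

/* Returns 0 if the image starts with the signature, -1 ("not my format")
 * otherwise, mirroring the convention the execsw loop relies on. */
static int starts_with(const struct image *img, const void *sig, size_t n)
{
    return (img->len >= n && memcmp(img->data, sig, n) == 0) ? 0 : -1;
}

/* 64-bit little-endian Mach-O: 0xfeedfacf stored as cf fa ed fe. */
static int act_macho(const struct image *img)  { return starts_with(img, "\xcf\xfa\xed\xfe", 4); }
/* Fat binary: 0xcafebabe stored big-endian. */
static int act_fat(const struct image *img)    { return starts_with(img, "\xca\xfe\xba\xbe", 4); }
/* Interpreter script: starts with "#!". */
static int act_script(const struct image *img) { return starts_with(img, "#!", 2); }

static const struct activator {
    int (*fn)(const struct image *);
    const char *name;
} activators[] = {
    { act_macho,  "Mach-o Binary" },
    { act_fat,    "Fat Binary" },
    { act_script, "Interpreter Script" },
    { NULL, NULL }
};

/* Try each activator until one accepts the image; returns its name,
 * or NULL if no activator recognizes the format. */
static const char *activate(const struct image *img)
{
    int error = -1;
    const char *handled_by = NULL;

    for (size_t i = 0; error == -1 && activators[i].fn != NULL; i++) {
        error = activators[i].fn(img);
        if (error == 0)
            handled_by = activators[i].name;
    }
    return handled_by;
}
```

In the real kernel the fat and script activators do not finish the job themselves: they select a slice or an interpreter and the loop is run again on the result. That restart is omitted here.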

For a Mach-O binary the activator is exec_mach_imgact: load_machfile loads the Mach-O file, and activate_exec_state then processes the resulting information.

static int exec_mach_imgact(struct image_params *imgp)
{
    .......
    lret = load_machfile(imgp, mach_header, thread, &map, &load_result);
    .......
    lret = activate_exec_state(task, p, thread, &load_result);
}

load_return_t load_machfile(
	struct image_params	*imgp,
	struct mach_header	*header,
	thread_t 		thread,
	vm_map_t 		*mapp,
	load_result_t		*result
)
{

	lret = parse_machfile(vp, map, thread, header, file_offset, macho_size,
	                      0, aslr_page_offset, dyld_aslr_page_offset, result,
			      NULL, imgp);
}

static int activate_exec_state(task_t task, proc_t p, thread_t thread, load_result_t *result)
{
    ......
     // Set the entry point
    thread_setentrypoint(thread, result->entry_point);
    ......
}

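Once the Mach-O file has been parsed, the resulting information is processed, and one step of that is setting the entry point: when loading finishes, execution jumps to this entry point to run the program. The entry point is therefore critical, so what is it? Its value must be assigned while the Mach-O is being parsed, so we need to look at the parsing process first.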

static
load_return_t
parse_machfile(
	struct vnode 		*vp,       
	vm_map_t		map,
	thread_t		thread,
	struct mach_header	*header,
	off_t			file_offset,
	off_t			macho_size,
	int			depth,
	int64_t			aslr_offset,
	int64_t			dyld_aslr_offset,
	load_result_t		*result,
	load_result_t		*binresult,
	struct image_params	*imgp
)
{
	uint32_t		ncmds;
	struct load_command	*lcp;
	struct dylinker_command	*dlp = 0;
	integer_t		dlarchbits = 0;
	void *			control;
	load_return_t		ret = LOAD_SUCCESS;
	void *			addr;
	vm_size_t		alloc_size, cmds_size;
	size_t			offset;
	size_t			oldoffset;	/* for overflow check */
	int			pass;
	proc_t			p = current_proc();		/* XXXX */
	int			error;
	int 			resid = 0;
	size_t			mach_header_sz = sizeof(struct mach_header);
	boolean_t		abi64;
	boolean_t		got_code_signatures = FALSE;
	boolean_t		found_header_segment = FALSE;
	boolean_t		found_xhdr = FALSE;
	int64_t			slide = 0;
	boolean_t		dyld_no_load_addr = FALSE;
	boolean_t		is_dyld = FALSE;
	vm_map_offset_t		effective_page_mask = MAX(PAGE_MASK, vm_map_page_mask(map));
#if __arm64__
	uint32_t		pagezero_end = 0;
	uint32_t		executable_end = 0;
	uint32_t		writable_start = 0;
	vm_map_size_t		effective_page_size;

	effective_page_size = MAX(PAGE_SIZE, vm_map_page_size(map));
#endif /* __arm64__ */

	if (header->magic == MH_MAGIC_64 ||
	    header->magic == MH_CIGAM_64) {
	    	mach_header_sz = sizeof(struct mach_header_64);
	}

	/*
	 *	Break infinite recursion
	 */
	if (depth > 1) {
		return(LOAD_FAILURE);
	}
	// This function is called twice: first to parse the main program's Mach-O, then to parse dyld
	depth++;

	/*
	 *	Verify that the file's CPU architecture matches that of the running system
	 */
	if (((cpu_type_t)(header->cputype & ~CPU_ARCH_MASK) != (cpu_type() & ~CPU_ARCH_MASK)) ||
	    !grade_binary(header->cputype, 
	    	header->cpusubtype & ~CPU_SUBTYPE_MASK))
		return(LOAD_BADARCH);
		
	abi64 = ((header->cputype & CPU_ARCH_ABI64) == CPU_ARCH_ABI64);
	
    // Handle each file type differently
	switch (header->filetype) {
	
	// An executable, i.e. the app itself
	case MH_EXECUTE:
		if (depth != 1) {
			return (LOAD_FAILURE);
		}
#if CONFIG_EMBEDDED
		// A dynamically linked executable (input for the dynamic linker) always enters here, because dyld will process the main program again later
		if (header->flags & MH_DYLDLINK) {
			/* Check properties of dynamic executables */
			if (!(header->flags & MH_PIE) && pie_required(header->cputype, header->cpusubtype & ~CPU_SUBTYPE_MASK)) {
				return (LOAD_FAILURE);
			}
			result->needs_dynlinker = TRUE;
		} else {
			/* Check properties of static executables (disallowed except for development) */
#if !(DEVELOPMENT || DEBUG)
			return (LOAD_FAILURE);
#endif
		}
#endif /* CONFIG_EMBEDDED */

		break;
	
	// The dynamic linker
	case MH_DYLINKER:
		if (depth != 2) {
			return (LOAD_FAILURE);
		}
		is_dyld = TRUE;
		break;
		
	default:
		return (LOAD_FAILURE);
	}

	addr = kalloc(alloc_size);
	if (addr == NULL) {
		return LOAD_NOSPACE;
	}

	......

	// For a PIE executable, or for dyld itself, use the random ASLR offset as the slide
	if ((header->flags & MH_PIE) || is_dyld) {
		slide = aslr_offset;
	}

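	/*
	 *  Walk the load commands in up to four passes, each with a single job
	 *  0: check whether the code and data segments can be page-aligned
	 *  1: thread state, uuid, code signature
	 *  2: segments
	 *  3: dyld, encryption, check entry point
	 */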

	boolean_t slide_realign = FALSE;
#if __arm64__
	if (!abi64) {
		slide_realign = TRUE;
	}
#endif

	for (pass = 0; pass <= 3; pass++) {
		// If no alignment check is needed, go straight to the next pass
		if (pass == 0 && !slide_realign && !is_dyld) {
			/* if we dont need to realign the slide or determine dyld's load
			 * address, pass 0 can be skipped */
			continue;
		} else if (pass == 1) {
#if __arm64__
			boolean_t	is_pie;
			int64_t		adjust;

			is_pie = ((header->flags & MH_PIE) != 0);
			if (pagezero_end != 0 &&
			    pagezero_end < effective_page_size) {
				/* need at least 1 page for PAGEZERO */
				adjust = effective_page_size;
				MACHO_PRINTF(("pagezero boundary at "
					      "0x%llx; adjust slide from "
					      "0x%llx to 0x%llx%s\n",
					      (uint64_t) pagezero_end,
					      slide,
					      slide + adjust,
					      (is_pie
					       ? ""
					       : " BUT NO PIE ****** :-(")));
				if (is_pie) {
					slide += adjust;
					pagezero_end += adjust;
					executable_end += adjust;
					writable_start += adjust;
				}
			}
			if (pagezero_end != 0) {
				result->has_pagezero = TRUE;
			}
			if (executable_end == writable_start && 
			    (executable_end & effective_page_mask) != 0 &&
			    (executable_end & FOURK_PAGE_MASK) == 0) {

				 // Adjust the slide so that the boundary between the code and data segments is page-aligned
				adjust =
					(effective_page_size -
					 (executable_end & effective_page_mask));
				MACHO_PRINTF(("page-unaligned X-W boundary at "
					      "0x%llx; adjust slide from "
					      "0x%llx to 0x%llx%s\n",
					      (uint64_t) executable_end,
					      slide,
					      slide + adjust,
					      (is_pie
					       ? ""
					       : " BUT NO PIE ****** :-(")));
				if (is_pie)
					slide += adjust;
			}
#endif /* __arm64__ */

			if (dyld_no_load_addr && binresult) {
				// dyld's load address in user space = random slide + highest virtual address of the main binary, rounded to a page
				slide = vm_map_round_page(slide + binresult->max_vm_addr, effective_page_mask);
			}
		}

		
		offset = mach_header_sz;
		ncmds = header->ncmds;

		while (ncmds--) {

			/*
			 *	Get the address of the load_command to parse
			 */
			lcp = (struct load_command *)(addr + offset);
			oldoffset = offset;

			
			switch(lcp->cmd) {
			// Tells the kernel how to set up the new process's memory; these segments are mapped straight from the Mach-O file into memory
			case LC_SEGMENT: {
				struct segment_command *scp = (struct segment_command *) lcp;

				......
				
				// Map and parse the segment
				// A segment is further divided into sections, e.g. __objc_classlist and __objc_protolist
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);

				break;
			}
			// 64-bit variant: maps the given byte range of the file into virtual memory
			case LC_SEGMENT_64: {
				struct segment_command_64 *scp64 = (struct segment_command_64 *) lcp;
				
				......

				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);

				break;
			}
			// UNIX thread, including its stack
			case LC_UNIXTHREAD:
				if (pass != 1)
					break;
				ret = load_unixthread(
						 (struct thread_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			// Replacement for LC_UNIXTHREAD
			case LC_MAIN:
				......
				ret = load_main(
						 (struct entry_point_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			// Load the dynamic linker
			case LC_LOAD_DYLINKER:
				if (pass != 3)
					break;
				if ((depth == 1) && (dlp == 0)) {
					// Remember the address of the dynamic linker command
					dlp = (struct dylinker_command *)lcp;
					dlarchbits = (header->cputype & CPU_ARCH_MASK);
				} else {
					ret = LOAD_FAILURE;
				}
				break;
			// uuid
			case LC_UUID:
				if (pass == 1 && depth == 1) {
					ret = load_uuid((struct uuid_command *) lcp,
							(char *)addr + cmds_size,
							result);
				}
				break;
			// Code signature
			case LC_CODE_SIGNATURE:
				/* CODE SIGNING */
				if (pass != 1)
					break;
				/* pager -> uip ->
				   load signatures & store in uip
				   set VM object "signed_pages"
				*/
				ret = load_code_signature(
					(struct linkedit_data_command *) lcp,
					vp,
					file_offset,
					macho_size,
					header->cputype,
					result,
					imgp);

					.......

				break;
#if CONFIG_CODE_DECRYPTION
			// Information about the encrypted segment
			case LC_ENCRYPTION_INFO:
			case LC_ENCRYPTION_INFO_64:
				if (pass != 3)
					break;
				ret = set_code_unprotect(
					(struct encryption_info_command *) lcp,
					addr, map, slide, vp, file_offset,
					header->cputype, header->cpusubtype);
					......
				}
				break;
#endif
			default:
				/* Other commands are ignored by the kernel */
				ret = LOAD_SUCCESS;
				break;
			}
			if (ret != LOAD_SUCCESS)
				break;
		}
		if (ret != LOAD_SUCCESS)
			break;
	}

	if (ret == LOAD_SUCCESS) {

		/* Make sure if we need dyld, we got it */
		if (result->needs_dynlinker && !dlp) {
			ret = LOAD_FAILURE;
		}

		if ((ret == LOAD_SUCCESS) && (dlp != 0)) {
			/*
			 * Load the dynamic linker; this calls parse_machfile a second time
			 */
			ret = load_dylinker(dlp, dlarchbits, map, thread, depth,
					    dyld_aslr_offset, result, imgp);
		}

		.......
	}

	if (ret == LOAD_BADMACHO && found_xhdr) {
		ret = LOAD_BADMACHO_UPX;
	}

	kfree(addr, alloc_size);

	return ret;
}

The results of the process above are stored in the load_result_t structure:

typedef struct _load_result {
	user_addr_t		mach_header;
	user_addr_t		entry_point;

	user_addr_t		user_stack;
	mach_vm_size_t		user_stack_size;

	user_addr_t		user_stack_alloc;
	mach_vm_size_t		user_stack_alloc_size;

	mach_vm_address_t	all_image_info_addr;
	mach_vm_size_t		all_image_info_size;
    
	int			thread_count;
	unsigned int
		/* boolean_t */	unixproc	:1,
				needs_dynlinker : 1,
				dynlinker	:1,
				validentry	:1,
				has_pagezero    :1,
				using_lcmain	:1,
				is64bit         :1,
						:0;
	unsigned int		csflags;
	unsigned char		uuid[16];
	mach_vm_address_t	min_vm_addr;
	mach_vm_address_t	max_vm_addr;
	unsigned int		platform_binary;
	off_t			cs_end_offset;
	void			*threadstate;
	size_t			threadstate_sz;
} load_result_t;

So where is the final entry_point set? In load_dylinker:

static load_return_t load_dylinker{
    .......
	*myresult = load_result_null;
	myresult->is64bit = result->is64bit;

	ret = parse_machfile(vp, map, thread, header, file_offset,
	                     macho_size, depth, slide, 0, myresult, result, imgp);

	if (ret == LOAD_SUCCESS) {
		if (result->threadstate) {
			/* don't use the app's threadstate if we have a dyld */
			kfree(result->threadstate, result->threadstate_sz);
		}
		result->threadstate = myresult->threadstate;
		result->threadstate_sz = myresult->threadstate_sz;

		result->dynlinker = TRUE;
        // Set the entry_point of load_result_t to the entry point of dyld, the dynamic linker, so dyld is what runs first at launch.
		result->entry_point = myresult->entry_point;
		result->validentry = myresult->validentry;
		result->all_image_info_addr = myresult->all_image_info_addr;
		result->all_image_info_size = myresult->all_image_info_size;
		if (myresult->platform_binary) {
			result->csflags |= CS_DYLD_PLATFORM;
		}
	}
    ....
}

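Finally, to recap the app launch flow:

1. Fork a new process
2. Activate the app

a. Identify the file type; Mach-o Binary and Fat Binary each have their own loader function

b. Allocate memory

c. Parse the main program's Mach-O

d. Read the main program's Mach-O header

e. Walk each of the main program's load commands and map them into memory

f. Parse dyld, repeating d and e for it; along the way the entry_point address is changed to dyld's entry address

3. Jump to the entry given by entry_point, which starts dyld
4. Set the process's main thread

Once all of this is done, execution has moved from kernel mode to user mode.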
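The effect of step f on the entry point can be reduced to a few lines. This is a simplified sketch rather than the kernel's code: the struct keeps only the fields that matter here, and `final_entry` is a hypothetical helper that mimics what load_dylinker does to the result once dyld has been parsed successfully.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down stand-in for load_result_t. */
struct load_result {
    uint64_t entry_point;      /* where the first thread starts executing */
    bool     needs_dynlinker;  /* image was marked MH_DYLDLINK */
    bool     dynlinker;        /* dyld was loaded for this image */
};

/* Hypothetical helper: returns the address the new thread will start at.
 * `dyld` is the result of parsing dyld, or NULL if it was not loaded. */
static uint64_t final_entry(struct load_result *app,
                            const struct load_result *dyld)
{
    if (app->needs_dynlinker && dyld != NULL) {
        /* dyld's entry replaces the app's own entry point. */
        app->entry_point = dyld->entry_point;
        app->dynlinker = true;
    }
    return app->entry_point;
}
```

This is why a dynamically linked program always begins in dyld: the app's own entry address is found again later, in user space, by dyld itself.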

0x05 How dyld loads the program

After dyld has been loaded in that final step, execution enters dyld's entry point, __dyld_start, which is actually a piece of assembly:

__dyld_start:
	// Preparation: fetch the Mach-O header, the arguments, and similar information
	mov 	x28, sp
	and     sp, x28, #~15		// force 16-byte alignment of stack
	mov	x0, #0
	mov	x1, #0
	stp	x1, x0, [sp, #-16]!	// make aligned terminating frame
	mov	fp, sp			// set up fp to point to terminating frame
	sub	sp, sp, #16             // make room for local variables
	ldr     x0, [x28]		// get app's mh into x0
 	ldr     x1, [x28, #8]           // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
	add     x2, x28, #16		// get argv into x2
	adrp	x4,___dso_handle@page
	add 	x4,x4,___dso_handle@pageoff // get dyld's mh in to x4
	adrp	x3,__dso_static@page
	ldr 	x3,[x3,__dso_static@pageoff] // get unslid start of dyld
	sub 	x3,x4,x3		// x3 now has slide of dyld
	mov	x5,sp                   // x5 has &startGlue
	
	// Bootstrap; the entry is the dyldbootstrap::start function
	bl	__ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm
	// It returns the main program's entry address, which is saved in register x16
	mov	x16,x0                  
	ldr     x1, [sp]
	cmp	x1, #0
	b.ne	Lnew

	// LC_UNIXTHREAD path; superseded by LC_MAIN, so go straight to the LC_MAIN path below
	add	sp, x28, #8		
	br	x16			

	// LC_MAIN: set up the stack and jump to the main program's entry
Lnew:	mov	lr, x1		    // simulate return address into _start in libdyld.dylib
	ldr     x0, [x28, #8] 	    // arg 1 = argc
	add     x1, x28, #16	    // arg 2 = argv
	add	x2, x1, x0, lsl #3  
	add	x2, x2, #8	    // arg 3 = &env[0]
	mov	x3, x2
Lapple:	ldr	x4, [x3]
	add	x3, x3, #8
	cmp	x4, #0
	b.ne	Lapple		    // arg 4 = apple
	// Jump to the main program's main function
	br	x16

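__dyld_start first calls dyldbootstrap::start to do further processing of the main program, such as loading dynamic libraries. When that finishes it returns the main program's entry address; after setting up the arguments for that entry, control passes to the main program's main function. What we care about is what else happens before the main program starts.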

uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], 
				intptr_t slide, const struct macho_header* dyldsMachHeader,
				uintptr_t* startGlue)
{
	if ( slide != 0 ) {
		rebaseDyld(dyldsMachHeader, slide);
	}
    
	mach_init();

	// kernel sets up env pointer to be just past end of agv array
	const char** envp = &argv[argc+1];
	
	// kernel sets up apple pointer to be just past end of envp array
	const char** apple = envp;
	while(*apple != NULL) { ++apple; }
	++apple;

	// set up random value for stack canary
	__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
	// run all C++ initializers inside dyld
	runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif

	// now that we are done bootstrapping dyld, call dyld's main
	uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
	return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
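The pointer arithmetic that start uses to find envp and apple relies on the kernel laying out three NULL-terminated arrays back to back: argv, then the environment, then the apple strings. A minimal sketch of the same walk, with hypothetical helper names, looks like this:

```c
#include <stddef.h>

/* envp begins right after argv's terminating NULL. */
static const char **find_envp(int argc, const char *argv[])
{
    return &argv[argc + 1];
}

/* apple begins right after envp's terminating NULL. */
static const char **find_apple(const char **envp)
{
    while (*envp != NULL)
        ++envp;
    return envp + 1;
}
```

For a vector such as { "app", "-v", NULL, "HOME=/", NULL, "executable_path=/app", NULL } with argc equal to 2, find_envp points at "HOME=/" and find_apple at "executable_path=/app". This is the array that _main later queries with _simple_getenv for keys like executable_path.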

uintptr_t _main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
		int argc, const char* argv[], const char* envp[], const char* apple[], 
		uintptr_t* startGlue)
{
	dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_DYLD, 0, 0);

	// Grab the cdHash of the main executable from the environment
	uint8_t mainExecutableCDHashBuffer[20];
	const uint8_t* mainExecutableCDHash = nullptr;
	if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
		mainExecutableCDHash = mainExecutableCDHashBuffer;

	// Trace dyld's load
	notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
#if !TARGET_IPHONE_SIMULATOR
	// Trace the main executable's load
	notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif

	uintptr_t result = 0;
	sMainExecutableMachHeader = mainExecutableMH;
	sMainExecutableSlide = mainExecutableSlide;

	CRSetCrashLogMessage("dyld: launch started");
	// Set up the runtime context
	setContext(mainExecutableMH, argc, argv, envp, apple);

	// Pickup the pointer to the exec path.
	sExecPath = _simple_getenv(apple, "executable_path");

	// Remember short name of process for later logging
	sExecShortName = ::strrchr(sExecPath, '/');
	if ( sExecShortName != NULL )
		++sExecShortName;
	else
		sExecShortName = sExecPath;

    // Configure process restrictions
    configureProcessRestrictions(mainExecutableMH);

	checkEnvironmentVariables(envp);
    
	defaultUninitializedFallbackPaths(envp);
    
	if ( sEnv.DYLD_PRINT_OPTS )
		printOptions(argv);
	if ( sEnv.DYLD_PRINT_ENV ) 
		printEnvironmentVariables(envp);
	getHostInfo(mainExecutableMH, mainExecutableSlide);

	
	checkSharedRegionDisable((mach_header*)mainExecutableMH);
	// Map the shared cache
	if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
		mapSharedCache();
	}

	......
    
	// install gdb notifier
	stateToHandlers(dyld_image_state_dependents_mapped, sBatchHandlers)->push_back(notifyGDB);
	stateToHandlers(dyld_image_state_mapped, sSingleHandlers)->push_back(updateAllImages);
	// make initial allocations large enough that it is unlikely to need to be re-alloced
	sImageRoots.reserve(16);
	sAddImageCallbacks.reserve(4);
	sRemoveImageCallbacks.reserve(4);
	sImageFilesNeedingTermination.reserve(16);
	sImageFilesNeedingDOFUnregistration.reserve(8);


	try {
		// add dyld itself to UUID list
		addDyldImageToUUIDList();

#if SUPPORT_ACCELERATE_TABLES
		bool mainExcutableAlreadyRebased = false;
		if ( (sSharedCacheLoadInfo.loadAddress != nullptr) && !dylibsCanOverrideCache() && !sDisableAcceleratorTables && (sSharedCacheLoadInfo.loadAddress->header.accelerateInfoAddr != 0) ) {
			struct stat statBuf;
			if ( ::stat(IPHONE_DYLD_SHARED_CACHE_DIR "no-dyld2-accelerator-tables", &statBuf) != 0 )
				sAllCacheImagesProxy = ImageLoaderMegaDylib::makeImageLoaderMegaDylib(&sSharedCacheLoadInfo.loadAddress->header, sSharedCacheLoadInfo.slide, mainExecutableMH, gLinkContext);
		}

reloadAllImages:
#endif

		CRSetCrashLogMessage(sLoadingCrashMessage);
		// Instantiate the main program
		sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
		gLinkContext.mainExecutable = sMainExecutable;
		gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);


		gLinkContext.strictMachORequired = true;

	#if SUPPORT_ACCELERATE_TABLES
		sAllImages.reserve((sAllCacheImagesProxy != NULL) ? 16 : INITIAL_IMAGE_COUNT);
	#else
		sAllImages.reserve(INITIAL_IMAGE_COUNT);
	#endif

		// Now that shared cache is loaded, setup an versioned dylib overrides
	#if SUPPORT_VERSIONED_PATHS
		checkVersionedPaths();
	#endif


		// dyld_all_image_infos image list does not contain dyld
		// add it as dyldPath field in dyld_all_image_infos
		// for simulator, dyld_sim is in image list, need host dyld added
#if TARGET_IPHONE_SIMULATOR
		// get path of host dyld from table of syscall vectors in host dyld
		void* addressInDyld = gSyscallHelpers;
#else
		// get path of dyld itself
		void*  addressInDyld = (void*)&__dso_handle;
#endif
		char dyldPathBuffer[MAXPATHLEN+1];
		int len = proc_regionfilename(getpid(), (uint64_t)(long)addressInDyld, dyldPathBuffer, MAXPATHLEN);
		if ( len > 0 ) {
			dyldPathBuffer[len] = '\0'; // proc_regionfilename() does not zero terminate returned string
			if ( strcmp(dyldPathBuffer, gProcessInfo->dyldPath) != 0 )
				gProcessInfo->dyldPath = strdup(dyldPathBuffer);
		}

		// Load the inserted dynamic libraries
		if	( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
			for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
				loadInsertedDylib(*lib);
		}
		// record count of inserted libraries so that a flat search will look at 
		// inserted libraries, then main, then others.
		sInsertedDylibCount = sAllImages.size()-1;

		// Link the main program
		gLinkContext.linkingMainExecutable = true;
		link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
		sMainExecutable->setNeverUnloadRecursive();
		if ( sMainExecutable->forceFlat() ) {
			gLinkContext.bindFlat = true;
			gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
		}

		// Link the inserted dynamic libraries
		// do this after linking main executable so that any dylibs pulled in by inserted 
		// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
		if ( sInsertedDylibCount > 0 ) {
			for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
				ImageLoader* image = sAllImages[i+1];
				link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
				image->setNeverUnloadRecursive();
			}
			// only INSERTED libraries can interpose
			// register interposing info after all inserted libraries are bound so chaining works
			for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
				ImageLoader* image = sAllImages[i+1];
				image->registerInterposing();
			}
		}

		// <rdar://problem/19315404> dyld should support interposition even without DYLD_INSERT_LIBRARIES
		for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
			ImageLoader* image = sAllImages[i];
			if ( image->inSharedCache() )
				continue;
			image->registerInterposing();
		}
	#if SUPPORT_ACCELERATE_TABLES
		if ( (sAllCacheImagesProxy != NULL) && ImageLoader::haveInterposingTuples() ) {
			// Accelerator tables cannot be used with implicit interposing, so relaunch with accelerator tables disabled
			ImageLoader::clearInterposingTuples();
			// unmap all loaded dylibs (but not main executable)
			for (long i=1; i < sAllImages.size(); ++i) {
				ImageLoader* image = sAllImages[i];
				if ( image == sMainExecutable )
					continue;
				if ( image == sAllCacheImagesProxy )
					continue;
				image->setCanUnload();
				ImageLoader::deleteImage(image);
			}
			// note: we don't need to worry about inserted images because if DYLD_INSERT_LIBRARIES was set we would not be using the accelerator table
			sAllImages.clear();
			sImageRoots.clear();
			sImageFilesNeedingTermination.clear();
			sImageFilesNeedingDOFUnregistration.clear();
			sAddImageCallbacks.clear();
			sRemoveImageCallbacks.clear();
			sDisableAcceleratorTables = true;
			sAllCacheImagesProxy = NULL;
			sMappedRangesStart = NULL;
			mainExcutableAlreadyRebased = true;
			gLinkContext.linkingMainExecutable = false;
			resetAllImages();
			goto reloadAllImages;
		}
	#endif

		// apply interposing to initial set of images
		for(int i=0; i < sImageRoots.size(); ++i) {
			sImageRoots[i]->applyInterposing(gLinkContext);
		}
		gLinkContext.linkingMainExecutable = false;
		
		// <rdar://problem/12186933> do weak binding only after all inserted images linked
        // Weak symbol binding
		sMainExecutable->weakBind(gLinkContext);


		CRSetCrashLogMessage("dyld: launch, running initializers");
        // Run the initializers
		initializeMainExecutable(); 

		// notify any montoring proccesses that this process is about to enter main()
		dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_MAIN_DYLD2, 0, 0);
		notifyMonitoringDyldMain();

		// Find the main program's entry point
		result = (uintptr_t)sMainExecutable->getThreadPC();
		if ( result != 0 ) {
			// main executable uses LC_MAIN, needs to return to glue in libdyld.dylib
			if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
				*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
			else
				halt("libdyld.dylib support not present for LC_MAIN");
		}
		else {
			// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
			result = (uintptr_t)sMainExecutable->getMain();
			*startGlue = 0;
		}
	}
	catch(const char* message) {
		syncAllImages();
		halt(message);
	}
	catch(...) {
		dyld::log("dyld: launch failed\n");
	}

	CRSetCrashLogMessage(NULL);

	if (sSkipMain) {
		dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_MAIN, 0, 0);
		result = (uintptr_t)&fake_main;
		*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
	}
	
	return result;
}

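The main steps are:

Set up the runtime context
Load the shared cache
Instantiate the main executable
Load the inserted dynamic libraries
Link the main executable
Link the inserted dynamic libraries
Initialize the main executable
Find the main executable's entry point
Enter the main executable's entry point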

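Loading the shared cache: mapSharedCache

The system dynamic libraries that every app depends on, such as libobjc or libdispatch, are mapped into a single shared region, and every app reads the library contents from there. This saves a large amount of memory.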

static void mapSharedCache()
{
	dyld3::SharedCacheOptions opts;
	opts.cacheDirOverride	= sSharedCacheOverrideDir;
	opts.forcePrivate		= (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion);
	opts.useHaswell			= sHaswell;
	opts.verbose			= gLinkContext.verboseMapping;
    // load the dyld shared cache
	loadDyldCache(opts, &sSharedCacheLoadInfo);

	// update global state
	if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
		dyld::gProcessInfo->processDetachedFromSharedRegion = opts.forcePrivate;
		dyld::gProcessInfo->sharedCacheSlide                = sSharedCacheLoadInfo.slide;
		dyld::gProcessInfo->sharedCacheBaseAddress          = (unsigned long)sSharedCacheLoadInfo.loadAddress;
		sSharedCacheLoadInfo.loadAddress->getUUID(dyld::gProcessInfo->sharedCacheUUID);
		dyld3::kdebug_trace_dyld_image(DBG_DYLD_UUID_SHARED_CACHE_A, (const uuid_t *)&dyld::gProcessInfo->sharedCacheUUID[0], {0,0}, {{ 0, 0 }}, (const mach_header *)sSharedCacheLoadInfo.loadAddress);
	}
}

bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results)
{
    results->loadAddress        = 0;
    results->slide              = 0;
    results->cachedDylibsGroup  = nullptr;
    results->errorMessage       = nullptr;

    if ( options.forcePrivate ) {
        // mmap cache into this process only
        return mapCachePrivate(options, results);
    }
    else {
        // the cache is already mapped into the shared region: just map that shared memory into this process's address space
        if ( reuseExistingCache(options, results) )
            return (results->errorMessage != nullptr);

        // if this is the first process to start, the shared region is still empty and the libraries must be mapped into it
        return mapCacheSystemWide(options, results);
    }
}

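Instantiating the main executable: instantiateFromLoadedImage

The main job here is to create an image loader (ImageLoader) for the main executable. There are three steps:

Check that the CPU architecture the main executable was built for matches the CPU architecture of the current device
Instantiate an ImageLoader
Add the ImageLoader to a global table of images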

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
		addImage(image);
		return (ImageLoaderMachO*)image;
	}
	
	throw "main executable not a known format";
}
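
The first of the three steps, the architecture check, can be sketched in plain C. This is only a simplified illustration and not dyld's isCompatibleMachO: the struct mini_mach_header and the function is_compatible are hypothetical names, and the two constants are assumed to mirror MH_MAGIC_64 and CPU_TYPE_ARM64 from the Mach-O headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror MH_MAGIC_64 and CPU_TYPE_ARM64 from the system headers. */
#define MINI_MH_MAGIC_64    0xfeedfacfu
#define MINI_CPU_TYPE_ARM64 0x0100000cu

/* Hypothetical cut-down header: only the leading fields of a 64-bit Mach-O header. */
struct mini_mach_header {
    uint32_t magic;
    uint32_t cputype;
    uint32_t cpusubtype;
    uint32_t filetype;
};

/* Returns true when the image is a 64-bit Mach-O built for host_cputype. */
bool is_compatible(const struct mini_mach_header *mh, uint32_t host_cputype)
{
    assert(mh != 0);
    if (mh->magic != MINI_MH_MAGIC_64)
        return false;               /* not a 64-bit Mach-O image */
    return mh->cputype == host_cputype;
}
```

The real check also handles fat binaries and CPU subtypes; this sketch deliberately ignores both.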

Let's look at the implementation of instantiateMainExecutable

ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
	bool compressed;
	unsigned int segCount;
	unsigned int libCount;
	const linkedit_data_command* codeSigCmd;
	const encryption_info_command* encryptCmd;
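    // Determine whether the main executable uses the compressed LINKEDIT format; nearly all current binaries do
    // The check looks for a load command of type LC_DYLD_INFO or LC_DYLD_INFO_ONLY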
    // switch (cmd->cmd) {
	// 		case LC_DYLD_INFO:
	// 		case LC_DYLD_INFO_ONLY:
	// 			if ( cmd->cmdsize != sizeof(dyld_info_command) )
	// 				throw "malformed mach-o image: LC_DYLD_INFO size wrong";
	// 			dyldInfoCmd = (struct dyld_info_command*)cmd;
	//			*compressed = true;
	//			break;
    //      .......
    // }
    //  ......
	sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
	// instantiate the concrete class based on the load commands; here an ImageLoaderMachOCompressed object is returned
	if ( compressed ) 
		return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
	else
		return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
}
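
The "compressed" decision made by sniffLoadCommands boils down to walking the load commands and looking for LC_DYLD_INFO or LC_DYLD_INFO_ONLY. A minimal sketch of that walk, assuming the buffer holds only the load commands that follow the Mach-O header; mini_load_command and has_dyld_info are hypothetical names, and the constants are assumed to match the values in mach-o/loader.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match LC_REQ_DYLD, LC_DYLD_INFO and LC_DYLD_INFO_ONLY. */
#define MINI_LC_REQ_DYLD       0x80000000u
#define MINI_LC_DYLD_INFO      0x22u
#define MINI_LC_DYLD_INFO_ONLY (0x22u | MINI_LC_REQ_DYLD)

/* Every load command starts with these two fields. */
struct mini_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Returns true if any of the ncmds commands is LC_DYLD_INFO(_ONLY). */
bool has_dyld_info(const uint8_t *cmds, uint32_t ncmds, uint32_t sizeofcmds)
{
    assert(cmds != 0);
    uint32_t offset = 0;            /* invariant: offset <= sizeofcmds */
    for (uint32_t i = 0; i < ncmds; ++i) {
        struct mini_load_command lc;
        if (sizeofcmds - offset < sizeof lc)
            return false;           /* truncated command list */
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return false;           /* malformed cmdsize */
        if (lc.cmd == MINI_LC_DYLD_INFO || lc.cmd == MINI_LC_DYLD_INFO_ONLY)
            return true;
        offset += lc.cmdsize;       /* step to the next command */
    }
    return false;
}
```

Unlike dyld, which throws on a malformed image, this sketch simply reports false.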

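Loading the inserted dynamic libraries: loadInsertedDylib

dyld loops over the list of dynamic libraries named in the DYLD_INSERT_LIBRARIES environment variable and calls this function for each one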

static void loadInsertedDylib(const char* path)
{
	ImageLoader* image = NULL;
	unsigned cacheIndex;
	try {
		LoadContext context;
		.......
		image = load(path, context, cacheIndex);
	}
    ......
}
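
DYLD_INSERT_LIBRARIES holds a colon-separated list of paths, and loadInsertedDylib is called once per entry. A minimal sketch of iterating over such a list; next_inserted_path is a hypothetical helper, not a dyld function:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the next non-empty entry of a colon-separated list and stores its
 * length in *len, or returns NULL when the list is exhausted.
 * *cursor is advanced past the returned entry.
 */
const char *next_inserted_path(const char **cursor, size_t *len)
{
    assert(cursor != NULL && len != NULL);
    const char *p = *cursor;
    if (p == NULL)
        return NULL;
    while (*p == ':')               /* skip empty entries */
        ++p;
    if (*p == '\0') {
        *cursor = p;
        return NULL;
    }
    const char *start = p;
    while (*p != '\0' && *p != ':')
        ++p;
    *len = (size_t)(p - start);
    *cursor = p;
    return start;
}
```

A caller would initialize the cursor with the value of the environment variable and call the helper until it returns NULL, loading one library per entry.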

// 
ImageLoader* load(const char* path, const LoadContext& context, unsigned& cacheIndex)
{
    ......
    ImageLoader* image = loadPhase0(path, orgPath, context, cacheIndex, NULL);
    ......
}

// search under the DYLD_ROOT_PATH roots
static ImageLoader* loadPhase0(const char* path, const char* orgPath, const LoadContext& context, unsigned& cacheIndex, std::vector<const char*>* exceptions)
{
#if SUPPORT_ROOT_PATH
	// handle DYLD_ROOT_PATH which forces absolute paths to use a new root
	if ( (gLinkContext.rootPaths != NULL) && (path[0] == '/') ) {
		for(const char* const* rootPath = gLinkContext.rootPaths ; *rootPath != NULL; ++rootPath) {
			char newPath[strlen(*rootPath) + strlen(path)+2];
			strcpy(newPath, *rootPath);
			strcat(newPath, path);
			ImageLoader* image = loadPhase1(newPath, orgPath, context, cacheIndex, exceptions);
			if ( image != NULL )
				return image;
		}
	}
#endif

	// try raw path
	return loadPhase1(path, orgPath, context, cacheIndex, exceptions);
}
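
The DYLD_ROOT_PATH handling above only applies to absolute paths and simply prepends each root. A bounds-checked sketch of that single step, where apply_root_path is a hypothetical helper and not part of dyld:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Writes root followed by path into out.
 * Returns false if path is not absolute or the result does not fit.
 */
bool apply_root_path(const char *root, const char *path, char *out, size_t out_size)
{
    assert(root != NULL && path != NULL && out != NULL);
    if (path[0] != '/')
        return false;               /* only absolute paths are re-rooted */
    int n = snprintf(out, out_size, "%s%s", root, path);
    return n >= 0 && (size_t)n < out_size;
}
```

dyld itself sizes a stack buffer from the two string lengths and uses strcpy/strcat; snprintf is used here only to keep the sketch safe with a caller-supplied buffer.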

// loadPhase1 searches the LD_LIBRARY_PATH directories
// loadPhase2 through loadPhase4 handle the remaining search paths and path expansions (such as @executable_path) in a similar way
static ImageLoader* loadPhase5(const char* path, const char* orgPath, const LoadContext& context, unsigned& cacheIndex, std::vector<const char*>* exceptions)
{
	for (std::vector<DylibOverride>::iterator it = sDylibOverrides.begin(); it != sDylibOverrides.end(); ++it) {
		if ( strcmp(it->installName, path) == 0 ) {
			path = it->override;
			break;
		}
	}
	
	if ( exceptions != NULL ) 
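        // Try to load: if the library is in the shared cache, return its image loader (ImageLoader) directly; otherwise it lives on disk, so try to open it and, on success, call loadPhase6 to map the library into an image loader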
		return loadPhase5load(path, orgPath, context, cacheIndex, exceptions);
	else
        // check whether the image is already loaded; if so, return it directly
		return loadPhase5check(path, orgPath, context);
}

static ImageLoader* loadPhase6(int fd, const struct stat& stat_buf, const char* path, const LoadContext& context)
{
    ......
    if ( isCompatibleMachO(firstPages, path) ) {

		// check the Mach-O file type: only MH_BUNDLE, MH_DYLIB, and some MH_EXECUTE images can be loaded dynamically
		const mach_header* mh = (mach_header*)firstPages;
		switch ( mh->filetype ) {
			case MH_EXECUTE:
			case MH_DYLIB:
			case MH_BUNDLE:
				break;
			default:
				throw "mach-o, but wrong filetype";
		}
        ......
        // instantiate an image loader
		ImageLoader* image = ImageLoaderMachO::instantiateFromFile(path, fd, firstPages, headerAndLoadCommandsSize, fileOffset, fileLength, stat_buf, gLinkContext);
		
		// add it to the global image list
		return checkandAddImage(image, context);
    }
    ......
}

Linking the main executable: link

This step loads all dependent dynamic libraries, then rebases and binds symbols.

void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload, const RPathChain& loaderRPaths, const char* imagePath)
{
	
	// clear error strings
	(*context.setErrorStrings)(0, NULL, NULL, NULL);

	uint64_t t0 = mach_absolute_time();
    // recursively load all dependent dynamic libraries
	this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
	context.notifyBatch(dyld_image_state_dependents_mapped, preflightOnly);

	// we only do the loading step for preflights
	if ( preflightOnly )
		return;
		
	uint64_t t1 = mach_absolute_time();
	context.clearAllDepths();
    // recompute each library's dependency depth; the deeper the level, the larger the depth
	this->recursiveUpdateDepth(context.imageCount());

	uint64_t t2 = mach_absolute_time();
    // recursively rebase this image and the dynamic libraries it loaded
 	this->recursiveRebase(context);
	context.notifyBatch(dyld_image_state_rebased, false);
	
	uint64_t t3 = mach_absolute_time();
    // bind non-lazy symbols, fixing up the pointers that refer to symbols defined in other binaries
    // lazy symbols are bound at run time, on first use
 	this->recursiveBind(context, forceLazysBound, neverUnload);

	uint64_t t4 = mach_absolute_time();
    // weak symbol binding
	if ( !context.linkingMainExecutable )
		this->weakBind(context);
	uint64_t t5 = mach_absolute_time();	

	context.notifyBatch(dyld_image_state_bound, false);
	uint64_t t6 = mach_absolute_time();	

	std::vector<DOFInfo> dofs;
    // register the program's DOF sections for use by dtrace
	this->recursiveGetDOFSections(context, dofs);
	context.registerDOFs(dofs);
	uint64_t t7 = mach_absolute_time();	

	// interpose any dynamically loaded images
	if ( !context.linkingMainExecutable && (fgInterposingTuples.size() != 0) ) {
		this->recursiveApplyInterposing(context);
	}
	
	// clear error strings
	(*context.setErrorStrings)(0, NULL, NULL, NULL);

	fgTotalLoadLibrariesTime += t1 - t0;
	fgTotalRebaseTime += t3 - t2;
	fgTotalBindTime += t4 - t3;
	fgTotalWeakBindTime += t5 - t4;
	fgTotalDOF += t7 - t6;
	
	// done with initial dylib loads
	fgNextPIEDylibAddress = 0;
}

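Looking again at the Mach-O layout diagram, we can see __stubs and __stub_helper under the __TEXT segment, and __nl_symbol_ptr and __la_symbol_ptr under the __DATA segment.

Mach-O layout diagram

__nl_symbol_ptr and __la_symbol_ptr are the non-lazy binding pointer table and the lazy binding pointer table respectively. Each table stores the addresses that the corresponding symbol names resolve to.

Let's use an example to understand how __stubs, __stub_helper, __nl_symbol_ptr, and __la_symbol_ptr relate to each other. The test code is: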

#include <stdio.h>

int main(int argc, char * argv[]) {
    printf("test 1");
    printf("test 2");
    return 0;
}

Set a breakpoint on the first printf and switch to the disassembly view

testData`main:
    0x1000ca618 <+0>:  sub    sp, sp, #0x30             ; =0x30 
    0x1000ca61c <+4>:  stp    x29, x30, [sp, #0x20]
    0x1000ca620 <+8>:  add    x29, sp, #0x20            ; =0x20 
    0x1000ca624 <+12>: stur   wzr, [x29, #-0x4]
    0x1000ca628 <+16>: stur   w0, [x29, #-0x8]
    0x1000ca62c <+20>: str    x1, [sp, #0x10]
->  0x1000ca630 <+24>: adrp   x0, 4
    0x1000ca634 <+28>: add    x0, x0, #0x871            ; =0x871 
    ; branch into the __stubs section
    0x1000ca638 <+32>: bl     0x1000cc67c               ; symbol stub for: printf
   
// Look up what lives at 0x1000cc67c
// It is indeed inside the __stubs section
(lldb) image lookup --address 0x1000cc67c
      Address: testData[0x000000010000867c] (testData.__TEXT.__stubs + 540)
      Summary: testData`symbol stub for: printf

Set a breakpoint at 0x1000cc67c and see what the stub does

testData`printf:
->  0x1000cc67c <+0>: nop
	; jumps to 0x00000001000cc934, i.e. into __stub_helper
    0x1000cc680 <+4>: ldr    x16, #0x3b70              ; (void *)0x0000000100080934
    0x1000cc684 <+8>: br     x16
    
 // Look up 0x00000001000cc934: it is inside __stub_helper
 (lldb) image lookup --address 0x0000000100080934
      Address: testData[0x0000000100008934] (testData.__TEXT.__stub_helper + 588)
      Summary:

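0x0000000100080934 - 0x0000000000078000 = 0x0000000100008934, and 0x0000000100008934 is exactly the value stored in the printf slot of __la_symbol_ptr in the Mach-O file. (The listings here appear to come from runs with different ASLR slides: 0x00000001000cc934 under slide 0xc4000 and 0x0000000100080934 under slide 0x78000 are the same location.)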
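
The subtraction above is the general rule for mapping a runtime address back to the address recorded in the Mach-O file: subtract the ASLR slide. A small sketch using the values from the listings; file_address and slide_of are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Converts a runtime address back to the address recorded in the Mach-O file. */
uint64_t file_address(uint64_t runtime_address, uint64_t slide)
{
    assert(runtime_address >= slide);
    return runtime_address - slide;
}

/* Recovers the slide when the runtime and file addresses of one location are known. */
uint64_t slide_of(uint64_t runtime_address, uint64_t file_addr)
{
    assert(runtime_address >= file_addr);
    return runtime_address - file_addr;
}
```

For example, the printf stub at runtime address 0x1000cc67c is reported by lldb at file address 0x10000867c, which gives a slide of 0xc4000 for that run.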

Mach-O layout diagram

Next, set a breakpoint at 0x0000000100080934 and look at the instructions that follow

->  0x1000cc934: ldr    w16, 0x1000cc93c
    0x1000cc938: b      0x1000cc6e8
  
 (lldb) image lookup --address 0x1001046e8
      Address: testData[0x00000001000086e8] (testData.__TEXT.__stub_helper + 0)
      Summary:

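Set a breakpoint at 0x1000cc6e8 (shown as 0x1001046e8 in the listing below, which appears to come from a run with a different ASLR slide; both map to file address 0x1000086e8)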

0x1001046e8: adr    x17, #0x3998              ; (void *)0x00000001200da038: initialPoolContent + 2856
0x1001046ec: nop    
0x1001046f0: stp    x16, x17, [sp, #-0x10]!
0x1001046f4: nop 
; jump into the dyld_stub_binder function
0x1001046f8: ldr    x16, #0x3980              ; (void *)0x00000001944e915c: dyld_stub_binder
0x1001046fc: br     x16
0x100104700: ldr    w16, 0x100104708
0x100104704: b      0x1001046e8

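dyld_stub_binder is written in assembly. Its address is actually read from __nl_symbol_ptr, which on ARM is the __got section.

Mach-O layout diagram

To summarize: the first time printf is called, control goes to its stub, which jumps through the printf slot in __la_symbol_ptr. At that point the slot does not yet hold the real address of printf and instead points into __stub_helper, so the symbol has to be bound dynamically. The helper fetches the address of dyld_stub_binder from __nl_symbol_ptr and calls it to look up the real address of printf. Once found, printf is called and its address is written back into __la_symbol_ptr. On the next call to printf, the slot in __la_symbol_ptr already holds the real address and the stub jumps straight to it.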
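
The lazy-binding flow summarized above can be imitated with an ordinary function pointer. This is only a toy model in portable C, not how dyld is implemented: la_symbol_ptr stands in for one __la_symbol_ptr slot, lazy_helper for the __stub_helper entry, bind_symbol for dyld_stub_binder, and real_double for the real printf. All of these names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(int);

/* Stands in for the real library function (printf in the text). */
static int real_double(int x) { return 2 * x; }

/* Counts how often the slow binding path ran. */
int binder_calls = 0;

static int lazy_helper(int x);

/* Stand-in for one __la_symbol_ptr slot: initially points at the helper. */
static func_ptr la_symbol_ptr = lazy_helper;

/* Stand-in for dyld_stub_binder: resolve a symbol by name. */
static func_ptr bind_symbol(const char *name)
{
    ++binder_calls;
    return strcmp(name, "double") == 0 ? real_double : NULL;
}

/* Stand-in for the __stub_helper entry: bind, patch the slot, call the target. */
static int lazy_helper(int x)
{
    func_ptr target = bind_symbol("double");
    assert(target != NULL);
    la_symbol_ptr = target;         /* later calls skip the binder */
    return target(x);
}

/* Stand-in for the __stubs entry: always jump through the slot. */
int call_through_stub(int x)
{
    return la_symbol_ptr(x);
}
```

Calling call_through_stub twice runs the binder only on the first call; the second call goes directly through the patched slot.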

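Initializing the main executable: initializeMainExecutable

This step initializes the main executable and the modules it depends on, such as dynamic libraries. This is also when libobjc, covered in an earlier article, runs its initialization.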

void initializeMainExecutable()
{
	// record that we've reached this step
	gLinkContext.startedInitializingMainExecutable = true;

	// run initialzers for any inserted dylibs
	ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
	initializerTimes[0].count = 0;
	const size_t rootCount = sImageRoots.size();
	if ( rootCount > 1 ) {
		for(size_t i=1; i < rootCount; ++i) {
			sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
		}
	}
	
	// run initializers for main executable and everything it brings up 
	sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
	
	// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
	if ( gLibSystemHelpers != NULL ) 
		(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

	// dump info if requested
	if ( sEnv.DYLD_PRINT_STATISTICS )
		ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
	if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
		ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
	uint64_t t1 = mach_absolute_time();
	mach_port_t thisThread = mach_thread_self();
	ImageLoader::UninitedUpwards up;
	up.count = 1;
	up.images[0] = this;
	processInitializers(context, thisThread, timingInfo, up);
	context.notifyBatch(dyld_image_state_initialized, false);
	mach_port_deallocate(mach_task_self(), thisThread);
	uint64_t t2 = mach_absolute_time();
	fgTotalInitTime += (t2 - t1);
}

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
									 InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
	uint32_t maxImageCount = context.imageCount()+2;
	ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
	ImageLoader::UninitedUpwards& ups = upsBuffer[0];
	ups.count = 0;
	// recursively initialize every image; Objective-C +load methods are also invoked during this step
	for (uintptr_t i=0; i < images.count; ++i) {
		images.images[i]->recursiveInitialization(context, thisThread, images.images[i]->getPath(), timingInfo, ups);
	}
	// If any upward dependencies remain, init them.
	if ( ups.count > 0 )
		processInitializers(context, thisThread, timingInfo, ups);
}
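
The recursive initialization runs the initializers of an image's dependencies before its own. A minimal sketch of that ordering on a toy dependency graph; struct image, struct init_trace, and recursive_init are hypothetical stand-ins for ImageLoader and recursiveInitialization, and upward dependencies are not modeled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS  4
#define MAX_TRACE 16

/* Hypothetical stand-in for an ImageLoader. */
struct image {
    const char   *name;
    struct image *deps[MAX_DEPS];
    size_t        dep_count;
    bool          initialized;
};

/* Records the order in which initializers "ran". */
struct init_trace {
    const struct image *order[MAX_TRACE];
    size_t              count;
};

/* Initializes all dependencies of img first, then img itself. */
void recursive_init(struct image *img, struct init_trace *trace)
{
    assert(img != NULL && trace != NULL);
    if (img->initialized)
        return;
    img->initialized = true;        /* mark first so dependency cycles terminate */
    for (size_t i = 0; i < img->dep_count; ++i)
        recursive_init(img->deps[i], trace);
    assert(trace->count < MAX_TRACE);
    trace->order[trace->count++] = img;   /* the image's own initializers run here */
}
```

With an app that depends on libobjc and libSystem, where libobjc itself depends on libSystem, the recorded order is libSystem, libobjc, then the app.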

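Finding the main executable's entry point: getThreadPC

For an LC_MAIN load command, the entry point is the start address of the Mach-O header plus the offset (entryoff) recorded in the command.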

void* ImageLoaderMachO::getThreadPC() const
{
	const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
	const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
	const struct load_command* cmd = cmds;
	for (uint32_t i = 0; i < cmd_count; ++i) {
		if ( cmd->cmd == LC_MAIN ) {
			entry_point_command* mainCmd = (entry_point_command*)cmd;
			void* entry = (void*)(mainCmd->entryoff + (char*)fMachOData);
			// <rdar://problem/8543820&9228031> verify entry point is in image
			if ( this->containsAddress(entry) )
				return entry;
			else
				throw "LC_MAIN entryoff is out of range";
		}
		cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
	}
	return NULL;
}
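
The address computation in getThreadPC is just base plus offset with a range check. A simplified sketch, assuming the image is one contiguous buffer; entry_point is a hypothetical helper that returns NULL where dyld would throw, and it checks against a single size rather than against the image's segments:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns image_base + entryoff, or NULL when entryoff lies outside the image. */
const uint8_t *entry_point(const uint8_t *image_base, size_t image_size, uint64_t entryoff)
{
    assert(image_base != NULL);
    if (entryoff >= image_size)
        return NULL;                /* entry offset is out of range */
    return image_base + entryoff;
}
```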