dyld Startup Flow

0x01 launchd

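   launchd is the first user-space process started by the kernel, and it is responsible for starting, directly or indirectly, every other process in the system. It is the ancestor of all user-mode processes and also manages two kinds of background jobs: daemons and agents.

Daemons: background services that normally do not interact with the user, such as push notification handling, handling of attached external devices, and XPC.

Agents: programs that can interact with the user, such as Finder on the Mac or SpringBoard on iOS, i.e. what we broadly think of as the desktop.

   To see how launchd is created, first look at the XNU boot flow below.

Figure: XNU boot flow

  • start (iOS): initializes the MSRs, sets up the physical page mappings, and installs the interrupt handlers

  • arm_init (iOS): initializes the platform and prepares for booting the kernel

  • machine_startup: parses the command-line arguments and debug arguments

  • kernel_bootstrap: installs and initializes the Mach kernel subsystems, including inter-process communication, clocks, access policies, and process and thread scheduling.

  • kernel_bootstrap_thread: creates the idle thread, initializes the IOKit device driver framework, and initializes the shared modules that applications and dyld need at run time. If the kernel has MAC (mandatory access control) policies enabled, MAC is initialized as well to keep the system secure.

  • bsd_init: does everything that remains on the kernel side by initializing the various subsystems: networking, file systems, pipes, memory caches, threads, processes, synchronization objects, permission policies, and so on. Once everything is done, it executes /sbin/launchd to create the launchd process.

Let's look at the initialization code to see how launchd actually gets started.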

void bsd_init(void) {
......
bsd_utaskbootstrap();
......
}

void bsd_utaskbootstrap(void) {
thread_t thread;
struct uthread *ut;

// Clone the bootstrap process from the kernel process, without inheriting any task characteristics or memory from the kernel
thread = cloneproc(TASK_NULL, COALITION_NULL, kernproc, FALSE, TRUE);

/* Hold the reference as it will be dropped during shutdown */
initproc = proc_find(1);

/*
* Since we aren't going back out the normal way to our parent,
* we have to drop the transition locks explicitly.
*/
proc_signalend(initproc, 0);
proc_transend(initproc, 0);

ut = (struct uthread *)get_bsdthread_info(thread);
ut->uu_sigmask = 0;
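// To actually bring the task to life, call this function on the newly created thread.
// It raises an asynchronous system trap (AST); Mach's AST handler treats this case specially and calls bsd_ast()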
act_set_astbsd(thread);
task_clear_return_wait(get_threadtask(thread));
}

void bsd_ast(thread_t thread) {
......
if (!bsd_init_done) {
bsd_init_done = 1;
bsdinit_task();
}
......
}

void bsdinit_task(void)
{
proc_t p = current_proc();
struct uthread *ut;
thread_t thread;

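// Name this first process, cloned from kernel mode into user mode, "init"
process_name("init", p);
// Internally creates a Mach kernel thread running ux_handler. ux_handler sets up a message loop that listens for exceptions; when one arrives, it converts the exception into a UNIX signal and delivers it to the faulting thread.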
ux_handler_init();

thread = current_thread();
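// By the time ux_handler_init() returns, ux_handler is already running in another thread and has registered ux_exception_port.
// This call redirects all Mach exception messages to ux_exception_port.
// Since every process is a descendant of launchd, they all inherit this exception port.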
(void) host_set_exception_ports(host_priv_self(),
EXC_MASK_ALL & ~(EXC_MASK_RPC_ALERT),//pilotfish (shark) needs this port
(mach_port_t) ux_exception_port,
EXCEPTION_DEFAULT| MACH_EXCEPTION_CODES,
0);

ut = (uthread_t)get_bsdthread_info(thread);

vm_init_before_launchd();


bsd_init_kprintf("bsd_do_post - done");
// Load launchd
load_init_program(p);
lock_trace = 1;
}

void load_init_program(proc_t p)
{
uint32_t i;
int error;
vm_map_t map = current_map();
mach_vm_offset_t scratch_addr = 0;
mach_vm_size_t map_page_size = vm_map_page_size(map);

(void) mach_vm_allocate_kernel(map, &scratch_addr, map_page_size, VM_FLAGS_ANYWHERE, VM_KERN_MEMORY_NONE);

error = ENOENT;
// Load the "init" program, which here means launchd
// init_programs holds the paths of the programs to try
for (i = 0; i < sizeof(init_programs)/sizeof(init_programs[0]); i++) {
printf("load_init_program: attempting to load %s\n", init_programs[i]);
// Use the first thread cloned from the kernel to load the "init" program, i.e. launchd
error = load_init_program_at_path(p, (user_addr_t)scratch_addr, init_programs[i]);
if (!error) {
return;
} else {
printf("load_init_program: failed loading %s: errno %d\n", init_programs[i], error);
}
}

panic("Process 1 exec of %s failed, errno %d", ((i == 0) ? "<null>" : init_programs[i-1]), error);
}

static int load_init_program_at_path(proc_t p, user_addr_t scratch_addr, const char* path)
{
return execve(p, &init_exec_args, retval);
}

   init_programs holds the candidate paths of the launchd binary:

static const char * init_programs[] = {
#if DEBUG
"/usr/local/sbin/launchd.debug",
#endif
#if DEVELOPMENT || DEBUG
"/usr/local/sbin/launchd.development",
#endif
"/sbin/launchd",
};

    Executables on both iOS and the Mac are Mach-O files, and launchd is no exception, so the steps that follow apply equally to any other process loading an app.

int execve(proc_t p, struct execve_args *uap, int32_t *retval)
{
struct __mac_execve_args muap;
int err;

memoryshot(VM_EXECVE, DBG_FUNC_NONE);

muap.fname = uap->fname;
muap.argp = uap->argp;
muap.envp = uap->envp;
muap.mac_p = USER_ADDR_NULL;
err = __mac_execve(p, &muap, retval);

return(err);
}

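0x02 The Mach-O Format

  Mach-O is the executable file format of OS X and iOS, comparable to ELF on Android and PE on Windows. It is not limited to executables, though: dynamic libraries on iOS, for example, are also Mach-O files. The layout is shown below:

Figure: Mach-O format

  While a Mach-O file is being loaded, the kernel handles the basic setup of the process: allocating virtual memory, creating the main thread, and dealing with code signing and encryption. On the transition to user mode, the dynamic loader dyld is invoked and continues processing the Mach-O, for example loading libraries and resolving symbols.

1. header

  The header is laid out as follows: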

struct mach_header_64 {
uint32_t magic; /* 0xfeedfacf for 64-bit, 0xfeedface for 32-bit */
cpu_type_t cputype; /* CPU type, e.g. ARM or i386 */
cpu_subtype_t cpusubtype; /* CPU subtype, e.g. armv7, armv8 */
uint32_t filetype; /* file type, e.g. executable or dynamic library */
uint32_t ncmds; /* number of load commands */
uint32_t sizeofcmds; /* total size of all load commands, in bytes */
uint32_t flags; /* flags */
uint32_t reserved; /* reserved, currently unused */
};
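
  As a small illustration of how the magic field is used, here is a minimal sketch that classifies a file by its first four bytes and returns the header size that precedes the load commands. The magic values are the ones from <mach-o/loader.h> and <mach-o/fat.h>, redefined locally so the snippet builds anywhere; image_kind, classify_magic and macho_header_size are hypothetical names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Magic numbers, copied from <mach-o/loader.h> and <mach-o/fat.h>. */
#define MH_MAGIC    0xfeedfaceu /* 32-bit Mach-O */
#define MH_CIGAM    0xcefaedfeu /* 32-bit Mach-O, byte-swapped */
#define MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O */
#define MH_CIGAM_64 0xcffaedfeu /* 64-bit Mach-O, byte-swapped */
#define FAT_MAGIC   0xcafebabeu /* universal (fat) file */
#define FAT_CIGAM   0xbebafecau /* universal file, byte-swapped */

/* Hypothetical enum for this sketch. */
typedef enum { IMG_UNKNOWN, IMG_MACHO_32, IMG_MACHO_64, IMG_FAT } image_kind;

/* Classifies a file by the 32-bit magic at its start. */
image_kind classify_magic(uint32_t magic)
{
    switch (magic) {
    case MH_MAGIC:
    case MH_CIGAM:
        return IMG_MACHO_32;
    case MH_MAGIC_64:
    case MH_CIGAM_64:
        return IMG_MACHO_64;
    case FAT_MAGIC:
    case FAT_CIGAM:
        return IMG_FAT;
    default:
        return IMG_UNKNOWN;
    }
}

/*
 * Size of the header in front of the load commands:
 * 7 uint32_t fields for mach_header, 8 for mach_header_64.
 * Returns 0 for anything that is not a thin Mach-O file.
 */
size_t macho_header_size(uint32_t magic)
{
    switch (classify_magic(magic)) {
    case IMG_MACHO_32: return 7 * sizeof(uint32_t);
    case IMG_MACHO_64: return 8 * sizeof(uint32_t);
    default:           return 0;
    }
}
```

  The kernel does the same kind of check at the top of parse_machfile, where it switches mach_header_sz to sizeof(struct mach_header_64) when the magic is MH_MAGIC_64 or MH_CIGAM_64.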

1.1 filetype

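  The common Mach-O file types are:

  • MH_OBJECT

    Relocatable object files, such as the .o files produced by compilation

    Static libraries (.a files) are archives made up of such object files

  • MH_EXECUTE

    Executables: loosely speaking, what we call the app binary, i.e. the main binary found inside an unpacked ipa

  • MH_DYLIB

    Dynamic libraries, such as a .dylib or the binary inside a .framework

  • MH_DYLINKER

    The dynamic linker itself, i.e. dyld

  • MH_DSYM

    Files that store a binary's symbol information, commonly used to analyze crash reports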

1.2 flags

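  The common flags are:

  • MH_DYLDLINK

    The file is input for the dynamic linker and cannot be statically link-edited again

  • MH_PIE

    Load the main executable at a random address. Only meaningful when the file type is MH_EXECUTE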

2. Load Commands

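  Load commands describe the logical structure and layout of the file in virtual memory, so that the loader knows exactly how to set up and load the binary data. The structure is: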

struct load_command {
uint32_t cmd; /* type of load command */
uint32_t cmdsize; /* size of the command, in bytes */
};

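  The load commands directly follow the mach_header, and their total size is stored in the header's sizeofcmds field. Every load command must begin with the two fields cmd and cmdsize; cmdsize must be a multiple of 8 on 64-bit architectures (and a multiple of 4 on 32-bit ones). cmd is the command type, and the common types are: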

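  • LC_SEGMENT (LC_SEGMENT_64)

    Maps a (32-bit or 64-bit) segment of the file into the process address space. This covers the __text code section, constant data, Objective-C class information, and so on.

  • LC_LOAD_DYLINKER

    Names the dynamic linker to start, i.e. the path to dyld

  • LC_UUID

    A unique ID used to match a binary with its corresponding symbols

  • LC_THREAD

    Starts a Mach thread without allocating a stack

  • LC_UNIXTHREAD

    Starts a UNIX thread; now superseded by LC_MAIN

  • LC_CODE_SIGNATURE

    Code signature; if the signature does not match the code, the process is killed

  • LC_ENCRYPTION_INFO

    Encryption information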
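
  Since every load command starts with cmd and cmdsize, walking the command area is just a matter of hopping from one cmdsize to the next. The following is a minimal sketch of such a walk for a 64-bit file, with the bounds checks a careful parser would need. lc_prefix and count_load_commands are hypothetical names for this example; the LC_* values are copied from <mach-o/loader.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command types, copied from <mach-o/loader.h>. */
#define LC_LOAD_DYLINKER 0x0eu
#define LC_SEGMENT_64    0x19u
#define LC_UUID          0x1bu

/* Same two fields as struct load_command. */
struct lc_prefix {
    uint32_t cmd;
    uint32_t cmdsize;
};

/*
 * Counts the load commands of type `wanted` in the command area that
 * follows a 64-bit Mach-O header. `cmds` points at the first command,
 * `sizeofcmds` and `ncmds` come from the header.
 * Returns the count, or -1 if the command area is malformed.
 */
int count_load_commands(const uint8_t *cmds, size_t sizeofcmds,
                        uint32_t ncmds, uint32_t wanted)
{
    size_t offset = 0; /* invariant: offset <= sizeofcmds */
    int found = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct lc_prefix lc;

        /* There must be room for at least cmd and cmdsize. */
        if (sizeofcmds - offset < sizeof(lc))
            return -1;
        memcpy(&lc, cmds + offset, sizeof(lc));

        /* cmdsize must cover the prefix, be 8-byte aligned, and fit. */
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize % 8 != 0 ||
            lc.cmdsize > sizeofcmds - offset)
            return -1;

        if (lc.cmd == wanted)
            found++;
        offset += lc.cmdsize;
    }
    return found;
}
```

  The kernel's parse_machfile, shown later, runs the same kind of loop (offset starts just past the header and advances by each command's cmdsize), but dispatches on lcp->cmd instead of counting.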

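  In an actual Mach-O file the load commands look like this:

Figure: load commands

3. Universal Mach-O

  Depending on the build configuration, we can produce a Mach-O file that contains a single architecture, such as armv7, or one that contains several. A multi-architecture file is called a universal Mach-O, also known as a fat Mach-O. When a universal Mach-O is run, the loader picks the code for the matching architecture and executes that.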
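
  To make "picks the code for the matching architecture" concrete, here is a minimal sketch that looks up one architecture slice in a universal file held in memory. It assumes the 32-bit fat layout from <mach-o/fat.h>: a fat_header (magic, nfat_arch) followed by nfat_arch fat_arch records (cputype, cpusubtype, offset, size, align), all stored big-endian. fat_find_slice and read_be32 are hypothetical names, and the match is on cputype only, whereas the real loader also grades the cpusubtype.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC         0xcafebabeu /* from <mach-o/fat.h> */
#define FAT_HEADER_SIZE   8u          /* magic + nfat_arch */
#define FAT_ARCH_SIZE     20u         /* five 32-bit fields */

/* Reads a big-endian 32-bit value regardless of host byte order. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Finds the slice for `cputype` in a universal file of `len` bytes.
 * On success stores the slice's file offset and size and returns 0;
 * returns -1 if the file is not fat, is malformed, or has no such slice.
 */
int fat_find_slice(const uint8_t *file, size_t len, uint32_t cputype,
                   uint32_t *offset, uint32_t *size)
{
    if (len < FAT_HEADER_SIZE || read_be32(file) != FAT_MAGIC)
        return -1;

    uint32_t nfat = read_be32(file + 4);
    if (nfat > (len - FAT_HEADER_SIZE) / FAT_ARCH_SIZE)
        return -1; /* arch table would run past the end of the file */

    for (uint32_t i = 0; i < nfat; i++) {
        const uint8_t *arch = file + FAT_HEADER_SIZE + (size_t)i * FAT_ARCH_SIZE;

        if (read_be32(arch) != cputype)
            continue;

        uint32_t off = read_be32(arch + 8);
        uint32_t sz  = read_be32(arch + 12);
        if (off > len || sz > len - off)
            return -1; /* slice lies outside the file */

        *offset = off;
        *size = sz;
        return 0;
    }
    return -1;
}
```

  The slice found this way is itself an ordinary single-architecture Mach-O file, which is then parsed like any other.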

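0x03 Address Space Layout Randomization (ASLR)

  If every launch placed the application at the same fixed address in the process address space, the memory layout would be highly predictable, which gives attackers a lot to work with. Most operating systems therefore use ASLR, which makes such attacks considerably harder.

  Each time a process starts, its address space is randomized, i.e. shifted. The kernel implements this by sliding the Mach-O segments by a random offset. We will run into this technique in the code below.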
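
  The arithmetic behind the slide is simple, and a small sketch may help when reading the kernel code later. The helpers below are illustrative only: round_page, slide_address and make_slide are hypothetical names, and the way make_slide derives a page-aligned slide from a random number (keep a few low bits, then shift by the page size) is an assumption for this example rather than the kernel's exact code.

```c
#include <stdint.h>

/*
 * Rounds addr up to the next page boundary.
 * page_mask is page_size - 1, with page_size a power of two.
 */
uint64_t round_page(uint64_t addr, uint64_t page_mask)
{
    return (addr + page_mask) & ~page_mask;
}

/* Runtime address of something linked at vmaddr, once the image is slid. */
uint64_t slide_address(uint64_t vmaddr, int64_t slide)
{
    return vmaddr + (uint64_t)slide;
}

/*
 * Turns a raw random number into a page-aligned slide: keep `bits`
 * random bits (bits must be less than 32) and scale by the page size.
 */
int64_t make_slide(uint32_t random, unsigned bits, unsigned page_shift)
{
    uint64_t pages = random & ((1u << bits) - 1u);
    return (int64_t)(pages << page_shift);
}
```

  For example, with 16 KB pages (page_shift 14), a segment linked at 0x100000000 and a slide of 0x8000 ends up at 0x100008000. Because the slide is always a whole number of pages, segment alignment is preserved.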

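0x04 How dyld Gets Loaded

  In UNIX a process is not created from scratch; it can only be duplicated from an existing one via the fork() system call. The copy then calls execve() to replace its image with the new program.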
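
  Seen from user space, this fork-then-exec pattern looks like the sketch below. It only uses standard POSIX calls (fork, execl, waitpid); run_in_child is a hypothetical helper name, and going through /bin/sh is just a convenient way to have something to execute.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs `command` through /bin/sh in a child process.
 * Returns the child's exit status, or -1 if the child could not be
 * created, waited for, or did not exit normally.
 */
int run_in_child(const char *command)
{
    pid_t pid = fork(); /* duplicate the calling process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the copied image with a new program. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: wait for the child and report how it exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

  The execl call here ends up in the kernel's execve path, which is the code walked through next.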

int __mac_execve(proc_t p, struct __mac_execve_args *uap, int32_t *retval)
{
char *bufp = NULL;
struct image_params *imgp;
struct vnode_attr *vap;
struct vnode_attr *origvap;
int error;
int is_64 = IS_64BIT_PROCESS(p);
struct vfs_context context;
struct uthread *uthread;
task_t new_task = NULL;
boolean_t should_release_proc_ref = FALSE;
boolean_t exec_done = FALSE;
boolean_t in_vfexec = FALSE;
void *inherit = NULL;

context.vc_thread = current_thread();
context.vc_ucred = kauth_cred_proc_ref(p);

// Allocate one large block of memory
MALLOC(bufp, char *, (sizeof(*imgp) + sizeof(*vap) + sizeof(*origvap)), M_TEMP, M_WAITOK | M_ZERO);
imgp = (struct image_params *) bufp;
if (bufp == NULL) {
error = ENOMEM;
goto exit_with_error;
}
vap = (struct vnode_attr *) (bufp + sizeof(*imgp));
origvap = (struct vnode_attr *) (bufp + sizeof(*imgp) + sizeof(*vap));

// Initialize
imgp->ip_user_fname = uap->fname;
imgp->ip_user_argv = uap->argp;
imgp->ip_user_envv = uap->envp;
imgp->ip_vattr = vap;
imgp->ip_origvattr = origvap;
imgp->ip_vfs_context = &context;
imgp->ip_flags = (is_64 ? IMGPF_WAS_64BIT : IMGPF_NONE) | ((p->p_flag & P_DISABLE_ASLR) ? IMGPF_DISABLE_ASLR : IMGPF_NONE);
imgp->ip_seg = (is_64 ? UIO_USERSPACE64 : UIO_USERSPACE32);
imgp->ip_mac_return = 0;
imgp->ip_cs_error = OS_REASON_NULL;

uthread = get_bsdthread_info(current_thread());
if (uthread->uu_flag & UT_VFORK) {
imgp->ip_flags |= IMGPF_VFORK_EXEC;
in_vfexec = TRUE;
// Launching a program needs a newly forked process, so this else branch is taken
} else {
imgp->ip_flags |= IMGPF_EXEC;
// Fork the process
imgp->ip_new_thread = fork_create_child(current_task(),
NULL, p, FALSE, p->p_flag & P_LP64, TRUE);
/* task and thread ref returned by fork_create_child */
if (imgp->ip_new_thread == NULL) {
error = ENOMEM;
goto exit_with_error;
}

new_task = get_threadtask(imgp->ip_new_thread);
context.vc_thread = imgp->ip_new_thread;
}

// Parse the program image
error = exec_activate_image(imgp);

if (imgp->ip_new_thread != NULL) {
new_task = get_threadtask(imgp->ip_new_thread);
}

if (!error && !in_vfexec) {
p = proc_exec_switch_task(p, current_task(), new_task, imgp->ip_new_thread);

should_release_proc_ref = TRUE;
}

kauth_cred_unref(&context.vc_ucred);

if (!error) {
task_bank_init(get_threadtask(imgp->ip_new_thread));
proc_transend(p, 0);

/* Sever any extant thread affinity */
thread_affinity_exec(current_thread());

/* Inherit task role from old task to new task for exec */
if (!in_vfexec) {
proc_inherit_task_role(get_threadtask(imgp->ip_new_thread), current_task());
}

thread_t main_thread = imgp->ip_new_thread;
// Set up the process's main thread
task_set_main_thread_qos(new_task, main_thread);
}
.......
}

static int exec_activate_image(struct image_params *imgp)
{
......
// Call the loader function that matches the image format
// e.g. a fat (multi-architecture) binary has its own loader function
for(i = 0; error == -1 && execsw[i].ex_imgact != NULL; i++) {

error = (*execsw[i].ex_imgact)(imgp);
......
}
......
}

  The execsw table is defined as follows:

struct execsw {
int (*ex_imgact)(struct image_params *);
const char *ex_name;
} execsw[] = {
{ exec_mach_imgact, "Mach-o Binary" },
{ exec_fat_imgact, "Fat Binary" },
{ exec_shell_imgact, "Interpreter Script" },
{ NULL, NULL}
};
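
  The loop in exec_activate_image together with the execsw table is a classic dispatch table: try each handler in order until one stops answering "not mine" (-1). The sketch below imitates that pattern in a self-contained way. The handlers are deliberately trivial stand-ins that only look at the first bytes of the file (the real ones do far more), and struct image, the *_act functions and activate_image are hypothetical names for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct image_params: just the raw bytes of the file. */
struct image {
    const uint8_t *data;
    size_t len;
};

/* Returns 0 if the file starts with the given bytes, -1 ("not mine") otherwise. */
static int starts_with(const struct image *img, const uint8_t *magic, size_t n)
{
    return (img->len >= n && memcmp(img->data, magic, n) == 0) ? 0 : -1;
}

static int macho_act(const struct image *img)
{
    /* 0xfeedfacf as it appears in a little-endian 64-bit Mach-O file */
    static const uint8_t magic[] = { 0xcf, 0xfa, 0xed, 0xfe };
    return starts_with(img, magic, sizeof magic);
}

static int fat_act(const struct image *img)
{
    /* 0xcafebabe, always stored big-endian */
    static const uint8_t magic[] = { 0xca, 0xfe, 0xba, 0xbe };
    return starts_with(img, magic, sizeof magic);
}

static int script_act(const struct image *img)
{
    static const uint8_t magic[] = { '#', '!' };
    return starts_with(img, magic, sizeof magic);
}

/* Same shape as execsw: handler plus a printable name, NULL-terminated. */
static const struct {
    int (*act)(const struct image *);
    const char *name;
} handlers[] = {
    { macho_act,  "Mach-o Binary" },
    { fat_act,    "Fat Binary" },
    { script_act, "Interpreter Script" },
    { NULL, NULL }
};

/* Returns the name of the first handler that accepts the image, or NULL. */
const char *activate_image(const struct image *img)
{
    int error = -1;

    for (int i = 0; error == -1 && handlers[i].act != NULL; i++) {
        error = handlers[i].act(img);
        if (error == 0)
            return handlers[i].name;
    }
    return NULL;
}
```

  Adding support for another format only means adding one row to the table, which is presumably why the kernel is structured this way.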

  In the Mach-O loader, exec_mach_imgact, the load_machfile function loads the Mach-O file and activate_exec_state processes the result it produces:

static int exec_mach_imgact(struct image_params *imgp)
{
.......
lret = load_machfile(imgp, mach_header, thread, &map, &load_result);
.......
lret = activate_exec_state(task, p, thread, &load_result);
}

load_return_t load_machfile(
struct image_params *imgp,
struct mach_header *header,
thread_t thread,
vm_map_t *mapp,
load_result_t *result
)
{

lret = parse_machfile(vp, map, thread, header, file_offset, macho_size,
0, aslr_page_offset, dyld_aslr_page_offset, result,
NULL, imgp);
}

static int activate_exec_state(task_t task, proc_t p, thread_t thread, load_result_t *result)
{
......
// Set the entry point
thread_setentrypoint(thread, result->entry_point);
......
}

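  Once the Mach-O file has been parsed, the result is processed, and one of the steps is setting the entry point: when parsing is complete, execution jumps to this entry point to run the program. The entry point is therefore critical, so what is it? It must be assigned somewhere during Mach-O parsing, so let's first look at how the Mach-O file is parsed.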

static
load_return_t
parse_machfile(
struct vnode *vp,
vm_map_t map,
thread_t thread,
struct mach_header *header,
off_t file_offset,
off_t macho_size,
int depth,
int64_t aslr_offset,
int64_t dyld_aslr_offset,
load_result_t *result,
load_result_t *binresult,
struct image_params *imgp
)
{
uint32_t ncmds;
struct load_command *lcp;
struct dylinker_command *dlp = 0;
integer_t dlarchbits = 0;
void * control;
load_return_t ret = LOAD_SUCCESS;
void * addr;
vm_size_t alloc_size, cmds_size;
size_t offset;
size_t oldoffset; /* for overflow check */
int pass;
proc_t p = current_proc(); /* XXXX */
int error;
int resid = 0;
size_t mach_header_sz = sizeof(struct mach_header);
boolean_t abi64;
boolean_t got_code_signatures = FALSE;
boolean_t found_header_segment = FALSE;
boolean_t found_xhdr = FALSE;
int64_t slide = 0;
boolean_t dyld_no_load_addr = FALSE;
boolean_t is_dyld = FALSE;
vm_map_offset_t effective_page_mask = MAX(PAGE_MASK, vm_map_page_mask(map));
#if __arm64__
uint32_t pagezero_end = 0;
uint32_t executable_end = 0;
uint32_t writable_start = 0;
vm_map_size_t effective_page_size;

effective_page_size = MAX(PAGE_SIZE, vm_map_page_size(map));
#endif /* __arm64__ */

if (header->magic == MH_MAGIC_64 ||
header->magic == MH_CIGAM_64) {
mach_header_sz = sizeof(struct mach_header_64);
}

/*
* Break infinite recursion
*/
if (depth > 1) {
return(LOAD_FAILURE);
}
// This function runs at most twice: first for the main executable's Mach-O (depth 1), then for dyld (depth 2)
depth++;

/*
* Check that the file's CPU architecture matches that of the system we are running on
*/
if (((cpu_type_t)(header->cputype & ~CPU_ARCH_MASK) != (cpu_type() & ~CPU_ARCH_MASK)) ||
!grade_binary(header->cputype,
header->cpusubtype & ~CPU_SUBTYPE_MASK))
return(LOAD_BADARCH);

abi64 = ((header->cputype & CPU_ARCH_ABI64) == CPU_ARCH_ABI64);

// Handle each file type differently
switch (header->filetype) {

// An application, i.e. the app binary
case MH_EXECUTE:
if (depth != 1) {
return (LOAD_FAILURE);
}
#if CONFIG_EMBEDDED
// A dynamically linked executable always takes this branch, because dyld still has to process the main executable
if (header->flags & MH_DYLDLINK) {
/* Check properties of dynamic executables */
if (!(header->flags & MH_PIE) && pie_required(header->cputype, header->cpusubtype & ~CPU_SUBTYPE_MASK)) {
return (LOAD_FAILURE);
}
result->needs_dynlinker = TRUE;
} else {
/* Check properties of static executables (disallowed except for development) */
#if !(DEVELOPMENT || DEBUG)
return (LOAD_FAILURE);
#endif
}
#endif /* CONFIG_EMBEDDED */

break;

// The dynamic linker
case MH_DYLINKER:
if (depth != 2) {
return (LOAD_FAILURE);
}
is_dyld = TRUE;
break;

default:
return (LOAD_FAILURE);
}

addr = kalloc(alloc_size);
if (addr == NULL) {
return LOAD_NOSPACE;
}

......

// If the image is PIE, or is the dynamic linker dyld itself, use the random ASLR offset as the slide
if ((header->flags & MH_PIE) || is_dyld) {
slide = aslr_offset;
}

/*
* Make four passes, each doing one job
* 0: check whether the code and data segments are aligned
* 1: thread state, uuid, code signature
* 2: segments
* 3: dyld, encryption, check entry point
*/

boolean_t slide_realign = FALSE;
#if __arm64__
if (!abi64) {
slide_realign = TRUE;
}
#endif

for (pass = 0; pass <= 3; pass++) {
// If no alignment check is needed, go straight to the next pass
if (pass == 0 && !slide_realign && !is_dyld) {
/* if we dont need to realign the slide or determine dyld's load
* address, pass 0 can be skipped */
continue;
} else if (pass == 1) {
#if __arm64__
boolean_t is_pie;
int64_t adjust;

is_pie = ((header->flags & MH_PIE) != 0);
if (pagezero_end != 0 &&
pagezero_end < effective_page_size) {
/* need at least 1 page for PAGEZERO */
adjust = effective_page_size;
MACHO_PRINTF(("pagezero boundary at "
"0x%llx; adjust slide from "
"0x%llx to 0x%llx%s\n",
(uint64_t) pagezero_end,
slide,
slide + adjust,
(is_pie
? ""
: " BUT NO PIE ****** :-(")));
if (is_pie) {
slide += adjust;
pagezero_end += adjust;
executable_end += adjust;
writable_start += adjust;
}
}
if (pagezero_end != 0) {
result->has_pagezero = TRUE;
}
if (executable_end == writable_start &&
(executable_end & effective_page_mask) != 0 &&
(executable_end & FOURK_PAGE_MASK) == 0) {

// Adjust the slide so that the boundary between the code and data segments is page-aligned
adjust =
(effective_page_size -
(executable_end & effective_page_mask));
MACHO_PRINTF(("page-unaligned X-W boundary at "
"0x%llx; adjust slide from "
"0x%llx to 0x%llx%s\n",
(uint64_t) executable_end,
slide,
slide + adjust,
(is_pie
? ""
: " BUT NO PIE ****** :-(")));
if (is_pie)
slide += adjust;
}
#endif /* __arm64__ */

if (dyld_no_load_addr && binresult) {
// dyld's user-space address = random offset + highest virtual address of the main binary
slide = vm_map_round_page(slide + binresult->max_vm_addr, effective_page_mask);
}
}


offset = mach_header_sz;
ncmds = header->ncmds;

while (ncmds--) {

/*
* Get the address of the load command to parse
*/
lcp = (struct load_command *)(addr + offset);
oldoffset = offset;


switch(lcp->cmd) {
// Tells the kernel how to set up the new process's memory space; these segments are loaded from the Mach-O straight into memory
case LC_SEGMENT: {
struct segment_command *scp = (struct segment_command *) lcp;

......

// Map and parse the segment
// Segments are further divided into sections, e.g. __objc_classlist, __objc_protolist
ret = load_segment(lcp,
header->filetype,
control,
file_offset,
macho_size,
vp,
map,
slide,
result);

break;
}
// Map the specified bytes of the file into virtual memory
case LC_SEGMENT_64: {
struct segment_command_64 *scp64 = (struct segment_command_64 *) lcp;

......

ret = load_segment(lcp,
header->filetype,
control,
file_offset,
macho_size,
vp,
map,
slide,
result);

break;
}
// UNIX thread, including its stack
case LC_UNIXTHREAD:
if (pass != 1)
break;
ret = load_unixthread(
(struct thread_command *) lcp,
thread,
slide,
result);
break;
// Replacement for LC_UNIXTHREAD
case LC_MAIN:
......
ret = load_main(
(struct entry_point_command *) lcp,
thread,
slide,
result);
break;
// Load the dynamic linker
case LC_LOAD_DYLINKER:
if (pass != 3)
break;
if ((depth == 1) && (dlp == 0)) {
// Remember the address of the dynamic linker command
dlp = (struct dylinker_command *)lcp;
dlarchbits = (header->cputype & CPU_ARCH_MASK);
} else {
ret = LOAD_FAILURE;
}
break;
// uuid
case LC_UUID:
if (pass == 1 && depth == 1) {
ret = load_uuid((struct uuid_command *) lcp,
(char *)addr + cmds_size,
result);
}
break;
// Code signature
case LC_CODE_SIGNATURE:
/* CODE SIGNING */
if (pass != 1)
break;
/* pager -> uip ->
load signatures & store in uip
set VM object "signed_pages"
*/
ret = load_code_signature(
(struct linkedit_data_command *) lcp,
vp,
file_offset,
macho_size,
header->cputype,
result,
imgp);

.......

break;
#if CONFIG_CODE_DECRYPTION
// Information about encrypted segments
case LC_ENCRYPTION_INFO:
case LC_ENCRYPTION_INFO_64:
if (pass != 3)
break;
ret = set_code_unprotect(
(struct encryption_info_command *) lcp,
addr, map, slide, vp, file_offset,
header->cputype, header->cpusubtype);
......
}
break;
#endif
default:
/* Other commands are ignored by the kernel */
ret = LOAD_SUCCESS;
break;
}
if (ret != LOAD_SUCCESS)
break;
}
if (ret != LOAD_SUCCESS)
break;
}

if (ret == LOAD_SUCCESS) {

/* Make sure if we need dyld, we got it */
if (result->needs_dynlinker && !dlp) {
ret = LOAD_FAILURE;
}

if ((ret == LOAD_SUCCESS) && (dlp != 0)) {
/*
* Load the dynamic linker, which calls parse_machfile a second time
*/
ret = load_dylinker(dlp, dlarchbits, map, thread, depth,
dyld_aslr_offset, result, imgp);
}

.......
}

if (ret == LOAD_BADMACHO && found_xhdr) {
ret = LOAD_BADMACHO_UPX;
}

kfree(addr, alloc_size);

return ret;
}

  The results of the process above are stored in the load_result_t structure:

typedef struct _load_result {
user_addr_t mach_header;
user_addr_t entry_point;

user_addr_t user_stack;
mach_vm_size_t user_stack_size;

user_addr_t user_stack_alloc;
mach_vm_size_t user_stack_alloc_size;

mach_vm_address_t all_image_info_addr;
mach_vm_size_t all_image_info_size;

int thread_count;
unsigned int
/* boolean_t */ unixproc :1,
needs_dynlinker : 1,
dynlinker :1,
validentry :1,
has_pagezero :1,
using_lcmain :1,
is64bit :1,
:0;
unsigned int csflags;
unsigned char uuid[16];
mach_vm_address_t min_vm_addr;
mach_vm_address_t max_vm_addr;
unsigned int platform_binary;
off_t cs_end_offset;
void *threadstate;
size_t threadstate_sz;
} load_result_t;

  So where is entry_point set? When dyld is involved, the final value is assigned in load_dylinker:

static load_return_t load_dylinker(......)
{
.......
*myresult = load_result_null;
myresult->is64bit = result->is64bit;

ret = parse_machfile(vp, map, thread, header, file_offset,
macho_size, depth, slide, 0, myresult, result, imgp);

if (ret == LOAD_SUCCESS) {
if (result->threadstate) {
/* don't use the app's threadstate if we have a dyld */
kfree(result->threadstate, result->threadstate_sz);
}
result->threadstate = myresult->threadstate;
result->threadstate_sz = myresult->threadstate_sz;

result->dynlinker = TRUE;
// Set load_result_t's entry_point to the entry point of the dynamic linker dyld, so dyld is what runs first at launch.
result->entry_point = myresult->entry_point;
result->validentry = myresult->validentry;
result->all_image_info_addr = myresult->all_image_info_addr;
result->all_image_info_size = myresult->all_image_info_size;
if (myresult->platform_binary) {
result->csflags |= CS_DYLD_PLATFORM;
}
}
....
}
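
  Stripped of everything else, the effect of load_dylinker on the entry point can be sketched in a few lines. This is not kernel code: mini_result is a cut-down, hypothetical stand-in for load_result_t, and finish_load is a made-up helper that only models the override and the "needs dyld but has none" failure seen at the end of parse_machfile.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for load_result_t (hypothetical). */
typedef struct {
    uint64_t entry_point;     /* where the new thread will start */
    bool     needs_dynlinker; /* main binary was built with MH_DYLDLINK */
    bool     dynlinker;       /* a dynamic linker was loaded */
} mini_result;

/*
 * Merges dyld's result into the app's result.
 * `dyld` is NULL when the binary had no LC_LOAD_DYLINKER command.
 * Returns 0 on success, -1 if dyld is required but missing.
 */
int finish_load(mini_result *app, const mini_result *dyld)
{
    if (dyld == NULL)
        return app->needs_dynlinker ? -1 : 0;

    app->dynlinker = true;
    /* The thread now starts in dyld, not in the app. */
    app->entry_point = dyld->entry_point;
    return 0;
}
```

  After the merge, the app's own entry address is no longer stored in the result; dyld later recovers it in user space from the main executable's load commands.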

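  Finally, to recap the app launch flow:

  • Fork a new process

  • Activate the app

    a. Identify the file type; Mach-O binaries and fat binaries each have their own loader function

    b. Allocate memory

    c. Parse the main executable's Mach-O

    d. Read the main executable's Mach-O header

    e. Walk each of the main executable's load commands and map them into memory

    f. Parse dyld, repeating steps d and e for it; along the way entry_point is changed to dyld's entry address

  • Enter at the address held in entry_point, which starts dyld

  • Set up the process's main thread

  Once all of this is done, execution has moved from kernel mode into user mode.

0x05 How dyld Loads the Program

  After dyld has been loaded in that final step, execution enters dyld's entry function, __dyld_start, which is written in assembly: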

__dyld_start:
// Preparation: fetch the Mach-O header, the arguments, and similar information
mov x28, sp
and sp, x28, #~15 // force 16-byte alignment of stack
mov x0, #0
mov x1, #0
stp x1, x0, [sp, #-16]! // make aligned terminating frame
mov fp, sp // set up fp to point to terminating frame
sub sp, sp, #16 // make room for local variables
ldr x0, [x28] // get app's mh into x0
ldr x1, [x28, #8] // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
add x2, x28, #16 // get argv into x2
adrp x4,___dso_handle@page
add x4,x4,___dso_handle@pageoff // get dyld's mh in to x4
adrp x3,__dso_static@page
ldr x3,[x3,__dso_static@pageoff] // get unslid start of dyld
sub x3,x4,x3 // x3 now has slide of dyld
mov x5,sp // x5 has &startGlue

// Bootstrap; the entry is the dyldbootstrap::start function
bl __ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm
// It returns the main executable's entry address, which is saved in register x16
mov x16,x0
ldr x1, [sp]
cmp x1, #0
b.ne Lnew

// LC_UNIXTHREAD path; it has been superseded by LC_MAIN, so look at the LC_MAIN path below
add sp, x28, #8
br x16

// LC_MAIN: set up the stack information, then jump to the main executable's entry
Lnew: mov lr, x1 // simulate return address into _start in libdyld.dylib
ldr x0, [x28, #8] // arg 1 = argc
add x1, x28, #16 // arg 2 = argv
add x2, x1, x0, lsl #3
add x2, x2, #8 // arg 3 = &env[0]
mov x3, x2
Lapple: ldr x4, [x3]
add x3, x3, #8
cmp x4, #0
b.ne Lapple // arg 4 = apple
// Jump to the main executable's main function
br x16

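  __dyld_start first calls dyldbootstrap::start to do further processing of the main executable, such as loading dynamic libraries. When that finishes it returns the main executable's entry address; __dyld_start then sets up the arguments for that entry and jumps into the main executable's main function. The question we care about is: what else happens before the main executable starts?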

uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], 
intptr_t slide, const struct macho_header* dyldsMachHeader,
uintptr_t* startGlue)
{
if ( slide != 0 ) {
rebaseDyld(dyldsMachHeader, slide);
}

mach_init();

// kernel sets up env pointer to be just past end of agv array
const char** envp = &argv[argc+1];

// kernel sets up apple pointer to be just past end of envp array
const char** apple = envp;
while(*apple != NULL) { ++apple; }
++apple;

// set up random value for stack canary
__guard_setup(apple);

#if DYLD_INITIALIZER_SUPPORT
// run all C++ initializers inside dyld
runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif

// now that we are done bootstrapping dyld, call dyld's main
uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}

uintptr_t _main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[], const char* apple[],
uintptr_t* startGlue)
{
dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_DYLD, 0, 0);

// Grab the cdHash of the main executable from the environment
uint8_t mainExecutableCDHashBuffer[20];
const uint8_t* mainExecutableCDHash = nullptr;
if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) )
mainExecutableCDHash = mainExecutableCDHashBuffer;

// Trace dyld's load
notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
#if !TARGET_IPHONE_SIMULATOR
// Trace the main executable's load
notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif

uintptr_t result = 0;
sMainExecutableMachHeader = mainExecutableMH;
sMainExecutableSlide = mainExecutableSlide;

CRSetCrashLogMessage("dyld: launch started");
// Set up the runtime context
setContext(mainExecutableMH, argc, argv, envp, apple);

// Pickup the pointer to the exec path.
sExecPath = _simple_getenv(apple, "executable_path");

// Remember short name of process for later logging
sExecShortName = ::strrchr(sExecPath, '/');
if ( sExecShortName != NULL )
++sExecShortName;
else
sExecShortName = sExecPath;

// Configure process restrictions
configureProcessRestrictions(mainExecutableMH);

checkEnvironmentVariables(envp);

defaultUninitializedFallbackPaths(envp);

if ( sEnv.DYLD_PRINT_OPTS )
printOptions(argv);
if ( sEnv.DYLD_PRINT_ENV )
printEnvironmentVariables(envp);
getHostInfo(mainExecutableMH, mainExecutableSlide);


checkSharedRegionDisable((mach_header*)mainExecutableMH);
// Map the shared cache
if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
mapSharedCache();
}

......

// install gdb notifier
stateToHandlers(dyld_image_state_dependents_mapped, sBatchHandlers)->push_back(notifyGDB);
stateToHandlers(dyld_image_state_mapped, sSingleHandlers)->push_back(updateAllImages);
// make initial allocations large enough that it is unlikely to need to be re-alloced
sImageRoots.reserve(16);
sAddImageCallbacks.reserve(4);
sRemoveImageCallbacks.reserve(4);
sImageFilesNeedingTermination.reserve(16);
sImageFilesNeedingDOFUnregistration.reserve(8);


try {
// add dyld itself to UUID list
addDyldImageToUUIDList();

#if SUPPORT_ACCELERATE_TABLES
bool mainExcutableAlreadyRebased = false;
if ( (sSharedCacheLoadInfo.loadAddress != nullptr) && !dylibsCanOverrideCache() && !sDisableAcceleratorTables && (sSharedCacheLoadInfo.loadAddress->header.accelerateInfoAddr != 0) ) {
struct stat statBuf;
if ( ::stat(IPHONE_DYLD_SHARED_CACHE_DIR "no-dyld2-accelerator-tables", &statBuf) != 0 )
sAllCacheImagesProxy = ImageLoaderMegaDylib::makeImageLoaderMegaDylib(&sSharedCacheLoadInfo.loadAddress->header, sSharedCacheLoadInfo.slide, mainExecutableMH, gLinkContext);
}

reloadAllImages:
#endif

CRSetCrashLogMessage(sLoadingCrashMessage);
// Instantiate the main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
gLinkContext.mainExecutable = sMainExecutable;
gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);


gLinkContext.strictMachORequired = true;

#if SUPPORT_ACCELERATE_TABLES
sAllImages.reserve((sAllCacheImagesProxy != NULL) ? 16 : INITIAL_IMAGE_COUNT);
#else
sAllImages.reserve(INITIAL_IMAGE_COUNT);
#endif

// Now that shared cache is loaded, setup an versioned dylib overrides
#if SUPPORT_VERSIONED_PATHS
checkVersionedPaths();
#endif


// dyld_all_image_infos image list does not contain dyld
// add it as dyldPath field in dyld_all_image_infos
// for simulator, dyld_sim is in image list, need host dyld added
#if TARGET_IPHONE_SIMULATOR
// get path of host dyld from table of syscall vectors in host dyld
void* addressInDyld = gSyscallHelpers;
#else
// get path of dyld itself
void* addressInDyld = (void*)&__dso_handle;
#endif
char dyldPathBuffer[MAXPATHLEN+1];
int len = proc_regionfilename(getpid(), (uint64_t)(long)addressInDyld, dyldPathBuffer, MAXPATHLEN);
if ( len > 0 ) {
dyldPathBuffer[len] = '\0'; // proc_regionfilename() does not zero terminate returned string
if ( strcmp(dyldPathBuffer, gProcessInfo->dyldPath) != 0 )
gProcessInfo->dyldPath = strdup(dyldPathBuffer);
}

// Load inserted dynamic libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
loadInsertedDylib(*lib);
}
// record count of inserted libraries so that a flat search will look at
// inserted libraries, then main, then others.
sInsertedDylibCount = sAllImages.size()-1;

// Link the main executable
gLinkContext.linkingMainExecutable = true;
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
gLinkContext.bindFlat = true;
gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}

// Link inserted dynamic libraries
// do this after linking main executable so that any dylibs pulled in by inserted
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
if ( sInsertedDylibCount > 0 ) {
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
image->setNeverUnloadRecursive();
}
// only INSERTED libraries can interpose
// register interposing info after all inserted libraries are bound so chaining works
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
image->registerInterposing();
}
}

// <rdar://problem/19315404> dyld should support interposition even without DYLD_INSERT_LIBRARIES
for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
ImageLoader* image = sAllImages[i];
if ( image->inSharedCache() )
continue;
image->registerInterposing();
}
#if SUPPORT_ACCELERATE_TABLES
if ( (sAllCacheImagesProxy != NULL) && ImageLoader::haveInterposingTuples() ) {
// Accelerator tables cannot be used with implicit interposing, so relaunch with accelerator tables disabled
ImageLoader::clearInterposingTuples();
// unmap all loaded dylibs (but not main executable)
for (long i=1; i < sAllImages.size(); ++i) {
ImageLoader* image = sAllImages[i];
if ( image == sMainExecutable )
continue;
if ( image == sAllCacheImagesProxy )
continue;
image->setCanUnload();
ImageLoader::deleteImage(image);
}
// note: we don't need to worry about inserted images because if DYLD_INSERT_LIBRARIES was set we would not be using the accelerator table
sAllImages.clear();
sImageRoots.clear();
sImageFilesNeedingTermination.clear();
sImageFilesNeedingDOFUnregistration.clear();
sAddImageCallbacks.clear();
sRemoveImageCallbacks.clear();
sDisableAcceleratorTables = true;
sAllCacheImagesProxy = NULL;
sMappedRangesStart = NULL;
mainExcutableAlreadyRebased = true;
gLinkContext.linkingMainExecutable = false;
resetAllImages();
goto reloadAllImages;
}
#endif

// apply interposing to initial set of images
for(int i=0; i < sImageRoots.size(); ++i) {
sImageRoots[i]->applyInterposing(gLinkContext);
}
gLinkContext.linkingMainExecutable = false;

// <rdar://problem/12186933> do weak binding only after all inserted images linked
// Weak symbol binding
sMainExecutable->weakBind(gLinkContext);


CRSetCrashLogMessage("dyld: launch, running initializers");
// Run initializers
initializeMainExecutable();

// notify any montoring proccesses that this process is about to enter main()
dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_MAIN_DYLD2, 0, 0);
notifyMonitoringDyldMain();

// find the main executable's entry point
result = (uintptr_t)sMainExecutable->getThreadPC();
if ( result != 0 ) {
// main executable uses LC_MAIN, needs to return to glue in libdyld.dylib
if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
else
halt("libdyld.dylib support not present for LC_MAIN");
}
else {
// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
result = (uintptr_t)sMainExecutable->getMain();
*startGlue = 0;
}
}
catch(const char* message) {
syncAllImages();
halt(message);
}
catch(...) {
dyld::log("dyld: launch failed\n");
}

CRSetCrashLogMessage(NULL);

if (sSkipMain) {
dyld3::kdebug_trace_dyld_signpost(DBG_DYLD_SIGNPOST_START_MAIN, 0, 0);
result = (uintptr_t)&fake_main;
*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
}

return result;
}

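  The main steps are:

  • Set up the runtime context
  • Map the shared cache
  • Instantiate the main executable
  • Load the inserted dylibs
  • Link the main executable
  • Link the inserted dylibs
  • Run the initializers of the main executable
  • Find the main executable's entry point
  • Jump to the main executable's entry point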

Mapping the shared cache: mapSharedCache

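  System dynamic libraries that almost every app links against, such as libobjc or libdispatch, are mapped into a single shared region, and every process reads their contents from there. Sharing one copy this way saves a large amount of memory.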

static void mapSharedCache()
{
dyld3::SharedCacheOptions opts;
opts.cacheDirOverride = sSharedCacheOverrideDir;
opts.forcePrivate = (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion);
opts.useHaswell = sHaswell;
opts.verbose = gLinkContext.verboseMapping;
// load the dyld shared cache
loadDyldCache(opts, &sSharedCacheLoadInfo);

// update global state
if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
dyld::gProcessInfo->processDetachedFromSharedRegion = opts.forcePrivate;
dyld::gProcessInfo->sharedCacheSlide = sSharedCacheLoadInfo.slide;
dyld::gProcessInfo->sharedCacheBaseAddress = (unsigned long)sSharedCacheLoadInfo.loadAddress;
sSharedCacheLoadInfo.loadAddress->getUUID(dyld::gProcessInfo->sharedCacheUUID);
dyld3::kdebug_trace_dyld_image(DBG_DYLD_UUID_SHARED_CACHE_A, (const uuid_t *)&dyld::gProcessInfo->sharedCacheUUID[0], {0,0}, {{ 0, 0 }}, (const mach_header *)sSharedCacheLoadInfo.loadAddress);
}
}

bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results)
{
results->loadAddress = 0;
results->slide = 0;
results->cachedDylibsGroup = nullptr;
results->errorMessage = nullptr;

if ( options.forcePrivate ) {
// mmap cache into this process only
return mapCachePrivate(options, results);
}
else {
// fast path: the cache is already mapped into the shared region, so reuse that mapping in this process's address space
if ( reuseExistingCache(options, results) )
return (results->errorMessage != nullptr);

// slow path: this is the first process to start, so the shared region is still empty and the cache must be mapped into it
return mapCacheSystemWide(options, results);
}
}
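
  The three branches of loadDyldCache (private mapping, fast path, slow path) can be modelled in a few lines. This is only a toy sketch, not dyld code: `SharedRegion`, `CachePath`, `attachCache`, and the addresses are all hypothetical, and a plain struct stands in for the kernel-managed shared region.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the kernel-managed shared region (hypothetical type).
struct SharedRegion {
    bool     mapped;    // has any process mapped the cache yet?
    uint64_t base;      // address of the cache once it is mapped
    int      mapCount;  // how many times the expensive mapping ran
};

enum class CachePath { Private, Reused, MappedSystemWide };

// Mirrors the three branches of loadDyldCache in spirit only.
CachePath attachCache(SharedRegion& region, bool forcePrivate, uint64_t* loadAddress) {
    assert(loadAddress != nullptr);
    if (forcePrivate) {
        // private copy for this process only; the shared region is untouched
        *loadAddress = 0x7000000;  // arbitrary illustrative address
        return CachePath::Private;
    }
    if (region.mapped) {
        // fast path: an earlier process already mapped the cache
        *loadAddress = region.base;
        return CachePath::Reused;
    }
    // slow path: the first process maps the cache for everyone
    region.mapped = true;
    region.base = 0x180000000ULL;  // arbitrary illustrative address
    ++region.mapCount;
    *loadAddress = region.base;
    return CachePath::MappedSystemWide;
}
```

  In this model the first non-private caller pays for the mapping and every later caller sees the same base address, which is where the memory saving comes from.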

Instantiating the main executable: instantiateFromLoadedImage

  The job here is to create the image loader (ImageLoader) that represents the main executable. It takes three steps:

  • Check that the CPU architecture of the main executable is compatible with the current device
  • Instantiate an ImageLoader
  • Add the ImageLoader to the global image list

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
addImage(image);
return (ImageLoaderMachO*)image;
}

throw "main executable not a known format";
}

  The interesting part is the implementation of instantiateMainExecutable:

ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
bool compressed;
unsigned int segCount;
unsigned int libCount;
const linkedit_data_command* codeSigCmd;
const encryption_info_command* encryptCmd;
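// Determine whether the main executable uses the compressed LINKEDIT format; almost all current binaries do
// The check looks for an LC_DYLD_INFO or LC_DYLD_INFO_ONLY load command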
// switch (cmd->cmd) {
// case LC_DYLD_INFO:
// case LC_DYLD_INFO_ONLY:
// if ( cmd->cmdsize != sizeof(dyld_info_command) )
// throw "malformed mach-o image: LC_DYLD_INFO size wrong";
// dyldInfoCmd = (struct dyld_info_command*)cmd;
// *compressed = true;
// break;
// .......
// }
// ......
sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
// Instantiate the concrete class that matches the load commands; here this returns an ImageLoaderMachOCompressed object
if ( compressed )
return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
else
return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
}
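
  The "compressed" check described in the comments boils down to walking the load commands and looking for a dyld-info command. Below is a minimal, portable sketch of that walk. The constants carry the values defined in <mach-o/loader.h> but are redefined here so the code builds without Apple headers; `LoadCommand`, `appendCommand`, and `usesCompressedLinkedit` are hypothetical names, and the real sniffLoadCommands does far more validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Values as in <mach-o/loader.h>, redefined so the sketch builds anywhere.
constexpr uint32_t kLC_REQ_DYLD       = 0x80000000u;
constexpr uint32_t kLC_SEGMENT_64     = 0x19;
constexpr uint32_t kLC_DYLD_INFO      = 0x22;
constexpr uint32_t kLC_DYLD_INFO_ONLY = 0x22 | kLC_REQ_DYLD;

// Common prefix of every load command.
struct LoadCommand { uint32_t cmd; uint32_t cmdsize; };

// Test helper (hypothetical): append a zero-filled command of the given size.
void appendCommand(std::vector<uint8_t>& buf, uint32_t cmd, uint32_t cmdsize) {
    assert(cmdsize >= sizeof(LoadCommand));
    LoadCommand lc{cmd, cmdsize};
    std::size_t off = buf.size();
    buf.resize(off + cmdsize, 0);
    std::memcpy(buf.data() + off, &lc, sizeof lc);
}

// Walk ncmds load commands; true if a dyld-info command is present.
bool usesCompressedLinkedit(const std::vector<uint8_t>& cmds, uint32_t ncmds) {
    std::size_t off = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        if (off + sizeof(LoadCommand) > cmds.size()) return false;  // truncated
        LoadCommand lc;
        std::memcpy(&lc, cmds.data() + off, sizeof lc);
        if (lc.cmd == kLC_DYLD_INFO || lc.cmd == kLC_DYLD_INFO_ONLY) return true;
        if (lc.cmdsize < sizeof(LoadCommand)) return false;         // malformed
        off += lc.cmdsize;  // each command records its own length
    }
    return false;
}
```

  The stride through the buffer uses each command's own cmdsize, the same way the dyld loops in this article advance with `((char*)cmd)+cmd->cmdsize`.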

Loading inserted dylibs: loadInsertedDylib

  dyld loops over the dylibs listed in the DYLD_INSERT_LIBRARIES environment variable and calls loadInsertedDylib for each one.

static void loadInsertedDylib(const char* path)
{
ImageLoader* image = NULL;
unsigned cacheIndex;
try {
LoadContext context;
.......
image = load(path, context, cacheIndex);
}
......
}

//
ImageLoader* load(const char* path, const LoadContext& context, unsigned& cacheIndex)
{
......
ImageLoader* image = loadPhase0(path, orgPath, context, cacheIndex, NULL);
......
}

// loadPhase0: try root substitutions from DYLD_ROOT_PATH
static ImageLoader* loadPhase0(const char* path, const char* orgPath, const LoadContext& context, unsigned& cacheIndex, std::vector<const char*>* exceptions)
{
#if SUPPORT_ROOT_PATH
// handle DYLD_ROOT_PATH which forces absolute paths to use a new root
if ( (gLinkContext.rootPaths != NULL) && (path[0] == '/') ) {
for(const char* const* rootPath = gLinkContext.rootPaths ; *rootPath != NULL; ++rootPath) {
char newPath[strlen(*rootPath) + strlen(path)+2];
strcpy(newPath, *rootPath);
strcat(newPath, path);
ImageLoader* image = loadPhase1(newPath, orgPath, context, cacheIndex, exceptions);
if ( image != NULL )
return image;
}
}
#endif

// try raw path
return loadPhase1(path, orgPath, context, cacheIndex, exceptions);
}

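// loadPhase1 tries the search-path overrides such as LD_LIBRARY_PATH, then the raw path, then the fallback paths
// loadPhase2 tries each directory of a given search path
// loadPhase3 expands @executable_path-style variables; loadPhase4 tries the path with and without the image suffix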
static ImageLoader* loadPhase5(const char* path, const char* orgPath, const LoadContext& context, unsigned& cacheIndex, std::vector<const char*>* exceptions)
{
for (std::vector<DylibOverride>::iterator it = sDylibOverrides.begin(); it != sDylibOverrides.end(); ++it) {
if ( strcmp(it->installName, path) == 0 ) {
path = it->override;
break;
}
}

if ( exceptions != NULL )
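// Try to open the image: if it lives in the shared cache, return its ImageLoader directly; otherwise look on disk and, if the file opens, call loadPhase6 to map it and create an ImageLoader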
return loadPhase5load(path, orgPath, context, cacheIndex, exceptions);
else
// Only check whether the image is already loaded, and return it if so
return loadPhase5check(path, orgPath, context);
}

static ImageLoader* loadPhase6(int fd, const struct stat& stat_buf, const char* path, const LoadContext& context)
{
......
if ( isCompatibleMachO(firstPages, path) ) {

// Check the Mach-O file type: only MH_BUNDLE, MH_DYLIB, and some MH_EXECUTE files can be loaded dynamically
const mach_header* mh = (mach_header*)firstPages;
switch ( mh->filetype ) {
case MH_EXECUTE:
case MH_DYLIB:
case MH_BUNDLE:
break;
default:
throw "mach-o, but wrong filetype";
}
......
// Instantiate an ImageLoader for the file
ImageLoader* image = ImageLoaderMachO::instantiateFromFile(path, fd, firstPages, headerAndLoadCommandsSize, fileOffset, fileLength, stat_buf, gLinkContext);

// Register it in the global image list
return checkandAddImage(image, context);
}
......
}
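
  The root-substitution logic of loadPhase0 is easy to isolate: absolute paths are first tried under each DYLD_ROOT_PATH entry, and the raw path is always tried last. Here is a small sketch of just that ordering, using std::string instead of dyld's stack buffers. `candidatePaths` is a hypothetical helper; it only lists the paths and does not open any files.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the paths a loadPhase0-style search would try, in order.
std::vector<std::string> candidatePaths(const std::string& path,
                                        const std::vector<std::string>& rootPaths) {
    std::vector<std::string> out;
    // Root substitution only applies to absolute paths.
    if (!path.empty() && path[0] == '/') {
        for (const std::string& root : rootPaths) {
            out.push_back(root + path);
        }
    }
    out.push_back(path);  // the raw path is the final attempt
    return out;
}
```

  For example, with a single root of /tmp/root, a request for /usr/lib/libfoo.dylib is first looked up as /tmp/root/usr/lib/libfoo.dylib, while a relative name such as libfoo.dylib is never prefixed.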

  Linking (ImageLoader::link) mainly loads all dependent dylibs, rebases them, and binds their symbols.

void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload, const RPathChain& loaderRPaths, const char* imagePath)
{

// clear error strings
(*context.setErrorStrings)(0, NULL, NULL, NULL);

uint64_t t0 = mach_absolute_time();
// Recursively load all dependent dylibs
this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
context.notifyBatch(dyld_image_state_dependents_mapped, preflightOnly);

// we only do the loading step for preflights
if ( preflightOnly )
return;

uint64_t t1 = mach_absolute_time();
context.clearAllDepths();
// Recompute each library's depth in the dependency graph; the deeper the level, the larger the depth
this->recursiveUpdateDepth(context.imageCount());

uint64_t t2 = mach_absolute_time();
// Recursively rebase this image and the dylibs it loaded, adjusting pointers for the actual load addresses
this->recursiveRebase(context);
context.notifyBatch(dyld_image_state_rebased, false);

uint64_t t3 = mach_absolute_time();
// Bind the non-lazy symbols, fixing up pointers that refer to symbols in other binaries
// Lazy symbols are bound at run time, on first use
this->recursiveBind(context, forceLazysBound, neverUnload);

uint64_t t4 = mach_absolute_time();
// Weak symbol binding
if ( !context.linkingMainExecutable )
this->weakBind(context);
uint64_t t5 = mach_absolute_time();

context.notifyBatch(dyld_image_state_bound, false);
uint64_t t6 = mach_absolute_time();

std::vector<DOFInfo> dofs;
// Register the image's DOF sections for use by dtrace
this->recursiveGetDOFSections(context, dofs);
context.registerDOFs(dofs);
uint64_t t7 = mach_absolute_time();

// interpose any dynamically loaded images
if ( !context.linkingMainExecutable && (fgInterposingTuples.size() != 0) ) {
this->recursiveApplyInterposing(context);
}

// clear error strings
(*context.setErrorStrings)(0, NULL, NULL, NULL);

fgTotalLoadLibrariesTime += t1 - t0;
fgTotalRebaseTime += t3 - t2;
fgTotalBindTime += t4 - t3;
fgTotalWeakBindTime += t5 - t4;
fgTotalDOF += t7 - t6;

// done with initial dylib loads
fgNextPIEDylibAddress = 0;
}
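
  The rebase and bind phases of link() both patch pointer-sized slots in the image's data; they differ only in where the new value comes from. The toy model below illustrates this. `ToyImage`, `rebaseImage`, and `bindImports` are hypothetical names, the slot layout is invented, and real dyld decodes the fixup locations from the LC_DYLD_INFO opcodes rather than from vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct ToyImage {
    std::vector<uint64_t> data;                               // pointer-sized slots
    std::vector<std::size_t> rebaseSlots;                     // slots that point inside this image
    std::vector<std::pair<std::size_t, std::string>> binds;   // slots that need an imported symbol
};

// Rebase: internal pointers were written for the preferred address, so add the slide.
void rebaseImage(ToyImage& img, uint64_t slide) {
    for (std::size_t slot : img.rebaseSlots) {
        img.data.at(slot) += slide;
    }
}

// Bind: store the resolved address of each imported symbol into its slot.
void bindImports(ToyImage& img, const std::map<std::string, uint64_t>& exports) {
    for (const auto& b : img.binds) {
        auto it = exports.find(b.second);
        if (it == exports.end()) {
            throw std::runtime_error("symbol not found: " + b.second);
        }
        img.data.at(b.first) = it->second;
    }
}
```

  Rebasing needs only the slide, so it can run before any other image is consulted; binding needs the exports of the other images, which is why link() loads all dependencies first.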

  Looking at the Mach-O layout again, the __TEXT segment contains the __stubs and __stub_helper sections, and the __DATA segment contains __nl_symbol_ptr and __la_symbol_ptr.

Mach-O layout diagram

  __nl_symbol_ptr and __la_symbol_ptr are the non-lazy binding and lazy binding pointer tables, respectively. Each entry holds the address of the function that the corresponding symbol refers to.

  A small example shows how __stubs, __stub_helper, __nl_symbol_ptr, and __la_symbol_ptr relate to each other. The test code is:

#include <stdio.h>

int main(int argc, char * argv[]) {
printf("test 1");
printf("test 2");
return 0;
}

  Set a breakpoint on the first printf and switch to the disassembly view:

testData`main:
0x1000ca618 <+0>: sub sp, sp, #0x30 ; =0x30
0x1000ca61c <+4>: stp x29, x30, [sp, #0x20]
0x1000ca620 <+8>: add x29, sp, #0x20 ; =0x20
0x1000ca624 <+12>: stur wzr, [x29, #-0x4]
0x1000ca628 <+16>: stur w0, [x29, #-0x8]
0x1000ca62c <+20>: str x1, [sp, #0x10]
-> 0x1000ca630 <+24>: adrp x0, 4
0x1000ca634 <+28>: add x0, x0, #0x871 ; =0x871
; branch into the __stubs section
0x1000ca638 <+32>: bl 0x1000cc67c ; symbol stub for: printf

// Look up what lives at 0x1000cc67c
// It is indeed inside the __stubs section
(lldb) image lookup --address 0x1000cc67c
Address: testData[0x000000010000867c] (testData.__TEXT.__stubs + 540)
Summary: testData`symbol stub for: printf

  Set a breakpoint at 0x1000cc67c and see what the stub does:

testData`printf:
-> 0x1000cc67c <+0>: nop
; load the lazy pointer and branch to it, which lands in __stub_helper
0x1000cc680 <+4>: ldr x16, #0x3b70 ; (void *)0x0000000100080934
0x1000cc684 <+8>: br x16

// Look up the loaded address: it is inside __stub_helper
(lldb) image lookup --address 0x0000000100080934
Address: testData[0x0000000100008934] (testData.__TEXT.__stub_helper + 588)
Summary:

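  0x0000000100080934 - 0x0000000000078000 = 0x0000000100008934: subtracting the ASLR slide (0x78000 in this run) gives the unslid address 0x0000000100008934. That is the value the Mach-O file stores in the __la_symbol_ptr entry for printf, so before binding the entry points into __stub_helper rather than at printf itself. The listings in this walkthrough were captured in runs with different slides, so the runtime addresses differ between them while the unslid addresses match.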
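
  The address arithmetic can be checked mechanically. The sketch below reuses only numbers that appear in the lldb output in this section; `unslide` and `slideOf` are hypothetical helper names, and the static_asserts make the compiler verify each relation.

```cpp
#include <cassert>
#include <cstdint>

// Unslid (file) address = runtime address - ASLR slide.
constexpr uint64_t unslide(uint64_t runtimeAddr, uint64_t slide) {
    return runtimeAddr - slide;
}

// Slide implied by a (runtime, file) address pair printed by `image lookup`.
constexpr uint64_t slideOf(uint64_t runtimeAddr, uint64_t fileAddr) {
    return runtimeAddr - fileAddr;
}

// File addresses copied from the lldb listings in this section.
constexpr uint64_t kStubFile        = 0x10000867c;  // __stubs + 540
constexpr uint64_t kHelperEntryFile = 0x100008934;  // __stub_helper + 588
constexpr uint64_t kHelperStartFile = 0x1000086e8;  // __stub_helper + 0

static_assert(unslide(0x100080934, 0x78000) == kHelperEntryFile, "lazy pointer target");
static_assert(kHelperEntryFile - kHelperStartFile == 588, "offset inside __stub_helper");
static_assert(slideOf(0x1000cc67c, kStubFile) == 0xC4000, "slide in the stub listing");
static_assert(slideOf(0x1001046e8, kHelperStartFile) == 0xFC000, "slide in the helper listing");
```

  The three different slides (0x78000, 0xC4000, 0xFC000) are why the same instruction shows up at different runtime addresses from one listing to the next.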

Mach-O layout diagram

  Next, set a breakpoint at 0x0000000100080934 and look at the instructions that follow:

->  0x1000cc934: ldr    w16, 0x1000cc93c
0x1000cc938: b 0x1000cc6e8

(lldb) image lookup --address 0x1001046e8
Address: testData[0x00000001000086e8] (testData.__TEXT.__stub_helper + 0)
Summary:

  Set a breakpoint at 0x1000cc6e8:

0x1001046e8: adr    x17, #0x3998              ; (void *)0x00000001200da038: initialPoolContent + 2856
0x1001046ec: nop
0x1001046f0: stp x16, x17, [sp, #-0x10]!
0x1001046f4: nop
; branch to dyld_stub_binder
0x1001046f8: ldr x16, #0x3980 ; (void *)0x00000001944e915c: dyld_stub_binder
0x1001046fc: br x16
0x100104700: ldr w16, 0x100104708
0x100104704: b 0x1001046e8

  dyld_stub_binder is written in assembly. Its address is read from __nl_symbol_ptr; on ARM, __nl_symbol_ptr is the __got section.

Mach-O layout diagram

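  To summarize: the first call to printf goes to its stub in __stubs, which jumps through the printf entry in __la_symbol_ptr. That entry does not hold the real address of printf yet, so the symbol must be bound dynamically: the code in __stub_helper fetches the address of dyld_stub_binder from __nl_symbol_ptr and calls it to look up the real printf. Once found, the address is stored back into __la_symbol_ptr and printf is called. On later calls the stub finds the real address in __la_symbol_ptr and jumps straight to it.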
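
  The same mechanism can be imitated in portable C++ with a function pointer that starts out pointing at a resolver and is patched on first use. This is only an analogy under simplifying assumptions: `laSymbolPtr`, `stubHelper`, `binder`, `callThroughStub`, and `realTarget` are hypothetical stand-ins for the __la_symbol_ptr slot, the __stub_helper entry, dyld_stub_binder, the __stubs entry, and the real printf.

```cpp
#include <cassert>

using Fn = int (*)(int);

// Stands in for the real function that lives in another image.
int realTarget(int x) { return x + 1; }

int resolveCount = 0;       // how many times the "binder" ran
int stubHelper(int x);      // forward declaration

// Stands in for the __la_symbol_ptr slot: initially points at the helper.
Fn laSymbolPtr = stubHelper;

// Stands in for dyld_stub_binder: find the real address and patch the slot.
Fn binder() {
    ++resolveCount;
    laSymbolPtr = realTarget;
    return realTarget;
}

// Stands in for the __stub_helper entry: bind, then jump to the result.
int stubHelper(int x) { return binder()(x); }

// Stands in for the __stubs entry: always jumps through the slot.
int callThroughStub(int x) { return laSymbolPtr(x); }
```

  The first call through the stub pays for the lookup; every later call is a single indirect jump, which is the point of lazy binding.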

Running initializers: initializeMainExecutable

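  This step initializes the main executable and the images related to it, such as its dylibs. The libobjc initialization described in an earlier article runs at this point.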

void initializeMainExecutable()
{
// record that we've reached this step
gLinkContext.startedInitializingMainExecutable = true;

// run initialzers for any inserted dylibs
ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
initializerTimes[0].count = 0;
const size_t rootCount = sImageRoots.size();
if ( rootCount > 1 ) {
for(size_t i=1; i < rootCount; ++i) {
sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
}
}

// run initializers for main executable and everything it brings up
sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);

// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
if ( gLibSystemHelpers != NULL )
(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

// dump info if requested
if ( sEnv.DYLD_PRINT_STATISTICS )
ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
uint64_t t1 = mach_absolute_time();
mach_port_t thisThread = mach_thread_self();
ImageLoader::UninitedUpwards up;
up.count = 1;
up.images[0] = this;
processInitializers(context, thisThread, timingInfo, up);
context.notifyBatch(dyld_image_state_initialized, false);
mach_port_deallocate(mach_task_self(), thisThread);
uint64_t t2 = mach_absolute_time();
fgTotalInitTime += (t2 - t1);
}

void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
uint32_t maxImageCount = context.imageCount()+2;
ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
ImageLoader::UninitedUpwards& ups = upsBuffer[0];
ups.count = 0;
// Recursively run the initializers of every image; Objective-C +load methods are also invoked during this step
for (uintptr_t i=0; i < images.count; ++i) {
images.images[i]->recursiveInitialization(context, thisThread, images.images[i]->getPath(), timingInfo, ups);
}
// If any upward dependencies remain, init them.
if ( ups.count > 0 )
processInitializers(context, thisThread, timingInfo, ups);
}
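
  The key property of the recursive initialization is ordering: an image's dependencies are initialized before the image itself, and each image is initialized only once. Here is a minimal sketch of that traversal. `Image` and `recursiveInit` are hypothetical simplifications that ignore upward dependencies, locking, and timing.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Image {
    std::string         name;
    std::vector<Image*> deps;         // images this one links against
    bool                initialized;  // set once its initializers have run
};

// Depth-first walk: dependencies first, then the image itself.
void recursiveInit(Image& image, std::vector<std::string>& order) {
    if (image.initialized) return;
    image.initialized = true;         // mark first so dependency cycles terminate
    for (Image* dep : image.deps) {
        assert(dep != nullptr);
        recursiveInit(*dep, order);
    }
    order.push_back(image.name);      // "run" this image's initializers
}
```

  With a main executable that depends on libobjc and libSystem, and libobjc depending on libSystem, the recorded order is libSystem, libobjc, main. This bottom-up order is what allows the runtime libraries to be ready before the app's own initializers run.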

Finding the main executable's entry point: getThreadPC

  For an LC_MAIN load command, the entry point is the address of the Mach-O header plus the offset (entryoff) recorded in the command.

void* ImageLoaderMachO::getThreadPC() const
{
const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
const struct load_command* cmd = cmds;
for (uint32_t i = 0; i < cmd_count; ++i) {
if ( cmd->cmd == LC_MAIN ) {
entry_point_command* mainCmd = (entry_point_command*)cmd;
void* entry = (void*)(mainCmd->entryoff + (char*)fMachOData);
// <rdar://problem/8543820&9228031> verify entry point is in image
if ( this->containsAddress(entry) )
return entry;
else
throw "LC_MAIN entryoff is out of range";
}
cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
}
return NULL;
}
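
  To make the "header address + entryoff" rule concrete, here is a portable sketch that scans a synthetic load-command buffer for LC_MAIN. The struct layout and the LC_MAIN value follow <mach-o/loader.h> but are redefined locally; `entryFromLoadCommands` and `makeMainCommand` are hypothetical helpers, and the range check is a simplified stand-in for containsAddress.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

constexpr uint32_t kLC_MAIN = 0x28u | 0x80000000u;  // value as in <mach-o/loader.h>

// Same field layout as entry_point_command.
struct EntryPointCommand {
    uint32_t cmd;
    uint32_t cmdsize;
    uint64_t entryoff;   // offset of main() from the start of the image
    uint64_t stacksize;
};

// Returns headerAddr + entryoff, or 0 when there is no LC_MAIN.
uint64_t entryFromLoadCommands(const std::vector<uint8_t>& cmds, uint32_t ncmds,
                               uint64_t headerAddr, uint64_t imageSize) {
    std::size_t off = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        uint32_t head[2];  // cmd, cmdsize
        if (off + sizeof head > cmds.size()) break;
        std::memcpy(head, cmds.data() + off, sizeof head);
        if (head[0] == kLC_MAIN) {
            EntryPointCommand ep;
            if (off + sizeof ep > cmds.size()) throw std::runtime_error("truncated LC_MAIN");
            std::memcpy(&ep, cmds.data() + off, sizeof ep);
            if (ep.entryoff >= imageSize) throw std::runtime_error("LC_MAIN entryoff is out of range");
            return headerAddr + ep.entryoff;
        }
        if (head[1] < sizeof head) break;  // malformed command
        off += head[1];
    }
    return 0;
}

// Test helper (hypothetical): a buffer holding a single LC_MAIN command.
std::vector<uint8_t> makeMainCommand(uint64_t entryoff) {
    EntryPointCommand ep{kLC_MAIN, sizeof(EntryPointCommand), entryoff, 0};
    std::vector<uint8_t> buf(sizeof ep);
    std::memcpy(buf.data(), &ep, sizeof ep);
    return buf;
}
```

  As a worked example, assume the header of the test binary was loaded at 0x1000c4000 (an assumption: unslid base 0x100000000 plus the 0xC4000 slide seen in the main listing). An entryoff of 0x6618 then yields 0x1000ca618, the address of main in that listing.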