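After graduating, I joined a phone manufacturer, where I work on upper-layer stability of the Android system and on parts of the Android Framework. Over a little more than two years of work and study, I gradually realized that understanding some mechanisms in Android requires going down into the Linux kernel. So I decided to use the hardware I already had, a J-Link EDU bought second-hand and a Raspberry Pi 4B bought online the year before last, to build an environment on my Windows laptop for debugging the Linux kernel as a learning aid. Because I was unfamiliar with both the Linux kernel and openocd, I ran into a number of pitfalls and some hard problems along the way, but in the end the environment was up and basic debugging worked.
This is my first article on 掘金 (Juejin). It describes how to set up this debugging environment and shares some basic debugging techniques. Corrections for anything I have missed are welcome.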
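1. Set up the Raspberry Pi 4B
1.1 Flash the latest official 64-bit system for the Raspberry Pi 4B
Follow the official Raspberry Pi guide at www.raspberrypi.com/software/ to flash the latest 64-bit system.
1.2 Modify the kernel and flash it
The info file that corresponds to the flashed image can be obtained from downloads.raspberrypi.org/raspios_arm… .
The first few lines of that file are: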
Raspberry Pi reference 2022-09-22
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 005a8c73b05a2cab394073150208bf4f069e861a, stage4
Firmware: https://github.com/raspberrypi/firmware/tree/48cd70fe84432c5d050637b61e4b7b9c831c98bf
Kernel: https://github.com/raspberrypi/linux/tree/5b775d7293eb75d6dfc9c5ffcb95c5012cd0c3f8
The link on the Kernel line points to the exact kernel source tree we need.
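To avoid copying the hash by hand, the Kernel line can be parsed with a few lines of Python. This is only a sketch: the function names are my own, and the archive URL assumes GitHub's generic /archive/<commit>.zip download pattern rather than anything stated in the info file.

```python
import re

# First lines of the info file, copied from the listing above.
INFO_TEXT = """\
Raspberry Pi reference 2022-09-22
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 005a8c73b05a2cab394073150208bf4f069e861a, stage4
Firmware: https://github.com/raspberrypi/firmware/tree/48cd70fe84432c5d050637b61e4b7b9c831c98bf
Kernel: https://github.com/raspberrypi/linux/tree/5b775d7293eb75d6dfc9c5ffcb95c5012cd0c3f8
"""


def kernel_commit(info_text: str) -> str:
    """Return the 40-character commit hash found on the 'Kernel:' line."""
    match = re.search(
        r"^Kernel:\s*\S+/tree/([0-9a-f]{40})\s*$", info_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Kernel:' line with a commit hash found")
    return match.group(1)


def source_archive_url(commit: str) -> str:
    """Build a download URL for that commit (assumed GitHub archive pattern)."""
    return f"https://github.com/raspberrypi/linux/archive/{commit}.zip"


if __name__ == "__main__":
    commit = kernel_commit(INFO_TEXT)
    print(commit)
    print(source_archive_url(commit))
```

Pinning the source to this commit matters because the debug symbols in vmlinux only line up with the running kernel when both come from the same tree.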
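Once the source is ready, edit arch/arm64/configs/bcm2711_defconfig in the source tree as follows:
i. To make debugging easier, disable base address randomization (KASLR) by commenting out CONFIG_RANDOMIZE_BASE=y
ii. To get debug symbols, append CONFIG_DEBUG_INFO=y at the end of the file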
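The two edits above are small enough to do by hand, but they can also be scripted so they are easy to reapply after a fresh checkout. The sketch below is one possible way to do it: patch_defconfig is a hypothetical helper of my own, and it writes the disabled option in the usual Kconfig form "# CONFIG_... is not set".

```python
def patch_defconfig(text: str) -> str:
    """Return defconfig text with KASLR disabled and debug info enabled."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "CONFIG_RANDOMIZE_BASE=y":
            # Step i: comment the option out, Kconfig style.
            out.append("# CONFIG_RANDOMIZE_BASE is not set")
        elif stripped.startswith("CONFIG_DEBUG_INFO="):
            # Drop any existing value; the option is re-added below.
            continue
        else:
            out.append(line)
    # Step ii: append the debug-info option at the end of the file.
    out.append("CONFIG_DEBUG_INFO=y")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Tiny made-up excerpt, only to show the transformation.
    sample = "CONFIG_ARM64=y\nCONFIG_RANDOMIZE_BASE=y\n"
    print(patch_defconfig(sample), end="")
```

To apply it for real, read arch/arm64/configs/bcm2711_defconfig, pass its contents through patch_defconfig, and write the result back. Running it twice gives the same result as running it once.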
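After these two changes, build and install the kernel following the official Raspberry Pi instructions at www.raspberrypi.com/documentati… .
Run sudo reboot in the Raspberry Pi shell and check that the Raspberry Pi 4B boots with the new kernel and that basic functionality works.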
2. Connect the J-Link to the Raspberry Pi 4B
Wire the J-Link to the Raspberry Pi 4B as described in 笨叔's tutorial at zhuanlan.zhihu.com/p/465375822 .
Install the WinUSB driver for the J-Link as described at visualgdb.com/UsbDriverTo… .
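3. Configure and start openocd
3.1 Download and install openocd
Download and install the latest openocd from gnutoolchains.com/arm-eabi/op… .
3.2 Get the openocd configuration file for the Raspberry Pi 4B
The configuration file I use is shown below (copied from github.com/sysprogs/op… ):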
# SPDX-License-Identifier: GPL-2.0-or-later
# The Broadcom BCM2711 used in Raspberry Pi 4
# No documentation was found on Broadcom website
# Partial information is available in raspberry pi website:
# https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/
if { [info exists CHIPNAME] } {
set _CHIPNAME $CHIPNAME
} else {
set _CHIPNAME bcm2711
}
if { [info exists CHIPCORES] } {
set _cores $CHIPCORES
} else {
set _cores 4
}
if { [info exists USE_SMP] } {
set _USE_SMP $USE_SMP
} else {
set _USE_SMP 1
}
if { [info exists DAP_TAPID] } {
set _DAP_TAPID $DAP_TAPID
} else {
set _DAP_TAPID 0x4ba00477
}
jtag newtap $_CHIPNAME cpu -expected-id $_DAP_TAPID -irlen 4
adapter speed 4000
dap create $_CHIPNAME.dap -chain-position $_CHIPNAME.cpu
# MEM-AP for direct access
target create $_CHIPNAME.ap mem_ap -dap $_CHIPNAME.dap -ap-num 0
# these addresses are obtained from the ROM table via 'dap info 0' command
set _DBGBASE {0x80410000 0x80510000 0x80610000 0x80710000}
set _CTIBASE {0x80420000 0x80520000 0x80620000 0x80720000}
set _smp_command "target smp"
for { set _core 0 } { $_core < $_cores } { incr _core } {
set _CTINAME $_CHIPNAME.cti$_core
set _TARGETNAME $_CHIPNAME.cpu$_core
cti create $_CTINAME -dap $_CHIPNAME.dap -ap-num 0 -baseaddr [lindex $_CTIBASE $_core]
target create $_TARGETNAME aarch64 -dap $_CHIPNAME.dap -ap-num 0 -dbgbase [lindex $_DBGBASE $_core] -cti $_CTINAME -coreid $_core -rtos hwthread
set _smp_command "$_smp_command $_TARGETNAME"
$_TARGETNAME configure -event reset-assert-post "aarch64 dbginit"
$_TARGETNAME configure -event gdb-attach { halt }
$_TARGETNAME configure -event gdb-detach { resume }
}
if {$_USE_SMP} {
eval $_smp_command
echo $_smp_command
}
# bindto 0.0.0.0  (uncomment when gdb running inside WSL2 needs to connect to the gdb server started on Windows)
# default target is cpu0
targets $_CHIPNAME.cpu0
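The per-core addresses in _DBGBASE and _CTIBASE follow a regular pattern, which the short Python sketch below makes explicit. The stride and offset constants are only what I read off the four values in the config, not figures from Broadcom documentation, and the helper names are my own.

```python
# Values copied from the configuration file above.
DBGBASE = [0x80410000, 0x80510000, 0x80610000, 0x80710000]
CTIBASE = [0x80420000, 0x80520000, 0x80620000, 0x80720000]

CORE_STRIDE = 0x100000  # observed distance between two cores' debug blocks
CTI_OFFSET = 0x10000    # observed distance from a core's debug base to its CTI


def debug_base(core: int) -> int:
    """Debug register base of a core, derived from core 0's base."""
    return DBGBASE[0] + core * CORE_STRIDE


def cti_base(core: int) -> int:
    """CTI base of a core, derived from that core's debug base."""
    return debug_base(core) + CTI_OFFSET


def smp_command(chip: str, cores: int) -> str:
    """Rebuild the 'target smp ...' string that the config's loop assembles."""
    return "target smp " + " ".join(f"{chip}.cpu{i}" for i in range(cores))


if __name__ == "__main__":
    for core in range(4):
        print(core, hex(debug_base(core)), hex(cti_base(core)))
    print(smp_command("bcm2711", 4))
```

The string built by smp_command is the same line that openocd echoes at startup when _USE_SMP is 1.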
3.3 Start openocd
PS D:\xxx\OpenOCD-20230202-0.12.0\share\openocd\scripts> ..\..\..\bin\openocd.exe -f .\interface\jlink.cfg -f D:\xxx\raspberrypi4.cfg
Open On-Chip Debugger 0.12.0 (2023-02-02) [https://github.com/sysprogs/openocd]
Licensed under GNU GPL v2
libusb1 09e75e98b4d9ea7909e8837b7a3f00dda4589dc3
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "jtag". To override use 'transport select <transport>'.
Info : Hardware thread awareness created
Info : Hardware thread awareness created
Info : Hardware thread awareness created
Info : Hardware thread awareness created
target smp bcm2711.cpu0 bcm2711.cpu1 bcm2711.cpu2 bcm2711.cpu3
force hard breakpoints
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : J-Link V11 compiled Oct 26 2021 16:23:48
Info : Hardware version: 11.00
Info : VTarget = 3.315 V
Info : clock speed 4000 kHz
Info : JTAG tap: bcm2711.cpu tap/device found: 0x4ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x4)
Info : bcm2711.cpu0: hardware has 6 breakpoints, 4 watchpoints
Info : bcm2711.cpu1: hardware has 6 breakpoints, 4 watchpoints
Info : bcm2711.cpu2: hardware has 6 breakpoints, 4 watchpoints
Info : bcm2711.cpu3: hardware has 6 breakpoints, 4 watchpoints
Info : gdb port disabled
Info : starting gdb server for bcm2711.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
The output above shows that openocd has connected to both the J-Link and the Raspberry Pi 4B, and that the gdb server has started and is listening on port 3333.
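When openocd is started from a script, the same check can be done automatically by scanning its output. The following is a minimal sketch, assuming the log lines keep the format shown above; summarize is a hypothetical helper, not part of openocd.

```python
import re

# A few lines copied from the openocd output above.
LOG = """\
Info : clock speed 4000 kHz
Info : JTAG tap: bcm2711.cpu tap/device found: 0x4ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x4)
Info : bcm2711.cpu0: hardware has 6 breakpoints, 4 watchpoints
Info : bcm2711.cpu1: hardware has 6 breakpoints, 4 watchpoints
Info : bcm2711.cpu2: hardware has 6 breakpoints, 4 watchpoints
Info : bcm2711.cpu3: hardware has 6 breakpoints, 4 watchpoints
Info : gdb port disabled
Info : starting gdb server for bcm2711.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
"""


def summarize(log: str) -> dict:
    """Collect the examined cores and the gdb port from openocd's log text."""
    cores = re.findall(
        r"^Info : (\S+): hardware has (\d+) breakpoints, (\d+) watchpoints$",
        log,
        re.MULTILINE,
    )
    gdb = re.search(
        r"^Info : Listening on port (\d+) for gdb connections$", log, re.MULTILINE
    )
    return {
        # core name -> (hardware breakpoints, hardware watchpoints)
        "cores": {name: (int(bp), int(wp)) for name, bp, wp in cores},
        "gdb_port": int(gdb.group(1)) if gdb else None,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

The breakpoint count is worth noting: each core reports 6 hardware breakpoints, which bounds how many hb breakpoints can be active at once in the steps that follow.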
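4. Remote debugging with gdb
4.1 Download and install gdb
Install the Windows toolchain for the 64-bit Raspberry Pi system from gnutoolchains.com/raspberry64… ; it includes the gdb used below.
4.2 Start gdb and connect to the gdb server
Start gdb with the command below. The vmlinux argument is the vmlinux file from the root of the Raspberry Pi kernel source tree (produced by the kernel build in step 1); I copied it to D:\xxx\.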
PS D:\xxx\xxx> D:\xxx\SysGCC\raspberry64\bin\aarch64-linux-gnu-gdb.exe D:\xxx\vmlinux
D:\xxx\SysGCC\raspberry64\bin\aarch64-linux-gnu-gdb.exe: warning: Couldn't determine a path for the index cache directory.
GNU gdb (Debian 10.2.1) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-w64-mingw32 --target=aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux...
(gdb) target extended-remote :3333
Remote debugging using :3333
warning: multi-threaded target stopped without sending a thread-id, using first non-exited thread
cpu_do_idle () at arch/arm64/kernel/idle.c:32
4.3 Set a hardware breakpoint
The following uses the complete_signal function as an example of setting a hardware breakpoint.
First, get the address of complete_signal:
(gdb) p complete_signal
$30 = {void (int, struct task_struct *, enum pid_type)} 0xffffffc00809bea8 <complete_signal>
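Then set a hardware breakpoint at the function's address with hb. Use the address that gdb prints for the vmlinux matching the kernel running on the board, because it changes from build to build: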
(gdb) hb *0xffffffc00809b498
Hardware assisted breakpoint 2 at 0xffffffc00809b498: file kernel/signal.c, line 992.
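Since the address has to come from the same vmlinux that is running on the board, it can be safer to derive the hb command from gdb's own print output than to type it. The sketch below shows the idea; hbreak_command is a hypothetical helper, and the sample string is the print output shown earlier in this section.

```python
import re


def hbreak_command(print_output: str) -> str:
    """Turn the output of gdb's 'p <function>' into an 'hb *<address>' command."""
    # gdb prints function values as: $N = {type} 0xADDRESS <symbol>
    match = re.search(r"(0x[0-9a-fA-F]+)\s+<([^>]+)>", print_output)
    if match is None:
        raise ValueError("no '0x... <symbol>' pair found in gdb output")
    return f"hb *{match.group(1)}"


if __name__ == "__main__":
    sample = (
        "$30 = {void (int, struct task_struct *, enum pid_type)} "
        "0xffffffc00809bea8 <complete_signal>"
    )
    print(hbreak_command(sample))
```

If the address in the print output and the address reported when the breakpoint is set disagree, the vmlinux loaded in gdb probably comes from a different build than the kernel on the board.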
Next, let the Raspberry Pi 4B continue running and wait for the breakpoint to be hit:
(gdb) c
Continuing.
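Then run ls in the Raspberry Pi shell. ls prints its output, after which the shell appears to hang. This is expected: the target is halted at the breakpoint.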
xxx@raspberrypi:~ $ ls
Bookshelf Documents linux-5b775d7293eb75d6dfc9c5ffcb95c5012cd0c3f8 Music Public Videos
Desktop Downloads linux-5b775d7293eb75d6dfc9c5ffcb95c5012cd0c3f8.zip Pictures Templates
Finally, gdb reports that the breakpoint was hit:
Thread 2 "bcm2711.cpu1" hit Breakpoint 2, complete_signal (sig=sig@entry=17, p=p@entry=0xffffff8052421e40, type=type@entry=PIDTYPE_TGID) at kernel/signal.c:992
992 kernel/signal.c: No such file or directory.
(gdb) bt
#0 complete_signal (sig=sig@entry=17, p=p@entry=0xffffff8052421e40, type=type@entry=PIDTYPE_TGID) at kernel/signal.c:992
#1 0xffffffc00809bbf4 in __send_signal (sig=sig@entry=17, info=info@entry=0xffffffc00a4e3d08, t=0xffffff8052421e40, type=type@entry=PIDTYPE_TGID, force=force@entry=false)
at kernel/signal.c:1182
#2 0xffffffc00809c988 in do_notify_parent (tsk=0xffffff805242dac0, sig=<optimized out>) at kernel/signal.c:2113
#3 0xffffffc00808e554 in exit_notify (group_dead=<optimized out>, tsk=0xffffff805242dac0) at kernel/exit.c:682
#4 do_exit (code=code@entry=0) at kernel/exit.c:845
#5 0xffffffc00808e748 in do_group_exit (exit_code=0) at kernel/exit.c:922
#6 0xffffffc00808e7d4 in __do_sys_exit_group (error_code=<optimized out>) at kernel/exit.c:933
#7 __se_sys_exit_group (error_code=<optimized out>) at kernel/exit.c:931
#8 __arm64_sys_exit_group (regs=<optimized out>) at kernel/exit.c:931
#9 0xffffffc008028880 in __invoke_syscall (syscall_fn=<optimized out>, regs=0xffffffc00a4e3eb0) at arch/arm64/kernel/syscall.c:38
#10 invoke_syscall (regs=regs@entry=0xffffffc00a4e3eb0, scno=scno@entry=94, sc_nr=sc_nr@entry=449, syscall_table=syscall_table@entry=0xffffffc008ba08d0 <sys_call_table>)
at arch/arm64/kernel/syscall.c:52
#11 0xffffffc0080289b8 in el0_svc_common (regs=0xffffffc00a4e3eb0, scno=94, syscall_table=0xffffffc008ba08d0 <sys_call_table>, sc_nr=449) at arch/arm64/kernel/syscall.c:142
#12 0xffffffc008028aa4 in do_el0_svc (regs=<optimized out>) at arch/arm64/kernel/syscall.c:181
#13 0xffffffc008b7c4c8 in el0_svc (regs=0xffffffc00a4e3eb0) at arch/arm64/kernel/entry-common.c:595
#14 0xffffffc008b7c974 in el0t_64_sync_handler (regs=<optimized out>) at arch/arm64/kernel/entry-common.c:613
#15 0xffffffc008011e10 in el0t_64_sync () at arch/arm64/kernel/entry.S:584
#16 0xffffffc008011e10 in el0t_64_sync () at arch/arm64/kernel/entry.S:584
Backtrace stopped: not enough registers or memory available to unwind further
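Two numbers in the backtrace above are worth decoding: sig=17 in frame #0 and scno=94 in frame #11. The sketch below pulls them out of the frame text and maps them to names. The two lookup tables are tiny hand-written excerpts of the generic Linux signal and syscall numbering that arm64 uses, included here as my own assumption rather than taken from this kernel tree.

```python
import re

# Hand-written excerpts (assumption: generic Linux numbering as used on arm64).
SIGNALS = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM", 17: "SIGCHLD"}
SYSCALLS = {93: "exit", 94: "exit_group"}

# Frames #0 and #11, copied from the backtrace above.
FRAME0 = (
    "#0 complete_signal (sig=sig@entry=17, p=p@entry=0xffffff8052421e40, "
    "type=type@entry=PIDTYPE_TGID) at kernel/signal.c:992"
)
FRAME11 = (
    "#11 0xffffffc0080289b8 in el0_svc_common (regs=0xffffffc00a4e3eb0, scno=94, "
    "syscall_table=0xffffffc008ba08d0 <sys_call_table>, sc_nr=449) "
    "at arch/arm64/kernel/syscall.c:142"
)


def frame_arg(name: str, frame: str):
    """Return the integer value of argument `name` in a gdb frame line, if any."""
    pattern = rf"\b{name}=(?:{name}@entry=)?(\d+)"
    match = re.search(pattern, frame)
    return int(match.group(1)) if match else None


def describe(frame0: str, frame11: str) -> tuple:
    """Map the raw signal and syscall numbers to readable names."""
    sig = frame_arg("sig", frame0)
    scno = frame_arg("scno", frame11)
    return (
        SIGNALS.get(sig, f"signal {sig}"),
        SYSCALLS.get(scno, f"syscall {scno}"),
    )


if __name__ == "__main__":
    print(describe(FRAME0, FRAME11))
```

Read together, the two names describe the call chain: a process is exiting through exit_group, and the kernel is notifying its parent with SIGCHLD.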
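4.4 Inspect the task_struct that hit the breakpoint
How do we find out which task hit the current breakpoint? Readers familiar with the aarch64 kernel will know that the address of the current task_struct is normally obtained from the sp_el0 register, as the following code shows:
arch/arm64/include/asm/current.h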
/*
* We don't use read_sysreg() as we want the compiler to cache the value where
* possible.
*/
static __always_inline struct task_struct *get_current(void)
{
unsigned long sp_el0;
asm ("mrs %0, sp_el0" : "=r" (sp_el0));
return (struct task_struct *)sp_el0;
}
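Unfortunately, when we read the current registers in gdb with info registers or info all-registers, the output does not include sp_el0, so the address of the current task_struct has to be obtained some other way.
After some digging, I found that the current task can be obtained from __entry_task: on every context switch the kernel stores the address of the next task in __entry_task. The relevant code is: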
arch/arm64/kernel/process.c
/*
* Thread switching.
*/
__notrace_funcgraph __sched
struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *next)
{
......
entry_task_switch(next);
......
}
......
/*
* We store our current task in sp_el0, which is clobbered by userspace. Keep a
* shadow copy so that we can restore this upon entry from userspace.
*
* This is *only* for exception entry from EL0, and is not valid until we
* __switch_to() a user task.
*/
DEFINE_PER_CPU(struct task_struct *, __entry_task);
static void entry_task_switch(struct task_struct *next)
{
__this_cpu_write(__entry_task, next);
}
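We can therefore rely on how PER_CPU variables are laid out (see www.dingmos.com/index.php/a… for background on PER_CPU): a CPU's copy of the variable lives at the symbol's address plus that CPU's entry in __per_cpu_offset. This lets us read what __entry_task actually holds on the CPU that hit the breakpoint, and from that get the current task's information: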
(gdb) # cpu1 hit the breakpoint, so use the element at index 1 of the __per_cpu_offset array
(gdb) p /x __per_cpu_offset
$2 = {0xffffffc0f26d8000, 0xffffffc0f26f4000, 0xffffffc0f2710000, 0xffffffc0f272c000, 0x0 <repeats 252 times>}
(gdb) p &__entry_task
$3 = (struct task_struct **) 0xffffffc00909f2a8 <__entry_task>
(gdb) # __per_cpu_offset[1] + &__entry_task
(gdb) x /gx 0xffffffc0f26f4000 + 0xffffffc00909f2a8
0xffffff80fb7932a8: 0xffffff805242dac0
(gdb) # print the current task_struct
(gdb) p *(struct task_struct*)0xffffff805242dac0
$4 = {
thread_info = {
flags = 4,
{
preempt_count = 4294967298,
preempt = {
count = 2,
need_resched = 1
}
}
},
__state = 0,
stack = 0xffffffc00a4e0000,
usage = {
refs = {
counter = 1
}
},
flags = 4194308,
ptrace = 0,
on_cpu = 1,
wake_entry = {
llist = {
next = 0x0
},
{
u_flags = 48,
a_flags = {
counter = 48
}
},
src = 0,
dst = 0
},
cpu = 1,
wakee_flips = 63,
wakee_flip_decay_ts = 4329166199,
last_wakee = 0xffffff804641dac0,
recent_used_cpu = 3,
wake_cpu = 1,
on_rq = 1,
prio = 120,
static_prio = 120,
normal_prio = 120,
rt_priority = 0,
sched_class = 0xffffffc008ef5450 <fair_sched_class>,
--Type <RET> for more, q to quit, c to continue without paging--
(gdb) # print the name of the current task
(gdb) p ((struct task_struct*)0xffffff805242dac0)->comm
$5 = "ls\000h\000)\000\000\060\000\000\000\000\000\000"
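The output shows that the task that hit the breakpoint is ls. Its task_struct address, 0xffffff805242dac0, matches the tsk argument of do_notify_parent and exit_notify in the backtrace, and the sig argument of __send_signal is 17 (SIGCHLD, the signal sent to the parent when a child exits), which is what we expect.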
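The address arithmetic used in the gdb session can be reproduced outside gdb, which makes the 64-bit wraparound easier to see. This is a small sketch using the values printed above; per_cpu_address is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1  # kernel addresses are 64-bit, so sums wrap at 2**64

# Values copied from the gdb session above.
PER_CPU_OFFSET = [
    0xFFFFFFC0F26D8000,
    0xFFFFFFC0F26F4000,
    0xFFFFFFC0F2710000,
    0xFFFFFFC0F272C000,
]
ENTRY_TASK_SYMBOL = 0xFFFFFFC00909F2A8  # &__entry_task


def per_cpu_address(symbol: int, cpu: int, offsets=PER_CPU_OFFSET) -> int:
    """Address of `cpu`'s copy of a per-CPU variable: symbol + offset, mod 2**64."""
    return (symbol + offsets[cpu]) & MASK64


if __name__ == "__main__":
    # cpu1 hit the breakpoint, so index 1 is used, as in the gdb session.
    print(hex(per_cpu_address(ENTRY_TASK_SYMBOL, 1)))
```

Both operands look like large addresses, but the sum overflows 64 bits and wraps around, which is why the result lands at 0xffffff80fb7932a8, the address that x /gx reports.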