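[Original-work notice]: This article is an original work. Unauthorized reproduction in any form is not permitted. If you wish to repost it, please message me privately for permission first. Thank you for your cooperation!
I. Introduction
In embedded development, the stability of the Linux kernel is critical. Yet on embedded devices, limited CPU and memory resources, a wide variety of peripherals, and continuous system updates mean that a kernel crash (Kernel Panic) can never be ruled out entirely. Once the system has crashed, the usual logging channels (such as dmesg or serial output) may stop working because the system is already in an unrecoverable state, and the key crash information is lost. That makes the root cause very hard to locate and drives up the cost of debugging and fixing the bug.
To address this, the Linux kernel provides the kexec mechanism (the basis of kdump). The core idea is to reserve a region of memory while the system is running normally and to preload a stripped-down "capture kernel" into it with kexec -p. When the main kernel panics, it jumps to the capture kernel, which then collects the crash scene of the main kernel (the one that panicked), so the crash information is not lost.
This article uses the Topeet (迅为) RK3588s development board as the example platform and shows how to use kexec -p to capture a kernel crash dump file and upload it to an FTP server. Logs are then collected and uploaded automatically, which makes it easier to analyze and locate kernel crashes in the field.
II. Main kernel configuration
1. Add the required kernel config options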
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_KCORE=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
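Before rebuilding, it can help to confirm that the options above really ended up in the generated .config. The following is only a minimal sketch: the function name check_kdump_config and the idea of passing the config path as an argument are assumptions made for illustration, not part of the SDK.

```shell
#!/bin/sh
# Hypothetical helper: report which of the required options are missing
# from a kernel .config file (path passed as $1).
check_kdump_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_KEXEC CONFIG_KEXEC_FILE CONFIG_CRASH_DUMP \
               CONFIG_PROC_KCORE CONFIG_DEBUG_FS CONFIG_DEBUG_FS_ALLOW_ALL; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok      ${opt}"
        else
            echo "MISSING ${opt}"
            missing=1
        fi
    done
    # Non-zero exit status if at least one option is not set to y
    return "$missing"
}
```

It could be called as check_kdump_config followed by the path to the main kernel's .config file; a non-zero exit status means at least one option is not set.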
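2. Modify the kernel source
arch/arm64/kernel/machine_kexec.c
machine_kexec_mask_interrupts clears the state of every interrupt (sets it to inactive) before kexec enters the second kernel. On Rockchip, however, some interrupts cause irq_set_irqchip_state to hang, so the call irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false) is commented out and EOI (End of Interrupt) is used directly instead.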
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index a0b144cfa..f6cd5c72c 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -220,7 +220,7 @@ static void machine_kexec_mask_interrupts(void)
for_each_irq_desc(i, desc) {
struct irq_chip *chip;
- int ret;
+ //int ret;
chip = irq_desc_get_chip(desc);
if (!chip)
@@ -230,9 +230,9 @@ static void machine_kexec_mask_interrupts(void)
* First try to remove the active state. If this
* fails, try to EOI the interrupt.
*/
- ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+ //ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
- if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+ if (irqd_irq_inprogress(&desc->irq_data) &&
chip->irq_eoi)
chip->irq_eoi(&desc->irq_data);
3. Main kernel boot arguments
In kernel/arch/arm64/boot/dts/rockchip/rk3588-linux.dtsi, add crashkernel=256M to bootargs:
chosen: chosen {
bootargs = "earlycon=uart8250,mmio32,0xfeb50000 console=ttyFIQ0 irqchip.gicv3_pseudo_nmi=0 rw rootwait rcupdate.rcu_expedited=1 rcu_nocbs=all crashkernel=256M";
};
The main kernel's boot log prints information about the reserved 256 MB of memory.
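Besides reading the boot log, the reservation can be checked from a running system. The sketch below only parses a command-line string; the function name get_crashkernel_arg is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the crashkernel= value from a kernel command line
# passed as a single string in $1.
get_crashkernel_arg() {
    cmdline="$1"
    # Unquoted on purpose: split the command line into whitespace-separated words
    for word in $cmdline; do
        case "$word" in
            crashkernel=*)
                echo "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

On the board it could be called as get_crashkernel_arg "$(cat /proc/cmdline)". The reserved region itself should also appear as a "Crash kernel" entry in /proc/iomem, and its size in bytes can be read from /sys/kernel/kexec_crash_size.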
4. Setting up the main rootfs base environment
This article uses Ubuntu 22.04 aarch64 as the rootfs for the experiment.
./build.sh ubuntu22_update
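III. Capture kernel configuration
1. Capture kernel config options
The capture kernel drops most drivers and keeps only a minimal core system.
Remember to remove the .txt extension from the attached defconfig before using it.
📎topeet_rk3588s_capture_defconfig.txt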
cp -raf kernel kernel-capture
# config
make -C /home/${USER}/iTop/kernel-capture/ -j33 CROSS_COMPILE=/home/${USER}/iTop/prebuilts/gcc/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu- ARCH=arm64 topeet_rk3588s_capture_defconfig
# Build
make -C /home/${USER}/iTop/kernel-capture/ -j33 CROSS_COMPILE=/home/${USER}/iTop/prebuilts/gcc/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu- ARCH=arm64 topeet-rk3588s-linux.img
2. Capture rootfs
Create an initramfs that runs entirely from memory.
2.1 Create the directories
mkdir ~/initramfs-crash
mkdir -p ~/initramfs-crash/{bin,sbin,etc,proc,sys,dev,mnt,root,tmp}
mkdir -p ~/initramfs-crash/var/db/dhcpcd
mkdir -p ~/initramfs-crash/var/run
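2.2 Build the binaries
To keep the system simple, the binaries are statically linked.
They were built natively on the Topeet-RK3588s rather than cross-compiled (I got lazy 😜).
- Install dependencies
sudo apt install -y m4 libncurses-dev libbz2-dev pkg-config
- File list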
bison-3.1.tar.xz
flex-2.6.4.tar.gz
libnl-3.11.0.tar.gz
ltm-1.3.0.tar.xz
iw-6.17.tar.xz
wpa_supplicant-2.11.tar.gz
dhcpcd-10.2.4.tar.xz
busybox-1.37.0.tar.bz2
makedumpfile-1.7.8.tar.gz
- bison
cd ~/bison-3.1
./configure
make -j$(nproc)
sudo make install
- flex
cd ~/flex-2.6.4
./configure
make -j$(nproc)
sudo make install
- libnl
cd ~/libnl-3.11.0
./configure --enable-static=yes --enable-shared=no --enable-debug=no
make -j$(nproc)
sudo make install
- ltm
cd ~/libtommath-1.3.0
make -j$(nproc)
sudo make install
- iw
cd ~/iw-6.17
make -j$(nproc)
cp iw /home/topeet/
- wpa_supplicant
cd ~/wpa_supplicant-2.11/wpa_supplicant
cat > .config << "EOF"
CONFIG_BACKEND=file
CONFIG_CTRL_IFACE=y
CONFIG_DEBUG_FILE=y
CONFIG_DEBUG_SYSLOG=y
CONFIG_DEBUG_SYSLOG_FACILITY=LOG_DAEMON
CONFIG_DRIVER_NL80211=y
CONFIG_DRIVER_WEXT=y
CONFIG_DRIVER_WIRED=y
CONFIG_EAP_GTC=y
CONFIG_EAP_LEAP=y
CONFIG_EAP_MD5=y
CONFIG_EAP_MSCHAPV2=y
CONFIG_EAP_OTP=y
CONFIG_EAP_PEAP=y
CONFIG_EAP_TLS=y
CONFIG_EAP_TTLS=y
CONFIG_IEEE8021X_EAPOL=y
CONFIG_IPV6=y
CONFIG_PEERKEY=y
CONFIG_PKCS12=y
CONFIG_READLINE=y
CONFIG_SMARTCARD=y
CONFIG_WPS=y
CFLAGS += -I/usr/include/libnl3
LDFLAGS += -static
EOF
make wpa_supplicant -j$(nproc)
cp wpa_supplicant /home/topeet/
- dhcpcd
cd ~/dhcpcd-10.2.4
./configure --enable-static=yes --enable-shared=no
make -j$(nproc)
cp src/dhcpcd /home/topeet/
- busybox
cd ~/busybox-1.37.0
make menuconfig
Enable
Settings --->
Build static binary (no shared libs)
Disable
Settings --->
SHA1: Use hardware accelerated instructions if possible
make -j$(nproc)
cp busybox /home/topeet/
- makedumpfile
cd ~/makedumpfile-1.7.8
make TARGET=arm64 -j$(nproc)
cp makedumpfile /home/topeet/
2.3 Copy the binaries
mkdir -p ~/initramfs-crash/bin
cd ~/initramfs-crash/bin
# TOPEET_IP must hold the board's IP address
scp topeet@${TOPEET_IP}:/home/topeet/busybox .
scp topeet@${TOPEET_IP}:/home/topeet/dhcpcd .
scp topeet@${TOPEET_IP}:/home/topeet/iw .
scp topeet@${TOPEET_IP}:/home/topeet/makedumpfile .
scp topeet@${TOPEET_IP}:/home/topeet/wpa_supplicant .
sudo chmod a+x busybox makedumpfile iw dhcpcd wpa_supplicant
2.4 Create the symlinks to busybox
for i in cat echo ls mkdir reboot sync dd dmesg mount sh umount
do
sudo ln -s busybox $i
done
2.5 Set the Wi-Fi SSID and password
mkdir -p ~/initramfs-crash/etc/wpa_supplicant/
vim ~/initramfs-crash/etc/wpa_supplicant/wpa_supplicant.conf
ctrl_interface=/var/run/wpa_supplicant
network={
ssid="MySSID"
psk="MyPassword"
}
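For automated image builds, the same file can be generated without opening an editor. This is a minimal sketch under stated assumptions: the function name write_wpa_conf and its three positional arguments are hypothetical, and MySSID / MyPassword remain placeholders.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal wpa_supplicant.conf.
# $1 = output path, $2 = SSID, $3 = passphrase
write_wpa_conf() {
    conf_path="$1"
    ssid="$2"
    psk="$3"
    mkdir -p "$(dirname "$conf_path")"
    cat > "$conf_path" <<EOF
ctrl_interface=/var/run/wpa_supplicant
network={
    ssid="$ssid"
    psk="$psk"
}
EOF
    # The file holds a plaintext passphrase, so restrict its permissions
    chmod 600 "$conf_path"
}
```

For example: write_wpa_conf ~/initramfs-crash/etc/wpa_supplicant/wpa_supplicant.conf "MySSID" "MyPassword".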
2.6 Set the dhcpcd interface to wlan0
echo "interface wlan0" > ~/initramfs-crash/etc/dhcpcd.conf
2.7 Copy the wireless NIC driver
mkdir -p ~/initramfs-crash/lib/modules
mkdir -p ~/initramfs-crash/lib/modules-load.d
cp /home/${USER}/topeet/kernel/drivers/net/wireless/rockchip_wlan/rtl8723du/8723du.ko ~/initramfs-crash/lib/modules
2.8 Install the init file
Remember to replace the FTP address, USERNAME, and PASSWD with your own values.
#!/bin/sh
echo "My Linux kernel dump.."
# Mount the pseudo filesystems; /proc/vmcore lives under /proc
mount -t proc none /proc
mount -t sysfs none /sys
# date, insmod and sleep have no symlink in /bin, so call them through busybox
VMCORE_FILE=vmcore-$(busybox date +%Y%m%d-%H%M%S)
echo "Mount /home/topeet"
mount -t devtmpfs devtmpfs /dev
mount -t ext4 /dev/mmcblk0p6 /mnt/
echo "Start dump core file..."
# -c: compress the dump; -d 31: skip zero, cache, user-data and free pages
makedumpfile -c -d 31 /proc/vmcore /mnt/home/topeet/$VMCORE_FILE
sync
echo "Init Wi-Fi driver"
busybox insmod /lib/modules/8723du.ko
echo "Connect to MySSID"
wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant/wpa_supplicant.conf
# Start DHCP
echo "Wi-Fi DHCP"
dhcpcd -f /etc/dhcpcd.conf wlan0
echo "Send vmcore to FTP"
# Upload from the same path makedumpfile wrote to
busybox ftpput -u ${USERNAME} -p ${PASSWD} ${FTP} /data/crash/$VMCORE_FILE /mnt/home/topeet/$VMCORE_FILE
echo "Halt..."
busybox sleep 1024d
busybox
Make it executable
chmod a+x init
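The init script uploads the dump once, right after DHCP. On a flaky Wi-Fi link a single attempt may fail, so one possible hardening is a small retry wrapper. This is a sketch only: the function name retry and its argument order are assumptions, and it sticks to constructs that busybox sh should accept.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, waiting $2 seconds
# between attempts. Everything after the first two arguments is the command.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Stop as soon as the command succeeds
        "$@" && return 0
        echo "attempt $n/$attempts failed: $*" >&2
        n=$((n + 1))
        # Only sleep if another attempt follows
        [ "$n" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

In init, the upload line could then be prefixed with retry 5 3 so that busybox ftpput is tried up to five times, three seconds apart.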
2.9 initramfs layout
.
├── bin
│ ├── busybox
│ ├── busybox-new
│ ├── cat -> busybox
│ ├── dd -> busybox
│ ├── dhcpcd
│ ├── dmesg -> busybox
│ ├── echo -> busybox
│ ├── iw
│ ├── ls -> busybox
│ ├── makedumpfile
│ ├── mkdir -> busybox
│ ├── mount -> busybox
│ ├── reboot -> busybox
│ ├── sh -> busybox
│ ├── sync -> busybox
│ ├── umount -> busybox
│ └── wpa_supplicant
├── dev
│ ├── null
│ ├── tty
│ ├── urandom
│ └── zero
├── etc
│ ├── dhcpcd.conf
│ └── wpa_supplicant
│ └── wpa_supplicant.conf
├── init
├── lib
│ ├── modules
│ │ └── 8723du.ko
│ └── modules-load.d
├── mnt
├── proc
├── root
├── sbin
├── sys
├── tmp
└── var
├── db
│ └── dhcpcd
└── run
2.10 Pack the initramfs
cd ~/initramfs-crash
find . | cpio -H newc -o | gzip -9 > ~/initramfs.cpio.gz
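A missing /init or binary in the archive only shows up after a panic, when the capture kernel fails to boot, so it may be worth listing the archive right after packing. The sketch below is one way to do that; the function name initramfs_has is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list a gzip-compressed cpio archive ($1) and confirm
# that every path given after it is present.
initramfs_has() {
    archive="$1"
    shift
    listing=$(gzip -dc "$archive" | cpio -t 2>/dev/null) || return 1
    for path in "$@"; do
        # Entry names may or may not carry a leading "./", so accept both
        if ! echo "$listing" | grep -qxF -e "$path" -e "./$path"; then
            echo "missing: $path" >&2
            return 1
        fi
    done
    return 0
}
```

For example: initramfs_has ~/initramfs.cpio.gz init bin/busybox bin/makedumpfile lib/modules/8723du.ko.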
IV. Main rootfs
1. Install kexec-tools; reboot the system when the installation finishes
sudo apt install kexec-tools -y
In continuous integration, the package can be installed non-interactively:
DEBIAN_FRONTEND=noninteractive apt install -y kexec-tools
2. Install the capture kernel and capture rootfs
In the main filesystem:
mkdir -p ~/main-rootfs/home/topeet/
cp ~/initramfs.cpio.gz ~/main-rootfs/home/topeet/
cp /home/${USER}/topeet/kernel-capture/arch/arm64/boot/Image ~/main-rootfs/home/topeet/
3. Create and enable kdump.service
[Unit]
Description=My kernel kdump
[Service]
ExecStart=kexec -p /home/topeet/Image --initrd=/home/topeet/initramfs.cpio.gz --append="rw rootwait earlycon=uart8250,mmio32,0xfeb50000 console=ttyFIQ0 root=/dev/ram0 nr_cpus=1 reset_devices cma=16M"
Type=oneshot
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
Enable it:
sudo systemctl daemon-reload
sudo systemctl enable kdump.service
sudo systemctl start kdump.service
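Once the service has started, the kernel reports whether a crash kernel is loaded through /sys/kernel/kexec_crash_loaded, which reads 1 when one is in place. A minimal check might look like the following; the function name crash_kernel_loaded and the optional path argument (there only so the check can be exercised against a test file) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the state file contains "1".
# $1 is optional and defaults to /sys/kernel/kexec_crash_loaded.
crash_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}
```

For example: crash_kernel_loaded && echo "capture kernel is loaded".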
Note that uart8250,mmio32,0xfeb50000 must match the serial port used by ttyFIQ0.
On the Topeet-RK3588s, ttyFIQ0 corresponds to uart2, whose address is 0xfeb50000:
fiq_debugger: fiq-debugger {
compatible = "rockchip,fiq-debugger";
rockchip,serial-id = <2>;
rockchip,wake-irq = <0>;
/* If enable uart uses irq instead of fiq */
rockchip,irq-mode-enable = <1>;
rockchip,baudrate = <115200>; /* Only 115200 and 1500000 */
interrupts = <GIC_SPI 423 IRQ_TYPE_LEVEL_LOW>;
pinctrl-names = "default";
pinctrl-0 = <&uart2m0_xfer>;
status = "okay";
};
uart2: serial@feb50000 {
compatible = "rockchip,rk3588-uart", "snps,dw-apb-uart";
reg = <0x0 0xfeb50000 0x0 0x100>;
interrupts = <GIC_SPI 333 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_UART2>, <&cru PCLK_UART2>;
clock-names = "baudclk", "apb_pclk";
reg-shift = <2>;
reg-io-width = <4>;
dmas = <&dmac0 10>, <&dmac0 11>;
pinctrl-names = "default";
pinctrl-0 = <&uart2m1_xfer>;
status = "disabled";
};
V. Testing
# Load the capture kernel into system memory
sudo kexec -p /home/topeet/Image --initrd=/home/topeet/initramfs.cpio.gz --append="rw rootwait earlycon=uart8250,mmio32,0xfeb50000 console=ttyFIQ0 root=/dev/ram0 nr_cpus=1 reset_devices cma=16M"
# Trigger a Kernel Panic manually
sudo bash -c "echo c > /proc/sysrq-trigger"
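After the capture kernel has run and the board has been rebooted into the main system, the dump written by init should be sitting in /home/topeet. Because the file names embed a timestamp, the most recent one sorts last by name. A small sketch for picking it up, assuming the hypothetical function name latest_vmcore and assuming the capture kernel's clock was set sensibly:

```shell
#!/bin/sh
# Hypothetical helper: print the vmcore-* file whose name sorts last
# in the given directory (default /home/topeet).
latest_vmcore() {
    dump_dir="${1:-/home/topeet}"
    latest=""
    # Glob results are sorted by name, so the last match has the newest timestamp
    for f in "$dump_dir"/vmcore-*; do
        [ -e "$f" ] || continue
        latest="$f"
    done
    [ -n "$latest" ] && echo "$latest"
}
```

The function returns a non-zero status when no dump file is found, which can serve as a quick pass/fail signal for the test above.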
Notes
- If you switch to another Rockchip CPU, remember to update the CPU-related configuration in kernel-capture.
- This article uses iTop-rk3588s-linux_20241123.tar.xz as the Linux SDK.