Linux Kernel Stability: Capturing Panics


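[Original work] This article is original. Unauthorized reproduction in any form is not permitted. To republish it, please message me privately for permission first. Thank you!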

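I. Introduction

In embedded development, the stability of the Linux kernel is critical. Embedded devices, however, have limited CPU and memory resources, a wide variety of peripherals, and continuously evolving software, so kernel crashes (kernel panics) remain hard to avoid entirely. Once the system crashes, the usual logging channels (such as dmesg or serial output) may stop working because the system is already in an unrecoverable state, and the key crash information is lost. That makes root-cause analysis extremely difficult and sharply raises the cost of debugging and fixing the bug.

To address this pain point, the Linux kernel provides the kexec mechanism. The core idea is to reserve a region of memory while the system is running normally and to preload a slimmed-down "capture kernel" into it with kexec -p. When the main kernel panics, it jumps straight into the capture kernel, which then collects the crash scene of the panicked main kernel so that the crash information is not lost.

This article uses the Topeet (迅为) RK3588s development board as the example platform and shows how to use kexec -p to capture a kernel crash dump and upload it to an FTP server. Collection and upload are automatic, which makes it quick to analyze and locate kernel crashes in the field.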

II. Main Kernel Configuration

1. Add the required kernel config options

CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_KCORE=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
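
Before rebuilding, it can help to confirm that the options listed above really ended up in the generated .config. The following is a minimal sketch; the function name check_kexec_config is hypothetical and the path to .config is whatever your build tree uses.

```shell
# Hypothetical helper: report which of the options listed above are missing
# from a kernel .config file.
# Usage: check_kexec_config path/to/.config
check_kexec_config() {
    config_file=$1
    missing=""
    for opt in CONFIG_KEXEC CONFIG_KEXEC_FILE CONFIG_CRASH_DUMP \
               CONFIG_PROC_KCORE CONFIG_DEBUG_FS CONFIG_DEBUG_FS_ALLOW_ALL; do
        # A built-in option appears as a line of the exact form OPTION=y.
        if ! grep -qx "${opt}=y" "$config_file"; then
            missing="${missing}${missing:+ }${opt}"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing: $missing"
        return 1
    fi
    echo "ok"
}
```

Run it against the .config in the main kernel tree after make has generated it; a non-zero exit status means at least one option is not set to y.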

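2. Modify the kernel source

arch/arm64/kernel/machine_kexec.c

machine_kexec_mask_interrupts clears the state of every interrupt (sets it to inactive) before kexec enters the second kernel. On Rockchip, however, some interrupts cause irq_set_irqchip_state to hang, so the call irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false) is commented out and an EOI (End of Interrupt) is issued directly instead.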

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index a0b144cfa..f6cd5c72c 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -220,7 +220,7 @@ static void machine_kexec_mask_interrupts(void)
 
        for_each_irq_desc(i, desc) {
                struct irq_chip *chip;
-               int ret;
+               //int ret;
 
                chip = irq_desc_get_chip(desc);
                if (!chip)
@@ -230,9 +230,9 @@ static void machine_kexec_mask_interrupts(void)
                 * First try to remove the active state. If this
                 * fails, try to EOI the interrupt.
                 */
-               ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
+               //ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
 
-               if (ret && irqd_irq_inprogress(&desc->irq_data) &&
+               if (irqd_irq_inprogress(&desc->irq_data) &&
                    chip->irq_eoi)
                        chip->irq_eoi(&desc->irq_data);

3. Main kernel boot parameters

Add crashkernel=256M to kernel/arch/arm64/boot/dts/rockchip/rk3588-linux.dtsi

chosen: chosen {
        bootargs = "earlycon=uart8250,mmio32,0xfeb50000 console=ttyFIQ0 irqchip.gicv3_pseudo_nmi=0 rw rootwait rcupdate.rcu_expedited=1 rcu_nocbs=all crashkernel=256M";
};

The main kernel's boot log then reports the reserved 256 MB memory region.
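
The reservation can also be checked from a running system. The sketch below extracts the crashkernel= value from the kernel command line; the function name crashkernel_arg is hypothetical, and the optional file argument exists only so the parser can be tried without booting the board.

```shell
# crashkernel_arg [FILE]: print the value of crashkernel= from a kernel command line.
# Reads /proc/cmdline by default; another file can be passed for testing.
crashkernel_arg() {
    cmdline_file=${1:-/proc/cmdline}
    # Split the command line into one parameter per line, then keep the text after '='.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^crashkernel=//p' | tail -n 1
}
```

On the board, crashkernel_arg should print 256M with the bootargs above. The kernel also exposes the reserved size in bytes in /sys/kernel/kexec_crash_size; 256 MB corresponds to 268435456.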


4. Main rootfs base environment

This article uses Ubuntu 22.04 aarch64 as the rootfs for the experiment.

./build.sh ubuntu22_update

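III. Capture Kernel Configuration

1. Capture kernel config options

The capture kernel drops most drivers and keeps only the minimal core system.

Note: remove the .txt extension from the attached defconfig.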

📎topeet_rk3588s_capture_defconfig.txt

cp -raf kernel kernel-capture

# config
make -C /home/${USER}/iTop/kernel-capture/ -j33 CROSS_COMPILE=/home/${USER}/iTop/prebuilts/gcc/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu- ARCH=arm64 topeet_rk3588s_capture_defconfig

# Build
make -C /home/${USER}/iTop/kernel-capture/ -j33 CROSS_COMPILE=/home/${USER}/iTop/prebuilts/gcc/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu- ARCH=arm64 topeet-rk3588s-linux.img

2. Capture rootfs

Create an initramfs that runs entirely from memory.

2.1 Create the directories
mkdir ~/initramfs-crash
mkdir -p ~/initramfs-crash/{bin,sbin,etc,proc,sys,dev,mnt,root,tmp}
mkdir -p ~/initramfs-crash/var/db/dhcpcd
mkdir -p ~/initramfs-crash/var/run
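2.2 Build the binaries

To keep the system simple, the binaries are statically linked.

They are built natively on the Topeet-RK3588s rather than cross-compiled (I got lazy 😜).

  • Install dependencies
sudo apt install -y m4 libncurses-dev libbz2-dev pkg-config
  • File list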
bison-3.1.tar.xz
flex-2.6.4.tar.gz
libnl-3.11.0.tar.gz
ltm-1.3.0.tar.xz
iw-6.17.tar.xz
wpa_supplicant-2.11.tar.gz
dhcpcd-10.2.4.tar.xz
busybox-1.37.0.tar.bz2
makedumpfile-1.7.8.tar.gz
  • bison
cd ~/bison-3.1
./configure
make -j$(nproc)
sudo make install
  • flex
cd ~/flex-2.6.4
./configure
make -j$(nproc)
sudo make install
  • libnl
cd ~/libnl-3.11.0
./configure --enable-static=yes --enable-shared=no --enable-debug=no
make -j$(nproc)
sudo make install
  • ltm
cd ~/libtommath-1.3.0
make -j$(nproc)
sudo make install
  • iw
cd ~/iw-6.17
make -j$(nproc)
cp iw /home/topeet/
  • wpa_supplicant
cd ~/wpa_supplicant-2.11/wpa_supplicant

cat > .config << "EOF"
CONFIG_BACKEND=file
CONFIG_CTRL_IFACE=y
CONFIG_DEBUG_FILE=y
CONFIG_DEBUG_SYSLOG=y
CONFIG_DEBUG_SYSLOG_FACILITY=LOG_DAEMON
CONFIG_DRIVER_NL80211=y
CONFIG_DRIVER_WEXT=y
CONFIG_DRIVER_WIRED=y
CONFIG_EAP_GTC=y
CONFIG_EAP_LEAP=y
CONFIG_EAP_MD5=y
CONFIG_EAP_MSCHAPV2=y
CONFIG_EAP_OTP=y
CONFIG_EAP_PEAP=y
CONFIG_EAP_TLS=y
CONFIG_EAP_TTLS=y
CONFIG_IEEE8021X_EAPOL=y
CONFIG_IPV6=y
CONFIG_PEERKEY=y
CONFIG_PKCS12=y
CONFIG_READLINE=y
CONFIG_SMARTCARD=y
CONFIG_WPS=y
CFLAGS += -I/usr/include/libnl3
LDFLAGS += -static
EOF

make wpa_supplicant -j$(nproc)
cp wpa_supplicant /home/topeet/
  • dhcpcd
cd ~/dhcpcd-10.2.4
./configure --enable-static=yes --enable-shared=no
make -j$(nproc)
cp src/dhcpcd /home/topeet/
  • busybox
cd ~/busybox-1.37.0
make menuconfig

Enable:
Settings  --->
Build static binary (no shared libs)

Disable:
Settings  --->
SHA1: Use hardware accelerated instructions if possible

make -j$(nproc)
cp busybox /home/topeet/
  • makedumpfile
cd ~/makedumpfile-1.7.8
make TARGET=arm64 -j$(nproc)
cp makedumpfile /home/topeet/
2.3 Copy the binaries
mkdir -p ~/initramfs-crash/bin
cd ~/initramfs-crash/bin

# Set TOPEET_IP to the board's IP address first.
scp topeet@${TOPEET_IP}:/home/topeet/busybox .
scp topeet@${TOPEET_IP}:/home/topeet/dhcpcd .
scp topeet@${TOPEET_IP}:/home/topeet/iw .
scp topeet@${TOPEET_IP}:/home/topeet/makedumpfile .
scp topeet@${TOPEET_IP}:/home/topeet/wpa_supplicant .

sudo chmod a+x  busybox makedumpfile iw dhcpcd wpa_supplicant
2.4 Create symlinks to busybox
for i in cat echo ls mkdir reboot sync dd dmesg mount sh umount
do
    sudo ln -s busybox $i
done
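
A quick sanity check after the loop above is to confirm that every applet name really resolves to busybox. This is a minimal sketch; check_applet_links is a hypothetical helper name, and it relies on readlink being available on the build host.

```shell
# check_applet_links DIR APPLET...: verify that each APPLET in DIR is a symlink to busybox.
check_applet_links() {
    dir=$1
    shift
    status=0
    for applet in "$@"; do
        # readlink fails for missing files and non-symlinks; treat that as an empty target.
        target=$(readlink "$dir/$applet" 2>/dev/null) || target=""
        if [ "$target" != "busybox" ]; then
            echo "bad link: $applet"
            status=1
        fi
    done
    return $status
}
```

For example, check_applet_links ~/initramfs-crash/bin cat echo ls mkdir reboot sync dd dmesg mount sh umount prints nothing and returns 0 when all links are in place.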
2.5 Set the Wi-Fi SSID and password
mkdir -p ~/initramfs-crash/etc/wpa_supplicant/
vim ~/initramfs-crash/etc/wpa_supplicant/wpa_supplicant.conf 
ctrl_interface=/var/run/wpa_supplicant
network={
    ssid="MySSID"
    psk="MyPassword"
}
2.6 Set the dhcpcd interface to wlan0
echo "interface wlan0" > ~/initramfs-crash/etc/dhcpcd.conf 
2.7 Copy the wireless NIC driver
mkdir -p ~/initramfs-crash/lib/modules
mkdir -p ~/initramfs-crash/lib/modules-load.d
cp /home/${USER}/topeet/kernel/drivers/net/wireless/rockchip_wlan/rtl8723du/8723du.ko ~/initramfs-crash/lib/modules
2.8 Install the init file

Note: replace the FTP address, USERNAME, and PASSWD with your own values.

#!/bin/sh
echo "My Linux kernel dump.."
mount -t proc none /proc
mount -t sysfs none /sys

# date, insmod and sleep have no symlink in /bin, so call them through busybox.
VMCORE_FILE=vmcore-$(busybox date +%Y%m%d-%H%M%S)

echo "Mount /home/topeet"
mount -t devtmpfs devtmpfs /dev
mount -t ext4 /dev/mmcblk0p6 /mnt/

echo "Start dump core file..."
makedumpfile -c -d 31 /proc/vmcore /mnt/home/topeet/$VMCORE_FILE
sync

echo "Init Wi-Fi driver"
busybox insmod /lib/modules/8723du.ko
echo "Connect to MySSID"
wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant/wpa_supplicant.conf

# Start DHCP
echo "Wi-Fi DHCP"
dhcpcd -f /etc/dhcpcd.conf wlan0

echo "Send vmcore to FTP"
# Upload from the same path that makedumpfile wrote to.
busybox ftpput -u ${USERNAME} -p ${PASSWD} ${FTP}  /data/crash/$VMCORE_FILE /mnt/home/topeet/$VMCORE_FILE

echo "Halt..."
busybox sleep 1024d
busybox
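
The -d 31 passed to makedumpfile in the script is a bitmask of page types to leave out of the dump: 1 for zero pages, 2 for non-private cache pages, 4 for private cache pages, 8 for user data, and 16 for free pages. The sketch below decodes a level into those names; describe_dump_level and the short labels are illustrative, not part of makedumpfile.

```shell
# describe_dump_level LEVEL: list the page types that makedumpfile -d LEVEL excludes.
describe_dump_level() {
    level=$1
    excluded=""
    # Each entry is BIT:LABEL, following the dump-level table in makedumpfile(8).
    for entry in 1:zero 2:cache 4:cache-private 8:user 16:free; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $((level & bit)) -ne 0 ]; then
            excluded="${excluded}${excluded:+ }${name}"
        fi
    done
    echo "${excluded:-none}"
}
```

Level 31 sets all five bits, which gives the smallest dump; a lower level such as 17 keeps cache and user pages at the cost of a larger file to upload.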

Make it executable:

chmod a+x init
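
Wi-Fi association and DHCP take a few seconds, so an upload attempted immediately after them may fail on a slow link. One possible hardening, not part of the original script, is a small retry wrapper around the upload command. The function name retry is hypothetical, and the sketch assumes a sleep command is reachable from the shell (in the initramfs that means adding a sleep symlink in step 2.4 or calling busybox sleep instead).

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds or ATTEMPTS is used up.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1    # give up after the last allowed attempt
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

In init, the upload line could then be prefixed with retry 10 3 so that the ftpput command is tried up to ten times, three seconds apart.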
2.9 initramfs layout
.
├── bin
│   ├── busybox
│   ├── busybox-new
│   ├── cat -> busybox
│   ├── dd -> busybox
│   ├── dhcpcd
│   ├── dmesg -> busybox
│   ├── echo -> busybox
│   ├── iw
│   ├── ls -> busybox
│   ├── makedumpfile
│   ├── mkdir -> busybox
│   ├── mount -> busybox
│   ├── reboot -> busybox
│   ├── sh -> busybox
│   ├── sync -> busybox
│   ├── umount -> busybox
│   └── wpa_supplicant
├── dev
│   ├── null
│   ├── tty
│   ├── urandom
│   └── zero
├── etc
│   ├── dhcpcd.conf
│   └── wpa_supplicant
│       └── wpa_supplicant.conf
├── init
├── lib
│   ├── modules
│   │   └── 8723du.ko
│   └── modules-load.d
├── mnt
├── proc
├── root
├── sbin
├── sys
├── tmp
└── var
    ├── db
    │   └── dhcpcd
    └── run
2.10 Pack the initramfs
cd ~/initramfs-crash
find . | cpio -H newc -o | gzip -9 > ~/initramfs.cpio.gz
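
The capture kernel runs /init from this archive, so it is worth confirming that init sits at the top level before copying the file to the board. A minimal sketch follows; has_init is a hypothetical helper that reads an archive listing on standard input.

```shell
# has_init: read a cpio listing on stdin and succeed only if it contains a top-level init.
# Depending on the cpio implementation, the entry is listed as "init" or "./init".
has_init() {
    grep -qxF -e 'init' -e './init'
}

# On the build host (assumes gzip and cpio are installed):
#   gzip -dc ~/initramfs.cpio.gz | cpio -t 2>/dev/null | has_init && echo "init present"
```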

IV. Main Rootfs

1. Install kexec-tools, and choose to reboot the system once installation completes

sudo apt install kexec-tools -y

In continuous integration, DEBIAN_FRONTEND=noninteractive apt install -y kexec-tools performs a silent install.

2. Install the capture kernel and capture rootfs

In the main filesystem:

mkdir -p ~/main-rootfs/home/topeet/
cp ~/initramfs.cpio.gz ~/main-rootfs/home/topeet/
cp /home/${USER}/topeet/kernel-capture/arch/arm64/boot/Image ~/main-rootfs/home/topeet/

3. Create and enable kdump.service

[Unit]
Description=My kernel kdump

[Service]
ExecStart=kexec -p /home/topeet/Image --initrd=/home/topeet/initramfs.cpio.gz --append="rw rootwait earlycon=uart8250,mmio32,0xfeb50000 console=ttyFIQ0 root=/dev/ram0 nr_cpus=1 reset_devices cma=16M"

Type=oneshot
RemainAfterExit=true

[Install]
WantedBy=multi-user.target

Enable it:

sudo systemctl daemon-reload

sudo systemctl enable kdump.service

sudo systemctl start kdump.service
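
Once the service has started, the kernel reports whether a crash kernel is loaded through /sys/kernel/kexec_crash_loaded, which reads 1 when kexec -p succeeded. The helper below is a small sketch around that file; the name crash_kernel_loaded is hypothetical, and the optional argument exists only so it can be tried against a test file.

```shell
# crash_kernel_loaded [FILE]: succeed if the kernel reports a loaded crash kernel.
crash_kernel_loaded() {
    flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    # A missing or unreadable file counts as "not loaded".
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}
```

On the board, crash_kernel_loaded && echo "capture kernel ready" gives a quick confirmation before a panic is triggered.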

Note that uart8250,mmio32,0xfeb50000 here must match the UART behind ttyFIQ0.

On the Topeet-RK3588s, ttyFIQ0 corresponds to uart2, whose base address is 0xfeb50000.

fiq_debugger: fiq-debugger {
        compatible = "rockchip,fiq-debugger";
        rockchip,serial-id = <2>;
        rockchip,wake-irq = <0>;
        /* If enable uart uses irq instead of fiq */
        rockchip,irq-mode-enable = <1>;
        rockchip,baudrate = <115200>;  /* Only 115200 and 1500000 */
        interrupts = <GIC_SPI 423 IRQ_TYPE_LEVEL_LOW>;
        pinctrl-names = "default";
        pinctrl-0 = <&uart2m0_xfer>;
        status = "okay";
};

uart2: serial@feb50000 {
        compatible = "rockchip,rk3588-uart", "snps,dw-apb-uart";
        reg = <0x0 0xfeb50000 0x0 0x100>;
        interrupts = <GIC_SPI 333 IRQ_TYPE_LEVEL_HIGH>;
        clocks = <&cru SCLK_UART2>, <&cru PCLK_UART2>;
        clock-names = "baudclk", "apb_pclk";
        reg-shift = <2>; 
        reg-io-width = <4>; 
        dmas = <&dmac0 10>, <&dmac0 11>;
        pinctrl-names = "default";
        pinctrl-0 = <&uart2m1_xfer>;
        status = "disabled";
};
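
When porting to a board whose debug console sits on a different UART, the earlycon argument can be derived from the serial node name. The sketch below assumes, as in the uart2 node above, that the unit address after @ equals the register base and that the UART uses 32-bit register access (reg-io-width = <4>); earlycon_for_node is a hypothetical helper name.

```shell
# earlycon_for_node NODE: build the earlycon= argument from a serial node name
# such as serial@feb50000.
earlycon_for_node() {
    node=$1
    addr=${node#*@}    # keep the unit address after '@'
    echo "earlycon=uart8250,mmio32,0x${addr}"
}
```

For the node shown above, earlycon_for_node serial@feb50000 yields the same string that is used in the kexec --append argument.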

V. Testing

# Load the capture kernel into memory
sudo kexec -p /home/topeet/Image --initrd=/home/topeet/initramfs.cpio.gz --append="rw rootwait earlycon=uart8250,mmio32,0xfeb50000 console=ttyFIQ0 root=/dev/ram0 nr_cpus=1 reset_devices cma=16M"
# Trigger a kernel panic manually
sudo bash -c "echo c > /proc/sysrq-trigger"
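
After the capture kernel has finished and the board is booted back into the main system, the dump should be in /home/topeet, assuming /dev/mmcblk0p6 is the main rootfs partition as in the init script. Because the file names carry a vmcore-YYYYmmdd-HHMMSS timestamp, the newest one is simply the last in sorted order. A minimal sketch, with latest_vmcore as a hypothetical helper name:

```shell
# latest_vmcore DIR: print the path of the most recent vmcore-* file in DIR, if any.
latest_vmcore() {
    dir=$1
    latest=""
    for f in "$dir"/vmcore-*; do
        [ -e "$f" ] || continue    # the glob stays literal when nothing matches
        latest=$f                  # globs expand in sorted order, so the last match is the newest
    done
    if [ -n "$latest" ]; then
        echo "$latest"
    fi
}
```

For example, latest_vmcore /home/topeet prints the dump written by the most recent panic, or nothing if no dump exists yet.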

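Notes

  • If you port this to another Rockchip SoC, remember to update the CPU-related options in the kernel-capture configuration.
  • This article uses iTop-rk3588s-linux_20241123.tar.xz as the Linux SDK.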