A detailed look at the Linux interrupt subsystem


Interrupts are an essential part of an operating system. When an unexpected event requires the host's attention, the machine can automatically suspend the program that is currently running, switch to a program that handles the new event, and then return to the suspended program once that handling is finished.

With an interrupt system the processor no longer has to poll continuously for events, which improves overall system efficiency.

In a typical system, interrupt control is split across three parts: the device module, the interrupt controller and the processor.

The module usually has registers that control whether its interrupt is enabled and how it is triggered; the interrupt controller manages things such as interrupt priority; and the processor uses its own registers to configure how it responds to interrupts.

ARM Generic Interrupt Controller Architecture version 2.0 - Architecture Specification

(Figure: interrupt subsystem framework diagram)

This article is based on Linux 4.12.

1. The interrupt controller

Peripheral devices do not send interrupt requests directly to the processor; they send them to the interrupt controller. From the software point of view, the GIC v2 controller (Generic Interrupt Controller, a standard interrupt controller defined by ARM) has two main functional blocks:

  1. Distributor: every interrupt source in the system is connected to the distributor. The distributor's registers control the properties of each individual interrupt: priority, state, security, routing (which processors it may be forwarded to) and enable state. The distributor decides which interrupt should be forwarded, through a CPU interface, to which processor.
  2. CPU interfaces: a processor receives interrupts through its CPU interface. The CPU interface provides registers for masking and identifying interrupts and for controlling interrupt state; each processor has its own CPU interface.

Software identifies an interrupt by its interrupt number, and every interrupt number corresponds to exactly one interrupt source. There are four types of interrupt:

  1. Software Generated Interrupt (SGI): interrupt numbers 0-15, usually used to implement Inter-Processor Interrupts (IPI). This kind of interrupt is generated by software writing the distributor's Software Generated Interrupt Register (GICD_SGIR).
  2. Private Peripheral Interrupt (PPI): interrupt numbers 16-31, interrupt sources private to one processor. The same interrupt number on different processors refers to unrelated sources, for example each processor's own timer.
  3. Shared Peripheral Interrupt (SPI): interrupt numbers 32-1019. This kind of interrupt can be forwarded by the interrupt controller to more than one processor.
  4. Locality-specific Peripheral Interrupt (LPI): message-based interrupts; not supported by GIC v1 and v2.

Interrupt states:

  1. Inactive: the interrupt source has not raised an interrupt.
  2. Pending: the interrupt source has raised an interrupt that is waiting to be handled by a processor.
  3. Active: the processor has acknowledged the interrupt and is handling it.
  4. Active and pending: the processor is handling the interrupt and the same interrupt source has raised another one.

Peripheral device (inactive) ----> distributor (pending) ----> CPU interface of the target processor ----> processor ----> read the CPU interface's Interrupt Acknowledge Register (active or active and pending) ----> obtain the interrupt number ----> call the handler for that interrupt number ----> when done, write the interrupt number to the CPU interface's End of Interrupt Register (inactive or pending).


2. Interrupt domains

Interrupt controllers can be cascaded. To map each controller's local hardware interrupt numbers to globally unique Linux interrupt numbers (also called virtual interrupt numbers), the kernel defines the interrupt domain, irq_domain; every interrupt controller has its own interrupt domain.

2.1 Creating an interrupt domain

The interrupt controller driver creates and registers an interrupt domain with one of the irq_domain_add_*() allocation functions. Each mapping method has its own allocation function, the caller must pass an irq_domain_ops structure, and on success the allocation function returns a pointer to the irq_domain. In essence, the irq_domain maps hardware interrupt numbers to globally unique Linux interrupt numbers; the different methods simply store this mapping in the structure in different ways.

//provided by the driver; called when creating new mappings or tearing down old ones
struct irq_domain_ops {
        //match an interrupt controller device node against this domain
	int (*match)(struct irq_domain *d, struct device_node *node,
		     enum irq_domain_bus_token bus_token);
	int (*select)(struct irq_domain *d, struct irq_fwspec *fwspec,
		      enum irq_domain_bus_token bus_token);
        //create or update a mapping between a virtual irq and a hardware irq
	int (*map)(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw);
        //remove the mapping
	void (*unmap)(struct irq_domain *d, unsigned int virq);
        //given a device tree node and an interrupt specifier, decode the hardware irq number and the Linux irq type
	int (*xlate)(struct irq_domain *d, struct device_node *node,
		     const u32 *intspec, unsigned int intsize,
		     unsigned long *out_hwirq, unsigned int *out_type);
};
//interrupt domain: translates hardware interrupt numbers
struct irq_domain {
        //element in the global irq_domain_list
	struct list_head link;
        //name of the interrupt domain
	const char *name;
        //pointer to the irq_domain_ops callbacks
	const struct irq_domain_ops *ops;
        //private data
	void *host_data;
	unsigned int flags;

	/* Optional data */
        //firmware/device tree node associated with this irq_domain
	struct fwnode_handle *fwnode;
	enum irq_domain_bus_token bus_token;
        //pointer to a list of generic chips, used by interrupt controller drivers that set up several generic chips via the generic-chip library
	struct irq_domain_chip_generic *gc;
#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
	struct irq_domain *parent;
#endif
        //maximum hardware interrupt number
	irq_hw_number_t hwirq_max;
        //largest hwirq that can be set up by direct mapping (direct mapping only)
	unsigned int revmap_direct_max_irq;
        //size of the linear map linear_revmap (linear mapping only)
	unsigned int revmap_size;
        //radix tree for hardware interrupt numbers that do not fit in the linear map
	struct radix_tree_root revmap_tree;
        //linear map from hardware interrupt number to virtual interrupt number
	unsigned int linear_revmap[];
};
2.1.1 Linear map

The linear map maintains a fixed-size table indexed by the hardware interrupt number. If the maximum number of hardware interrupts is fixed and relatively small, the linear map is a good choice.

static inline struct irq_domain *irq_domain_add_linear(struct device_node *of_node,
					 unsigned int size,
					 const struct irq_domain_ops *ops,
					 void *host_data)
{
	return __irq_domain_add(of_node_to_fwnode(of_node), size, size, 0, ops, host_data);
}
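As a hedged illustration of how these pieces fit together, here is a minimal sketch of a hypothetical controller driver creating a linear domain; foo_irq_chip, the hwirq count of 64 and the foo_* callbacks are assumptions for illustration, not taken from the kernel source:

#include <linux/irq.h>
#include <linux/irqdomain.h>

/* hypothetical irq_chip; a real driver also fills in irq_mask/irq_unmask etc. */
static struct irq_chip foo_irq_chip = {
	.name = "foo",
};

static int foo_irq_map(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw)
{
	/* bind the Linux irq to our irq_chip and choose a flow handler */
	irq_set_chip_and_handler(virq, &foo_irq_chip, handle_level_irq);
	irq_set_chip_data(virq, d->host_data);
	return 0;
}

static const struct irq_domain_ops foo_irq_domain_ops = {
	.map   = foo_irq_map,
	.xlate = irq_domain_xlate_onetwocell,	/* standard device tree specifier decoder */
};

/* called from the driver's init/probe with the controller's device tree node */
static struct irq_domain *foo_create_domain(struct device_node *node, void *priv)
{
	/* 64 hardware interrupts -> linear map with 64 entries */
	return irq_domain_add_linear(node, 64, &foo_irq_domain_ops, priv);
}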
2.1.2 Tree map

The tree map stores the mapping from hardware interrupt numbers to Linux interrupt numbers in a radix tree. If the hardware interrupt numbers can be very large, the tree map is a good choice, because it avoids allocating a huge table sized by the largest possible hardware interrupt number.

static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node,
					 const struct irq_domain_ops *ops,
					 void *host_data)
{
	return __irq_domain_add(of_node_to_fwnode(of_node), 0, ~0, 0, ops, host_data);
}
2.1.3 No map

Some interrupt controllers are powerful enough that the hardware interrupt number is configurable, so the Linux interrupt number can be written directly into the hardware.

static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_node,
					 unsigned int max_irq,
					 const struct irq_domain_ops *ops,
					 void *host_data)
{
	return __irq_domain_add(of_node_to_fwnode(of_node), 0, max_irq, max_irq, ops, host_data);
}

The allocation functions delegate the work to __irq_domain_add(), which allocates an irq_domain structure, initializes its members (the different parameters select the different mapping methods), and then adds the domain to the global list irq_domain_list.

/**
 * __irq_domain_add() - Allocate a new irq_domain
 * @fwnode: device node of the interrupt controller
 * @size: size of the linear map, 0 when a radix tree is used
 * @hwirq_max: maximum number of interrupts supported by the controller
 * @direct_max: maximum hwirq for direct mapping, ~0 for no limit, 0 when direct mapping is not used
 * @ops: irq_domain_ops callbacks for this domain
 * @host_data: controller-private data pointer
 *
 * Allocates and initializes an irq_domain structure.
 * Returns pointer to IRQ domain, or NULL on failure.
 */
struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
				    irq_hw_number_t hwirq_max, int direct_max,
				    const struct irq_domain_ops *ops,
				    void *host_data)
{
    struct device_node *of_node = to_of_node(fwnode);
    struct irq_domain *domain;

    domain = kzalloc_node(sizeof(*domain) + (sizeof(unsigned int) * size),
                          GFP_KERNEL, of_node_to_nid(of_node));
    if (WARN_ON(!domain))
            return NULL;

    of_node_get(of_node);

    /* Fill structure */
    INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
    domain->ops = ops;
    domain->host_data = host_data;
    domain->fwnode = fwnode;
    domain->hwirq_max = hwirq_max;
    domain->revmap_size = size;
    domain->revmap_direct_max_irq = direct_max;
    irq_domain_check_hierarchy(domain);

    mutex_lock(&irq_domain_mutex);
    list_add(&domain->link, &irq_domain_list);
    mutex_unlock(&irq_domain_mutex);

    pr_debug("Added domain %s\n", domain->name);
    return domain;
}

2.2 Creating a mapping

After the interrupt domain has been created, mappings from hardware interrupt numbers to Linux interrupt numbers need to be added to it:

//takes the interrupt domain and a hardware interrupt number, returns the Linux interrupt number
unsigned int irq_create_mapping(struct irq_domain *domain, irq_hw_number_t hwirq)
{
    struct device_node *of_node;
    int virq;

    pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);
    /* Look for default domain if nececssary */
    if (domain == NULL)
            domain = irq_default_domain;
    if (domain == NULL) {
            WARN(1, "%s(, %lx) called with NULL domain\n", __func__, hwirq);
            return 0;
    }
    pr_debug("-> using domain @%p\n", domain);
    of_node = irq_domain_get_of_node(domain);
    /* Check if mapping already exists */
    virq = irq_find_mapping(domain, hwirq);
    if (virq) {
            pr_debug("-> existing mapping on virq %d\n", virq);
            return virq;
    }
    //1. allocate a virtual (Linux) interrupt number via irq_domain_alloc_descs()
    virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
    if (virq <= 0) {
            pr_debug("-> virq allocation failed\n");
            return 0;
    }
    //2. add the hwirq-to-Linux-irq mapping to the interrupt domain
    if (irq_domain_associate(domain, virq, hwirq)) {
            irq_free_desc(virq);
            return 0;
    }
    pr_debug("irq %lu on domain %s mapped to virtual irq %u\n",
            hwirq, of_node_full_name(of_node), virq);
    return virq;
}
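A hedged usage sketch (the domain pointer and hardware interrupt number 37 are assumptions): once the domain exists, a driver can turn a hardware interrupt number into a Linux interrupt number and later look it up again:

#include <linux/irqdomain.h>

/* sketch only: 'domain' comes from one of the irq_domain_add_*() calls above */
static unsigned int foo_map_hwirq(struct irq_domain *domain)
{
	unsigned int virq;

	/* allocates an irq_desc and calls domain->ops->map(); returns 0 on failure */
	virq = irq_create_mapping(domain, 37);
	if (!virq)
		return 0;

	/* later lookups of the same hwirq return the same Linux irq number */
	WARN_ON(irq_find_mapping(domain, 37) != virq);
	return virq;
}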

2.3 Looking up a mapping

//takes the interrupt domain and a hardware interrupt number, returns the Linux interrupt number
unsigned int irq_find_mapping(struct irq_domain *domain,
			      irq_hw_number_t hwirq)
{
	struct irq_data *data;

	/* Look for default domain if nececssary */
	if (domain == NULL)
		domain = irq_default_domain;
	if (domain == NULL)
		return 0;
        //if below the direct-mapping maximum (0 when direct mapping is not used), the hwirq itself is the Linux irq
	if (hwirq < domain->revmap_direct_max_irq) {
		data = irq_domain_get_irq_data(domain, hwirq);
		if (data && data->hwirq == hwirq)
			return hwirq;
	}

	//if below the linear map size (0 when a radix tree is used), look it up in the linear map
	if (hwirq < domain->revmap_size)
		return domain->linear_revmap[hwirq];
        //otherwise look it up in the radix tree
	rcu_read_lock();
	data = radix_tree_lookup(&domain->revmap_tree, hwirq);
	rcu_read_unlock();
	return data ? data->irq : 0;
}

3. Interrupt controller driver initialization

ARM64 uses a device tree (DTS) to describe the board's hardware; information about the interrupt controller and about peripheral interrupts is recorded in the DTS. By parsing the DTS the kernel finds the matching interrupt controller driver and runs its initialization function. For the GIC v2 controller the initialization function is gic_of_init().

//device names supported by the driver (matched against the "compatible" property in the DTS) and the corresponding init function
IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);
IRQCHIP_DECLARE(arm11mp_gic, "arm,arm11mp-gic", gic_of_init);
IRQCHIP_DECLARE(arm1176jzf_dc_gic, "arm,arm1176jzf-devchip-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init);
IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
IRQCHIP_DECLARE(pl390, "arm,pl390", gic_of_init);
int __init
gic_of_init(struct device_node *node, struct device_node *parent)
{
	struct gic_chip_data *gic;
	int irq, ret;

	if (WARN_ON(!node))
		return -ENODEV;
	if (WARN_ON(gic_cnt >= CONFIG_ARM_GIC_MAX_NR))
		return -EINVAL;
        //take a free element from the global array to hold this interrupt controller's data
	gic = &gic_data[gic_cnt];
        //read the controller's "reg" property from the device tree, get the physical address ranges of the distributor and CPU interface registers, and map them into the kernel's virtual address space
	ret = gic_of_setup(gic, node);
	if (ret)
		return ret;
	/*
	 * Disable split EOI/Deactivate if either HYP is not available
	 * or the CPU interface is too small.
	 */
	if (gic_cnt == 0 && !gic_check_eoimode(node, &gic->raw_cpu_base))
		static_key_slow_dec(&supports_deactivate);
        //initialize the gic_chip_data structure
	ret = __gic_init_bases(gic, -1, &node->fwnode);
	if (ret) {
		gic_teardown(gic);
		return ret;
	}
        //this controller is the root controller
	if (!gic_cnt) {
		gic_init_physaddr(node);
		gic_of_setup_kvm_info(node);
	}
        //if this controller has a parent, i.e. it is itself connected as an interrupt source to another interrupt controller
	if (parent) {
                //get the hardware interrupt number from this node's "interrupts" property and map it to a Linux interrupt number
		irq = irq_of_parse_and_map(node, 0);
                //set the handle_irq() member of that Linux irq's descriptor to gic_handle_cascade_irq()
		gic_cascade_irq(gic_cnt, irq);
	}
	if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
		gicv2m_init(&node->fwnode, gic_data[gic_cnt].domain);

	gic_cnt++;
	return 0;
}
static int __init __gic_init_bases(struct gic_chip_data *gic,
				   int irq_start,
				   struct fwnode_handle *handle)
{
	char *name;
	int i, ret;

	if (WARN_ON(!gic || gic->domain))
		return -EINVAL;
        //if this controller is the root controller
	if (gic == &gic_data[0]) {
		/*
		 * Initialize the CPU interface map to all CPUs.
		 * It will be refined as each CPU probes its ID.
		 * This is only necessary for the primary GIC.
		 */
		for (i = 0; i < NR_GIC_CPU_IF; i++)
			gic_cpu_map[i] = 0xff;
#ifdef CONFIG_SMP
                //set the global function pointer __smp_cross_call to gic_raise_softirq(), which sends software generated interrupts, i.e. lets one processor interrupt the others
		set_smp_cross_call(gic_raise_softirq);
#endif
		cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
					  "irqchip/arm/gic:starting",
					  gic_starting_cpu, NULL);
                //set the global function pointer handle_arch_irq to gic_handle_irq, the C entry point for interrupt handling
		set_handle_irq(gic_handle_irq);
		if (static_key_true(&supports_deactivate))
			pr_info("GIC: Using split EOI/Deactivate mode\n");
	}
        //initialize the irq_chip descriptor for this controller
	if (static_key_true(&supports_deactivate) && gic == &gic_data[0]) {
		name = kasprintf(GFP_KERNEL, "GICv2");
		gic_init_chip(gic, NULL, name, true);
	} else {
		name = kasprintf(GFP_KERNEL, "GIC-%d", (int)(gic-&gic_data[0]));
		gic_init_chip(gic, NULL, name, false);
	}
        //allocate the interrupt domain for this controller and initialize the distributor and CPU interface registers
	ret = gic_init_bases(gic, irq_start, handle);
	if (ret)
		kfree(name);

	return ret;
}

void __init set_smp_cross_call(void (*fn)(const struct cpumask *, unsigned int))
{
	__smp_cross_call = fn;
}
void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
	if (handle_arch_irq)
		return;
	handle_arch_irq = handle_irq;
}

4. Interrupt handling

For every interrupt source of the interrupt controller, the kernel allocates a Linux interrupt number and an interrupt descriptor irq_desc. The descriptor has two levels of interrupt handling functions:

  1. The descriptor's handle_irq() member.
  2. The handler functions registered by device drivers; the descriptor holds a list of interrupt handler descriptors (irqaction), so several handlers can be attached.
struct irq_desc {
	irq_flow_handler_t	handle_irq;
	struct irqaction	*action;	/* IRQ action list */
	......
};

/**
 * struct irq_chip - hardware interrupt chip descriptor
 * @parent_device:	pointer to parent device for irqchip
 * @name:		name for /proc/interrupts
 * @irq_startup:	start up the interrupt (defaults to ->enable if NULL)
 * @irq_shutdown:	shut down the interrupt (defaults to ->disable if NULL)
 * @irq_enable:		enable the interrupt (defaults to chip->unmask if NULL)
 * @irq_disable:	disable the interrupt
 * @irq_ack:		start of a new interrupt
 * @irq_mask:		mask an interrupt source
 * @irq_mask_ack:	ack and mask an interrupt source
 * @irq_unmask:		unmask an interrupt source
 * @irq_eoi:		end of interrupt
 * @irq_set_affinity:	set the CPU affinity on SMP machines
 * @irq_retrigger:	resend an IRQ to the CPU
  ......
*/
struct irq_chip {
	struct device	*parent_device;
	const char	*name;
	unsigned int	(*irq_startup)(struct irq_data *data);
	void		(*irq_shutdown)(struct irq_data *data);
	void		(*irq_enable)(struct irq_data *data);
	void		(*irq_disable)(struct irq_data *data);
	void		(*irq_ack)(struct irq_data *data);
	void		(*irq_mask)(struct irq_data *data);
	void		(*irq_mask_ack)(struct irq_data *data);
	void		(*irq_unmask)(struct irq_data *data);
	void		(*irq_eoi)(struct irq_data *data);
	int		(*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
	int		(*irq_retrigger)(struct irq_data *data);
        ......
	unsigned long	flags;
};

ARM64 enables the configuration macro CONFIG_SPARSE_IRQ by default and uses a radix tree to store the relationship between Linux interrupt numbers and interrupt descriptors.

static RADIX_TREE(irq_desc_tree, GFP_KERNEL);

When a hardware interrupt number is mapped to a Linux interrupt number, the descriptor's handle_irq member is set according to the type of the hardware interrupt. Taking GIC v2 as an example:

//irq_create_mapping() -> irq_domain_associate() -> domain->ops->map() -> gic_irq_domain_map()
static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
				irq_hw_number_t hw)
{
	struct gic_chip_data *gic = d->host_data;
        //hwirq below 32 means SGI or PPI; handle_irq is set to handle_percpu_devid_irq
	if (hw < 32) {
		irq_set_percpu_devid(irq);
		irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
				    handle_percpu_devid_irq, NULL, NULL);
		irq_set_status_flags(irq, IRQ_NOAUTOEN);
	} else {
                //hwirq 32 or above means SPI; handle_irq is set to handle_fasteoi_irq
		irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
				    handle_fasteoi_irq, NULL, NULL);
		irq_set_probe(irq);
	}
	return 0;
}
/**
 *	request_irq - allocate an interrupt line
 *	@irq: Interrupt line to allocate
 *	@handler: Function to be called when the IRQ occurs.
 *		  Primary handler for threaded interrupts
 *		  If NULL and thread_fn != NULL the default
 *		  primary handler is installed
 *	@flags: Interrupt type flags
            IRQF_SHARED: allow multiple devices to share the interrupt line
            IRQF_PROBE_SHARED	0x00000100
            __IRQF_TIMER: timer interrupt
            IRQF_PERCPU: the interrupt is private to each processor
            IRQF_NOBALANCING: do not balance this interrupt across processors
            IRQF_NO_THREAD: the interrupt must not be threaded
 *	@devname: An ascii name for the claiming device
 *	@dev: A cookie passed back to the handler function
*/
static inline int __must_check
request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
	    const char *name, void *dev)
{
	return request_threaded_irq(irq, handler, NULL, flags, name, dev);
}
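A hedged sketch of how a device driver typically calls request_irq(); the foo_dev structure and the handler are assumptions for illustration:

#include <linux/interrupt.h>

struct foo_dev {			/* hypothetical per-device context */
	void __iomem *regs;
};

static irqreturn_t foo_isr(int irq, void *dev_id)
{
	struct foo_dev *foo = dev_id;

	/* read and clear the device's interrupt status via foo->regs here */
	(void)foo;
	return IRQ_HANDLED;		/* IRQ_NONE if this device did not raise the interrupt */
}

static int foo_setup_irq(unsigned int irq, struct foo_dev *foo)
{
	/* add IRQF_SHARED if the interrupt line can be shared with other devices */
	return request_irq(irq, foo_isr, 0, "foo", foo);
}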

Interrupt handling flow (taking an interrupt that arrives while a user-mode process is running as the example):

el0_irq:
        //save the user-mode process's registers on the kernel stack
	kernel_entry 0
el0_irq_naked:
	enable_dbg
        ...
	ct_user_exit
        //interrupt handling
	irq_handler
        ...
        //restore the registers from the values saved on the kernel stack and return to user space
	b	ret_to_user
ENDPROC(el0_irq)
    .macro	irq_handler
    //handle_arch_irq was set to gic_handle_irq during GIC v2 initialization
    ldr_l	x1, handle_arch_irq
    mov	x0, sp
    //switch from the kernel stack to the irq stack; each processor has its own dedicated irq stack
    irq_stack_entry
    blr	x1
    //switch back from the irq stack to the kernel stack
    irq_stack_exit
    .endm
#define THREAD_SIZE		16384 //16k
#define THREAD_START_SP		(THREAD_SIZE - 16)
#define IRQ_STACK_SIZE			THREAD_SIZE
#define IRQ_STACK_START_SP		THREAD_START_SP
    .macro	irq_stack_entry
    //save the current kernel stack pointer (SP_EL1) in x19
    mov	x19, sp			
    /*
     * Compare sp with the base of the task stack.
     * If the top ~(THREAD_SIZE - 1) bits match, we are on a task stack,
     * and should switch to the irq stack.
     */
    ldr	x25, [tsk, TSK_STACK]
    eor	x25, x25, x19
    and	x25, x25, #~(THREAD_SIZE - 1)
    cbnz	x25, 9998f
    //load this CPU's irq_stack base address into x25
    adr_this_cpu x25, irq_stack, x26
    //IRQ_STACK_START_SP is the initial stack pointer offset for the irq stack
    mov	x26, #IRQ_STACK_START_SP
    add	x26, x25, x26
    //write the irq stack address to sp
    mov	sp, x26
    /*
     * Add a dummy stack frame, this non-standard format is fixed up
     * by unwind_frame()
     */
    stp     x29, x19, [sp, #-16]!
    mov	x29, sp
9998:
    .endm

The GIC v2 controller's gic_handle_irq() looks like this:

static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
	u32 irqstat, irqnr;
	struct gic_chip_data *gic = &gic_data[0];
	void __iomem *cpu_base = gic_data_cpu_base(gic);

	do {
                //read the CPU interface's Interrupt Acknowledge Register to get the interrupt number (pending -> active)
		irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);
		irqnr = irqstat & GICC_IAR_INT_ID_MASK;
                //PPI (private peripheral interrupt) or SPI (shared peripheral interrupt)
		if (likely(irqnr > 15 && irqnr < 1020)) {
                        //if supports_deactivate is set, write the value to the CPU interface's End of Interrupt register to signal completion (active -> inactive)
			if (static_key_true(&supports_deactivate))
				writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
                        //interrupt handling
			handle_domain_irq(gic->domain, irqnr, regs);
			continue;
		}
                //SGI (software generated interrupt)
		if (irqnr < 16) {
                        //write the interrupt number to the CPU interface's End of Interrupt register to signal completion
			writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
			if (static_key_true(&supports_deactivate))
				writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE);
#ifdef CONFIG_SMP
			//make sure any shared data written by the CPU that sent the IPI is read only after reading the GIC's ACK register
			smp_rmb();
			handle_IPI(irqnr, regs);
#endif
			continue;
		}
		break;
	} while (1);
}

4.1 Private (PPI) and shared (SPI) peripheral interrupts

int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
			bool lookup, struct pt_regs *regs)
{
	struct pt_regs *old_regs = set_irq_regs(regs);
	unsigned int irq = hwirq;
	int ret = 0;

	irq_enter();

#ifdef CONFIG_IRQ_DOMAIN
	if (lookup)
		irq = irq_find_mapping(domain, hwirq);
#endif

	/*
	 * Some hardware gives randomly wrong interrupts.  Rather
	 * than crashing, do something sensible.
	 */
	if (unlikely(!irq || irq >= nr_irqs)) {
		ack_bad_irq(irq);
		ret = -EINVAL;
	} else {
		generic_handle_irq(irq);
	}

	irq_exit();
	set_irq_regs(old_regs);
	return ret;
}
//enter interrupt context
void irq_enter(void)
{
	rcu_irq_enter();
        //the interrupt arrived while the idle task was running
	if (is_idle_task(current) && !in_interrupt()) {
		//prevent raise_softirq from needlessly waking ksoftirqd here, because softirqs will be serviced on interrupt return
		local_bh_disable();
                //Called from irq_enter to notify about the possible interruption of idle()
		tick_irq_enter();
		_local_bh_enable();
	}

	__irq_enter();
}
#define __irq_enter()					\
	do {						\
		account_irq_enter_time(current);	\
                //add HARDIRQ_OFFSET to current_thread_info()->preempt_count
		preempt_count_add(HARDIRQ_OFFSET);	\
		trace_hardirq_enter();			\
	} while (0)
/**
 * generic_handle_irq - Invoke the handler for a particular irq
 * @irq:	The irq number to handle
 *
 */
int generic_handle_irq(unsigned int irq)
{
        //get the interrupt descriptor
	struct irq_desc *desc = irq_to_desc(irq);

	if (!desc)
		return -EINVAL;
	generic_handle_irq_desc(desc);
	return 0;
}
static inline void generic_handle_irq_desc(struct irq_desc *desc)
{
	desc->handle_irq(desc);
}

As noted above, when the hardware interrupt number is mapped to a Linux interrupt number, the descriptor's handle_irq member is set according to the interrupt type. For GIC v2:

//irq_create_mapping() -> irq_domain_associate() -> domain->ops->map() -> gic_irq_domain_map()

  1. A hardware interrupt number below 32 means SGI or PPI; handle_irq is set to handle_percpu_devid_irq.
  2. A hardware interrupt number of 32 or above means SPI; handle_irq is set to handle_fasteoi_irq.
/**
 * handle_percpu_devid_irq - Per CPU local irq handler with per cpu dev ids
 * @desc:	the interrupt description structure for this irq
 */
void handle_percpu_devid_irq(struct irq_desc *desc)
{
	struct irq_chip *chip = irq_desc_get_chip(desc);
	struct irqaction *action = desc->action;
	unsigned int irq = irq_desc_get_irq(desc);
	irqreturn_t res;

	kstat_incr_irqs_this_cpu(desc);

	if (chip->irq_ack)
		chip->irq_ack(&desc->irq_data);
	if (likely(action)) {
		trace_irq_handler_entry(irq, action);
                //call the irqaction's handler
		res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
		trace_irq_handler_exit(irq, action, res);
	} else {
		unsigned int cpu = smp_processor_id();
		bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
		if (enabled)
			irq_percpu_disable(desc, cpu);
		pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",
			    enabled ? " and unmasked" : "", irq, cpu);
	}
	if (chip->irq_eoi)
		chip->irq_eoi(&desc->irq_data);
}

For a shared peripheral interrupt, the descriptor's handle_irq() member is handle_fasteoi_irq():

void handle_fasteoi_irq(struct irq_desc *desc)
{
	struct irq_chip *chip = desc->irq_data.chip;

	raw_spin_lock(&desc->lock);
	if (!irq_may_run(desc))
		goto out;
	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
	/*
	 * If its disabled or no action available
	 * then mask it and get out of here:
	 */
	if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) {
		desc->istate |= IRQS_PENDING;
		mask_irq(desc);
		goto out;
	}

	kstat_incr_irqs_this_cpu(desc);
	if (desc->istate & IRQS_ONESHOT)
		mask_irq(desc);
	preflow_handler(desc);
        //run the handlers registered by device drivers
	handle_irq_event(desc);
	cond_unmask_eoi_irq(desc, chip);

	raw_spin_unlock(&desc->lock);
	return;
out:
	if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED))
		chip->irq_eoi(&desc->irq_data);
	raw_spin_unlock(&desc->lock);
}

handle_irq_event() mostly delegates the work to handle_irq_event_percpu():

irqreturn_t handle_irq_event(struct irq_desc *desc)
{
	irqreturn_t ret;

	desc->istate &= ~IRQS_PENDING;
	irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
	raw_spin_unlock(&desc->lock);

	ret = handle_irq_event_percpu(desc);

	raw_spin_lock(&desc->lock);
	irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
	return ret;
}

irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
{
	irqreturn_t retval;
	unsigned int flags = 0;

	retval = __handle_irq_event_percpu(desc, &flags);

	add_interrupt_randomness(desc->irq_data.irq, flags);

	if (!noirqdebug)
		note_interrupt(desc, retval);
	return retval;
}

__handle_irq_event_percpu() walks the descriptor's list of interrupt handler descriptors and runs each handler.

irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
	irqreturn_t retval = IRQ_NONE;
	unsigned int irq = desc->irq_data.irq;
	struct irqaction *action;
	for_each_action_of_desc(desc, action) {
		irqreturn_t res;
		trace_irq_handler_entry(irq, action);
                //the driver's interrupt handler
		res = action->handler(irq, action->dev_id);
		trace_irq_handler_exit(irq, action, res);
		if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pF enabled interrupts\n",
			      irq, action->handler))
			local_irq_disable();

		switch (res) {
		case IRQ_WAKE_THREAD:
                        ......
                        //wake the interrupt thread
			__irq_wake_thread(desc, action);
			//fall through: action->flags is also used as a factor for the entropy pool
		case IRQ_HANDLED:
			*flags |= action->flags;
			break;
		default:
			break;
		}
		retval |= res;
	}
	return retval;
}

The interrupt exit function:

/*
 * Exit an interrupt context. Process softirqs if needed and possible:
 */
void irq_exit(void)
{
#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
	local_irq_disable();
#else
	WARN_ON_ONCE(!irqs_disabled());
#endif

	account_irq_exit_time(current);
	preempt_count_sub(HARDIRQ_OFFSET);
        //check whether any softirq is pending
	if (!in_interrupt() && local_softirq_pending())
                //run the softirqs
		invoke_softirq();
	tick_irq_exit();
	rcu_irq_exit();
	trace_hardirq_exit(); /* must be last! */
}

4.2 Inter-processor interrupts (IPI)

The functions that commonly use inter-processor interrupts are listed below (a usage sketch follows the list):

  1. Run a function on all other processors
/**
 * smp_call_function(): Run a function on all other CPUs.
 * @func: the function the target processors will run in their interrupt handlers
 * @info: the argument passed to func
 * @wait: whether to wait for the target processors to finish running the function
 * You must not call this function with disabled interrupts or from a
 * hardware interrupt handler or from a bottom half handler.
 */
int smp_call_function(smp_call_func_t func, void *info, int wait)
{
	preempt_disable();
	smp_call_function_many(cpu_online_mask, func, info, wait);
	preempt_enable();
	return 0;
}
  2. Run a function on a specified processor
int smp_call_function_single(int cpu, smp_call_func_t func, void *info, int wait)
  3. Ask a specified processor to reschedule
void smp_send_reschedule(int cpu)
{
	smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
}
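A hedged sketch of the typical call pattern (the target CPU, do_flush() and its argument are assumptions for illustration):

#include <linux/printk.h>
#include <linux/smp.h>

/* runs on the target CPU, in the IPI (hard interrupt) context */
static void do_flush(void *info)
{
	pr_info("flush on CPU%d, arg=%p\n", smp_processor_id(), info);
}

static void foo_remote_flush(void *arg)
{
	/* wait == 1: block until CPU 1 has finished running do_flush() */
	smp_call_function_single(1, do_flush, arg, 1);
}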

On arm64 with a GIC, an inter-processor interrupt is generated by writing the distributor's GICD_SGIR register (Software Generated Interrupt Register). The function handle_IPI() handles inter-processor interrupts:

void handle_IPI(int ipinr, struct pt_regs *regs)
{
        //get the id of the CPU running this code
	unsigned int cpu = smp_processor_id();
        //pt_regs holds the interrupted context's register values
	struct pt_regs *old_regs = set_irq_regs(regs);

	if ((unsigned)ipinr < NR_IPI) {
                //ftrace records entry into the IPI, for debugging
		trace_ipi_entry_rcuidle(ipi_types[ipinr]);
                //count this CPU's IPIs; the IPI lines in cat /proc/interrupts come from these counters
		__inc_irq_stat(cpu, ipi_irqs[ipinr]);
	}
        //dispatch on the IPI type
	switch (ipinr) {
        //trigger rescheduling
	case IPI_RESCHEDULE:
		scheduler_ipi();
		break;
        //run the function callbacks queued for this CPU
	case IPI_CALL_FUNC:
		irq_enter();
		generic_smp_call_function_interrupt();
		irq_exit();
		break;
        //stop this CPU and put it into a low-power state
	case IPI_CPU_STOP:
		irq_enter();
		ipi_cpu_stop(cpu);
		irq_exit();
		break;
        //only does anything when KEXEC is configured, i.e. when the system enters a second (crash) kernel on panic
	case IPI_CPU_CRASH_STOP:
		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
			irq_enter();
			ipi_cpu_crash_stop(cpu, regs);
			unreachable();
		}
		break;
        //receive the timer broadcast and run the timer's interrupt callback
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
	case IPI_TIMER:
		irq_enter();
		tick_receive_broadcast();
		irq_exit();
		break;
#endif
        //run irq_work on this CPU
#ifdef CONFIG_IRQ_WORK
	case IPI_IRQ_WORK:
		irq_enter();
		irq_work_run();
		irq_exit();
		break;
#endif
        //wake this CPU from the parked state
#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
	case IPI_WAKEUP:
		WARN_ONCE(!acpi_parking_protocol_valid(cpu),
			  "CPU%u: Wake-up IPI outside the ACPI parking protocol\n",
			  cpu);
		break;
#endif
	default:
		pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
		break;
	}
	if ((unsigned)ipinr < NR_IPI)
		trace_ipi_exit_rcuidle(ipi_types[ipinr]);
	set_irq_regs(old_regs);
}
4.2.1 IPI_RESCHEDULE

The hardware interrupt number is 0, and the interrupt is generated by smp_send_reschedule(). A typical scenario: the TIF_NEED_RESCHED flag is set for a task (meaning a reschedule is needed); if that task is not on the current CPU, smp_send_reschedule() sends an IPI_RESCHEDULE interrupt to the corresponding CPU, which eventually reaches scheduler_ipi() inside handle_IPI() and tries to wake the pending task.

void scheduler_ipi(void)
{
	/*
	 * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
	 * TIF_NEED_RESCHED remotely (for the first time) will also send
	 * this IPI.
	 */
	preempt_fold_need_resched();

	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
		return;
	/*
	 * Not all reschedule IPI handlers call irq_enter/irq_exit, since
	 * traditionally all their work was done from the interrupt return
	 * path. Now that we actually do some work, we need to make sure
	 * we do call them.
	 *
	 * Some archs already do call them, luckily irq_enter/exit nest
	 * properly.
	 *
	 * Arguably we should visit all archs and update all handlers,
	 * however a fair share of IPIs are still resched only so this would
	 * somewhat pessimize the simple resched case.
	 */
	irq_enter();
        //try to wake up the pending tasks
	sched_ttwu_pending();
	/*
	 * Check if someone kicked us for doing the nohz idle load balance.
	 */
	if (unlikely(got_nohz_idle_kick())) {
		this_rq()->idle_balance = 1;
		raise_softirq_irqoff(SCHED_SOFTIRQ);
	}
	irq_exit();
}
4.2.2 IPI_CALL_FUNC

The hardware interrupt number is 1; it runs a function and is generated by smp_call_function(). The target CPU goes through handle_IPI() -> generic_smp_call_function_interrupt() -> generic_smp_call_function_single_interrupt() -> flush_smp_call_function_queue(), which runs every pending function callback once.

static void flush_smp_call_function_queue(bool warn_cpu_offline)
{
	struct llist_head *head;
	struct llist_node *entry;
	struct call_single_data *csd, *csd_next;
	static bool warned;

	WARN_ON(!irqs_disabled());
        //get this CPU's global call_single_queue list; when another CPU wants this CPU to run a function,
        //it adds a call_single_data structure, which carries the func callback to run,
        //to the target CPU's call_single_queue via generic_exec_single()
	head = this_cpu_ptr(&call_single_queue);
	entry = llist_del_all(head);
	entry = llist_reverse_order(entry);

	/* There shouldn't be any pending callbacks on an offline CPU. */
	if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
		     !warned && !llist_empty(head))) {
		warned = true;
		WARN(1, "IPI on offline CPU %d\n", smp_processor_id());

		/*
		 * We don't have to use the _safe() variant here
		 * because we are not invoking the IPI handlers yet.
		 */
		llist_for_each_entry(csd, entry, llist)
			pr_warn("IPI callback %pS sent to offline CPU\n",
				csd->func);
	}
        //walk the call_single_queue list and run each entry's func once
	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
		smp_call_func_t func = csd->func;
		void *info = csd->info;

		//the order of the call and the csd unlock depends on whether the callback was queued synchronously
		if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
                        //synchronous: run the callback first, then release the csd lock
			func(info);
			csd_unlock(csd);
		} else {
                        //asynchronous: release the csd lock first, then run the callback
			csd_unlock(csd);
			func(info);
		}
	}

	/*
	 * Handle irq works queued remotely by irq_work_queue_on().
	 * Smp functions above are typically synchronous so they
	 * better run first since some other CPUs may be busy waiting
	 * for them.
	 */
	irq_work_run();
}
4.2.3 IPI_CPU_STOP

The hardware interrupt number is 2; it stops a processor and is generated by smp_send_stop().

static void ipi_cpu_stop(unsigned int cpu)
{
	set_cpu_online(cpu, false);
	local_irq_disable();
	while (1)
		cpu_relax();
}
static inline void cpu_relax(void)
{
	asm volatile("yield" ::: "memory");
}
4.2.4 IPI_CPU_CRASH_STOP

The hardware interrupt number is 3; it stops a processor and is generated by smp_send_crash_stop(). When the system crashes, the crashing CPU sends this interrupt to the other CPUs. On a system with KEXEC enabled, its main job is to save the register state and hand it over to the second (crash) kernel. KEXEC is commonly used to enter a second kernel quickly after a crash without rebooting; the second kernel's purpose is to save the current DDR memory image so the crash can later be analysed with a crash tool.

static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
{
#ifdef CONFIG_KEXEC_CORE
        //save this CPU's register state to help later debugging
	crash_save_cpu(regs, cpu);
	atomic_dec(&waiting_for_crash_ipi);
	local_irq_disable();
#ifdef CONFIG_HOTPLUG_CPU
	if (cpu_ops[cpu]->cpu_die)
		cpu_ops[cpu]->cpu_die(cpu);
#endif
	//use the wfe and wfi instructions to put this CPU into a low-power standby state
	cpu_park_loop();
#endif
}
4.2.5 IPI_TIMER

The hardware interrupt number is 4. When a CPU calls tick_broadcast(const struct cpumask *mask), an IPI_TIMER interrupt is sent as a timer broadcast to the CPUs selected by mask.

int tick_receive_broadcast(void)
{
	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
        //this CPU's tick_device clock_event_device; there is exactly one per CPU
	struct clock_event_device *evt = td->evtdev;

	if (!evt)
		return -ENODEV;
	if (!evt->event_handler)
		return -EINVAL;
        //run the tick clock event handler
	evt->event_handler(evt);
	return 0;
}
4.2.6 IPI_IRQ_WORK

The hardware interrupt number is 5; it runs callbacks in hard interrupt context and is generated by irq_work_queue().

//run every irq_work queued on this CPU's raised_list and lazy_list once
void irq_work_run(void)
{
	irq_work_run_list(this_cpu_ptr(&raised_list));
	irq_work_run_list(this_cpu_ptr(&lazy_list));
}
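A hedged sketch of how a driver might use irq_work to get a callback run in hard interrupt context; foo_work and foo_work_fn are assumed names for illustration:

#include <linux/irq_work.h>

static void foo_work_fn(struct irq_work *work)
{
	/* runs from the IPI_IRQ_WORK handler, i.e. in hard interrupt context */
}

static struct irq_work foo_work;

static void foo_init_work(void)
{
	init_irq_work(&foo_work, foo_work_fn);
}

static void foo_poke(void)
{
	/* queue on the local CPU; an IPI_IRQ_WORK is raised when needed */
	irq_work_queue(&foo_work);
}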
4.2.7 IPI_WAKEUP

The hardware interrupt number is 6. When a CPU receives this IPI it wakes up from the parked state (the wfi/wfe low-power state); the interrupt is generated by acpi_parking_protocol_cpu_boot().

bool acpi_parking_protocol_valid(int cpu)
{
	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
	return cpu_entry->mailbox_addr && cpu_entry->version;
}

4.3 Interrupt threading

4.3.1 Requesting a threaded interrupt

Threading an interrupt means handling it in a kernel thread. The goal is to shorten the time the system spends with interrupts disabled and to improve real-time behaviour. The kernel function request_threaded_irq() registers a threaded interrupt, where thread_fn is the thread handler (a usage sketch follows the prototype below).

int request_threaded_irq(unsigned int irq, irq_handler_t handler,
			 irq_handler_t thread_fn, unsigned long irqflags,
			 const char *devname, void *dev_id)
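A hedged sketch of registering a threaded interrupt (names are assumptions): the primary handler runs in hard interrupt context, quiets the device and returns IRQ_WAKE_THREAD; the thread function then does the heavy work in the "irq/<nr>-foo" kernel thread and is allowed to sleep:

#include <linux/interrupt.h>

static irqreturn_t foo_hardirq(int irq, void *dev_id)
{
	/* hard interrupt context: silence the device, defer the real work */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t foo_thread_fn(int irq, void *dev_id)
{
	/* runs in the SCHED_FIFO interrupt thread and may sleep */
	return IRQ_HANDLED;
}

static int foo_request(unsigned int irq, void *dev)
{
	return request_threaded_irq(irq, foo_hardirq, foo_thread_fn,
				    IRQF_ONESHOT, "foo", dev);
}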

A few interrupts must not be threaded; the classic example is the timer interrupt. Some ill-behaved tasks never voluntarily give up the CPU, and the kernel relies on the periodic timer interrupt to take back control of the processor: the timer interrupt is the scheduler's heartbeat. Interrupts that must not be threaded have to be registered with the IRQF_NO_THREAD flag.

If the forced-threading configuration macro CONFIG_IRQ_FORCED_THREADING is enabled and the kernel is booted with the "threadirqs" parameter, every interrupt except those marked IRQF_NO_THREAD is forcibly threaded. ARM64 enables CONFIG_IRQ_FORCED_THREADING by default.

Each interrupt handler descriptor (irqaction) corresponds to one kernel thread: the thread member points to the thread's task_struct and thread_fn points to the thread handler. As the code below shows, the interrupt handling thread is a real-time kernel thread with priority 50 and the SCHED_FIFO scheduling policy, named "irq/" followed by the Linux interrupt number, and its thread function is irq_thread().

request_threaded_irq() ----> __setup_irq() ----> irq_setup_forced_threading() and setup_irq_thread()

static int irq_setup_forced_threading(struct irqaction *new)
{
        //with CONFIG_IRQ_FORCED_THREADING and CONFIG_PREEMPT_RT (RT patch) enabled, force_irqthreads is true
        //with CONFIG_IRQ_FORCED_THREADING but without CONFIG_PREEMPT_RT, force_irqthreads is true only if "threadirqs" is on the kernel command line
	if (!force_irqthreads)
		return 0;
	if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
		return 0;
	/*
	 * No further action required for interrupts which are requested as
	 * threaded interrupts already
	 */
	if (new->handler == irq_default_primary_handler)
		return 0;

	new->flags |= IRQF_ONESHOT;
	/*
	 * Handle the case where we have a real primary handler and a
	 * thread handler. We force thread them as well by creating a
	 * secondary action.
	 */
	if (new->handler && new->thread_fn) {
		/* Allocate the secondary action */
		new->secondary = kzalloc(sizeof(struct irqaction), GFP_KERNEL);
		if (!new->secondary)
			return -ENOMEM;
                //the secondary action's primary handler just returns IRQ_WAKE_THREAD to wake its thread
		new->secondary->handler = irq_forced_secondary_handler;
		new->secondary->thread_fn = new->thread_fn;
		new->secondary->dev_id = new->dev_id;
		new->secondary->irq = new->irq;
		new->secondary->name = new->name;
	}
	/* Deal with the primary handler */
        //set the IRQTF_FORCED_THREAD flag
	set_bit(IRQTF_FORCED_THREAD, &new->thread_flags);
        //the original handler becomes the thread function
	new->thread_fn = new->handler;
        //replace the handler with one that simply returns IRQ_WAKE_THREAD to wake the thread
	new->handler = irq_default_primary_handler;
	return 0;
}
static irqreturn_t irq_default_primary_handler(int irq, void *dev_id)
{
	return IRQ_WAKE_THREAD;
}

static int
setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
{
	struct task_struct *t;
        //#define MAX_USER_RT_PRIO	100
	struct sched_param param = {
		.sched_priority = MAX_USER_RT_PRIO/2,
	};
        //create the interrupt handling thread; after being woken it runs irq_thread()
	if (!secondary) {
		t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
				   new->name);
	} else {
		t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq,
				   new->name);
		param.sched_priority -= 1;
	}

	if (IS_ERR(t))
		return PTR_ERR(t);

	sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
	/*
	 * We keep the reference to the task struct even if
	 * the thread dies to avoid that the interrupt code
	 * references an already freed task_struct.
	 */
	get_task_struct(t);
	new->thread = t;
	/*
	 * Tell the thread to set its affinity. This is
	 * important for shared interrupt handlers as we do
	 * not invoke setup_affinity() for the secondary
	 * handlers as everything is already set up. Even for
	 * interrupts marked with IRQF_NO_BALANCE this is
	 * correct as we want the thread to move to the cpu(s)
	 * on which the requesting code placed the interrupt.
	 */
	set_bit(IRQTF_AFFINITY, &new->thread_flags);
	return 0;
}
4.3.2 Interrupt thread handling

As described in section 4.1, the interrupt handler __handle_irq_event_percpu() walks the descriptor's list of interrupt handler descriptors and runs each handler. If a handler returns IRQ_WAKE_THREAD, the interrupt is threaded, and __irq_wake_thread() wakes the interrupt handling thread.

void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
{
	/*
	 * In case the thread crashed and was killed we just pretend that
	 * we handled the interrupt. The hardirq handler has disabled the
	 * device interrupt, so no irq storm is lurking.
	 */
	if (action->thread->flags & PF_EXITING)
		return;
	/*
	 * Wake up the handler thread for this action. If the
	 * RUNTHREAD bit is already set, nothing to do.
	 */
	if (test_and_set_bit(IRQTF_RUNTHREAD, &action->thread_flags))
		return;

	/*
	 * It's safe to OR the mask lockless here. We have only two
	 * places which write to threads_oneshot: This code and the
	 * irq thread.
	 *
	 * This code is the hard irq context and can never run on two
	 * cpus in parallel. If it ever does we have more serious
	 * problems than this bitmask.
	 *
	 * The irq threads of this irq which clear their "running" bit
	 * in threads_oneshot are serialized via desc->lock against
	 * each other and they are serialized against this code by
	 * IRQS_INPROGRESS.
	 *
	 * Hard irq handler:
	 *
	 *	spin_lock(desc->lock);
	 *	desc->state |= IRQS_INPROGRESS;
	 *	spin_unlock(desc->lock);
	 *	set_bit(IRQTF_RUNTHREAD, &action->thread_flags);
	 *	desc->threads_oneshot |= mask;
	 *	spin_lock(desc->lock);
	 *	desc->state &= ~IRQS_INPROGRESS;
	 *	spin_unlock(desc->lock);
	 *
	 * irq thread:
	 *
	 * again:
	 *	spin_lock(desc->lock);
	 *	if (desc->state & IRQS_INPROGRESS) {
	 *		spin_unlock(desc->lock);
	 *		while(desc->state & IRQS_INPROGRESS)
	 *			cpu_relax();
	 *		goto again;
	 *	}
	 *	if (!test_bit(IRQTF_RUNTHREAD, &action->thread_flags))
	 *		desc->threads_oneshot &= ~mask;
	 *	spin_unlock(desc->lock);
	 *
	 * So either the thread waits for us to clear IRQS_INPROGRESS
	 * or we are waiting in the flow handler for desc->lock to be
	 * released before we reach this point. The thread also checks
	 * IRQTF_RUNTHREAD under desc->lock. If set it leaves
	 * threads_oneshot untouched and runs the thread another time.
	 */
	desc->threads_oneshot |= action->thread_mask;

	/*
	 * We increment the threads_active counter in case we wake up
	 * the irq thread. The irq thread decrements the counter when
	 * it returns from the handler or in the exit path and wakes
	 * up waiters which are stuck in synchronize_irq() when the
	 * active count becomes zero. synchronize_irq() is serialized
	 * against this code (hard irq handler) via IRQS_INPROGRESS
	 * like the finalize_oneshot() code. See comment above.
	 */
	atomic_inc(&desc->threads_active);

	wake_up_process(action->thread);
}

wake_up_process(action->thread) wakes the interrupt handling thread. The thread's function is irq_thread(), which calls irq_thread_fn(), and irq_thread_fn() in turn calls the registered thread handler.

static int irq_thread(void *data)
{
	struct callback_head on_exit_work;
	struct irqaction *action = data;
	struct irq_desc *desc = irq_to_desc(action->irq);
	irqreturn_t (*handler_fn)(struct irq_desc *desc,
			struct irqaction *action);
        //with CONFIG_IRQ_FORCED_THREADING and CONFIG_PREEMPT_RT enabled, force_irqthreads is true
        //with CONFIG_IRQ_FORCED_THREADING but without CONFIG_PREEMPT_RT, force_irqthreads is true only if "threadirqs" is on the kernel command line
        //when force_irqthreads is enabled, IRQTF_FORCED_THREAD was set in action->thread_flags by irq_setup_forced_threading()
        //this branch serves forced threading (for example with the RT patch): pending softirqs then run in the interrupt thread (priority 50)
	if (force_irqthreads && test_bit(IRQTF_FORCED_THREAD,
					&action->thread_flags))
		handler_fn = irq_forced_thread_fn;
	else
                //without forced threading only the interrupt thread work runs here; pending softirqs are handled in ksoftirqd (priority 120)
		handler_fn = irq_thread_fn;

	init_task_work(&on_exit_work, irq_thread_dtor);
	task_work_add(current, &on_exit_work, false);
	irq_thread_check_affinity(desc, action);
        //irq_wait_for_interrupt() checks the IRQTF_RUNTHREAD flag: after action->handler() runs,
        //IRQTF_RUNTHREAD is set whenever the thread needs to be woken, so the while loop proceeds;
        //the flag is then cleared immediately and checked again on the next iteration
	while (!irq_wait_for_interrupt(action)) {
		irqreturn_t action_ret;
		irq_thread_check_affinity(desc, action);
                //run handler_fn, which is either irq_forced_thread_fn or irq_thread_fn as selected above
		action_ret = handler_fn(desc, action);
		if (action_ret == IRQ_HANDLED)
			atomic_inc(&desc->threads_handled);
		if (action_ret == IRQ_WAKE_THREAD)
			irq_wake_secondary(desc, action);
		wake_threads_waitq(desc);
	}

	/*
	 * This is the regular exit path. __free_irq() is stopping the
	 * thread via kthread_stop() after calling
	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
	 * oneshot mask bit can be set. We cannot verify that as we
	 * cannot touch the oneshot mask at this point anymore as
	 * __setup_irq() might have given out currents thread_mask
	 * again.
	 */
	task_work_cancel(current, irq_thread_dtor);
	return 0;
}

When softirqs are pending, irq_forced_thread_fn() and irq_thread_fn() differ as follows:

static irqreturn_t
irq_forced_thread_fn(struct irq_desc *desc, struct irqaction *action)
{
	irqreturn_t ret;

	local_bh_disable();
        //run the threaded handler
	ret = action->thread_fn(action->irq, action->dev_id);
	if (ret == IRQ_HANDLED)
		atomic_inc(&desc->threads_handled);
	irq_finalize_oneshot(desc, action);
        //local_bh_enable ----> __local_bh_enable_ip ----> do_softirq ---->
        //__do_softirq ----> h->action(h): the softirq callbacks run in the interrupt thread's context
	local_bh_enable();
	return ret;
}
static irqreturn_t irq_thread_fn(struct irq_desc *desc,
		struct irqaction *action)
{
	irqreturn_t ret;
        //run the threaded handler; pending softirq callbacks will run in the ksoftirqd thread instead
	ret = action->thread_fn(action->irq, action->dev_id);
	irq_finalize_oneshot(desc, action);
	return ret;
}
4.3.3 Interrupt thread CPU affinity

Threaded (or force-threaded) non-nested interrupts get the IRQTF_AFFINITY flag set in __setup_irq(). When the interrupt thread runs irq_thread(), irq_thread_check_affinity() checks whether the thread's affinity needs to change, and finally calls set_cpus_allowed_ptr(current, mask) to set the current task's (the irq thread's) CPU affinity and migrate it to a suitable CPU.

On Linux 4.12 the interrupt thread is not pinned to a particular CPU by default (desc->irq_common_data.affinity defaults to all CPUs); if the interrupt itself is pinned to specific CPUs, the interrupt thread is pinned along with it.

/*
 * Check whether we need to change the affinity of the interrupt thread.
 */
static void
irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action)
{
	cpumask_var_t mask;
	bool valid = true;
        //threaded (or force-threaded) non-nested interrupts have IRQTF_AFFINITY set in __setup_irq()
	if (!test_and_clear_bit(IRQTF_AFFINITY, &action->thread_flags))
		return;
	/*
	 * In case we are out of memory we set IRQTF_AFFINITY again and
	 * try again next time
	 */
	if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
		set_bit(IRQTF_AFFINITY, &action->thread_flags);
		return;
	}
	raw_spin_lock_irq(&desc->lock);
	/*
	 * This code is triggered unconditionally. Check the affinity
	 * mask pointer. For CPU_MASK_OFFSTACK=n this is optimized out.
	 */
	if (cpumask_available(desc->irq_common_data.affinity))
                //copy desc->irq_common_data.affinity into mask
                //by default desc->irq_common_data.affinity is 0xf (all CPUs), i.e. not pinned
                //if the interrupt is pinned (echo xx > /proc/irq/xx/smp_affinity), the interrupt thread is pinned to the same CPUs
		cpumask_copy(mask, desc->irq_common_data.affinity);
	else
		valid = false;
	raw_spin_unlock_irq(&desc->lock);

	if (valid)
                //use set_cpus_allowed_ptr() to set the current task's (the irq thread's) CPU affinity
		set_cpus_allowed_ptr(current, mask);
	free_cpumask_var(mask);
}

On Linux 5.4 the interrupt thread is bound to CPU0 by default; if the interrupt is pinned to specific CPUs, the interrupt thread is pinned along with it.

/*
 * Check whether we need to change the affinity of the interrupt thread.
 */
static void
irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action)
{
	cpumask_var_t mask;
	bool valid = true;
        //threaded (or force-threaded) non-nested interrupts have IRQTF_AFFINITY set in __setup_irq()
	if (!test_and_clear_bit(IRQTF_AFFINITY, &action->thread_flags))
		return;
	/*
	 * In case we are out of memory we set IRQTF_AFFINITY again and
	 * try again next time
	 */
	if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
		set_bit(IRQTF_AFFINITY, &action->thread_flags);
		return;
	}

	raw_spin_lock_irq(&desc->lock);
	/*
	 * This code is triggered unconditionally. Check the affinity
	 * mask pointer. For CPU_MASK_OFFSTACK=n this is optimized out.
	 */
	if (cpumask_available(desc->irq_common_data.affinity)) {
		const struct cpumask *m;
                //get d->common->effective_affinity into m
		m = irq_data_get_effective_affinity_mask(&desc->irq_data);
                //copy m into mask
		cpumask_copy(mask, m);
	} else {
		valid = false;
	}
	raw_spin_unlock_irq(&desc->lock);

	if (valid)
                //use set_cpus_allowed_ptr() to set the current task's (the irq thread's) CPU affinity
		set_cpus_allowed_ptr(current, mask);
	free_cpumask_var(mask);
}
//on Linux 5.4, d->common->effective_affinity defaults to CPU0, so the interrupt thread runs on CPU0 by default (see cat /proc/irq/xx/effective_affinity)
//if the interrupt is pinned (echo xx > /proc/irq/xx/smp_affinity), the interrupt thread is pinned to the same CPUs
static inline
struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d)
{
	return d->common->effective_affinity;
}

4.4 Disabling and enabling interrupts

4.4.1 Disabling/enabling interrupts on a processor

Software can disable interrupts so that the processor does not respond to any interrupt request; the non-maskable interrupt is the exception.

The interfaces for disabling interrupts are:

  1. local_irq_disable()
  2. local_irq_save(flags): first saves the interrupt state in flags, then disables interrupts.

These interfaces only disable interrupts on the local processor; they cannot disable interrupts on other processors. Once interrupts are disabled, the processor will not respond to interrupt requests.

The interfaces for enabling interrupts are:

  1. local_irq_enable()
  2. local_irq_restore(flags): restores the processor's saved interrupt state.

local_irq_disable()/local_irq_enable() must not be nested, while local_irq_save(flags)/local_irq_restore(flags) can be nested (a usage sketch follows the arch implementations below).

static inline void arch_local_irq_disable(void)
{
	asm volatile(
		"msr	daifset, #2		// arch_local_irq_disable"
		:
		:
		: "memory");
}

This sets the interrupt mask bit in the processor state to 1; from then on the processor will not respond to interrupt requests.

static inline void arch_local_irq_enable(void)
{
	asm volatile(
		"msr	daifclr, #2		// arch_local_irq_enable"
		:
		:
		: "memory");
}

This clears the interrupt mask bit in the processor state to 0, so the processor responds to interrupt requests again.
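A minimal usage sketch of the nesting-safe pair; the function name and the protected data are assumptions:

#include <linux/irqflags.h>

static void foo_touch_percpu_data(void)
{
	unsigned long flags;

	local_irq_save(flags);		/* save the current state, then mask interrupts on this CPU */
	/* ... manipulate data that an interrupt handler on this CPU also touches ... */
	local_irq_restore(flags);	/* restore the saved state, so nested use is safe */
}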

4.4.2 Disabling/enabling a single interrupt

Software can disable the interrupt of a single peripheral device; the interrupt controller then stops forwarding that device's interrupts to the processors.

The functions for disabling and re-enabling a single interrupt are:

/**
 *	disable_irq - disable an irq and wait for completion
 *	@irq: Interrupt to disable
 *
 *	Disable the selected interrupt line.  Enables and Disables are
 *	nested.
 *	This function waits for any pending IRQ handlers for this interrupt
 *	to complete before returning. If you use this function while
 *	holding a resource the IRQ handler may need you will deadlock.
 *
 *	This function may be called - with care - from IRQ context.
 */
void disable_irq(unsigned int irq)
{
	if (!__disable_irq_nosync(irq))
		synchronize_irq(irq);
}
/**
 *	enable_irq - enable handling of an irq
 *	@irq: Interrupt to enable
 *
 *	Undoes the effect of one call to disable_irq().  If this
 *	matches the last disable, processing of interrupts on this
 *	IRQ line is re-enabled.
 *
 *	This function may be called from IRQ context only when
 *	desc->irq_data.chip->bus_lock and desc->chip->bus_sync_unlock are NULL !
 */
void enable_irq(unsigned int irq)
{
	unsigned long flags;
	struct irq_desc *desc = irq_get_desc_buslock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);

	if (!desc)
		return;
	if (WARN(!desc->irq_data.chip,
		 KERN_ERR "enable_irq before setup/request_irq: irq %u\n", irq))
		goto out;

	__enable_irq(desc);
out:
	irq_put_desc_busunlock(desc, flags);
}

If hardware interrupt n needs to be enabled, the distributor register GICD_ISENABLERn (Interrupt Set-Enable Register) is written; if it needs to be disabled, the distributor register GICD_ICENABLERn (Interrupt Clear-Enable Register) is written. A driver-level usage sketch follows.
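At the driver level, a hedged sketch of the usual pairing (foo_reprogram() is an assumed helper); note that disable_irq() waits for running handlers, so it must not be called while holding a lock that the handler also takes:

#include <linux/interrupt.h>

static void foo_reprogram(void)
{
	/* device-specific register writes would go here */
}

static void foo_reconfigure(unsigned int irq)
{
	disable_irq(irq);	/* mask the line and wait for in-flight handlers to finish */
	foo_reprogram();	/* no handler for this irq can run concurrently */
	enable_irq(irq);	/* re-enabled once the disable nesting count drops to zero */
}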

4.5 Interrupt affinity

On a multiprocessor system, the administrator can configure interrupt affinity, i.e. which processors the interrupt controller may forward a given interrupt to. There are two ways to configure it:

  1. Write the file /proc/irq/IRQ#/smp_affinity; the argument is a bitmask.
  2. Write the file /proc/irq/IRQ#/smp_affinity_list; the argument is a list of processors.

For example, to forward Linux interrupt 32 to processors 0-3, either of the following works:

  1. echo 0f > /proc/irq/32/smp_affinity
  2. echo 0-3 > /proc/irq/32/smp_affinity_list

Afterwards, you can repeatedly run cat /proc/interrupts | grep 'CPU\|32:' to check whether processors 0-3 are receiving Linux interrupt 32.

The kernel also provides a function for setting interrupt affinity:

/**
 * irq_set_affinity - Set the irq affinity of a given irq
 * @irq:	Interrupt to set affinity
 * @cpumask:	cpumask
 *
 * Fails if cpumask does not contain an online CPU
 */
static inline int
irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
{
	return __irq_set_affinity(irq, cpumask, false);
}
//a cpumask argument can be obtained with cpumask_of()
int cpu = smp_processor_id();
cpumask_of(cpu);
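Putting the two together, a hedged sketch (the irq number is assumed) that pins an interrupt to the CPU the code is currently running on:

#include <linux/interrupt.h>
#include <linux/smp.h>

static int foo_pin_irq_here(unsigned int irq)
{
	int cpu = get_cpu();	/* disable preemption while reading the CPU id */
	int ret = irq_set_affinity(irq, cpumask_of(cpu));

	put_cpu();
	return ret;		/* fails if the mask contains no online CPU */
}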

For the ARM64 GIC controller, the distributor register GICD_ITARGETSRn (Interrupt Targets Register) controls which processors hardware interrupt n may be forwarded to; hardware interrupt n must be a shared peripheral interrupt.

5. References

《linux内核深度解析》 -- 余华兵

Linux kernel source code

一文肝翻 Linux 中断要点

基于ARM Cortex-A9中断详解