In an operating system, interrupts are an essential component. When an unexpected event requires the host's attention, the machine can automatically suspend the currently running program, switch to a program that handles the new event, and return to the suspended program once handling is complete.
With an interrupt system there is no need to continuously poll for events, which greatly improves system efficiency.
Interrupt control in a system is generally divided into three parts: the peripheral module, the interrupt controller, and the processor.
The module typically uses registers to control whether interrupts are enabled and under what conditions they trigger; the interrupt controller manages interrupt priorities and the like; and the processor uses registers to configure how it responds to interrupts.
Reference: ARM Generic Interrupt Controller Architecture version 2.0 - Architecture Specification
This article is based on Linux 4.12.
1. Interrupt Controller
Peripheral devices do not send interrupt requests directly to the processor; they send them to the interrupt controller. From a software point of view, the GIC v2 controller (Generic Interrupt Controller, a standard interrupt controller specified by ARM) has two main functional blocks:
- Distributor: all interrupt sources in the system are connected to the distributor. Its registers control the properties of each individual interrupt: priority, state, security, routing (which processors it may be forwarded to), and enable state. The distributor decides which interrupt should be forwarded through a CPU interface to which processor.
- CPU interface: a processor receives interrupts through its CPU interface, whose registers are used to mask and identify interrupts and control their state. Each processor has its own CPU interface.
Software identifies an interrupt by its interrupt number; each interrupt number corresponds to exactly one interrupt source. There are four types of interrupts:
- Software Generated Interrupt (SGI): interrupt numbers 0-15, usually used to implement inter-processor interrupts (Inter-Processor Interrupt, IPI). This kind of interrupt is generated by software writing the distributor's Software Generated Interrupt Register (GICD_SGIR).
- Private Peripheral Interrupt (PPI): interrupt numbers 16-31, interrupt sources private to one processor. The same interrupt number on different processors refers to unrelated sources, for example each processor's own timer.
- Shared Peripheral Interrupt (SPI): interrupt numbers 32-1020. These interrupts can be forwarded by the interrupt controller to multiple processors.
- Locality-specific Peripheral Interrupt (LPI): message-based interrupts, not supported by GIC v1 and v2.
Interrupt states:
- Inactive: the interrupt source has not raised an interrupt.
- Pending: the interrupt source has raised an interrupt that is waiting to be handled by a processor.
- Active: a processor has acknowledged the interrupt and is handling it.
- Active and pending: a processor is handling the interrupt and the same source has raised another one.
Peripheral (inactive) ----> distributor (pending) ----> target processor's CPU interface ----> processor ----> read the CPU interface's Interrupt Acknowledge register (active or active and pending) ----> obtain the interrupt number ----> invoke the handler for that interrupt number ----> after handling, write the interrupt number to the CPU interface's End Of Interrupt register (inactive or pending).
2. Interrupt Domains
Interrupt controllers can be cascaded. To map each controller's local hardware interrupt numbers to globally unique Linux interrupt numbers (also called virtual interrupt numbers), the kernel defines the interrupt domain, irq_domain; each interrupt controller has its own interrupt domain.
2.1 Creating an interrupt domain
An interrupt controller driver creates and registers an interrupt domain using one of the irq_domain_add_*() allocator functions. Each mapping method has its own allocator; the caller must supply an irq_domain_ops structure, and on success the allocator returns a pointer to the irq_domain. In essence, irq_domain implements the mapping from hardware interrupt numbers to globally unique Linux interrupt numbers by storing that mapping in the structure in one of several ways.
//provided by the driver; called when creating new mappings or disposing of old ones
struct irq_domain_ops {
//match an interrupt controller's device tree node against this domain
int (*match)(struct irq_domain *d, struct device_node *node,
enum irq_domain_bus_token bus_token);
int (*select)(struct irq_domain *d, struct irq_fwspec *fwspec,
enum irq_domain_bus_token bus_token);
//create or update a mapping between a virtual interrupt number and a hardware interrupt number
int (*map)(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw);
//dispose of such a mapping
void (*unmap)(struct irq_domain *d, unsigned int virq);
//given a device tree node and an interrupt specifier, decode the hardware irq number and Linux irq type value
int (*xlate)(struct irq_domain *d, struct device_node *node,
const u32 *intspec, unsigned int intsize,
unsigned long *out_hwirq, unsigned int *out_type);
};
//interrupt domain: hardware interrupt number translation
struct irq_domain {
//element in the global irq_domain list
struct list_head link;
//name of the interrupt domain
const char *name;
//pointer to the irq_domain_ops callbacks
const struct irq_domain_ops *ops;
//private data
void *host_data;
unsigned int flags;
/* Optional data */
//pointer to the firmware node (e.g. device tree node) associated with the irq_domain
struct fwnode_handle *fwnode;
enum irq_domain_bus_token bus_token;
//pointer to a list of generic chips; used by interrupt controller drivers that set up multiple generic chips via the generic chip library
struct irq_domain_chip_generic *gc;
#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
struct irq_domain *parent;
#endif
//maximum hardware interrupt number
irq_hw_number_t hwirq_max;
//largest hardware interrupt number that can be set for direct mapping (direct map only)
unsigned int revmap_direct_max_irq;
//size of the linear map linear_revmap (linear map only)
unsigned int revmap_size;
//radix tree for hardware interrupt numbers that do not fit in the linear map
struct radix_tree_root revmap_tree;
//linear map from hardware interrupt numbers to virtual interrupt numbers
unsigned int linear_revmap[];
};
2.1.1 Linear map
The linear map maintains a fixed-size table indexed by hardware interrupt number. If the maximum number of hardware interrupts is fixed and relatively small, the linear map is a good choice.
static inline struct irq_domain *irq_domain_add_linear(struct device_node *of_node,
unsigned int size,
const struct irq_domain_ops *ops,
void *host_data)
{
return __irq_domain_add(of_node_to_fwnode(of_node), size, size, 0, ops, host_data);
}
2.1.2 Tree map
The tree map stores the mapping from hardware interrupt numbers to Linux interrupt numbers in a radix tree. If the hardware interrupt numbers can be very large, the tree map is a good choice, because it avoids allocating a table as large as the largest hardware interrupt number.
static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node,
const struct irq_domain_ops *ops,
void *host_data)
{
return __irq_domain_add(of_node_to_fwnode(of_node), 0, ~0, 0, ops, host_data);
}
2.1.3 No map
Some interrupt controllers are powerful: the hardware interrupt number itself is configurable, so the Linux interrupt number can be written directly into the hardware.
static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_node,
unsigned int max_irq,
const struct irq_domain_ops *ops,
void *host_data)
{
return __irq_domain_add(of_node_to_fwnode(of_node), 0, max_irq, max_irq, ops, host_data);
}
The allocator functions delegate the work to __irq_domain_add(), which allocates an irq_domain structure, initializes its members (the different parameters select the different mapping methods), and then adds the domain to the global list irq_domain_list.
/**
 * __irq_domain_add() - allocate a new irq_domain
 * @fwnode: firmware node of the interrupt controller
 * @size: size of the linear map, 0 for the radix tree
 * @hwirq_max: maximum number of interrupts supported by the controller
 * @direct_max: maximum value for direct mapping; ~0 for no limit, 0 if direct mapping is not used
 * @ops: irq_domain_ops callbacks for the domain
 * @host_data: controller-private data pointer
 *
 * Allocates and initializes an irq_domain structure.
 * Returns pointer to IRQ domain, or NULL on failure.
 */
struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
irq_hw_number_t hwirq_max, int direct_max,
const struct irq_domain_ops *ops,
void *host_data)
{
struct device_node *of_node = to_of_node(fwnode);
struct irq_domain *domain;
domain = kzalloc_node(sizeof(*domain) + (sizeof(unsigned int) * size),
GFP_KERNEL, of_node_to_nid(of_node));
if (WARN_ON(!domain))
return NULL;
of_node_get(of_node);
/* Fill structure */
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
domain->ops = ops;
domain->host_data = host_data;
domain->fwnode = fwnode;
domain->hwirq_max = hwirq_max;
domain->revmap_size = size;
domain->revmap_direct_max_irq = direct_max;
irq_domain_check_hierarchy(domain);
mutex_lock(&irq_domain_mutex);
list_add(&domain->link, &irq_domain_list);
mutex_unlock(&irq_domain_mutex);
pr_debug("Added domain %s\n", domain->name);
return domain;
}
2.2 Creating a mapping
After creating an interrupt domain, mappings from hardware interrupt numbers to Linux interrupt numbers need to be added to the domain:
//input parameters are the interrupt domain and the hardware interrupt number; returns the Linux interrupt number
unsigned int irq_create_mapping(struct irq_domain *domain, irq_hw_number_t hwirq)
{
struct device_node *of_node;
int virq;
pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);
/* Look for default domain if necessary */
if (domain == NULL)
domain = irq_default_domain;
if (domain == NULL) {
WARN(1, "%s(, %lx) called with NULL domain\n", __func__, hwirq);
return 0;
}
pr_debug("-> using domain @%p\n", domain);
of_node = irq_domain_get_of_node(domain);
/* Check if mapping already exists */
virq = irq_find_mapping(domain, hwirq);
if (virq) {
pr_debug("-> existing mapping on virq %d\n", virq);
return virq;
}
//1. allocate a virtual interrupt number via irq_domain_alloc_descs()
virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
if (virq <= 0) {
pr_debug("-> virq allocation failed\n");
return 0;
}
//2. add the mapping between the hardware interrupt number and the Linux virtual interrupt number to the interrupt domain
if (irq_domain_associate(domain, virq, hwirq)) {
irq_free_desc(virq);
return 0;
}
pr_debug("irq %lu on domain %s mapped to virtual irq %u\n",
hwirq, of_node_full_name(of_node), virq);
return virq;
}
2.3 Looking up a mapping
//input parameters are the interrupt domain and the hardware interrupt number; returns the Linux interrupt number
unsigned int irq_find_mapping(struct irq_domain *domain,
irq_hw_number_t hwirq)
{
struct irq_data *data;
/* Look for default domain if necessary */
if (domain == NULL)
domain = irq_default_domain;
if (domain == NULL)
return 0;
//if below the direct-map limit (0 when direct mapping is not used), the Linux interrupt number equals the hardware interrupt number
if (hwirq < domain->revmap_direct_max_irq) {
data = irq_domain_get_irq_data(domain, hwirq);
if (data && data->hwirq == hwirq)
return hwirq;
}
//if below the linear map size (0 for the radix tree), look it up in the linear map
if (hwirq < domain->revmap_size)
return domain->linear_revmap[hwirq];
//look it up in the radix tree
rcu_read_lock();
data = radix_tree_lookup(&domain->revmap_tree, hwirq);
rcu_read_unlock();
return data ? data->irq : 0;
}
3. Interrupt Controller Driver Initialization
ARM64 uses a device tree (DTS) to describe the board's hardware. Information about the interrupt controller and peripheral interrupts is recorded in the DTS; the kernel parses the DTS to find the matching interrupt controller driver and executes the driver's initialization function. For the GIC v2 controller the initialization function is gic_of_init().
//device names the driver supports (matched against "compatible" in the DTS), together with the corresponding initialization function
IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init);
IRQCHIP_DECLARE(arm11mp_gic, "arm,arm11mp-gic", gic_of_init);
IRQCHIP_DECLARE(arm1176jzf_dc_gic, "arm,arm1176jzf-devchip-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init);
IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
IRQCHIP_DECLARE(pl390, "arm,pl390", gic_of_init);
int __init
gic_of_init(struct device_node *node, struct device_node *parent)
{
struct gic_chip_data *gic;
int irq, ret;
if (WARN_ON(!node))
return -ENODEV;
if (WARN_ON(gic_cnt >= CONFIG_ARM_GIC_MAX_NR))
return -EINVAL;
//take a free element from the global array to hold this interrupt controller's information
gic = &gic_data[gic_cnt];
//read the interrupt controller's "reg" property from the device tree to get the physical address ranges of the distributor and CPU interface registers, and map them into the kernel's virtual address space
ret = gic_of_setup(gic, node);
if (ret)
return ret;
/*
* Disable split EOI/Deactivate if either HYP is not available
* or the CPU interface is too small.
*/
if (gic_cnt == 0 && !gic_check_eoimode(node, &gic->raw_cpu_base))
static_key_slow_dec(&supports_deactivate);
//initialize the gic_chip_data structure
ret = __gic_init_bases(gic, -1, &node->fwnode);
if (ret) {
gic_teardown(gic);
return ret;
}
//this interrupt controller is the root controller
if (!gic_cnt) {
gic_init_physaddr(node);
gic_of_setup_kvm_info(node);
}
//if this interrupt controller has a parent, i.e. it is connected as an interrupt source to another interrupt controller
if (parent) {
//get the hardware interrupt number from this device node's "interrupts" property in the device tree and map it to a Linux interrupt number
irq = irq_of_parse_and_map(node, 0);
//set the handle_irq() member of that Linux interrupt number's descriptor to gic_handle_cascade_irq()
gic_cascade_irq(gic_cnt, irq);
}
if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
gicv2m_init(&node->fwnode, gic_data[gic_cnt].domain);
gic_cnt++;
return 0;
}
static int __init __gic_init_bases(struct gic_chip_data *gic,
int irq_start,
struct fwnode_handle *handle)
{
char *name;
int i, ret;
if (WARN_ON(!gic || gic->domain))
return -EINVAL;
//if this interrupt controller is the root controller
if (gic == &gic_data[0]) {
/*
* Initialize the CPU interface map to all CPUs.
* It will be refined as each CPU probes its ID.
* This is only necessary for the primary GIC.
*/
for (i = 0; i < NR_GIC_CPU_IF; i++)
gic_cpu_map[i] = 0xff;
#ifdef CONFIG_SMP
//set the global function pointer __smp_cross_call to gic_raise_softirq(), used to send software-generated interrupts, i.e. one processor interrupting the others
set_smp_cross_call(gic_raise_softirq);
#endif
cpuhp_setup_state_nocalls(CPUHP_AP_IRQ_GIC_STARTING,
"irqchip/arm/gic:starting",
gic_starting_cpu, NULL);
//set the global function pointer handle_arch_irq to gic_handle_irq(), the C-language entry point for interrupt handling
set_handle_irq(gic_handle_irq);
if (static_key_true(&supports_deactivate))
pr_info("GIC: Using split EOI/Deactivate mode\n");
}
//initialize the interrupt controller descriptor irq_chip
if (static_key_true(&supports_deactivate) && gic == &gic_data[0]) {
name = kasprintf(GFP_KERNEL, "GICv2");
gic_init_chip(gic, NULL, name, true);
} else {
name = kasprintf(GFP_KERNEL, "GIC-%d", (int)(gic-&gic_data[0]));
gic_init_chip(gic, NULL, name, false);
}
//allocate an interrupt domain for this controller and initialize the distributor and CPU interface registers
ret = gic_init_bases(gic, irq_start, handle);
if (ret)
kfree(name);
return ret;
}
void __init set_smp_cross_call(void (*fn)(const struct cpumask *, unsigned int))
{
__smp_cross_call = fn;
}
void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
if (handle_arch_irq)
return;
handle_arch_irq = handle_irq;
}
4. Interrupt Handling
For each interrupt source of an interrupt controller, the kernel allocates a Linux interrupt number and an interrupt descriptor, irq_desc. The descriptor has two levels of interrupt handling functions:
- the descriptor's handle_irq() member;
- the handler functions registered by device drivers: the descriptor contains a list of interrupt handler descriptors, onto which multiple handlers can be chained.
struct irq_desc {
irq_flow_handler_t handle_irq;
struct irqaction *action; /* IRQ action list */
...
};
/**
* struct irq_chip - hardware interrupt chip descriptor
* @parent_device: pointer to parent device for irqchip
* @name: name for /proc/interrupts
* @irq_startup: start up the interrupt (defaults to ->enable if NULL)
* @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL)
* @irq_enable: enable the interrupt (defaults to chip->unmask if NULL)
* @irq_disable: disable the interrupt
* @irq_ack: start of a new interrupt
* @irq_mask: mask an interrupt source
* @irq_mask_ack: ack and mask an interrupt source
* @irq_unmask: unmask an interrupt source
* @irq_eoi: end of interrupt
* @irq_set_affinity: set the CPU affinity on SMP machines
* @irq_retrigger: resend an IRQ to the CPU
......
*/
struct irq_chip {
struct device *parent_device;
const char *name;
unsigned int (*irq_startup)(struct irq_data *data);
void (*irq_shutdown)(struct irq_data *data);
void (*irq_enable)(struct irq_data *data);
void (*irq_disable)(struct irq_data *data);
void (*irq_ack)(struct irq_data *data);
void (*irq_mask)(struct irq_data *data);
void (*irq_mask_ack)(struct irq_data *data);
void (*irq_unmask)(struct irq_data *data);
void (*irq_eoi)(struct irq_data *data);
int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
int (*irq_retrigger)(struct irq_data *data);
......
unsigned long flags;
};
ARM64 enables the configuration macro CONFIG_SPARSE_IRQ by default and uses a radix tree to store the association between Linux interrupt numbers and interrupt descriptors.
static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
When a hardware interrupt number is mapped to a Linux interrupt number, the descriptor's handle_irq member is set according to the type of hardware interrupt. Taking GIC v2 as an example:
//irq_create_mapping() -> irq_domain_associate() -> domain->ops->map() -> gic_irq_domain_map()
static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
irq_hw_number_t hw)
{
struct gic_chip_data *gic = d->host_data;
//a hardware interrupt number below 32 means an SGI or PPI: handle_irq is set to handle_percpu_devid_irq
if (hw < 32) {
irq_set_percpu_devid(irq);
irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
handle_percpu_devid_irq, NULL, NULL);
irq_set_status_flags(irq, IRQ_NOAUTOEN);
} else {
//a hardware interrupt number of 32 or above means an SPI: handle_irq is set to handle_fasteoi_irq
irq_domain_set_info(d, irq, hw, &gic->chip, d->host_data,
handle_fasteoi_irq, NULL, NULL);
irq_set_probe(irq);
}
return 0;
}
/**
* request_irq - allocate an interrupt line
* @irq: Interrupt line to allocate
* @handler: Function to be called when the IRQ occurs.
* Primary handler for threaded interrupts
* If NULL and thread_fn != NULL the default
* primary handler is installed
* @flags: Interrupt type flags
IRQF_SHARED: allow multiple devices to share one interrupt number
IRQF_PROBE_SHARED: set by callers that expect sharing mismatches to occur
__IRQF_TIMER: timer interrupt
IRQF_PERCPU: the interrupt is private to each processor
IRQF_NOBALANCING: exclude this interrupt from irq balancing between processors
IRQF_NO_THREAD: the interrupt cannot be threaded
* @devname: An ascii name for the claiming device
* @dev: A cookie passed back to the handler function
*/
static inline int __must_check
request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
const char *name, void *dev)
{
return request_threaded_irq(irq, handler, NULL, flags, name, dev);
}
Interrupt handling flow (taking an interrupt raised while a user-mode process is running as an example):
el0_irq:
//save the user-mode process's registers onto the kernel stack
kernel_entry 0
el0_irq_naked:
enable_dbg
...
ct_user_exit
//interrupt handling
irq_handler
...
//restore registers from the values saved on the kernel stack and return to user space
b ret_to_user
ENDPROC(el0_irq)
.macro irq_handler
//handle_arch_irq is set to gic_handle_irq when GIC v2 is initialized
ldr_l x1, handle_arch_irq
mov x0, sp
//switch from the kernel stack to the interrupt stack; each processor has a dedicated interrupt stack
irq_stack_entry
blr x1
//switch back from the interrupt stack to the kernel stack
irq_stack_exit
.endm
#define THREAD_SIZE 16384 //16k
#define THREAD_START_SP (THREAD_SIZE - 16)
#define IRQ_STACK_SIZE THREAD_SIZE
#define IRQ_STACK_START_SP THREAD_START_SP
.macro irq_stack_entry
//save the current stack pointer (EL1 SP) into x19
mov x19, sp
/*
* Compare sp with the base of the task stack.
* If the top ~(THREAD_SIZE - 1) bits match, we are on a task stack,
* and should switch to the irq stack.
*/
ldr x25, [tsk, TSK_STACK]
eor x25, x25, x19
and x25, x25, #~(THREAD_SIZE - 1)
cbnz x25, 9998f
//get the base address of this cpu's irq_stack
adr_this_cpu x25, irq_stack, x26
//IRQ_STACK_START_SP is the initial stack-pointer offset within the irq stack
mov x26, #IRQ_STACK_START_SP
add x26, x25, x26
//write the irq stack address into sp
mov sp, x26
/*
* Add a dummy stack frame, this non-standard format is fixed up
* by unwind_frame()
*/
stp x29, x19, [sp, #-16]!
mov x29, sp
9998:
.endm
The GIC v2 controller's gic_handle_irq() function is as follows:
static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
u32 irqstat, irqnr;
struct gic_chip_data *gic = &gic_data[0];
void __iomem *cpu_base = gic_data_cpu_base(gic);
do {
//read the CPU interface's Interrupt Acknowledge register to get the interrupt number (pending -> active)
irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);
irqnr = irqstat & GICC_IAR_INT_ID_MASK;
//PPI (private peripheral interrupt) or SPI (shared peripheral interrupt)
if (likely(irqnr > 15 && irqnr < 1020)) {
//if supports_deactivate is set, write the interrupt number to the CPU interface's End Of Interrupt register to signal completion (active -> inactive)
if (static_key_true(&supports_deactivate))
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
//interrupt handler
handle_domain_irq(gic->domain, irqnr, regs);
continue;
}
//SGI (software generated interrupt)
if (irqnr < 16) {
//write the interrupt number to the CPU interface's End Of Interrupt register to signal completion
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
if (static_key_true(&supports_deactivate))
writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE);
#ifdef CONFIG_SMP
//make sure any shared data written by the CPU that sent the IPI is read only after reading the ACK register on the GIC
smp_rmb();
handle_IPI(irqnr, regs);
#endif
continue;
}
break;
} while (1);
}
4.1 Private Peripheral Interrupts (PPI) and Shared Peripheral Interrupts (SPI)
int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
bool lookup, struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
unsigned int irq = hwirq;
int ret = 0;
irq_enter();
#ifdef CONFIG_IRQ_DOMAIN
if (lookup)
irq = irq_find_mapping(domain, hwirq);
#endif
/*
* Some hardware gives randomly wrong interrupts. Rather
* than crashing, do something sensible.
*/
if (unlikely(!irq || irq >= nr_irqs)) {
ack_bad_irq(irq);
ret = -EINVAL;
} else {
generic_handle_irq(irq);
}
irq_exit();
set_irq_regs(old_regs);
return ret;
}
//enter interrupt context
void irq_enter(void)
{
rcu_irq_enter();
//the interrupt arrived while in idle
if (is_idle_task(current) && !in_interrupt()) {
//prevent raise_softirq from needlessly waking ksoftirqd here, since the softirq will be serviced on interrupt return
local_bh_disable();
//called from irq_enter to note the possible interruption of idle()
tick_irq_enter();
_local_bh_enable();
}
__irq_enter();
}
//__irq_enter() adds HARDIRQ_OFFSET to current_thread_info()->preempt_count
#define __irq_enter() \
do { \
account_irq_enter_time(current); \
preempt_count_add(HARDIRQ_OFFSET); \
trace_hardirq_enter(); \
} while (0)
/**
* generic_handle_irq - Invoke the handler for a particular irq
* @irq: The irq number to handle
*
*/
int generic_handle_irq(unsigned int irq)
{
//get the interrupt descriptor
struct irq_desc *desc = irq_to_desc(irq);
if (!desc)
return -EINVAL;
generic_handle_irq_desc(desc);
return 0;
}
static inline void generic_handle_irq_desc(struct irq_desc *desc)
{
desc->handle_irq(desc);
}
As described above (irq_create_mapping() -> irq_domain_associate() -> domain->ops->map() -> gic_irq_domain_map()), handle_irq was set according to the interrupt type when the mapping was created:
- a hardware interrupt number below 32 (SGI or PPI): handle_irq is handle_percpu_devid_irq;
- a hardware interrupt number of 32 or above (SPI): handle_irq is handle_fasteoi_irq.
/**
* handle_percpu_devid_irq - Per CPU local irq handler with per cpu dev ids
* @desc: the interrupt description structure for this irq
*/
void handle_percpu_devid_irq(struct irq_desc *desc)
{
struct irq_chip *chip = irq_desc_get_chip(desc);
struct irqaction *action = desc->action;
unsigned int irq = irq_desc_get_irq(desc);
irqreturn_t res;
kstat_incr_irqs_this_cpu(desc);
if (chip->irq_ack)
chip->irq_ack(&desc->irq_data);
if (likely(action)) {
trace_irq_handler_entry(irq, action);
//invoke the irqaction's callback
res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
trace_irq_handler_exit(irq, action, res);
} else {
unsigned int cpu = smp_processor_id();
bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
if (enabled)
irq_percpu_disable(desc, cpu);
pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",
enabled ? " and unmasked" : "", irq, cpu);
}
if (chip->irq_eoi)
chip->irq_eoi(&desc->irq_data);
}
For a shared peripheral interrupt, the descriptor's handle_irq() member is handle_fasteoi_irq():
void handle_fasteoi_irq(struct irq_desc *desc)
{
struct irq_chip *chip = desc->irq_data.chip;
raw_spin_lock(&desc->lock);
if (!irq_may_run(desc))
goto out;
desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
/*
* If its disabled or no action available
* then mask it and get out of here:
*/
if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) {
desc->istate |= IRQS_PENDING;
mask_irq(desc);
goto out;
}
kstat_incr_irqs_this_cpu(desc);
if (desc->istate & IRQS_ONESHOT)
mask_irq(desc);
preflow_handler(desc);
//run the handler functions registered by device drivers
handle_irq_event(desc);
cond_unmask_eoi_irq(desc, chip);
raw_spin_unlock(&desc->lock);
return;
out:
if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED))
chip->irq_eoi(&desc->irq_data);
raw_spin_unlock(&desc->lock);
}
handle_irq_event() mainly delegates the work to handle_irq_event_percpu():
irqreturn_t handle_irq_event(struct irq_desc *desc)
{
irqreturn_t ret;
desc->istate &= ~IRQS_PENDING;
irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
raw_spin_unlock(&desc->lock);
ret = handle_irq_event_percpu(desc);
raw_spin_lock(&desc->lock);
irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
return ret;
}
irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
{
irqreturn_t retval;
unsigned int flags = 0;
retval = __handle_irq_event_percpu(desc, &flags);
add_interrupt_randomness(desc->irq_data.irq, flags);
if (!noirqdebug)
note_interrupt(desc, retval);
return retval;
}
__handle_irq_event_percpu() walks the descriptor's list of interrupt handler descriptors and runs each one's handler function:
irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
irqreturn_t retval = IRQ_NONE;
unsigned int irq = desc->irq_data.irq;
struct irqaction *action;
for_each_action_of_desc(desc, action) {
irqreturn_t res;
trace_irq_handler_entry(irq, action);
//the interrupt handler function
res = action->handler(irq, action->dev_id);
trace_irq_handler_exit(irq, action, res);
if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pF enabled interrupts\n",
irq, action->handler))
local_irq_disable();
switch (res) {
case IRQ_WAKE_THREAD:
......
//wake up the interrupt thread
__irq_wake_thread(desc, action);
//fall through, so that action->flags is folded in as a factor for random number generation
case IRQ_HANDLED:
*flags |= action->flags;
break;
default:
break;
}
retval |= res;
}
return retval;
}
The interrupt exit function:
/*
* Exit an interrupt context. Process softirqs if needed and possible:
*/
void irq_exit(void)
{
#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
local_irq_disable();
#else
WARN_ON_ONCE(!irqs_disabled());
#endif
account_irq_exit_time(current);
preempt_count_sub(HARDIRQ_OFFSET);
//check whether any softirq is pending
if (!in_interrupt() && local_softirq_pending())
//run the softirqs
invoke_softirq();
tick_irq_exit();
rcu_irq_exit();
trace_hardirq_exit(); /* must be last! */
}
4.2 Inter-Processor Interrupts (IPI)
The commonly used inter-processor interrupt functions are:
- run a function on all other processors:
/**
* smp_call_function(): Run a function on all other CPUs.
 * @func: the function the target processors will run in their interrupt handler
 * @info: the argument passed to func
 * @wait: whether to wait for the target processors to finish running the function
* You must not call this function with disabled interrupts or from a
* hardware interrupt handler or from a bottom half handler.
*/
int smp_call_function(smp_call_func_t func, void *info, int wait)
{
preempt_disable();
smp_call_function_many(cpu_online_mask, func, info, wait);
preempt_enable();
return 0;
}
- run a function on a specified processor:
int smp_call_function_single(int cpu, smp_call_func_t func, void *info, int wait)
- ask a specified processor to reschedule its processes:
void smp_send_reschedule(int cpu)
{
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
}
On the arm64 architecture with a GIC, an inter-processor interrupt is generated by writing the distributor's GICD_SGIR register (Software Generated Interrupt Register). The function handle_IPI() is responsible for handling inter-processor interrupts:
void handle_IPI(int ipinr, struct pt_regs *regs)
{
//get the id of the cpu currently running this code
unsigned int cpu = smp_processor_id();
//the pt_regs structure holds the current register state
struct pt_regs *old_regs = set_irq_regs(regs);
if ((unsigned)ipinr < NR_IPI) {
//ftrace records entry into the IPI interrupt, for debugging
trace_ipi_entry_rcuidle(ipi_types[ipinr]);
//count this cpu's IPI interrupts; the IPI rows in cat /proc/interrupts come from here
__inc_irq_stat(cpu, ipi_irqs[ipinr]);
}
switch (ipinr) {
//trigger a reschedule
case IPI_RESCHEDULE:
scheduler_ipi();
break;
//run this cpu's pending function callbacks
case IPI_CALL_FUNC:
irq_enter();
generic_smp_call_function_interrupt();
irq_exit();
break;
//stop this cpu and put it into a low-power state
case IPI_CPU_STOP:
irq_enter();
ipi_cpu_stop(cpu);
irq_exit();
break;
//this IPI is only acted on when KEXEC is configured, i.e. the system enters a second kernel on panic
case IPI_CPU_CRASH_STOP:
if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
irq_enter();
ipi_cpu_crash_stop(cpu, regs);
unreachable();
}
break;
//receive a timer broadcast and run the timer's interrupt callback
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
case IPI_TIMER:
irq_enter();
tick_receive_broadcast();
irq_exit();
break;
#endif
//run irq_work on this cpu
#ifdef CONFIG_IRQ_WORK
case IPI_IRQ_WORK:
irq_enter();
irq_work_run();
irq_exit();
break;
#endif
//wake this cpu from the parked state
#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
case IPI_WAKEUP:
WARN_ONCE(!acpi_parking_protocol_valid(cpu),
"CPU%u: Wake-up IPI outside the ACPI parking protocol\n",
cpu);
break;
#endif
default:
pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
break;
}
if ((unsigned)ipinr < NR_IPI)
trace_ipi_exit_rcuidle(ipi_types[ipinr]);
set_irq_regs(old_regs);
}
4.2.1 IPI_RESCHEDULE
The hardware interrupt number is 0; the interrupt is generated by smp_send_reschedule(). The typical scenario: the TIF_NEED_RESCHED flag is set for a process (meaning some process needs to be scheduled); if that process is not on the current CPU, smp_send_reschedule() sends an IPI_RESCHEDULE interrupt to the appropriate CPU, which eventually reaches scheduler_ipi() via handle_IPI() and tries to wake up the pending process.
void scheduler_ipi(void)
{
/*
* Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
* TIF_NEED_RESCHED remotely (for the first time) will also send
* this IPI.
*/
preempt_fold_need_resched();
if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
return;
/*
* Not all reschedule IPI handlers call irq_enter/irq_exit, since
* traditionally all their work was done from the interrupt return
* path. Now that we actually do some work, we need to make sure
* we do call them.
*
* Some archs already do call them, luckily irq_enter/exit nest
* properly.
*
* Arguably we should visit all archs and update all handlers,
* however a fair share of IPIs are still resched only so this would
* somewhat pessimize the simple resched case.
*/
irq_enter();
//try to wake up pending threads
sched_ttwu_pending();
/*
* Check if someone kicked us for doing the nohz idle load balance.
*/
if (unlikely(got_nohz_idle_kick())) {
this_rq()->idle_balance = 1;
raise_softirq_irqoff(SCHED_SOFTIRQ);
}
irq_exit();
}
4.2.2 IPI_CALL_FUNC
The hardware interrupt number is 1; the interrupt runs a function and is generated by smp_call_function(). The target cpu goes through handle_IPI() -> generic_smp_call_function_interrupt() -> generic_smp_call_function_single_interrupt() -> flush_smp_call_function_queue(), which invokes every pending function callback.
static void flush_smp_call_function_queue(bool warn_cpu_offline)
{
struct llist_head *head;
struct llist_node *entry;
struct call_single_data *csd, *csd_next;
static bool warned;
WARN_ON(!irqs_disabled());
//get this cpu's call_single_queue global list: when another cpu wants this cpu to run some func,
//it adds a call_single_data structure, which contains the func callback to run,
//to the target cpu's call_single_queue via the generic_exec_single() interface
head = this_cpu_ptr(&call_single_queue);
entry = llist_del_all(head);
entry = llist_reverse_order(entry);
/* There shouldn't be any pending callbacks on an offline CPU. */
if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
!warned && !llist_empty(head))) {
warned = true;
WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
/*
* We don't have to use the _safe() variant here
* because we are not invoking the IPI handlers yet.
*/
llist_for_each_entry(csd, entry, llist)
pr_warn("IPI callback %pS sent to offline CPU\n",
csd->func);
}
//iterate over the call_single_queue list and run each entry's func
llist_for_each_entry_safe(csd, csd_next, entry, llist) {
smp_call_func_t func = csd->func;
void *info = csd->info;
//the csd's flags, i.e. whether the caller is synchronous, decide the ordering
if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
//synchronous: run the func callback first, then release the csd lock
func(info);
csd_unlock(csd);
} else {
//asynchronous: release the csd lock first, then run the func callback
csd_unlock(csd);
func(info);
}
}
/*
* Handle irq works queued remotely by irq_work_queue_on().
* Smp functions above are typically synchronous so they
* better run first since some other CPUs may be busy waiting
* for them.
*/
irq_work_run();
}
4.2.3 IPI_CPU_STOP
The hardware interrupt number is 2; the interrupt stops a processor and is generated by smp_send_stop().
static void ipi_cpu_stop(unsigned int cpu)
{
set_cpu_online(cpu, false);
local_irq_disable();
while (1)
cpu_relax();
}
static inline void cpu_relax(void)
{
asm volatile("yield" ::: "memory");
}
4.2.4 IPI_CPU_CRASH_STOP
The hardware interrupt number is 3; the interrupt stops a processor and is generated by smp_send_crash_stop(). When the system crashes, the crashing cpu sends this interrupt to the other cpus. On systems with KEXEC enabled, it mainly saves the register state to pass to the second kernel. KEXEC is commonly used to enter a second kernel quickly, without rebooting, when the system crashes; the second kernel's job is to save an image of DDR memory so the crash can later be analyzed with a crash tool.
static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
{
#ifdef CONFIG_KEXEC_CORE
//save this cpu's register state to help debugging
crash_save_cpu(regs, cpu);
atomic_dec(&waiting_for_crash_ipi);
local_irq_disable();
#ifdef CONFIG_HOTPLUG_CPU
if (cpu_ops[cpu]->cpu_die)
cpu_ops[cpu]->cpu_die(cpu);
#endif
//use the wfe and wfi instructions to put this cpu into low-power standby
cpu_park_loop();
#endif
}
4.2.5 IPI_TIMER
The hardware interrupt number is 4. When a cpu calls tick_broadcast(const struct cpumask *mask), an IPI_TIMER interrupt, the timer broadcast interrupt, is sent to the cpus specified by mask.
int tick_receive_broadcast(void)
{
struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
//this cpu's tick_device clock_event_device, which is unique per cpu
struct clock_event_device *evt = td->evtdev;
if (!evt)
return -ENODEV;
if (!evt->event_handler)
return -EINVAL;
//run the tick clock's callback
evt->event_handler(evt);
return 0;
}
4.2.6 IPI_IRQ_WORK
The hardware interrupt number is 5; it runs callback functions in hard interrupt context and is generated by irq_work_queue().
//run every irq_work queued on this cpu's raised_list and lazy_list
void irq_work_run(void)
{
irq_work_run_list(this_cpu_ptr(&raised_list));
irq_work_run_list(this_cpu_ptr(&lazy_list));
}
4.2.7 IPI_WAKEUP
The hardware interrupt number is 6. When a cpu receives this IPI it wakes up from the parked state (the wfi/wfe low-power state); the interrupt is generated by acpi_parking_protocol_cpu_boot().
bool acpi_parking_protocol_valid(int cpu)
{
struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
return cpu_entry->mailbox_addr && cpu_entry->version;
}
4.3 Threaded Interrupt Handling
4.3.1 Requesting a threaded interrupt
Threaded interrupt handling means using a kernel thread to handle the interrupt; the goal is to reduce the time the system spends with interrupts disabled and so improve responsiveness. The kernel function request_threaded_irq() registers a threaded interrupt, where thread_fn is the thread handler function.
int request_threaded_irq(unsigned int irq, irq_handler_t handler,
irq_handler_t thread_fn, unsigned long irqflags,
const char *devname, void *dev_id)
A few interrupts cannot be threaded. The classic example is the timer interrupt: some rogue processes never voluntarily yield the CPU, and the kernel relies on the periodic timer interrupt to regain control of the processor; the timer interrupt is the heartbeat of the scheduler. Interrupts that must not be threaded have to be registered with the IRQF_NO_THREAD flag.
If the forced-threading configuration macro CONFIG_IRQ_FORCED_THREADING is enabled and the kernel is booted with the "threadirqs" parameter, all interrupts except those marked IRQF_NO_THREAD are forcibly threaded. ARM64 enables CONFIG_IRQ_FORCED_THREADING by default.
Each interrupt handler descriptor (irqaction) corresponds to one kernel thread: the thread member points to the kernel thread's process descriptor and thread_fn points to the thread handler. As the code below shows, the interrupt handler thread is a SCHED_FIFO real-time kernel thread with priority 50, named "irq/" followed by the Linux interrupt number, and its thread function is irq_thread().
request_threaded_irq() ----> __setup_irq() ----> irq_setup_forced_threading() and setup_irq_thread()
static int irq_setup_forced_threading(struct irqaction *new)
{
//force_irqthreads is true when both CONFIG_IRQ_FORCED_THREADING and CONFIG_PREEMPT_RT (the RT patch) are enabled;
//with CONFIG_IRQ_FORCED_THREADING but without CONFIG_PREEMPT_RT, it is true if the cmdline contains threadirqs
if (!force_irqthreads)
return 0;
if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
return 0;
/*
* No further action required for interrupts which are requested as
* threaded interrupts already
*/
if (new->handler == irq_default_primary_handler)
return 0;
new->flags |= IRQF_ONESHOT;
/*
* Handle the case where we have a real primary handler and a
* thread handler. We force thread them as well by creating a
* secondary action.
*/
if (new->handler && new->thread_fn) {
/* Allocate the secondary action */
new->secondary = kzalloc(sizeof(struct irqaction), GFP_KERNEL);
if (!new->secondary)
return -ENOMEM;
//the secondary action carries the original thread handler
new->secondary->handler = irq_forced_secondary_handler;
new->secondary->thread_fn = new->thread_fn;
new->secondary->dev_id = new->dev_id;
new->secondary->irq = new->irq;
new->secondary->name = new->name;
}
/* Deal with the primary handler */
//set the IRQTF_FORCED_THREAD flag
set_bit(IRQTF_FORCED_THREAD, &new->thread_flags);
//move the original handler to thread_fn so it runs in the thread
new->thread_fn = new->handler;
//replace the handler with one that merely returns IRQ_WAKE_THREAD to wake the thread
new->handler = irq_default_primary_handler;
return 0;
}
static irqreturn_t irq_default_primary_handler(int irq, void *dev_id)
{
return IRQ_WAKE_THREAD;
}
static int
setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
{
struct task_struct *t;
//#define MAX_USER_RT_PRIO 100
struct sched_param param = {
.sched_priority = MAX_USER_RT_PRIO/2,
};
//create the interrupt handler thread; the function it runs when woken is irq_thread()
if (!secondary) {
t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
new->name);
} else {
t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq,
new->name);
param.sched_priority -= 1;
}
if (IS_ERR(t))
return PTR_ERR(t);
sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
/*
* We keep the reference to the task struct even if
* the thread dies to avoid that the interrupt code
* references an already freed task_struct.
*/
get_task_struct(t);
new->thread = t;
/*
* Tell the thread to set its affinity. This is
* important for shared interrupt handlers as we do
* not invoke setup_affinity() for the secondary
* handlers as everything is already set up. Even for
* interrupts marked with IRQF_NO_BALANCE this is
* correct as we want the thread to move to the cpu(s)
* on which the requesting code placed the interrupt.
*/
set_bit(IRQTF_AFFINITY, &new->thread_flags);
return 0;
}
4.3.2 Threaded interrupt handling
As mentioned in section 4.1 on PPI and SPI handling, in the interrupt handler __handle_irq_event_percpu() walks the descriptor's list of interrupt handler descriptors and runs each handler. If a handler returns IRQ_WAKE_THREAD, the interrupt is threaded, and __irq_wake_thread() wakes the interrupt handler thread.
void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
{
/*
* In case the thread crashed and was killed we just pretend that
* we handled the interrupt. The hardirq handler has disabled the
* device interrupt, so no irq storm is lurking.
*/
if (action->thread->flags & PF_EXITING)
return;
/*
* Wake up the handler thread for this action. If the
* RUNTHREAD bit is already set, nothing to do.
*/
if (test_and_set_bit(IRQTF_RUNTHREAD, &action->thread_flags))
return;
/*
* It's safe to OR the mask lockless here. We have only two
* places which write to threads_oneshot: This code and the
* irq thread.
*
* This code is the hard irq context and can never run on two
* cpus in parallel. If it ever does we have more serious
* problems than this bitmask.
*
* The irq threads of this irq which clear their "running" bit
* in threads_oneshot are serialized via desc->lock against
* each other and they are serialized against this code by
* IRQS_INPROGRESS.
*
* Hard irq handler:
*
* spin_lock(desc->lock);
* desc->state |= IRQS_INPROGRESS;
* spin_unlock(desc->lock);
* set_bit(IRQTF_RUNTHREAD, &action->thread_flags);
* desc->threads_oneshot |= mask;
* spin_lock(desc->lock);
* desc->state &= ~IRQS_INPROGRESS;
* spin_unlock(desc->lock);
*
* irq thread:
*
* again:
* spin_lock(desc->lock);
* if (desc->state & IRQS_INPROGRESS) {
* spin_unlock(desc->lock);
* while(desc->state & IRQS_INPROGRESS)
* cpu_relax();
* goto again;
* }
* if (!test_bit(IRQTF_RUNTHREAD, &action->thread_flags))
* desc->threads_oneshot &= ~mask;
* spin_unlock(desc->lock);
*
* So either the thread waits for us to clear IRQS_INPROGRESS
* or we are waiting in the flow handler for desc->lock to be
* released before we reach this point. The thread also checks
* IRQTF_RUNTHREAD under desc->lock. If set it leaves
* threads_oneshot untouched and runs the thread another time.
*/
desc->threads_oneshot |= action->thread_mask;
/*
* We increment the threads_active counter in case we wake up
* the irq thread. The irq thread decrements the counter when
* it returns from the handler or in the exit path and wakes
* up waiters which are stuck in synchronize_irq() when the
* active count becomes zero. synchronize_irq() is serialized
* against this code (hard irq handler) via IRQS_INPROGRESS
* like the finalize_oneshot() code. See comment above.
*/
atomic_inc(&desc->threads_active);
wake_up_process(action->thread);
}
wake_up_process(action->thread)
wakes the interrupt handling thread. The thread's main function is irq_thread(), which calls irq_thread_fn(); irq_thread_fn() in turn calls the registered thread handler.
static int irq_thread(void *data)
{
struct callback_head on_exit_work;
struct irqaction *action = data;
struct irq_desc *desc = irq_to_desc(action->irq);
irqreturn_t (*handler_fn)(struct irq_desc *desc,
struct irqaction *action);
//If both CONFIG_IRQ_FORCED_THREADING and CONFIG_PREEMPT_RT are enabled, force_irqthreads is true
//With CONFIG_IRQ_FORCED_THREADING but without CONFIG_PREEMPT_RT, force_irqthreads is true only if the cmdline contains threadirqs
//When force_irqthreads is enabled, irq_setup_forced_threading() sets IRQTF_FORCED_THREAD in action->thread_flags
//This branch serves forced threading (e.g. with the RT patch): a pending softirq runs in the interrupt thread (priority 50)
if (force_irqthreads && test_bit(IRQTF_FORCED_THREAD,
&action->thread_flags))
handler_fn = irq_forced_thread_fn;
else
//Without forced threading, the thread just runs the thread function; a pending softirq runs in ksoftirqd (priority 120)
handler_fn = irq_thread_fn;
init_task_work(&on_exit_work, irq_thread_dtor);
task_work_add(current, &on_exit_work, false);
irq_thread_check_affinity(desc, action);
//irq_wait_for_interrupt() tests the IRQTF_RUNTHREAD flag: after action->handler() runs,
//the flag is set whenever the thread needs waking, so the while loop can proceed.
//The flag is cleared right away, and the next iteration tests whether it was set again.
while (!irq_wait_for_interrupt(action)) {
irqreturn_t action_ret;
irq_thread_check_affinity(desc, action);
//Run handler_fn, selected above as either irq_forced_thread_fn or irq_thread_fn
action_ret = handler_fn(desc, action);
if (action_ret == IRQ_HANDLED)
atomic_inc(&desc->threads_handled);
if (action_ret == IRQ_WAKE_THREAD)
irq_wake_secondary(desc, action);
wake_threads_waitq(desc);
}
/*
* This is the regular exit path. __free_irq() is stopping the
* thread via kthread_stop() after calling
* synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
* oneshot mask bit can be set. We cannot verify that as we
* cannot touch the oneshot mask at this point anymore as
* __setup_irq() might have given out currents thread_mask
* again.
*/
task_work_cancel(current, irq_thread_dtor);
return 0;
}
When a softirq is pending, irq_forced_thread_fn() and irq_thread_fn() differ as follows:
static irqreturn_t
irq_forced_thread_fn(struct irq_desc *desc, struct irqaction *action)
{
irqreturn_t ret;
local_bh_disable();
//run the threaded handler
ret = action->thread_fn(action->irq, action->dev_id);
if (ret == IRQ_HANDLED)
atomic_inc(&desc->threads_handled);
irq_finalize_oneshot(desc, action);
//local_bh_enable ----> __local_bh_enable_ip ----> do_softirq ---->
//__do_softirq ----> h->action(h); softirq callbacks run here, in interrupt-thread context
local_bh_enable();
return ret;
}
static irqreturn_t irq_thread_fn(struct irq_desc *desc,
struct irqaction *action)
{
irqreturn_t ret;
//run the threaded handler; pending softirq callbacks will run in the ksoftirqd thread
ret = action->thread_fn(action->irq, action->dev_id);
irq_finalize_oneshot(desc, action);
return ret;
}
4.3.3 Interrupt thread CPU affinity
Threaded (or force-threaded) non-nested interrupts get the IRQTF_AFFINITY flag set in __setup_irq(). When the interrupt thread runs irq_thread(), irq_thread_check_affinity() checks whether the thread's affinity needs to change, and finally calls set_cpus_allowed_ptr(current, mask) to set the CPU affinity of the current task (the irq thread) and migrate it to a suitable CPU.
On Linux 4.12 the interrupt thread is not pinned by default (the affinity mask defaults to all CPUs); if the interrupt is pinned to specific cores, the interrupt thread is pinned along with it.
/*
* Check whether we need to change the affinity of the interrupt thread.
*/
static void
irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action)
{
cpumask_var_t mask;
bool valid = true;
//Threaded (or force-threaded) non-nested interrupts get IRQTF_AFFINITY set in __setup_irq()
if (!test_and_clear_bit(IRQTF_AFFINITY, &action->thread_flags))
return;
/*
* In case we are out of memory we set IRQTF_AFFINITY again and
* try again next time
*/
if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
set_bit(IRQTF_AFFINITY, &action->thread_flags);
return;
}
raw_spin_lock_irq(&desc->lock);
/*
* This code is triggered unconditionally. Check the affinity
* mask pointer. For CPU_MASK_OFFSTACK=n this is optimized out.
*/
if (cpumask_available(desc->irq_common_data.affinity))
//Copy desc->irq_common_data.affinity into mask
//By default the affinity mask is f (all CPUs), i.e. the thread is not pinned
//If the interrupt is pinned (echo xx > /proc/irq/xx/smp_affinity), the interrupt thread is pinned to the same cores
cpumask_copy(mask, desc->irq_common_data.affinity);
else
valid = false;
raw_spin_unlock_irq(&desc->lock);
if (valid)
//set_cpus_allowed_ptr() sets the CPU affinity of the current task (the irq thread)
set_cpus_allowed_ptr(current, mask);
free_cpumask_var(mask);
}
On Linux 5.4 the interrupt thread is bound to CPU0 by default; if the interrupt is pinned to specific cores, the interrupt thread is pinned along with it. The same function now uses the effective affinity mask:
/*
* Check whether we need to change the affinity of the interrupt thread.
*/
static void
irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action)
{
cpumask_var_t mask;
bool valid = true;
//Threaded (or force-threaded) non-nested interrupts get IRQTF_AFFINITY set in __setup_irq()
if (!test_and_clear_bit(IRQTF_AFFINITY, &action->thread_flags))
return;
/*
* In case we are out of memory we set IRQTF_AFFINITY again and
* try again next time
*/
if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
set_bit(IRQTF_AFFINITY, &action->thread_flags);
return;
}
raw_spin_lock_irq(&desc->lock);
/*
* This code is triggered unconditionally. Check the affinity
* mask pointer. For CPU_MASK_OFFSTACK=n this is optimized out.
*/
if (cpumask_available(desc->irq_common_data.affinity)) {
const struct cpumask *m;
//m points to d->common->effective_affinity
m = irq_data_get_effective_affinity_mask(&desc->irq_data);
//copy m into mask
cpumask_copy(mask, m);
} else {
valid = false;
}
raw_spin_unlock_irq(&desc->lock);
if (valid)
//set_cpus_allowed_ptr() sets the CPU affinity of the current task (the irq thread)
set_cpus_allowed_ptr(current, mask);
free_cpumask_var(mask);
}
//On Linux 5.4 the default effective_affinity selects CPU0, so the interrupt thread runs on CPU0 by default (check with cat /proc/irq/xx/effective_affinity)
//If the interrupt is pinned (echo xx > /proc/irq/xx/smp_affinity), the interrupt thread is pinned to the same cores
static inline
struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d)
{
return d->common->effective_affinity;
}
4.4 Disabling/enabling interrupts
4.4.1 Disabling/enabling interrupts
Software can disable interrupts so that the processor does not respond to any interrupt request; non-maskable interrupts are the exception.
The interfaces for disabling interrupts are:
- local_irq_disable()
- local_irq_save(flags): first saves the interrupt state into the parameter flags, then disables interrupts.
These two interfaces only disable interrupts on the local processor; they cannot disable interrupts on other processors. Once interrupts are disabled, the processor does not respond to interrupt requests.
The interfaces for enabling interrupts are:
- local_irq_enable()
- local_irq_restore(flags): restores the local processor's interrupt state from flags.
local_irq_disable()/local_irq_enable() must not be nested; local_irq_save(flags)/local_irq_restore(flags) may be nested.
static inline void arch_local_irq_disable(void)
{
asm volatile(
"msr daifset, #2 // arch_local_irq_disable"
:
:
: "memory");
}
This sets the interrupt mask bit in the processor state (the I bit of the DAIF flags, hence daifset #2) to 1; from then on the processor does not respond to interrupt requests.
static inline void arch_local_irq_enable(void)
{
asm volatile(
"msr daifclr, #2 // arch_local_irq_enable"
:
:
: "memory");
}
This clears the interrupt mask bit to 0, so the processor responds to interrupt requests again.
4.4.2 Disabling/enabling a single interrupt
Software can disable the interrupt of a single peripheral device; the interrupt controller will then not forward that device's interrupts to any processor.
The function that disables a single interrupt is:
/**
* disable_irq - disable an irq and wait for completion
* @irq: Interrupt to disable
*
* Disable the selected interrupt line. Enables and Disables are
* nested.
* This function waits for any pending IRQ handlers for this interrupt
* to complete before returning. If you use this function while
* holding a resource the IRQ handler may need you will deadlock.
*
* This function may be called - with care - from IRQ context.
*/
void disable_irq(unsigned int irq)
{
if (!__disable_irq_nosync(irq))
synchronize_irq(irq);
}
/**
* enable_irq - enable handling of an irq
* @irq: Interrupt to enable
*
* Undoes the effect of one call to disable_irq(). If this
* matches the last disable, processing of interrupts on this
* IRQ line is re-enabled.
*
* This function may be called from IRQ context only when
* desc->irq_data.chip->bus_lock and desc->chip->bus_sync_unlock are NULL !
*/
void enable_irq(unsigned int irq)
{
unsigned long flags;
struct irq_desc *desc = irq_get_desc_buslock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
if (!desc)
return;
if (WARN(!desc->irq_data.chip,
KERN_ERR "enable_irq before setup/request_irq: irq %u\n", irq))
goto out;
__enable_irq(desc);
out:
irq_put_desc_busunlock(desc, flags);
}
To enable hardware interrupt n, set the distributor register GICD_ISENABLERn (Interrupt Set-Enable Register); to disable hardware interrupt n, set the distributor register GICD_ICENABLERn (Interrupt Clear-Enable Register).
4.5 Interrupt affinity
On a multiprocessor system, the administrator can set interrupt affinity to control which processors the interrupt controller may forward a given interrupt to. There are two ways to configure it:
- write a bitmask to /proc/irq/IRQ#/smp_affinity
- write a processor list to /proc/irq/IRQ#/smp_affinity_list
For example, to forward Linux interrupt 32 to processors 0-3, either of the following works:
- echo 0f > /proc/irq/32/smp_affinity
- echo 0-3 > /proc/irq/32/smp_affinity_list
Afterwards, run cat /proc/interrupts | grep 'CPU\|32:' repeatedly
to check whether processors 0-3 are receiving Linux interrupt 32.
The kernel also provides a function for setting interrupt affinity:
/**
* irq_set_affinity - Set the irq affinity of a given irq
* @irq: Interrupt to set affinity
* @cpumask: cpumask
*
* Fails if cpumask does not contain an online CPU
*/
static inline int
irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
{
return __irq_set_affinity(irq, cpumask, false);
}
//cpumask can be obtained with cpumask_of(), e.g.:
int cpu = smp_processor_id();
cpumask_of(cpu);
For the ARM64 GIC controller, the distributor register GICD_ITARGETSRn (Interrupt Targets Register) controls which processors hardware interrupt n may be forwarded to; hardware interrupt n must be a shared peripheral interrupt (SPI).
5. References
《Linux内核深度解析》 (Linux Kernel Analysis in Depth), Yu Huabing