本周工作概述

与QEMU maintainer进行交流
阅读RISC-V PMP机制
查看QEMU对于PMP的实现
查看enhanced PMP support相关内容

QEMU代码贡献方向

在上周的报告中我提到可以尝试增加SiFive board的支持。但是和maintainer的邮件通信让我有了新的想法。

上周查看RISC-V patch和仓库contributor list的时候，我在想可否联系其中的一个maintainer并且询问一些建议。没想到过真收到了回信。可惜回信时间在北京时间凌晨1:30，我早上醒来把邮件忽略了，直到五天后才看到(T_T)。

对方在邮件中提出了非常有效的建议，给予我很大启发。以下是对方的邮件原文：

That depends what you feel like doing and how much time you are willing
to spend.

A great area would be to add an instruction extension to QEMU, such as
the bit manipulation extension. Depending on the extension that could
be a big chunk of work though.

Another cool option would be adding enhanced PMP support, which
wouldn't be too hard and would be very useful.

If that is too much then maybe just add extra device support. I am
currently adding OpenTitan support (Ibex CPU) so if you want to you
could add a Ibex CPU peripheral, such as the GPIO, SPI or other device.

The same would apply to the SiFive E and SiFIve U boards, which are
also missing device emulation. You can check the C code as most of the
boards have unimplemented devices added where there are missing ones.

Otherwise you could just try to run things on QEMU and see if you can
find bugs in the RISC-V implementation. We have lots of people running
Linux, but running Zephyr or something like that might uncover bugs.
This might not lead very useful results though.

Another idea would be to work on automated testing (I think QEMU is
migrating to GitLab pipelines). I don't know much about this though.

Finally you could add migration support to the RISC-V machines. This
would allow a user to migrate from one machine to another. This
shouldn't be too hard but would require some testing.

Let me know if any of those ideas jump out and I am happy to keep
discussing it.

考虑到时间成本和难度系数，我顺着他的建议开始研究PMP机制和相关问题。研究的进展在本文后面。之后发现目前全部官方文档里的PMP功能都已经在QEMU中实现了。于是询问了一下对方enhanced PMP相关内容，得知这是RISC-V正在商讨的未成熟版本，具体的提议参见：docs.google.com/document/d/…

经过前面两次方向的变革，希望最终顺着这个方向我们可以研究并贡献下去。

RISC-V PMP物理内存保护机制详解

本文档中内容遵循规范 The RISC-V Instruction Set ManualVolume II: Privileged ArchitectureDocument Version 1.12-draft

What is PMP

我们知道，RISC-V提供了三种权限模式：

其中，M（machine mode）可以访问全部的地址。为了禁止不可信的代码执行特权指令，引入了U（User mode）。为了限制不可信的代码使其只能访问自己的那部分内存，处理器可以提供一个物理内存保护（PMP，Physical Memory Protection）功能，以提供在各种模式下的内存保护。

PMP CSRs

RISC-V通过设置两类寄存器来实现PMP：

配置寄存器，8位
地址寄存器，对于RV32是32位，对于RV64是64位，统一记作MXLEN位

一个配置寄存器和一个地址寄存器组成一个PMP入口（PMP entry）。配置寄存器和地址寄存器均属于CSR（Control and Status Register）。

地址寄存器通常为8-16个，8位的控制寄存器共16个。但显然，无论是对于RV32还是RV64，都不会存在一个只有8位（一个字节大小）的寄存器。故实际的实现上，是把几个控制寄存器组合到一个CSR中，如下图（图片来自官方手册）：

对于RV32，将4个8bit配置寄存器放到一个32bit CSR寄存器中，对于RV64，每个真实寄存器中则保存了8个配置寄存器的信息。这些真实寄存器的名字即pmpcf0~pmpcf3

值得注意的是，对于RV64来说，pmp8cfg~pmp15cfg是保存在pmpcfg2中，而不是按顺序保存在pmpcfg1中。这样做是为了使得在RV32和RV64两种情况下，pmp8cfg~pmp11cfg均保存在pmpcfg2中，这样可以减少对于64位支持的开销。

在RISC-V设计的Sv32分页虚拟内存模式下，RV32拥有34位物理地址空间，故对于RV32来说，PMP必须支持34位的物理内存访问管理。故在32位地址寄存器中，保存33~2位的地址数据。而对于RV64，保存第55~2位，如下图：

也就是说，对于一个34位地址，若要保存到PMP地址寄存器中，需要将数据左移两位后存储。对于56位地址，将55~2位保存在PMP地址寄存器的53~0位。

下面来看配置寄存器：

其中，R、W、X分别对应读、写、执行权限，为1时有该权限，0时无权限。对于R=0 且 W=1的情况不符合实际含义，故作为保留，以便未来某些情况下使用。

当一条指令试图execute, load, store但遭到拒绝时，分别触发instruction access-fault exception, load access-fault exception, store access-fault exception。

Address Matching

之前说明了，一个PMP entry由一个地址寄存器和一个配置寄存器组成。那么，如何知道该PMP entry控制的物理地址范围呢？这是由配置寄存器中的A字段和地址寄存器共同决定的。

这部分主要解释配置寄存器中的A field的作用。A字段取值如下：

当A=0时，该PMP entry处于未启用状态，不匹配任何地址。

当A不等于零的时候，又分为三种情况，TOR, NA4, NAPOT（如上图中所示）。其中NA4可以看作时NAPOT的一种特殊情况。所以我们先看一下NAPOT模式下一个PMP entry的地址寄存器所控制的地址范围是多少。

在上图中，当pmpcfg.A为NAPOT时，从pmpaddr的低位开始寻找连续1的个数。

若pmpaddr值为yyyy...yyy0，即连续1的个数为0，则该PMP entry所控制的地址空间为从yyyy...yyy0开始的8个字节
若pmpaddr值为yyyy...yy01，即连续1的个数为1，则该PMP entry所控制的地址空间为从yyyy...yy00开始的16个字节
......
若pmpaddr值为y...y01...1，设连续1的个数为n，则该PMP entry所控制的地址空间为从y...y00...0开始的 $2^{n+3}$ 个字节

这种控制地址范围的方式叫做自然对其2指数地址范围（Naturally Aligned Power-of-2 regions, NAPOT）

考虑一种边界情况，若pmpaddr值为yyyy...yyyy，此时控制的地址范围即是从yyyy...yyyy开始的4个字节，而pmpcfg.A的值为NA4，即Naturally Aligned Four-byte regions.

另一种A字段的取值的TOR。当某个PMP entry 的配置寄存器的A字段设置位TOR时，该PMP entry所控制的地址范围由前一个PMP entry的地址寄存器（值为 $pmpaddr_{i-1}$ ）和该PMP entry的地址寄存器（值为 $pmpaddr_i$ ）共同决定。其匹配任意满足如下条件的地址 $y$ ：

特别的，若第0个PMP entry的A字段为TOR，其所控制的地址空间的下界被认为是0，即匹配所有满足如下条件的地址 $y$ ：

Locking and Privilege Mode

这部分介绍配置寄存器中L field的作用。L字段表示PMP entry处于锁定状态，此时对于配置寄存器和对应的地址寄存器的写入会被忽略。被锁定的PMP entries在hart(hardware thread)重置之前都将保持锁定。

我们知道，通常情况下M模式拥有对于所有地址的所有权限。但当L字段为1时，M、S、U模式都必须遵循配置寄存器的权限设置（是否读、写、执行权限）。而当L字段为0时，在M模式下匹配到此PMP entry的任何操作都将成功，而S和U模式下需要遵循配置寄存器中的权限设置。

Priotity and Matching Logic

如果看明白上述内容，这部分就比较容易理解了。直接把手册上的原文复制过来了，有空再翻译。

RISC-V PMP 在 QEMU 中的实现

关于RISC-V PMP机制，详见：juejin.cn/post/684490…

综述

QEMU中关于RISC-V PMP的部分在target\riscv\pmp.c中。

在这部分，目前设置了如下public interface:

bool pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
    target_ulong size, pmp_priv_t privs, target_ulong mode)
    
void pmpcfg_csr_write(CPURISCVState *env, uint32_t reg_index,
    target_ulong val)

target_ulong pmpcfg_csr_read(CPURISCVState *env, uint32_t reg_index)

void pmpaddr_csr_write(CPURISCVState *env, uint32_t addr_index,
    target_ulong val)

target_ulong pmpaddr_csr_read(CPURISCVState *env, uint32_t addr_index)

第一个api用来判断对于某个物理地址的操作（Read, Write, Execute）是否有权限。之后的api分别用来对PMP entry的地址寄存器和配置寄存器进行修改。在target\riscv\pmp.c中，注释提到，PMP功能目前还没有完整经过测试:

PMP (Physical Memory Protection) is as-of-yet unused and needs testing.

目前上述api中，只有第一个pmp_hart_has_privs由外部调用过（因为PMP功能还没有公开给用户使用），故我们从这个api进入源码进行分析。

`bool pmp_hart_has_privs()`

先看完整代码：

bool pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
    target_ulong size, pmp_priv_t privs, target_ulong mode)
{
    int i = 0;
    int ret = -1;
    int pmp_size = 0;
    target_ulong s = 0;
    target_ulong e = 0;
    pmp_priv_t allowed_privs = 0;

    /* Short cut if no rules */
    if (0 == pmp_get_num_rules(env)) {
        return true;
    }

    /*
     * if size is unknown (0), assume that all bytes
     * from addr to the end of the page will be accessed.
     */
    if (size == 0) {
        pmp_size = -(addr | TARGET_PAGE_MASK);
    } else {
        pmp_size = size;
    }

    /* 1.10 draft priv spec states there is an implicit order
         from low to high */
    for (i = 0; i < MAX_RISCV_PMPS; i++) {
        s = pmp_is_in_range(env, i, addr);
        e = pmp_is_in_range(env, i, addr + pmp_size - 1);

        /* partially inside */
        if ((s + e) == 1) {
            qemu_log_mask(LOG_GUEST_ERROR,
                          "pmp violation - access is partially inside\n");
            ret = 0;
            break;
        }

        /* fully inside */
        const uint8_t a_field =
            pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);

        /*
         * If the PMP entry is not off and the address is in range, do the priv
         * check
         */
        if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
            allowed_privs = PMP_READ | PMP_WRITE | PMP_EXEC;
            if ((mode != PRV_M) || pmp_is_locked(env, i)) {
                allowed_privs &= env->pmp_state.pmp[i].cfg_reg;
            }

            if ((privs & allowed_privs) == privs) {
                ret = 1;
                break;
            } else {
                ret = 0;
                break;
            }
        }
    }

    /* No rule matched */
    if (ret == -1) {
        if (mode == PRV_M) {
            ret = 1; /* Privileged spec v1.10 states if no PMP entry matches an
                      * M-Mode access, the access succeeds */
        } else {
            ret = 0; /* Other modes are not allowed to succeed if they don't
                      * match a rule, but there are rules.  We've checked for
                      * no rule earlier in this function. */
        }
    }

    return ret == 1 ? true : false;
}

函数参数

env：保存RISC-V CPU全部信息
addr：hart要访问的基地址
size：从基地址开始，hart要访问多大的地址空间
privs：hart想要使用的权限
mode：cpu所处的特权模式（U/S/M）

函数过程

#3## 获取PMP规则数目若没有设置PMP规则，即PMP未启用，那所有的访问都是允许的，直接返回True

下面我们看一下如何获取PMP规则数目：

static inline uint32_t pmp_get_num_rules(CPURISCVState *env)
{
     return env->pmp_state.num_rules;
}

直接返回pmp_state.num_rules成员，那么一定有什么地方实时更新这个成员变量。我们在接下来其它函数中会看到。

处理未声明size大小的情况

目前的git版本中这部分内容如下：

/*
 * if size is unknown (0), assume that all bytes
 * from addr to the end of the page will be accessed.
 */
if (size == 0) {
    pmp_size = -(addr | TARGET_PAGE_MASK);
} else {
    pmp_size = size;
}

即，若是没有声明的话，默认需要访问整个页的大小。但是，在没有启用虚拟内存分页机制的情况下，这样的地址空间显然太大了。于是，在还没有被同步到主仓库的patch中，我们看到了Alistair将其做了以下修改：

Signed-off-by: Alistair Francis <address@hidden>
---
 target/riscv/pmp.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 0e6b640fbd..5aba4d13ea 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -233,12 +233,21 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
         return true;
     }
 
-    /*
-     * if size is unknown (0), assume that all bytes
-     * from addr to the end of the page will be accessed.
-     */
     if (size == 0) {
-        pmp_size = -(addr | TARGET_PAGE_MASK);
+        if (!riscv_feature(env, RISCV_FEATURE_MMU)) {
+            /*
+             * if size is unknown (0), assume that all bytes
+             * from addr to the end of the page will be accessed.
+             */
+            pmp_size = -(addr | TARGET_PAGE_MASK);
+        } else {
+            /*
+             * If size is unknown (0) and we don't have an MMU,
+             * just guess the size as the xlen as we don't want to
+             * access an entire page worth.
+             */
+            pmp_size = sizeof(target_ulong);
+        }
     } else {
         pmp_size = size;
     }
--

也就是说，当没有启用分页的时候，猜一个访问空间大小，将其设置为sizeof(target_ulong)，对于32位即4个字节，正好是PMP支持的最小颗粒度。

判断地址空间

接下来要将hart希望访问的地址空间与每一个PMP entry管理的地址空间相比较。即如下这个for循环：

for (i = 0; i < MAX_RISCV_PMPS; i++) {
    s = pmp_is_in_range(env, i, addr);
    e = pmp_is_in_range(env, i, addr + pmp_size - 1);

    /* partially inside */
    if ((s + e) == 1) {
        qemu_log_mask(LOG_GUEST_ERROR,
                      "pmp violation - access is partially inside\n");
        ret = 0;
        break;
    }

    /* fully inside */
    const uint8_t a_field =
        pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);

    /*
     * If the PMP entry is not off and the address is in range, do the priv
     * check
     */
    if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
        allowed_privs = PMP_READ | PMP_WRITE | PMP_EXEC;
        if ((mode != PRV_M) || pmp_is_locked(env, i)) {
            allowed_privs &= env->pmp_state.pmp[i].cfg_reg;
        }

        if ((privs & allowed_privs) == privs) {
            ret = 1;
            break;
        } else {
            ret = 0;
            break;
        }
    }
}

循环遍历所有的PMP entry。对于任一PMP entry，分别判断hart希望访问的地址空间的首尾是否在其管理的物理地址范围之内，若是，记为1，若否，记为0。这部分工作由pmp_is_in_range函数完成。

`static int pmp_is_in_range()`

函数内容如下：

static int pmp_is_in_range(CPURISCVState *env, int pmp_index, target_ulong addr)
{
    int result = 0;

    if ((addr >= env->pmp_state.addr[pmp_index].sa)
        && (addr <= env->pmp_state.addr[pmp_index].ea)) {
        result = 1;
    } else {
        result = 0;
    }

    return result;
}

这里，只需要判断一个给定的地址是否属于某个PMP entry的管理范围。而在QEMU中对PMP entry做了优化，在设置规则的时候就对一个PMP entry负责的地址范围做了记录，记在sa与ea中（后面会看到实现方式），而不需要每次都通过PMP配置寄存器的A字段和对应的地址寄存器来判断。

若hart要访问的空间只有部分在某个PMP entry的管理范围，而不是全部嵌入或完全不相交，则结束循环，提示错误信息：

/* partially inside */
if ((s + e) == 1) {
    qemu_log_mask(LOG_GUEST_ERROR,
                  "pmp violation - access is partially inside\n");
    ret = 0;
    break;
}

若不在这个PMP entry的管理范围，则继续循环。

若在当前PMP entry的地址管理范围，则获取当前PMP entry的配置寄存器的A字段：

const uint8_t a_field =
    pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);

pmp_get_a_field函数实现也很简单：

static inline uint8_t pmp_get_a_field(uint8_t cfg)
{
    uint8_t a = cfg >> 3;
    return a & 0x3;
}

直接写成了内联函数。

之后就进入了匹配PMP entry情况下的核心部分了：

/*
 * If the PMP entry is not off and the address is in range, do the priv
 * check
 */
if (((s + e) == 2) && (PMP_AMATCH_OFF != a_field)) {
    allowed_privs = PMP_READ | PMP_WRITE | PMP_EXEC;
    if ((mode != PRV_M) || pmp_is_locked(env, i)) {
        allowed_privs &= env->pmp_state.pmp[i].cfg_reg;
    }

    if ((privs & allowed_privs) == privs) {
        ret = 1;
        break;
    } else {
        ret = 0;
        break;
    }
}

判断条件是PMP entry匹配且当前PMP entry的配置寄存器的A字段不为OFF。可以看一下QEMU对于 A字段的枚举定义：

typedef enum {
    PMP_AMATCH_OFF,  /* Null (off)                            */
    PMP_AMATCH_TOR,  /* Top of Range                          */
    PMP_AMATCH_NA4,  /* Naturally aligned four-byte region    */
    PMP_AMATCH_NAPOT /* Naturally aligned power-of-two region */
} pmp_am_t;

这与PMP机制中对于A的取值是对应的。

在手册中提到过，当CPU处于M模式的时候，除非PMP的L位为1，否则允许所有操作。故允许所有操作的条件是CPU处于M模式且PMP的L位为0。对其取非即为需要判断权限的情况，即CPU不是M模式，或L位为1，即之后的判断条件：(mode != PRV_M) || pmp_is_locked(env, i)

之后导出PMP配置寄存器中的权限情况，和hart想要使用的权限进行比对即可。

有意思的是这里代码的实现方式，即将PMP配置寄存器声明了一个8位的空间：

uint8_t  cfg_reg;

和实际中PMP配置寄存器的大小相同。而将权限枚举如下：

typedef enum {
    PMP_READ  = 1 << 0,
    PMP_WRITE = 1 << 1,
    PMP_EXEC  = 1 << 2,
    PMP_LOCK  = 1 << 7
} pmp_priv_t;

那么之后只需要按位比较和操作allowed_privs即可。这种方式让我想起了linux的chmod对于权限的说明方式。

没有匹配的规则

若遍历了所有的PMP entry但是没有匹配的规则，需要判断当前是否是M模式，若是M模式，那么直接允许访问，若不是，则根据官方文档，拒绝访问（因为之前已经判断过是否启用了PMP了）

/* No rule matched */
if (ret == -1) {
    if (mode == PRV_M) {
        ret = 1; /* Privileged spec v1.10 states if no PMP entry matches an
                  * M-Mode access, the access succeeds */
    } else {
        ret = 0; /* Other modes are not allowed to succeed if they don't
                  * match a rule, but there are rules.  We've checked for
                  * no rule earlier in this function. */
    }
}

`static void pmp_update_rule()`

另一个重要的函数是对于规则的更新。在写配置寄存器的时候会调用这个函数。函数如下：

static void pmp_update_rule(CPURISCVState *env, uint32_t pmp_index)
{
    int i;

    env->pmp_state.num_rules = 0;

    uint8_t this_cfg = env->pmp_state.pmp[pmp_index].cfg_reg;
    target_ulong this_addr = env->pmp_state.pmp[pmp_index].addr_reg;
    target_ulong prev_addr = 0u;
    target_ulong sa = 0u;
    target_ulong ea = 0u;

    if (pmp_index >= 1u) {
        prev_addr = env->pmp_state.pmp[pmp_index - 1].addr_reg;
    }

    switch (pmp_get_a_field(this_cfg)) {
    case PMP_AMATCH_OFF:
        sa = 0u;
        ea = -1;
        break;

    case PMP_AMATCH_TOR:
        sa = prev_addr << 2; /* shift up from [xx:0] to [xx+2:2] */
        ea = (this_addr << 2) - 1u;
        break;

    case PMP_AMATCH_NA4:
        sa = this_addr << 2; /* shift up from [xx:0] to [xx+2:2] */
        ea = (this_addr + 4u) - 1u;
        break;

    case PMP_AMATCH_NAPOT:
        pmp_decode_napot(this_addr, &sa, &ea);
        break;

    default:
        sa = 0u;
        ea = 0u;
        break;
    }

    env->pmp_state.addr[pmp_index].sa = sa;
    env->pmp_state.addr[pmp_index].ea = ea;

    for (i = 0; i < MAX_RISCV_PMPS; i++) {
        const uint8_t a_field =
            pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg);
        if (PMP_AMATCH_OFF != a_field) {
            env->pmp_state.num_rules++;
        }
    }
}

这部分完成的功能是将一个PMP entry控制的地址空间计算出来并且将其首尾地址放入QEMU设计的sa和ea中。这样的设计可以减少判断管理地址空间时的开销，相当于一种用空间换时间的优化方式。

值得注意的是这里对于NAPOT模式下地址空间的解析函数：pmp_decode_napot，代码如下:

static void pmp_decode_napot(target_ulong a, target_ulong *sa, target_ulong *ea)
{
    /*
       aaaa...aaa0   8-byte NAPOT range
       aaaa...aa01   16-byte NAPOT range
       aaaa...a011   32-byte NAPOT range
       ...
       aa01...1111   2^XLEN-byte NAPOT range
       a011...1111   2^(XLEN+1)-byte NAPOT range
       0111...1111   2^(XLEN+2)-byte NAPOT range
       1111...1111   Reserved
    */
    if (a == -1) {
        *sa = 0u;
        *ea = -1;
        return;
    } else {
        target_ulong t1 = ctz64(~a);
        target_ulong base = (a & ~(((target_ulong)1 << t1) - 1)) << 2;
        target_ulong range = ((target_ulong)1 << (t1 + 3)) - 1;
        *sa = base;
        *ea = base + range;
    }
}

关键的是else中的部分。首先将a（PMP地址寄存器中的地址）取反，判断末尾连续0的个数，也就是a末尾连续1的个数t1。

之后将1左移t1位，减一获得一个末尾为t1个连续1的中间结果，取非，和a做与，再左移两位。这部分实际上就取出了基地址。

之后计算管理的地址空间大小，从而算出开始地址和结束地址。

这段代码可以说是非常精妙且容易理解。事实上，关于PMP配置寄存器中NAPOT模式下的地址空间计算方法，我也是先看了QEMU的代码再重读官方文档才看明白的。

Enhanced PMP

这是我们下一步要研究的主要方向。

提议链接如下：lists.riscv.org/g/tech-tee/…

目前已有很多讨论，我们接下来的几周将了解这种机制并且尝试在QEMU中实现它。

QEMU开源实战（九）