MIT 6.828 Lab 3: User-Mode Environments
Like a Unix process, a JOS environment couples the concepts of "thread" and "address space". The thread is defined primarily by the saved registers (the env_tf field), and the address space is defined by the page directory and page tables pointed to by env_pgdir. To run an environment, the kernel must set up the CPU with both the saved registers and the appropriate address space.
A JOS environment combines a thread with an address space. To me it resembles task_struct in Linux: it is the structure through which the operating system implements processes.
struct Env {
struct Trapframe env_tf; // Saved registers
struct Env *env_link; // Next free Env
envid_t env_id; // Unique environment identifier
envid_t env_parent_id; // env_id of this env's parent
enum EnvType env_type; // Indicates special system environments
unsigned env_status; // Status of the environment
uint32_t env_runs; // Number of times environment has run
// Address space
pde_t *env_pgdir; // Kernel virtual address of page dir
};
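1. Theory of Computation and Control Flow
Before starting the lab, here is a brief review of the relevant theory of computation and control flow.
1.1 Theory of Computation
What can a computer do, and what can it not do? If a task can be done, how does the computer carry it out? These are questions for the theory of computation. Two models of computation are introduced here:
- Deterministic finite-state machine (FSM), written mathematically as a five-tuple: a set of states, an input alphabet, a transition function, a start state, and a set of accepting states.
Subway turnstiles and the NPCs in early single-player games can both be modeled as FSMs. A turnstile, for example, starts out locked; inserting a coin moves it to the unlocked state, and passing through moves it back to locked.
- Turing machine, a seven-tuple. On top of the FSM it adds, among other things, the head moves { L, R } and a Blank symbol. The difference is that it has a tape to read and write. Today's computers are all Turing complete.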
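The turnstile described above can be written down as a transition function. This is a minimal sketch; the state and event names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical names: the two states and two inputs of a toy turnstile. */
enum gate_state { GATE_LOCKED, GATE_UNLOCKED };
enum gate_event { EV_COIN, EV_PUSH };

/* Transition function: maps (current state, input) to the next state. */
enum gate_state gate_step(enum gate_state s, enum gate_event ev)
{
    assert(s == GATE_LOCKED || s == GATE_UNLOCKED);
    switch (ev) {
    case EV_COIN:
        return GATE_UNLOCKED;   /* paying unlocks (or keeps unlocked) */
    case EV_PUSH:
        return GATE_LOCKED;     /* passing through locks the gate again */
    }
    return s;                   /* unknown input: stay in the same state */
}

/* Run the machine over a whole input sequence, starting from LOCKED. */
enum gate_state gate_run(const enum gate_event *evs, int n)
{
    enum gate_state s = GATE_LOCKED;
    for (int i = 0; i < n; i++)
        s = gate_step(s, evs[i]);
    return s;
}
```

The machine has no memory beyond its current state, which is exactly what separates it from a Turing machine with a tape.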
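1.2 Control Flow
The Exceptional Control Flow chapter of CSAPP describes the instructions a computer executes as a sequence of control transfers, and introduces the notion of exceptional control flow to cover interrupts and exceptions.
Following CSAPP, write the addresses of the instructions executed since power-on as a_0, a_1, ..., a_{n-1}. Each transition from a_k to a_{k+1} is a control transfer, and the sequence of control transfers is called the control flow.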
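1.3 Summary
From the point of view of the theory of computation, the core state K of a Turing-complete computer covers the machine's internal state together with the output on the tape. By saving and restoring K, execution can be replayed from the same point.
Take a cassette player as an example: as long as nobody moves the tape, the next playback starts from the same position.
In JOS, K is recorded in the Trapframe structure. It mainly holds the general-purpose registers, the (CS:IP) pair for control flow, the (SS:SP) pair for the stack, and a few segment registers.
By saving and restoring trapframes, the computer can maintain several Turing-machine states and switch among them, much like one cassette player with several tapes played in turn.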
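The idea of keeping several machine states and switching among them can be sketched with a toy "CPU" whose whole state is two fields. Everything here (toy_frame, toy_run, toy_switch) is hypothetical and much smaller than a real Trapframe.

```c
#include <assert.h>

/* Hypothetical toy machine: its whole state K is a program counter and one register. */
struct toy_frame {
    unsigned pc;    /* which step runs next */
    int acc;        /* accumulator */
};

/* Execute n steps of a fixed toy program: each step adds the step index to acc. */
void toy_run(struct toy_frame *cpu, unsigned n)
{
    assert(cpu != 0);
    while (n-- > 0) {
        cpu->acc += (int)cpu->pc;
        cpu->pc++;
    }
}

/* Switch: save the live state into *from, then load *to into the "CPU". */
void toy_switch(struct toy_frame *cpu, struct toy_frame *from,
                const struct toy_frame *to)
{
    *from = *cpu;
    *cpu = *to;
}
```

After a switch away and back, the first machine resumes at exactly the step where it stopped, which is the property env_run and the trap path rely on.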
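Note also that 6.828 calls interrupts and exceptions protected control transfers. The mechanism is not fundamentally different from an ordinary call on the stack (when there is no privilege-level switch from user to kernel, much of the state stays the same and does not need to be saved). The differences in implementation are:
- More state has to be maintained, which is done by saving and restoring a trapframe (mainly on the kernel stack).
- The transfer is protected by the kernel. To keep interrupts and exceptions from being affected by bugs or malicious code in user environments, JOS relies on two mechanisms, the IDT and the TSS; see Section 3 on control transfer.
On the x86, two mechanisms work together to provide this protection: the Interrupt Descriptor Table and the Task State Segment.
[Figure: protected control transfer]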
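2. User-Mode Environments
If lab 3 hits kernel panic at kern/pmap.c:147: PADDR called with invalid kva xxx, a linker script bug may be the cause; see this fix: qiita.com/kagurazakak…
Exercise 1
Environments are managed the same way as memory in lab 2: the global pointer envs points to an array of Env structures obtained from boot_alloc.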
// Make 'envs' point to an array of size 'NENV' of 'struct Env'.
// LAB 3: Your code here.
envs = (struct Env *)boot_alloc(NENV*sizeof(struct Env));
memset(envs,0,NENV*sizeof(struct Env));
// Map the array at UENVS (user read-only) so user programs can read it
boot_map_region(kern_pgdir,UENVS,NENV*sizeof(struct Env),PADDR(envs),PTE_U|PTE_P);
Exercise 2
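Complete the required functions following the hints. They are called in the order listed below:
- env_init. Pay attention to the order of env_free_list; here a tail pointer keeps the list in array order.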
void
env_init(void)
{
// Set up envs array
// LAB 3: Your code here.
struct Env *tail = NULL;
struct Env *cur_env;
for(int i = 0;i<NENV;i++){
cur_env = &envs[i];
cur_env->env_id = 0;
cur_env->env_status = ENV_FREE;
if(tail==NULL){
tail = cur_env;
// assign the list head itself, not the entry it points to
env_free_list = cur_env;
}else{
tail->env_link = cur_env;
tail = cur_env;
}
}
cprintf("env init finished,header:%08x,tail:%08x\n",env_free_list,tail);
// Per-CPU part of the initialization
env_init_percpu();
}
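A common alternative to the tail pointer is to walk the array backwards and push each entry at the head, which produces the same order. The sketch below uses a hypothetical toy_env type and a tiny table size rather than the real struct Env and NENV.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NENV 4   /* hypothetical small table size; JOS uses NENV */

struct toy_env {
    struct toy_env *link;   /* next free entry, like env_link */
    int id;
};

/* Build the free list so that slot 0 is handed out first.
 * Walking the array backwards and pushing at the head yields
 * the same order as appending at the tail. */
struct toy_env *toy_env_init(struct toy_env *envs, int n)
{
    assert(envs != NULL);
    struct toy_env *free_list = NULL;
    for (int i = n - 1; i >= 0; i--) {
        envs[i].id = 0;
        envs[i].link = free_list;
        free_list = &envs[i];
    }
    return free_list;
}
```

This version also sets the last entry's link to NULL explicitly instead of relying on an earlier memset.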
- env_setup_vm
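Sets up the environment's virtual memory: allocate a page for the page directory, copy the existing kernel mappings, and add the UVPT entry that maps the environment's page directory to its own physical address.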
static int
env_setup_vm(struct Env *e)
{
int i;
struct PageInfo *p = NULL;
// Allocate a page for the page directory
if (!(p = page_alloc(ALLOC_ZERO)))
return -E_NO_MEM;
e->env_pgdir = (pde_t *)page2kva(p); // kernel virtual address of the page; the cast from void * is written out explicitly
memcpy(e->env_pgdir,kern_pgdir,PGSIZE);
p->pp_ref++;
// UVPT maps the env's own page table read-only.
// Permissions: kernel R, user R
e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;
return 0;
}
- region_alloc
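Allocates memory for an environment. The allocation loop can simply compare against the rounded address range, i.e. treat the region to allocate as a run of whole pages.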
static void
region_alloc(struct Env *e, void *va, size_t len)
{
pde_t *env_pgdir = e->env_pgdir;
uintptr_t align_start = ROUNDDOWN((uint32_t)va,PGSIZE);
uintptr_t align_end = ROUNDUP((uint32_t)va+len,PGSIZE);
// Allocate one page at a time
uintptr_t start_addr = align_start;
struct PageInfo *cur_pp;
// align_end is rounded up, so the last partial page is covered too
while(start_addr<align_end){
cur_pp = page_alloc(ALLOC_ZERO);
if(!cur_pp){
panic("Can't Allocate Physical Memory:%08x",start_addr);
}
// Use page_insert rather than writing the PTE directly: a page may already be mapped at this address and has to be replaced
int r = page_insert(env_pgdir,cur_pp,(void *)start_addr,PTE_P|PTE_U|PTE_W);
if(r!=0){
panic("Page Insert Failed:%e",r);
}
start_addr+=PGSIZE;
}
}
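The rounding in region_alloc decides how many pages get allocated. Here is a small sketch of that arithmetic, with hypothetical toy_ helpers standing in for ROUNDDOWN and ROUNDUP, assuming a 4096-byte page and that va + len does not wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PGSIZE 4096u   /* x86 page size, same value as JOS's PGSIZE */

/* Round down / up to a page boundary (the page size is a power of two). */
uint32_t toy_rounddown(uint32_t a) { return a & ~(TOY_PGSIZE - 1); }
uint32_t toy_roundup(uint32_t a)   { return toy_rounddown(a + TOY_PGSIZE - 1); }

/* Number of pages a loop like region_alloc's would allocate for [va, va+len).
 * Assumes va + len does not overflow 32 bits. */
size_t toy_pages_spanned(uint32_t va, uint32_t len)
{
    uint32_t start = toy_rounddown(va);
    uint32_t end = toy_roundup(va + len);
    assert(end >= start);
    return (end - start) / TOY_PGSIZE;
}
```

A two-byte region that straddles a page boundary needs two pages, while a full page starting on a boundary needs only one.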
- load_icode
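Loads an ELF binary into the environment, along the lines of the boot loader. p_memsz and p_filesz are the segment's size in memory and in the file; the difference between them is the bss portion. The function sets the program entry point through tf_eip, allocates the user stack, and uses lcr3 to load the environment's page directory while it copies.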
static void
load_icode(struct Env *e, uint8_t *binary)
{
struct Elf *elf_header = (struct Elf *)binary;
if(elf_header->e_magic!=ELF_MAGIC){
panic("Not a Valid ELF Format");
}
// Switch to the environment's pgdir so that p_va can be written directly
lcr3(PADDR(e->env_pgdir));
struct Proghdr *ph = (struct Proghdr *) ((uint8_t *) elf_header + elf_header->e_phoff);
struct Proghdr *eph = ph+elf_header->e_phnum;
for(;ph<eph;ph++){
if(ph->p_type==ELF_PROG_LOAD){
// Load the segment into the environment: allocate the space first
region_alloc(e,(uint8_t *)(ph->p_va),ph->p_memsz);
// Copy the file-backed bytes
memcpy((uint8_t *)(ph->p_va),binary+ph->p_offset,ph->p_filesz);
// Zero the rest; this could also be done up front
memset((uint8_t *)(ph->p_va)+ph->p_filesz,0,ph->p_memsz-ph->p_filesz);
}
}
//entry point
e->env_tf.tf_eip = elf_header->e_entry;
// Now map one page for the program's initial stack
// at virtual address USTACKTOP - PGSIZE.
// LAB 3: Your code here.
region_alloc(e,(void *)(USTACKTOP - PGSIZE),PGSIZE);
lcr3(PADDR(kern_pgdir));
}
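The copy-then-zero step for one segment can be shown in isolation. This sketch uses a hypothetical toy_seg struct with only the two size fields and works on ordinary buffers rather than on a real address space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one ELF program header's size fields. */
struct toy_seg {
    size_t filesz;   /* bytes present in the file image */
    size_t memsz;    /* bytes occupied in memory, memsz >= filesz */
};

/* Copy the file-backed part and zero the rest (the bss tail). */
void toy_load_seg(unsigned char *dst, const unsigned char *src,
                  const struct toy_seg *seg)
{
    assert(seg->filesz <= seg->memsz);
    memcpy(dst, src, seg->filesz);
    memset(dst + seg->filesz, 0, seg->memsz - seg->filesz);
}
```

Since region_alloc above already requests zeroed pages with ALLOC_ZERO, the explicit memset in load_icode mostly matters when a page is shared with an earlier segment.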
- env_create
Creates an environment: allocate an Env, load the code, and set the environment type.
void
env_create(uint8_t *binary, enum EnvType type)
{
// LAB 3: Your code here.
struct Env *cur_env;
int res_code = env_alloc(&cur_env,0);
if(res_code!=0){
panic("Can't Create New Env");
}
load_icode(cur_env,binary);
cur_env->env_type = type;
cprintf("env created:%08x,load code from:%08x\n",cur_env,binary);
}
- env_run
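curenv points to the environment that is currently running. If one is already running, set its status back to ENV_RUNNABLE; then mark the new environment ENV_RUNNING, load its page directory, and call env_pop_tf to restore its registers into the CPU so that it starts running.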
void
env_run(struct Env *e)
{
if(curenv!=NULL && curenv->env_status == ENV_RUNNING){
curenv->env_status = ENV_RUNNABLE;
}
curenv = e;
curenv-> env_status = ENV_RUNNING;
curenv-> env_runs++;
lcr3(PADDR(curenv->env_pgdir));
//Step 2
env_pop_tf(&(curenv->env_tf));
}
3. Control Transfer
Exceptions and interrupts are both "protected control transfers," which cause the processor to switch from user to kernel mode (CPL=0) without giving the user-mode code any opportunity to interfere with the functioning of the kernel or other environments. In Intel's terminology, an interrupt is a protected control transfer that is caused by an asynchronous event usually external to the processor, such as notification of external device I/O activity. An exception, in contrast, is a protected control transfer caused synchronously by the currently running code, for example due to a divide by zero or an invalid memory access.
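Exceptions and interrupts are both treated as protected control transfers. Interrupts are triggered asynchronously (device I/O, for example), while exceptions are triggered synchronously (divide by zero, invalid memory access).
The x86 provides two mechanisms that keep the control transfer protected:
- IDT
Interrupt Descriptor Table. It works much like the GDT.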
struct Gatedesc idt[256] = { { 0 } };
struct Pseudodesc idt_pd = {
sizeof(idt) - 1, (uint32_t) idt
};
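As with the GDT, the processor loads the CS and EIP registers from the IDT entry.
[Figure: IDT entry format]
- TSS
Task State Segment.
When an exception or interrupt occurs, the old CS and EIP values must be saved somewhere safe. In addition, if the privilege level changes (user to kernel), the stack is switched to the kernel stack. The processor therefore pushes SS, ESP, EFLAGS, CS, EIP, and an optional error code onto that stack. The TSS is the structure JOS uses to tell the processor where that stack is; the descriptor for the TSS segment is stored in the gdt.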
static struct Taskstate ts;
- trap_init_percpu
Loads the TSS and the IDT. For background on the gdt see lab 2. The TSS and the IDT are each loaded with their own dedicated instruction (ltr and lidt).
// Initialize and load the per-CPU TSS and IDT
void
trap_init_percpu(void)
{
// Setup a TSS so that we get the right stack
// when we trap to the kernel.
ts.ts_esp0 = KSTACKTOP;
ts.ts_ss0 = GD_KD;
ts.ts_iomb = sizeof(struct Taskstate);
// Initialize the TSS slot of the gdt.
gdt[GD_TSS0 >> 3] = SEG16(STS_T32A, (uint32_t) (&ts),
sizeof(struct Taskstate) - 1, 0);
gdt[GD_TSS0 >> 3].sd_s = 0;
// Load the TSS selector (like other segment selectors, the
// bottom three bits are special; we leave them 0)
ltr(GD_TSS0);
// Load the IDT
lidt(&idt_pd);
}
Exercise 4
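The IDT is initialized in trap_init, where the SETGATE macro fills in each IDT entry. The x_handler declarations in C are only placeholders: the assembly code later uses, for example, TRAPHANDLER(pgflt_handler, T_PGFLT); to export a global symbol of the same name, and each stub jumps to _alltraps for common handling.
The generated stubs can be inspected in the compiled kernel.asm.
- Assembly-related macros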
#define SETGATE(gate, istrap, sel, off, dpl) \
{ \
(gate).gd_off_15_0 = (uint32_t) (off) & 0xffff; \
(gate).gd_sel = (sel); \
(gate).gd_args = 0; \
(gate).gd_rsv1 = 0; \
(gate).gd_type = (istrap) ? STS_TG32 : STS_IG32; \
(gate).gd_s = 0; \
(gate).gd_dpl = (dpl); \
(gate).gd_p = 1; \
(gate).gd_off_31_16 = (uint32_t) (off) >> 16; \
}
#define TRAPHANDLER(name, num) \
.globl name; /* define global symbol for 'name' */ \
.type name, @function; /* symbol type is function */ \
.align 2; /* align function definition */ \
name: /* function starts here */ \
pushl $(num); \
jmp _alltraps
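SETGATE splits the 32-bit handler address across two 16-bit fields. The sketch below shows only that split and its inverse, using a hypothetical cut-down toy_gate struct instead of the real struct Gatedesc bitfields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down gate: only the two offset halves SETGATE fills in. */
struct toy_gate {
    uint16_t off_15_0;    /* low 16 bits of the handler address */
    uint16_t off_31_16;   /* high 16 bits of the handler address */
};

/* Store a handler address the way SETGATE does. */
void toy_setgate(struct toy_gate *g, uint32_t off)
{
    assert(g != 0);
    g->off_15_0 = (uint16_t)(off & 0xffff);
    g->off_31_16 = (uint16_t)(off >> 16);
}

/* Reassemble the address, as the CPU does when it loads EIP from the gate. */
uint32_t toy_gate_offset(const struct toy_gate *g)
{
    return ((uint32_t)g->off_31_16 << 16) | g->off_15_0;
}
```

The handler address used in a check is arbitrary; any 32-bit value should round-trip unchanged.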
- trap_init
void *divide_handler();
void *debug_handler();
void *nmi_handler();
void *brkpt_handler();
void *oflow_handler();
void *bound_handler();
void *illop_handler();
void *device_handler();
void *dblflt_handler();
void *tss_handler();
void *segnp_handler();
void *stack_handler();
void *gpflt_handler();
void *pgflt_handler();
void *fperr_handler();
void *align_handler();
void *mchk_handler();
void *simderr_handler();
void *syscall_handler();
void *default_handler();
void
trap_init(void)
{
extern struct Segdesc gdt[];
cprintf("IDT initializing!\n");
// LAB 3: Your code here.
// Fill in the IDT
SETGATE(idt[T_DIVIDE],0,GD_KT,divide_handler,0);
SETGATE(idt[T_DEBUG],0,GD_KT,debug_handler,0);
SETGATE(idt[T_NMI],0, GD_KT,nmi_handler,0);
// DPL 3: the breakpoint exception may be raised from user mode
SETGATE(idt[T_BRKPT],0,GD_KT,brkpt_handler,3);
SETGATE(idt[T_OFLOW],0,GD_KT,oflow_handler,0);
SETGATE(idt[T_BOUND],0,GD_KT,bound_handler,0);
SETGATE(idt[T_ILLOP],0,GD_KT,illop_handler,0);
SETGATE(idt[T_DEVICE],0,GD_KT,device_handler,0);
SETGATE(idt[T_DBLFLT],0,GD_KT,dblflt_handler,0);
/* #define T_COPROC 9 */ // reserved (not generated by recent processors)
SETGATE(idt[T_TSS],0,GD_KT,tss_handler,0);
SETGATE(idt[T_SEGNP],0,GD_KT,segnp_handler,0);
SETGATE(idt[T_STACK],0,GD_KT,stack_handler,0);
SETGATE(idt[T_GPFLT],0,GD_KT,gpflt_handler,0);
SETGATE(idt[T_PGFLT],0,GD_KT,pgflt_handler,0);
/* #define T_RES 15 */ // reserved
SETGATE(idt[T_FPERR],0,GD_KT,fperr_handler,0);
SETGATE(idt[T_ALIGN],0,GD_KT,align_handler,0);
SETGATE(idt[T_MCHK],0,GD_KT,mchk_handler,0);
SETGATE(idt[T_SIMDERR],0,GD_KT,simderr_handler,0);
// DPL 3: system calls are invoked from user mode
SETGATE(idt[T_SYSCALL],0,GD_KT,syscall_handler,3);
// T_DEFAULT (500) is a catch-all trap number, not a valid index into idt[256], so no gate is installed for it
// Per-CPU setup
trap_init_percpu();
}
- trapentry.S
The TRAPHANDLER and TRAPHANDLER_NOEC macros define the global symbols.
/*
* Lab 3: Your code here for generating entry points for the different traps.
*/
TRAPHANDLER_NOEC(divide_handler, T_DIVIDE);
TRAPHANDLER_NOEC(debug_handler, T_DEBUG);
TRAPHANDLER_NOEC(nmi_handler, T_NMI);
TRAPHANDLER_NOEC(brkpt_handler, T_BRKPT);
TRAPHANDLER_NOEC(oflow_handler, T_OFLOW);
TRAPHANDLER_NOEC(bound_handler, T_BOUND);
TRAPHANDLER_NOEC(illop_handler, T_ILLOP);
TRAPHANDLER_NOEC(device_handler, T_DEVICE);
TRAPHANDLER_NOEC(fperr_handler, T_FPERR);
TRAPHANDLER(align_handler, T_ALIGN); /* alignment check pushes an error code */
TRAPHANDLER_NOEC(mchk_handler, T_MCHK);
TRAPHANDLER_NOEC(simderr_handler, T_SIMDERR);
TRAPHANDLER_NOEC(syscall_handler, T_SYSCALL);
TRAPHANDLER_NOEC(default_handler, T_DEFAULT);
TRAPHANDLER(dblflt_handler, T_DBLFLT);
TRAPHANDLER(tss_handler, T_TSS);
TRAPHANDLER(segnp_handler, T_SEGNP);
TRAPHANDLER(stack_handler, T_STACK);
TRAPHANDLER(gpflt_handler, T_GPFLT);
TRAPHANDLER(pgflt_handler, T_PGFLT);
TRAPHANDLER_NOEC(irq_timer_handler, IRQ_OFFSET+IRQ_TIMER);
TRAPHANDLER_NOEC(irq_kbd_handler, IRQ_OFFSET+IRQ_KBD);
TRAPHANDLER_NOEC(irq_serial_handler, IRQ_OFFSET+IRQ_SERIAL);
TRAPHANDLER_NOEC(irq_spurious_handler, IRQ_OFFSET+IRQ_SPURIOUS);
TRAPHANDLER_NOEC(irq_ide_handler, IRQ_OFFSET+IRQ_IDE);
TRAPHANDLER_NOEC(irq_error_handler,IRQ_OFFSET+IRQ_ERROR);
/*
* Lab 3: Your code here for _alltraps
*/
_alltraps:
pushl %ds
pushl %es
pushal
movw $GD_KD,%ax
movw %ax,%ds
movw %ax,%es
pushl %esp
call trap
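Reading and Writing the Trapframe
[Figure: relationship between the stack and the trapframe]
Reading and writing: trap receives the values pushed on the stack through its Trapframe pointer argument, and saves them with curenv->env_tf = *tf.
Page Fault and Breakpoint Handling
Exercise 5 & 6
In trap_dispatch, add handling (function calls) for the page fault and breakpoint exceptions.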
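The pushes in _alltraps work because the struct fields are laid out in the reverse of the push order, lowest address first. Here is a sketch of that layout with hypothetical toy_ names; the field order follows what the CPU, the TRAPHANDLER stub, and _alltraps push, and the offsets assume ordinary 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Registers in the order pushal leaves them in memory (lowest address first). */
struct toy_pushregs {
    uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;
};

/* Lowest address first: the last thing pushed comes first in the struct. */
struct toy_trapframe {
    struct toy_pushregs regs;   /* pushal in _alltraps */
    uint16_t es, pad1;          /* pushl %es */
    uint16_t ds, pad2;          /* pushl %ds */
    uint32_t trapno;            /* pushed by the TRAPHANDLER stub */
    uint32_t err;               /* pushed by the CPU, or a dummy 0 */
    uint32_t eip;               /* from here on: pushed by the CPU */
    uint16_t cs, pad3;
    uint32_t eflags;
    uint32_t esp;               /* only present on a privilege change */
    uint16_t ss, pad4;
};

/* Interpret raw stack words (starting at the saved %esp) as a trapframe. */
uint32_t toy_trapno_from_stack(const uint32_t *stack_words)
{
    struct toy_trapframe tf;
    assert(stack_words != NULL);
    memcpy(&tf, stack_words, sizeof(tf));
    return tf.trapno;
}
```

With this layout the pointer passed by pushl %esp can be handed to a C function as a trapframe pointer without any copying.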
switch(tf->tf_trapno){
case T_PGFLT:
page_fault_handler(tf);
return;
case T_BRKPT:
monitor(tf);
return;
}
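4. System Calls
The user-side code that issues a system call is in lib/syscall.c. It triggers the trap with inline assembly; the kernel picks up the arguments from the registers in the matching order, and the return value goes in eax.
Exercise 7
System call implementation: read the arguments in order, call the kernel's syscall function, and write the return value into the saved eax register.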
switch(tf->tf_trapno){
case T_SYSCALL:
{
// eax carries the system call number
struct PushRegs regs = tf->tf_regs;
uint32_t func_code = regs.reg_eax;
// Arguments, in the register order used by lib/syscall.c
uint32_t a1 = regs.reg_edx;
uint32_t a2 = regs.reg_ecx;
uint32_t a3 = regs.reg_ebx;
uint32_t a4 = regs.reg_edi;
uint32_t a5 = regs.reg_esi;
int32_t res = syscall(func_code,a1,a2,a3,a4,a5);
cprintf("[KERN]res:%08x\n",res);
// regs is a local copy, so this assignment is not seen by the user environment
regs.reg_eax = res;
cprintf("regs:%p\n",&regs);
// The return value has to be written through tf
tf->tf_regs.reg_eax = res;
cprintf("tf_regs:%p\n",&(tf->tf_regs));
}
return;
}
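The two debug prints in the handler point at a C detail: a struct assigned by value is a copy, so storing the result there does not reach the trapframe. A minimal sketch with hypothetical toy_ types and a toy syscall that adds two arguments:

```c
#include <assert.h>
#include <stdint.h>

struct toy_regs { uint32_t eax, edx, ecx; };
struct toy_tf { struct toy_regs regs; };

/* Hypothetical kernel-side syscall: just adds its two arguments. */
int32_t toy_syscall(uint32_t num, uint32_t a1, uint32_t a2)
{
    (void)num;
    return (int32_t)(a1 + a2);
}

/* Writes the result only into a local copy: the caller never sees it. */
void toy_dispatch_copy_only(struct toy_tf *tf)
{
    struct toy_regs regs = tf->regs;
    regs.eax = (uint32_t)toy_syscall(regs.eax, regs.edx, regs.ecx);
}

/* Reads from a copy, but writes the result through the pointer. */
void toy_dispatch(struct toy_tf *tf)
{
    assert(tf != 0);
    struct toy_regs regs = tf->regs;
    tf->regs.eax = (uint32_t)toy_syscall(regs.eax, regs.edx, regs.ecx);
}
```

Taking a pointer (struct PushRegs *regs = &tf->tf_regs) instead of a copy would avoid the problem altogether.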
Exercise 8
Obtain the env_id through a system call, then point thisenv at this environment's entry in the envs array.
extern void umain(int argc, char **argv);
const volatile struct Env *thisenv;
const char *binaryname = "<unknown>";
void
libmain(int argc, char **argv)
{
// set thisenv to point at our Env structure in envs[].
// LAB 3: Your code here.
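// We are in user mode, so the environment id is only available through a system call
envid_t env_id = sys_getenvid();
cprintf("env_id:%08x\n",env_id);
// Look up our Env by id; envid2env is kernel-only, so index envs[] directly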
thisenv = &envs[ENVX(env_id)];
// save the name of the program so that panic() can use it
if (argc > 0)
binaryname = argv[0];
// call user main routine
umain(argc, argv);
// exit gracefully
exit();
}
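ENVX keeps only the low bits of an environment id, which is why it can be used as an array index. A sketch with hypothetical TOY_ copies of the JOS constants (10 index bits, so 1024 slots); the assert inside the lookup is an extra sanity check for illustration and is not part of libmain.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LOG2NENV 10
#define TOY_NENV     (1 << TOY_LOG2NENV)
/* The low bits of an environment id index the envs[] array. */
#define TOY_ENVX(id) ((id) & (TOY_NENV - 1))

struct toy_env { int32_t env_id; };

/* Find the table entry for an id, the way libmain locates thisenv. */
const struct toy_env *toy_find_self(const struct toy_env *envs, int32_t id)
{
    const struct toy_env *e = &envs[TOY_ENVX(id)];
    assert(e->env_id == id);   /* the slot should hold the id we asked for */
    return e;
}
```

The higher bits of the id act as a generation counter, so two ids with the same low bits name the same slot at different times.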
Exercise 9 & 10
When a page fault occurs, the code segment selector saved in the trapframe tells whether it happened in user mode or in kernel mode.
// In protected mode, cs holds the code segment selector
uint16_t code_segment_selector= tf->tf_cs;
cprintf("code_segment_selector:%08x\n",code_segment_selector);
// The low two bits give the privilege level: 0 is kernel mode, 3 is user mode
if((tf->tf_cs & 0x3) == 0){
panic("Page Fault Occurred in Kernel Mode");
}
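The privilege test can be tried on the selector values JOS uses. In this sketch the TOY_ constants mirror GD_KT and GD_UT, and toy_trapped_from_user is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GD_KT 0x08   /* kernel text selector, same value as JOS's GD_KT */
#define TOY_GD_UT 0x18   /* user text selector, same value as JOS's GD_UT */

/* The low two bits of a saved CS are the privilege level at trap time. */
int toy_trapped_from_user(uint16_t cs)
{
    return (cs & 3) == 3;
}
```

User environments run with CS set to GD_UT | 3, so the saved selector has both low bits set, while the kernel's selector has neither.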
User memory check: mainly verifies that a mapping exists and that its permissions are sufficient.
int
user_mem_check(struct Env *env, const void *va, size_t len, int perm)
{
// LAB 3: Your code here.
// Half-open range: [start_addr, end_addr)
uintptr_t start_addr = ROUNDDOWN((uintptr_t)va,PGSIZE);
uintptr_t end_addr = ROUNDUP((uintptr_t)va+len,PGSIZE);
cprintf("[PMAP]start->end:[%08x:%08x]\n",start_addr,end_addr);
// Page directory of the environment being checked
pde_t *env_pgdir = env->env_pgdir;
pte_t* cur_pte;
// Check page by page; no finer granularity is needed
while(start_addr<end_addr){
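// Condition 1: the address must be below ULIM
if(start_addr>=ULIM){
// Report va itself for the first page, otherwise the page where the range crosses ULIM
user_mem_check_addr = start_addr<(uintptr_t)va ? (uintptr_t)va : start_addr;
return -E_FAULT;
}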
cur_pte= pgdir_walk(env_pgdir,(void *)start_addr,0);
// The PTE must exist, be present, and carry every bit in perm
if(cur_pte==NULL||!(*cur_pte&PTE_P)||(*cur_pte&perm)!=perm){
// The first page was rounded down: report va, not the page start
if(start_addr<(uintptr_t)va){
user_mem_check_addr = (uintptr_t)va;
}else{
user_mem_check_addr = start_addr;
}
cprintf("user_mem_check_addr value:%08x\n",user_mem_check_addr);
return -E_FAULT;
}
start_addr+=PGSIZE;
}
return 0;
}
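The page-by-page check and the "first faulting address" rule can be exercised on a toy address space. In this sketch a flat array of per-page permission bits stands in for the page tables, and all TOY_ and toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PGSIZE 4096u
#define TOY_NPAGES 8u    /* hypothetical tiny address space: 8 pages */
#define TOY_P 0x1u       /* present */
#define TOY_W 0x2u       /* writable */
#define TOY_U 0x4u       /* user-accessible */

/* Returns 0 if every page touching [va, va+len) carries perm | TOY_P,
 * else -1 with *bad set to the first offending address. */
int toy_mem_check(const uint32_t *page_perm, uint32_t va, uint32_t len,
                  uint32_t perm, uint32_t *bad)
{
    uint32_t want = perm | TOY_P;
    uint32_t start = va & ~(TOY_PGSIZE - 1);
    uint32_t end = (va + len + TOY_PGSIZE - 1) & ~(TOY_PGSIZE - 1);

    assert(bad != 0);
    for (uint32_t a = start; a < end; a += TOY_PGSIZE) {
        uint32_t pn = a / TOY_PGSIZE;
        if (pn >= TOY_NPAGES || (page_perm[pn] & want) != want) {
            *bad = a < va ? va : a;   /* first page: report va itself */
            return -1;
        }
    }
    return 0;
}
```

Reporting va rather than the rounded-down page start matters only for the first page; every later page begins inside the requested range.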
Add the corresponding checks on the relevant memory addresses in kdebug.c.