Chapter 7: Scheduling

Multiplexing

xv6 switches between processes in two situations, and this is how it multiplexes processes onto the CPUs:

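The sleep and wakeup mechanism. When a process is waiting for a resource to become available or a condition to hold (a device, a pipe, or the exit of a child), it can give up the CPU voluntarily through sleep so that other processes can run. Once the condition is satisfied, wakeup makes the process runnable again. This is one way of switching the running process.
Round-robin scheduling. Every process gets a time slice. When the slice runs out, a timer interrupt fires, the running process is suspended, and another process is scheduled. As long as interrupts are enabled this switch can be forced, so xv6 can rotate the CPU fairly among processes.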
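
To make the round-robin idea concrete, here is a minimal user-space sketch of picking the next runnable entry from a table. The names `pick_next` and `enum state` are hypothetical and are not xv6 code; the real scheduler loop also takes locks and switches contexts, which this toy leaves out.

```c
#include <assert.h>

enum state { UNUSED, SLEEPING, RUNNABLE, RUNNING };

// Return the index of the next RUNNABLE entry after `last`, scanning the
// table circularly, or -1 if nothing is runnable. Pass last = -1 to start
// from the beginning of the table.
int pick_next(const enum state *table, int n, int last) {
  assert(n > 0 && last >= -1 && last < n);
  for (int step = 1; step <= n; step++) {
    int i = (last + step) % n;   // last + step >= 0, so % is safe here
    if (table[i] == RUNNABLE)
      return i;
  }
  return -1;
}
```

Starting the scan just after the entry that ran last is what gives every runnable entry a turn before any entry runs twice.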

Code: Context Switching

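The overall idea of scheduling is simple. To switch from one process to another, we save the old process's execution state and registers (its context), choose the next process to run in some way, restore the state that process had when it last ran, and let it continue. That completes one scheduling step.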

[Figure: switching from one user process to another]

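The figure above sketches how xv6 switches from one user process to another. Taking a switch triggered by a timer interrupt as the example: the old user process traps into the kernel; its kernel thread performs a context switch to that CPU's scheduler thread; the scheduler thread picks the next process to run and performs a second context switch into that process's kernel thread; finally the new kernel thread returns to user space and the new user process runs.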

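The core of a context switch is swtch(a0, a1) (kernel/swtch.S). It saves the current registers to memory through a0 and then loads a previously saved set of registers from memory through a1.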

.globl swtch
swtch:
        sd ra, 0(a0)   # sd: "store doubleword"
        sd sp, 8(a0)
        sd s0, 16(a0)
        sd s1, 24(a0)
        sd s2, 32(a0)
        sd s3, 40(a0)
        sd s4, 48(a0)
        sd s5, 56(a0)
        sd s6, 64(a0)
        sd s7, 72(a0)
        sd s8, 80(a0)
        sd s9, 88(a0)
        sd s10, 96(a0)
        sd s11, 104(a0)

        ld ra, 0(a1)   # ld: "load doubleword"
        ld sp, 8(a1)
        ld s0, 16(a1)
        ld s1, 24(a1)
        ld s2, 32(a1)
        ld s3, 40(a1)
        ld s4, 48(a1)
        ld s5, 56(a1)
        ld s6, 64(a1)
        ld s7, 72(a1)
        ld s8, 80(a1)
        ld s9, 88(a1)
        ld s10, 96(a1)
        ld s11, 104(a1)
        
        ret

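The two context switches in the figure can thus be written roughly as swtch(&old_thread->context, &scheduler_thread->context) and swtch(&scheduler_thread->context, &new_thread->context). A kernel thread's context is stored in its process structure as p->context, and a CPU's scheduler context is stored in cpu->context. swtch does not return to the kernel thread that called it: it restores the other thread's ra, so the ret at the end jumps into that other thread, running on that thread's stack.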

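Note that the registers in a context are only a subset of the user registers saved in a trapframe. swtch is called like an ordinary C function, so it only needs to save the callee-saved registers: sp and the saved registers s0-s11. It also saves ra, because ra holds the address swtch was called from and therefore decides where the thread resumes. With these saved and restored, a thread sees the same values before calling swtch and after "returning" from it (the return actually happens through a later swtch issued by another thread). For efficiency swtch leaves the caller-saved registers alone; if the caller needs their values to survive the call, the compiler-generated code in the caller saves them on its kernel stack.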
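
The offsets used by swtch imply a particular memory layout for the saved context. The sketch below mirrors the field order of xv6's struct context and checks it against those offsets; `uint64_t` stands in for xv6's `uint64` so that the block compiles in user space, and the static assertions are an addition for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Saved registers for a kernel context switch. Each field is 8 bytes, so
// the field order must match the offsets in swtch: ra at 0, sp at 8,
// s0 at 16, ..., s11 at 104.
struct context {
  uint64_t ra;
  uint64_t sp;
  uint64_t s0;
  uint64_t s1;
  uint64_t s2;
  uint64_t s3;
  uint64_t s4;
  uint64_t s5;
  uint64_t s6;
  uint64_t s7;
  uint64_t s8;
  uint64_t s9;
  uint64_t s10;
  uint64_t s11;
};

// Compile-time checks that the layout agrees with the assembly.
_Static_assert(offsetof(struct context, sp) == 8, "sp must be at offset 8");
_Static_assert(offsetof(struct context, s0) == 16, "s0 must be at offset 16");
_Static_assert(offsetof(struct context, s11) == 104, "s11 must be at offset 104");
```

Reordering the fields without changing the assembly would silently restore registers into the wrong slots, which is why a check like this can be worth having.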

A process can be switched out voluntarily or involuntarily, and either way the switch goes through sched(). In xv6 three functions call sched():

exit(): when a process terminates, it sets its state to ZOMBIE and calls sched to switch to the CPU's scheduler thread.
yield(): when a process is interrupted by the timer, yield sets its state to RUNNABLE and calls sched.
sleep(): when a process has to wait, sleep sets its state to SLEEPING and calls sched.

void
sched(void)
{
  int intena;   // saved interrupt-enable flag
  struct proc *p = myproc();   // the current process
  
  if(!holding(&p->lock))
    panic("sched p->lock");
  if(mycpu()->noff != 1)
    panic("sched locks");
  if(p->state == RUNNING)
    panic("sched running");
  if(intr_get())
    panic("sched interruptible");

  intena = mycpu()->intena;
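  // Perform the context switch by calling swtch with two pointers:
  // the current process's context (saved into) and this CPU's scheduler
  // context (restored from). Execution continues in the scheduler thread.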
  swtch(&p->context, &mycpu()->context);
  mycpu()->intena = intena;
}

User process PA -> usertrap -> kernel thread TA -> yield(TA) -> sched(TA) -> swtch(TA, S) -> scheduler thread S -> swtch(S, TB) -> sched(TB) -> yield(TB) -> kernel thread TB -> usertrapret -> user process PB
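
The same thread -> scheduler -> thread pattern can be tried out in user space. The sketch below is not xv6 code: it assumes a POSIX system that still provides `<ucontext.h>` (glibc on Linux does), and uses `swapcontext` in the role of swtch. All names (`run_scheduler`, `toy_yield`, `worker`) are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NTHREAD 2
#define STACKSZ (64 * 1024)

static ucontext_t sched_ctx;             // stands in for cpu->context
static ucontext_t thread_ctx[NTHREAD];   // stands in for each p->context
static char stacks[NTHREAD][STACKSZ];    // one private stack per thread
static int finished[NTHREAD];
static int current;                      // index of the running thread
static char trace[32];                   // records who ran, in order

// Like yield -> sched -> swtch(&p->context, &cpu->context).
static void toy_yield(void) {
  swapcontext(&thread_ctx[current], &sched_ctx);
}

static void worker(void) {
  int id = current;
  for (int round = 0; round < 2; round++) {
    strcat(trace, id == 0 ? "A" : "B");
    toy_yield();
  }
  finished[id] = 1;   // returning resumes sched_ctx through uc_link
}

// Round-robin loop, like scheduler() calling swtch(&cpu->context, &p->context).
const char *run_scheduler(void) {
  int remaining = NTHREAD;
  trace[0] = '\0';
  for (int i = 0; i < NTHREAD; i++) {
    finished[i] = 0;
    getcontext(&thread_ctx[i]);
    thread_ctx[i].uc_stack.ss_sp = stacks[i];
    thread_ctx[i].uc_stack.ss_size = STACKSZ;
    thread_ctx[i].uc_link = &sched_ctx;
    makecontext(&thread_ctx[i], worker, 0);
  }
  while (remaining > 0) {
    for (current = 0; current < NTHREAD; current++) {
      if (finished[current])
        continue;
      swapcontext(&sched_ctx, &thread_ctx[current]);
      if (finished[current])
        remaining--;
    }
  }
  return trace;
}
```

As in xv6, a thread never switches directly to another thread. Every switch passes through the scheduler's own context, which is what keeps the bookkeeping in one place.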

Role of lock in context switching

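One job of p->lock is to keep other CPUs' schedulers from running the old process before the switch away from it is complete, since it is in an unstable intermediate state until then. In addition, acquire(&p->lock) turns interrupts off, which keeps the switch atomic with respect to interrupts on this CPU.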

In short, p->lock makes descheduling and scheduling a process atomic by protecting two invariants:

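If a process is RUNNING, a yield caused by a timer interrupt must be able to switch safely from its kernel thread to the scheduler thread. So the CPU registers must hold the process's register values, and c->proc must point to the process.
If a process is RUNNABLE, a scheduler must be able to run it safely. So p->context must hold the process's register values, no CPU may be executing on the process's kernel stack, and no CPU's c->proc may point to the process.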

Code: mycpu and myproc

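xv6 keeps a struct cpu (kernel/proc.h) for each CPU, shown below. It records the process currently running on that CPU, the saved context of the CPU's scheduler thread, and the bookkeeping needed to manage interrupt disabling.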

// Per-CPU state.
struct cpu {
  struct proc *proc;          // The process running on this cpu, or null.
  struct context context;     // swtch() here to enter scheduler().
  int noff;                   // Depth of push_off() nesting.
  int intena;                 // Were interrupts enabled before push_off()?
};

mycpu returns a pointer to the current CPU's struct cpu, which it finds by indexing the cpus array with the CPU's unique hartid. xv6 keeps each CPU's hartid in that CPU's thread pointer register, tp.

struct cpu*
mycpu(void) {
  int id = cpuid();
  struct cpu *c = &cpus[id];
  return c;
}

int
cpuid()
{
  int id = r_tp();
  return id;
}

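Making sure that each CPU's tp register always holds the right value is a little involved. tp is set early in boot, while the CPU is still in machine mode. usertrapret saves tp in the trapframe because user code may modify tp; when the process next traps into the kernel, uservec restores the saved value. It would be more convenient to ask the hardware for the current hartid whenever it is needed, but RISC-V allows that only in machine mode, not in supervisor mode. So once tp has been set up in machine mode during boot, the kernel has to maintain it carefully and always keep a correct copy.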

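A timer interrupt can arrive at any moment and move the thread to a different CPU, so the value returned by mycpu or cpuid is only reliable while interrupts are disabled. Callers should disable interrupts before calling mycpu and keep them disabled until they are done with the returned struct cpu.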

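myproc builds on mycpu and returns the struct proc of the process running on the current CPU. As the code shows, it disables interrupts while it reads c->proc. Once the pointer has been fetched, interrupts can be re-enabled: even if a timer interrupt later moves the process to another CPU, the struct proc pointer still refers to the same process.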

struct proc*
myproc(void) {
  push_off();
  struct cpu *c = mycpu();
  struct proc *p = c->proc;
  pop_off();
  return p;
}
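
The noff and intena fields of struct cpu exist so that push_off and pop_off can nest. Below is a user-space sketch of that bookkeeping, modeled on xv6's push_off/pop_off but running against a simulated interrupt flag on a single simulated CPU. `intr_enabled`, `cpu0`, and `interrupts_enabled` are stand-ins invented for this sketch, and `assert` replaces the kernel's panic.

```c
#include <assert.h>

struct cpu_sim {
  int noff;     // depth of push_off() nesting
  int intena;   // were interrupts enabled before the outermost push_off()?
};

static int intr_enabled = 1;    // stand-in for the hardware interrupt-enable bit
static struct cpu_sim cpu0;     // the single simulated CPU

// Disable interrupts, remembering the previous setting on the outermost call.
void push_off(void) {
  int old = intr_enabled;
  intr_enabled = 0;
  if (cpu0.noff == 0)
    cpu0.intena = old;
  cpu0.noff += 1;
}

// Undo one push_off(); interrupts come back only when the nesting depth
// reaches zero and they were enabled to begin with.
void pop_off(void) {
  assert(!intr_enabled);
  assert(cpu0.noff >= 1);
  cpu0.noff -= 1;
  if (cpu0.noff == 0 && cpu0.intena)
    intr_enabled = 1;
}

int interrupts_enabled(void) {
  return intr_enabled;
}
```

Because only the outermost pop_off restores the flag, a function such as myproc can bracket its work with push_off/pop_off whether or not its caller already has interrupts disabled.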

Sleep and wakeup

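sleep puts a process to sleep while it waits for an event; another process wakes it when the event happens. A sleeping process does not consume CPU time while it waits. sleep(chan) puts the caller to sleep on the wait channel chan, and wakeup(chan) wakes every process sleeping on chan.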

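The lost wake-up problem: process A is about to sleep when process B sees that the wake-up condition holds and calls wakeup, but nobody is sleeping on chan yet. When A then goes to sleep, B may never call wakeup again, and A sleeps forever. A first attempt at a fix is to protect the code around sleep and wakeup with a condition lock, for example: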

void
V(struct semaphore *s)
{
  acquire(&s->lock);
  s->count += 1;
  wakeup(s);
  release(&s->lock);
}

void
P(struct semaphore *s)
{
  acquire(&s->lock);
  while(s->count == 0)
    sleep(s);
  s->count -= 1;
  release(&s->lock);
}

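But P still holds s->lock while it sleeps, so V can never acquire the lock to wake P, and the two deadlock. sleep therefore has to take the lock as a second argument: it releases the lock once the calling process has been marked SLEEPING, and reacquires it before returning after a wakeup.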

sleep(s, &s->lock);
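
The lost wake-up interleaving can be replayed deterministically in a single thread. The toy model below is only a sketch: `toy_proc`, `toy_sem`, `replay_unlocked`, and `replay_locked` are hypothetical names, and the "lock" is modeled purely by the order in which the steps of P and V are executed.

```c
#include <assert.h>

enum pstate { RUNNING, SLEEPING, RUNNABLE };

struct toy_proc { enum pstate state; void *chan; };
struct toy_sem  { int count; };

// Wake the process only if it is already asleep on chan.
static void toy_wakeup(struct toy_proc *p, void *chan) {
  if (p->state == SLEEPING && p->chan == chan)
    p->state = RUNNABLE;
}

static void toy_sleep(struct toy_proc *p, void *chan) {
  p->chan = chan;
  p->state = SLEEPING;
}

// No condition lock: V runs between P's check and P's sleep.
// Returns the consumer's final state.
enum pstate replay_unlocked(struct toy_sem *s) {
  struct toy_proc consumer = { RUNNING, 0 };
  int saw_zero = (s->count == 0);   // P: test the condition
  s->count += 1;                    // V: runs in the gap ...
  toy_wakeup(&consumer, s);         // ... and finds nobody asleep
  if (saw_zero)
    toy_sleep(&consumer, s);        // P: sleeps on a stale check
  return consumer.state;
}

// With the condition lock, P's check and sleep form one step, and V can
// only run after sleep has released the lock.
enum pstate replay_locked(struct toy_sem *s) {
  struct toy_proc consumer = { RUNNING, 0 };
  if (s->count == 0)
    toy_sleep(&consumer, s);        // P: check and sleep, atomically
  s->count += 1;                    // V: runs afterwards
  toy_wakeup(&consumer, s);         // V: finds the sleeper and wakes it
  return consumer.state;
}
```

In the unlocked replay the consumer ends up SLEEPING although count is 1. In the locked replay it ends up RUNNABLE and can go on to re-check the count.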

Code: sleep and wakeup

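The sleep and wakeup interfaces can be used to build cooperation between processes. The idea is that one process calls sleep to wait for an event, and another process calls wakeup once the event has happened. This style of cooperation is called conditional synchronization.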

void
sleep(void *chan, struct spinlock *lk)
{
  struct proc *p = myproc();
  // Holding p->lock lets the process change its state and call sched()
  // without missing a wakeup, so lk can be released.
  if(lk != &p->lock){  //DOC: sleeplock0
    acquire(&p->lock);  //DOC: sleeplock1
    release(lk);
  }

  // Go to sleep: record the channel and mark the process SLEEPING.
  p->chan = chan;
  p->state = SLEEPING;

  sched(); // give up the CPU

  // Tidy up: the process is no longer sleeping on a channel.
  p->chan = 0;

  // Reacquire original lock.
  if(lk != &p->lock){
    release(&p->lock);
    acquire(lk);
  }
}
// After waking up, sleep reacquires the original lock before returning to the caller.

void
wakeup(void *chan)
{
  struct proc *p;

  for(p = proc; p < &proc[NPROC]; p++) {
    acquire(&p->lock);
    if(p->state == SLEEPING && p->chan == chan) {
      p->state = RUNNABLE;
    }
    release(&p->lock);
  }
}
// wakeup makes every process sleeping on chan RUNNABLE. Taking each p->lock around the check keeps it in step with sleep and prevents races.

Code: Pipes

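Bytes written to one end of a pipe are copied into a kernel buffer and can then be read from the other end. Each pipe is represented by a struct pipe, which holds a spinlock, the buffer data, counters of the total bytes read and written, and flags recording whether the read and write ends are still open. The buffer is circular, but the counters are not: they keep increasing, and the buffer is indexed with the counter modulo PIPESIZE.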

#define PIPESIZE 512

struct pipe {
  struct spinlock lock;
  char data[PIPESIZE];
  uint nread;     // number of bytes read
  uint nwrite;    // number of bytes written 
  // the buffer is full when nwrite == nread + PIPESIZE
  // and empty when nwrite == nread
  int readopen;   // read fd is still open
  int writeopen;  // write fd is still open
};

int
pipewrite(struct pipe *pi, uint64 addr, int n)
{
  int i;
  char ch;
  struct proc *pr = myproc();

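  acquire(&pi->lock); // take the pipe lock first
  // Copy bytes into the pipe. Once all n bytes are written (or copyin fails),
  // leave the loop, wake any reader sleeping on &pi->nread, release the lock, and return.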
  for(i = 0; i < n; i++){
     
    while(pi->nwrite == pi->nread + PIPESIZE){  //DOC: pipewrite-full
      // The buffer is full: wake any reader sleeping on &pi->nread,
      // then sleep on &pi->nwrite until space is available.
      if(pi->readopen == 0 || pr->killed){
        release(&pi->lock);
        return -1;
      }
      wakeup(&pi->nread);
      sleep(&pi->nwrite, &pi->lock);
    }

    if(copyin(pr->pagetable, &ch, addr + i, 1) == -1)
      break;
    pi->data[pi->nwrite++ % PIPESIZE] = ch;
  }

  wakeup(&pi->nread);
  release(&pi->lock);
  return i;
}

int
piperead(struct pipe *pi, uint64 addr, int n)
{
  int i;
  struct proc *pr = myproc();
  char ch;

  acquire(&pi->lock);
  // If the buffer is empty and the write end is still open, sleep on &pi->nread.
  // Otherwise copy out up to n bytes, wake any writer sleeping on &pi->nwrite,
  // release the lock, and return.
  while(pi->nread == pi->nwrite && pi->writeopen){  //DOC: pipe-empty
    if(pr->killed){
      release(&pi->lock);
      return -1;
    }
    sleep(&pi->nread, &pi->lock); //DOC: piperead-sleep
  }

  for(i = 0; i < n; i++){  //DOC: piperead-copy
    if(pi->nread == pi->nwrite)
      break;
    ch = pi->data[pi->nread++ % PIPESIZE];
    if(copyout(pr->pagetable, addr + i, &ch, 1) == -1)
      break;
  }

  wakeup(&pi->nwrite);  //DOC: piperead-wakeup
  release(&pi->lock);
  return i;
}
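
The full and empty tests rely on the counters only ever increasing while the index is taken modulo the buffer size. The stand-alone ring buffer below sketches that technique without locks or sleeping; `struct ring`, `RINGSIZE`, and the `ring_*` functions are hypothetical names, and `uint32_t` is used so that counter wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

#define RINGSIZE 8   // a power of two, so indexing survives counter wraparound

struct ring {
  char data[RINGSIZE];
  uint32_t nread;    // total bytes read so far
  uint32_t nwrite;   // total bytes written so far
};

int ring_full(const struct ring *r)  { return r->nwrite == r->nread + RINGSIZE; }
int ring_empty(const struct ring *r) { return r->nwrite == r->nread; }

// Append one byte; returns -1 if the buffer is full.
int ring_put(struct ring *r, char c) {
  if (ring_full(r))
    return -1;
  r->data[r->nwrite++ % RINGSIZE] = c;
  return 0;
}

// Remove the oldest byte into *out; returns -1 if the buffer is empty.
int ring_get(struct ring *r, char *out) {
  if (ring_empty(r))
    return -1;
  *out = r->data[r->nread++ % RINGSIZE];
  return 0;
}
```

Keeping two running counters instead of two wrapped indices means that full (difference equals the size) and empty (difference is zero) are never confused, and no slot has to be left unused.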

Code: Wait, Exit and Kill

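When a child calls exit, its parent may already be sleeping in wait, or it may be busy with something else. In the latter case a later call to wait must still notice the child's exit, possibly long after exit was called. To make this work, exit puts the child into the ZOMBIE state, where it stays until the parent's wait notices it, marks the child UNUSED, copies the child's exit status, and returns the child's PID to the parent. If the parent exits before the child, its children are handed over to the init process (the first user process; the shell is the second), which becomes their new parent. init (user/init.c) calls wait in an endless loop, as shown below, so the resources of these adopted children are released when they terminate. **Every child that exits is therefore cleaned up by its parent.** The implementation has to watch for races and deadlocks between a wait and an exit running at the same time, and between two simultaneous exits.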

// Main loop of the init process: call wait over and over.

for(;;){
  // this call to wait() returns if the shell exits,
  // or if a parentless process exits.
  wpid = wait((int *) 0);
  if(wpid == pid){
    // the shell exited; restart it.
    break;
  } else if(wpid < 0){
    printf("init: wait returned an error\n");
    exit(1);
  } else {
    // it was a parentless process; do nothing.
  }
}

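wait uses the caller's p->lock as the condition lock to avoid lost wakeups. It first acquires p->lock and then scans all processes in a loop. For each child it finds, it acquires the child's np->lock and checks the child's state. If the state is ZOMBIE, wait copies the child's exit status to the address addr passed to wait (if it is non-zero), calls freeproc to release the child's resources and process structure, releases np->lock and p->lock, and returns the child's pid. If the caller has no children (or has been killed), wait returns -1 immediately. If it has children but none has exited, wait calls sleep, which releases p->lock, and waits for one of them to call exit. Note that wait holds two locks at once for a while; xv6's rule is to lock the parent before the child, which prevents deadlock.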

// Wait for a child process to exit and return its pid.
// Return -1 if this process has no children.
int
wait(uint64 addr)
{
  struct proc *np;
  int havekids, pid;
  struct proc *p = myproc();

  // hold p->lock for the whole time to avoid lost
  // wakeups from a child's exit().
  acquire(&p->lock);

  for(;;){
    // Scan through table looking for exited children.
    havekids = 0;
    for(np = proc; np < &proc[NPROC]; np++){
      // this code uses np->parent without holding np->lock.
      // acquiring the lock first would cause a deadlock,
      // since np might be an ancestor, and we already hold p->lock.
      if(np->parent == p){
        // np->parent can't change between the check and the acquire()
        // because only the parent changes it, and we're the parent.
        acquire(&np->lock);
        havekids = 1;
        if(np->state == ZOMBIE){
          // Found one.
          pid = np->pid;
          if(addr != 0 && copyout(p->pagetable, addr, (char *)&np->xstate,
                                  sizeof(np->xstate)) < 0) {
            release(&np->lock);
            release(&p->lock);
            return -1;
          }
          freeproc(np);
          release(&np->lock);
          release(&p->lock);
          return pid;
        }
        release(&np->lock);
      }
    }

    // No point waiting if we don't have any children.
    if(!havekids || p->killed){
      release(&p->lock);
      return -1;
    }
    
    // Wait for a child to exit.
    sleep(p, &p->lock);  //DOC: wait-sleep
  }
}

// free a proc structure and the data hanging from it,
// including user pages.
// p->lock must be held.
static void
freeproc(struct proc *p)
{
  if(p->trapframe)
    kfree((void*)p->trapframe);
  p->trapframe = 0;
  if(p->pagetable)
    proc_freepagetable(p->pagetable, p->sz);
  p->pagetable = 0;
  p->sz = 0;
  p->pid = 0;
  p->parent = 0;
  p->name[0] = 0;
  p->chan = 0;
  p->killed = 0;
  p->xstate = 0;
  p->state = UNUSED;
}
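
Stripped of locking and sleeping, the scan that wait performs is a short table walk. The sketch below shows one pass over a toy process table; `toy_proc`, `toy_wait`, and `WOULD_SLEEP` are hypothetical, and the real code would call sleep where this version returns `WOULD_SLEEP`.

```c
#include <assert.h>

enum pstate { UNUSED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct toy_proc {
  enum pstate state;
  int pid;
  int xstate;                // exit status, valid once ZOMBIE
  struct toy_proc *parent;
};

#define WOULD_SLEEP 0        // children exist, but none has exited yet

// One pass of the wait scan for parent p. Returns the pid of a reaped
// zombie child, -1 if p has no children, or WOULD_SLEEP otherwise.
int toy_wait(struct toy_proc *table, int n, struct toy_proc *p, int *status) {
  int havekids = 0;
  for (int i = 0; i < n; i++) {
    struct toy_proc *np = &table[i];
    if (np->parent != p)
      continue;
    havekids = 1;
    if (np->state == ZOMBIE) {
      int pid = np->pid;
      if (status)
        *status = np->xstate;
      // Minimal stand-in for freeproc(): mark the slot reusable.
      np->state = UNUSED;
      np->pid = 0;
      np->xstate = 0;
      np->parent = 0;
      return pid;
    }
  }
  return havekids ? WOULD_SLEEP : -1;
}
```

The pid is copied out before the slot is cleared, which matches the order in the real wait: the return value must survive freeproc resetting the entry.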

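exit records the caller's exit status, frees some resources, hands all of the caller's children to init, wakes the caller's parent, marks the caller ZOMBIE, and finally gives up the CPU. The locking deserves attention. While it sets its state and wakes the parent, the exiting process must hold the parent's lock, so that the parent cannot lose the wakeup. It must also hold its own p->lock, because for a short time its state is ZOMBIE while it is still running, and the parent must not find and free it before it has switched away. The same lock order applies here, parent first and then child, to avoid deadlock.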

// Exit the current process.  Does not return.
// An exited process remains in the zombie state
// until its parent calls wait().
void
exit(int status)
{
  struct proc *p = myproc();

  if(p == initproc)
    panic("init exiting");

  // Close all open files.
  for(int fd = 0; fd < NOFILE; fd++){
    if(p->ofile[fd]){
      struct file *f = p->ofile[fd];
      fileclose(f);
      p->ofile[fd] = 0;
    }
  }

  begin_op();
  iput(p->cwd);
  end_op();
  p->cwd = 0;

  // we might re-parent a child to init. we can't be precise about
  // waking up init, since we can't acquire its lock once we've
  // acquired any other proc lock. so wake up init whether that's
  // necessary or not. init may miss this wakeup, but that seems
  // harmless.
  acquire(&initproc->lock);
  wakeup1(initproc);
  release(&initproc->lock);

  // grab a copy of p->parent, to ensure that we unlock the same
  // parent we locked. in case our parent gives us away to init while
  // we're waiting for the parent lock. we may then race with an
  // exiting parent, but the result will be a harmless spurious wakeup
  // to a dead or wrong process; proc structs are never re-allocated
  // as anything else.
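  // The current process and its parent may exit at the same time, in which
  // case the parent hands this process to init and p->parent changes.
  // Copy p->parent first so that the parent locked and unlocked below
  // is the same one; otherwise a deadlock is possible.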
  acquire(&p->lock);
  struct proc *original_parent = p->parent;
  release(&p->lock);
  
  // we need the parent's lock in order to wake it up from wait().
  // the parent-then-child rule says we have to lock it first.
  acquire(&original_parent->lock);

  acquire(&p->lock);

  // Give any children to init.
  reparent(p);

  // Parent might be sleeping in wait().
  wakeup1(original_parent);

  p->xstate = status;
  p->state = ZOMBIE;

  release(&original_parent->lock);

  // Jump into the scheduler, never to return.
  sched();
  panic("zombie exit");
}

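exit lets a process terminate itself, while kill lets one process ask for another to be terminated. Having kill destroy the victim directly would be complicated, because the victim might be running on another CPU, possibly in the middle of a critical section that updates important kernel data structures. So kill does very little: it sets the victim's p->killed to 1 and, if the victim is sleeping, sets its state to RUNNABLE to wake it.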

// Kill the process with the given pid.
// The victim won't exit until it tries to return
// to user space (see usertrap() in trap.c).
int
kill(int pid)
{
  struct proc *p;

  for(p = proc; p < &proc[NPROC]; p++){
    acquire(&p->lock);
    if(p->pid == pid){
      p->killed = 1;
      if(p->state == SLEEPING){
        // Wake process from sleep().
        p->state = RUNNABLE;
      }
      release(&p->lock);
      return 0;
    }
    release(&p->lock);
  }
  return -1;
}

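Sooner or later the victim enters or leaves the kernel, and usertrap checks p->killed at two points, as shown below: before a system call is dispatched, and after the trap has been handled, just before usertrapret. If p->killed is set, usertrap calls exit to terminate the process.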

void
usertrap(void)
{
  // ... 
  
  if(r_scause() == 8){
    // system call

    if(p->killed)
      exit(-1);

    // ...

    syscall();
  }
  // On the way out, usertrap checks if the process has been killed
  // or should yield the CPU (if this trap is a timer interrupt)

  if(p->killed)
    exit(-1);

  // ...

  usertrapret();
}

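If the victim is sleeping, kill just changes its state to RUNNABLE rather than calling wakeup. The victim then returns from sleep even though the condition it was waiting for may not hold. Because sleep is always called in a loop that re-tests the condition, the process would normally just go back to sleep. Some of these loops therefore add a check of p->killed and, if it is set, abandon the current operation and return -1; when the process gets back to usertrap, exit is called. The loops in piperead and pipewrite both contain such a check.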
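
The interplay between kill and a sleep loop that checks the killed flag can be sketched with a toy table. `toy_kill` follows the logic of kill shown earlier without the locking, and `wait_step` is a hypothetical helper that models one iteration of a loop like the one in piperead.

```c
#include <assert.h>

enum pstate { SLEEPING, RUNNABLE, RUNNING };

struct toy_proc { int pid; int killed; enum pstate state; };

// Mark the victim as killed and make it runnable if it is asleep.
// Returns 0 on success, -1 if no process has that pid.
int toy_kill(struct toy_proc *table, int n, int pid) {
  for (int i = 0; i < n; i++) {
    if (table[i].pid == pid) {
      table[i].killed = 1;
      if (table[i].state == SLEEPING)
        table[i].state = RUNNABLE;
      return 0;
    }
  }
  return -1;
}

// One iteration of a sleep loop that checks the killed flag.
// Returns 0 if the condition holds and the caller may proceed,
// -1 if the process was killed and should give up,
// 1 if the process went (back) to sleep.
int wait_step(struct toy_proc *p, int condition_holds) {
  if (condition_holds)
    return 0;
  if (p->killed)
    return -1;
  p->state = SLEEPING;
  return 1;
}
```

After `toy_kill`, the next `wait_step` returns -1 even though the condition still does not hold, which is the path that eventually leads back to usertrap and exit.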

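Some sleep loops deliberately do not check p->killed, because the code is in the middle of a multi-step operation that must complete atomically and may need to sleep and wake up several times along the way. The disk driver is the typical example: its sleep loop does not check p->killed, since a disk operation may be one of a set of writes that all have to complete for the file system to be left in a correct state. A process that is killed while waiting for disk I/O therefore keeps going until the current system call finishes, and only then does usertrap see the flag.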

MIT-6.S081 XV6 Chapter7 Scheduling