6.824 Notes: Lab 1


Lab 1 MapReduce

Analysis

Lab 1 requires implementing worker.go, coordinator.go, and rpc.go.

The Lab 1 walkthrough starts by running mrsequential.go. Reading the code shows that it is a sequential version of MapReduce, and its logic is roughly as follows:

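  1. Run go run mrsequential.go wc.so pg*.txt, which reads wc.so and the pg-*.txt files
  2. Load the map and reduce functions through loadPlugin()
  3. Read all pg-* files in order
    1. Map each file's contents to KV pairs
    2. Store them in intermediate
  4. Sort the accumulated intermediate KV pairs by key
  5. Walk through all KV pairs in order
    1. Group the values that share a key and apply reduce to them
    2. Write the result to the designated output file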
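
The steps above can be sketched in a few lines of Go. This is a minimal, in-memory sketch rather than the code in mrsequential.go: the inputs are plain strings instead of pg-*.txt files, and mapWC, reduceWC, and runSequential are hypothetical names standing in for the functions loaded from the plugin.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is the pair type passed between the map and reduce steps.
type KeyValue struct {
	Key   string
	Value string
}

// mapWC (hypothetical name) emits one ("word", "1") pair per word.
func mapWC(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kva := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// reduceWC (hypothetical name) returns how many values were collected for a key.
func reduceWC(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// runSequential maps every input, sorts the pairs by key, and then reduces
// each run of equal keys, returning one "key count" line per distinct key.
func runSequential(inputs []string) []string {
	var intermediate []KeyValue
	for _, contents := range inputs {
		intermediate = append(intermediate, mapWC(contents)...)
	}
	sort.Slice(intermediate, func(i, j int) bool {
		return intermediate[i].Key < intermediate[j].Key
	})

	var out []string
	for i := 0; i < len(intermediate); {
		// Find the end of the run of pairs that share intermediate[i].Key.
		j := i + 1
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			j++
		}
		values := make([]string, 0, j-i)
		for k := i; k < j; k++ {
			values = append(values, intermediate[k].Value)
		}
		out = append(out, intermediate[i].Key+" "+reduceWC(intermediate[i].Key, values))
		i = j
	}
	return out
}

func main() {
	for _, line := range runSequential([]string{"a b a", "b c"}) {
		fmt.Println(line)
	}
}
```

Sorting first is what makes the reduce step simple: all pairs with the same key end up adjacent, so a single pass is enough to group them.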

Our job is to turn this sequential execution into a MapReduce execution. The relevant files are mrcoordinator.go and mrworker.go.

Their roles are:

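  1. mrcoordinator.go creates a coordinator, passes it the names of the input files and the value of nReduce, and exits only after all tasks have completed
  2. mrworker.go loads the map and reduce functions and passes them to the worker

rpc.go
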
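// coordinatorSock returns the name of the UNIX-domain socket used by the coordinator
func coordinatorSock() string {
	// Socket path prefix under the temporary directory
	s := "/var/tmp/824-mr-"
	// Append the uid
	s += strconv.Itoa(os.Getuid())
	return s
}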

worker.go

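// An example of sending an RPC call
func CallExample() {
	// Declare and initialize the arguments
	args := ExampleArgs{}
	args.X = 99
	// Declare the reply structure
	reply := ExampleReply{}
	// Send the request through call() and wait for the reply
	call("Coordinator.Example", &args, &reply)
	// Print the value from the reply
	fmt.Printf("reply.Y %v\n", reply.Y)
}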

// The worker sends an RPC request to the coordinator and waits for the response
func call(rpcname string, args interface{}, reply interface{}) bool {
	// Build the socket name
	sockname := coordinatorSock()
	c, err := rpc.DialHTTP("unix", sockname)
	if err != nil {
		log.Fatal("dialing:", err)
	}
	defer c.Close()

	// Invoke Call on the rpc.Client (net/rpc's client.go) and block until the
	// reply arrives; args carries the request payload
	err = c.Call(rpcname, args, reply)
	if err == nil {
		return true
	}
	fmt.Println(err)
	return false
}

coordinator.go

// Handle the RPC request
func (c *Coordinator) Example(args *ExampleArgs, reply *ExampleReply) error {  
	reply.Y = args.X + 1  
	return nil  
}

// Start a goroutine that listens for requests from workers
func (c *Coordinator) server() {
	// Register the coordinator's methods as an RPC service
	rpc.Register(c)
	rpc.HandleHTTP()
	// Build the socket name
	sockname := coordinatorSock()
	os.Remove(sockname)
	// Listen on that socket
	l, e := net.Listen("unix", sockname)
	if e != nil {
		log.Fatal("listen error:", e)
	}
	go http.Serve(l, nil)
}

// Report whether all tasks have completed
func (c *Coordinator) Done() bool {  
	//ret := false  
	ret := true  
	// Your code here.  
	return ret  
}
// Create the coordinator
func MakeCoordinator(files []string, nReduce int) *Coordinator {
	// files: the names of the input files; nReduce: the number of reduce tasks
	c := Coordinator{}
	// Your code here.
	c.server()
	return &c
}
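
To see the request and reply path from call() to the Example handler in one place, here is a minimal round trip that runs in a single process. It is a sketch under assumptions: it swaps the lab's UNIX socket and HTTP transport for an in-memory net.Pipe connection, uses its own rpc.Server instead of the default one, and roundTrip is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

type ExampleArgs struct{ X int }

type ExampleReply struct{ Y int }

// Coordinator is an empty stand-in for the lab's Coordinator type.
type Coordinator struct{}

// Example has the shape net/rpc expects: two arguments (the second a
// pointer used for the reply) and an error return value.
func (c *Coordinator) Example(args *ExampleArgs, reply *ExampleReply) error {
	reply.Y = args.X + 1
	return nil
}

// roundTrip (hypothetical helper) serves one in-memory connection and sends
// a single Coordinator.Example call over it.
func roundTrip(x int) (int, error) {
	server := rpc.NewServer()
	if err := server.Register(&Coordinator{}); err != nil {
		return 0, err
	}

	// net.Pipe replaces the UNIX socket used in the lab (an assumption made
	// here so the example needs no files on disk).
	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	reply := ExampleReply{}
	if err := client.Call("Coordinator.Example", &ExampleArgs{X: x}, &reply); err != nil {
		return 0, err
	}
	return reply.Y, nil
}

func main() {
	y, err := roundTrip(99)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("reply.Y %v\n", y)
}
```

Running it prints reply.Y 100, the same value CallExample prints when it talks to a real coordinator.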

Implementation

The coordinator is responsible for assigning and managing tasks; the worker focuses only on executing them.

model

type Coordinator struct {  
	// Your definitions here.  
	nReduce int  
	nMap int  
	files []string  
	  
	mapCompleted int  
	mapTasksStatus []int // 0 = idle; 1 = in-progress; 2 = completed
	reduceCompleted int  
	reduceTasksStatus []int  
	  
	mu sync.Mutex  
}
type FetchArgs struct {  
}  
  
type FetchReply struct {  
	TaskType int // 1 == map task ; 2 == reduce task ; 3 == all finished ; 4 == all task finished  
	FileName string  
	TaskNo int  
	  
	NMap int  
	NReduce int  
}  
  
type CommitArgs struct {  
	TaskNo int  
}  
  
type CommitReply struct {  
}
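
As a rough illustration of how these structures could fit together, the sketch below hands out map tasks under the mutex. Several things in it are assumptions rather than the original implementation: the Fetch handler name, the newCoordinator helper, the named constants, and the use of TaskType 0 to mean "nothing to hand out right now". It covers only the map phase.

```go
package main

import (
	"fmt"
	"sync"
)

// Task states, matching the 0/1/2 encoding used by mapTasksStatus.
const (
	idle = iota
	inProgress
	completed
)

// taskMap is a hypothetical name for TaskType value 1 (map task).
const taskMap = 1

type Coordinator struct {
	nReduce        int
	nMap           int
	files          []string
	mapTasksStatus []int
	mu             sync.Mutex
}

type FetchArgs struct{}

type FetchReply struct {
	TaskType int
	FileName string
	TaskNo   int
	NMap     int
	NReduce  int
}

// newCoordinator (hypothetical helper) creates one idle map task per file.
func newCoordinator(files []string, nReduce int) *Coordinator {
	return &Coordinator{
		nReduce:        nReduce,
		nMap:           len(files),
		files:          files,
		mapTasksStatus: make([]int, len(files)),
	}
}

// Fetch hands the first idle map task to the caller. When no map task is
// idle it leaves reply.TaskType at 0, which this sketch treats as
// "nothing to hand out right now" (an assumption).
func (c *Coordinator) Fetch(args *FetchArgs, reply *FetchReply) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	for i, status := range c.mapTasksStatus {
		if status != idle {
			continue
		}
		c.mapTasksStatus[i] = inProgress
		reply.TaskType = taskMap
		reply.FileName = c.files[i]
		reply.TaskNo = i
		reply.NMap = c.nMap
		reply.NReduce = c.nReduce
		return nil
	}
	return nil
}

func main() {
	c := newCoordinator([]string{"pg-a.txt", "pg-b.txt"}, 3)
	for i := 0; i < 3; i++ {
		reply := FetchReply{}
		if err := c.Fetch(&FetchArgs{}, &reply); err != nil {
			fmt.Println("fetch failed:", err)
			return
		}
		fmt.Printf("type=%d task=%d file=%q\n", reply.TaskType, reply.TaskNo, reply.FileName)
	}
}
```

Holding the lock for the whole scan keeps two workers from being handed the same task, since the status check and the status update happen as one step.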

Approach

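The worker uses call to send a request to the coordinator, fetches a task, and executes it (mrsequential.go is a good reference for the overall flow). There are a few differences. In the map phase, the intermediate KV pairs must be written to temporary files, which are then renamed to the agreed format. In the reduce phase, the worker reads the matching intermediate files, applies reduce, writes the final output file, and deletes the intermediate files. The end result is nReduce output files named mr-out-X.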
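
The "write to a temporary file, then rename" step can be sketched as follows. This is a minimal sketch under assumptions: the mr-<map>-<reduce> file names, the JSON-array encoding, the FNV-based ihash, and the helper names writeAtomically, writeIntermediate, readIntermediate, and roundTripCount are all illustrative choices, not taken from the original implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
	"os"
	"path/filepath"
)

type KeyValue struct {
	Key   string
	Value string
}

// ihash maps a key to a non-negative int; ihash(key) % nReduce picks the bucket.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// writeAtomically writes data to a temp file in dir and renames it to name
// only after the write succeeded, so readers never see a partial file.
func writeAtomically(dir, name string, data []byte) error {
	tmp, err := os.CreateTemp(dir, "mr-tmp-*")
	if err != nil {
		return err
	}
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmp.Name())
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, name))
}

// writeIntermediate splits kva into nReduce buckets and stores bucket r of
// map task mapNo as a JSON array in the assumed file name mr-<mapNo>-<r>.
func writeIntermediate(dir string, mapNo, nReduce int, kva []KeyValue) error {
	buckets := make([][]KeyValue, nReduce)
	for _, kv := range kva {
		r := ihash(kv.Key) % nReduce
		buckets[r] = append(buckets[r], kv)
	}
	for r, bucket := range buckets {
		data, err := json.Marshal(bucket)
		if err != nil {
			return err
		}
		if err := writeAtomically(dir, fmt.Sprintf("mr-%d-%d", mapNo, r), data); err != nil {
			return err
		}
	}
	return nil
}

// readIntermediate loads the pairs that map task mapNo left for reduce task r.
func readIntermediate(dir string, mapNo, r int) ([]KeyValue, error) {
	data, err := os.ReadFile(filepath.Join(dir, fmt.Sprintf("mr-%d-%d", mapNo, r)))
	if err != nil {
		return nil, err
	}
	var kva []KeyValue
	err = json.Unmarshal(data, &kva)
	return kva, err
}

// roundTripCount writes kva for map task 0 and counts the pairs read back.
func roundTripCount(kva []KeyValue, nReduce int) (int, error) {
	dir, err := os.MkdirTemp("", "mr-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	if err := writeIntermediate(dir, 0, nReduce, kva); err != nil {
		return 0, err
	}
	total := 0
	for r := 0; r < nReduce; r++ {
		got, err := readIntermediate(dir, 0, r)
		if err != nil {
			return 0, err
		}
		total += len(got)
	}
	return total, nil
}

func main() {
	n, err := roundTripCount([]KeyValue{{"a", "1"}, {"b", "1"}, {"a", "1"}}, 2)
	fmt.Println(n, err)
}
```

Creating the temp file in the same directory as the final file keeps the rename on one filesystem, so a worker that crashes mid-write leaves behind only a stray temp file and never a truncated intermediate file.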

P.S. Any call that could cause a data race must take the lock and release it promptly, and crashed workers must be handled.
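
One common way to handle crashed workers is a per-task timeout: if a task is still in progress when the timer fires, it goes back to idle so another worker can pick it up. The sketch below illustrates that idea only; the tracker type, its methods, and the 20 ms timeout are hypothetical and chosen to keep the demo fast, and this is not necessarily how the original solution does it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Task states, matching the 0/1/2 encoding used by the status slices.
const (
	idle = iota
	inProgress
	completed
)

// tracker is a hypothetical stand-in for the coordinator's status slice
// together with the mutex that guards it.
type tracker struct {
	mu     sync.Mutex
	status []int
}

// assign marks a task in-progress and starts a timer goroutine. If the task
// has not been committed when the timer fires, it is reset to idle.
func (t *tracker) assign(taskNo int, timeout time.Duration) {
	t.mu.Lock()
	t.status[taskNo] = inProgress
	t.mu.Unlock()

	go func() {
		time.Sleep(timeout)
		t.mu.Lock()
		defer t.mu.Unlock()
		if t.status[taskNo] == inProgress {
			t.status[taskNo] = idle
		}
	}()
}

// commit records that a worker finished the task.
func (t *tracker) commit(taskNo int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.status[taskNo] = completed
}

// get returns the current state of a task.
func (t *tracker) get(taskNo int) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.status[taskNo]
}

// simulateCrash assigns two tasks, commits only the first (the second
// worker "crashes"), waits well past the timeout, and returns both states.
func simulateCrash() (int, int) {
	t := &tracker{status: make([]int, 2)}
	timeout := 20 * time.Millisecond
	t.assign(0, timeout)
	t.assign(1, timeout)
	t.commit(0)
	time.Sleep(10 * timeout)
	return t.get(0), t.get(1)
}

func main() {
	finished, crashed := simulateCrash()
	fmt.Println("task 0 state:", finished, "task 1 state:", crashed)
}
```

The lock is held only for the short status check and update, never across the sleep, which matches the advice above about releasing it promptly.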

$ bash test-mr.sh 
*** Starting wc test.
2023/07/06 15:55:31 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- wc test: PASS
*** Starting indexer test.
2023/07/06 15:55:40 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- indexer test: PASS
*** Starting map parallelism test.
2023/07/06 15:55:44 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- map parallelism test: PASS
*** Starting reduce parallelism test.
2023/07/06 15:55:51 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- reduce parallelism test: PASS
*** Starting job count test.
2023/07/06 15:55:59 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- job count test: PASS
*** Starting early exit test.
2023/07/06 15:56:15 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- early exit test: PASS
*** Starting crash test.
2023/07/06 15:56:22 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- crash test: PASS
*** PASSED ALL TESTS