Lab 1 MapReduce
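Analysis
Lab 1 requires implementing worker.go, coordinator.go, and rpc.go.
First, the Lab 1 demo runs mrsequential.go. Reading the code shows that it is a sequential version of MapReduce. Its rough logic is as follows: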
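go run mrsequential.go wc.so pg*.txt reads wc.so and pg-*.txt, then:
- Loads the map and reduce functions via loadPlugin()
- Reads all pg-* files in order
- Maps each file's contents to KV pairs
- Stores them in intermediate
- Sorts the resulting intermediate KV pairs
- Iterates over all KV pairs in order
- Merges the values that share a key and applies reduce
- Writes the result to the specified output location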
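The steps above can be sketched as a small, self-contained program. This is only an illustration of the flow, not the lab's mrsequential.go: mapF and reduceF are stand-ins for the functions that wc.so would provide, and runSequential takes in-memory documents (a hypothetical simplification) instead of reading pg-*.txt from disk.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue mirrors the shape of the pair type used in the lab.
type KeyValue struct {
	Key   string
	Value string
}

// mapF is a stand-in for the plugin's Map: it emits (word, "1") per word.
func mapF(filename string, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
	kva := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// reduceF is a stand-in for the plugin's Reduce: it counts occurrences.
func reduceF(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// runSequential maps every document, sorts the intermediate pairs by key,
// then reduces each run of equal keys and returns one "key count" line per key.
func runSequential(docs map[string]string) []string {
	intermediate := []KeyValue{}
	for name, contents := range docs {
		intermediate = append(intermediate, mapF(name, contents)...)
	}
	sort.Slice(intermediate, func(i, j int) bool {
		return intermediate[i].Key < intermediate[j].Key
	})

	var out []string
	for i := 0; i < len(intermediate); {
		// Find the end of the run of pairs that share this key.
		j := i + 1
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			j++
		}
		values := []string{}
		for k := i; k < j; k++ {
			values = append(values, intermediate[k].Value)
		}
		key := intermediate[i].Key
		out = append(out, fmt.Sprintf("%v %v", key, reduceF(key, values)))
		i = j
	}
	return out
}

func main() {
	docs := map[string]string{"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
	for _, line := range runSequential(docs) {
		fmt.Println(line)
	}
}
```

Sorting first is what lets a single pass group equal keys; the distributed version keeps the same map and reduce functions and only changes who runs them and where the intermediate pairs live.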
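What we need to do is turn this sequential execution into a MapReduce execution. The relevant files are mrcoordinator.go and mrworker.go.
Their roles are:
mrcoordinator.go creates a new coordinator, passes it the names of the input files and the value of nReduce, and exits only after all tasks have completed.
mrworker.go loads the map and reduce functions and passes them to the worker.
rpc.go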
// Build the socket name.
func coordinatorSock() string {
	// Prefix: a path under the temporary directory.
	s := "/var/tmp/824-mr-"
	// Append the uid.
	s += strconv.Itoa(os.Getuid())
	return s
}
worker.go
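// An example of sending an RPC call.
func CallExample() {
	// Declare and initialize the arguments.
	args := ExampleArgs{}
	args.X = 99
	// Declare the reply structure.
	reply := ExampleReply{}
	// Invoke the worker's call helper and wait for the response.
	call("Coordinator.Example", &args, &reply)
	// Print the value from the reply.
	fmt.Printf("reply.Y %v\n", reply.Y)
}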
// The worker sends an RPC request to the coordinator and waits for the response.
func call(rpcname string, args interface{}, reply interface{}) bool {
	// Build the socket name.
	sockname := coordinatorSock()
	c, err := rpc.DialHTTP("unix", sockname)
	if err != nil {
		log.Fatal("dialing:", err)
	}
	defer c.Close()
	// Invoke Call from net/rpc's client.go, with args as the message, and wait for reply.
	err = c.Call(rpcname, args, reply)
	if err == nil {
		return true
	}
	fmt.Println(err)
	return false
}
coordinator.go
// Handle an RPC request.
func (c *Coordinator) Example(args *ExampleArgs, reply *ExampleReply) error {
	reply.Y = args.X + 1
	return nil
}
// Start a goroutine that listens for requests from workers.
func (c *Coordinator) server() {
	// Register the coordinator with net/rpc.
	rpc.Register(c)
	rpc.HandleHTTP()
	// Build the socket name and remove any stale socket file.
	sockname := coordinatorSock()
	os.Remove(sockname)
	// Listen on that socket.
	l, e := net.Listen("unix", sockname)
	if e != nil {
		log.Fatal("listen error:", e)
	}
	go http.Serve(l, nil)
}
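// Reports whether all tasks have completed.
// Note: the skeleton starts with ret := false; returning true unconditionally
// tells mrcoordinator.go that the job is already done.
func (c *Coordinator) Done() bool {
	//ret := false
	ret := true
	// Your code here.
	return ret
}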
// Create the coordinator.
func MakeCoordinator(files []string, nReduce int) *Coordinator {
	// files: names of the input files; nReduce: number of reduce tasks.
	c := Coordinator{}
	// Your code here.
	c.server()
	return &c
}
Implementation
The coordinator is responsible for assigning and managing tasks; the worker focuses only on executing them.
model
type Coordinator struct {
	// Your definitions here.
	nReduce           int
	nMap              int
	files             []string
	mapCompleted      int
	mapTasksStatus    []int // 0 = idle; 1 = in-progress; 2 = completed
	reduceCompleted   int
	reduceTasksStatus []int
	mu                sync.Mutex
}
type FetchArgs struct {
}
type FetchReply struct {
	TaskType int // 1 == map task ; 2 == reduce task ; 3 == all finished ; 4 == all task finished
	FileName string
	TaskNo   int
	NMap     int
	NReduce  int
}
type CommitArgs struct {
	TaskNo int
}
type CommitReply struct {
}
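To make these definitions concrete, here is a rough sketch of how a Fetch handler might hand out tasks under the mutex. It is not the original implementation: the handler name Fetch, the helper firstIdle, and the named constants are hypothetical, and the meanings given to TaskType 3 ("wait and retry") and 4 ("exit") are assumptions made for this sketch only. The RPC server wiring is left out so the example runs on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// Task status values, following the 0/1/2 encoding in the struct comment.
const (
	statusIdle       = 0
	statusInProgress = 1
	statusCompleted  = 2
)

// Task types. 1 and 2 follow the FetchReply comment; the meanings of
// 3 and 4 are ASSUMED here for illustration.
const (
	taskMap    = 1
	taskReduce = 2
	taskWait   = 3 // assumption: nothing idle right now, ask again later
	taskExit   = 4 // assumption: every task is finished, worker may exit
)

type FetchArgs struct{}

type FetchReply struct {
	TaskType int
	FileName string
	TaskNo   int
	NMap     int
	NReduce  int
}

type Coordinator struct {
	nReduce           int
	nMap              int
	files             []string
	mapCompleted      int
	mapTasksStatus    []int
	reduceCompleted   int
	reduceTasksStatus []int
	mu                sync.Mutex
}

// firstIdle returns the index of the first idle task, or -1 if none is idle.
func firstIdle(status []int) int {
	for i, s := range status {
		if s == statusIdle {
			return i
		}
	}
	return -1
}

// Fetch assigns a map task while the map phase is unfinished, then reduce
// tasks; reduce tasks are only handed out once every map task has completed.
func (c *Coordinator) Fetch(args *FetchArgs, reply *FetchReply) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	reply.NMap, reply.NReduce = c.nMap, c.nReduce
	switch {
	case c.mapCompleted < c.nMap:
		if i := firstIdle(c.mapTasksStatus); i >= 0 {
			c.mapTasksStatus[i] = statusInProgress
			reply.TaskType, reply.TaskNo, reply.FileName = taskMap, i, c.files[i]
		} else {
			reply.TaskType = taskWait
		}
	case c.reduceCompleted < c.nReduce:
		if i := firstIdle(c.reduceTasksStatus); i >= 0 {
			c.reduceTasksStatus[i] = statusInProgress
			reply.TaskType, reply.TaskNo = taskReduce, i
		} else {
			reply.TaskType = taskWait
		}
	default:
		reply.TaskType = taskExit
	}
	return nil
}

func main() {
	c := &Coordinator{
		nMap: 2, nReduce: 1,
		files:             []string{"pg-a.txt", "pg-b.txt"},
		mapTasksStatus:    make([]int, 2),
		reduceTasksStatus: make([]int, 1),
	}
	for i := 0; i < 3; i++ {
		var reply FetchReply
		c.Fetch(&FetchArgs{}, &reply)
		fmt.Printf("%+v\n", reply)
	}
}
```

Holding the lock for the whole handler keeps the scan and the status update atomic, so two workers calling Fetch at the same time cannot receive the same task.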
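Approach
The worker sends requests to the coordinator via call to fetch a task and then executes it (mrsequential.go is a good reference for the overall flow). The difference is that the map phase must write the intermediate KV pairs to temporary files and rename them to the agreed format; the reduce phase then reads those files, applies reduce to them, writes the final output file, and deletes the intermediate files. The end result is nReduce mr-out output files.
P.S. Calls that may race need to take the lock and release it promptly, and crashed workers must be handled.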
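The write-then-rename step described above could look roughly like the sketch below. Several details are assumptions rather than something the post states: the mr-X-Y file naming and the JSON encoding follow the hints in the lab handout, ihash follows the helper in the lab's worker.go skeleton, and writeIntermediate, readIntermediate, and roundTrip are hypothetical names introduced for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
	"os"
	"path/filepath"
)

type KeyValue struct {
	Key   string
	Value string
}

// ihash picks the reduce partition for a key.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// writeIntermediate splits kva into nReduce buckets and publishes each one
// as dir/mr-<mapNo>-<r>, writing a temp file first and renaming it after.
func writeIntermediate(dir string, mapNo, nReduce int, kva []KeyValue) error {
	buckets := make([][]KeyValue, nReduce)
	for _, kv := range kva {
		r := ihash(kv.Key) % nReduce
		buckets[r] = append(buckets[r], kv)
	}
	for r, bucket := range buckets {
		tmp, err := os.CreateTemp(dir, "mr-tmp-*")
		if err != nil {
			return err
		}
		enc := json.NewEncoder(tmp)
		for _, kv := range bucket {
			if err := enc.Encode(kv); err != nil {
				tmp.Close()
				os.Remove(tmp.Name())
				return err
			}
		}
		if err := tmp.Close(); err != nil {
			return err
		}
		final := filepath.Join(dir, fmt.Sprintf("mr-%d-%d", mapNo, r))
		if err := os.Rename(tmp.Name(), final); err != nil {
			return err
		}
	}
	return nil
}

// readIntermediate collects partition reduceNo from all nMap map outputs.
func readIntermediate(dir string, nMap, reduceNo int) ([]KeyValue, error) {
	var kva []KeyValue
	for m := 0; m < nMap; m++ {
		f, err := os.Open(filepath.Join(dir, fmt.Sprintf("mr-%d-%d", m, reduceNo)))
		if err != nil {
			return nil, err
		}
		dec := json.NewDecoder(f)
		for {
			var kv KeyValue
			if err := dec.Decode(&kv); err != nil {
				break // io.EOF ends the file; a real worker should report other errors
			}
			kva = append(kva, kv)
		}
		f.Close()
	}
	return kva, nil
}

// roundTrip writes kva as map task 0 into a scratch directory, reads every
// partition back, and returns how many pairs were recovered.
func roundTrip(kva []KeyValue, nReduce int) (int, error) {
	dir, err := os.MkdirTemp("", "mr-demo-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	if err := writeIntermediate(dir, 0, nReduce, kva); err != nil {
		return 0, err
	}
	total := 0
	for r := 0; r < nReduce; r++ {
		part, err := readIntermediate(dir, 1, r)
		if err != nil {
			return 0, err
		}
		total += len(part)
	}
	return total, nil
}

func main() {
	kva := []KeyValue{{"apple", "1"}, {"banana", "1"}, {"apple", "1"}}
	n, err := roundTrip(kva, 3)
	fmt.Println(n, err)
}
```

Renaming only after the file is fully written means a reduce task never sees a half-written intermediate file, even if the map worker dies midway; creating the temp file in the same directory keeps the rename on one filesystem.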
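One common way to handle a crashed worker is a per-task timeout: if a task is still in progress after some interval, the coordinator marks it idle again so another worker can pick it up. The sketch below assumes that approach; the post does not say how its implementation detects crashes. taskTable, assign, watch, commit, and simulateCrash are hypothetical names, and the timeout is shortened to milliseconds for the demo (the lab handout suggests ten seconds).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	idle = iota
	inProgress
	completed
)

// taskTable is a cut-down, hypothetical stand-in for the coordinator's
// status slice plus its mutex.
type taskTable struct {
	mu      sync.Mutex
	status  []int
	timeout time.Duration
}

// assign hands out the first idle task and starts a watchdog for it.
// It returns -1 when no task is idle.
func (t *taskTable) assign() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	for i, s := range t.status {
		if s == idle {
			t.status[i] = inProgress
			go t.watch(i)
			return i
		}
	}
	return -1
}

// watch resets the task to idle if nobody committed it within the timeout.
func (t *taskTable) watch(i int) {
	time.Sleep(t.timeout)
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.status[i] == inProgress {
		t.status[i] = idle
	}
}

// commit records that task i has finished.
func (t *taskTable) commit(i int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.status[i] = completed
}

// simulateCrash assigns the only task to a worker that never commits,
// waits well past the timeout, assigns again, commits, and asks once more.
func simulateCrash(timeoutMs int) (first, second, third int) {
	t := &taskTable{
		status:  make([]int, 1),
		timeout: time.Duration(timeoutMs) * time.Millisecond,
	}
	first = t.assign()        // worker A takes the task and "crashes"
	time.Sleep(5 * t.timeout) // long enough for the watchdog to fire
	second = t.assign()       // worker B is handed the same task
	t.commit(second)
	third = t.assign() // nothing is idle any more
	return
}

func main() {
	fmt.Println(simulateCrash(20))
}
```

Every read and write of status happens under mu, which is the locking discipline the note above refers to; the watchdog only resets a task that is still in progress, so a task committed in time is never handed out twice.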
$ bash test-mr.sh
*** Starting wc test.
2023/07/06 15:55:31 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- wc test: PASS
*** Starting indexer test.
2023/07/06 15:55:40 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- indexer test: PASS
*** Starting map parallelism test.
2023/07/06 15:55:44 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- map parallelism test: PASS
*** Starting reduce parallelism test.
2023/07/06 15:55:51 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- reduce parallelism test: PASS
*** Starting job count test.
2023/07/06 15:55:59 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- job count test: PASS
*** Starting early exit test.
2023/07/06 15:56:15 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- early exit test: PASS
*** Starting crash test.
2023/07/06 15:56:22 rpc.Register: method "Done" has 1 input parameters; needs exactly three
--- crash test: PASS
*** PASSED ALL TESTS