Pac Man: Understanding and Applying AI Search Algorithms (1)

Author: 光火

Email: victor_b_zhang@163.com

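UC Berkeley's course CS188: Introduction to AI is clearly structured and thorough, which makes it an excellent first course in AI. Pac Man, the programming assignment for its search-algorithms unit, is especially well designed and rewards repeated study and reflection. Since very few articles on it exist on Juejin, I plan to publish a series that walks through each Pac Man task in detail, to help readers learn AI and search algorithms.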

After getting the source code, the first step is to understand the overall structure of the project:

The assets folder holds static resources; initially it contains only one image, demo.png.

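The layouts folder holds map files, and we can also create our own maps for testing. A .lay file consists mainly of the following elements:
- %: a wall
- .: a food pellet that earns points
- o: a capsule that makes the ghosts scared
- P: the starting position of the player (Pac Man)
- G: a ghost's starting position (multiple ghosts are allowed)
By arranging these elements we can build our own map. Storing maps as text files is common in small games; a .lay file is essentially no different from an ordinary .txt file.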
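To make the symbol table concrete, here is a minimal sketch that reads a layout string into sets of coordinates. `parse_layout` and `LAYOUT` are hypothetical names, not the project's own layout loader, and flipping the rows so that y grows upward (the top text row gets the largest y) is an assumption about the coordinate convention.

```python
# Hypothetical helper: parse the text of a .lay file into its components.
# This is NOT the project's layout code; it only illustrates the five symbols.
def parse_layout(text):
    rows = [line for line in text.splitlines() if line.strip()]
    height = len(rows)
    walls, food, capsules, ghosts = set(), set(), set(), []
    pacman = None
    for r, row in enumerate(rows):
        y = height - 1 - r  # assumption: y grows upward, top row has the largest y
        for x, ch in enumerate(row):
            if ch == '%':
                walls.add((x, y))
            elif ch == '.':
                food.add((x, y))
            elif ch == 'o':
                capsules.add((x, y))
            elif ch == 'P':
                pacman = (x, y)
            elif ch == 'G':
                ghosts.append((x, y))  # several ghosts may appear
    return {"walls": walls, "food": food, "capsules": capsules,
            "pacman": pacman, "ghosts": ghosts}

# A small hand-made map in the same text format.
LAYOUT = """
%%%%%%%
%P . G%
% %o% %
%.   .%
%%%%%%%
"""

info = parse_layout(LAYOUT)
```

Because the format is plain text, a custom map is just another string (or file) fed through the same parser.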
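Code files we need to read:
- $utils.py:$ implements data structures such as Stack, Queue, PriorityQueue, PriorityQueueWithFunction and Counter. PriorityQueue is built on a min-heap and stores each entry as a triple, but we only need to care about the element item and its priority. PriorityQueueWithFunction subclasses PriorityQueue and lets the user supply a custom function that computes each item's priority.
- $pacman.py:$ defines the GameState class and a set of accessors. Through them you can obtain the positions and number of Pac Man and the ghosts, and also the successor state produced when a given agent takes a given action (which is crucial in game-playing problems). Food, capsules and the score can be accessed as well. In short, the GameState interface exposes the full state of the game.
- $game.py:$ defines the basic classes of the Pac Man game; the parts you need to read are marked in the source. Pay special attention to the Grid class, which we will use later when adding heuristics and rewriting the evaluation function.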
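As an illustration of how PriorityQueueWithFunction relates to PriorityQueue, here is a self-contained sketch modeled on the description above. It is a reconstruction, and the real utils.py may differ in details (for instance, it also provides update). Using `len` as the priority function is only an example: it ranks action lists by their length.

```python
import heapq

class PriorityQueue:
    """Min-heap priority queue in the style described for utils.py."""
    def __init__(self):
        self.heap = []
        self.count = 0  # insertion counter, breaks ties in FIFO order

    def push(self, item, priority):
        heapq.heappush(self.heap, (priority, self.count, item))
        self.count += 1

    def pop(self):
        _, _, item = heapq.heappop(self.heap)
        return item

    def isEmpty(self):
        return len(self.heap) == 0

class PriorityQueueWithFunction(PriorityQueue):
    """Priority is computed from the item by a user-supplied function."""
    def __init__(self, priorityFunction):
        self.priorityFunction = priorityFunction
        super().__init__()

    def push(self, item):
        PriorityQueue.push(self, item, self.priorityFunction(item))

# Example: shorter action lists come out first.
pq = PriorityQueueWithFunction(len)
pq.push(["North", "East", "East"])
pq.push(["South"])
pq.push(["West", "West"])
```

The subclass only changes push: callers no longer pass a priority, the stored function derives it.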
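Code files we need to write:
- $search.py:$ here we implement DFS, BFS, UCS and A* and apply them to path finding.
- $searchAgents.py:$ here we define our own heuristic functions and, for two specific mazes, modify the cost function so that Pac Man scores as high as possible.
- $multiAgents.py:$ here we implement Minimax and Alpha-Beta pruning and modify the evaluation function, ending up with an intelligent Pac Man game.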

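If some of the terms above are unfamiliar, don't worry: later sections explain the algorithms and the code structure in more detail, starting from the basics.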

Task 1: ignoring the ghosts for now, implement DFS, BFS, UCS and A* so that Pac Man reaches a single food pellet in the maze.

This task is done in search.py. The file already declares the four functions; each should return a sequence of actions, which Pac Man then follows.
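Since each search function returns a list of actions, a small replay helper is a handy way to sanity-check a result by hand. This is a hypothetical sketch: `DELTAS` and `replay` are illustrative names, and the direction strings are assumed to match the action names the project uses.

```python
# Hypothetical direction table; the project defines its own directions/actions.
DELTAS = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}

def replay(start, actions, walls):
    """Follow an action list from start; raise ValueError if a move hits a wall."""
    x, y = start
    for action in actions:
        dx, dy = DELTAS[action]
        x, y = x + dx, y + dy
        if (x, y) in walls:
            raise ValueError(f"{action} runs into a wall at {(x, y)}")
    return (x, y)  # final position after all actions

end = replay((1, 1), ["East", "East", "North"], walls=set())
```

If the returned position equals the goal, the action sequence is a valid plan for this simplified movement model.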

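All four functions take a problem argument, which is an instance of the PositionSearchProblem class in searchAgents.py. Through it we can obtain Pac Man's start state, expand states, and test whether a state is the goal.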

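Following the hints in the source comments and printing problem.getStartState(), we find that a state is simply Pac Man's current position (x, y). Because Pac Man can move back and forth freely, the same position can be reached along many paths, so we should use graph search and keep an explored set to avoid expanding the same node twice.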
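To try the search functions outside the Pac Man GUI, it helps to have a tiny stand-in problem offering the same three methods (getStartState, isGoalState, expand). `GridProblem` below is a hypothetical sketch, not PositionSearchProblem; it assumes a unit step cost and that expand returns (child, action, stepCost) triples, as the project's expand does.

```python
class GridProblem:
    """Hypothetical stand-in for PositionSearchProblem: states are (x, y)."""
    MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}

    def __init__(self, rows, start, goal):
        # rows[0] is the top row of the map; '%' marks a wall; y grows upward.
        height = len(rows)
        self.walls = {(x, height - 1 - r)
                      for r, row in enumerate(rows)
                      for x, ch in enumerate(row) if ch == '%'}
        self.start, self.goal = start, goal

    def getStartState(self):
        return self.start

    def isGoalState(self, state):
        return state == self.goal

    def expand(self, state):
        """Return (child, action, stepCost) triples; every move costs 1."""
        x, y = state
        children = []
        for action, (dx, dy) in self.MOVES.items():
            child = (x + dx, y + dy)
            if child not in self.walls:
                children.append((child, action, 1))
        return children

problem = GridProblem(["%%%%%", "%   %", "%%%%%"], start=(1, 1), goal=(3, 1))
```

Any of the search functions in this article can then be called as `depthFirstSearch(problem)` and the result checked without launching the game.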

Depth-First Search

def depthFirstSearch(problem):
    explored = set()
    result = util.Stack()
    frontier = util.Stack()

    result.push([])
    frontier.push(problem.getStartState())

    while True:
        if frontier.isEmpty():
            return []

        node = frontier.pop()
        action = result.pop()

        if problem.isGoalState(node):
            return action

        explored.add(node)
        children = problem.expand(node)

        for child in children:
            if child[0] not in explored and child[0] not in frontier.list:
                frontier.push(child[0])
                result.push(action + [child[1]])

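All four search algorithms can be implemented with the pattern above; only the data structure differs. DFS is usually written recursively, which in effect relies on the call stack; an iterative DFS needs an explicit Stack to simulate the call stack.
The tricky part of this task is recording the search path efficiently, because what we must return is an action sequence that guides Pac Man from the start to the goal. Reading the source of problem.expand shows that it returns a list whose elements are (child, action, stepCost) triples, where action is the move that takes Pac Man from the parent to the child; this is what we need to record. A straightforward approach is therefore to put these triples into the frontier and maintain the actions level by level, so that each entry carries the moves needed to reach that position from the start. This approach is general, and we use it in UCS.
Here, however, we use an extra result stack that tracks every push and pop on the frontier. The key to recording a path is to carry part of the parent's information over to the child and then append how the child was reached from the parent. That is the meaning of result.push(action + [child[1]]) in the code: action is the parent's content, i.e. how to reach the parent from the start, and [child[1]] is the move from the parent to the child; concatenating them gives the action sequence from the start to the current child.
Making result a Stack as well keeps its pushes and pops in lockstep with the frontier's, so when the goal state is popped, the action popped from result is exactly the path from the start to the goal.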
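The more general bookkeeping mentioned above, storing each state together with the actions that reach it, can be sketched as follows. `GraphProblem` is a hypothetical minimal problem built from a dict, and this variant skips already-expanded states when they are popped instead of checking frontier membership; treat it as one possible sketch, not the assignment's reference solution.

```python
class GraphProblem:
    """Hypothetical minimal problem: graph maps state -> [(child, action, stepCost)]."""
    def __init__(self, graph, start, goal):
        self.graph, self.start, self.goal = graph, start, goal

    def getStartState(self):
        return self.start

    def isGoalState(self, state):
        return state == self.goal

    def expand(self, state):
        return self.graph.get(state, [])

def dfs_with_paths(problem):
    """Graph-search DFS whose frontier holds (state, actions-from-start) pairs."""
    explored = set()
    frontier = [(problem.getStartState(), [])]  # a plain list used as a LIFO stack
    while frontier:
        node, path = frontier.pop()
        if problem.isGoalState(node):
            return path
        if node in explored:
            continue  # a copy of this state was already expanded
        explored.add(node)
        for child, action, _step_cost in problem.expand(node):
            if child not in explored:
                # parent's actions + the move from parent to child
                frontier.append((child, path + [action]))
    return []

GRAPH = {
    "S": [("A", "S->A", 1), ("B", "S->B", 1)],
    "A": [("G", "A->G", 1)],
    "B": [("A", "B->A", 1)],
}
actions = dfs_with_paths(GraphProblem(GRAPH, "S", "G"))
```

Because the path travels inside each frontier entry, no second stack has to be kept in sync; the price is one list copy per push.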

Breadth-First Search

def breadthFirstSearch(problem):
    explored = set()
    result = util.Queue()
    frontier = util.Queue()

    result.push([])
    frontier.push(problem.getStartState())

    while True:
        if frontier.isEmpty():
            return []

        node = frontier.pop()
        action = result.pop()

        if problem.isGoalState(node):
            return action

        explored.add(node)
        children = problem.expand(node)

        for child in children:
            if child[0] not in explored and child[0] not in frontier.list:
                frontier.push(child[0])
                result.push(action + [child[1]])

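As mentioned above, BFS is implemented exactly like DFS, except that the LIFO Stack is replaced by a FIFO Queue. Since BFS is usually written iteratively anyway, this code looks more natural.
Unlike DFS, BFS searches level by level, so it returns the path with the fewest steps; because every move in this maze has the same cost, that path is also the optimal one. Accordingly, in practice BFS earns a somewhat higher score than DFS. On the other hand, BFS typically takes longer and uses considerably more memory.
We use a Queue to track the frontier's enqueues and dequeues in sync. This code can serve as a template for similar situations.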
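To see the optimality difference on a unit-cost grid, the sketch below runs the same graph search with a FIFO queue (BFS) and with a LIFO stack (DFS) on an empty 5×5 grid. The grid, `MOVES` and `search` are assumptions for illustration; as in the code above, each cell enters the frontier at most once.

```python
from collections import deque

# Hypothetical move table; its order fixes the order in which children are generated.
MOVES = [("North", (0, 1)), ("South", (0, -1)), ("East", (1, 0)), ("West", (-1, 0))]

def neighbors(state, width, height):
    """Yield (child, action) for moves that stay inside a wall-free grid."""
    x, y = state
    for action, (dx, dy) in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny), action

def search(start, goal, width, height, use_queue):
    """Graph search: use_queue=True pops FIFO (BFS), False pops LIFO (DFS)."""
    frontier = deque([(start, [])])
    seen = {start}  # states already pushed, so each cell enters the frontier once
    while frontier:
        node, path = frontier.popleft() if use_queue else frontier.pop()
        if node == goal:
            return path
        for child, action in neighbors(node, width, height):
            if child not in seen:
                seen.add(child)
                frontier.append((child, path + [action]))
    return []

bfs_path = search((0, 0), (4, 4), 5, 5, use_queue=True)
dfs_path = search((0, 0), (4, 4), 5, 5, use_queue=False)
print(len(bfs_path), len(dfs_path))
```

BFS returns an 8-move path, the minimum here; DFS also reaches the goal but along a longer detour.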

Uniform Cost Search

def uniformCostSearch(problem):
    explored = set()
    frontier = util.PriorityQueue()
    initial = (problem.getStartState(), [], 0)

    frontier.push(initial, 0)

    while True:
        if frontier.isEmpty():
            return []

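        (node, result, value) = frontier.pop()

        # A state may be queued several times with different costs; only the
        # first (cheapest) copy popped is expanded, later copies are skipped.
        if node in explored:
            continue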

        if problem.isGoalState(node):
            return result

        explored.add(node)
        children = problem.expand(node)

        for child, action, cost in children:
            if child not in explored:
                temp = value + cost
                frontier.push((child, result + [action], temp), temp)

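Uniform cost search (UCS), which is essentially $Dijkstra$'s algorithm, can be understood as BFS along cost contours, because it expands nodes in order of their path cost from the root. This cost is real and exact, and should be distinguished from the estimates produced by the heuristic functions introduced later.
Since nodes must be dequeued and expanded according to cost, an ordinary Queue no longer meets our needs. We therefore use the min-heap-based PriorityQueue, which always expands the lowest-cost node in the frontier. The priority queue is already implemented in the source, so we can simply call it. Here is the source of PriorityQueue, which I think is implemented quite elegantly: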

class PriorityQueue:
    """
      Implements a priority queue data structure. Each inserted item
      has a priority associated with it and the client is usually interested
      in quick retrieval of the lowest-priority item in the queue. This
      data structure allows O(1) access to the lowest-priority item.
    """
    def  __init__(self):
        self.heap = []
        self.count = 0

    def push(self, item, priority):
        entry = (priority, self.count, item)
        heapq.heappush(self.heap, entry)
        self.count += 1

    def pop(self):
        (_, _, item) = heapq.heappop(self.heap)
        return item

    def isEmpty(self):
        return len(self.heap) == 0

    def update(self, item, priority):
        # If item already in priority queue with higher priority, update its priority and rebuild the heap.
        # If item already in priority queue with equal or lower priority, do nothing.
        # If item not in priority queue, do the same thing as self.push.
        for index, (p, c, i) in enumerate(self.heap):
            if i == item:
                if p <= priority:
                    break
                del self.heap[index]
                self.heap.append((priority, c, item))
                heapq.heapify(self.heap)
                break
        else:
            self.push(item, priority)

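Note that the priority here is a number equal to the cost: the lower the cost, the smaller the priority value and the earlier the item is popped. So in update, if the existing p is less than or equal to the new priority argument, the existing path is at least as good, and the method breaks without updating.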
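The update semantics can be checked directly. The class below is a trimmed copy of the PriorityQueue source above (push, pop and update only), followed by a short run: a larger number is ignored, a smaller number replaces the old one, and an item that is not yet present is simply pushed.

```python
import heapq

class PriorityQueue:
    # Trimmed copy of the project's class (push, pop and update only).
    def __init__(self):
        self.heap, self.count = [], 0

    def push(self, item, priority):
        heapq.heappush(self.heap, (priority, self.count, item))
        self.count += 1

    def pop(self):
        return heapq.heappop(self.heap)[2]

    def update(self, item, priority):
        for index, (p, c, i) in enumerate(self.heap):
            if i == item:
                if p <= priority:
                    break  # existing entry already has a smaller (better) number
                del self.heap[index]
                self.heap.append((priority, c, item))
                heapq.heapify(self.heap)
                break
        else:
            self.push(item, priority)  # item not present: same as push

pq = PriorityQueue()
pq.push("A", 5)
pq.push("B", 3)
pq.update("A", 7)  # 7 > 5: ignored, A keeps priority 5
pq.update("A", 1)  # 1 < 5: A's priority drops to 1
pq.update("C", 4)  # C not present: pushed with priority 4
order = [pq.pop() for _ in range(3)]
```

Note that update scans the whole heap, so it costs O(n) per call, which is one reason to avoid it in the search loop.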
Back to the UCS implementation: this time each element pushed onto the frontier is a triple, again in order to record the path:

initial = (problem.getStartState(), [], 0)
frontier.push(initial, 0)

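We don't reuse the bookkeeping from DFS and BFS here because, besides the actions, the path cost must also be accumulated. With this representation we also don't need the update function from the source: entries for the same state differ in path and cost, so they can be pushed straight into the priority queue. Such duplicates are harmless as long as a state that has already been expanded is skipped when one of its costlier copies is popped later.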
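A self-contained sketch of the same idea with Python's heapq, on a hypothetical weighted graph, shows both points: UCS prefers a cheap three-step route over a one-step route of cost 10, and a duplicate entry for a state that was already expanded is skipped when popped. The (cost, counter, state, path) entry layout and the `counter` tie-breaker are choices made for this sketch.

```python
import heapq

def ucs(graph, start, goal):
    """Uniform cost search; graph maps state -> list of (child, action, stepCost)."""
    counter = 0                          # tie-breaker so paths are never compared
    frontier = [(0, counter, start, [])]
    explored = set()
    while frontier:
        g, _, node, path = heapq.heappop(frontier)
        if node in explored:
            continue                     # stale duplicate: a cheaper copy was expanded
        if node == goal:
            return path, g               # goal test on pop keeps the result optimal
        explored.add(node)
        for child, action, cost in graph.get(node, []):
            if child not in explored:
                counter += 1
                heapq.heappush(frontier, (g + cost, counter, child, path + [action]))
    return [], float("inf")

# Hypothetical graph: a direct edge of cost 10 versus a three-step route of cost 3.
GRAPH = {
    "S": [("A", "S->A", 1), ("G", "S->G", 10)],
    "A": [("B", "A->B", 1)],
    "B": [("G", "B->G", 1)],
}
path, cost = ucs(GRAPH, "S", "G")
```

BFS on the same graph would pick the single edge `S->G`, since it counts steps rather than cost.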

A* Search

def aStarSearch(problem, heuristic=nullHeuristic):
    explored = set()
    frontier = util.PriorityQueue()
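    start = problem.getStartState()
    # Each entry carries the actual path cost g; only the priority adds h,
    # so that f(n) = g(n) + h(n) and heuristic values never accumulate along a path.
    frontier.push((start, [], 0), heuristic(start, problem))

    while True:
        if frontier.isEmpty():
            return []

        (node, result, g) = frontier.pop()

        # Skip stale copies of states that were already expanded.
        if node in explored:
            continue

        if problem.isGoalState(node):
            return result

        explored.add(node)
        children = problem.expand(node)

        for child, action, cost in children:
            if child not in explored:
                child_g = g + cost
                frontier.push((child, result + [action], child_g),
                              child_g + heuristic(child, problem))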

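A* search combines UCS and greedy search (greedy search is guided solely by the heuristic, and in general guarantees neither optimality nor completeness). The code structure of A* is almost the same as UCS; the only addition is the heuristic function heuristic. I first met the idea of a heuristic when studying the 8-puzzle in my first year of university, a fairly basic problem. Heuristics are everywhere, though: even transportation problems in operations research use heuristics to obtain an initial basic feasible solution quickly. A good heuristic can greatly reduce the number of nodes expanded and thus speed up the search. Even so, A* still takes exponential time in the worst case; it is simply much faster than the uninformed algorithms above on problems with large state spaces.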
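The key detail in A* is to keep two numbers apart: the path cost g(n) carried along with each state, and the priority f(n) = g(n) + h(n) used only for ordering. The self-contained grid sketch below (function names, the 10×10 open grid and the unit step cost are assumptions) also counts expanded states, so a Manhattan heuristic can be compared with a zero heuristic, under which A* behaves like UCS.

```python
import heapq

MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def null_heuristic(p, q):
    return 0  # with h = 0, f = g and A* orders nodes exactly like UCS

def astar(start, goal, walls, width, height, heuristic):
    """A* graph search on a 4-connected grid with unit step costs.

    Returns (actions, number_of_expanded_states)."""
    counter = 0
    # Entry layout: (f = g + h, tie-breaker, g, state, actions).
    frontier = [(heuristic(start, goal), counter, 0, start, [])]
    explored = set()
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node in explored:
            continue
        if node == goal:
            return path, len(explored)
        explored.add(node)
        x, y = node
        for action, (dx, dy) in MOVES.items():
            child = (x + dx, y + dy)
            inside = 0 <= child[0] < width and 0 <= child[1] < height
            if inside and child not in walls and child not in explored:
                child_g = g + 1  # carry the true cost g forward, never f
                counter += 1
                heapq.heappush(frontier, (child_g + heuristic(child, goal),
                                          counter, child_g, child, path + [action]))
    return None, len(explored)

path_h, expanded_h = astar((0, 0), (5, 0), set(), 10, 10, manhattan)
path_0, expanded_0 = astar((0, 0), (5, 0), set(), 10, 10, null_heuristic)
```

Both runs return a 5-move path, but the Manhattan run expands only the cells on the straight line, while the zero heuristic also expands every cell closer to the start than the goal.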
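If we don't use a heuristic (i.e. we keep the default nullHeuristic, which always returns 0), the A* code above degenerates into UCS. We therefore need to design our own heuristic. For A* to return optimal solutions, the heuristic should satisfy two properties (admissibility suffices for tree search; graph search with an explored set, as in our code, needs consistency):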
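- Admissibility: for every node, the heuristic estimate must not exceed the actual cost of reaching a goal state from that node, i.e. $h(n) \leq h^*(n)$, where $h^*(n)$ is the actual cost;
- Consistency: equivalent to a triangle inequality. For every node $n$ and every successor $n'$ of $n$ generated by any action $a$, the estimated cost of reaching the goal from $n$ is no greater than the step cost of getting to $n'$ plus the estimated cost of reaching the goal from $n'$, i.e. $h(n) \leq c(n, a, n') + h(n')$.
As long as it never exceeds the actual path cost, the larger the heuristic value, the better. Since Pac Man can only move in four directions, Manhattan distance never overestimates the remaining cost and is always at least as large as Euclidean distance, so for Pac Man, Manhattan distance is the better choice.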

def yourHeuristic(position, problem, info={}):
    goal = problem.goal
    return abs(position[0] - goal[0]) + abs(position[1] - goal[1])
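Both properties can be checked numerically on a small maze. The sketch below computes exact distances to the goal with a BFS started from the goal, then tests admissibility ($h(n) \leq h^*(n)$) and consistency ($h(n) \leq c(n, a, n') + h(n')$ with unit step cost) for Manhattan and Euclidean distance. The maze `ROWS`, the goal and all function names are hypothetical.

```python
import math
from collections import deque

def neighbors(cell, free):
    x, y = cell
    return [n for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)) if n in free]

def true_distances(goal, free):
    """BFS outward from the goal: exact unit-cost distance h*(n) for every reachable cell."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell, free):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return dist

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def check(h, goal, free):
    """Return (admissible, consistent) for heuristic h on this maze."""
    dist = true_distances(goal, free)
    admissible = all(h(c, goal) <= d for c, d in dist.items())
    consistent = all(h(c, goal) <= 1 + h(n, goal)   # each step costs 1
                     for c in dist for n in neighbors(c, free))
    return admissible, consistent

ROWS = ["%%%%%%%",
        "%   % %",
        "% %   %",
        "% % %%%",
        "%     %",
        "%%%%%%%"]
FREE = {(x, len(ROWS) - 1 - r) for r, row in enumerate(ROWS)
        for x, ch in enumerate(row) if ch != '%'}
GOAL = (5, 4)
```

Both heuristics pass, and Manhattan is never smaller than Euclidean, which is why it prunes more while remaining safe.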

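This completes Task 1 of the Pac Man assignment. So far we have used four search algorithms to solve a simple path-finding problem. In Task 2 we will learn more about cost functions and the project source, and in Task 3 we will turn to game-playing problems, using Minimax, Alpha-Beta pruning and a heuristic evaluation function to build a truly intelligent Pac Man game.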