[Translation] An O(ND) Difference Algorithm and Its Variations


Author: EUGENE W. MYERS

Paper: neil.fraser.name/writing/dif…

This article translates three sections of the Myers diff paper: ABSTRACT, Edit Graphs, and An O((M+N)D) Greedy Algorithm. The quoted passages reproduce the paper's original text as a reference for the reader.

ABSTRACT

The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D) time variation.

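Finding a longest common subsequence of two sequences A and B and finding a shortest edit script that transforms A into B have long been known to be dual problems. This paper shows that both are equivalent to finding a shortest/longest path in an edit graph. From this perspective it develops a simple algorithm that runs in $O(ND)$ time and space, where $N$ is the sum of the lengths of A and B and $D$ is the size of the minimum edit script for A and B. The algorithm performs well when the differences are small (the sequences are similar), so it is fast in typical applications. Under a basic stochastic model its expected running time is $O(N+D)$. The paper also gives a refinement that needs only $O(N)$ space, and a variation using suffix trees that runs in $O(N \lg N + D)$ time.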

KEY WORDS: longest common subsequence, shortest edit script, edit graph, file comparison


Edit Graphs

Let $A = a_1 a_2 \ldots a_N$ and $B = b_1 b_2 \ldots b_M$ be sequences of length N and M respectively. The edit graph for A and B has a vertex at each point in the grid (x,y), x∈[0,N] and y∈[0,M]. The vertices of the edit graph are connected by horizontal, vertical, and diagonal directed edges to form a directed acyclic graph. Horizontal edges connect each vertex to its right neighbor, i.e. (x-1,y)→(x,y) for x∈[1,N] and y∈[0,M]. Vertical edges connect each vertex to the neighbor below it, i.e. (x,y-1)→(x,y) for x∈[0,N] and y∈[1,M]. If $a_x = b_y$ then there is a diagonal edge connecting vertex (x-1,y-1) to vertex (x,y). The points (x,y) for which $a_x = b_y$ are called match points. The total number of match points between A and B is the parameter R characterizing the Hunt & Szymanski algorithm. It is also the number of diagonal edges in the edit graph as diagonal edges are in one-to-one correspondence with match points. Figure 1 depicts the edit graph for the sequences A = abcabba and B = cbabac.

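Let $A = a_1 a_2 \ldots a_N$ and $B = b_1 b_2 \ldots b_M$ be sequences of length $N$ and $M$ respectively. The vertices of the edit graph for A and B are the grid points $(x, y)$ with $x \in [0, N]$ and $y \in [0, M]$. They are connected by horizontal, vertical, and diagonal directed edges, forming a directed acyclic graph. Horizontal edges connect each vertex to its right neighbor, i.e. $(x-1, y) \to (x, y)$ for $x \in [1, N]$ and $y \in [0, M]$. Vertical edges connect each vertex to the neighbor below it, i.e. $(x, y-1) \to (x, y)$ for $x \in [0, N]$ and $y \in [1, M]$. If $a_x = b_y$, a diagonal edge connects vertex $(x-1, y-1)$ to vertex $(x, y)$, and $(x, y)$ is called a match point. Because diagonal edges correspond one-to-one with match points, the total number of match points between A and B equals the number of diagonal edges in the edit graph. Figure 1 shows the edit graph for the sequences $A = abcabba$ and $B = cbabac$.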
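The following JavaScript sketch makes the definition concrete by enumerating the match points of the Figure 1 example. The helper name `matchPoints` is hypothetical. Coordinates follow the 1-based convention of the text, so $a_x$ is `A[x - 1]`.

```javascript
// Hypothetical helper: list every match point (x, y), i.e. every (x, y)
// with a_x === b_y. Coordinates are 1-based as in the text.
function matchPoints(A, B) {
  const points = [];
  for (let x = 1; x <= A.length; x++) {
    for (let y = 1; y <= B.length; y++) {
      if (A[x - 1] === B[y - 1]) points.push([x, y]);
    }
  }
  return points;
}

// Each match point (x, y) is the head of exactly one diagonal edge
// (x - 1, y - 1) -> (x, y), so R also counts the diagonal edges.
const R = matchPoints("abcabba", "cbabac").length;
console.log(R); // 14
```

The count comes from 6 matches on `a`, 6 on `b`, and 2 on `c`, so the edit graph in Figure 1 has 14 diagonal edges.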

Figure 1: Edit graph

A trace of length L is a sequence of L match points, $(x_1, y_1)(x_2, y_2) \ldots (x_L, y_L)$, such that $x_i < x_{i+1}$ and $y_i < y_{i+1}$ for successive points $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, $i \in [1, L-1]$. Every trace is in exact correspondence with the diagonal edges of a path in the edit graph from (0,0) to (N,M). The sequence of match points visited in traversing a path from start to finish is easily verified to be a trace. Note that L is the number of diagonal edges in the corresponding path. To construct a path from a trace, take the sequence of diagonal edges corresponding to the match points of the trace and connect successive diagonals with a series of horizontal and vertical edges. This can always be done as $x_i < x_{i+1}$ and $y_i < y_{i+1}$ for successive match points. Note that several paths differing only in their non-diagonal edges can correspond to a given trace. Figure 1 illustrates this relation between paths and traces.

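A trace of length $L$ is a sequence of $L$ match points $(x_1, y_1)(x_2, y_2) \ldots (x_L, y_L)$ such that $x_i < x_{i+1}$ and $y_i < y_{i+1}$ for successive match points $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, $i \in [1, L-1]$. Every trace corresponds exactly to the diagonal edges of a path in the edit graph from $(0, 0)$ to $(N, M)$. The sequence of match points visited while traversing a path from start to finish is easily verified to be a trace, and $L$ is the number of diagonal edges in the corresponding path. To construct a path from a trace, take the diagonal edges that correspond to the trace's match points and connect successive diagonals with a series of horizontal and vertical edges. This is always possible because $x_i < x_{i+1}$ and $y_i < y_{i+1}$. Several paths that differ only in their non-diagonal edges can correspond to the same trace. Figure 1 illustrates this relation between traces and paths.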

A subsequence of a string is any string obtained by deleting zero or more symbols from the given string. A common subsequence of two strings, A and B, is a subsequence of both. Each trace gives rise to a common subsequence of A and B and vice versa. Specifically, $a_{x_1} a_{x_2} \ldots a_{x_L} = b_{y_1} b_{y_2} \ldots b_{y_L}$ is a common subsequence of A and B if and only if $(x_1, y_1)(x_2, y_2) \ldots (x_L, y_L)$ is a trace of A and B.

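A subsequence of a string is any string obtained by deleting zero or more symbols from it. A common subsequence of two strings A and B is a subsequence of both. Every trace gives rise to a common subsequence of A and B, and vice versa. Specifically, $a_{x_1} a_{x_2} \ldots a_{x_L} = b_{y_1} b_{y_2} \ldots b_{y_L}$ is a common subsequence of A and B if and only if $(x_1, y_1)(x_2, y_2) \ldots (x_L, y_L)$ is a trace of A and B.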
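Here is a small sketch of this correspondence. It assumes a trace is given as an array of 1-based `[x, y]` pairs. `traceToSubsequence` is a hypothetical helper: it checks the trace conditions and then reads off $a_{x_1} a_{x_2} \ldots a_{x_L}$. The example trace is one valid trace for the Figure 1 sequences, chosen here for illustration.

```javascript
// Hypothetical helper: verify that `trace` is a trace of A and B
// (match points, strictly increasing in both x and y) and return the
// common subsequence it spells out. Coordinates are 1-based.
function traceToSubsequence(A, B, trace) {
  let out = "";
  for (let i = 0; i < trace.length; i++) {
    const [x, y] = trace[i];
    const inside = x >= 1 && x <= A.length && y >= 1 && y <= B.length;
    if (!inside || A[x - 1] !== B[y - 1]) {
      throw new Error(`(${x},${y}) is not a match point`);
    }
    if (i > 0) {
      const [px, py] = trace[i - 1];
      if (!(px < x && py < y)) throw new Error("points must increase in x and y");
    }
    out += A[x - 1]; // a_x, which equals b_y at a match point
  }
  return out;
}

console.log(traceToSubsequence("abcabba", "cbabac", [[3, 1], [4, 3], [5, 4], [7, 5]])); // caba
```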

An edit script for A and B is a set of insertion and deletion commands that transform A into B. The delete command "x D" deletes the symbol $a_x$ from A. The insert command "x I $b_1, b_2, \ldots, b_t$" inserts the sequence of symbols $b_1 \ldots b_t$ immediately after $a_x$. Script commands refer to symbol positions within A before any commands have been performed. One must think of the set of commands in a script as being executed simultaneously. The length of a script is the number of symbols inserted and deleted.

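An edit script for A and B is a set of insertion and deletion commands that transforms A into B. The delete command "x D" deletes the symbol $a_x$ from A. The insert command "x I $b_1, b_2, \ldots, b_t$" inserts the symbols $b_1, b_2, \ldots, b_t$ immediately after $a_x$. Positions in the commands refer to A before any command has been performed, so the commands of a script should be thought of as executed simultaneously. The length of an edit script is the total number of symbols inserted and deleted.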

Every trace corresponds uniquely to an edit script. Let $(x_1, y_1)(x_2, y_2) \ldots (x_L, y_L)$ be a trace. Let $y_0 = 0$ and $y_{L+1} = M + 1$. The associated script consists of the commands: "x D" for $x \notin \{x_1, x_2, \ldots, x_L\}$, and "$x_k$ I $b_{y_k+1}, \ldots, b_{y_{k+1}-1}$" for k such that $y_k + 1 < y_{k+1}$. The script deletes N - L symbols and inserts M - L symbols. So for every trace of length L there is a corresponding script of length D = N + M - 2L. To map an edit script to a trace, simply perform all delete commands on A, observe that the result is a common subsequence of A and B, and then map the subsequence to its unique trace. Note that inverting the action of the insert commands gives a set of delete commands that map B to the same common subsequence.

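Every trace corresponds uniquely to an edit script. Let $(x_1, y_1)(x_2, y_2) \ldots (x_L, y_L)$ be a trace, and let $y_0 = 0$ and $y_{L+1} = M + 1$. The associated edit script consists of the commands "x D" for every $x \notin \{x_1, x_2, \ldots, x_L\}$, and "$x_k$ I $b_{y_k+1}, \ldots, b_{y_{k+1}-1}$" for every $k \in [0, L]$ such that $y_k + 1 < y_{k+1}$ (taking $x_0 = 0$, i.e. inserting before $a_1$). The script deletes $N - L$ symbols and inserts $M - L$ symbols, so every trace of length $L$ has a corresponding edit script of length $D = N + M - 2L$. To map an edit script to a trace, perform all delete commands on A. The result is a common subsequence of A and B, which maps to its unique trace. Inverting the insert commands likewise gives a set of delete commands that map B to the same common subsequence.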
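The mapping from a trace to its edit script can be sketched as follows. Two assumptions are not from the text. First, commands are rendered as strings such as `"1D"` and `"3Ib"`. Second, $x_0 = 0$, so a command `"0I..."` inserts before $a_1$. Because the commands form a set executed simultaneously, their order in the returned array carries no meaning. `traceToScript` is a hypothetical name.

```javascript
// Hypothetical helper: build the edit script associated with a trace of
// 1-based match points. Assumes x_0 = 0 ("0I..." inserts before a_1).
function traceToScript(A, B, trace) {
  const N = A.length, M = B.length;
  const script = [];
  const inTrace = new Set(trace.map(([x]) => x));
  // "x D" for every x that is not one of x_1, ..., x_L.
  for (let x = 1; x <= N; x++) {
    if (!inTrace.has(x)) script.push(`${x}D`);
  }
  // Sentinels: (x_0, y_0) = (0, 0) and y_{L+1} = M + 1 (that x is never read).
  const pts = [[0, 0], ...trace, [N + 1, M + 1]];
  for (let k = 0; k <= trace.length; k++) {
    const [xk, yk] = pts[k];
    const yNext = pts[k + 1][1];
    // "x_k I b_{y_k+1} ... b_{y_{k+1}-1}" whenever y_k + 1 < y_{k+1}.
    if (yk + 1 < yNext) script.push(`${xk}I${B.slice(yk, yNext - 1)}`);
  }
  return script;
}

const script = traceToScript("abcabba", "cbabac", [[3, 1], [4, 3], [5, 4], [7, 5]]);
console.log(script); // [ '1D', '2D', '6D', '3Ib', '7Ic' ]
```

For this trace of length $L = 4$, the script deletes 3 symbols and inserts 2, giving $D = N + M - 2L = 7 + 6 - 8 = 5$.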

Common subsequences, edit scripts, traces, and paths from (0,0) to (N,M) in the edit graph are all isomorphic formalisms. The edges of every path have the following direct interpretations in terms of the corresponding common subsequence and edit script. Each diagonal edge ending at (x,y) gives a symbol, $a_x$ (= $b_y$), in the common subsequence; each horizontal edge to point (x,y) corresponds to the delete command "x D"; and a sequence of vertical edges from (x,y) to (x,z) corresponds to the insert command "x I $b_{y+1}, \ldots, b_z$". Thus the number of vertical and horizontal edges in the path is the length of its corresponding script, the number of diagonal edges is the length of its corresponding subsequence, and the total number of edges is N+M-L. Figure 1 illustrates these observations.

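Common subsequences, edit scripts, traces, and paths from $(0, 0)$ to $(N, M)$ in the edit graph are all isomorphic formalisms. Each edge of a path has a direct interpretation in terms of the corresponding common subsequence and edit script. Each diagonal edge ending at $(x, y)$ contributes the symbol $a_x$ $(= b_y)$ to the common subsequence. Each horizontal edge to the point $(x, y)$ corresponds to the delete command "x D". A sequence of vertical edges from $(x, y)$ to $(x, z)$ corresponds to the insert command "x I $b_{y+1}, \ldots, b_z$". Thus the number of horizontal and vertical edges in a path is the length of its edit script, and the number of diagonal edges is the length of its common subsequence. The total number of edges is $N + M - L$. Figure 1 illustrates these observations.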
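The edge-by-edge reading of a path can be sketched the same way. As an assumption of this sketch, a path from $(0, 0)$ is encoded as a string of moves: `h` (horizontal), `v` (vertical), and `d` (diagonal). `interpretPath` is a hypothetical name.

```javascript
// Hypothetical helper: read a path (string of "h"/"v"/"d" moves from (0,0))
// as a common subsequence plus an edit script, per the rules above.
function interpretPath(A, B, moves) {
  let x = 0, y = 0, prev = "", lcs = "";
  const script = [];
  for (const m of moves) {
    if (m === "d") {
      if (x >= A.length || y >= B.length || A[x] !== B[y]) {
        throw new Error(`no diagonal edge into (${x + 1},${y + 1})`);
      }
      x++; y++;
      lcs += A[x - 1]; // diagonal edge ending at (x, y) contributes a_x (= b_y)
    } else if (m === "h") {
      x++;
      script.push(`${x}D`); // horizontal edge to (x, y) is "x D"
    } else if (m === "v") {
      y++;
      // A run of vertical edges from (x, y) to (x, z) is one "x I ..." command.
      if (prev === "v") script[script.length - 1] += B[y - 1];
      else script.push(`${x}I${B[y - 1]}`);
    } else {
      throw new Error(`unknown move ${m}`);
    }
    prev = m;
  }
  if (x !== A.length || y !== B.length) throw new Error("path does not end at (N, M)");
  return { lcs, script, edges: moves.length };
}

const r = interpretPath("abcabba", "cbabac", "hhdvddhdv");
console.log(r.lcs, r.edges, 7 + 6 - r.lcs.length); // caba 9 9
```

The move string `hhdvddhdv` is one path for the Figure 1 sequences. It has 4 diagonal edges and $N + M - L = 9$ edges in total.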

The problem of finding a longest common subsequence (LCS) is equivalent to finding a path from (0,0) to (N,M) with the maximum number of diagonal edges. The problem of finding a shortest edit script (SES) is equivalent to finding a path from (0,0) to (N,M) with the minimum number of non-diagonal edges. These are dual problems as a path with the maximum number of diagonal edges has the minimal number of non-diagonal edges (D+2L = M+N). Consider adding a weight or cost to every edge. Give diagonal edges weight 0 and non-diagonal edges weight 1. The LCS/SES problem is equivalent to finding a minimum-cost path from (0,0) to (N,M) in the weighted edit graph and is thus a special instance of the single-source shortest path problem.

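It follows that finding a longest common subsequence (LCS) is equivalent to finding a path from $(0, 0)$ to $(N, M)$ with the maximum number of diagonal edges. Likewise, finding a shortest edit script (SES) is equivalent to finding such a path with the minimum number of non-diagonal edges. These are dual problems, because a path with the maximum number of diagonal edges has the minimum number of non-diagonal edges ($D + 2L = M + N$). Now give every edge a weight, or cost: 0 for diagonal edges and 1 for non-diagonal edges. The LCS/SES problem then becomes finding a minimum-cost path from $(0, 0)$ to $(N, M)$ in the weighted edit graph. It is therefore a special instance of the single-source shortest path problem.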
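Because of this reduction, any single-source shortest-path method computes $D$. The sketch below uses a 0-1 BFS over the weighted edit graph. A 0-1 BFS is a deque-based specialization of Dijkstra's algorithm for edge weights 0 and 1. It is a generic technique chosen for illustration, not the algorithm of the paper, and it visits all $(N+1)(M+1)$ vertices. `sesCostByShortestPath` is a hypothetical name.

```javascript
// Sketch: SES length as a minimum-cost path in the weighted edit graph
// (diagonal edges cost 0, horizontal and vertical edges cost 1), using 0-1 BFS.
function sesCostByShortestPath(A, B) {
  const N = A.length, M = B.length;
  const id = (x, y) => y * (N + 1) + x; // vertex (x, y) -> array slot
  const dist = new Array((N + 1) * (M + 1)).fill(Infinity);
  const deque = [[0, 0]]; // plain array as deque: fine for a small sketch
  dist[id(0, 0)] = 0;
  while (deque.length > 0) {
    const [x, y] = deque.shift();
    const d = dist[id(x, y)];
    const relax = (nx, ny, w) => {
      if (nx > N || ny > M || d + w >= dist[id(nx, ny)]) return;
      dist[id(nx, ny)] = d + w;
      // 0-weight edges go to the front of the deque, 1-weight edges to the back.
      if (w === 0) deque.unshift([nx, ny]);
      else deque.push([nx, ny]);
    };
    // Diagonal edge (x, y) -> (x+1, y+1) exists when a_{x+1} = b_{y+1}.
    if (x < N && y < M && A[x] === B[y]) relax(x + 1, y + 1, 0);
    relax(x + 1, y, 1); // horizontal edge: delete a_{x+1}
    relax(x, y + 1, 1); // vertical edge: insert b_{y+1}
  }
  return dist[id(N, M)];
}

const D = sesCostByShortestPath("abcabba", "cbabac");
const L = (7 + 6 - D) / 2; // from D + 2L = M + N
console.log(D, L); // 5 4
```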

An O((M+N)D) Greedy Algorithm

The problem of finding a shortest edit script reduces to finding a path from (0,0) to (N,M) with the fewest number of horizontal and vertical edges. Let a D-path be a path starting at (0,0) that has exactly D non-diagonal edges. A 0-path must consist solely of diagonal edges. By a simple induction, it follows that a D-path must consist of a (D - 1)-path followed by a non-diagonal edge and then a possibly empty sequence of diagonal edges called a snake.

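Finding a shortest edit script reduces to finding a path from $(0, 0)$ to $(N, M)$ with the fewest horizontal and vertical edges. Let a D-path be a path starting at $(0, 0)$ that has exactly $D$ non-diagonal edges. A 0-path must then consist solely of diagonal edges. A simple induction shows that a D-path consists of a (D-1)-path, followed by a non-diagonal edge, followed by a possibly empty sequence of diagonal edges called a snake.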

Number the diagonals in the grid of edit graph vertices so that diagonal k consists of the points (x,y) for which x - y = k. With this definition the diagonals are numbered from -M to N. Note that a vertical (horizontal) edge with start point on diagonal k has end point on diagonal k - 1 (k + 1) and a snake remains on the diagonal in which it starts.

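Number the diagonals of the edit graph so that diagonal $k$ consists of the points $(x, y)$ with $x - y = k$. With this definition the diagonals are numbered from $-M$ to $N$. A vertical edge starting on diagonal $k$ ends on diagonal $k - 1$, and a horizontal edge starting on diagonal $k$ ends on diagonal $k + 1$. A snake stays on the diagonal where it starts.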

Lemma 1: A D-path must end on diagonal $k \in \{-D, -D+2, \ldots, D-2, D\}$.

Lemma 1: A D-path must end on some diagonal $k$ with $k \in \{-D, -D+2, \ldots, D-2, D\}$.

Proof: A 0-path consists solely of diagonal edges and starts on diagonal 0. Hence it must end on diagonal 0. Assume inductively that a D-path must end on diagonal k in $\{-D, -D+2, \ldots, D-2, D\}$. Every (D+1)-path consists of a prefix D-path, ending on say diagonal k, a non-diagonal edge ending on diagonal k+1 or k-1, and a snake that must also end on diagonal k+1 or k-1. It then follows that every (D+1)-path must end on a diagonal in $\{(-D)\pm 1, (-D+2)\pm 1, \ldots, (D-2)\pm 1, D \pm 1\} = \{-D-1, -D+1, \ldots, D-1, D+1\}$. Thus the result holds by induction.

Proof: A 0-path consists solely of diagonal edges and starts on diagonal 0, so it must end on diagonal 0. Assume inductively that a D-path ends on a diagonal $k \in \{-D, -D+2, \ldots, D-2, D\}$. Every (D+1)-path consists of three parts: a prefix D-path ending on some diagonal $k$, a non-diagonal edge ending on diagonal $k + 1$ or $k - 1$, and a snake that also ends on diagonal $k + 1$ or $k - 1$. Hence every (D+1)-path ends on a diagonal in $\{(-D) \pm 1, (-D+2) \pm 1, \ldots, (D-2) \pm 1, D \pm 1\} = \{-D-1, -D+1, \ldots, D-1, D+1\}$, and Lemma 1 follows by induction.

The lemma implies that D-paths end solely on odd diagonals when D is odd and even diagonals when D is even.

Lemma 1 implies that D-paths end only on odd diagonals when $D$ is odd, and only on even diagonals when $D$ is even.
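The same conclusion also follows from a direct count of edges. This is our own derivation, not the paper's inductive proof. Suppose a D-path has $h$ horizontal, $v$ vertical, and $s$ diagonal edges, so that $h + v = D$. Then:

```latex
\begin{aligned}
(x, y) &= (h + s,\; v + s) \\
k &= x - y = h - v = D - 2v, \qquad v \in \{0, 1, \ldots, D\} \\
\Rightarrow\; k &\in \{-D, -D+2, \ldots, D-2, D\}, \qquad k \equiv D \pmod{2}
\end{aligned}
```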

A D-path is furthest reaching in diagonal k if and only if it is one of the D-paths ending on diagonal k whose end point has the greatest possible row (column) number of all such paths. Informally, of all D-paths ending in diagonal k, it ends furthest from the origin, (0,0). The following lemma gives an inductive characterization of furthest reaching D-paths and embodies a greedy principle: furthest reaching D-paths are obtained by greedily extending furthest reaching (D - 1)-paths.

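A D-path is furthest reaching in diagonal $k$ if and only if it ends on diagonal $k$ and its end point has the greatest row (column) number among all D-paths ending on diagonal $k$. Informally, of all D-paths ending on diagonal $k$, it ends furthest from the origin $(0, 0)$. The following lemma characterizes furthest reaching D-paths inductively and embodies a greedy principle: furthest reaching D-paths are obtained by greedily extending furthest reaching (D-1)-paths.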

Lemma 2: A furthest reaching 0-path ends at (x,x), where x is $\min(z-1 \mid a_z \ne b_z \text{ or } z > M \text{ or } z > N)$. A furthest reaching D-path on diagonal k can without loss of generality be decomposed into a furthest reaching (D - 1)-path on diagonal k - 1, followed by a horizontal edge, followed by the longest possible snake or it may be decomposed into a furthest reaching (D - 1)-path on diagonal k+1, followed by a vertical edge, followed by the longest possible snake.

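Lemma 2: A furthest reaching 0-path ends at $(x, x)$, where $x = \min(z - 1 \mid a_z \ne b_z \text{ or } z > M \text{ or } z > N)$, i.e. $x$ is the length of the longest common prefix of A and B. Without loss of generality, a furthest reaching D-path on diagonal $k$ can be decomposed in one of two ways. The first is a furthest reaching (D-1)-path on diagonal $k - 1$, followed by a horizontal edge, followed by the longest possible snake. The second is a furthest reaching (D-1)-path on diagonal $k + 1$, followed by a vertical edge, followed by the longest possible snake.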

Proof: The basis for 0-paths is straightforward. As noted before, a D-path consists of a (D - 1)-path, a non-diagonal edge, and a snake. If the D-path ends on diagonal k, it follows that the (D - 1)-path must end on diagonal k±1 depending on whether a vertical or horizontal edge precedes the final snake. The final snake must be maximal, as the D-path would not be furthest reaching if the snake could be extended. Suppose that the (D - 1)-path is not furthest reaching in its diagonal. But then a further reaching (D - 1)-path can be connected to the final snake of the D-path with an appropriate non-diagonal move. Thus the D-path can always be decomposed as desired.

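Proof: The base case for 0-paths is straightforward. As noted before, a D-path consists of a (D-1)-path, a non-diagonal edge, and a snake. If the D-path ends on diagonal $k$, the (D-1)-path must end on diagonal $k \pm 1$, depending on whether a vertical or a horizontal edge precedes the final snake. The final snake must be maximal, because if it could be extended the D-path would not be furthest reaching. Now suppose the (D-1)-path is not furthest reaching in its diagonal. Then a further reaching (D-1)-path can be connected to the final snake of the D-path with an appropriate non-diagonal move. Therefore the D-path can always be decomposed as the lemma states.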

Given the endpoints of the furthest reaching (D - 1)-paths in diagonal k+1 and k- 1, say (x’,y’) and (x",y") respectively, Lemma 2 gives a procedure for computing the endpoint of the furthest reaching D-path in diagonal k. Namely, take the further reaching of (x’,y’+1) and (x"+1,y") in diagonal k and then follow diagonal edges until it is no longer possible to do so or until the boundary of the edit graph is reached. Furthermore, by Lemma 1 there are only D+1 diagonals in which a D-path can end. This suggests computing the endpoints of D-paths in the relevant D+1 diagonals for successively increasing values of D until the furthest reaching path in diagonal N - M reaches (N,M).

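Let $(x', y')$ and $(x'', y'')$ be the endpoints of the furthest reaching (D-1)-paths in diagonals $k + 1$ and $k - 1$ respectively. Given these, Lemma 2 yields a procedure for computing the endpoint of the furthest reaching D-path in diagonal $k$. Take whichever of $(x', y' + 1)$ and $(x'' + 1, y'')$ reaches further on diagonal $k$. Then follow diagonal edges until that is no longer possible or the boundary of the edit graph is reached. Moreover, by Lemma 1 a D-path can end on only $D + 1$ diagonals. This suggests computing the endpoints of D-paths in those $D + 1$ diagonals for successively increasing values of $D$, until the furthest reaching path in diagonal $N - M$ reaches $(N, M)$.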
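Here is a minimal JavaScript sketch of this single step. It uses 0-based string indexing, so $a_{x+1}$ is `A[x]`. The function name `furthestOnDiagonal` is our own choice, not the paper's. So is passing `-Infinity` for a neighbor that does not exist (when $k = \pm D$).

```javascript
// Hypothetical helper: endpoint of the furthest reaching D-path on diagonal k.
// xPrime  = x'  of the furthest reaching (D-1)-path on diagonal k + 1,
// xDouble = x'' of the furthest reaching (D-1)-path on diagonal k - 1.
// Pass -Infinity for a neighbor that does not exist (k = -D or k = D).
function furthestOnDiagonal(A, B, k, xPrime, xDouble) {
  // Candidates on diagonal k: (x', y' + 1) after a vertical edge and
  // (x'' + 1, y'') after a horizontal edge. Both satisfy y = x - k,
  // so "further reaching" simply means "larger x".
  let x = Math.max(xPrime, xDouble + 1);
  let y = x - k;
  // Follow the snake: a_{x+1} = b_{y+1} is A[x] === B[y] with 0-based strings.
  while (x < A.length && y < B.length && A[x] === B[y]) {
    x++;
    y++;
  }
  return [x, y];
}

const A = "abcabba", B = "cbabac";
// 0-path: the fictitious endpoint (0, -1) on diagonal 1 serves as x' = 0.
console.log(furthestOnDiagonal(A, B, 0, 0, -Infinity)); // [ 0, 0 ]
// D = 2, k = 0, from the 1-paths ending at (1, 0) on k = 1 and (0, 1) on k = -1.
console.log(furthestOnDiagonal(A, B, 0, 1, 0)); // [ 2, 2 ]
```

The first call also illustrates the base case of Lemma 2. Starting from $(0, 0)$, the snake runs along the longest common prefix of A and B, which is empty here.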

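// Outline only (N = length of A, M = length of B); Figure 2 gives the details.
for (let D = 0; D <= M + N; D++) {
  // By Lemma 1, a D-path can end only on diagonals -D, -D+2, ..., D.
  for (let k = -D; k <= D; k += 2) {
    // Find the endpoint of the furthest reaching D-path in diagonal k.
    // If that endpoint is (N, M), this D-path is an optimal solution: stop.
  }
}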

The outline above stops when the smallest D is encountered for which there is a furthest reaching D-path to (N,M). This must happen before the outer loop terminates because D must be less than or equal to M+N. By construction this path must be minimal with respect to the number of non-diagonal edges within it. Hence it is a solution to the LCS/SES problem.

D-pathD\text{-}path 最远到达 (N,M)(N, M) 点时,上述程序停止。这必须在外循环终止之前发生,因为 D<=M+ND <= M+N。通过构造,该路径必须相对于其 非对角边 的数量最小。因此,它是 LCS/SES 问题的解决方案。

In presenting the detailed algorithm in Figure 2 below, a number of simple optimizations are employed. An array, V, contains the endpoints of the furthest reaching D-paths in elements $V[-D], V[-D+2], \ldots, V[D-2], V[D]$. By Lemma 1 this set of elements is disjoint from those where the endpoints of the (D+1)-paths will be stored in the next iteration of the outer loop. Thus the array V can simultaneously hold the endpoints of the D-paths while the (D+1)-path endpoints are being computed from them. Furthermore, to record an endpoint (x,y) in diagonal k it suffices to retain just x because y is known to be x - k. Consequently, V is an array of integers where V[k] contains the row index of the endpoint of a furthest reaching path in diagonal k.

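Figure 2 below presents the detailed algorithm, which uses a number of simple optimizations. An array $V$ holds the endpoints of the furthest reaching D-paths in elements $V[-D], V[-D+2], \ldots, V[D-2], V[D]$. By Lemma 1, this set of elements is disjoint from the elements where the endpoints of the (D+1)-paths will be stored in the next iteration of the outer loop. Therefore $V$ can hold the endpoints of the D-paths while the (D+1)-path endpoints are being computed from them. Furthermore, to record an endpoint $(x, y)$ in diagonal $k$ it suffices to keep only $x$, because $y = x - k$. Consequently, $V$ is an array of integers in which $V[k]$ holds the row index of the endpoint of a furthest reaching path in diagonal $k$.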

Figure 2: Greedy algorithm
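The Figure 2 image did not survive in this copy. As a stand-in, here is a JavaScript sketch of the greedy algorithm as the surrounding text describes it. It is our reconstruction, not a verbatim transcription. It assumes 0-based strings and stores $V[k]$ at index `k + offset`, because JavaScript arrays have no negative indices. The `max` parameter plays the role of $MAX$ and defaults to $M + N$. Line numbers in the comments are the ones the text cites for Figure 2.

```javascript
// Sketch of the greedy O((M+N)D) search. Returns D, the length of a shortest
// edit script, or null when every edit script is longer than max.
function greedyDiffLength(A, B, max = A.length + B.length) {
  const N = A.length, M = B.length;
  const offset = max + 1; // diagonals -max-1 .. max+1 are touched
  const V = new Array(2 * max + 3).fill(0);
  V[1 + offset] = 0; // Line 1: fictitious endpoint (0, -1) on diagonal 1
  for (let D = 0; D <= max; D++) {
    for (let k = -D; k <= D; k += 2) { // Line 3: inner loop over diagonals
      let x;
      if (k === -D || (k !== D && V[k - 1 + offset] < V[k + 1 + offset])) {
        x = V[k + 1 + offset]; // vertical edge from diagonal k + 1
      } else {
        x = V[k - 1 + offset] + 1; // horizontal edge from diagonal k - 1
      }
      let y = x - k;
      // Line 9: follow the snake while a_{x+1} = b_{y+1}.
      while (x < N && y < M && A[x] === B[y]) {
        x++;
        y++;
      }
      V[k + offset] = x;
      if (x >= N && y >= M) return D; // Line 12: the length of an SES is D
    }
  }
  return null; // Line 14: the length of an SES is greater than max
}

console.log(greedyDiffLength("abcabba", "cbabac")); // 5
console.log(greedyDiffLength("abcabba", "cbabac", 4)); // null
```

When `V[k-1] === V[k+1]`, the horizontal move lands one column further ($V[k-1] + 1 > V[k+1]$), which is why the comparison is a strict `<`. With `max = 4` the sketch returns `null`, because the Figure 1 example needs $D = 5$.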

As a practical matter the algorithm searches D-paths where D≤MAX and if no such path reaches (N,M) then it reports that any edit script for A and B must be longer than MAX in Line 14. By setting the constant MAX to M+N as in the outline above, the algorithm is guaranteed to find the length of the LCS/SES. Figure 3 illustrates the D-paths searched when the algorithm is applied to the example of Figure 1. Note that a fictitious endpoint, (0, -1), set up in Line 1 of the algorithm is used to find the endpoint of the furthest reaching 0-path. Also note that D-paths extend off the left and lower boundaries of the edit graph proper as the algorithm progresses. This boundary situation is correctly handled by assuming that there are no diagonal edges in this region.

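In practice, the algorithm searches D-paths with $D \le MAX$. If no such path reaches $(N, M)$, it reports in Line 14 that every edit script for A and B is longer than $MAX$. Setting the constant $MAX$ to $M + N$, as in the outline above, guarantees that the algorithm finds the length of the LCS/SES. Figure 3 shows the D-paths searched when the algorithm is applied to the example of Figure 1. Note that the fictitious endpoint $(0, -1)$ set up in Line 1 of the algorithm is used to find the endpoint of the furthest reaching 0-path. Also note that, as the algorithm progresses, D-paths extend beyond the left and lower boundaries of the edit graph proper. This boundary situation is handled correctly by assuming that there are no diagonal edges in that region.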

Figure 3: Furthest reaching paths

The greedy algorithm takes at most O((M+N)D) time. Lines 1 and 14 consume O(1) time. The inner For loop (Line 3) is repeated at most (D+1)(D+2)/2 times because the outer For loop (Line 2) is repeated D+1 times and during its kth iteration the inner loop is repeated at most k times. All the lines within this inner loop take constant time except for the While loop (Line 9). Thus $O(D^2)$ time is spent executing Lines 2-8 and 10-13. The While loop is iterated once for each diagonal traversed in the extension of furthest reaching paths. But at most O((M+N)D) diagonals are traversed since all D-paths lie between diagonals -D and D and there are at most (2D+1)min(N,M) points within this band. Thus the algorithm requires a total of O((M+N)D) time. Note that just Line 9, the traversal of snakes, is the limiting step. The rest of the algorithm is $O(D^2)$. Furthermore the algorithm never takes more than O((M+N)MAX) time in the practical case where the threshold MAX is set to a value much less than M+N.

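The greedy algorithm takes at most $O((M+N)D)$ time. Lines 1 and 14 take $O(1)$ time. The inner For loop (Line 3) is repeated at most $(D+1)(D+2)/2$ times. This is because the outer For loop (Line 2) is repeated $D + 1$ times, and during its $k$th iteration the inner loop runs at most $k$ times. Every line in the inner loop takes constant time except the While loop (Line 9), so $O(D^2)$ time is spent executing Lines 2-8 and 10-13. The While loop iterates once for each diagonal traversed while extending furthest reaching paths. All D-paths lie between diagonals $-D$ and $D$, and this band contains at most $(2D+1)\min(N, M)$ points. Hence at most $O((M+N)D)$ diagonals are traversed, and the algorithm requires $O((M+N)D)$ time in total. Only Line 9, the traversal of snakes, is the limiting step; the rest of the algorithm is $O(D^2)$. Furthermore, the threshold $MAX$ is often set much smaller than $M + N$ in practice. In that case the algorithm never takes more than $O((M+N)MAX)$ time.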
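The $(D+1)(D+2)/2$ bound is an arithmetic series. Iteration $d$ of the outer loop runs the inner loop once for each $k \in \{-d, -d+2, \ldots, d\}$, that is, $d + 1$ times:

```latex
\sum_{d=0}^{D} (d + 1) \;=\; 1 + 2 + \cdots + (D + 1) \;=\; \frac{(D+1)(D+2)}{2} \;=\; O(D^2)
```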

The search of the greedy algorithm traces the optimal D-paths among others. But only the current set of furthest reaching endpoints are retained in V. Consequently, only the length of an SES/LCS can be reported in Line 12. To explicitly generate a solution path, $O(D^2)$ space is used to store a copy of V after each iteration of the outer loop. Let $V_d$ be the copy of V kept after the dth iteration. To list an optimal path from (0,0) to the point $V_d[k]$ first determine whether it is at the end of a maximal snake following a vertical edge from $V_{d-1}[k+1]$ or a horizontal edge from $V_{d-1}[k-1]$. To be concrete, suppose it is $V_{d-1}[k-1]$. Recursively list an optimal path from (0,0) to this point and then list the vertical edge and maximal snake to $V_d[k]$. The recursion stops when d = 0 in which case the snake from (0,0) to $(V_0[0], V_0[0])$ is listed. So with O(M+N) additional time and $O(D^2)$ space an optimal path can be listed by replacing Line 12 with a call to this recursive procedure with $V_D[N-M]$ as the initial point. A refinement requiring only O(M+N) space is shown in the next section.

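The greedy search traces the optimal D-paths, among others, but $V$ retains only the current set of furthest reaching endpoints. Consequently, Line 12 can report only the length of an SES/LCS. To explicitly generate a solution path, $O(D^2)$ space is used to store a copy of $V$ after each iteration of the outer loop. Let $V_d$ be the copy of $V$ kept after the $d$th iteration. Suppose we want to list an optimal path from $(0, 0)$ to the point $V_d[k]$. First determine whether that point ends a maximal snake that follows a vertical edge from $V_{d-1}[k+1]$ or a horizontal edge from $V_{d-1}[k-1]$. Suppose it is $V_{d-1}[k-1]$. Recursively list an optimal path from $(0, 0)$ to that point, then list the horizontal edge and the maximal snake to $V_d[k]$. The recursion stops when $d = 0$, in which case the snake from $(0, 0)$ to $(V_0[0], V_0[0])$ is listed. So with $O(M+N)$ additional time and $O(D^2)$ space, an optimal path can be listed. To do so, replace Line 12 with a call to this recursive procedure, using $V_D[N-M]$ as the initial point. The next section of the paper presents a refinement that needs only $O(M+N)$ space.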
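Here is a JavaScript sketch of this recovery. It assumes 0-based strings and stores `V[k]` at index `k + offset`. It differs from the procedure above in two ways: it walks back iteratively from $(N, M)$ instead of recursing, and it emits the non-diagonal edges as commands in the text's notation (strings like `"3Ib"`), not listing the snakes. The function name `greedyDiffScript` and its output format are hypothetical.

```javascript
// Sketch: greedy search plus O(D^2)-space recovery of an optimal path.
// snapshots[d] holds V_d, the copy of V after outer iteration d.
function greedyDiffScript(A, B) {
  const N = A.length, M = B.length, max = N + M;
  const offset = max + 1;
  const V = new Array(2 * max + 3).fill(0); // V[1 + offset] = 0: fictitious (0, -1)
  const snapshots = [];
  let D = 0;
  search: for (; D <= max; D++) {
    for (let k = -D; k <= D; k += 2) {
      let x = k === -D || (k !== D && V[k - 1 + offset] < V[k + 1 + offset])
        ? V[k + 1 + offset]
        : V[k - 1 + offset] + 1;
      let y = x - k;
      while (x < N && y < M && A[x] === B[y]) { x++; y++; }
      V[k + offset] = x;
      if (x >= N && y >= M) break search; // D is now the SES length
    }
    snapshots.push(V.slice()); // V_D
  }

  // Walk back from (N, M) = V_D[N - M], one non-diagonal edge per step.
  const edits = [];
  let x = N, y = M;
  for (let d = D; d > 0; d--) {
    const prev = snapshots[d - 1]; // V_{d-1}
    const k = x - y;
    const vertical = k === -d || (k !== d && prev[k - 1 + offset] < prev[k + 1 + offset]);
    const prevK = vertical ? k + 1 : k - 1;
    const prevX = prev[prevK + offset];
    const prevY = prevX - prevK;
    // Vertical edge (prevX, prevY) -> (prevX, prevY + 1): insert b_{prevY+1} after a_{prevX}.
    // Horizontal edge (prevX, prevY) -> (prevX + 1, prevY): delete a_{prevX+1}.
    edits.push(vertical ? { x: prevX, ins: B[prevY] } : { x: prevX + 1, del: true });
    x = prevX;
    y = prevY;
  }

  // Restore forward order and merge runs of vertical edges at the same x.
  const script = [];
  for (const e of edits.reverse()) {
    const last = script[script.length - 1];
    if (e.ins !== undefined && last && last.ins !== undefined && last.x === e.x) last.ins += e.ins;
    else script.push({ ...e });
  }
  return script.map((e) => (e.del ? `${e.x}D` : `${e.x}I${e.ins}`));
}

console.log(greedyDiffScript("abcabba", "cbabac")); // [ '1D', '2D', '3Ib', '6D', '7Ic' ]
```

On the Figure 1 example it produces three deletions and two single-symbol insertions, which is consistent with $D = 5$.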