problem 1

Description

Living cells on earth encode their genetic information in a chemical code made from repetitions of four basic compounds abbreviated as A, C, G, and T. These constituents are arranged in long linear sequences of triplets, e.g., GCT and TGA, referred to as codons. The meaning of a short subsequence of a genetic encoding is often not immediately clear, since there are different reading frames in which to interpret the information. For instance, the partial sequence CTGTGCCGCAATTGAC might specify any of the following codons, where question marks indicate missing data:

CTG TGC CGC AAT TGA C??
?CT GTG CCG CAA TTG AC?
??C TGT GCC GCA ATT GAC
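The three frames above can be produced mechanically. Pad the sequence on the left with zero, one, or two question marks, pad it on the right to a multiple of three, and cut it into triplets. Here is a minimal Python sketch. The function name `reading_frames` is hypothetical.

```python
def reading_frames(seq):
    """Return the three reading frames of seq as space-separated codons.

    '?' marks the missing bases needed to complete a triplet at either end.
    """
    frames = []
    for offset in range(3):
        padded = "?" * offset + seq             # shift the frame right by `offset`
        padded += "?" * (-len(padded) % 3)      # pad the tail to a full triplet
        codons = [padded[i:i + 3] for i in range(0, len(padded), 3)]
        frames.append(" ".join(codons))
    return frames


for frame in reading_frames("CTGTGCCGCAATTGAC"):
    print(frame)
```

Run on the sequence from the text, this prints the same three lines shown above.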

Certain codons have special meanings that resolve the reading frame ambiguity. The start codon GTG denotes the beginning of a gene-encoding region. One of the three termination codons TAG, TAA, and TGA marks the end of such a region.

A commonly asked question is whether a partial string acquired from a biological sample contains a complete gene-coding region under some reading frame. Design a nondeterministic finite automaton that accepts such a substring. Ensure that the start and termination codons are correctly aligned.

Solution

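The regular expression is given below. $r_1$ is the start codon, $r_2$ through $r_5$ together match every codon except the three termination codons, and $r_6$ matches a termination codon. Because every piece consumes exactly three bases, the termination codon is always in frame with the start codon: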

\begin{aligned}
r &= r_1\,(r_2 \mid r_3 \mid r_4 \mid r_5)^*\,r_6\\
r_1 &= GTG\\
r_2 &= [ACG][ACGT][ACGT]\\
r_3 &= T[CT][ACGT]\\
r_4 &= TA[CT]\\
r_5 &= TG[CGT]\\
r_6 &= TAG \mid TAA \mid TGA
\end{aligned}
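As a sanity check on $r_2$ through $r_5$, the sketch below uses Python's `re` module to enumerate all 64 codons. It confirms that the middle alternatives match exactly the 61 non-termination codons.

It also compiles the full expression $r$ and applies `re.search`. This is one possible way, assumed here and not part of the original solution, to test whether a sequence *contains* an aligned region in some reading frame. The match may begin at any offset, and the codon grouping keeps the stop codon in frame with GTG.

Note that `re` is a backtracking matcher, not an NFA simulator. The names `MIDDLE`, `GENE`, `middle_codons`, and `contains_gene` are hypothetical.

```python
import itertools
import re

# Pieces of r, copied from the expressions above.
R1 = "GTG"
R2 = "[ACG][ACGT][ACGT]"
R3 = "T[CT][ACGT]"
R4 = "TA[CT]"
R5 = "TG[CGT]"
R6 = "TAG|TAA|TGA"

MIDDLE = f"(?:{R2}|{R3}|{R4}|{R5})"          # any one non-stop codon
GENE = re.compile(f"{R1}{MIDDLE}*(?:{R6})")  # GTG, codons, in-frame stop
STOPS = {"TAG", "TAA", "TGA"}


def middle_codons():
    """Return every codon that r2 | r3 | r4 | r5 matches."""
    codons = ("".join(p) for p in itertools.product("ACGT", repeat=3))
    return {c for c in codons if re.fullmatch(MIDDLE, c)}


def contains_gene(seq):
    """True if some substring of seq matches r (GTG ... in-frame stop)."""
    return GENE.search(seq) is not None


print(len(middle_codons()))            # 61 = 64 codons minus the 3 stops
print(contains_gene("AAGTGCCCTAGA"))   # True: GTG CCC TAG
print(contains_gene("GTGATAG"))        # False: TAG is not in frame with GTG
```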

The NFA is shown in the figure below:
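One way to build an NFA for $r$ by hand is sketched in Python below, together with a simulation that tracks the set of reachable states. The state names (`S`, `G1`, `G2`, `B`, `X1`, `X2`, `T1`, `TA`, `TG`, `F`) are hypothetical and need not match the figure.

Two additions are assumptions made to handle "contains":

- The self-loop on the start state `S` lets the region begin anywhere.
- The self-loop on the accepting state `F` allows trailing bases after the stop codon.

The loop on `S` is also where the nondeterminism comes from: on `G`, the automaton both stays in `S` and starts reading GTG.

```python
BASES = "ACGT"


def build_nfa():
    """Transition relation as a dict: (state, base) -> set of next states."""
    delta = {}

    def add(src, symbols, dst):
        for s in symbols:
            delta.setdefault((src, s), set()).add(dst)

    add("S", BASES, "S")    # assumption: skip any prefix before the region
    add("S", "G", "G1")     # r1 = GTG
    add("G1", "T", "G2")
    add("G2", "G", "B")     # B: at a codon boundary inside the region
    add("B", "ACG", "X1")   # r2 = [ACG][ACGT][ACGT]
    add("X1", BASES, "X2")
    add("X2", BASES, "B")
    add("B", "T", "T1")
    add("T1", "CT", "X2")   # r3 = T[CT][ACGT]
    add("T1", "A", "TA")
    add("T1", "G", "TG")
    add("TA", "CT", "B")    # r4 = TA[CT]
    add("TG", "CGT", "B")   # r5 = TG[CGT]
    add("TA", "AG", "F")    # r6: TAA, TAG
    add("TG", "A", "F")     # r6: TGA
    add("F", BASES, "F")    # assumption: skip any suffix after the stop codon
    return delta


DELTA = build_nfa()


def accepts(seq):
    """Simulate the NFA by tracking the set of all reachable states."""
    current = {"S"}
    for base in seq:
        current = {r for q in current for r in DELTA.get((q, base), ())}
        if not current:
            return False
    return "F" in current


print(accepts("AAGTGCCCTAGA"))      # True
print(accepts("CTGTGCCGCAATTGAC"))  # False: no in-frame stop after GTG
```

After the start codon the transitions are deterministic: each codon is classified by its first two bases, exactly as $r_2$ through $r_6$ split them.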

problem 2

Description

Construct a deterministic version of the following nondeterministic finite automaton. Make sure to indicate the initial and terminal states. Label each DFA state with the set of NFA states to which it corresponds.
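The standard technique here is the subset construction. Each DFA state is the ε-closure of a set of NFA states, and a worklist processes each new set once. Below is a minimal Python sketch.

It assumes ε-edges are stored under the label `None`. The example NFA, for $a(b|c)^*$, is a small hypothetical one and not the NFA from this exercise. The names `eps_closure` and `subset_construction` are also hypothetical.

```python
from collections import deque

EPS = None  # assumption: epsilon transitions are stored under the label None


def eps_closure(states, delta):
    """All NFA states reachable from `states` using only epsilon edges."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def subset_construction(start, alphabet, delta):
    """Return (list of DFA states as frozensets, DFA transition dict)."""
    d0 = eps_closure({start}, delta)
    dstates, dtrans, work = [d0], {}, deque([d0])
    while work:
        s = work.popleft()
        for c in alphabet:
            moved = set()
            for q in s:
                moved |= delta.get((q, c), set())
            if not moved:
                continue  # no transition on c: a "-" entry in the table
            t = eps_closure(moved, delta)
            if t not in dstates:
                dstates.append(t)
                work.append(t)
            dtrans[(s, c)] = t
    return dstates, dtrans


# Hypothetical NFA for a(b|c)*, with 6 as its accepting state.
delta = {
    (0, "a"): {1},
    (1, EPS): {2, 6},
    (2, EPS): {3, 4},
    (3, "b"): {5},
    (4, "c"): {5},
    (5, EPS): {2, 6},
}
dstates, dtrans = subset_construction(0, "abc", delta)
for i, s in enumerate(dstates):
    print(f"d{i}", sorted(s))
```

For this example the sketch produces three DFA states: {0}, {1,2,3,4,6}, and {2,3,4,5,6}. A DFA state is accepting when its set contains an accepting NFA state.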

original NFA:

Solution

The converted NFA is:

The NFA-to-DFA iterations (subset construction) are as follows:

| Set name | DFA state | NFA states | a | b | c | d | e |
|---|---|---|---|---|---|---|---|
| $q_0$ | $d_0$ | $\{1,5,11,12\}$ | $d_1$ | - | $d_2$ | - | - |
| $q_1$ | $d_1$ | $\{2,3,4,7,9,13,15\}$ | - | $d_3$ | $d_4$ | $d_5$ | $d_6$ |
| $q_2$ | $d_2$ | $\{3,7,9,14,15\}$ | - | $d_3$ | $d_4$ | - | $d_6$ |
| $q_3$ | $d_3$ | $\{1,5,8,11,12\}$ | $d_1$ | - | $d_2$ | - | - |
| $q_4$ | $d_4$ | $\{2,4\}$ | - | - | - | $d_5$ | - |
| $q_5$ | $d_5$ | $\{1,3,5,7,9,11,12\}$ | $d_1$ | $d_3$ | $d_7$ | - | $d_6$ |
| $q_6$ | $d_6$ | $\{1,5,10,11,12\}$ | $d_1$ | - | $d_2$ | - | - |
| $q_7$ | $d_7$ | $\{2,3,4,7,9,14,15\}$ | - | $d_3$ | $d_4$ | $d_5$ | $d_6$ |

Each transition entry names the target DFA state, whose NFA-state set is listed in its own row. A "-" means there is no transition on that symbol.
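The table above can be transcribed directly into a transition map and run on input strings. Below is a minimal Python sketch. `DFA` and `run` are hypothetical names.

The table does not mark accepting states, so the sketch only reports the state reached. It returns `None` when the DFA gets stuck on a "-" entry.

```python
# Transitions transcribed from the table; missing keys are the "-" entries.
DFA = {
    "d0": {"a": "d1", "c": "d2"},
    "d1": {"b": "d3", "c": "d4", "d": "d5", "e": "d6"},
    "d2": {"b": "d3", "c": "d4", "e": "d6"},
    "d3": {"a": "d1", "c": "d2"},
    "d4": {"d": "d5"},
    "d5": {"a": "d1", "b": "d3", "c": "d7", "e": "d6"},
    "d6": {"a": "d1", "c": "d2"},
    "d7": {"b": "d3", "c": "d4", "d": "d5", "e": "d6"},
}


def run(word, start="d0"):
    """Return the DFA state reached after reading word, or None if stuck."""
    state = start
    for ch in word:
        state = DFA[state].get(ch)
        if state is None:
            return None
    return state


print(run("acd"))  # d0 -a-> d1 -c-> d4 -d-> d5
print(run("b"))    # None: d0 has no b-transition
```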

The resulting DFA is:

Syntax analysis exercises

problem 1

Description

Solution

problem 2

Description

Solution