problem 1
Description
Living cells on earth encode their genetic information in a chemical code made from repetitions of four basic compounds abbreviated as A, C, G, and T. These constituents are arranged in long linear sequences of triplets, e.g., GCT and TGA, referred to as codons. The meaning of a short subsequence of a genetic encoding is often not immediately clear, since there are different reading frames in which to interpret the information. For instance, the partial sequence CTGTGCCGCAATTGAC might specify any of the following codons, where question marks indicate missing data:
- CTG TGC CGC AAT TGA C??
- ?CT GTG CCG CAA TTG AC?
- ??C TGT GCC GCA ATT GAC
Certain codons have special meanings that resolve the reading frame ambiguity. The start codon GTG denotes the beginning of a gene-encoding region. One of the three termination codons TAG, TAA, and TGA marks the end of such a region.
A commonly asked question is whether a partial string acquired from a biological sample contains a complete gene-coding region under some reading frame. Design a nondeterministic finite automaton that accepts such a substring. Ensure that the start and termination codons are correctly aligned.
Solution
The regular expression is:
The NFA is shown in the figure below:
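As a rough cross-check of the alignment requirement, the condition can be sketched with Python's `re` module. The pattern below is an assumption derived from the problem statement (start codon GTG, then any number of whole codons, then one of TAG, TAA, or TGA); it is not necessarily the expression used in the original solution, and the name `contains_gene` is hypothetical.

```python
import re

# Assumed pattern: start codon, zero or more whole codons, then a stop codon.
# Matching whole triplets after GTG keeps the stop codon in the same reading frame.
GENE = re.compile(r"GTG(?:[ACGT]{3})*?(?:TAG|TAA|TGA)")

def contains_gene(seq: str) -> bool:
    """Return True if seq contains GTG followed, in frame, by a stop codon."""
    return GENE.search(seq) is not None

# The sequence from the description: GTG appears, but TGA sits in another frame.
print(contains_gene("CTGTGCCGCAATTGAC"))  # False
# GTG AAA TGA: start and stop codons are aligned.
print(contains_gene("GTGAAATGA"))         # True
```

Using `search` rather than `fullmatch` plays the role of the arbitrary prefix and suffix that the NFA is allowed to skip around the gene-coding region.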
problem 2
Description
Construct a deterministic version of the following nondeterministic finite automaton. Make sure to indicate the initial and terminal states. Label each DFA state with the set of NFA states to which it corresponds.
original NFA:
Solution
The converted NFA is:
The NFA-to-DFA iterations (subset construction) are as follows:
| Set Name | DFA States | NFA States | a | b | c | d | e |
|---|---|---|---|---|---|---|---|
| - | - | - |  |  |  |  |  |
| - |  |  |  |  |  |  |  |
| - | - |  |  |  |  |  |  |
| - | - | - |  |  |  |  |  |
| - | - | - | - |  |  |  |  |
| - |  |  |  |  |  |  |  |
| - | - | - |  |  |  |  |  |
| - |  |  |  |  |  |  |  |
The resulting DFA is:
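The iteration used for the NFA-to-DFA conversion can be sketched in Python as a minimal subset construction. The toy NFA below (for `(a|b)*ab`) is a hypothetical example and not the NFA from this problem; the function names and the use of the empty string for epsilon moves are also assumptions of this sketch. Each DFA state is a frozenset of NFA states, which matches the labeling requested in the description.

```python
from collections import deque

EPS = ""  # label used for epsilon moves (an assumption of this sketch)

def epsilon_closure(states, delta):
    """Return all NFA states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, accepting, delta, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    d_start = epsilon_closure({start}, delta)
    d_trans = {}
    seen = {d_start}
    queue = deque([d_start])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            # Union of the NFA moves on `sym` from every state in the set.
            moved = set()
            for s in current:
                moved |= set(delta.get((s, sym), ()))
            target = epsilon_closure(moved, delta)
            if not target:
                continue  # no move on sym: implicit dead state
            d_trans[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state is accepting if it contains any accepting NFA state.
    d_accepting = {S for S in seen if S & set(accepting)}
    return d_start, d_accepting, d_trans

# Hypothetical toy NFA for (a|b)*ab: state 0 is initial, state 2 is accepting.
toy_delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
d_start, d_accepting, d_trans = subset_construction(0, {2}, toy_delta, "ab")
for (state, sym), target in d_trans.items():
    print(sorted(state), sym, "->", sorted(target))
```

Each entry of `d_trans` corresponds to one cell of an iteration table like the one above: the row is the set of NFA states, the column is the input symbol, and the value is the resulting set.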