Hack Programming
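I have recently been taking the Coursera course Build a Modern Computer from First Principles: From Nand to Tetris (Project-Centered Course), and I am now at Unit 4.6: Hack Programming, Part 1. The course walks students through building a computer starting from nothing but Nand gates. The assembly lectures prompted some thoughts, which I am recording here.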
Part 0 Preamble
Fundamental Knowledge
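We are building a 16-bit computer in which every instruction is 16 bits wide, programmed in the Hack assembly language. There are three names we can operate on directly: A (the data/address register), D (the data register), and M (the memory register currently selected by A). The A register is loaded by the A-instruction, which is how we choose a memory address. Because one of the 16 bits is used to tell A-instructions from C-instructions, only 15 bits are left for the value, so the largest address is 2^15 - 1 = 32767, i.e. the bit pattern 0111 1111 1111 1111. The machine is word-addressed: each memory address corresponds to 16 bits of storage.
Note that each memory address holds one 16-bit word, not one byte. Do not confuse these memory registers with the CPU registers A and D!
The D register simply stores data. M is harder to put into words: on the left-hand side of an assignment, M denotes the memory register at address A; on the right-hand side, M is the value stored in the memory register at address A.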
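To make the left-hand/right-hand reading of M concrete, here is a rough Python model. It is an illustration only, not course material: the list `ram` stands for memory, and the variables `a` and `d` stand for the A and D registers. All of these names are assumptions made for this sketch.

```python
RAM_SIZE = 2**15          # 15 address bits: addresses 0 .. 32767
ram = [0] * RAM_SIZE      # one 16-bit word per address (modelled as Python ints)
a, d = 0, 0               # stand-ins for the A and D registers

a = 5                     # @5
d = a                     # D = A     -> D now holds the number 5
ram[a] = d                # M = D     -> M on the left: write RAM[5]
d = ram[a] + 1            # D = M + 1 -> M on the right: read RAM[5]

print(RAM_SIZE - 1, ram[5], d)   # 32767 5 6
```

Reading it this way, M is never a separate storage cell: it is always `ram[a]` for whatever A currently holds.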
The basics of the Hack programming language
A-instruction:
format:
@value
Sets the A register to the given value.
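As a small illustration of this format, the sketch below shows how `@value` maps onto a 16-bit word: a leading 0 marks an A-instruction, and the remaining 15 bits hold the value. The function name `encode_a_instruction` is made up for this example.

```python
def encode_a_instruction(value: int) -> str:
    """Return the 16-bit machine word for the Hack A-instruction `@value`."""
    if not 0 <= value <= 2**15 - 1:
        raise ValueError("value must fit in 15 bits (0..32767)")
    # Leading 0 marks an A-instruction; the other 15 bits are the value.
    return "0" + format(value, "015b")


print(encode_a_instruction(100))    # 0000000001100100
print(encode_a_instruction(32767))  # 0111111111111111
```

Trying to encode 32768 raises an error, which is the 15-bit limit seen from the assembler's side.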
C-instruction:
format:
dest = comp;jump
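The C-instruction determines all the possible operations of the computer.
It answers three questions:
1. What to compute (the right-hand value)
2. Where to store the result (the left-hand value)
3. What to do after the computation (the jump)
The bit-level encoding tables that follow are not repeated here. For details, see The Elements of Computing Systems.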
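The three questions map directly onto the three fields of `dest = comp;jump`. As a hedged sketch (the helper name `split_c_instruction` is invented here, and it only splits the text, it does not validate the fields against the official tables), the parsing step could look like this:

```python
def split_c_instruction(text):
    """Split 'dest = comp;jump' into (dest, comp, jump); absent parts are ''."""
    text = "".join(text.split("//")[0].split())  # drop comment and whitespace
    dest, comp, jump = "", text, ""
    if "=" in comp:
        dest, comp = comp.split("=", 1)   # where to store the result
    if ";" in comp:
        comp, jump = comp.split(";", 1)   # what to do afterwards
    return dest, comp, jump


print(split_c_instruction("D = M"))          # ('D', 'M', '')
print(split_c_instruction("D;JGT"))          # ('', 'D', 'JGT')
print(split_c_instruction("M = M + 1"))      # ('M', 'M+1', '')
print(split_c_instruction("0;JMP // loop"))  # ('', '0', 'JMP')
```

Both `dest` and `jump` are optional, which is why `D;JGT` and `D = M` are equally valid C-instructions.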
Part 1 Working with registers and memory
In fact, the only place where we can freely write a number is the value after @. We cannot write:
D = 100
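This is because the list of computations the C-instruction allows contains only the constants 0, 1, and -1. We can therefore operate on the D register directly only in limited ways; to assign any other value, we first load it into A and then use A as the right-hand value: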
@100
D = A
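The same trick can be written down as a rule. The following sketch uses a hypothetical helper, `load_constant_into_d`, which is not part of any real assembler, and emits the Hack instructions needed to put a constant into D. It relies on 0, 1, and -1 being directly available as computations and on `-A` being a valid computation for negative values.

```python
def load_constant_into_d(n: int) -> list:
    """Return Hack instructions that leave the constant n in D."""
    if n in (0, 1, -1):
        return [f"D={n}"]           # these three constants exist as comp values
    if 1 < n <= 32767:
        return [f"@{n}", "D=A"]     # route the number through A
    if -32767 <= n < -1:
        return [f"@{-n}", "D=-A"]   # load the magnitude, then negate it
    raise ValueError("constant does not fit this scheme")


print(load_constant_into_d(100))  # ['@100', 'D=A']
print(load_constant_into_d(-1))   # ['D=-1']
print(load_constant_into_d(-5))   # ['@5', 'D=-A']
```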
One thing that you have to understand is that computers never stand still.
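A computer never stops executing instructions. To keep it from running on past the end of our program into whatever follows in memory (NOPs, even if nothing is there), we usually terminate a program with an infinite loop.
**Built-in symbols:** R0 through R15 are predefined symbols with the values 0 through 15.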
Part 2 Branching, Variables, Iteration
Implementing branching
target:
if R0 > 0:
R1 = 1
else :
R1 = 0
Hack assembly code:
@R0
D = M
@8
D;JGT //if R0 > 0, goto 8
@R1
M = 0
@10
0;JMP
@R1
M = 1 //R1 = 1
@10
0;JMP //Terminator
"Without the line numbers, code like this is hard to follow (cryptic)."
"For convenience, we can use symbolic references to mark positions in the code:"
@R0
D = M
@POSITIVE
D;JGT
@R1
M = 0
@END
0;JMP
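(POSITIVE) // marks where the code for the true branch begins
@R1
M = 1
(END) // marks the end of the program
@END
0;JMP
"When a label is declared, every reference to that label is replaced by the number of the instruction that follows the label declaration."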
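To see the label rule in action, here is a minimal sketch of the pass an assembler might make over the symbolic program above. The names `BRANCH_PROGRAM` and `resolve_labels` are invented for this example, and the real assembler may be organized differently.

```python
BRANCH_PROGRAM = """
    @R0
    D = M
    @POSITIVE
    D;JGT
    @R1
    M = 0
    @END
    0;JMP
(POSITIVE)
    @R1
    M = 1
(END)
    @END
    0;JMP
"""


def resolve_labels(source):
    """Map each (LABEL) to the number of the instruction that follows it."""
    labels, instructions = {}, []
    for raw in source.splitlines():
        line = "".join(raw.split("//")[0].split())  # drop comments and spaces
        if line.startswith("(") and line.endswith(")"):
            # A label takes no space: it names the next instruction's number.
            labels[line[1:-1]] = len(instructions)
        elif line:
            instructions.append(line)
    resolved = [f"@{labels[ins[1:]]}" if ins[0] == "@" and ins[1:] in labels else ins
                for ins in instructions]
    return labels, resolved


labels, resolved = resolve_labels(BRANCH_PROGRAM)
print(labels)       # {'POSITIVE': 8, 'END': 10}
print(resolved[2])  # @8
print(resolved[6])  # @10
```

The resolved addresses 8 and 10 are the same numbers that were written by hand in the first version of the program.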
Implementing variables
target:
flips the values of
RAM[0] and RAM[1]
temp = R1
R1 = R0
R0 = temp
Hack assembly code:
@R1
D = M
@temp
M = D
"@temp: find some available memory register (say register n) and use it to represent temp. Each occurrence of @temp in the program will be translated into @n."
@R0
D = M
@R1
M = D
@temp
D = M
@R0
M = D
If you look at this piece of code you will realize that this program is relatively easy to read and debug.
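How `@temp` becomes a concrete address can be sketched as follows. This is an illustration under stated assumptions: the function name `allocate_variables` is made up, labels are ignored, and variables are handed out from address 16 upward in order of first appearance, which is the Hack convention (addresses 0 to 15 are taken by R0 to R15).

```python
def allocate_variables(instructions):
    """Replace symbolic @names with numeric addresses; new names start at 16."""
    symbols = {f"R{i}": i for i in range(16)}   # built-in symbols R0..R15
    next_free = 16                               # first address used for variables
    resolved = []
    for ins in instructions:
        name = ins[1:]
        if ins.startswith("@") and not name.isdigit():
            if name not in symbols:              # first time we see this variable
                symbols[name] = next_free
                next_free += 1
            ins = f"@{symbols[name]}"
        resolved.append(ins)
    return resolved, symbols


swap = ["@R1", "D=M", "@temp", "M=D", "@R0", "D=M",
        "@R1", "M=D", "@temp", "D=M", "@R0", "M=D"]
resolved, symbols = allocate_variables(swap)
print(symbols["temp"])  # 16
print(resolved[2])      # @16
```

Every later `@temp` resolves to the same address, which is what makes the symbol behave like a variable.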
Implementing iteration
Computes RAM[1] = 1 + 2 + 3 + ... + RAM[0]
// i, sum, and n are variables; R0 and R1 hold the input and output values
n = R0, i = 1, sum = 0
LOOP:
if i > n goto STOP
sum = sum + i
i = i + 1
goto LOOP
STOP:
R1 = sum
Hack assembly code:
@R0
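D = M // D = RAM[0] "M names a register, but on the right-hand side of an assignment it is read as a value; on the left-hand side it is treated as a variable"
"On reflection: could we operate on R0 directly and save ourselves the variable n? Here we could. But R0 is an initial value handed to us, and if several parts of a program use it, we should still avoid modifying it directly."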
@n
M = D // RAM[n] = R0
@i
M = 1 // RAM[i] = 1
@sum
M = 0 // RAM[sum] = 0
(LOOP)
@i
D = M // Get current i
@n
D = D - M // Jump judgement condition
@STOP
D;JGT // if i > n goto STOP
@sum
D = M // Get current sum
@i
D = D + M // D = sum + i
@sum
M = D // sum = sum + i
@i
M = M + 1 // i += 1
@LOOP
0;JMP // Forced jump
(STOP)
@sum
D = M
@R1
M = D // RAM[1] = sum
(END)
@END
0;JMP // Terminator
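One way to check a program like this without the course tools is a toy interpreter. The sketch below is not the official CPU emulator: `run_hack`, `JUMPS`, and `SUM_PROGRAM` are names invented here, variables are assumed to be allocated from address 16, the `comp` field is evaluated with Python's `eval` as a shortcut, and a `0;JMP` that targets its own `@` line is treated as a halt.

```python
SUM_PROGRAM = """
    @R0
    D = M
    @n
    M = D
    @i
    M = 1
    @sum
    M = 0
(LOOP)
    @i
    D = M
    @n
    D = D - M
    @STOP
    D;JGT
    @sum
    D = M
    @i
    D = D + M
    @sum
    M = D
    @i
    M = M + 1
    @LOOP
    0;JMP
(STOP)
    @sum
    D = M
    @R1
    M = D
(END)
    @END
    0;JMP
"""

JUMPS = {"": lambda v: False, "JMP": lambda v: True, "JEQ": lambda v: v == 0,
         "JGT": lambda v: v > 0, "JGE": lambda v: v >= 0, "JNE": lambda v: v != 0,
         "JLT": lambda v: v < 0, "JLE": lambda v: v <= 0}


def run_hack(source, ram, max_steps=100_000):
    """Assemble `source` and execute it on `ram` (a list of ints)."""
    # Pass 1: strip comments and whitespace, note where each (LABEL) points.
    labels, code = {}, []
    for raw in source.splitlines():
        line = "".join(raw.split("//")[0].split())
        if line.startswith("("):
            labels[line[1:-1]] = len(code)
        elif line:
            code.append(line)
    # Pass 2: replace symbols by numbers; new names are variables from 16 up.
    symbols = {f"R{i}": i for i in range(16)}
    symbols.update(labels)
    next_free = 16
    for k, ins in enumerate(code):
        if ins.startswith("@") and not ins[1:].isdigit():
            if ins[1:] not in symbols:
                symbols[ins[1:]] = next_free
                next_free += 1
            code[k] = f"@{symbols[ins[1:]]}"
    # Execute until the program runs off the end or parks in its final loop.
    a = d = pc = 0
    for _ in range(max_steps):
        if pc >= len(code):
            break
        ins = code[pc]
        if ins.startswith("@"):
            a, pc = int(ins[1:]), pc + 1
            continue
        dest, rest = ins.split("=") if "=" in ins else ("", ins)
        comp, jump = rest.split(";") if ";" in rest else (rest, "")
        m = ram[a] if "M" in comp else 0
        # Shortcut: comp strings such as D-M are valid Python once ! becomes ~.
        value = eval(comp.replace("!", "~"), {"__builtins__": {}}, {"A": a, "D": d, "M": m})
        value = (value + 2**15) % 2**16 - 2**15  # wrap to a signed 16-bit word
        target = a                               # jump target is A before any update
        if "M" in dest:
            ram[a] = value
        d = value if "D" in dest else d
        a = value if "A" in dest else a
        if not JUMPS[jump](value):
            pc += 1
        elif jump == "JMP" and target == pc - 1:
            break  # `@END` / `0;JMP` jumping to itself: treat as halt
        else:
            pc = target
    return ram


ram = [0] * 64
ram[0] = 10
run_hack(SUM_PROGRAM, ram)
print(ram[1])  # 55
```

After the run, the variables n, i, and sum sit at RAM[16], RAM[17], and RAM[18], so inspecting those cells shows the loop state at the moment it stopped.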
Part 3 Pointers
Implementing pointers
target:
for(i = 0; i < n; i++){
arr[i] = -1
}
//arr is an address
//suppose that arr = 100 and n = 10
The notion of arrays gets lost in the translation.
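When learning C, we often hear that "the name of an array is the address of its first element." In Hack this is literal: we set the value of the array variable to the address of the array's first element.
I had just been wondering: D and M can each appear as both a left-hand and a right-hand value, so what about A? Here it is. Assigning to A selects the memory register whose address is the new value of A, so it is usually paired with M, as in this problem.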
"arr = 100"
@100
D = A
@arr
M = D
"n = 10"
@10
D = A
@n
M = D
"initialize i = 0"
@i
M = 0
(LOOP)
"if (i == n) goto end"
@i
D = M
@n
D = D - M
@END
D;JEQ
"RAM[arr + i] = -1"
@arr
D = M // Get arr
@i
A = D + M // A->RAM[arr + i]
M = -1 // RAM[arr + i] = -1
"i++"
@i
M = M + 1
@LOOP
0;JMP
(END)
@END
0;JMP
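Once the notions of left-hand and right-hand values are clear, the mechanism is not hard to work out. A variable that stores the memory address of another variable is called a pointer. To access memory through a pointer we need an operation such as A = M **(set the address register to the contents of some memory register)**. In this example, i and arr are both used as pointers: arr supplies the base address and i the offset added to it.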
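The pointer access can be mirrored step by step in Python. This is only a sketch: the function name `fill_array` is invented, and the addresses 16, 17, and 18 are what I would expect the assembler to assign to arr, n, and i given the order in which they first appear.

```python
def fill_array(ram, base, n, value=-1):
    """Mirror of the Hack loop: RAM[arr + i] = -1 for i in 0 .. n-1."""
    ARR, N, I = 16, 17, 18       # assumed addresses of the variables arr, n, i
    ram[ARR] = base              # arr = 100
    ram[N] = n                   # n = 10
    ram[I] = 0                   # i = 0
    while ram[I] - ram[N] != 0:  # D = i - n; D;JEQ leaves the loop
        a = ram[ARR] + ram[I]    # A = D + M: the address register now points at arr[i]
        ram[a] = value           # M = -1 writes through the pointer
        ram[I] += 1              # i++
    return ram


ram = fill_array([0] * 128, base=100, n=10)
print(ram[99], ram[100], ram[109], ram[110])  # 0 -1 -1 0
```

The array itself never appears as a variable: only its base address is stored, and each element is reached by computing an address and assigning it to A.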
To close, a quote from the course instructor Shimon Schocken: “Simple-minded people are impressed by sophisticated things, and sophisticated people are impressed by simple things. And, in particular when the simple things are fantastically expressive, they are very impressive.”
The low-level principles of a computer are surely exactly this kind of thing: simple, yet striking.