Polymorphism of Data in a Computer

Polymorphism means the same thing can take multiple forms, and data in a computer behaves exactly like that.

Basically, all data in a computer is a binary number of limited size, and its meaning is defined by how it is treated. For example, take a byte containing 10010011. When you treat it as an unsigned number, it's 147; when you treat it as a signed (two's complement) number, it's -109; when you treat it as a minifloat, an 8-bit floating-point number (say, 1 sign bit, 4 exponent bits and 3 mantissa bits with an exponent bias of 7), it's -0.04296875; when you treat it as an address, it's a pointer to the memory cell at index 147; when you treat it as an instruction, it's decoded according to an ISA and executed on a CPU; when you treat it as a code point in some character encoding, it's a character in that encoding (in Windows-1252, for instance, 0x93 is the left double quotation mark “); and you can also give it a meaning of your own that is relevant to your program.

In a computer, when a program runs, the data it uses must be in registers or memory; if the data sits in the file system, it must be loaded into memory before it can be used. Memory is an array of memory cells indexed from 0, so the index of each memory cell is its address. Registers in the CPU have names, which assembly code refers to and which the assembler translates into register fields encoded in the instruction. Registers and memory cells are the locations where our programs store data: a register is accessed through the register field encoded in an instruction, and a memory cell through its address.

With this picture of data in a computer, it is quite simple to clear up a common confusion between values and addresses. In C or C++, we can pass a value to a function or pass an address to a function. Either way, the function receives its argument by the same mechanism: the content (the value itself, or the address) is copied into a local variable. The only difference is whether the content of that local variable is treated as a value or as a pointer holding that address. So the semantics of the content depends on how it is treated later on. In C and C++, however, the type system enforces that treatment through the declared type, e.g. T versus T*. With a weak type system or a uni-type system, data is just data: no meaning is attached to it, and its semantics is defined by its use. With a strong type system, each piece of data is tagged with a type, which constrains its behavior, that is, how it can be used.

Even inside the CPU, however, register classes play a role somewhat similar to that of types for data. For instance, general-purpose registers store any data, with no assumption about how it will be used, whereas the PC (program counter) register stores the address of the next instruction to be executed, and the FP (frame pointer) register, often a general-purpose register reserved for this role by convention, stores the address of the current stack frame. So the PC and FP registers are for storing addresses, and their contents are treated as addresses of memory cells.

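In a uni-type system, we have great flexibility in our programs, because everything is just a stream of bits: a program is a bit stream, and a number is a bit stream as well. Therefore, unless something enforces that program code is read-only (for example, memory protection that marks code pages as non-writable), a program can modify itself on the fly. This idea leads to some interesting concepts, such as the untyped lambda calculus, where functions are the only kind of value, so code and data are the same kind of thing.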