DBMS -- Pages and Tuples: Practice + Study Notes

Page

A simple picture of how the data is organized:

[image: data organization diagram]

Question: Assume that a data file is composed of 4KB pages, where each page is structured as follows:

[image: page structure diagram]

The start of the page contains a tuple directory which is a sequence of three-byte values, where the first 12 bits contain the offset into the page of the tuple and the second 12 bits contain the tuple length.
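To make the entry format concrete, here is a minimal sketch of decoding one 3-byte directory entry. The `DirEntry` type and the `decodeDirEntry` name are made up for illustration, and the sketch assumes the bytes are stored most-significant first, so that the "first 12 bits" are byte 0 plus the high nibble of byte 1. The question does not spell out the bit order, so treat that as an assumption. Note that 12 bits are just enough for a 4KB page, since 2^12 = 4096.

```c
// Hypothetical helper (not part of the exercise): decode one 3-byte
// tuple-directory entry into an offset and a length.
// Assumption: byte 0 holds the most significant bits of the entry.
typedef struct {
    unsigned int offset; // tuple offset within the page, 0..4095
    unsigned int length; // tuple length in bytes, 0..4095
} DirEntry;

DirEntry decodeDirEntry(const unsigned char b[3])
{
    DirEntry e;
    // first 12 bits: all of byte 0, then the high nibble of byte 1
    e.offset = ((unsigned int)b[0] << 4) | (unsigned int)(b[1] >> 4);
    // second 12 bits: the low nibble of byte 1, then all of byte 2
    e.length = ((unsigned int)(b[1] & 0x0f) << 8) | (unsigned int)b[2];
    return e;
}
```

For example, the bytes 0x12 0x30 0x40 would decode to offset 0x123 and length 0x040 under this assumption.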

Write a C function that takes three parameters: an open file descriptor, a page number and a record number and reads the data for the corresponding record. Do not read the whole page; read just enough of the data to solve the problem. Dynamically allocate a memory buffer large enough to hold the tuple data when read in. The function should return a pointer to the start of the tuple memory buffer.

The function should do appropriate error-checking and return NULL in the case that any operation cannot be completed. Use the following function template:

char *getTuple(int inFile, int pageNumber, int recNumber) { ... }

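Solution:
Key points: each tuple directory entry is 24 bits (3 bytes) and holds the tuple's offset and length.
inFile: the file descriptor
pageNumber: the page number
recNumber: the record number, i.e. the tuple number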

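// first rough idea (pseudocode; Page, Tuple, get_page and get_tuple are hypothetical helpers)
char *getTuple(int inFile, int pageNumber, int recNumber) {
    Page p = get_page(inFile, pageNumber);   // would read the whole page, which the question forbids
    Tuple t = get_tuple(p, recNumber);       // then pick the tuple out of the in-memory page
}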

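Answer:
#include <stdlib.h>     // malloc, free
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // lseek, read

#define PAGE_SIZE 4096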

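char *getTuple(int inFile, int pageNumber, int recNumber)
{
    // reject negative page or record numbers
    if (pageNumber < 0 || recNumber < 0)
        return NULL;

    // position file at start of page
    off_t pageAddr = (off_t)pageNumber * PAGE_SIZE; // byte address of the page within the file
    if (lseek(inFile, pageAddr, SEEK_SET) < 0)
        return NULL;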

    // re-position the file to the start of the tuple directory entry
    // (recap of the layout: the file contains pages, each page contains
    // tuples, and each page starts with its tuple directory)
    off_t dirOffset = (off_t)recNumber * 3; // 3 bytes per directory entry
    if (lseek(inFile, dirOffset, SEEK_CUR) < 0)
        return NULL;

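    // read 3-byte directory entry for this tuple; it holds the tuple's
    // offset within the page and the tuple's length
    unsigned char dirBytes[3];
    if (read(inFile, dirBytes, 3) != 3)
        return NULL;

    // assemble the bytes into a 24-bit value, first byte most significant
    // (assumption: "first 12 bits" means the high-order 12 bits); reading
    // straight into an unsigned int would leave one byte uninitialized and
    // make the result depend on the host byte order
    unsigned int dirEntry = ((unsigned int)dirBytes[0] << 16)
                          | ((unsigned int)dirBytes[1] << 8)
                          |  (unsigned int)dirBytes[2];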

    // extract tuple offset and length from directory entry
    unsigned int tupOffset, tupLength;
    unsigned int lengthMask = 0x00000fff; // low-order 12 bits
    unsigned int offsetMask = 0x00fff000; // high-order 12 bits

    tupOffset = (dirEntry & offsetMask) >> 12;
    tupLength = dirEntry & lengthMask;

    // assumption: an empty entry or one that runs past the page is invalid
    if (tupLength == 0 || tupOffset + tupLength > PAGE_SIZE)
        return NULL;

    // allocate memory buffer to hold tuple data
    char *tupBuf; // tupLength bytes, one per byte of tuple data
    if ((tupBuf = malloc(tupLength)) == NULL)
        return NULL;

    // position file at tuple location:
    // page address in the file plus the tuple's offset within the page
    off_t tupAddr = pageAddr + (off_t)tupOffset;
    if (lseek(inFile, tupAddr, SEEK_SET) < 0) {
        free(tupBuf); // do not leak the buffer on failure
        return NULL;
    }

    // read tuple data into buffer
    if (read(inFile, tupBuf, tupLength) != (ssize_t)tupLength) {
        free(tupBuf); // do not leak the buffer on failure
        return NULL;
    }

    return tupBuf;
}
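As a side note, the same two reads can be done without moving the file position at all, by using POSIX `pread`, which takes an absolute offset. The sketch below is only a variation for comparison, not the reference answer: the name `getTuplePread` is made up, the bytes of a directory entry are assumed to be stored most-significant first, and a zero-length or out-of-page entry is assumed to be an error.

```c
#define _XOPEN_SOURCE 700 // expose pread under strict -std=c99/c11
#include <stdlib.h>       // malloc, free
#include <sys/types.h>    // off_t, ssize_t
#include <unistd.h>       // pread

#define PAGE_SIZE 4096

// Hypothetical variant of getTuple that uses pread instead of lseek + read.
char *getTuplePread(int inFile, int pageNumber, int recNumber)
{
    if (pageNumber < 0 || recNumber < 0)
        return NULL;

    off_t pageAddr = (off_t)pageNumber * PAGE_SIZE;

    // read only the 3-byte directory entry, at its absolute file offset
    unsigned char b[3];
    if (pread(inFile, b, 3, pageAddr + (off_t)recNumber * 3) != 3)
        return NULL;

    // assumption: byte 0 holds the most significant bits of the entry
    unsigned int tupOffset = ((unsigned int)b[0] << 4) | (unsigned int)(b[1] >> 4);
    unsigned int tupLength = ((unsigned int)(b[1] & 0x0f) << 8) | (unsigned int)b[2];

    // assumption: empty entries and entries past the page end are errors
    if (tupLength == 0 || tupOffset + tupLength > PAGE_SIZE)
        return NULL;

    char *tupBuf = malloc(tupLength);
    if (tupBuf == NULL)
        return NULL;

    // read exactly tupLength bytes of tuple data, nothing else from the page
    if (pread(inFile, tupBuf, tupLength, pageAddr + (off_t)tupOffset) != (ssize_t)tupLength) {
        free(tupBuf);
        return NULL;
    }
    return tupBuf; // caller owns the buffer and must free() it
}
```

Either way the caller gets a raw buffer of tuple bytes with no terminating NUL, so it has to be released with `free` and should not be treated as a C string.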

Question: Consider a data file containing tuples with a page structure similar to that in the previous question. Pages are 4KB in size, and each page contains a tuple directory with 100 entries in it, where each entry is 3 bytes long. Assuming that the (minimum,average,maximum) tuple lengths are (32,64,256) bytes and that the file has 100 pages, determine the following:

1. The minimum number of tuples that the file can hold

2. The maximum number of tuples that the file can hold

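Solution:
    Assume every tuple has the minimum length of 32 bytes. One page can then hold floor((4096 - 100 * 3) / 32) = floor(3796 / 32) = 118 tuples, which over 100 pages gives 118 * 100 = 11800 tuples.

Answer:
    In theory that is 11800 tuples, but the tuple directory has only 100 entries per page, so the maximum is 100 * 100 = 10000 tuples.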
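The arithmetic for the maximum can be checked with a small sketch. The function name and its parameters are made up for illustration; it just takes the smaller of "how many minimum-size tuples fit in the space left over" and "how many directory slots exist".

```c
// Hypothetical helper: maximum tuples per page when the directory has a
// fixed number of entries.
int maxTuplesFixedDir(int pageSize, int dirEntries, int entrySize, int minTupleLen)
{
    int dataBytes = pageSize - dirEntries * entrySize;  // space left for tuple data
    int bySpace = dataBytes / minTupleLen;              // minimum-size tuples that fit
    return bySpace < dirEntries ? bySpace : dirEntries; // capped by directory slots
}
```

With the numbers from the question, `maxTuplesFixedDir(4096, 100, 3, 32)` gives 100 per page (space would allow 118), so 100 pages hold 10000 tuples.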

Question: Consider a variation on the above scenario. Rather than pages having a fixed size tuple directory, the tuple directory can grow and shrink depending on the number of tuples in the page. For this to work, the tuple directory starts at the bottom of the page (address 0) and grows up, while tuples are added from the top of the page (address 4095) and grow down. If all other factors are the same (total 100 pages, (min,avg,max) tuple lengths (32,64,128)), what is the maximum number of tuples that the file can hold? You may assume that tuples can begin at any address (i.e. they do not have to start at a 4-byte address).

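Solution:
    The tuple directory is no longer limited to 100 entries, so the answer would be 11800.
Answer:
    That overlooks the directory itself: tuples and directory entries now share the same 4096 bytes, so each tuple costs 3 + 32 = 35 bytes. Solve the inequality (3 * N + 32 * N) <= 4096 for N, which gives N = floor(4096 / 35) = 117 tuples per page, or 117 * 100 = 11700 tuples for the file.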
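The same check for the growing directory, again as a sketch with a made-up function name: every stored tuple uses its own bytes plus one directory entry, so the page is simply divided by that combined cost.

```c
// Hypothetical helper: maximum tuples per page when the directory grows
// with the number of tuples and shares the page with the tuple data.
int maxTuplesGrowingDir(int pageSize, int entrySize, int minTupleLen)
{
    // largest N with N * (entrySize + minTupleLen) <= pageSize
    return pageSize / (entrySize + minTupleLen);
}
```

With the numbers from the question, `maxTuplesGrowingDir(4096, 3, 32)` gives 117 per page, so 100 pages hold 11700 tuples.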