DBMS -- Implementing Selection on One Attribute (1-d)

144 阅读2分钟

问题:

image.png

image.png

a.
   解答:
      文件已经sort好
      两种情况
      1. dense primary index,这样的话不管sort或者不sort都可以
      2. dense clustring index,这样文件必须要sort
      
      综上,只要文件sort,可以在任意的field进行dense index
   答案:
       dA dense index has one index entry for each data record. 
       It is required that the entries in the index are sorted (to make searching the index efficient). 
       However, there are no specific requirements on where the records are located within the data file. 
       Thus, you can build a dense index on any or all fields.

b. 
    non-dense index
    1. sparse primary index: 对每个block进行索引,要求block中的数据是sort的
    2. sparse clustering index:这个一定要sort
    
    
    综上,要对一个key(field)构建sparse index,必须是rel在其上已经sort了

c.
    解答:
        对id进行dense index, id占4byte,总共5000个record

        一个index entry占有4byte key + 1 byte 地址 = 5
        5 * 5000 = 25000 bytes

        一个page 4096 bytes

        25000/4096 = 6 pages
    答案:
        一个index entry应该有4 bytes key + 4 bytes address = 8bytes
        8 * 50000 / 4096 ~ 98
 d.
     解答:
         对code进行dense index
         dense index大小为 8 key + 4 addr = 12 bytes
         5000个subject * 12 = 60000
         60000/4096 = 15
 e.
    解答:
        在两者上同时构建non-dense index
        类型为primary non-dense index
        
         block内部需要进行排序
         
         students record为100个record,也就是对每一个block进行index entry,有多少个block? c_student = 100
         50000/100 = 500 blocks
    ans:
        For a non-dense index, we need only one index entry per block of the data file. 
        Assuming that the files are fully compacted, the Student file has ceil(50000/100) = 500 blocks, and the Subject file has ceil(5000/20) = 250 blocks. 
        The analysis then proceeds as for the previous question, except with one index entry per block.
        Thus #index blocks for Student = ceil(500/512) = 1, and #index blocks for Subject = ceil(250/341) = 1
         
  f.

问题:

image.png

关系代数解释
等同于SQLselect R.a,R.b from R where R.a > 50000
首先排除f,顺序遍历,很慢
对比a,c选项,排除c
clusterd ?
linear hashing?

Selection With Hashing

image.png

静态hash,用于hash的文件大小是固定的,比如基于静态hash存在5个block/bucket/page,总之就是可以存放经过hash函数之后的tuple

image.png

问题:

image.png

a. 图中可见每一个桶能存5个数据,当所有桶满的时候则double size