DBMS - Ass2, Signature Indexes


Notes on UNSW COMP9315 Assignment 2

Introduction

  1. Selection is performed by first forming a query signature, based on the values of the known attributes, and then scanning the stored signatures, matching them against the query signature, to identify potentially matching tuples.
  2. Signature matching can result in "false matches", where the query and tuple signatures match, but the tuple is not a valid result for the query.
  3. The kind of signature matching described above uses one signature for each tuple.
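The matching step above can be sketched as a bitwise test. This is a minimal illustration, assuming signatures are stored as Python integers and that a tuple signature "matches" when it has every bit of the query signature set; the function names `matches` and `candidate_ids` are hypothetical and not part of the assignment code.

```python
def matches(query_sig: int, tuple_sig: int) -> bool:
    """True if every bit set in the query signature is also set in the tuple signature."""
    return (query_sig & tuple_sig) == query_sig


def candidate_ids(query_sig: int, tuple_sigs: list[int]) -> list[int]:
    """Scan all stored signatures and return the positions of potential matches."""
    return [i for i, sig in enumerate(tuple_sigs) if matches(query_sig, sig)]


# Illustrative 8-bit signatures (made-up values, one per tuple).
tuple_sigs = [0b10110010, 0b01001101, 0b10110110]
query_sig = 0b10100010

candidates = candidate_ids(query_sig, tuple_sigs)
```

The returned positions are only candidates: because of false matches, each candidate tuple still has to be compared with the actual query values.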


Signatures

  1. Superimposed codewords (SIMC): all codewords and signatures are m bits wide, and each codeword has k bits set to 1. The tuple signature is the bitwise OR of its attributes' codewords.
  2. Concatenated codewords (CATC): signatures are m bits wide, but the codewords occupy approximately equal numbers of bits of the signature. Since there are m bits in the signature and n attributes, each codeword is u = m/n bits long.
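As a rough sketch of the SIMC scheme, the code below builds an m-bit codeword with exactly k bits set and ORs the codewords together. It assumes attribute values are strings, that unknown query attributes are written as "?", and that the codeword bits come from a pseudo-random generator seeded with a CRC32 of the value; the seeding choice and the names `simc_codeword` and `simc_signature` are assumptions for illustration, not the assignment's actual implementation.

```python
import random
import zlib


def simc_codeword(value: str, m: int, k: int) -> int:
    """Return an m-bit codeword with exactly k bits set, determined by the value."""
    if not 0 <= k <= m:
        raise ValueError("k must be between 0 and m")
    # Seed with a stable hash so the same value always yields the same codeword.
    rng = random.Random(zlib.crc32(value.encode("utf-8")))
    codeword = 0
    nbits = 0
    while nbits < k:
        i = rng.randrange(m)
        if not (codeword >> i) & 1:  # only count bits that were not already set
            codeword |= 1 << i
            nbits += 1
    return codeword


def simc_signature(attrs: list[str], m: int, k: int) -> int:
    """Superimpose (bitwise OR) the codewords of all known attributes."""
    sig = 0
    for attr in attrs:
        if attr != "?":  # unknown attributes contribute no bits
            sig |= simc_codeword(attr, m, k)
    return sig


tuple_sig = simc_signature(["1234", "abc", "xyz"], m=64, k=8)
query_sig = simc_signature(["1234", "?", "xyz"], m=64, k=8)
```

Because the query signature is built from a subset of the tuple's codewords, `(query_sig & tuple_sig) == query_sig` holds for this pair.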

The structure of a CATC signature is shown in the figure:

[Figure: structure of a CATC signature]

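How a CATC signature is built in practice: in the figure, the first column on the left is the theoretical codeword value, the middle column is the codeword after being extended to m bits, and the last column is the actual CATC value. In other words, each codeword is left-shifted into its position and the shifted codewords are then combined.

[Figure: building a CATC signature by shifting and combining codewords]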
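The shift-and-combine construction can be sketched as follows. This is only an illustration under several assumptions: n divides m exactly, each u-bit codeword has u // 2 bits set, attribute i is placed i * u bits from the low-order end, and unknown attributes are written as "?". The names `catc_codeword` and `catc_signature` and the CRC32 seeding are hypothetical.

```python
import random
import zlib


def catc_codeword(value: str, u: int) -> int:
    """Return a u-bit codeword with u // 2 bits set (assumed density)."""
    rng = random.Random(zlib.crc32(value.encode("utf-8")))
    codeword = 0
    nbits = 0
    while nbits < u // 2:
        i = rng.randrange(u)
        if not (codeword >> i) & 1:
            codeword |= 1 << i
            nbits += 1
    return codeword


def catc_signature(attrs: list[str], m: int) -> int:
    """Concatenate per-attribute codewords into one m-bit signature."""
    n = len(attrs)
    if n == 0 or m % n != 0:
        raise ValueError("this sketch assumes n > 0 and that n divides m")
    u = m // n  # width of each codeword
    sig = 0
    for i, attr in enumerate(attrs):
        if attr == "?":
            continue  # unknown attribute: its field stays all zeros
        # Left-shift the codeword into its own field, then combine.
        sig |= catc_codeword(attr, u) << (i * u)
    return sig


tuple_sig = catc_signature(["1234", "abc", "xyz", "42"], m=32)
query_sig = catc_signature(["1234", "?", "xyz", "?"], m=32)
```

Since the fields do not overlap, the combining step could equally be written as addition; bitwise OR is used here to mirror the matching test.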

We use the following notation. We denote

the length of tuple signatures as m,
the length of page signatures as mp,
and the length of CATC codewords as u.

Relation

  1. R.info containing global information
  2. R.data containing data pages
  3. R.tsig containing tuple signatures
  4. R.psig containing page signatures
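  5. R.bsig containing bit-sliced signatures. The bit-sliced signatures relate to the page signatures like a view rotated by 90 degrees: bit-slice j collects bit j of every page signature, as shown in the figure.

[Figure: bit-sliced signatures as a 90-degree rotation of the page signatures]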
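The 90-degree relationship can be sketched as a transpose. The code below assumes page signatures are Python integers of mp bits and that bit p of slice j is set exactly when page p's signature has bit j set; `bit_slices` and `candidate_pages` are hypothetical names, and the signature values are made up.

```python
def bit_slices(page_sigs: list[int], mp: int) -> list[int]:
    """Transpose page signatures: slice j holds bit j of every page signature."""
    slices = [0] * mp
    for p, sig in enumerate(page_sigs):
        for j in range(mp):
            if (sig >> j) & 1:
                slices[j] |= 1 << p
    return slices


def candidate_pages(query_sig: int, slices: list[int], npages: int) -> list[int]:
    """AND together the slices for every bit set in the query signature."""
    pages = (1 << npages) - 1  # start with every page marked
    for j, bit_slice in enumerate(slices):
        if (query_sig >> j) & 1:
            pages &= bit_slice
    return [p for p in range(npages) if (pages >> p) & 1]


page_sigs = [0b1011, 0b0110, 0b1111]  # three pages, mp = 4
slices = bit_slices(page_sigs, mp=4)
marked = candidate_pages(0b0011, slices, npages=len(page_sigs))
```

Only the slices corresponding to 1-bits in the query signature need to be read, which is the main appeal of the bit-sliced layout.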

Goal

  1. Bit string Datatype
  2. Scanning for Results: execute a query without using any signature index. Implement the scanAndDisplayMatchingTuples() function, which performs the check for matching tuples in each of the marked pages.
  3. Tuple signatures
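The scanning goal can be illustrated with a small sketch. It is not the assignment's scanAndDisplayMatchingTuples() itself: it assumes tuples and queries are comma-separated strings, that "?" in a query stands for an unknown attribute, and that pages are plain lists of tuples. The names `tuple_matches` and `scan_matching_tuples` are hypothetical.

```python
def tuple_matches(query: str, tup: str) -> bool:
    """Compare attribute by attribute; '?' in the query matches any value."""
    q_attrs = query.split(",")
    t_attrs = tup.split(",")
    if len(q_attrs) != len(t_attrs):
        return False
    return all(q == "?" or q == t for q, t in zip(q_attrs, t_attrs))


def scan_matching_tuples(pages: list[list[str]], marked: set[int], query: str) -> list[str]:
    """Check every tuple in each marked page and collect the real matches."""
    results = []
    for pid, page in enumerate(pages):
        if pid not in marked:
            continue  # skip pages the index ruled out
        for tup in page:
            if tuple_matches(query, tup):
                results.append(tup)
    return results


pages = [["1,a,x", "2,b,y"], ["3,a,x", "4,c,z"]]
# Without a signature index, every page is marked and must be scanned.
all_pages = set(range(len(pages)))
found = scan_matching_tuples(pages, all_pages, "?,a,x")
```

With a signature index, the same function would be called with a smaller marked set, and this final comparison is what filters out false matches.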