UNSW COMP9315 Assignment 2 Notes
Introduction
- Selection is performed by first forming a query signature, based on the values of the known attributes, and then scanning the stored signatures, matching them against the query signature, to identify potentially matching tuples.
- Signature matching can result in "false matches", where the query and tuple signatures match, but the tuple is not a valid result for the query.
- The kind of signature matching described above uses one signature for each tuple.
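The matching step above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: signatures are taken to fit in a single 64-bit word (m <= 64), and the type `Sig` and the functions `sigMatches` and `countCandidates` are hypothetical names, not part of the assignment code. A tuple is a candidate when every bit set in the query signature is also set in its signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a signature of at most 64 bits held in one word. */
typedef uint64_t Sig;

/* True when every 1-bit of the query signature is also 1 in the tuple signature. */
bool sigMatches(Sig tupleSig, Sig querySig)
{
    return (tupleSig & querySig) == querySig;
}

/* Scan the stored tuple signatures and count the potential matches. */
size_t countCandidates(const Sig tupleSigs[], size_t n, Sig querySig)
{
    assert(tupleSigs != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (sigMatches(tupleSigs[i], querySig))
            count++;   /* candidate only: it may still be a false match */
    }
    return count;
}
```

Because a candidate can be a false match, each one still has to be checked against the actual tuple values before it is reported as a result.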
Signatures
- Superimposed codewords (SIMC): all codewords and signatures are m bits wide, and each codeword has k bits set to 1. The tuple signature is obtained by bitwise OR of the codewords of its attributes.
- Concatenated codewords (CATC): signatures are m bits wide, but codewords occupy approximately equal numbers of bits of the signature. Since there are m bits in the signature and n attributes, each codeword is u = m/n bits long.
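The structure of a CATC signature is shown in the figure.
How a CATC signature is built in practice: the leftmost column is the theoretical value of each codeword, the middle column shows the codeword extended to m bits, and the last column is the actual CATC value. In other words, each codeword is left-shifted to its position in the signature, and the shifted codewords are then concatenated.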
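The shift-then-concatenate construction can be sketched as follows. The sketch assumes that m is an exact multiple of n, that the signature fits in one 64-bit word, and that attribute i (counting from 0) occupies bits i*u up to (i+1)*u - 1; the real layout may distribute leftover bits differently. `Sig`, `placeCodeword` and `makeCATC` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a signature of at most 64 bits held in one word. */
typedef uint64_t Sig;

/* Left-shift a u-bit codeword into the slot of attribute i (0-based). */
Sig placeCodeword(Sig codeword, unsigned i, unsigned u)
{
    assert(u > 0 && u < 64 && (i + 1) * u <= 64);
    Sig mask = (((Sig)1) << u) - 1;      /* keep only the low u bits */
    return (codeword & mask) << (i * u);
}

/* Concatenate n codewords: the slots are disjoint, so OR acts as concatenation. */
Sig makeCATC(const Sig codewords[], unsigned n, unsigned u)
{
    Sig sig = 0;
    for (unsigned i = 0; i < n; i++)
        sig |= placeCodeword(codewords[i], i, u);
    return sig;
}
```

For example, with u = 4 and codewords 0x5, 0x3 and 0x9 for attributes 0, 1 and 2, the three shifted values are 0x005, 0x030 and 0x900, which combine to 0x935.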
We adopt the following notation:
- m: the length of tuple signatures
- mp: the length of page signatures
- u: the length of CATC codewords
Relation
- R.info containing global information
- R.data containing data pages
- R.tsig containing tuple signatures
- R.psig containing page signatures
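- R.bsig containing bit-sliced signatures
The bit-sliced signatures relate to the page signatures like the same data viewed after a 90-degree rotation (a transpose), as shown in the figure below.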
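The 90-degree relationship between page signatures and bit-sliced signatures can be illustrated with a small transpose. This is a sketch under loud assumptions: at most 64 pages and mp <= 64, so that each page signature and each bit slice fits in one 64-bit word, whereas the real files store much longer bit strings. `Sig` and `buildBitSlices` are hypothetical names. Slice b has bit p set exactly when page signature p has bit b set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a bit string of at most 64 bits held in one word. */
typedef uint64_t Sig;

/* Transpose nPages page signatures (mp bits each) into mp bit slices
   (nPages bits each). The caller provides slices[] with room for mp entries. */
void buildBitSlices(const Sig psigs[], unsigned nPages, unsigned mp, Sig slices[])
{
    assert(nPages <= 64 && mp <= 64);
    for (unsigned b = 0; b < mp; b++) {
        slices[b] = 0;
        for (unsigned p = 0; p < nPages; p++) {
            if ((psigs[p] >> b) & 1)          /* bit b of page p's signature */
                slices[b] |= ((Sig)1) << p;   /* becomes bit p of slice b */
        }
    }
}
```

With this layout, a query only needs to read the slices for the bits set in its page-level signature and AND them together; the bits left set in the result mark the pages worth examining.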
Goal
- Bit string Datatype
- Scanning for Results: execute a query without using any signature index. Implement the scanAndDisplayMatchingTuples() function, which performs the check for matching tuples in each of the marked pages
- Tuple signatures
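As a rough idea of what the bit-string datatype from the first goal might look like, here is a minimal sketch of an arbitrary-length bit string stored in a byte array. The struct layout and all names (`BitString`, `newBitString`, `setBitAt`, `bitIsSetAt`, `isSubsetOf`, `freeBitString`) are assumptions made for illustration; the interface supplied with the assignment may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bit-string type: nbits bits packed 8 per byte. */
typedef struct {
    unsigned nbits;
    unsigned char *bytes;
} BitString;

/* Allocate a bit string with all bits 0; returns NULL on allocation failure. */
BitString *newBitString(unsigned nbits)
{
    assert(nbits > 0);
    BitString *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->nbits = nbits;
    b->bytes = calloc((nbits + 7) / 8, 1);
    if (b->bytes == NULL) { free(b); return NULL; }
    return b;
}

void freeBitString(BitString *b)
{
    if (b != NULL) { free(b->bytes); free(b); }
}

/* Set the bit at position pos (0-based) to 1. */
void setBitAt(BitString *b, unsigned pos)
{
    assert(b != NULL && pos < b->nbits);
    b->bytes[pos / 8] |= (unsigned char)(1u << (pos % 8));
}

/* Report whether the bit at position pos is 1. */
bool bitIsSetAt(const BitString *b, unsigned pos)
{
    assert(b != NULL && pos < b->nbits);
    return (b->bytes[pos / 8] >> (pos % 8)) & 1u;
}

/* True when every 1-bit of a is also 1 in b (the signature match test). */
bool isSubsetOf(const BitString *a, const BitString *b)
{
    assert(a != NULL && b != NULL && a->nbits == b->nbits);
    for (unsigned i = 0; i < (a->nbits + 7) / 8; i++) {
        if ((a->bytes[i] & b->bytes[i]) != a->bytes[i])
            return false;
    }
    return true;
}
```

A query signature q matches a tuple signature t when `isSubsetOf(q, t)` holds, which generalises the single-word test `(t & q) == q` to signatures of any length.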