How Database Systems Work Internally

Later half of the semester.

How a DBMS stores and accesses data.
How a DBMS builds indexes and runs queries.
How data is protected from crashes and faults.

Week 8

**Chapter 12 - Data Storage and Organization**

How disks work and how to model their speed.
How to design algorithms for disks - external sorting
How databases organize their table data on disk
Storage management and caching

Disk and Disk Models

Memory Hierarchy
Hard Disks
SSD
Modeling Disk Performance
I/O-Efficient Sorting

Max transfer rate: one track's worth of data per rotation.

Disk access latency: swivel (seek) + rotation

Differences between swivel and rotation

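1. Swivel (seek time):

Purpose: the read/write head must move to the track that holds the data. This is a movement of the arm across the platter.
Process: the head is mounted on a mechanical arm, and the arm swings inward or outward to reach the correct track.
Goal: find the track that holds the file.
Nature: the movement is along the radius of the disk (radial).
Example: if the head is currently on track 10 and the data is on track 50, the head must swivel over to track 50.

2. Rotate (rotational latency):

Purpose: once the head is on the correct track, it still has to wait for the disk to rotate to the correct sector.
Process: the disk spins like a turntable until the sector holding the data arrives under the head.
Goal: find the specific sector on the track.
Nature: the movement is the platter rotating around its center (circular motion).
Example: if the head is already on track 50 but the data is in sector 8 of that track, the disk must rotate until sector 8 is under the head.

Putting it together:

Swivel (seek): the head moves radially to find the track that holds the file.
Rotate (rotation): the disk spins to bring the specific sector on that track under the head.

These two steps together make up the mechanical delay of one disk access (disk access latency). Adding the data transfer time gives the complete disk access time.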

SSD (Solid State Drives): it turns out they cost only about $100-200 per TB. Transfer rate > 500 MB/s. Big impact on large-data and I/O-efficient computing.

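Our calculation problems are modeled on hard disks rather than SSDs.

Worked calculation example:

Question: what does "block size" mean in the slide's example?

Block size

A disk block is the smallest unit in which the disk stores file data. The slide's example computes the time needed to read one 4 KB block.

Why are blocks needed?

File system efficiency: dividing the disk into blocks makes files easier to store and manage. A single file may be spread across many blocks.
Read and write performance: the block size affects read performance. Smaller blocks suit small files but can increase file system metadata overhead; larger blocks suit large files but can waste space, because the unused part of a block is still allocated.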

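Two ways to model disk access performance: the Block Model and the Seek or Latency Transfer-Rate (LTR) Model.

1. Block Model

Meaning: the disk treats a file as a sequence of fixed-size blocks (4 KB, 8 KB, 16 KB, and so on). Accessing each block incurs a fixed access delay.
Characteristics:
- Every block read is charged one access time (seek time plus rotational latency), even when the blocks belong to the same file.
- Block-oriented: reads and writes are counted per block, not per file.
- Example:
  - Reading a 40 KB file with a 4 KB block size means reading 10 blocks.
  - If each block access takes 10 ms, the total time is $10\,\text{ms/block} \times 10\,\text{blocks} = 100\,\text{ms}$.
When to use: this model fits small random reads and writes, because it charges each block its own access cost.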
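
The Block Model arithmetic above can be written as a small function. This is a minimal sketch; the function and parameter names are my own, and rounding a partial last block up to a full block is an assumption the notes do not state.

```python
import math


def block_model_time_ms(file_kb, block_kb, access_ms_per_block):
    """Block Model: every block is charged the full per-block access time."""
    # Assumption: a partially filled last block still costs one full access.
    n_blocks = math.ceil(file_kb / block_kb)
    return n_blocks * access_ms_per_block


# The example from the notes: 40 KB file, 4 KB blocks, 10 ms per block.
print(block_model_time_ms(40, 4, 10))  # 100
```

The cost grows linearly with the number of blocks, which is why this model is pessimistic for files that are actually read sequentially.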

Performance Measures of Disks

Textbook, page 561, Section 12.3.2

The main measures of the qualities of a disk are capacity, access time, data-transfer rate, and reliability.

Access time is the time from when a read or write request is issued to when data transfer begins. To access data on a given sector of a disk, the arm first must move so that it is positioned over the correct track, and then must wait for the sector to appear under it as the disk rotates. The time for repositioning the arm is called the seek time, and it increases with the distance that the arm must move, which depends on how far the track is from the initial arm position. Smaller disks tend to have lower seek times since the head has to traverse a smaller distance. The average seek time is normally about 1/2 of the maximum seek time; typical averages are 4-10 ms.

Rotational latency time: the wait for the target sector to rotate under the head. On average it is 1/2 of a full rotation of the disk.

Access time = seek time + rotational latency time. Averages range from 5 to 20 ms.

Sequential access pattern: the access time is added only once.

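2. Seek or Latency Transfer-Rate (LTR) Model

Meaning: the file is handled as a whole. The access time (seek time + rotational latency) is charged only once, and the time needed to transfer the file is added on top.
Characteristics:
- File-oriented: the file is treated as a single unit.
- The access time is charged once, which assumes the file is stored contiguously on disk (in most cases this assumption is only partly true).
- Total time: $\text{total time} = \text{access time (once)} + \text{transfer time}$
Example:
- For a 40 KB file with an access time of 10 ms and a transfer rate of 80 KB/ms: $\text{total time} = 10\,\text{ms (access)} + \frac{40}{80}\,\text{ms (transfer)} = 10.5\,\text{ms}$
When to use: this model fits sequential access to large files and reflects the cost of reading large chunks of data more accurately.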
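
The LTR formula can be sketched the same way. The function and parameter names are my own; the numbers are the ones from the example in the notes.

```python
def ltr_time_ms(file_kb, access_ms, rate_kb_per_ms):
    """LTR Model: one access time, then a pure transfer of the whole file."""
    transfer_ms = file_kb / rate_kb_per_ms
    return access_ms + transfer_ms


# 40 KB file, 10 ms access time, 80 KB/ms transfer rate.
print(ltr_time_ms(40, 10, 80))  # 10.5
```

Because the access time appears only once, doubling the file size adds only the extra transfer time, not another seek and rotation.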

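Comparison and when to use each

| Model | Characteristics | When to use |
| --- | --- | --- |
| Block Model | Counted per block; each block is charged its own access time | Small blocks of data accessed randomly |
| LTR Model | Counted per file; access time charged once; file assumed contiguous | Large files accessed sequentially |

Summary:

The Block Model suits small files and random reads and writes, because it accounts for the cost of every access.
The LTR Model assumes the file is stored contiguously, so it matches real sequential access more closely and gives a better estimate for reading large files.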

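The relationship between these two models and traditional hard disks (HDD) versus solid-state drives (SSD) is indirect, not a strict one-to-one split. In detail:

1. Is the Block Model closer to HDDs?

Why it is close to HDDs:
- HDD access time is dominated by mechanical movement (seek time and rotational latency), which is exactly what the Block Model emphasizes.
- On an HDD, reading each block may involve moving the head (seek) and waiting for the platter (rotational latency), even when the blocks belong to the same file. The Block Model therefore describes HDD performance well.
When it applies:
- HDD random-read performance is poor because mechanical movement causes significant delay. If a file is scattered over many locations (not stored contiguously), the total time to access its blocks approaches the Block Model estimate.

2. Is the LTR Model closer to SSDs?

Why it is close to SSDs:
- An SSD has no mechanical movement (no seek, no rotational latency); its access time depends mainly on electronic operations (the latency of reading memory cells, plus transfer time).
- On an SSD, file contiguity no longer affects performance as strongly as on an HDD. Even if a file is spread over many locations, the difference in access time is small.
- The LTR Model charges the (smaller) access latency only once and treats transfer time as the main factor, which matches these SSD characteristics.
When it applies:
- On an SSD the latency gap between random and sequential access is small, so the LTR Model describes SSD performance more accurately.

3. Is the link absolute?

No:
- The two models are different ways of describing access performance; they are not tied to HDDs or SSDs.
- When an HDD file is stored contiguously (sequentially), the LTR Model can estimate its performance; when an SSD handles very small random reads and writes (for example, a database's small-block operations), the Block Model may be the better description.

Summary:

The Block Model is closer to HDDs, because HDD performance is strongly affected by seek time and rotational latency.
The LTR Model is closer to SSDs, because SSDs have no mechanical delay and are better modeled by transfer time plus a one-time access latency.
Still, the two models are mainly conceptual categories. In practice, pick the model that matches the scenario (file size, access pattern, and so on) rather than the hardware type alone.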

Block Model calculation example

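RPM: revolutions per minute. A disk speed of 7200 RPM means the platter makes 7200 full rotations every minute.

One full rotation takes 60 s / RPM, or in milliseconds 1000 / (RPM / 60) = 60000 / RPM. Open question from my notes: is this value the average rotational latency itself, or twice it? By the textbook definition above, it is the time for one full rotation, and the average rotational latency is half of it.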
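
A quick numeric check of the rotation arithmetic, as a sketch with function names of my own. It follows the textbook definition quoted earlier, where the average rotational latency is half of one full rotation.

```python
def full_rotation_ms(rpm):
    """Time for one complete rotation, in milliseconds."""
    return 60_000 / rpm  # 60 s = 60,000 ms, divided by rotations per minute


def avg_rotational_latency_ms(rpm):
    """On average the target sector is half a rotation away."""
    return full_rotation_ms(rpm) / 2


print(round(full_rotation_ms(7200), 2))           # 8.33
print(round(avg_rotational_latency_ms(7200), 2))  # 4.17
```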

Transfer time: to convert a rate from MB/s to KB/s, use 1 MB = 1024 KB.
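
Putting the pieces together for one block read, as a sketch only. The 9 ms seek time and 100 MB/s transfer rate below are hypothetical values chosen for illustration (the slide's actual numbers are not reproduced in these notes); 7200 RPM, the 4 KB block, and the 1 MB = 1024 KB conversion come from the notes.

```python
def transfer_ms(block_kb, rate_mb_per_s):
    """Transfer time for one block, converting MB/s to KB/s with 1 MB = 1024 KB."""
    rate_kb_per_s = rate_mb_per_s * 1024
    return block_kb / rate_kb_per_s * 1000  # seconds -> milliseconds


def block_read_ms(seek_ms, rpm, block_kb, rate_mb_per_s):
    """Seek + average rotational latency (half a rotation) + transfer."""
    avg_rotation_ms = (60_000 / rpm) / 2
    return seek_ms + avg_rotation_ms + transfer_ms(block_kb, rate_mb_per_s)


# Hypothetical disk: 9 ms average seek, 7200 RPM, 100 MB/s, 4 KB block.
print(transfer_ms(4, 100))                      # 0.0390625
print(round(block_read_ms(9, 7200, 4, 100), 2))  # 13.21
```

For a single small block the transfer term is tiny next to the mechanical delay, which is the point of charging access time per block in the Block Model.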

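The Block Model is still very popular:

It is simple.
It is okay for small random reads and writes (transactions).

The Block Model is no good for estimating time on search-engine and data-mining workloads; it is best suited to small random reads and writes.

The reasons:
- Search-engine and data-mining workloads usually process many sequentially stored files or large contiguous data sets.
- The Block Model overestimates access time in these scenarios, because it charges every block its own access delay (seek time and rotational latency).
- In sequential reads and writes (a search engine reading an index file, a large data set), the file may be stored contiguously, so the access delay is paid only once, and the Block Model cannot express that.

Better model: large sequentially stored files are better modeled with the LTR Model (Latency Transfer-Rate Model).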

Video link: 计算机组成原理题型一关于磁盘的计算题

I/O-Efficient Sorting

Textbook: Chapter 15.4, Sorting

Sorting is needed in many cases. The data may not fit in main memory, and many algorithms become inefficient when the data lives on disk. The most popular I/O-efficient method is merge sort. Small inputs that fit in main memory can be sorted there directly; here we only discuss sorting data larger than main memory, and how long it takes.
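
A minimal sketch of the two phases of external merge sort. It simulates the disk with Python lists, and `memory_capacity` (a name of my own) stands in for how many items fit in main memory; a real implementation would read and write runs as files.

```python
import heapq


def external_merge_sort(items, memory_capacity):
    """Sort `items` while never sorting more than `memory_capacity` items at once."""
    # Phase 1: run generation. Each chunk fits in "memory", is sorted there,
    # and would be written back to disk as a sorted run.
    runs = [
        sorted(items[i:i + memory_capacity])
        for i in range(0, len(items), memory_capacity)
    ]
    # Phase 2: merge. heapq.merge walks all runs in parallel, holding only the
    # current head of each run, and repeatedly emits the smallest head.
    return list(heapq.merge(*runs))


data = [9, 4, 7, 1, 8, 2, 6, 3, 5]
print(external_merge_sort(data, memory_capacity=3))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Merge sort fits the disk setting because both phases touch the data sequentially: each run is read and written once in order, and the merge only ever needs the front of each run in memory.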

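One thing I find interesting: I never understood why merge sort gets chosen or where it is used, but here it becomes clear. Memory is limited, so the data is split into chunks, each chunk is sorted, and the sorted chunks are merged. (I still do not fully understand this part.)

I understand the double input buffers, since they speed up reading, but I did not understand why the input-to-output split is 80:20 (from another slide), nor how everything ends up sorted at the end. My own sense of the I/O efficiency: 80 goes to double-buffered input and 20 to output, because the output does not need double buffering. Update: understood now. See the video 408数据结构梳理—外部排序, which explains it much better than the slides.

Additional thoughts

Any operation a computer performs has three main parts:
1. Data input 2. Data processing 3. Data output
Steps 1 and 3 are the I/O. For a database, the time cost comes mainly from I/O.

d-way merge (multiway merge sort)

Video link: 【天勤考研】外部排序1

Reading a block in and writing it back out count as two I/O operations. The longer the initial runs, the fewer I/O operations are needed. Worth rewatching when there is time; I ran out of time and stopped listening.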
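
The effect of run length on I/O can be estimated with a short calculation. This is a sketch under stated assumptions: initial runs are one memory-load long, every pass (run generation and each merge pass, including the last) reads and writes every block once, and the function and parameter names are my own.

```python
import math


def merge_sort_io(n_blocks, memory_blocks, d):
    """Estimated block I/Os for external merge sort with d-way merging (d >= 2)."""
    runs = math.ceil(n_blocks / memory_blocks)  # number of initial sorted runs
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / d)  # each merge pass combines d runs into one
        passes += 1
    # Run generation plus each merge pass: one read and one write per block.
    return 2 * n_blocks * (1 + passes)


print(merge_sort_io(1000, 10, 10))   # 100 initial runs, 2 merge passes -> 6000
print(merge_sort_io(1000, 100, 10))  # 10 initial runs, 1 merge pass -> 4000
```

Longer initial runs mean fewer runs to merge, so fewer passes over the data; a larger d has the same effect from the other side.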

Week 9 Midterm

Week 10 External Sorting + Indexing

Chapters 15.4, 14.1, 14.2, 14.3-14.6, 14.9

An index is a data structure that supports efficient retrieval of "certain types of tuples" from a table.

Without an index: scan the entire table, or use binary search if the table is sorted. (The Bilibili videos explain this better.)

Problem: find records of a certain type quickly (equality queries, range queries). Goal: faster processing of queries.

Various types of indexes:

Clustered (primary) or unclustered (secondary). Video: Clustered vs. Nonclustered Index Structures in SQL Server

Clustered Index

Also called a primary index. A clustered index contains the base table data itself, which is why a table can have only one clustered index. The leaf level is the data itself. Files that are ordered sequentially on some search key (usually the primary key) and have a clustering index on that key are called index-sequential files. Example: a student table sorted by student ID.

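Unclustered Index

Also called a secondary index. The leaf level holds pointers to the data. The index order is independent of the order of the base table data.
An index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
Secondary indices have to be dense.
Clustered index characteristics:
- Advantage: more efficient for sequential scans and range queries.
- Disadvantage: insertions, deletions, and updates are expensive, because the data must stay in physical order.
Unclustered index characteristics:
- Advantage: a table can have several of them, and they are not constrained by the physical storage order.
- Disadvantage: sequential scans and range queries are inefficient, especially where disk I/O is the bottleneck.
When to use:
- Clustered index: frequent sequential or range queries (time-series data, log files, and so on).
- Unclustered index: random lookups of specific values (for example, primary-key or unique-key queries).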
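
A rough way to see why range queries favor the clustered index is to count data-block reads. This is a back-of-the-envelope sketch with names of my own: it ignores the cost of traversing the index itself and assumes the worst case for the unclustered index, where every matching record sits in a different block.

```python
import math


def range_query_block_reads(matching_records, records_per_block, clustered):
    """Estimated data-block reads to fetch all records matching a range query."""
    if clustered:
        # Matches are physically adjacent, so they fill consecutive blocks.
        return math.ceil(matching_records / records_per_block)
    # Worst case for an unclustered index: one block read per matching record.
    return matching_records


print(range_query_block_reads(1000, 100, clustered=True))   # 10
print(range_query_block_reads(1000, 100, clustered=False))  # 1000
```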

Common interview question: the differences between a clustered index and an unclustered index.

Sparse or Dense:
- Index entry: a search-key value together with pointers to the records that have that value.
- Dense index: an index entry exists for every search-key value in the file. Lookups are faster, but the index records need more space.
- Sparse index: entries exist for only some of the search-key values. Applicable when records are sequentially ordered on the search key. Less space and less maintenance overhead for insertions and deletions, but generally slower than a dense index for locating records. A common choice is one entry per block, holding the least search-key value in that block. For an unclustered index: a sparse index on top of a dense index (multilevel index).
Single Key or Composite or Multi-Dimensional
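
A sparse-index lookup can be sketched as follows. The blocks and keys are hypothetical; the index keeps one entry per block with the least search-key value in that block, as described above, and the file is assumed to be sorted on the search key.

```python
from bisect import bisect_right

# Hypothetical sorted data file, split into blocks of three keys each.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
# Sparse index: the least search-key value of each block.
sparse_index = [block[0] for block in blocks]


def sparse_lookup(key):
    """Return (block_number, key) if the key exists, else None."""
    # Find the last index entry whose key is <= the target key.
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None  # smaller than every key in the file
    # Only this one block has to be read and scanned.
    return (pos, key) if key in blocks[pos] else None


print(sparse_lookup(50))  # (1, 50)
print(sparse_lookup(55))  # None
```

A dense index would hold all nine keys and skip the scan inside the block; the sparse index holds three entries and pays for that with the in-block search.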

Multilevel Index

If the index does not fit in memory, access becomes expensive. Treat the index kept on disk as a sequential file and construct a sparse index on it.

With binary search on a sorted table, every insertion or deletion forces large changes to the sorted table, so overall efficiency drops.

Different implementations (data structures): tree structures (ordered), hashing (unordered), and others.

Binary tree, B-tree, B+ tree. Video: 一个动画搞懂MySQL索引原理！

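B+ Tree:

Degree >> 2: the order of a B+ tree is much larger than 2. Each node (nonleaf nodes in particular) can have many children, not just the two children of a binary tree.
Data in leaves only.
Same depth everywhere: uneven heights would make read times differ from key to key and make performance unpredictable.
A B+ tree has a tree-plus-linked-list layout (the leaf level is linked).
B+ Tree vs. Index-Sequential Files:
- Index-sequential files:
  - Disadvantage: performance degrades as the file grows, so periodic reorganization is needed.
  - When to use: small data sets that grow slowly.
- B+ tree:
  - Advantages: self-balancing; efficient insertion, deletion, and range queries; no periodic reorganization.
  - Disadvantages: extra insertion/deletion overhead and extra space. (When a node fills up, updating it also forces an update of its parent, which takes time.)
  - When to use: large-scale data management, especially database systems with frequent inserts, deletes, and queries.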

Indexing Outline:

Basic Concepts
Ordered Indices
B+ Tree Index Files
Hashing

Index Evaluation Metrics

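Access types supported efficiently
- Records with a specified value in the attribute (point lookup).
- Records with an attribute value falling in a specific range (range lookup).
Access time: $t_a$, $t_b$
- $t_a$: the time to access one disk block (usually a disk I/O).
- $t_b$: the time to search within memory (for example, comparing keys inside a node).
- Total access time: $T = k \times t_a + m \times t_b$, where $k$ is the number of disk block accesses and $m$ is the number of in-memory key comparisons. An efficient index structure keeps disk accesses low (small $k$) and in-memory comparisons low (small $m$).
Insertion time
Deletion time
Space overhead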
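
The access-time formula is easy to evaluate numerically. The values below are hypothetical and only meant to show the relative size of the two terms: 10 ms per disk block access and 0.001 ms per in-memory comparison.

```python
def total_access_time_ms(k, t_a_ms, m, t_b_ms):
    """T = k * t_a + m * t_b, with k disk accesses and m in-memory comparisons."""
    return k * t_a_ms + m * t_b_ms


# Hypothetical lookup: 3 disk block accesses and 20 key comparisons.
disk_part = 3 * 10
memory_part = 20 * 0.001
print(total_access_time_ms(k=3, t_a_ms=10, m=20, t_b_ms=0.001))
print(disk_part / (disk_part + memory_part))  # fraction of time spent on disk
```

With numbers in this range the disk term is almost the whole cost, which is why index design focuses first on reducing $k$.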

How database indexes work: blog.csdn.net/zhongkeyuan…

Ordered Indices.

B+ Tree

Each leaf can hold up to n - 1 values and must contain at least ⌈(n-1)/2⌉ values.

The nonleaf nodes of the B+-tree form a multilevel (sparse) index on the leaf nodes. The structure of nonleaf nodes is the same as that for leaf nodes, except that all pointers are pointers to tree nodes. A nonleaf node may hold up to n pointers and must hold at least ⌈n/2⌉ pointers. The number of pointers in a node is called the fanout of the node. Nonleaf nodes are also referred to as internal nodes.
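
The occupancy limits can be tabulated for a given n, together with the textbook's bound on lookup path length, ⌈log_{⌈n/2⌉}(K)⌉ for K search-key values. This is a sketch with function names of my own; it assumes n ≥ 3 so that every nonleaf node has at least two children.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def bplus_bounds(n):
    """Occupancy limits for a B+ tree whose nodes hold up to n pointers."""
    return {
        "leaf_min_values": ceil_div(n - 1, 2),
        "leaf_max_values": n - 1,
        "nonleaf_min_pointers": ceil_div(n, 2),
        "nonleaf_max_pointers": n,
    }


def max_path_length(n, num_keys):
    """Smallest h with ceil(n/2)**h >= num_keys, i.e. ceil(log_{ceil(n/2)}(K))."""
    base = ceil_div(n, 2)
    h, reach = 0, 1
    while reach < num_keys:
        reach *= base
        h += 1
    return h


print(bplus_bounds(4))
print(max_path_length(100, 1_000_000))  # 4
```

With n = 100, a million keys are reachable in at most four node accesses, compared with about twenty levels for a balanced binary tree.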

Note: the root node must have at least two children (unless it is the only node in the tree).

Queries on B+ Tree

Find, Range Queries, Updates, Insertion, Deletion
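
Find and range queries can be sketched on a tiny hand-built tree. This is an illustration only: the class and function names are my own, the keys are hypothetical, and insertion and deletion (with node splits and merges) are left out. It uses the convention that keys greater than or equal to separator i live in child i + 1.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the next leaf, used by range scans


@dataclass
class Internal:
    keys: List[int]   # separator keys
    children: list    # always len(keys) + 1 children


def find_leaf(node, key):
    """Walk from the root down to the leaf that could contain `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def find(root, key):
    return key in find_leaf(root, key).keys


def range_query(root, lo, hi):
    """Return all keys k with lo <= k <= hi by scanning the linked leaves."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


# Hand-built two-level tree over the keys 10..60.
l3 = Leaf([50, 60])
l2 = Leaf([30, 40], next=l3)
l1 = Leaf([10, 20], next=l2)
root = Internal(keys=[30, 50], children=[l1, l2, l3])

print(find(root, 40))             # True
print(range_query(root, 20, 50))  # [20, 30, 40, 50]
```

The range query descends the tree once to find the starting leaf and then follows the leaf links, which is the "tree plus linked list" layout at work.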

Week 11 Query Processing and Optimization

Chapter 15

Trees + query optimization with indexing (HW6).
