数据库超码：可以用来唯一的确定一个元组，可以有对确定元组无用的属性存在。候选码：最小的超码，可以有多个。视图是一种虚

数据库基础

关系模型介绍

关系数据库的结构

关系：数据库中的表元组：数据库表中的一行（代表一组数据）属性：数据库表中的一列（如ID）

码

超码：可以用来唯一的确定一个元组，可以有对确定元组无用的属性存在。候选码：最小的超码，可以有多个。主码：选一个候选码用来标识元组，即为主码外码：如果r1的属性中包括了r2的主码，则称作r1参照了r2的外码

关系运算

SQL

数据定义语言(DDL)：定义关系模式，删除关系，修改关系的命令数据操纵语言(DML)：从数据库中查信息，插，删，修改信息的命令这里的sql语句都以mysql为例

用到的表

基本数据类型

类型	特性
char(n)	固定长度的字符串，指定n
varchar(n)	可变长字符串，最大长度n
int	整数类型
numeric(p,d)	定点数，p位数字，d为在小数点右边
float(n)	精度为n为的浮点数

基本运算

DDL

创建

create table name
( A d1,
  B d2,
  约束
 );

# 例子
create table instructor
( ID char(5) not null,
  name varchar(50),
  salary numeric(8,2)
  dept_name varchar(20),
  primary key(ID),
  foreig key(dept_name) references department);

修改

	# 增加一个属性，A是名字，D是域
	alter table r add A D;
	
	# 删除一个属性，A是名字
	alter table r drop A;

DML

增

insert into instructor
values(10000, "qaz", 10000, "cs",
	   10001, "wsx", 10001, "ee");

insert into instructor
	select id, name, dept_name, 190000
	from student
	where dept_name = 'cs';

删

# delete删除数据，但是保留关系
delete from student
where x;

# drop删除数据和关系
drop table instructor;

改

update instructor
set salary = salary*1.1
where salary < 50000;

查

单关系查询

# 使用distinct查询出来的结果会默认去除重复，默认不写的话，则没有
select distinct dept_name, salary*1.1
from instructor
where dept_name = "cs" and salary > 5000;

多关系查询查询的顺序依次是from（产生笛卡尔积）->where（用于筛选）->select（用于输出指定属性）
```
select instructor.name, department.building
from instructor, department
where instructor.dept_name = department.dept_name;
```

自然连接自然连接两个表相同属性名的值相等，然后连起来为一个新的元组

select name coure_id
from instructor natural join teaches;

# 指定属性
select name, title
from (instructor natural join teaches) join course using (course_id);

附加的基本运算

更名

select I.name as instructor_name
from instructor as I;

字符串运算

# 使用like来进行模式匹配，%匹配任意子串，_匹配任意一个字符
select dept_name 
from departname
where building like '%a%';

排序

select *
from instructor
order by salary desc;

集合运算

# 并
(select course_id
from course
where dept_name = 'cs')
union
(select course_id
from course
where dept_name = 'ee')

# 交
(select course_id
from course
where dept_name = 'cs')
intersect
(select course_id
from course
where dept_name = 'ee')

# 差
(select course_id
from course
where dept_name = 'cs')
except
(select course_id
from course
where dept_name = 'ee')

聚集函数

平均值：avg 最小值：min 最大值：max 总和：sum 计数：count

# 基本聚集
select avg(salary)
from instructor
where dept_name = 'cs';

# 分组聚集
select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name;

# having子句，having是在形成分组之后才起作用，且出现having子句中，但是没被聚集的属性必须出现在group by子句中。
select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name;
having avg(salary) > 42000;

嵌套子查询

成员资格查询

select course_id
from section
where semester = 'fall' and year = 2009 and 
course_id in (select course_id
			  from section
			  where semester = 'spring' and year = 2010);

集合的比较

# all代表所有，some代表至少有一个
select name
from instructor
where salary > all(some) (select salary
						  from instructor
						  where dept_name = 'biology');

空关系测试

# 找出选修了biology系开设的所有课程的学生
select s.id, s.name
from student as s
where not exists((select course_id
				  from course
				  where dept_name = 'biology'
				  except
				  (select t.course_id
				  from takes as t
				  where s.id = t.id));

from子句中的子查询

select dept_name, avg_salary
from (select dept_name, avg(salary) as avg_salary
	  from instructor
	  group by dept_name)
where avg_salary > 42000;

with子句

# with子句新建了一个临时表
with max_budget(value) as
	(select max(budget)
	from department)
select budget
from department, max_budget
where department.budget = max_budget.value;

高级一点的sql操作

连接表达式

而内连接就是普通连接

# 会保留student中所有的元组
select *
from student natural left join takes

视图视图是一种虚关系，而不是真实存在的。但是使用可以带来很多方便，比如说一个非常复杂语句才能查出来的一组数组，我们可以使用视图，直接使用select语句就能查出来。
```
create view faculty(ID, name, dept_name) as
select ID, name, dept_name
from instructor;
```

视图也可以物化，即保持视图随着实际的关系改变而改变。

check子句我们使用check字句对属性域加以限制

create table section
(course_id varchar(7),
semester varchar(10),
 room_number varchar(6),
 check (semester in ('fall', 'winter', 'spring', 'summer')));

授权创建一个用户并且对其设定访问的权限

# 授权
grant 权限列表
on 关系名
to 用户名

# 收回权限
revoke 权限列表
on 关系名
from 用户名

# 创建一个角色
create role instructor;

使用高级语言去连接数据库使用python的例子

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import MySQLdb

# 打开数据库连接
db = MySQLdb.connect("localhost", "user", "password", "TESTDB", charset='utf8' )

# 使用cursor()方法获取操作游标 
cursor = db.cursor()

# 使用execute方法执行SQL语句
cursor.execute("SELECT VERSION()")

# 使用 fetchone() 方法获取一条数据
data = cursor.fetchone()

print("Database version : %s " % data)

# 关闭数据库连接
db.close()

函数类似于高级程序语言中的函数

# 创建一个函数
create function instructor_of(dept_name varchar(20))
return table(
	id varchar(5),
	name varchar(20))
return table
	(select id, name
	 from instructor
	 where instructor_dept_name = dept_name );
	
# 调用一个函数
call instructor_of('cs');

触发器触发器是用来当某种情况发生时，自动开始执行某项任务。

create trigger timeslot_check1 after insert on section
referencing new row as nrow
for each row
when (nrow, time_slot_id not in
	 (select time_slot_id
	 from time_slot))
begin
	rollback
end;

数据库设计

E-R图

E-R图示用来图形化表示数据库的全局逻辑结构。

基本结构
映射的基数
也可以是下面这种
上图说明一个教师至少指导0个及以上的学生，一个学生有且有一个老师。
弱实体集没有足够的属性形成主码的实体集。

弱实体集通过联系集sec_course依赖于强实体集course
如果将E-R图转换为关系模式
而对于弱实体集来说，需要把依赖的实体的主码给加上。

范式

第一范式

在关系模型中，属性没有任何子结构，即一个域是原子的，该域的元素都是不可分的单元，我们称一个关系模式R属于第一范式

BC范式

首先给出函数依赖的概念

解决不是BC范式

举个例子即dept_name可以完全推出inst_dept，但是其又不是主码，所以不满足BCNF。

分解为

第三范式（比BC范式弱）

索引与散列

索引是为了快速找到值，数据库中索引使用B+树建立。分类：主键索引：不能重复。id 不能是null (设定为主键后数据库会自动建立索引)。唯一索引：不能重复。id 可以是null，主键索引不一样的地方就是可以有多个。

B+树

下面是B+的结点的例子，其中k是每个结点中的搜索码值（用来表示指针），p是指针（用来指向文件的位置）。

一个B+树的例子

B+树的特点一个B+树的非叶结点的指针都是指向树中结点的指针，一个非叶结点最多容纳n个指针，最少容纳n/2的向上取整（这里解释下n，n一般是自己定的一个值），而叶结点最多有n-1个值，最少包含(n-1)/2向上取整个值（注意要和指针区分开来）。具体的插入删除操作可以参考这篇博客

事务

事务是更新各种数据项的一个程序执行单元。需要满足原子性：事务要么执行，要么不执行，不能执行到一半。隔离性：几个事务并发，每个事务都能保证原子性和一致性。一致性：保证数据的总值不会多，也不会少。持久性：不会丢。

事务的原子性和一致性

可串行化

即让一个并行的调度可串行化。我们考虑一个调度S，如果只有read和write，我们考虑四种情况。

如果调度S可以经过一系列非冲突指令交换转换为S'，则两者是冲突等价的。如何看一个调度是否可以串行化，我们使用看起生成的优先图是否能拓扑排序

我们可以看到t1的read(A)之后是t2的write(A)，然后t2的read(A)，他的write(A)，所以就构成了下图。

事务的隔离级别（按顺序由高到底）

可串行化：保证可串行化调度，使事务直接不会冲突。
可重读：它确保同一事务的多个实例在并发读取数据时，会看到同样的数据行。
读取提交内容：一个事务只能看见已经提交事务所做的改变。这种隔离级别也支持所谓的 - 不可重复读，因为同一事务的其他实例在该实例处理其间可能会有新的commit，所以同一select可能返回不同结果。
读取未提交内容：最低的隔离等级，允许其他事务看到没有提交的数据，会导致脏读。

如果隔离性不好，可能会导致以下问题

脏读：脏读是指在一个事务处理过程里读取了另一个未提交的事务中的数据。
不可重复读：不可重复读是指在对于数据库中的某个数据，一个事务范围内多次查询却返回了不同的数据值，这是由于在查询间隔，被另一个事务修改并提交了。
幻读：一个事务T1按相同的查询条件重新读取以前检索过的数据，却发现其他事务T2插入了满足其查询条件的新数据，这种现象就称为“幻读”。（和可重复读类似，但是事务 T2 的数据操作仅仅是插入和删除，不是修改数据，读取的记录数量前后不一致）

使用锁来解决隔离性

悲观锁：数据库层面加锁，每次去拿数据的时候都认为别人会修改，所以每次在拿数据的时候都会上锁，这样别人拿这个数据就会阻塞，直到它拿锁。共享锁：如果事务t拿到了数据项上的共享锁，则其不能对数据项进行写，但是可以读。

排他锁：如果事务t拿到了数据项上的排他锁，则其可以对数据项进行写和读。
乐观锁：表中有一个版本字段（时间戳），第一次读的时候，获取到这个字段。处理完业务逻辑开始更新的时候，需要再次查看该字段的值是否和第一次的一样。如果一样更新，反之拒绝。之所以叫乐观，因为这个模式没有从数据库加锁。

[1] 数据库系统概念第六版，机械工业出版社 [2] github.com/ZXZxin/ZXBl… [3] www.cnblogs.com/nullzx/p/87…