前言
列示存储,你的优势就是你的劣势,你的劣势还是你的列示。
环境准备
默认已经搭好了cassandra
创建键空间
示例:
CREATE KEYSPACE one WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE KEYSPACE two WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 3} AND durable_writes = false;
CREATE KEYSPACE语句有 两个属性:replication和durable_writes 和 NetworkTopologyStrategy
两个策略:简单策略 SimpleStrategy ,网络拓扑策略 NetworkTopologyStrategy
两个策略简单来说区别是 单数据中心用简单策略,多数据中心用网络拓扑策略 这里用简单策略 即使用键空间为 one
建表 与分区
create table device_sensor_smoke
(
unique_id text,
time_id timeuuid,
#略
temperature double,
primary key (unique_id, time_id, event_id)
) #表后可以跟一些选项,目前先忽略
主键选择有3种方式
1. unique_id text PRIMARY KEY,
2. primary key (unique_id, time_id, event_id)
3. primary key (unique_id, time_id), event_id)
- a primary key 或者写作 primary key(a) 表示 a 是分区键,没有聚簇列(clustering columns)
- primary key (a, b , c ) a是分区键(partition key) b,c 是聚簇类
- primary key (a, b), c) a,b 组成分区键或者叫复合主键(composite partition key) event_id是聚簇列 在表中,CQL 定义了分区的概念。分区只是共享相同分区键值的一组行。注意,如果分区键由多列组成,则行属于同一分区,只有它们对所有这些分区键列具有相同的值。
例如,给定下表定义和内容:
CREATE TABLE t (
a int,
b int,
c int,
d int,
PRIMARY KEY ((a, b), c, d)
);
SELECT * FROM t;
a | b | c | d
---+---+---+---
0 | 0 | 0 | 0 // row 1
0 | 0 | 1 | 1 // row 2
0 | 1 | 2 | 2 // row 3
0 | 1 | 3 | 3 // row 4
1 | 1 | 4 | 4 // row 5
row1 与 row2 处于同一分区中 这解释了为什么我们的项目表为什么用 unique_id 作为第一位的主键 ,因为 unique_id拥有大量重复,cassandra 会根据unique_id顺序查询。
CREATE TABLE t (
a int,
b int,
c int,
PRIMARY KEY (a, b, c)
);
SELECT * FROM t;
a | b | c
---+---+---
0 | 0 | 4 // row 1
0 | 1 | 9 // row 2
0 | 2 | 2 // row 3
0 | 3 | 3 // row 4
也正是由于此特性,所以我们将拥有大量重复的unique_id 作为主键第一位,也就是对应上方表中的 a 。
这样我们表中的数据才是有序的,如果我们将唯一的值 例如 time_id。作为第一主键。那么每个单独的值都是一个区,这样的话最终的结果将是乱序的。
查询语句
通过上面的建表语句我们已经见识到了 cql 是有严格的规则的,这一特性也体现到了查询语句中 建立如下表
CREATE TABLE posts (
userid text,
blog_title text,
posted_at timestamp,
entry_title text,
content text,
category int,
PRIMARY KEY (userid, blog_title, posted_at)
)
表中未插入任何数据 , 随后做如下查询
SELECT entry_title, content FROM posts
WHERE userid = 'john doe'
AND blog_title='John''s Blog'
AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
得到的结果为(我未插入任何数据)
entry_title | content
-------------+---------
(0 rows)
在之后做如下查询
SELECT entry_title, content FROM posts
WHERE userid = 'john doe'
AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
#得到的结果为
InvalidRequest: Error from server: code=2200 [Invalid query] message="PRIMARY KEY column "posted_at" cannot be restricted as preceding column "blog_title" is not restricted"
如果熟悉sql的话,乍一看并没有感觉出第二条有什么不妥,并且是在第一次查询成功的情况下。 但是我们发现 主键结构为(userid, blog_title, posted_at) ,第一次查询顺序 与主键相同,第二次只查询了 userid, posted_at 。
userid 与 posted_at在主键上并不是连续的,所以会报错。 Cassandra是允许“过滤”的,同事也要考虑空间连续性。
select * from posts ;
select * from entry_title ="cql怎么用";
上述两条是可以通过的。
但是
select * from entry_title ="cql怎么用" and category = 1;
这个查询会被拒绝,因为 entry_title 与 category 并非连续,所以Cassandra 并不能保证即使 满足的条件很少,也要扫描大量的数据。
但是可以加上 ALLOW FILTERING 来强制执行,同理上面失效的 查询也可以。 所以要想要刚才两条查询生效需要写成如下的样子 。
select * from entry_title ="cql怎么用" and category = 1 ALLOW FILTERING ;
SELECT entry_title, content FROM posts
WHERE userid = 'john doe'
AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31' ALLOW FILTERING ;
排序
其实cassandra的任何查询,最后的结果都是有序的,默认与建表时指定的排序规则一致(如无指定默认为升序,参考上方 t表)
但是如果有特殊排序需求也能满足。
也就是说 Cassandra支持自定义排序,但也是限制重重
创建如下表
create table teacher(
id int,
address text,
name text,
age int,
height int,
primary key(id,address,name)
)WITH CLUSTERING ORDER BY(address DESC, name ASC);
插入如下数据
insert into teacher(id,address,name,age,height) values(1,'guangdong','lixiao',32,172);
insert into teacher(id,address,name,age,height) values(1,'guangxi','linzexu',68,178);
insert into teacher(id,address,name,age,height) values(1,'guangxi','lihao',25,178);
insert into teacher(id,address,name,age,height) values(2,'guangxi','lixiaolong',32,172);
insert into teacher(id,address,name,age,height) values(2,'guangdong','lixiao',32,172);
insert into teacher(id,address,name,age,height) values(2,'guangxi','linzexu',68,178);
insert into teacher(id,address,name,age,height) values(2,'guangxi','lihao',25,178);
insert into teacher(id,address,name,age,height) values(2,'guangxi','nnd',32,172);
通过以上建表与查询的套路,我们也知道,所有的限制必然是来自辣个男人(表结构)。
那么我们直接上语句,来感受这次列式存储在又作什么妖。
正确示例 1:
SELECT * FROM teacher WHERE id=1 ORDER BY address ASC;
SELECT * FROM teacher WHERE id=1 ORDER BY address ASC, name ASC;
SELECT * FROM teacher WHERE id=1 AND address='guangxi' ORDER BY address ASC;
SELECT * FROM teacher WHERE id=1 AND address='guangxi' ORDER BY address ASC, name ASC;
SELECT * FROM teacher WHERE id=1 ORDER BY address DESC;
SELECT * FROM teacher WHERE id=1 ORDER BY address DESC, name DESC;
SELECT * FROM teacher WHERE id=1 AND address='guangxi' ORDER BY address DESC;
SELECT * FROM teacher WHERE id=1 AND address='guangxi' ORDER BY address DESC, name DESC;
正确示例 2:
SELECT * FROM tt WHERE id=1 ORDER BY address DESC;
SELECT * FROM tt WHERE id=1 ORDER BY address DESC, name ASC;
SELECT * FROM tt WHERE id=1 AND address='guangxi' ORDER BY address DESC;
SELECT * FROM tt WHERE id=1 AND address='guangxi' ORDER BY address DESC, name ASC;
SELECT * FROM tt WHERE id=1 ORDER BY address ASC;
SELECT * FROM tt WHERE id=1 ORDER BY address ASC, name DESC;
SELECT * FROM tt WHERE id=1 AND address='guangxi' ORDER BY address ASC;
SELECT * FROM tt WHERE id=1 AND address='guangxi' ORDER BY address ASC, name DESC;
错误示例展示
SELECT * FROM teacher ORDER BY address DESC; //没有第一主键 不行
SELECT * FROM teacher WHERE id=1 ORDER BY name DESC; //必须以第二主键开始排序
SELECT * FROM teacher WHERE id=1 ORDER BY address DESC, name ASC; //不是与建表时指定的排序一致或者完全相反 (默认是address ASC, name ASC)
SELECT * FROM teacher WHERE age=1 ORDER BY address DESC; //不能有索引
SELECT * FROM tt WHERE id=1 ORDER BY address DESC, name DESC; //不是与建表时指定的排序一致或者完全相反 (建表时指定了address DESC, name ASC)
推荐阅读/本文参考文献
SpringDate for Cassandra
Cassandra 官网
cassandra.apache.org/doc/latest/…
互联网文章