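HBase architecture
RegionServer responsibilities
Data: get, put, delete
Region: splitRegion, compactRegion
HMaster responsibilities
Table: create, delete, alter
RegionServer: assign regions to each RegionServer and monitor the status of each RegionServer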
Symlink the Hadoop configuration file into the HBase conf directory
ln -s /usr/local/src/hadoop-2.7.7/etc/hadoop/core-site.xml /usr/local/src/hbase-1.3.6/conf/core-site.xml
Create a namespace
list_namespace
create_namespace "elaiza"
create "elaiza:stu", "info"
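Integrating Hive with HBase
# Syntax: put '<table>', '<row key>', '<family:qualifier>', '<value>'. The row key '1001' is illustrative.
create 'student','info'
put 'student', '1001', 'info:name', 'elaiza'
put 'student', '1001', 'info:age', '24'
put 'student', '1001', 'info:sex', 'female'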
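The put commands above all write cells into the same row. As a rough illustration of the logical layout, here is a minimal Python sketch. It is not an HBase client: the `table` dictionary and the `put` helper are hypothetical stand-ins, the row key "1001" is an assumed value, and timestamps and versions are left out.

```python
from collections import defaultdict

# Logical view of an HBase table: row key -> "family:qualifier" -> value.
# Timestamps and cell versions are omitted to keep the sketch small.
table = defaultdict(dict)

def put(row_key, column, value):
    """Mimic the shell's put '<table>', '<row key>', '<family:qualifier>', '<value>'."""
    table[row_key][column] = value

# "1001" is a hypothetical row key chosen for illustration.
put("1001", "info:name", "elaiza")
put("1001", "info:age", "24")
put("1001", "info:sex", "female")

# All three cells end up under one row key.
print(dict(table))
```

Each put addresses a single cell, so without a row key there is nothing to group the three columns into one record.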
CREATE EXTERNAL TABLE hbase_student(
user_id string,
age string,
name string,
sex string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:age,info:name,info:sex")
TBLPROPERTIES ("hbase.table.name" = "student");
CREATE EXTERNAL TABLE hbase_student_map(
user_id string,
info map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
TBLPROPERTIES ("hbase.table.name" = "student");
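The entries in hbase.columns.mapping are matched to the Hive columns by position: the n-th entry belongs to the n-th declared column, so the two lists must have the same length and the same order. The sketch below only illustrates that pairing rule in Python. `pair_columns` is a hypothetical helper, not part of Hive, and the column list is an example.

```python
def pair_columns(hive_columns, columns_mapping):
    """Pair each Hive column with the HBase entry at the same position."""
    hbase_entries = columns_mapping.split(",")
    if len(hive_columns) != len(hbase_entries):
        raise ValueError("hbase.columns.mapping needs one entry per Hive column")
    return dict(zip(hive_columns, hbase_entries))

# Example column list in declaration order, with a mapping in the same order.
hive_columns = ["user_id", "age", "name", "sex"]
mapping = pair_columns(hive_columns, ":key,info:age,info:name,info:sex")

for hive_col, hbase_col in mapping.items():
    print(f"{hive_col:8} <- {hbase_col}")
```

In the map-typed table, the single entry `info:` maps the whole column family to one Hive column, which is why that table declares only two columns.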
Build Phoenix, then connect with its sqlline client (ZooKeeper at master:2181)
bin/sqlline.py master:2181
Pre-splitting
1. Set the split points manually
create 'staff1', 'info', 'partition1', SPLITS => ['1000','2000','3000','4000']
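2. The number of regions depends on the cluster size and on the data volume
2.1 A common choice is two to three times the number of nodes in the cluster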
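To make the two points above concrete, the following Python sketch shows how a row key is placed by split points, and one possible way to derive split points from the cluster size. It is an illustration under assumptions, not HBase code. `region_index` and `even_split_points` are hypothetical helpers, the node count of 3 is assumed, and the keys are taken to be zero-padded numeric strings so that string order matches HBase's byte order.

```python
from bisect import bisect_right

def region_index(row_key, split_points):
    """Return the index of the region whose key range contains row_key.

    N split points define N + 1 regions. A split point is the first key of
    the region to its right, hence bisect_right rather than bisect_left.
    """
    return bisect_right(split_points, row_key)

def even_split_points(num_regions, width=4, key_space=10_000):
    """Hypothetical helper: evenly spaced, zero-padded numeric split points."""
    step = key_space // num_regions
    return [str(i * step).zfill(width) for i in range(1, num_regions)]

# The split points from the 'staff1' example: four points, five regions.
splits = ["1000", "2000", "3000", "4000"]
for key in ["0500", "1000", "2999", "9999"]:
    print(key, "-> region", region_index(key, splits))

# Assumed cluster of 3 nodes, three regions per node.
nodes = 3
print(even_split_points(nodes * 3))
```

With the 'staff1' split points, every key at or above "4000" lands in the last region, so split points only balance the load if they match the real distribution of row keys.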
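Row key design
1. Pre-splitting
2. Hashing (matters once there are multiple regions) => keys should carry no sequential order
3. Uniqueness (the row key is the unique identifier of a row; which region the row is stored in depends on which pre-split range its row key falls into)
Purpose: the main goal of row key design is to distribute data evenly across all regions, which guards against data skew to some extent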
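One common way to get hashed yet unique keys is to prefix each key with a bucket number derived from a hash of the key. The Python sketch below illustrates the idea under assumptions: `salted_key` is a hypothetical helper, the five buckets stand for five pre-split regions (split points "01" to "04" would be one way to match them), and MD5 is used only as a convenient stable hash.

```python
import hashlib
from collections import Counter

NUM_REGIONS = 5  # assumed: four split points give five regions

def salted_key(raw_key, num_regions=NUM_REGIONS):
    """Prefix raw_key with a bucket number derived from its MD5 hash.

    The prefix is deterministic, so a reader can recompute it for a point
    lookup, and the original key is kept, so uniqueness is preserved.
    """
    digest = hashlib.md5(raw_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_regions
    return f"{bucket:02d}_{raw_key}"

# Sequential ids would pile up in one region without the prefix.
keys = [salted_key(f"user{i:06d}") for i in range(10_000)]
per_bucket = Counter(k.split("_", 1)[0] for k in keys)

for bucket, count in sorted(per_bucket.items()):
    print(bucket, count)
```

The trade-off is that a range scan over the original key order now needs one scan per bucket, so this approach fits workloads dominated by point reads and writes.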