Doris Installation and Usage

I. Overall Architecture

[architecture diagram]

II. Installation and Deployment

1. Resource Planning

Deployment plan:

| Node | core1 | master1 |
| --- | --- | --- |
| FE | FOLLOWER | OBSERVER |
| BE | BE | BE |

| Instance | Port Name | Default Port | Communication | Description |
| --- | --- | --- | --- | --- |
| FE | http_port | 8030 (changed to 8034) | user/FE <--> FE | HTTP port on the FE; serves the web admin UI |
| BE | webserver_port | 8040 (changed to 8041) | BE <--> FE | HTTP port on the BE; the default conflicts with Hadoop ports |
| FE | query_port | 9030 | user <--> FE | MySQL protocol port on the FE; used by clients such as Navicat |

2. Deployment Steps

  1. Method 1: install from the bin package
1. Download apache-doris-2.0.0-bin-x64.tar.gz
2. Extract it
3. Edit the config files:
modify the IP and ports
add JAVA_HOME=/data/module/jdk1.8.0_212

4. Create the data directories
mkdir doris-meta
mkdir doris-storage1
mkdir doris-storage2

5. Create users and grant privileges
# Connect to the FE; no password by default
mysql -h core1 -P 9030 -uroot
# Change the root password:
SET PASSWORD FOR 'root' = PASSWORD('000000');
# Reconnect with the new password:
mysql -h master1 -P 9030 -uroot -p000000

# Add a user and grant it privileges on the wafbase database:
CREATE USER 'test' IDENTIFIED BY 'test';
GRANT ALL ON wafbase TO test;

SHOW ROLES;
SHOW ALL GRANTS;

# View SQL syntax help:
HELP ALTER TABLE;

# Use IPs, not hostnames, when adding BEs
ALTER SYSTEM ADD BACKEND "192.168.12.14:9050"; 
ALTER SYSTEM ADD BACKEND "192.168.12.10:9050"; 
ALTER SYSTEM ADD BACKEND "192.168.12.5:9050"; 
SHOW PROC '/backends';

# Add FEs
ALTER SYSTEM ADD FOLLOWER "192.168.12.14:9010";
ALTER SYSTEM ADD OBSERVER "192.168.12.10:9010";
SHOW PROC '/frontends';

# Add brokers
ALTER SYSTEM ADD BROKER broker_name "192.168.12.5:8000","192.168.12.10:8000","192.168.12.14:8000";
SHOW PROC "/brokers";

6. Start the services
# First start of a new FE: point --helper at an existing FE
fe/bin/start_fe.sh --helper 192.168.12.5:9010 --daemon

# Subsequent starts:
fe/bin/start_fe.sh --daemon
be/bin/start_be.sh --daemon

7. Web UI
http://192.168.12.14:8034/login
  2. Method 2: one-click Doris deployment with Doris Manager
1. The user that installs the manager (hadoop) needs passwordless SSH
ssh: vim /etc/ssh/sshd_config (note: paths differ for different users)
Switch to the hadoop user and generate the key pair


2. Permissions
Check whether the user can use crontab: crontab -l
Check whether the setuid (S) bit is set: ll /bin/crontab
chmod a+s /bin/crontab
chmod g-s /bin/crontab

vim /etc/security/access.conf
+ : hadoop : crontab : ALL
 
echo hadoop >> /etc/cron.allow

3. Start
1. Download Enterprise Manager: doris-manager-23.11.3-x64-bin.tar.gz
cd webserver/
bin/start.sh

4. Web UIs
manager: http://192.168.12.14:8004  root/000000
sql: http://192.168.12.14:8010/cluster/query



Apache Doris must be upgraded to SelectDB Doris to use the web UI; the manager supports one-click upgrade.
1. Download Enterprise Core: selectdb-doris-2.0.4-b01-bin-x64.tar.gz
2. Move it to /data/software/doris; once the manager can find the package it can upgrade directly.


III. Usage

1. Data Models

1.Aggregate

For aggregation workloads. Key columns (dimensions) are the sort and uniqueness columns; each Value column (metric) declares an AggregationType: SUM / MAX / MIN / REPLACE / REPLACE_IF_NOT_NULL / HLL_UNION / BITMAP_UNION.

CREATE TABLE IF NOT EXISTS example_tbl_agg
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "data ingestion date",
    `timestamp` DATETIME NOT NULL COMMENT "data ingestion timestamp",
    `city` VARCHAR(20) COMMENT "user's city",
    `age` SMALLINT COMMENT "user age",
    `sex` TINYINT COMMENT "user gender",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spend",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's max dwell time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's min dwell time"
)
AGGREGATE KEY(`user_id`, `date`, `timestamp` ,`city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

Limitations:

  1. Queries that aggregate a column differently from its declared type are less efficient, e.g. cost is declared SUM but the query uses MIN(cost);

  2. count(*) is expensive. The workaround is to add a SUM column whose value is always 1, then query select sum(count) from table.
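The count workaround can be sketched outside Doris. This illustrative Python snippet (not Doris code) simulates how the Aggregate model collapses rows on the key, and why a constant-1 SUM column still recovers the original row count:

```python
from collections import defaultdict

def aggregate_merge(rows):
    """Merge rows by key the way Aggregate-model compaction collapses them."""
    merged = defaultdict(lambda: {"cost": 0, "cnt": 0})
    for key, cost in rows:
        merged[key]["cost"] += cost   # a SUM metric column
        merged[key]["cnt"] += 1       # the constant-1 SUM helper column
    return dict(merged)

raw = [("u1", 10), ("u1", 5), ("u2", 7)]         # 3 raw rows
merged = aggregate_merge(raw)
print(len(merged))                                # merged rows: 2
print(sum(v["cnt"] for v in merged.values()))     # original row count: 3
```

After merging, a plain count over the stored rows would report 2, but sum(cnt) still yields the true ingested row count of 3.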

2. Unique

UNIQUE KEY: deduplicates on the primary key (merge-on-write by default); the key columns are the sort and uniqueness columns. The effect is equivalent to REPLACE in the Aggregate model.

CREATE TABLE IF NOT EXISTS example_tbl_unique
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `username` VARCHAR(50) NOT NULL COMMENT "user nickname",
    `city` VARCHAR(20) COMMENT "user's city",
    `age` SMALLINT COMMENT "user age",
    `sex` TINYINT COMMENT "user gender",
    `phone` LARGEINT COMMENT "user phone",
    `address` VARCHAR(500) COMMENT "user address",
    `register_time` DATETIME COMMENT "user registration time"
)
UNIQUE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
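As a rough illustration (plain Python, not Doris internals), Unique-key writes behave like a keyed replace, matching the Aggregate model's REPLACE type:

```python
def unique_upsert(rows):
    """Later rows with the same key replace earlier ones (REPLACE semantics)."""
    table = {}
    for key, value in rows:
        table[key] = value  # the most recent write wins
    return table

t = unique_upsert([(1, "Alice"), (2, "Bob"), (1, "Alice-updated")])
print(t[1])   # the later write for key 1 replaced the earlier one
```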
3. Duplicate

The default model: keeps detail rows as-is, with no aggregation or deduplication. DUPLICATE KEY columns are sort columns only.

CREATE TABLE IF NOT EXISTS example_tbl_duplicate
(
    `timestamp` DATETIME NOT NULL COMMENT "log time",
    `type` INT NOT NULL COMMENT "log type",
    `error_code` INT COMMENT "error code",
    `error_msg` VARCHAR(1024) COMMENT "error details",
    `op_id` BIGINT COMMENT "owner id",
    `op_time` DATETIME COMMENT "handling time"
)
DUPLICATE KEY(`timestamp`, `type`, `error_code`)
DISTRIBUTED BY HASH(`type`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

2. Data Partitioning

1. Partition & Bucket

Partition: Range or List; keep each partition at roughly 50GB or less. Bucket (Tablet): Hash or Random; keep each tablet at roughly 1-10GB.

| Total Data | Recommended Layout |
| --- | --- |
| 500MB | 4-8 buckets |
| 5GB | 8-16 buckets |
| 50GB | 32 buckets |
| 500GB | 10 partitions, 32 buckets each |
| 5TB | 100 partitions, 32 buckets each |
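The sizing guidance can be encoded as a small helper. This is a sketch based only on the table in this section, not an official Doris formula:

```python
def recommend_layout(total_gb: float):
    """Return (partitions, buckets_per_partition) per the sizing table."""
    if total_gb <= 0.5:
        return (1, 8)      # 500MB: 4-8 buckets
    if total_gb <= 5:
        return (1, 16)     # 5GB: 8-16 buckets
    if total_gb <= 50:
        return (1, 32)     # 50GB: 32 buckets
    if total_gb <= 500:
        return (10, 32)    # 500GB: 10 partitions x 32 buckets each
    return (100, 32)       # 5TB: 100 partitions x 32 buckets each

print(recommend_layout(40))     # a 40GB table fits in one 32-bucket partition
print(recommend_layout(5000))   # 5TB calls for 100 partitions of 32 buckets
```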
2. Dynamic Partitioning
CREATE TABLE event_log (
    ts DATETIME
) 
DUPLICATE KEY(ts)
PARTITION BY RANGE(ts) ()
DISTRIBUTED BY HASH(ts) 
PROPERTIES
(
    "replication_num" = "1",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-30",   -- drop partitions older than 30 days
    "dynamic_partition.end" = "2",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"

);

SHOW DYNAMIC PARTITION TABLES;

3. Query Optimization

1.Rollup

ROLLUP tables in the Aggregate and Unique models

Build coarser-grained pre-aggregated data on top of the Base table.

desc example_site_visit2 all;

alter table example_site_visit2 add rollup rollup_cost_userid(user_id,cost);  
alter table example_site_visit2 add rollup rollup_city_age_cost_maxd_mind(city,age,cost,max_dwell_time,min_dwell_time);

explain SELECT user_id, sum(cost) FROM example_site_visit2 GROUP BY user_id; 

ROLLUP in the Duplicate model

Reorder columns so queries can hit the prefix index (first 36 bytes).

Notes:

  1. Users can create and drop ROLLUP tables, but whether a query hits one is decided automatically by Doris;

  2. The more ROLLUPs you create, the more disk space they occupy: queries get faster, but import speed drops.

4. Data Loading

Broker Load (from HDFS via a broker):
LOAD LABEL test_db.hdfs_student 
( 
    DATA INFILE("hdfs://master1:8020/student.csv") 
    INTO TABLE `hdfs_student` 
    COLUMNS TERMINATED BY "," 
    FORMAT AS "csv" (id, name, age, score) 
) 
WITH BROKER broker_name  
PROPERTIES 
( 
    "timeout" = "3600" 
  
); 
Stream Load (via curl against the FE http port):

curl --location-trusted -u root:000000 -H "label:stream" -H "column_separator:," -T student.csv http://172.31.12.5:8034/api/test_db/hdfs_student/_stream_load

Built-in prefix index: only the first 36 bytes of the key columns are indexed, and the index is truncated at the first VARCHAR column, so column order determines query efficiency; a ROLLUP can be used to reorder columns. Inverted indexes on specified columns support full-text search over text columns and equality/range queries over ordinary numeric and date columns.
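The prefix-index rule above (36-byte budget, truncation at the first VARCHAR) can be sketched as follows; the fixed-type byte widths here are illustrative assumptions, not values from the Doris source:

```python
# Assumed byte widths for fixed-size key column types (illustrative only).
FIXED_WIDTH = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8,
               "LARGEINT": 16, "DATE": 3, "DATETIME": 8}

def prefix_index_columns(schema, budget=36):
    """schema: list of (name, type). Return the key columns covered
    by a 36-byte prefix index that truncates at the first VARCHAR."""
    covered, used = [], 0
    for name, col_type in schema:
        if col_type.startswith("VARCHAR"):
            covered.append(name)   # VARCHAR is included (possibly truncated)...
            break                  # ...and ends the prefix index
        width = FIXED_WIDTH[col_type]
        if used + width > budget:
            break                  # 36-byte budget exhausted
        covered.append(name)
        used += width
    return covered

# Putting frequently filtered fixed-width columns first keeps them indexed:
print(prefix_index_columns([("user_id", "BIGINT"),
                            ("city", "VARCHAR(20)"),
                            ("age", "SMALLINT")]))
```

Note that `age` falls outside the index because `city` (a VARCHAR) cuts it off; swapping `age` ahead of `city` would cover all three columns, which is exactly the kind of reordering a ROLLUP enables.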