大规模数据(2000W)管理相关技术学习笔记

148 阅读5分钟

目标

通过预研找到契合团队技术现状的足以支撑业务下一阶段可能达到的大规模数据(2000W)管理的解决方案,并建立原型框架验证解决方案。

技术调研主要过程

测试单表2000W数据的性能,从而确定是否需要进行分库分表

SQL(ms)Mybatis(ms)备注
计数166163164596SELECT COUNT(*) from result_data
分页查询121066SELECT * from result_data LIMIT 10;
单一属性查询143912145203SELECT (result_data_id) from result_data where remark =‘我是第480001006产生’

调研对比分库分表相关开源技术

a) MyCat,阿里在Cobar调整的,有了mycat2, Stars,5861
b) Cobar,基本停滞了,Cobar is a proxy for sharding databases and tables,compatible with MySQL protocal and MySQL SQL grama,underlying storage only support MySQL for support foreground business more simple,stable,efficient and safety。Stars,2555
c) Sharding JDBC,当当,6881,非常活跃
d) Sharding Proxy,当当,6881,非常活跃
e) TDDL ,淘宝搞的,1069,停滞
f) Atlas,360产品,Star 3909,不活跃
g) DBProxy,美团,Star 1714,基本停滞
h) HBase,重型,与目前的技术体系差别大,需要专业运维

分布式数据库(下一步发展重点)

a) DRDS,DRDS 是阿里巴巴集团自主研发的分布式数据库中间件产品,专注于解决单机关系型数据库扩展性问题,具备轻量(无状态)、灵活、稳定、高效等特性,经历历届双十一核心交易业务和各类行业业务的考验。云产品,无法离线使用
b) TiDB,最后了解到的,相关信息需要补充,star 17598,非常活跃,
i. iDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads.
ii. 生产级需要使用ansible技术,这也意味着必须采用kubernetes集群
iii. 应用场景

  1. 数据量大,单机保存不下
  2. 不希望做 Sharding 或者懒得做 Sharding
  3. 访问模式上没有明显的热点
  4. 需要事务、需要强一致、需要灾备
    c) 主要应用案例
    i. 北京银行:事务性,性能,以及扩展性,无侵入性
    ii. 头条:如今(数据截至发文),TiDB 支撑着今日头条 OLTP 系统里数据流量最大、QPS 最高的场景:集群容量 几十T,数据量日增 5 亿,日常 QPS 均值在 12W,峰值 20W 左右。

对比分析各项技术

确定sharding proxy,sharding jdbc为突破点
a) 简单
b) 性能由一点损失

Sharding proxy

基本理论

搭建Sharding proxy集群

PS:建立四台虚拟机组成集群,三台数据库,一台proxy,OS均为windows server 2008

  1. 安装mysql 到三台数据库节点
  2. 下载3.0.0版本的release包
  3. 修改配置文件
    a) Config-sharding.yaml
# 
# Here you can configure the rules for the proxy.
# This example is configuration of sharding rule.
#   
# If you want to use sharding, please refer to this file; 
# if you want to use master-slave, please refer to the config-master_slave.yaml.
# 
######################################################################################################

schemaName: dim_research

dataSources:
 ds_0:
   url: jdbc:mysql://10.0.4.212:3306/dim_research?serverTimezone=UTC&useSSL=false&characterEncoding=utf8
   username: root
   password: ABC.123456.ZwSe2
   autoCommit: true
   connectionTimeout: 30000
   idleTimeout: 60000
   maxLifetime: 1800000
   maximumPoolSize: 50
 ds_1:
   url: jdbc:mysql://10.0.4.207:3306/dim_research?serverTimezone=UTC&useSSL=false&characterEncoding=utf8
   username: root
   password: ABC.123456.ZwSe2
   autoCommit: true
   connectionTimeout: 30000
   idleTimeout: 60000
   maxLifetime: 1800000
   maximumPoolSize: 50
 ds_2:
   url: jdbc:mysql://10.0.4.213:3306/dim_research?serverTimezone=UTC&useSSL=false&characterEncoding=utf8
   username: root
   password: ABC.123456.ZwSe2
   autoCommit: true
   connectionTimeout: 30000
   idleTimeout: 60000
   maxLifetime: 1800000
   maximumPoolSize: 50
shardingRule:
 tables:
   result_data:
     actualDataNodes: ds0.result_data_0,ds1.result_data_1,ds2.result_data_2
     tableStrategy:
       inline:
         shardingColumn: result_data_id
         algorithmExpression: result_data_${result_data_id % 3}
     keyGeneratorColumnName: result_data_id
 bindingTables:
   - result_data
 defaultDatabaseStrategy:
   inline:
     shardingColumn: user_id
     algorithmExpression: ds_${user_id % 3}
 defaultTableStrategy:
   none:
 defaultKeyGeneratorClassName: io.shardingsphere.core.keygen.DefaultKeyGenerator

b) server.yaml

# 
# If you want to configure orchestration, authorization and proxy properties, please refer to this file.
# 
######################################################################################################

# orchestration:
 # name: orchestration_ds
 # overwrite: true
 # registry:
   # serverLists: localhost:2181
   # namespace: orchestration
authentication:
 username: root
 password: root

props:
 max.connections.size.per.query: 1
 acceptor.size: 16
 executor.size: 16  
 proxy.transaction.enabled: false
 proxy.opentracing.enabled: false
 sql.show: false

4.启动,运行 bin/start.exe

先后启动遇到的问题

  1. 官网配置手册参数错误问题(这个太不应该了)
    a) 使用本文的配置
  2. 遇到无法执行SQL,无法找到query_cache_size的环境变量
    a) 无法兼容Mysql8,换用mysql 5.5
  3. 插入数据中文是乱码问题
    a) 要在url上加上characterEncoding=utf8
  4. 插入重复数据的问题
    a) sharding.jdbc.config.sharding.tables.result_data.actual-data-nodes这行的配置是关键,如果写成了ds${0…2}.result_data会导致数据被重复存储了3份

Sharding Sidecar

需要比较强的基础设施,这里就不介绍了。下面是官网地址。
shardingsphere.apache.org/document/cu…

Sharding JDBC搭建

采用的SpringBoot+Mybatis+Druid的框架,也没什么特殊的直接上代码吧。
目录结果

工程机构
关键的依赖pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.zw.se2.research</groupId>
    <artifactId>mybatisDemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.0.RELEASE</version>
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/io.shardingsphere/sharding-jdbc-core -->
        <dependency>
            <groupId>io.shardingsphere</groupId>
            <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
            <version>3.1.0</version>
        </dependency>

        <!-- for spring namespace -->
        <dependency>
            <groupId>io.shardingsphere</groupId>
            <artifactId>sharding-jdbc-spring-namespace</artifactId>
            <version>3.1.0</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.datatype</groupId>
            <artifactId>jackson-datatype-joda</artifactId>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-parameter-names</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.1.10</version>
        </dependency>

        <dependency>
            <groupId>org.mybatis.spring.boot</groupId>
            <artifactId>mybatis-spring-boot-starter</artifactId>
            <version>1.3.1</version>
        </dependency>

        <!--mapper-->
        <dependency>
            <groupId>tk.mybatis</groupId>
            <artifactId>mapper-spring-boot-starter</artifactId>
            <version>2.0.0</version>
        </dependency>
        <!--pagehelper-->
        <dependency>
            <groupId>com.github.pagehelper</groupId>
            <artifactId>pagehelper-spring-boot-starter</artifactId>
            <version>1.2.3</version>
            <exclusions>
                <exclusion>
                    <groupId>org.mybatis.spring.boot</groupId>
                    <artifactId>mybatis-spring-boot-starter</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid-spring-boot-starter</artifactId>
            <version>1.1.0</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-aop</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>net.sf.jacob-project</groupId>
            <artifactId>jacob</artifactId>
            <version>1.14.3</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>spring-snapshots</id>
            <url>https://repo.spring.io/snapshot</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>spring-milestones</id>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>spring-snapshots</id>
            <url>https://repo.spring.io/snapshot</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </pluginRepository>
        <pluginRepository>
            <id>spring-milestones</id>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>


</project>

关键配置application-dev.properties

server.port=9010
debug=true
logging.level.root=info
logging.level.tk.mybatis.springboot.mapper=trace
logging.config=ext/conf/logback.xml
sharding.jdbc.datasource.names=ds0,ds1,ds2
sharding.jdbc.datasource.ds0.driver-class-name=com.mysql.jdbc.Driver
sharding.jdbc.datasource.ds0.type=com.alibaba.druid.pool.DruidDataSource
sharding.jdbc.datasource.ds0.url=jdbc:mysql://10.0.4.207:3306/dim_research?characterEncoding=utf8
sharding.jdbc.datasource.ds0.username=root
sharding.jdbc.datasource.ds0.password=ABC.123456.ZwSe2
sharding.jdbc.datasource.ds1.driver-class-name=com.mysql.jdbc.Driver
sharding.jdbc.datasource.ds1.type=com.alibaba.druid.pool.DruidDataSource
sharding.jdbc.datasource.ds1.url=jdbc:mysql://10.0.4.212:3306/dim_research?characterEncoding=utf8
sharding.jdbc.datasource.ds1.username=root
sharding.jdbc.datasource.ds1.password=ABC.123456.ZwSe2
sharding.jdbc.datasource.ds2.driver-class-name=com.mysql.jdbc.Driver
sharding.jdbc.datasource.ds2.type=com.alibaba.druid.pool.DruidDataSource
sharding.jdbc.datasource.ds2.url=jdbc:mysql://10.0.4.213:3306/dim_research?characterEncoding=utf8
sharding.jdbc.datasource.ds2.username=root
sharding.jdbc.datasource.ds2.password=ABC.123456.ZwSe2
sharding.jdbc.config.sharding.tables.result_data.actual-data-nodes=ds0.result_data_0,ds1.result_data_1,ds2.result_data_2
sharding.jdbc.config.sharding.tables.result_data.table-strategy.inline.sharding-column=result_data_id
sharding.jdbc.config.sharding.tables.result_data.table-strategy.inline.algorithm-expression=result_data_$->{result_data_id % 3}
sharding.jdbc.config.sharding.tables.result_data.key-generator-column-name=result_data_id
sharding.jdbc.config.sharding.binding-tables=result_data
sharding.jdbc.config.sharding.default-database-strategy.inline.sharding-column=user_id
sharding.jdbc.config.sharding.default-database-strategy.inline.algorithm-expression=ds$->{user_id % 3}

具体的工程还是参考本人的github
github.com/zhaoenweiex…

测试

测试环境

版本mysql 5.5,未添加索引
数据库节点:虚拟机,4核8G

测试结果

PS:Proxy 插入速度极为缓慢,比单表批量插入性能要低30倍,接近单个插入速度,猜测由于采用id分表的原因,批量插入的模式变为单个插入,性能大幅降低。

  1. 100W数据下测试
操作单表(214)Proxy(214)Sharding-jdbc(207,212,213)
Count12303600665
特殊属性查询168049001475
100分页查询6115748
批量插入1130003610000686000

2.2000W测试

操作单表(214)Proxy(214)Sharding-jdbc(207,212,213)
Count126000539873555
特殊属性查询12600091838760
100分页查询350307107
批量插入2189000415990007237031

3.6000W数据的测试

操作单表(214)Proxy(214)Sharding-jdbc(207,212,213)
Count480816258757258757
特殊属性查询511972252089219035
100分页查询307270213
批量插入43310007281998912278000