参考书籍:《从根上理解MySQL》
1、什么是字符集
2、客户端和服务器通信过程中使用的字符集
3、主从复制,当一个表字段在主库和从库字符集不一致的时候,会怎样?
1、什么是字符集?
——字符集,character set,字面意思就是字符的集合,但实际上还隐藏了一些含义,一个是字符的范围,一个字符集的字符是有限的,也就是字符集是有限集合;还有一个是字符集与二进制数据的映射规则,所有的字符集的字符在计算机里最终都是要映射成二进制数据的,但是怎么映射成二进制数据,不同的字符集的映射规则是不一致的。关于映射规则,有两个对称的术语,编码和解码。
编码:将字符映射成二进制数据的过程。
解码:将二进制数据映射成字符的过程。
这里的编码和解码,感觉是从人的角度来看待的,人能轻易理解字符,于是从二进制映射到字符的过程,就像是给二进制数据解开了密码,而从字符映射到二进制数据的过程,对人来说是可轻松识别的字符变成了难以理解的一串二进制数据,就像加上了密码一样。
伴随字符集的,还有非常紧密的一个概念,即collation,一般译作排序规则或者比较规则。排序规则是用来给一个字符集的字符做排序的一种规则,一个字符集可以匹配多种排序规则,一种排序规则也可以适用多个字符集。举个例子,现在有a,b,A,B,要比较这四个字符的大小(或者说给这四个字符排个序),假设这四个字符对应的二进制编码的十进制值分别是:a=0,b=1,A=2,B=3(这就是一种假设的编码规则),那么现在给这四个字符比较大小,或者排个序,那就是a<b<A<B,如果反过来编码值是a=3,b=2,A=1,B=0,那就是a>b>A>B。从本质来说,所谓排序规则,就是对映射规则的比较(或排序)。
有两个MySQL的字符集分几个级别:实例级别(server级别)、schema级别、表级别、列级别。这四个级别字符集和比较规则联系:
(1)如果创建或修改列时没有显式指定字符集和比较规则,则该列默认使用表字符集和比较规则
(2)如果创建表时没有显式指定字符集和比较规则,则该表默认使用数据库的字符集和比较规则
(3)如果创建库时没有显式指定字符集和比较规则,则该库默认使用服务器的字符集和比较规则
关于字符集相关的参数如下:
[root@singlemysql ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 18
Server version: 8.0.30 MySQL Community Server - GPL
[(none)] 17:04:33> show variables like '%character%';
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
8 rows in set (0.02 sec)
[(none)] 17:05:44> show variables like '%coll%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)
character_set_client:会话级别的变量;客户端来源数据使用的字符集,也就是客户端发过来的查询语句使用的什么字符集
character_set_connection:MySQL接收到用户查询后,按照character_set_client将其转化为character_set_connection设定的字符集。
character_set_database:数据库级别的字符集
character_set_filesystem:文件系统的编码格式,把操作系统上的文件名转化成此字符集,即把 character_set_client转换character_set_filesystem, 默认binary是不做任何转换的。在导入数据时会用到,例如load data,会将文件名从character_set_client转换character_set_filesystem。
character_set_results:查询结果编码的字符集
character_set_server:服务器级别的字符集
character_set_system:系统元数据(字段名等)字符集
character_sets_dir:字符集安装目录
collation_connection:应用连接字符集对应的排序规则
collation_database:数据库级别的排序规则
collation_server:服务器级别的排序规则
default_collation_for_utf8mb4:用于设置字符集utf8mb4的默认排序规则,主要用于内部复制使用。该变量的值从主服务器复制到从服务器,对于字符集utf8mb4,以便从库可以正确处理来自与自己不同排序规则的主库的数据。主要是为了支持从5.7或更老版本的库和8.0的库进行同步。
2、客户端和服务器通信过程中使用的字符集
客户端与服务器进行的数据交互,本质上这些数据都是字节序列,在“客户端发送请求,服务器返回响应”的过程中,会经历多次字符集转换。(《从根上理解MySQL》)
客户端发送请求,对应的字符串编码一般与客户端所在操作系统使用的字符集一致。
服务器接收请求的时候,应该怎么看请求的字节序列的含义呢?服务器会将这个字节序列看做是使用character_set_client代表的字符集进行编码的字节序列。
服务器处理请求的时候,会将接收请求时使用的character_set_client对应的字符集进行编码的字节序列,转换为使用character_set_connection对应的字符集进行编码的字节序列。
服务器生成响应结果的时候,结果数据的字符序列,会以变量character_set_results对应的字符集来进行编码,然后发给客户端。
客户端接收到响应的时候,对于类Unix系统来说,会把接收到的字节序列直接写到显式框中,并默认使用操作系统当前使用的字符集来解释这个字符。(《从根上理解MySQL》)
3、主从复制,当一个表字段在主库和从库字符集不一致的时候,会怎样?
下面我们来做一个实验,来看看朱从复制的时候,当一个表在主库和从库字符集不一致的时候,会出现什么样的结果。实验环境:MySQL:Server version: 8.0.30 MySQL Community Server - GPL
在主库的操作如下:
--创建测试表,字符集设置为utf8(实际上是被MySQL阉割的utf8,又叫utf8mb3)
[hhu] 10:10:39> create table chartest( id char(2)) charset=utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)
[hhu] 10:11:38> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 3719 | 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous. |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
[hhu] 10:11:42> insert into chartest values('aaa');--定义char(2),所以只能插入2个字符
ERROR 1406 (22001): Data too long for column 'id' at row 1
[hhu] 10:12:07> insert into chartest values('aa');
Query OK, 1 row affected (0.00 sec)
[hhu] 10:12:35> select * from chartest;
+------+
| id |
+------+
| aa |
+------+
1 row in set (0.00 sec)
[hhu] 10:12:42> show create table chartest;
+----------+-----------------------------------------------------------------------------------------------+
| Table | Create Table |
+----------+-----------------------------------------------------------------------------------------------+
| chartest | CREATE TABLE `chartest` (
`id` char(2) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 |
+----------+-----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
[hhu] 10:15:11> insert into chartest values('bb');
Query OK, 1 row affected (0.00 sec)
[hhu] 10:15:27> select * from chartest;--插入完两行测试数据
+------+
| id |
+------+
| aa |
| bb |
+------+
2 rows in set (0.00 sec)
对应从库的操作
[hhu] 10:12:48> select * from chartest;
+------+
| id |
+------+
| aa |
+------+
1 row in set (0.00 sec)
[hhu] 10:12:54> ALTER TABLE chartest DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;--仅修改表的字符集,不会有啥影响
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
[hhu] 10:14:56> show create table chartest;
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
| chartest | CREATE TABLE `chartest` (
`id` char(2) CHARACTER SET utf8mb3 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
[hhu] 10:15:02> select * from chartest;--修改表的字符集后,数据能正常同步
+------+
| id |
+------+
| aa |
| bb |
+------+
2 rows in set (0.00 sec)
--开始修改从库表的字段的字符集
[hhu] 10:19:09> ALTER TABLE chartest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
--修改完后已经插入的数据,不受影响,正常访问
[hhu] 10:29:07> select * from chartest;
+------+
| id |
+------+
| aa |
| bb |
+------+
2 rows in set (0.00 sec)
主库接下来的操作
--主库开始插入第三行数据
[hhu] 10:29:15> insert into chartest values('cc');
Query OK, 1 row affected (0.00 sec)
--主库可正常查询到三行数据
[hhu] 10:29:21> select id,length(id) from chartest;
+------+------------+
| id | length(id) |
+------+------------+
| aa | 2 |
| bb | 2 |
| cc | 2 |
+------+------------+
3 rows in set (0.00 sec)
从库的操作
--查看从库的同步情况,开始报错:
Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
[hhu] 10:29:30> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 192.168.80.129
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 1839
Relay_Log_File: singlemysql2-relay-bin.000009
Relay_Log_Pos: 1771
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 13146
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 1561
Relay_Log_Space: 2432
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 13146
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 3306102
Master_UUID: 62350937-3448-11ed-950c-000c297f2915
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 221102 10:29:21
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 62350937-3448-11ed-950c-000c297f2915:1-6
Executed_Gtid_Set: 27d3bf0c-59c9-11ed-a028-000c297f2915:1-2,
62350937-3448-11ed-950c-000c297f2915:1-5
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
--根据提示查看表performance_schema.replication_applier_status_by_worker,直接提示错误:
Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808; Column 0 of table 'hhu.chartest' cannot be converted from type 'char(6(bytes))' to type 'char(8(bytes) utf8mb4)'
[hhu] 11:37:52> select * from performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 1
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 13146
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808; Column 0 of table 'hhu.chartest' cannot be converted from type 'char(6(bytes))' to type 'char(8(bytes) utf8mb4)'
LAST_ERROR_TIMESTAMP: 2022-11-02 10:29:21.643964
LAST_APPLIED_TRANSACTION: 62350937-3448-11ed-950c-000c297f2915:5
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2022-11-02 10:15:27.759009
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2022-11-02 10:15:27.759009
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2022-11-02 10:15:27.759830
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2022-11-02 10:15:27.761029
APPLYING_TRANSACTION: 62350937-3448-11ed-950c-000c297f2915:6
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2022-11-02 10:29:21.643293
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2022-11-02 10:29:21.643293
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2022-11-02 10:29:21.643733
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
--再查看从库的表数据:第三行没能同步过来
[hhu] 10:31:21> select * from chartest;
+------+
| id |
+------+
| aa |
| bb |
+------+
2 rows in set (0.00 sec)
上面描述了从utf8mb3同步到utf8mb4会失败,如果反过来会怎么样呢,有兴趣的读者可以测试一下。