MySQL字符集修改

149 阅读10分钟

参考书籍:《从根上理解MySQL》
1、什么是字符集
2、客户端和服务器通信过程中使用的字符集
3、主从复制,当一个表字段在主库和从库字符集不一致的时候,会怎样?

1、什么是字符集?
——字符集,character set,字面意思就是字符的集合,但实际上还隐藏了一些含义,一个是字符的范围,一个字符集的字符是有限的,也就是字符集是有限集合;还有一个是字符集与二进制数据的映射规则,所有的字符集的字符在计算机里最终都是要映射成二进制数据的,但是怎么映射成二进制数据,不同的字符集的映射规则是不一致的。关于映射规则,有两个对称的术语,编码和解码。
编码:将字符映射成二进制数据的过程。
解码:将二进制数据映射成字符的过程。
这里的编码和解码,感觉是从人的角度来看待的,人能轻易理解字符,于是从二进制映射到字符的过程,就像是给二进制数据解开了密码,而从字符映射到二进制数据的过程,对人来说是可轻松识别的字符变成了难以理解的一串二进制数据,就像加上了密码一样。
伴随字符集的,还有非常紧密的一个概念,即collation,一般译作排序规则或者比较规则。排序规则是用来给一个字符集的字符做排序的一种规则,一个字符集可以匹配多种排序规则,一种排序规则也可以适用多个字符集。举个例子,现在有a,b,A,B,要比较这四个字符的大小(或者说给这四个字符排个序),假设这四个字符对应的二进制编码的十进制值分别是:a=0,b=1,A=2,B=3(这就是一种假设的编码规则),那么现在给这四个字符比较大小,或者排个序,那就是a<b<A<B,如果反过来编码值是a=3,b=2,A=1,B=0,那就是a>b>A>B。从本质来说,所谓排序规则,就是对映射规则的比较(或排序)。 有两个MySQL的字符集分几个级别:实例级别(server级别)、schema级别、表级别、列级别。这四个级别字符集和比较规则联系:
(1)如果创建或修改列时没有显式指定字符集和比较规则,则该列默认使用表字符集和比较规则
(2)如果创建表时没有显式指定字符集和比较规则,则该表默认使用数据库的字符集和比较规则
(3)如果创建库时没有显式指定字符集和比较规则,则该库默认使用服务器的字符集和比较规则 关于字符集相关的参数如下:

[root@singlemysql ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 18
Server version: 8.0.30 MySQL Community Server - GPL

[(none)] 17:04:33> show variables like '%character%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | utf8mb4                        |
| character_set_connection | utf8mb4                        |
| character_set_database   | utf8mb4                        |
| character_set_filesystem | binary                         |
| character_set_results    | utf8mb4                        |
| character_set_server     | utf8mb4                        |
| character_set_system     | utf8mb3                        |
| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
8 rows in set (0.02 sec)

[(none)] 17:05:44> show variables like '%coll%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_ai_ci |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

character_set_client:会话级别的变量;客户端来源数据使用的字符集,也就是客户端发过来的查询语句使用的什么字符集 character_set_connection:MySQL接收到用户查询后,按照character_set_client将其转化为character_set_connection设定的字符集。 character_set_database:数据库级别的字符集 character_set_filesystem:文件系统的编码格式,把操作系统上的文件名转化成此字符集,即把 character_set_client转换character_set_filesystem, 默认binary是不做任何转换的。在导入数据时会用到,例如load data,会将文件名从character_set_client转换character_set_filesystem。 character_set_results:查询结果编码的字符集 character_set_server:服务器级别的字符集 character_set_system:系统元数据(字段名等)字符集 character_sets_dir:字符集安装目录
collation_connection:应用连接字符集对应的排序规则
collation_database:数据库级别的排序规则
collation_server:服务器级别的排序规则
default_collation_for_utf8mb4:用于设置字符集utf8mb4的默认排序规则,主要用于内部复制使用。该变量的值从主服务器复制到从服务器,对于字符集utf8mb4,以便从库可以正确处理来自与自己不同排序规则的主库的数据。主要是为了支持从5.7或更老版本的库和8.0的库进行同步。

2、客户端和服务器通信过程中使用的字符集
客户端与服务器进行的数据交互,本质上这些数据都是字节序列,在“客户端发送请求,服务器返回响应”的过程中,会经历多次字符集转换。(《从根上理解MySQL》)
客户端发送请求,对应的字符串编码一般与客户端所在操作系统使用的字符集一致。
服务器接收请求的时候,应该怎么看请求的字节序列的含义呢?服务器会将这个字节序列看做是使用character_set_client代表的字符集进行编码的字节序列。
服务器处理请求的时候,会将接收请求时使用的character_set_client对应的字符集进行编码的字节序列,转换为使用character_set_connection对应的字符集进行编码的字节序列。
服务器生成响应结果的时候,结果数据的字符序列,会以变量character_set_results对应的字符集来进行编码,然后发给客户端。
客户端接收到响应的时候,对于类Unix系统来说,会把接收到的字节序列直接写到显式框中,并默认使用操作系统当前使用的字符集来解释这个字符。(《从根上理解MySQL》)

3、主从复制,当一个表字段在主库和从库字符集不一致的时候,会怎样?
下面我们来做一个实验,来看看朱从复制的时候,当一个表在主库和从库字符集不一致的时候,会出现什么样的结果。实验环境:MySQL:Server version: 8.0.30 MySQL Community Server - GPL

在主库的操作如下:
--创建测试表,字符集设置为utf8(实际上是被MySQL阉割的utf8,又叫utf8mb3)
[hhu] 10:10:39> create table chartest( id char(2)) charset=utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)

[hhu] 10:11:38> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                     |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 3719 | 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous. |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

[hhu] 10:11:42> insert into chartest values('aaa');--定义char(2),所以只能插入2个字符
ERROR 1406 (22001): Data too long for column 'id' at row 1
[hhu] 10:12:07> insert into chartest values('aa');
Query OK, 1 row affected (0.00 sec)

[hhu] 10:12:35> select * from chartest;
+------+
| id   |
+------+
| aa   |
+------+
1 row in set (0.00 sec)

[hhu] 10:12:42> show create table chartest;
+----------+-----------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                  |
+----------+-----------------------------------------------------------------------------------------------+
| chartest | CREATE TABLE `chartest` (
  `id` char(2) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 |
+----------+-----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

[hhu] 10:15:11> insert into chartest values('bb');
Query OK, 1 row affected (0.00 sec)

[hhu] 10:15:27> select * from chartest;--插入完两行测试数据
+------+
| id   |
+------+
| aa   |
| bb   |
+------+
2 rows in set (0.00 sec)

对应从库的操作

[hhu] 10:12:48> select * from chartest;
+------+
| id   |
+------+
| aa   |
+------+
1 row in set (0.00 sec)

[hhu] 10:12:54> ALTER TABLE chartest DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;--仅修改表的字符集,不会有啥影响
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

[hhu] 10:14:56> show create table chartest;
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                   |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
| chartest | CREATE TABLE `chartest` (
  `id` char(2) CHARACTER SET utf8mb3 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+----------+------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

[hhu] 10:15:02> select * from chartest;--修改表的字符集后,数据能正常同步
+------+
| id   |
+------+
| aa   |
| bb   |
+------+
2 rows in set (0.00 sec)

--开始修改从库表的字段的字符集
[hhu] 10:19:09> ALTER TABLE chartest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

--修改完后已经插入的数据,不受影响,正常访问
[hhu] 10:29:07> select * from chartest;
+------+
| id   |
+------+
| aa   |
| bb   |
+------+
2 rows in set (0.00 sec)

主库接下来的操作

--主库开始插入第三行数据
[hhu] 10:29:15> insert into chartest values('cc');
Query OK, 1 row affected (0.00 sec)

--主库可正常查询到三行数据
[hhu] 10:29:21> select id,length(id) from chartest;
+------+------------+
| id   | length(id) |
+------+------------+
| aa   |          2 |
| bb   |          2 |
| cc   |          2 |
+------+------------+
3 rows in set (0.00 sec)

从库的操作

--查看从库的同步情况,开始报错:
Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

[hhu] 10:29:30> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.80.129
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000004
          Read_Master_Log_Pos: 1839
               Relay_Log_File: singlemysql2-relay-bin.000009
                Relay_Log_Pos: 1771
        Relay_Master_Log_File: binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 13146
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1561
              Relay_Log_Space: 2432
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 13146
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3306102
                  Master_UUID: 62350937-3448-11ed-950c-000c297f2915
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 221102 10:29:21
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 62350937-3448-11ed-950c-000c297f2915:1-6
            Executed_Gtid_Set: 27d3bf0c-59c9-11ed-a028-000c297f2915:1-2,
62350937-3448-11ed-950c-000c297f2915:1-5
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)

--根据提示查看表performance_schema.replication_applier_status_by_worker,直接提示错误:
Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808; Column 0 of table 'hhu.chartest' cannot be converted from type 'char(6(bytes))' to type 'char(8(bytes) utf8mb4)'


[hhu] 11:37:52> select * from performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************
                                           CHANNEL_NAME: 
                                              WORKER_ID: 1
                                              THREAD_ID: NULL
                                          SERVICE_STATE: OFF
                                      LAST_ERROR_NUMBER: 13146
                                     LAST_ERROR_MESSAGE: Worker 1 failed executing transaction '62350937-3448-11ed-950c-000c297f2915:6' at master log binlog.000004, end_log_pos 1808; Column 0 of table 'hhu.chartest' cannot be converted from type 'char(6(bytes))' to type 'char(8(bytes) utf8mb4)'
                                   LAST_ERROR_TIMESTAMP: 2022-11-02 10:29:21.643964
                               LAST_APPLIED_TRANSACTION: 62350937-3448-11ed-950c-000c297f2915:5
     LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2022-11-02 10:15:27.759009
    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2022-11-02 10:15:27.759009
         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2022-11-02 10:15:27.759830
           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2022-11-02 10:15:27.761029
                                   APPLYING_TRANSACTION: 62350937-3448-11ed-950c-000c297f2915:6
         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2022-11-02 10:29:21.643293
        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2022-11-02 10:29:21.643293
             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2022-11-02 10:29:21.643733
                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                     APPLYING_TRANSACTION_RETRIES_COUNT: 0
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
    APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000

--再查看从库的表数据:第三行没能同步过来
[hhu] 10:31:21> select * from chartest;
+------+
| id   |
+------+
| aa   |
| bb   |
+------+
2 rows in set (0.00 sec)

上面描述了从utf8mb3同步到utf8mb4会失败,如果反过来会怎么样呢,有兴趣的读者可以测试一下。