重新发现Oracle之美(一)字符集

246 阅读2分钟

重新发现Oracle之美(一)字符集

因实际业务需要,将部分16GBK字符集的数据库变更成AL32UTF8字符集;

无意间搜索MOS,发现ORACLE已经为我们DBA提供了解决方案了, 在8i,9i,10g和11g中将NLS_CHARACTERSET更改为AL32UTF8 / UTF8(Unicode)(文档ID 260192.1)

  1. 可以先将GBK的dmp文件导入到GBK的数据库中
  2. 使用csscan的方法,检查数据 是否符合UTF8的要求
  3. 检查csscan的结果集,并做出相应的数据字典的调整

  • [一、 Csscan安装(11.2.0.x)]

Installing Csscan in Oracle RDBMS 11.2.0.x  

The default tablespace is SYSTEM, if you plan to run csscan against a big database then we advice to create a separate tablespace and change the $ORACLE_HOME/rdbms/admin/csminst.sql script to use this tablespace.

Modify the following statement in csminst.sql to assign your preferred tablespace to CSMIG instead of SYSTEM:

alter user csmig default tablespace SYSTEM quota unlimited on SYSTEM

The amount of space needed depends mainly on the amount of exceptions found ( convertible and/or lossy data) and is impossible to predict upfront. Its useful to use on the first run of csscan a tablespace that is limited in size to avoid any disk space issues.

Make a backup/copy of Csminst.sql and edit Csminst.sql and then run Csminst.sql using these commands using sqlplus of the server Oracle_Home (or when using a client, using the same base version for the client as the server

> set oracle_sid=<your SID>
> sqlplus /nolog
> SQL> conn / as sysdba
> SQL> set TERMOUT ON
> SQL> set ECHO ON
> SQL> spool csminst.log
> -- note the drop user
> SQL> drop user csmig cascade;
> SQL> @?/rdbms/admin/csminst.sql

Check the csminst.log for errors. The Csmig user is  by default locked.

  • [二、 使用csscan方法检查当前数据库是否满足utf8的要求,并输出日志]

语法:$ csscan "/ as sysdba" FULL=Y FROMCHAR= TOCHAR= LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2

查看当前字符集:SELECT value FROM NLS_DATABASE_PARAMETERS WHERE parameter='NLS_CHARACTERSET';

$ export ORACLE_SID=

$ csscan "/ as sysdba" LOG=/tmp/expcheck full=y CAPTURE=N TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=6

csscan " 'sys as sysdba' " full=y  fromchar=ZHS16GBK tochar=AL32UTF8 log=verify.log array=102400 process=6

  • TOCHAR = <当前NLS_CHARACTERSET>不是拼写错误,我们的想法是在更改NLS_CHARACTERSET 之前检查CURRENT字符集中未在此NLS_CHARACTERSET中定义的代码

  • PROCESS =参数会影响系统上的负载,这是更高的(例如6或8)将完成更快的Csscan,越低这将对您的系统产生的影响越小。如果需要,适应。

  • csscan SUPPRESS参数通过限制记录的信息量/表来限制.err文件的大小。使用SUPPRESS = 1000将为.err文件中的每个表记录最多1000行。它不会影响.txt文件中的信息。它将影响.err文件中记录的数据。这对于大数据库的第一次扫描非常有用,如果您不知道数据库中有多少“可转换”或“有损”,那么这将避免.err文件变为100的MB大而且它也限制了Csmig模式下csscan表使用的空间。

  • 不要指定TO N CHAR或FROM N CHAR csscan参数,这些参数用于更改NLS_NCHAR_CHARACTERSET。同样它们不是必需的,不应该指*

--查看csscan扫描进度
sqlplus / as sysdba
SET pages 1000
SELECT target,
       TO_CHAR(start_time, 'HH24:MI:SS - DD-MM-YY'),
       time_remaining,
       sofar,
       totalwork,
       sid,
       serial#,
       opname
  FROM v$session_longops
 WHERE sid IN
       (SELECT sid FROM v$session WHERE upper(program) LIKE 'CSSCAN%')
   AND sofar < totalwork
 ORDER BY start_time;

Csscan将创建3个文件:

  1. dbcheck.out csscan 输出的日志
  2. dbcheck.txt 数据库扫描摘要报告
  3. dbcheck.err 包含dbcheck.txt中报告的有损行的rowid(如果有)。

此Csscan 用于检查所有数据是否正确存储在当前字符集中。由于TOCHAR和FROMCHAR字符设置相同,因此dbcheck.txt中不会报告任何“Convertible”或“Truncation”数据。

--通过以下脚本可以查询到 哪些表,哪些字段需要扩展 
DECLARE
  newmaxsz NUMBER;
BEGIN
  FOR rec IN (SELECT DISTINCT u.owner_name,
                              u.table_name,
                              u.column_name,
                              u.column_type,
                              u.owner_id,
                              u.table_id,
                              u.column_id,
                              u.column_intid
                FROM csmv$errors u
               WHERE u.error_type = 'EXCEED_SIZE'
               ORDER BY u.owner_name, u.table_name, u.column_name) LOOP
    SELECT MAX(cnvsize)
      INTO newmaxsz
      FROM csm$errors
     WHERE usr# = rec.owner_id
       AND obj# = rec.table_id
       AND col# = rec.column_id
       AND intcol# = rec.column_intid;
 
    DBMS_OUTPUT.PUT_LINE('alter table ' || rec.owner_name || '.' ||
                         rec.table_name || ' modify' || ' (' ||
                         rec.column_name || ' ' || rec.column_type || '(' ||
                         newmaxsz || '));');
  END LOOP;
END;

\