重新发现Oracle之美(一)字符集
因实际业务需要,将部分16GBK字符集的数据库变更成AL32UTF8字符集;
无意间搜索MOS,发现ORACLE已经为我们DBA提供了解决方案了, 在8i,9i,10g和11g中将NLS_CHARACTERSET更改为AL32UTF8 / UTF8(Unicode)(文档ID 260192.1)
- 可以先将GBK的dmp文件导入到GBK的数据库中
- 使用csscan的方法,检查数据 是否符合UTF8的要求
- 检查csscan的结果集,并做出相应的数据字典的调整
- [一、 Csscan安装(11.2.0.x)]
Installing Csscan in Oracle RDBMS 11.2.0.x
The default tablespace is SYSTEM, if you plan to run csscan against a big database then we advice to create a separate tablespace and change the $ORACLE_HOME/rdbms/admin/csminst.sql script to use this tablespace.
Modify the following statement in csminst.sql to assign your preferred tablespace to CSMIG instead of SYSTEM:
alter user csmig default tablespace SYSTEM quota unlimited on SYSTEM
The amount of space needed depends mainly on the amount of exceptions found ( convertible and/or lossy data) and is impossible to predict upfront. Its useful to use on the first run of csscan a tablespace that is limited in size to avoid any disk space issues.
Make a backup/copy of Csminst.sql and edit Csminst.sql and then run Csminst.sql using these commands using sqlplus of the server Oracle_Home (or when using a client, using the same base version for the client as the server
> set oracle_sid=<your SID>
> sqlplus /nolog
> SQL> conn / as sysdba
> SQL> set TERMOUT ON
> SQL> set ECHO ON
> SQL> spool csminst.log
> -- note the drop user
> SQL> drop user csmig cascade;
> SQL> @?/rdbms/admin/csminst.sql
Check the csminst.log for errors. The Csmig user is by default locked.
- [二、 使用csscan方法检查当前数据库是否满足utf8的要求,并输出日志]
语法:$ csscan "/ as sysdba" FULL=Y FROMCHAR= TOCHAR= LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2
查看当前字符集:SELECT value FROM NLS_DATABASE_PARAMETERS WHERE parameter='NLS_CHARACTERSET';
$ export ORACLE_SID=
$ csscan "/ as sysdba" LOG=/tmp/expcheck full=y CAPTURE=N TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=6
csscan " 'sys as sysdba' " full=y fromchar=ZHS16GBK tochar=AL32UTF8 log=verify.log array=102400 process=6
-
TOCHAR = <当前NLS_CHARACTERSET>不是拼写错误,我们的想法是在更改NLS_CHARACTERSET 之前检查CURRENT字符集中未在此NLS_CHARACTERSET中定义的代码
-
PROCESS =参数会影响系统上的负载,这是更高的(例如6或8)将完成更快的Csscan,越低这将对您的系统产生的影响越小。如果需要,适应。
-
csscan SUPPRESS参数通过限制记录的信息量/表来限制.err文件的大小。使用SUPPRESS = 1000将为.err文件中的每个表记录最多1000行。它不会影响.txt文件中的信息。它将影响.err文件中记录的数据。这对于大数据库的第一次扫描非常有用,如果您不知道数据库中有多少“可转换”或“有损”,那么这将避免.err文件变为100的MB大而且它也限制了Csmig模式下csscan表使用的空间。
-
不要指定TO N CHAR或FROM N CHAR csscan参数,这些参数用于更改NLS_NCHAR_CHARACTERSET。同样它们不是必需的,不应该指*
--查看csscan扫描进度
sqlplus / as sysdba
SET pages 1000
SELECT target,
TO_CHAR(start_time, 'HH24:MI:SS - DD-MM-YY'),
time_remaining,
sofar,
totalwork,
sid,
serial#,
opname
FROM v$session_longops
WHERE sid IN
(SELECT sid FROM v$session WHERE upper(program) LIKE 'CSSCAN%')
AND sofar < totalwork
ORDER BY start_time;
Csscan将创建3个文件:
- dbcheck.out csscan 输出的日志
- dbcheck.txt 数据库扫描摘要报告
- dbcheck.err 包含dbcheck.txt中报告的有损行的rowid(如果有)。
此Csscan 用于检查所有数据是否正确存储在当前字符集中。由于TOCHAR和FROMCHAR字符设置相同,因此dbcheck.txt中不会报告任何“Convertible”或“Truncation”数据。
--通过以下脚本可以查询到 哪些表,哪些字段需要扩展
DECLARE
newmaxsz NUMBER;
BEGIN
FOR rec IN (SELECT DISTINCT u.owner_name,
u.table_name,
u.column_name,
u.column_type,
u.owner_id,
u.table_id,
u.column_id,
u.column_intid
FROM csmv$errors u
WHERE u.error_type = 'EXCEED_SIZE'
ORDER BY u.owner_name, u.table_name, u.column_name) LOOP
SELECT MAX(cnvsize)
INTO newmaxsz
FROM csm$errors
WHERE usr# = rec.owner_id
AND obj# = rec.table_id
AND col# = rec.column_id
AND intcol# = rec.column_intid;
DBMS_OUTPUT.PUT_LINE('alter table ' || rec.owner_name || '.' ||
rec.table_name || ' modify' || ' (' ||
rec.column_name || ' ' || rec.column_type || '(' ||
newmaxsz || '));');
END LOOP;
END;
\