Use the following code to get the Unicode code point and the UTF-8 encoding of the Chinese character “年”:
package test;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Test {

    // Returns the URL (percent) encoding of the UTF-8 bytes of the given text.
    public static String getUTF8EnCodeFromText(String text) {
        String xmlUTF8 = "";
        try {
            // Encode the string directly. Round-tripping it through
            // new String(text.getBytes("UTF-8")) would decode the bytes with the
            // platform default charset and corrupt the text on non-UTF-8 systems.
            xmlUTF8 = URLEncoder.encode(text, "UTF-8");
            System.out.println("UTF8 Code:" + xmlUTF8);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return xmlUTF8;
    }

    // Returns the UTF-16 code unit of the character as lowercase hex, e.g. "5e74" for '年'.
    public static String getUniCode(char single) {
        String output = Integer.toString(single, 16);
        System.out.println("Unicode: " + output);
        return output;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Test.getUTF8EnCodeFromText("2014年12月1日和联想有一个重要的销售会议");
        Test.getUniCode('年');
        Test.getUTF8EnCodeFromText("年");
        // Convert the hex code back into the original character.
        char ab = (char) Integer.parseInt("5e74", 16);
        System.out.println("original character: " + ab);
    }
}
Output:
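URLEncoder prints the UTF-8 bytes in percent-encoded form, which hides the raw bytes a little. As a complementary sketch (the class name Utf8HexDemo and the helper toHex are made up for this illustration), the bytes can also be printed directly as hex, and the percent-encoded form can be decoded back with URLDecoder. The character is written as the escape \u5e74 so that the result does not depend on the encoding of the source file.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class Utf8HexDemo {

    // Hypothetical helper: formats each byte as two uppercase hex digits, space separated.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u5e74"; // the character 年

        // Code point of the first character: prints U+5E74
        System.out.println("code point: U+"
                + Integer.toHexString(text.codePointAt(0)).toUpperCase());

        // Raw UTF-8 bytes: prints E5 B9 B4
        System.out.println("UTF-8 bytes: "
                + toHex(text.getBytes(StandardCharsets.UTF_8)));

        // Decoding the percent-encoded form gives the character back.
        String decoded = URLDecoder.decode("%E5%B9%B4", "UTF-8");
        System.out.println("round trip ok: " + decoded.equals(text)); // prints true
    }
}
```

The three bytes E5 B9 B4 are the same ones that appear as %E5%B9%B4 in the URL-encoded output.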
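If you create a txt file containing the character “年” in Notepad
and open it in a hex editor, you find that the character is encoded as C4 EA.
Checking shows that the txt file was saved as ANSI. On a Simplified Chinese Windows system, ANSI means GBK (code page 936), and C4 EA is the GBK encoding of “年”.
After changing the file's encoding to UTF-8 and opening it in the hex editor again, you can see the expected UTF-8 encoding, E5 B9 B4 (possibly preceded by the byte order mark EF BB BF, which Notepad may add).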
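The difference between the two saved files can be reproduced in Java. The following is a minimal sketch, assuming the JDK in use ships the GBK charset (full JDK distributions normally do, but it is not one of the charsets guaranteed by the Java specification); the class name AnsiVsUtf8Demo and the helper toHex are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AnsiVsUtf8Demo {

    // Hypothetical helper: formats each byte as two uppercase hex digits, space separated.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u5e74"; // the character 年

        // Assumption: GBK is available in this JDK; Charset.forName throws otherwise.
        Charset gbk = Charset.forName("GBK");

        byte[] ansiBytes = text.getBytes(gbk);
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("GBK (ANSI): " + toHex(ansiBytes)); // prints C4 EA
        System.out.println("UTF-8:      " + toHex(utf8Bytes)); // prints E5 B9 B4

        // Reading the GBK bytes as if they were UTF-8 does not give the character back.
        String misread = new String(ansiBytes, StandardCharsets.UTF_8);
        System.out.println("misread equals original: " + misread.equals(text)); // prints false
    }
}
```

This is why the hex editor shows different bytes for the same visible character: the bytes depend on the encoding chosen when the file is saved, not on the character alone.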
The website www.ab173.com/utf8.php
shows source code for doing the conversion in JavaScript:
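For comparison, the same kind of conversion can be sketched by hand in Java, the language used earlier in this post. This is not the site's code. It is only an illustration of the standard UTF-8 bit layout, with a made-up class name ManualUtf8 and method name encode, and it does not reject surrogate values or out-of-range input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ManualUtf8 {

    // Illustrative encoder: turns one Unicode code point into its UTF-8 bytes.
    // No validation is done (surrogates and values above U+10FFFF are not rejected).
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
        if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    public static void main(String[] args) {
        byte[] manual = encode(0x5E74); // code point of 年
        byte[] library = "\u5e74".getBytes(StandardCharsets.UTF_8);

        for (byte b : manual) {
            System.out.printf("%02X ", b & 0xFF); // prints E5 B9 B4
        }
        System.out.println();
        System.out.println("matches library: " + Arrays.equals(manual, library)); // prints true
    }
}
```

For U+5E74 the three-byte branch applies: the top 4 bits give E5, the middle 6 bits give B9, and the low 6 bits give B4.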