(10) Shapefile Encoding (Part 1)

Determining a Shapefile's Encoding

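First, a word about what makes up a shapefile. A shapefile usually consists of four files: .shp, .dbf, .shx, and .prj. The .prj file holds the spatial reference information, the .shx file stores an index that helps programs search faster, the .shp file stores the geometry of each feature, and the .dbf file stores the attribute data for each geometry. So when we talk about the encoding of a shapefile, we are really talking about the encoding of the .dbf file, because the .dbf is the only one of these files that can contain GBK-encoded Chinese text, or text in other languages and encodings.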
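
As a small illustration of the file layout described above, the sketch below derives the sibling files from a .shp path and reports which of them exist. The class name `ShapefileParts` and the path `data/point.shp` are placeholders chosen for this example, not names from the article's repository.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ShapefileParts {
    // The four extensions named in the text.
    static final String[] EXTENSIONS = {"shp", "shx", "dbf", "prj"};

    /** Returns the path beside shp that has the same base name and the given extension. */
    static Path sibling(Path shp, String extension) {
        String name = shp.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot < 0) ? name : name.substring(0, dot);
        return shp.resolveSibling(base + "." + extension);
    }

    public static void main(String[] args) {
        // "data/point.shp" is a hypothetical default; pass a real path as the first argument.
        Path shp = Paths.get(args.length > 0 ? args[0] : "data/point.shp");
        for (String ext : EXTENSIONS) {
            Path part = sibling(shp, ext);
            System.out.println(part + (Files.exists(part) ? "  found" : "  missing"));
        }
    }
}
```

The check uses lowercase extensions only; on a case-sensitive file system, a file named `POINT.DBF` would be reported as missing.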

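As for ArcGIS, data created with it is GBK-encoded by default, because a Chinese-locale Windows system uses GBK as its default encoding. In some cases the data may be UTF-8 instead.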
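
To see why it matters which of the two encodings a file uses, here is a minimal sketch that encodes the same two Chinese characters as GBK and as UTF-8, then decodes the GBK bytes with the wrong charset. The class name `EncodingMismatch` is made up for this example, and the code assumes the JDK in use ships the GBK charset (standard full JDKs do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatch {
    /** Formats bytes as space-separated uppercase hex, e.g. "D6 D0". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D U+6587, written as escapes so the source file's own encoding does not matter.
        String text = "\u4e2d\u6587";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        byte[] gbkBytes = text.getBytes(gbk);
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("GBK:   " + hex(gbkBytes));   // D6 D0 CE C4
        System.out.println("UTF-8: " + hex(utf8Bytes));  // E4 B8 AD E6 96 87

        // Decoding with the wrong charset does not throw; it yields replacement characters.
        String wrong = new String(gbkBytes, StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + text.equals(wrong)); // false
    }
}
```

Because the wrong decoder fails silently, an attribute table read with the wrong charset shows garbled text rather than an error, which is why the encoding has to be determined up front.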

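The 30th byte of the .dbf file (offset 29 when counting from zero) holds the language driver ID, which indicates the encoding. This is not absolute, but of the many programs I tried, most follow this convention. So we only need to read the 30th byte of the .dbf and look it up in a dbf code page table to get the shapefile's encoding.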

The code is simple:

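package cn.dev;

import java.io.DataInputStream;
import java.io.InputStream;

public class Learn09 {
    public static void main(String[] args) throws Exception {
        // Load the sample .dbf from the classpath and close it when done.
        try (InputStream dbf = Learn09.class.getResourceAsStream("/point/point.dbf")) {
            if (dbf == null) {
                throw new IllegalStateException("/point/point.dbf not found on the classpath");
            }
            byte[] bytes = new byte[30];
            // readFully reads all 30 bytes; a plain read() may return fewer.
            new DataInputStream(dbf).readFully(bytes);
            // The 30th byte (offset 29) is the language driver ID.
            byte b = bytes[29];
            System.out.println(Integer.toHexString(Byte.toUnsignedInt(b)));
        }
    }
}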

For this shapefile, the output is:

4d

Looking up 4d in the table below, we can see that it corresponds to GBK encoding.

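| ID (decimal) | ID (hex) | Codepage | Description |
|---|---|---|---|
| 1 | 0x01 | 437 | US MS-DOS |
| 2 | 0x02 | 850 | International MS-DOS |
| 3 | 0x03 | 1252 | Windows ANSI Latin I |
| 4 | 0x04 | 10000 | Standard Macintosh |
| 8 | 0x08 | 865 | Danish OEM |
| 9 | 0x09 | 437 | Dutch OEM |
| 10 | 0x0A | 850 | Dutch OEM* |
| 11 | 0x0B | 437 | Finnish OEM |
| 13 | 0x0D | 437 | French OEM |
| 14 | 0x0E | 850 | French OEM* |
| 15 | 0x0F | 437 | German OEM |
| 16 | 0x10 | 850 | German OEM* |
| 17 | 0x11 | 437 | Italian OEM |
| 18 | 0x12 | 850 | Italian OEM* |
| 19 | 0x13 | 932 | Japanese Shift-JIS |
| 20 | 0x14 | 850 | Spanish OEM* |
| 21 | 0x15 | 437 | Swedish OEM |
| 22 | 0x16 | 850 | Swedish OEM* |
| 23 | 0x17 | 865 | Norwegian OEM |
| 24 | 0x18 | 437 | Spanish OEM |
| 25 | 0x19 | 437 | English OEM (Great Britain) |
| 26 | 0x1A | 850 | English OEM (Great Britain)* |
| 27 | 0x1B | 437 | English OEM (US) |
| 28 | 0x1C | 863 | French OEM (Canada) |
| 29 | 0x1D | 850 | French OEM* |
| 31 | 0x1F | 852 | Czech OEM |
| 34 | 0x22 | 852 | Hungarian OEM |
| 35 | 0x23 | 852 | Polish OEM |
| 36 | 0x24 | 860 | Portuguese OEM |
| 37 | 0x25 | 850 | Portuguese OEM* |
| 38 | 0x26 | 866 | Russian OEM |
| 55 | 0x37 | 850 | English OEM (US)* |
| 64 | 0x40 | 852 | Romanian OEM |
| 77 | 0x4D | 936 | Chinese GBK (PRC) |
| 78 | 0x4E | 949 | Korean (ANSI/OEM) |
| 79 | 0x4F | 950 | Chinese Big5 (Taiwan) |
| 80 | 0x50 | 874 | Thai (ANSI/OEM) |
| 87 | 0x57 | Current ANSI CP | ANSI |
| 88 | 0x58 | 1252 | Western European ANSI |
| 89 | 0x59 | 1252 | Spanish ANSI |
| 100 | 0x64 | 852 | Eastern European MS-DOS |
| 101 | 0x65 | 866 | Russian MS-DOS |
| 102 | 0x66 | 865 | Nordic MS-DOS |
| 103 | 0x67 | 861 | Icelandic MS-DOS |
| 104 | 0x68 | 895 | Kamenicky (Czech) MS-DOS |
| 105 | 0x69 | 620 | Mazovia (Polish) MS-DOS |
| 106 | 0x6A | 737 | Greek MS-DOS (437G) |
| 107 | 0x6B | 857 | Turkish MS-DOS |
| 108 | 0x6C | 863 | French-Canadian MS-DOS |
| 120 | 0x78 | 950 | Taiwan Big 5 |
| 121 | 0x79 | 949 | Hangul (Wansung) |
| 122 | 0x7A | 936 | PRC GBK |
| 123 | 0x7B | 932 | Japanese Shift-JIS |
| 124 | 0x7C | 874 | Thai Windows/MS-DOS |
| 134 | 0x86 | 737 | Greek OEM |
| 135 | 0x87 | 852 | Slovenian OEM |
| 136 | 0x88 | 857 | Turkish OEM |
| 150 | 0x96 | 10007 | Russian Macintosh |
| 151 | 0x97 | 10029 | Eastern European Macintosh |
| 152 | 0x98 | 10006 | Greek Macintosh |
| 200 | 0xC8 | 1250 | Eastern European Windows |
| 201 | 0xC9 | 1251 | Russian Windows |
| 202 | 0xCA | 1254 | Turkish Windows |
| 203 | 0xCB | 1253 | Greek Windows |
| 204 | 0xCC | 1257 | Baltic Windows |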
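
The table lookup can also be done in code. The sketch below is one possible way to do it, not the code from the article's repository: the class `LdidCharsets` and its methods are hypothetical, the map covers only a handful of rows, and the Java charset names on the right-hand side are my own choice of the closest JDK charset for each code page.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class LdidCharsets {
    // Partial mapping from language driver ID to a Java charset name (assumed equivalents).
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(0x01, "IBM437");        // code page 437
        NAMES.put(0x03, "windows-1252");  // code page 1252
        NAMES.put(0x13, "windows-31j");   // code page 932
        NAMES.put(0x4D, "GBK");           // code page 936
        NAMES.put(0x4F, "Big5");          // code page 950
        NAMES.put(0x7A, "GBK");           // code page 936
        NAMES.put(0xC8, "windows-1250");  // code page 1250
        NAMES.put(0xC9, "windows-1251");  // code page 1251
    }

    /** Extracts the ID stored at offset 29 of a .dbf header as an unsigned value. */
    static int ldidOf(byte[] header) {
        if (header.length < 30) {
            throw new IllegalArgumentException("need at least 30 header bytes");
        }
        return header[29] & 0xFF;
    }

    /** Returns the charset name for an ID, or empty if the ID is not in the partial map. */
    static Optional<String> charsetName(int ldid) {
        return Optional.ofNullable(NAMES.get(ldid));
    }

    /** Reads the first 30 bytes of a .dbf stream and returns the ID. */
    static int readLdid(InputStream in) throws IOException {
        byte[] header = new byte[30];
        new DataInputStream(in).readFully(header);
        return ldidOf(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LdidCharsets <path-to-dbf>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int ldid = readLdid(in);
            System.out.println("0x" + Integer.toHexString(ldid) + " -> "
                    + charsetName(ldid).orElse("not in the partial map"));
        }
    }
}
```

Returning an `Optional` keeps the unknown-ID case explicit, so the caller has to decide on a fallback instead of silently decoding with a default charset.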

The code for this section can be found at github.com/scially/Geo… (Learn09.java).