APK Canary

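This module provides the following features:

1. The input APK is first processed by UnzipTask, which extracts it to a specified directory. This step also does some global preparation: deobfuscating class names (by reading mapping.txt), deobfuscating resources (by reading resMapping.txt), and collecting file sizes.
2. The tasks that follow each implement one check rule and can run in parallel. Each task's implementation is outlined below: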

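ManifestAnalyzeTask reads information from AndroidManifest.xml, such as packageName, versionCode, and clientVersion. Implementation: uses AXmlResourceParser from ApkTool to parse the binary AndroidManifest.xml, and can also deobfuscate the resource names referenced in AndroidManifest.xml.
ShowFileSizeTask filters out files larger than a given size, by file size and file extension, and sorts the result in ascending or descending order. Implementation: filters directly on the file sizes collected by UnzipTask.
MethodCountTask counts the methods in each dex file and groups the output by class name or package name. Implementation: reads the dex files with Google's open-source com.android.dexdeps library and counts the methods.
ResProguardCheckTask determines whether the APK has gone through resource obfuscation. Implementation: after resource obfuscation the res folder is renamed to r, so checking whether a folder named r exists is enough.
FindNonAlphaPngTask detects PNG files in the APK that have no transparency. Implementation: reads each PNG through java.awt.image.BufferedImage and checks whether it has an alpha channel.
MultiLibCheckTask determines whether the APK contains .so files for multiple ABIs. Implementation: checks whether the lib folder contains more than one directory.
CheckMultiSTLTask detects whether the .so files in the APK statically link the STL. Implementation: reads each .so file's symbol table with the nm tool; if std:: appears, the library statically links the STL.
CountRTask counts the R classes and the fields in them. Implementation: again uses the com.android.dexdeps library to read the dex files and find the R classes and their field counts.
UncompressedFileTask detects file types that are stored uncompressed. Implementation: uses the uncompressed and compressed sizes collected by UnzipTask for each file and checks whether the two are equal.
DuplicatedFileTask detects redundant files. Implementation: compares the MD5 of each file to decide whether the contents are identical.
UnusedResourceTask detects unused resources in the APK; resources obtained through getIdentifier can be added to a whitelist. Implementation: (1) read R.txt to get every resource declared in the APK, giving declareResourceSet; (2) read the resource-referencing instructions in the smali files (both references through a field and references through a literal resource ID) to get the resources referenced from classes, classRefResourceSet; (3) use ApkTool to parse the XML files under res, AndroidManifest.xml, and resources.arsc to get the references between resources; (4) combine the intermediate data from the steps above to determine which resources are unused.
UnusedAssetsTask detects unused assets files in the APK. Implementation: searches the smali files for instructions that reference string constants and checks whether a referenced constant is the name of an assets file.
UnStrippedSoCheckTask detects shared libraries in the APK that have not been stripped. Implementation: reads the library's symbol table with the nm tool; if the output contains the words "no symbols", the library has already been stripped.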
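
As an illustration of the DuplicatedFileTask idea above (comparing MD5 digests), here is a minimal sketch. It is not the module's actual code: the class name DuplicateFinderSketch and the in-memory Map input are assumptions made to keep the example self-contained, whereas the real task works on the extracted files.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class DuplicateFinderSketch {

    /** Hex-encoded MD5 digest of the given bytes. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Groups entry names by digest and keeps only groups with two or more members. */
    static Map<String, List<String>> findDuplicates(Map<String, byte[]> entries) {
        Map<String, List<String>> byDigest = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : entries.entrySet()) {
            byDigest.computeIfAbsent(md5Hex(e.getValue()), k -> new ArrayList<>()).add(e.getKey());
        }
        byDigest.values().removeIf(names -> names.size() < 2);
        return byDigest;
    }

    public static void main(String[] args) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        entries.put("res/drawable/a.png", "same".getBytes(StandardCharsets.UTF_8));
        entries.put("res/drawable/b.png", "same".getBytes(StandardCharsets.UTF_8));
        entries.put("assets/c.txt", "other".getBytes(StandardCharsets.UTF_8));
        System.out.println(findDuplicates(entries));
    }
}
```

Equal digests are treated as equal content here, which matches the description above; a stricter tool could compare the bytes as well when digests collide.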

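Design pattern

Because the module contains many features, each feature is abstracted as a Task. Following object-oriented design patterns, a TaskFactory is then introduced to create the different Tasks.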
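
The factory idea can be sketched roughly as follows. Everything here is an assumption for illustration: the type constants, the ApkTask interface, and the string results are invented, since the real TaskFactory and its task types are not shown in this document.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of a task factory; names and constants are illustrative.
final class TaskFactorySketch {

    static final int TYPE_UNZIP = 1;     // assumed id, not the module's real constant
    static final int TYPE_MANIFEST = 2;  // assumed id, not the module's real constant

    /** Each task is a Callable so that independent tasks can be handed to an executor. */
    interface ApkTask extends Callable<String> {
        int getType();

        @Override
        String call(); // narrowed: these sketch tasks throw no checked exceptions
    }

    /** Maps a task type to a task instance; unknown types are rejected. */
    static ApkTask create(int type) {
        switch (type) {
            case TYPE_UNZIP:
                return newTask(type, "unzip done");
            case TYPE_MANIFEST:
                return newTask(type, "manifest parsed");
            default:
                throw new IllegalArgumentException("unknown task type: " + type);
        }
    }

    private static ApkTask newTask(final int type, final String result) {
        return new ApkTask() {
            @Override public int getType() { return type; }
            @Override public String call() { return result; }
        };
    }

    public static void main(String[] args) {
        System.out.println(create(TYPE_UNZIP).call());
    }
}
```

Callers only depend on the task type and the common interface, so adding a new check means adding one case to the factory.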

UnZipTask

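Purpose: extract the APK, parse the class obfuscation rules and the resource obfuscation rules, and output each entry's original size and its compressed size inside the zip. The task mainly stores raw data in preparation for the later Tasks.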

@Override
    public TaskResult call() throws TaskExecuteException {

        try {
            // the APK file
            ZipFile zipFile = new ZipFile(inputFile);
            ...
            // result output object
            TaskResult taskResult = TaskResultFactory.factory(getType(), TASK_RESULT_TYPE_JSON, config);
            ...
            // total APK size
            ((TaskJsonResult) taskResult).add("total-size", inputFile.length());
            // read the class mapping rules and store them in the config object
            readMappingTxtFile();
            config.setProguardClassMap(proguardClassMap);
            // read the resource mapping rules and store them in the config object
            readResMappingTxtFile();
            config.setResguardMap(resguardMap);

            Enumeration entries = zipFile.entries();
            JsonArray jsonArray = new JsonArray();
            String outEntryName = "";
            while (entries.hasMoreElements()) {
                ZipEntry entry = (ZipEntry) entries.nextElement();
                outEntryName = writeEntry(zipFile, entry);
                if (!Util.isNullOrNil(outEntryName)) {
                    JsonObject fileItem = new JsonObject();
                    // record each entry's name and compressed size
                    fileItem.addProperty("entry-name", outEntryName);
                    fileItem.addProperty("entry-size", entry.getCompressedSize());
                    jsonArray.add(fileItem);
                    // map: extracted file (relative path) -> (uncompressed size, compressed size)
                    entrySizeMap.put(outEntryName, Pair.of(entry.getSize(), entry.getCompressedSize()));
                    // map: entry name inside the APK -> extracted file (relative path)
                    entryNameMap.put(entry.getName(), outEntryName);
                }
            }
            // store in the config object
            config.setEntrySizeMap(entrySizeMap);
            config.setEntryNameMap(entryNameMap);
            // write to the result
            ((TaskJsonResult) taskResult).add("entries", jsonArray);
            taskResult.setStartTime(startTime);
            taskResult.setEndTime(System.currentTimeMillis());
            return taskResult;
        } catch (Exception e) {
            throw new TaskExecuteException(e.getMessage(), e);
        }
    }
    
     private String parseResourceNameFromPath(String dir, String filename) {
        if (Util.isNullOrNil(dir) || Util.isNullOrNil(filename)) {
            return "";
        }

        String type = dir.substring(dir.indexOf('/') + 1);
        int index = type.indexOf('-');
        if (index >= 0) {
            type = type.substring(0, index);
        }
        index = filename.indexOf('.');
        if (index >= 0) {
            filename = filename.substring(0, index);
        }
        return "R." + type + "." + filename;
    }

    private String reverseResguard(String dirName, String filename) {
        String outEntryName = "";
        if (resDirMap.containsKey(dirName)) {
            String newDirName = resDirMap.get(dirName);
            final String resource = parseResourceNameFromPath(newDirName, filename);
            int suffixIndex = filename.indexOf('.');
            String suffix = "";
            if (suffixIndex >= 0) {
                suffix = filename.substring(suffixIndex);
            }
            if (resguardMap.containsKey(resource)) {
                int lastIndex =  resguardMap.get(resource).lastIndexOf('.');
                if (lastIndex >= 0) {
                    filename = resguardMap.get(resource).substring(lastIndex + 1) + suffix;
                }
            }
            outEntryName = newDirName + "/" + filename;
        }
        return outEntryName;
    }

    private String writeEntry(ZipFile zipFile, ZipEntry entry) throws IOException {

        int readSize;
        byte[] readBuffer = new byte[4096];
        BufferedOutputStream bufferedOutput = null;
        InputStream zipInputStream = null;
        String entryName = entry.getName();
        String outEntryName = null;
        String filename;
        File dir;
        File file = null;
        int index = entryName.lastIndexOf('/');
        if (index >= 0) {
            filename = entryName.substring(index + 1);
            String dirName = entryName.substring(0, index);
            dir = new File(outputFile, dirName);
            if (!dir.exists() && !dir.mkdirs()) {
                Log.e(TAG, "%s mkdirs failed!", dir.getAbsolutePath());
                return null;
            }
            if (!Util.isNullOrNil(filename)) {
                file = new File(dir, filename);
                outEntryName = reverseResguard(dirName, filename);
                if (Util.isNullOrNil(outEntryName)) {
                    outEntryName = entryName;
                }
            }
        } else {
            file = new File(outputFile, entryName);
            outEntryName = entryName;
        }
        try {
            if (file != null) {
                if (!file.createNewFile()) {
                    Log.e(TAG, "create file %s failed!", file.getAbsolutePath());
                    return null;
                }
                bufferedOutput = new BufferedOutputStream(new FileOutputStream(file));
                zipInputStream = zipFile.getInputStream(entry);
                while ((readSize = zipInputStream.read(readBuffer)) != -1) {
                    bufferedOutput.write(readBuffer, 0, readSize);
                }
            } else {
                return null;
            }
        } finally {
            if (zipInputStream != null) {
                zipInputStream.close();
            }
            if (bufferedOutput != null) {
                bufferedOutput.close();
            }
        }
        return outEntryName;
    }

Parsing rules for the mapping files:

Class Mapping

    ...
android.arch.core.executor.ArchTaskExecutor$1 -> android.arch.a.a.a$1:
    42:42:void <init>() -> <init>
    45:46:void execute(java.lang.Runnable) -> execute
android.arch.core.executor.ArchTaskExecutor$2 -> android.arch.a.a.a$2:
    50:50:void <init>() -> <init>
    53:54:void execute(java.lang.Runnable) -> execute
android.arch.core.executor.DefaultTaskExecutor -> android.arch.a.a.b:
    java.lang.Object mLock -> a
    java.util.concurrent.ExecutorService mDiskIO -> b
    android.os.Handler mMainHandler -> c
    31:33:void <init>() -> <init>
    40:41:void executeOnDiskIO(java.lang.Runnable) -> a
    45:54:void postToMainThread(java.lang.Runnable) -> b
    58:58:boolean isMainThread() -> b
    ...

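* original class name -> obfuscated class name (starts at column 0)
* original field name -> obfuscated field name (indented, as in the sample above)
* original method name -> obfuscated method name (indented, as in the sample above)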
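
The class-level rule above can be sketched as a small parser. This is an illustrative sketch only, not the module's readMappingTxtFile: the class name ClassMappingSketch is invented, and member lines are skipped rather than parsed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class ClassMappingSketch {

    /** Returns obfuscated class name -> original class name for the given mapping text. */
    static Map<String, String> parseClassLines(String mappingText) {
        Map<String, String> obfuscatedToOriginal = new LinkedHashMap<>();
        for (String line : mappingText.split("\n")) {
            // Member lines are indented; class lines start at column 0 and end with ':'.
            if (line.isEmpty() || Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            String trimmed = line.trim();
            int arrow = trimmed.indexOf(" -> ");
            if (arrow < 0 || !trimmed.endsWith(":")) {
                continue;
            }
            String original = trimmed.substring(0, arrow);
            String obfuscated = trimmed.substring(arrow + 4, trimmed.length() - 1);
            obfuscatedToOriginal.put(obfuscated, original);
        }
        return obfuscatedToOriginal;
    }

    public static void main(String[] args) {
        String sample =
                "android.arch.core.executor.DefaultTaskExecutor -> android.arch.a.a.b:\n"
              + "    java.lang.Object mLock -> a\n"
              + "    40:41:void executeOnDiskIO(java.lang.Runnable) -> a\n";
        System.out.println(parseClassLines(sample));
    }
}
```

The map is keyed by the obfuscated name because deobfuscation looks names up in that direction.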

Res Mapping

res path mapping:
    res/layout-v22 -> r/a
    res/drawable -> r/b
    res/color-night-v8 -> r/c
    res/xml -> r/d
    res/layout -> r/e
  ...
  
res id mapping:
    com.example.app.R.attr.avatar_border_color -> com.example.app.R.attr.a
    com.example.app.R.attr.actualImageScaleType -> com.example.app.R.attr.b
    com.example.app.R.attr.backgroundImage -> com.example.app.R.attr.c
    com.example.app.R.attr.fadeDuration -> com.example.app.R.attr.d
    com.example.app.R.attr.failureImage -> com.example.app.R.attr.e

* original resource directory -> obfuscated resource directory
* original resource name -> obfuscated resource name

ManifestAnalyzeTask

This Task mainly parses the manifest file and the resources.arsc file.

Parsing AndroidManifest

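Overall, the file consists of four parts, in order:

Header : the file magic number and the file size
String Chunk : the string pool
ResourceId Chunk : system resource ID information
XmlContent Chunk : the actual content of the manifest, made up of five kinds of chunk: Start Namespace Chunk, End Namespace Chunk, Start Tag Chunk, End Tag Chunk, and Text Chunk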

Header

The header consists of a Magic Number and a File Size, 4 bytes each.

Magic Number is always 0x00080003.
File Size is the total number of bytes in the file.

private void parseHeader() {
    try {
        Xml.nameSpaceMap.clear();
        String magicNumber = reader.readHexString(4);
        log("magic number: %s", magicNumber);

        int fileSize = reader.readInt();
        log("file size: %d", fileSize);
    } catch (IOException e) {
        e.printStackTrace();
        log("parse header error!");
    }
}
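
The reader object used above is not shown in this document. As a rough, self-contained alternative, the same two header fields can be read with java.nio.ByteBuffer in little-endian order. The class name AxmlHeaderSketch is an assumption made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class AxmlHeaderSketch {

    /** Magic number of a binary AndroidManifest.xml, read as a little-endian int. */
    static final int AXML_MAGIC = 0x00080003;

    /** Returns the file size recorded in the header; rejects data with a wrong magic number. */
    static int readFileSize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int magic = buf.getInt();
        if (magic != AXML_MAGIC) {
            throw new IllegalArgumentException(String.format("bad magic: 0x%08x", magic));
        }
        return buf.getInt();
    }

    public static void main(String[] args) {
        // 03 00 08 00 = magic, 10 00 00 00 = file size 16
        byte[] header = {0x03, 0x00, 0x08, 0x00, 0x10, 0x00, 0x00, 0x00};
        System.out.println(readFileSize(header));
    }
}
```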

String Chunk

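The String Chunk stores every string used in the manifest. Its structure is quite clear; going through the figure above field by field:

Chunk Type : 4 bytes, always 0x001c0001, marking this as a String Chunk
Chunk Size : 4 bytes, the size of the String Chunk
String Count : 4 bytes, the number of strings
Style Count : 4 bytes, the number of styles
Unknown : 4 bytes, fixed value 0x00000000 (this position corresponds to the flags field of ResStringPool_header)
String Pool Offset : 4 bytes, the offset of the string pool; note that it is relative to the start of the String Chunk, not to the start of the file
Style Pool Offset : 4 bytes, the offset of the style pool; likewise relative to the String Chunk
String Offsets : int array of size String Count, storing each string's offset relative to the string pool
Style Offsets : likewise an int array, with a total size of Style Count * 4 bytes
String Pool : the string pool, storing all the strings
Style Pool : the style pool, storing all the styles

Strings in the string pool are also stored in a specific format. Taking versionName as an example:

The first two bytes give the number of characters in the string; note that each character takes two bytes. As the figure above shows, the character count is 11, so the following 22 bytes hold the string content, terminated by 0000. The same layout repeats for every string.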
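
The entry layout just described (2-byte character count, UTF-16 content, 0000 terminator) can be decoded as in the sketch below. The class name Utf16PoolStringSketch is invented for this example, and it assumes little-endian data and a character count that fits in the two-byte field.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf16PoolStringSketch {

    /** Decodes one entry: 2-byte char count, count * 2 bytes of UTF-16LE, 2-byte terminator. */
    static String decode(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int charCount = buf.getShort() & 0xffff;
        byte[] raw = new byte[charCount * 2];
        buf.get(raw);
        buf.getShort(); // skip the 0x0000 terminator
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] chars = "versionName".getBytes(StandardCharsets.UTF_16LE); // 11 chars, 22 bytes
        ByteBuffer buf = ByteBuffer.allocate(2 + chars.length + 2).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 11).put(chars).putShort((short) 0);
        buf.flip();
        System.out.println(decode(buf)); // prints "versionName"
    }
}
```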

The style pool is usually empty during parsing, and the style count is 0.

Once the structure of the String Chunk is understood, parsing is simple. The code:

private void parseStringChunk() {
        try {
            String chunkType = reader.readHexString(4);
            log("chunk type: %s", chunkType);

            int chunkSize = reader.readInt();
            log("chunk size: %d", chunkSize);

            int stringCount = reader.readInt();
            log("string count: %d", stringCount);

            int styleCount = reader.readInt();
            log("style count: %d", styleCount);

            reader.skip(4);  // unknown

            int stringPoolOffset = reader.readInt();
            log("string pool offset: %d", stringPoolOffset);

            int stylePoolOffset = reader.readInt();
            log("style pool offset: %d", stylePoolOffset);

            // offset of each string
            List<Integer> stringPoolOffsets = new ArrayList<>(stringCount);
            for (int i = 0; i < stringCount; i++) {
                stringPoolOffsets.add(reader.readInt());
            }

            // offset of each style
            List<Integer> stylePoolOffsets = new ArrayList<>(styleCount);
            for (int i = 0; i < styleCount; i++) {
                stylePoolOffsets.add(reader.readInt());
            }

            log("string pool:");
            for (int i = 1; i <= stringCount; i++) { // the last string has no next offset and is read separately
                String string;
                if (i == stringCount) {
                    int lastStringLength = reader.readShort() * 2;
                    string = new String(moveBlank(reader.readOrigin(lastStringLength)));
                    reader.skip(2);
                } else {
                    reader.skip(2); // character count
                    // read the string using the offsets
                    byte[] content = reader.readOrigin(stringPoolOffsets.get(i) - stringPoolOffsets.get(i - 1) - 4);
                    reader.skip(2); // skip the trailing 0000
                    string = new String(moveBlank(content));

                }
                log("   %s", string);
                stringChunkList.add(string);
            }


            log("style pool:");
            for (int i = 1; i < styleCount; i++) {
                reader.skip(2);
                byte[] content = reader.readOrigin(stylePoolOffsets.get(i) - stylePoolOffsets.get(i - 1) - 4);
                reader.skip(2);
                String string = new String(content);
                log("   %s", string);
            }

        } catch (IOException e) {
            e.printStackTrace();
            log("parse StringChunk error!");
        }
    }

ResourceId Chunk

Chunk Type : 4 bytes, fixed value 0x00080180, identifying the ResourceId Chunk
Chunk Size : 4 bytes, the number of bytes in this chunk
ResourceIds : int array of size (chunkSize - 8) / 4

private void parseResourceIdChunk() {
        try {
            String chunkType = reader.readHexString(4);
            log("chunk type: %s", chunkType);

            int chunkSize = reader.readInt();
            log("chunk size: %d", chunkSize);

            int resourcesIdChunkCount = (chunkSize - 8) / 4;
            for (int i = 0; i < resourcesIdChunkCount; i++) {
                String resourcesId = reader.readHexString(4);
                log("resource id[%d]: %s", i, resourcesId);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

XmlContent Chunk

private void parseXmlContentChunk() {
        try {
            while (reader.avaliable() > 0) {
                int chunkType = reader.readInt();
                switch (chunkType) {
                    case Xml.START_NAMESPACE_CHUNK_TYPE:
                        parseStartNamespaceChunk();
                        break;
                    case Xml.START_TAG_CHUNK_TYPE:
                        parseStartTagChunk();
                        break;
                    case Xml.END_TAG_CHUNK_TYPE:
                        parseEndTagChunk();
                        break;
                    case Xml.END_NAMESPACE_CHUNK_TYPE:
                        parseEndNamespaceChunk();
                        break;
                    case Xml.TEXT_CHUNK_TYPE:
                        parseTextChunk();
                        break;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            log("parse XmlContentChunk error!");
        }
    }

Chunks are read in a loop and dispatched on chunkType. Every kind of chunk has a similar data structure, so I defined an abstract class Chunk as the base class for the different chunks:

public abstract class Chunk {

    int chunkType; // identifies the kind of chunk
    int chunkSize; // number of bytes in this chunk
    int lineNumber; // line number

    Chunk(int chunkType){
        this.chunkType=chunkType;
    }

    public abstract String toXmlString();
}

These three attributes plus an Unknown field (0xFFFFFFFF) make up the first 16 bytes that all five kinds of chunk share, so they are not described again below.

Start Namespace Chunk

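The Start Namespace Chunk generally stores the manifest's namespace information. To recap the structure of the Start Namespace Chunk:

The last two fields, Prefix and Uri, deserve a closer look. Prefix is a 4-byte index that points at a string in the string pool: the namespace prefix. Uri is likewise an index into the string pool: the namespace URI. In the example from the 010 Editor screenshot above, Prefix is 46 and Uri is 47. Looking these up in the string pool parsed earlier gives android and schemas.android.com/apk/res/and…, the namespace of the AndroidManifest.xml file.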

private void parseStartNamespaceChunk() {
        log("\nparse Start NameSpace Chunk");
        log("chunk type: 0x%x", Xml.START_NAMESPACE_CHUNK_TYPE);

        try {
            int chunkSize = reader.readInt();
            log("chunk size: %d", chunkSize);

            int lineNumber = reader.readInt();
            log("line number: %d", lineNumber);

            reader.skip(4); // 0xffffffff

            int prefix = reader.readInt();
            log("prefix: %s", stringChunkList.get(prefix));

            int uri = reader.readInt();
            log("uri: %s", stringChunkList.get(uri));

            StartNameSpaceChunk startNameSpaceChunk = new StartNameSpaceChunk(chunkSize, lineNumber, prefix, uri);
            chunkList.add(startNameSpaceChunk);

            Xml.nameSpaceMap.put(stringChunkList.get(prefix), stringChunkList.get(uri));
        } catch (IOException e) {
            e.printStackTrace();
            log("parse Start NameSpace Chunk error!");
        }
    }

End Namespace Chunk

This chunk has exactly the same structure as the Start Namespace Chunk and is parsed in exactly the same way, so it is not covered again.

Start Tag Chunk

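The Start Tag Chunk has the most complex structure of all the chunks and stores the most important information in the manifest: the tags. From this chunk alone, essentially all the information in AndroidManifest.xml can be obtained.

Namespace uri : index in the string pool of the namespace URI used by this tag. A value of -1 means no namespace URI is used. Tags generally do not use a namespace, so this value is usually -1
Name : index of the tag name in the string pool
Flags : fixed value 0x00140014; no use for it has been found so far (in AOSP's ResXMLTree_attrExt this position holds two uint16_t fields, attributeStart and attributeSize, both equal to 20)
Attribute Count : 4 bytes, the number of attributes the tag contains
Class Attribute : 4 bytes, the number of class attributes the tag contains. This is usually 0 during parsing
Attributes : the attribute collection, of size Attribute Count

A tag contains a collection of attributes, which is an important part of the manifest.

Each attribute is a fixed 20 bytes and contains 5 fields, each a 4-byte unsigned int. The fields mean the following:

NamespaceUri : index in the string pool of the attribute's namespace URI. This is rarely -1
name : index of the attribute name in the string pool
valueStr : the attribute value (a string pool index, or -1 if absent)
type : the attribute type
data : the attribute data

Depending on type, the attribute value is expressed differently: for example android:name="android.permission.NFC" for a permission, android:theme="@2131624762" pointing at a resource ID, android:value="632.0dip" for a dimension, and so on. The Android source provides a method that turns type and data into the attribute's string value, TypedValue.coerceToString(int type, int data):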

/**
     * Perform type conversion as per {@link #coerceToString()} on an explicitly
     * supplied type and data.
     *
     * @param type
     *            The data type identifier.
     * @param data
     *            The data value.
     *
     * @return String The coerced string value. If the value is null or the type
     *         is not known, null is returned.
     */
    public static final String coerceToString(int type, int data) {
        switch (type) {
            case TYPE_NULL:
                return null;
            case TYPE_REFERENCE:
                return "@" + data;
            case TYPE_ATTRIBUTE:
                return "?" + data;
            case TYPE_FLOAT:
                return Float.toString(Float.intBitsToFloat(data));
            case TYPE_DIMENSION:
                return Float.toString(complexToFloat(data))
                        + DIMENSION_UNIT_STRS[(data >> COMPLEX_UNIT_SHIFT)
                        & COMPLEX_UNIT_MASK];
            case TYPE_FRACTION:
                return Float.toString(complexToFloat(data) * 100)
                        + FRACTION_UNIT_STRS[(data >> COMPLEX_UNIT_SHIFT)
                        & COMPLEX_UNIT_MASK];
            case TYPE_INT_HEX:
                return String.format("0x%08X", data);
            case TYPE_INT_BOOLEAN:
                return data != 0 ? "true" : "false";
        }

        if (type >= TYPE_FIRST_COLOR_INT && type <= TYPE_LAST_COLOR_INT) {
            String res = String.format("%08x", data);
            char[] vals = res.toCharArray();
            switch (type) {
                default:
                case TYPE_INT_COLOR_ARGB8:// #AaRrGgBb
                    break;
                case TYPE_INT_COLOR_RGB8:// #FFRrGgBb->#RrGgBb
                    res = res.substring(2);
                    break;
                case TYPE_INT_COLOR_ARGB4:// #AARRGGBB->#ARGB
                    res = new StringBuffer().append(vals[0]).append(vals[2])
                            .append(vals[4]).append(vals[6]).toString();
                    break;
                case TYPE_INT_COLOR_RGB4:// #FFRRGGBB->#RGB
                    res = new StringBuffer().append(vals[2]).append(vals[4])
                            .append(vals[6]).toString();
                    break;
            }
            return "#" + res;
        } else if (type >= TYPE_FIRST_INT && type <= TYPE_LAST_INT) {
            String res;
            switch (type) {
                default:
                case TYPE_INT_DEC:
                    res = Integer.toString(data);
                    break;
            }
            return res;
        }

        return null;
    }
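
The TYPE_DIMENSION branch above depends on complexToFloat and a few constants that are not shown. The sketch below reproduces that one branch on its own. The masks, shifts, and unit strings are copied from android.util.TypedValue as I understand them, so treat them as assumptions to check against the platform source; the class name ComplexDimensionSketch is invented.

```java
// Hypothetical stand-alone version of the TYPE_DIMENSION branch, for illustration only.
final class ComplexDimensionSketch {

    private static final int UNIT_MASK = 0xf;           // unit lives in bits 0..3
    private static final int RADIX_SHIFT = 4;           // radix selector lives in bits 4..5
    private static final int RADIX_MASK = 0x3;
    private static final int MANTISSA_SHIFT = 8;        // mantissa lives in bits 8..31
    private static final int MANTISSA_MASK = 0xffffff;
    private static final float MANTISSA_MULT = 1.0f / (1 << MANTISSA_SHIFT);
    private static final float[] RADIX_MULTS = {
            1.0f * MANTISSA_MULT,
            1.0f / (1 << 7) * MANTISSA_MULT,
            1.0f / (1 << 15) * MANTISSA_MULT,
            1.0f / (1 << 23) * MANTISSA_MULT
    };
    private static final String[] UNITS = {"px", "dip", "sp", "pt", "in", "mm"};

    /** Converts the packed mantissa/radix representation to a float. */
    static float complexToFloat(int data) {
        return (data & (MANTISSA_MASK << MANTISSA_SHIFT))
                * RADIX_MULTS[(data >> RADIX_SHIFT) & RADIX_MASK];
    }

    /** Formats a packed dimension value; unit codes above 5 are not handled in this sketch. */
    static String dimensionToString(int data) {
        return Float.toString(complexToFloat(data)) + UNITS[data & UNIT_MASK];
    }

    public static void main(String[] args) {
        // mantissa 632, radix 0, unit 1 (dip): (632 << 8) | 1 = 0x00027801
        System.out.println(dimensionToString(0x00027801)); // prints "632.0dip"
    }
}
```

This is how a data value such as 0x00027801 ends up as android:value="632.0dip" in the decoded manifest.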

At this point tags and attributes can both be parsed. The code that parses the whole Start Tag Chunk is as follows:

private void parseStartTagChunk() {
        log("\nparse Start Tag Chunk");
        log("chunk type: 0x%x", Xml.START_TAG_CHUNK_TYPE);

        try {
            int chunkSize = reader.readInt();
            log("chunk size: %d", chunkSize);

            int lineNumber = reader.readInt();
            log("line number: %d", lineNumber);

            reader.skip(4); // 0xffffffff

            int namespaceUri = reader.readInt();
            if (namespaceUri == -1)
                log("namespace uri: null");
            else
                log("namespace uri: %s", stringChunkList.get(namespaceUri));

            int name = reader.readInt();
            log("name: %s", stringChunkList.get(name));

            reader.skip(4); // flag 0x00140014

            int attributeCount = reader.readInt();
            log("attributeCount: %d", attributeCount);

            int classAttribute = reader.readInt();
            log("class attribute: %s", classAttribute);

            List<Attribute> attributes = new ArrayList<>();
            // each attribute has five fields of 4 bytes each
            for (int i = 0; i < attributeCount; i++) {

                log("Attribute[%d]", i);

                int namespaceUriAttr = reader.readInt();
                if (namespaceUriAttr == -1)
                    log("   namespace uri: null");
                else
                    log("   namespace uri: %s", stringChunkList.get(namespaceUriAttr));

                int nameAttr = reader.readInt();
                if (nameAttr == -1)
                    log("   name: null");
                else
                    log("   name: %s", stringChunkList.get(nameAttr));

                int valueStr = reader.readInt();
                if (valueStr == -1)
                    log("   valueStr: null");
                else
                    log("   valueStr: %s", stringChunkList.get(valueStr));

                int type = reader.readInt() >>> 24; // the data type is the top byte; unsigned shift avoids sign extension
                log("   type: %d", type);

                int data = reader.readInt();
                String dataString = type == TypedValue.TYPE_STRING ? stringChunkList.get(data) : TypedValue.coerceToString(type, data);
                log("   data: %s", dataString);

                Attribute attribute = new Attribute(namespaceUriAttr == -1 ? null : stringChunkList.get(namespaceUriAttr),
                        stringChunkList.get(nameAttr), valueStr, type, dataString);
                attributes.add(attribute);
            }
            StartTagChunk startTagChunk = new StartTagChunk(namespaceUri, stringChunkList.get(name), attributes);
            chunkList.add(startTagChunk);
        } catch (IOException e) {
            e.printStackTrace();
            log("parse Start Tag Chunk error!");
        }
    }

End Tag Chunk

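The End Tag Chunk has 6 fields, namely the first 6 fields of the Start Tag Chunk. It marks the end of a tag. When generating XML, reaching this chunk means the tag currently being parsed can be closed; the manifest tag above, for example, can now be given its closing tag.

Parsing resources.arsc

The ARSC file is a resource index table that lets the system quickly find a resource from its resource ID. WeChat's resource obfuscation tool, for example, works by reading the places in resources.arsc that record resource-related information and then uniformly replacing the resource directory names, which achieves resource obfuscation.

resources.arsc is made up of chunks of various types. The file as a whole is one top-level chunk whose header records the chunk's type, length, and so on. This top-level chunk in turn contains chunks of other types, each with its own header recording the same kind of information.

resources.arsc is little-endian: multi-byte values are read from the low-order byte to the high-order byte.

Figure: resources.arsc file structure

The first row of the figure shows, in order: header type (2 bytes), header size (2 bytes), chunk size (4 bytes), and package count (the number of packages whose resources resources.arsc contains, usually one).

The resources.arsc data types are defined in the Android framework at: ./frameworks/base/libs/androidfw/include/androidfw/ResourceTypes.h

Overall, the file can be divided into three large blocks:

ResTableHeader : the file header
ResStringPool : the string pool of resource values
ResTablePackage : the data block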

ResTableHeader

struct ResTable_header
{
    struct ResChunk_header header;

    // The number of ResTable_package structures.
    uint32_t packageCount;
};

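The header here is of type ResChunk_header. Let's look at this struct first, since it appears many more times in other parts of the ARSC file. The ARSC file is in fact similar to the AndroidManifest.xml file: it too is built from chunks, and every chunk starts with a fixed ResChunk_header, laid out as follows: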

struct ResChunk_header
{
    // Type identifier for this chunk.  The meaning of this value depends
    // on the containing chunk.
    uint16_t type;

    // Size of the chunk header (in bytes).  Adding this value to
    // the address of the chunk allows you to find its associated data
    // (if any).
    uint16_t headerSize;

    // Total size of this chunk (in bytes).  This is the chunkSize plus
    // the size of any data associated with the chunk.  Adding this value
    // to the chunk allows you to completely skip its contents (including
    // any child chunks).  If this value is the same as chunkSize, there is
    // no data associated with the chunk.
    uint32_t size;
};

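type identifies the chunk; each kind of chunk has its own identifier. headerSize is the size of the current chunk's header. size is the size of the whole chunk. ResChunkHeader is simple enough, so back to ResTableHeader: besides the header field it has a packageCount field, the number of ResTablePackage blocks (data blocks) in the ARSC file, usually 1.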
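
To make the 8-byte layout concrete, here is a minimal sketch that reads a ResChunk_header from a little-endian ByteBuffer. The class name ResChunkHeaderSketch is an assumption for this example; the parser described in this document uses its own reader class instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for the three ResChunk_header fields, for illustration only.
final class ResChunkHeaderSketch {
    final int type;        // uint16_t type
    final int headerSize;  // uint16_t headerSize
    final long size;       // uint32_t size, kept in a long to stay unsigned

    private ResChunkHeaderSketch(int type, int headerSize, long size) {
        this.type = type;
        this.headerSize = headerSize;
        this.size = size;
    }

    /** Reads 8 bytes at the buffer's current position. */
    static ResChunkHeaderSketch read(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int type = buf.getShort() & 0xffff;
        int headerSize = buf.getShort() & 0xffff;
        long size = buf.getInt() & 0xffffffffL;
        return new ResChunkHeaderSketch(type, headerSize, size);
    }

    public static void main(String[] args) {
        // A ResTable_header: type 0x0002, headerSize 12, size 256, packageCount 1
        byte[] bytes = {0x02, 0x00, 0x0c, 0x00, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00};
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        ResChunkHeaderSketch h = read(buf);
        int packageCount = buf.getInt();
        System.out.println(h.type + " " + h.headerSize + " " + h.size + " " + packageCount);
    }
}
```

Because size covers the whole chunk, a parser that does not understand a chunk type can skip it by jumping size bytes from the chunk's start.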

ResStringPool

ResStringPool also has a header, called ResStringPoolHeader, with the following format:

struct ResStringPool_header
{
    struct ResChunk_header header;

    // Number of strings in this pool (number of uint32_t indices that follow
    // in the data).
    uint32_t stringCount;

    // Number of style span arrays in the pool (number of uint32_t indices
    // follow the string indices).
    uint32_t styleCount;

    // Flags.
    enum {
        // If set, the string index is sorted by the string values (based
        // on strcmp16()).
        SORTED_FLAG = 1<<0,

        // String pool is encoded in UTF-8
        UTF8_FLAG = 1<<8
    };
    uint32_t flags;

    // Index from header of the string data.
    uint32_t stringsStart;

    // Index from header of the style data.
    uint32_t stylesStart;
};

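It has six fields; in turn:

header : ResChunkHeader, whose type is RES_STRING_POOL_TYPE
stringCount : the number of strings
styleCount : the number of string styles
flags : properties of the strings; possible values include 0x000 (UTF-16), 0x001 (the strings are sorted), 0x100 (UTF-8), and combinations of these
stringsStart : the offset of the string content
stylesStart : the offset of the string style content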

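ResStringPoolHeader is followed by two offset arrays, stringOffsets and styleOffsets, which hold the offsets of the string contents and of the string style contents. stringsStart and stylesStart are measured from the start of the ResStringPool chunk, while each entry in the offset arrays is measured from stringsStart (or stylesStart). With the start offset and the per-string offset array, every string can be retrieved. Note that the strings here are not bare strings; they have structure too. Each carries length fields, u16len and u8len, giving the string length in UTF-16 and in UTF-8 respectively. How are the two cases told apart? The flags field of ResStringPoolHeader marks the encoding. For UTF-8, the string ends with 0x00 and is preceded by u16len and then u8len, normally one byte each. For UTF-16, the string ends with 0x0000 and is preceded by a two-byte u16len; there is no u8len field.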
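
A minimal sketch of decoding one UTF-8 pool entry follows. It assumes the AOSP convention that the first length is the UTF-16 length and the second the UTF-8 byte length, and that a set high bit in a length byte means a second length byte follows; verify both against ResourceTypes.h before relying on them. The class name Utf8PoolStringSketch is invented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8PoolStringSketch {

    /** Reads a 1- or 2-byte length; a set high bit in the first byte signals a second byte. */
    static int readLength(ByteBuffer buf) {
        int first = buf.get() & 0xff;
        if ((first & 0x80) == 0) {
            return first;
        }
        return ((first & 0x7f) << 8) | (buf.get() & 0xff);
    }

    /** Decodes one entry: u16len, u8len, u8len bytes of UTF-8, then a 0x00 terminator. */
    static String decode(ByteBuffer buf) {
        readLength(buf);             // u16len: length in UTF-16 code units, not needed here
        int u8len = readLength(buf); // u8len: number of UTF-8 bytes that follow
        byte[] raw = new byte[u8len];
        buf.get(raw);
        buf.get();                   // skip the 0x00 terminator
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] entry = {8, 8, 'a', 'p', 'p', '_', 'n', 'a', 'm', 'e', 0};
        System.out.println(decode(ByteBuffer.wrap(entry))); // prints "app_name"
    }
}
```

Decoding with an explicit charset matters for non-ASCII strings, where u16len and u8len differ.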

private ResStringPoolHeader parseStringPoolType(List<String> stringPoolList) {
    int currentPosition = reader.getCurrentPosition();
    ResStringPoolHeader stringPoolHeader = new ResStringPoolHeader();
    try {

        stringPoolHeader.parse(reader);
        List<Integer> stringOffsets = new ArrayList<>(stringPoolHeader.stringCount);
        for (int i = 0; i < stringPoolHeader.stringCount; i++) {
            int offset = reader.readInt();
            stringOffsets.add(offset);
        }

        List<Integer> styleOffsets = new ArrayList<>();
        for (int i = 0; i < stringPoolHeader.styleCount; i++) {
            styleOffsets.add(reader.readInt());
        }

        int position = reader.getCurrentPosition();
            for (int i = 0; i < stringPoolHeader.stringCount; i++) {
                int length = 0;
                int skipLength = 0;
                if ((stringPoolHeader.flags & ResStringPoolHeader.UTF8_FLAG) != 0) {
                    // a UTF-8 entry starts with u16len, then u8len (one byte each here)
                    int u16len = reader.read(position + stringOffsets.get(i), 1)[0] & 0xff;
                    int u8len = reader.read(position + stringOffsets.get(i) + 1, 1)[0] & 0xff;
                    length = u8len;
                    skipLength = 1; // a UTF-8 string ends with 0x00
                } else {
                    int u16len =reader.readUnsignedShort();
                    length = u16len;
                    skipLength = 2; // a UTF-16 string ends with 0x0000
                }
                String string = "";
                try {
                    string = new String(reader.read(position + stringOffsets.get(i) + 2, skipLength*length));
                    reader.skip(skipLength);
                } catch (Exception e) {
                    log("   parse string[%d] error!", i);
                }

                stringPoolList.add(string);
                log("   stringPool[%d]: %s", i, string);
            }

            for (int i = 0; i < stringPoolHeader.styleCount; i++) {
                int index = reader.readInt();
                int firstChar = reader.readInt();
                int lastChar = reader.readInt();
                ResSpanStyle resSpanStyle = new ResSpanStyle(index, firstChar, lastChar);
                log(resSpanStyle.toString());
                reader.skip(4); // 0xffff
            }
            reader.moveTo(currentPosition + stringPoolHeader.resChunkHeader.size);
            return stringPoolHeader;
    } catch (IOException e) {
        log("   parse string pool type error!");
    }
    return null;
}

ResTablePackage

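ResTablePackage can in turn be divided into five smaller blocks:

ResTablePackageHeader : the header information
typeStrings : the resource type string pool
keyStrings : the resource entry name string pool
ResTableTypeSpec : the resource table specification
ResTableType : the resource table type configuration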

ResTablePackageHeader

struct ResTable_package
{
    struct ResChunk_header header;

    // If this is a base package, its ID.  Package IDs start
    // at 1 (corresponding to the value of the package bits in a
    // resource identifier).  0 means this is not a base package.
    uint32_t id;

    // Actual name of this package, \0-terminated.
    uint16_t name[128];

    // Offset to a ResStringPool_header defining the resource
    // type symbol table.  If zero, this package is inheriting from
    // another base package (overriding specific values in it).
    uint32_t typeStrings;

    // Last index into typeStrings that is for public use by others.
    uint32_t lastPublicType;

    // Offset to a ResStringPool_header defining the resource
    // key symbol table.  If zero, this package is inheriting from
    // another base package (overriding specific values in it).
    uint32_t keyStrings;

    // Last index into keyStrings that is for public use by others.
    uint32_t lastPublicKey;

    uint32_t typeIdOffset;
};

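header : ResChunkHeader, whose type is RES_TABLE_PACKAGE_TYPE
id : the package ID, equal to the Package Id; the Package Id is normally 0x7F for an application package and 0x01 for the system resource package
name : the package name
typeStrings : the offset of the resource type string pool within ResTablePackage
lastPublicType : usually the number of elements in the resource type string pool
keyStrings : the offset of the resource name string pool within ResTablePackage
lastPublicKey : usually the number of elements in the resource entry name string pool
typeIdOffset : unknown; the value is 0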
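
The Package Id and the Type ID both show up inside every resource ID, which is laid out as 0xPPTTEEEE (package, type, entry). A short sketch, using the ID 2131624762 from the android:theme example earlier in this document; the class name ResourceIdSketch is invented.

```java
// Hypothetical helper, for illustration only.
final class ResourceIdSketch {

    /** Top byte: the package ID (0x7f for an app, 0x01 for the system package). */
    static int packageId(int resId) {
        return (resId >>> 24) & 0xff;
    }

    /** Second byte: the type ID. */
    static int typeId(int resId) {
        return (resId >>> 16) & 0xff;
    }

    /** Low two bytes: the entry index within that type. */
    static int entryIndex(int resId) {
        return resId & 0xffff;
    }

    public static void main(String[] args) {
        int resId = 2131624762; // 0x7F0E033A
        System.out.printf("package=0x%02x type=0x%02x entry=0x%04x%n",
                packageId(resId), typeId(resId), entryIndex(resId));
    }
}
```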

typeStrings

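typeStrings is the resource type string pool. Resource types are things like string, layout, drawable, mipmap, and so on. Put plainly, they are what follows R. when writing code. typeStrings is simply a ResStringPool, so it is parsed exactly as described before.

keyStrings

keyStrings is the resource entry name string pool; it is also a ResStringPool.

After the resource entry name string pool keyStrings come ResTableTypeSpec and ResTableType, which alternate in no fixed pattern.

ResTableTypeSpec

ResTableTypeSpec is the resource table specification, which describes how resource entries differ across configurations. It lets the system load different resource entries according to a device's configuration. This part corresponds to the struct ResTable_typeSpec :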

struct ResTable_typeSpec
{
    struct ResChunk_header header;

    // The type identifier this chunk is holding.  Type IDs start
    // at 1 (corresponding to the value of the type bits in a
    // resource identifier).  0 is invalid.
    uint8_t id;

    // Must be 0.
    uint8_t res0;
    // Must be 0.
    uint16_t res1;

    // Number of uint32_t entry configuration masks that follow.
    uint32_t entryCount;

    enum {
        // Additional flag indicating an entry is public.
        SPEC_PUBLIC = 0x40000000
    };
};

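header : ResChunkHeader, whose type is RES_TABLE_TYPE_SPEC_TYPE
id : the Type ID of the resource, that is, the ID of the resource's type. Resource types include animator, anim, color, drawable, layout, menu, raw, string, xml, and several more, and each is assigned an ID
res0 : must be 0
res1 : must be 0
entryCount : the number of resource entries of this type; entries that share a name across configurations are counted once

This is followed by an array of entryCount uint32_t values, each of which describes the configuration differences of one resource entry.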

ResTableType

ResTableType holds the details of the resource entries, including each entry's name, type, value, and configuration. It corresponds to the struct ResTable_type :

struct ResTable_type
{
    struct ResChunk_header header;

    enum {
        NO_ENTRY = 0xFFFFFFFF
    };

    // The type identifier this chunk is holding.  Type IDs start
    // at 1 (corresponding to the value of the type bits in a
    // resource identifier).  0 is invalid.
    uint8_t id;

    // Must be 0.
    uint8_t res0;
    // Must be 0.
    uint16_t res1;

    // Number of uint32_t entry indices that follow.
    uint32_t entryCount;

    // Offset from header where ResTable_entry data starts.
    uint32_t entriesStart;

    // Configuration this collection of entries is designed for.
    ResTable_config config;
};

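header : ResChunkHeader whose type is RES_TABLE_TYPE_TYPE
id : the Type ID of the resource, i.e. the ID of its resource type. Resource types include animator, anim, color, drawable, layout, menu, raw, string, xml and so on, and each one is assigned an ID
res0 : must be 0
res1 : must be 0
entryCount : number of resource entries
entriesStart : offset from the start of this chunk to the resource entry data
config : the configuration these entries apply to

After config comes a uint32_t array of entryCount elements that holds the offset of each entry's data. This offset array is followed by a ResTableEntry[]. ResTableEntry is the resource entry data and corresponds to the struct ResTable_entry: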
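As a rough illustration of how the offset array is used, the sketch below turns it into absolute positions of the entries inside the buffer. The names (TypeChunkSketch, entryPositions) are hypothetical. It assumes the classic layout in which the offset array starts right after the header, offsets are relative to entriesStart, and NO_ENTRY marks a missing entry; newer compact or sparse encodings are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TypeChunkSketch {
    static final int NO_ENTRY = 0xFFFFFFFF;

    /** Returns the absolute position of every entry, or -1 where the entry is absent. */
    static int[] entryPositions(ByteBuffer buf, int chunkStart) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int headerSize = buf.getShort(chunkStart + 2) & 0xFFFF; // header includes ResTable_config
        int entryCount = buf.getInt(chunkStart + 12);
        int entriesStart = buf.getInt(chunkStart + 16);
        int[] positions = new int[entryCount];
        for (int i = 0; i < entryCount; i++) {
            int offset = buf.getInt(chunkStart + headerSize + 4 * i);
            positions[i] = (offset == NO_ENTRY) ? -1 : chunkStart + entriesStart + offset;
        }
        return positions;
    }
}
```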

struct ResTable_entry
{
    // Number of bytes in this structure.
    uint16_t size;

    enum {
        // If set, this is a complex entry, holding a set of name/value
        // mappings.  It is followed by an array of ResTable_map structures.
        FLAG_COMPLEX = 0x0001,
        // If set, this resource has been declared public, so libraries
        // are allowed to reference it.
        FLAG_PUBLIC = 0x0002,
        // If set, this is a weak resource and may be overriden by strong
        // resources of the same name/type. This is only useful during
        // linking with other resource tables.
        FLAG_WEAK = 0x0004
    };
    uint16_t flags;

    // Reference into ResTable_package::keyStrings identifying this entry.
    struct ResStringPool_ref key;
};

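size : size of this struct in bytes
flags : flag bits
key : index of the entry name in the entry name string pool

What follows depends on flags. If flags contains FLAG_COMPLEX (0x0001), the structure is a ResTableMapEntry, which extends ResTableEntry with two extra uint32_t fields, parent and count. parent is the ID of the parent resource entry, and count is the number of ResTableMap items that follow. ResTableMap looks like this: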

struct ResTable_map
{
    ResTable_ref name; // resource name
    Res_value value; // resource value
};

Next, ResValue:

struct Res_value {
    uint16_t size;
    uint8_t res0;
    uint8_t dataType;
    data_type data;
};

That is the ResTableMapEntry layout used when flags contains FLAG_COMPLEX (0x0001). Otherwise the entry is followed directly by a single Res_value.
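The branch on FLAG_COMPLEX can be sketched in Java as follows. EntrySketch and describe are illustrative names only, and the method returns a summary string instead of building real model objects. It assumes the classic layout, where a simple entry's Res_value starts size bytes after the entry start.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EntrySketch {
    static final int FLAG_COMPLEX = 0x0001;

    /** Summarises the ResTable_entry that starts at pos. */
    static String describe(ByteBuffer buf, int pos) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int size = buf.getShort(pos) & 0xFFFF;       // size of the entry struct itself
        int flags = buf.getShort(pos + 2) & 0xFFFF;
        int key = buf.getInt(pos + 4);               // index into keyStrings
        if ((flags & FLAG_COMPLEX) != 0) {
            // ResTable_map_entry: parent and count follow the base fields
            int parent = buf.getInt(pos + 8);
            int count = buf.getInt(pos + 12);
            return "complex key=" + key + " parent=0x" + Integer.toHexString(parent) + " maps=" + count;
        }
        // simple entry: a Res_value (size, res0, dataType, data) follows directly
        int valuePos = pos + size;
        int dataType = buf.get(valuePos + 3) & 0xFF;
        int data = buf.getInt(valuePos + 4);
        return "simple key=" + key + " dataType=" + dataType + " data=" + data;
    }
}
```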

ShowFileSizeTask

Lists the files whose size exceeds a threshold.

public TaskResult call() throws TaskExecuteException {
        ...
            long startTime = System.currentTimeMillis();
            // map recorded by UnZipTask: file name -> (compressed size, uncompressed size)
            Map<String, Pair<Long, Long>> entrySizeMap = config.getEntrySizeMap();
            if (!entrySizeMap.isEmpty()) {                                        
                for (Map.Entry<String, Pair<Long, Long>> entry : entrySizeMap.entrySet()) {
                    final String suffix = getSuffix(entry.getKey());
                    Pair<Long, Long> size = entry.getValue();
                    // record files that exceed the threshold
                    if (size.getFirst() >= downLimit * ApkConstants.K1024) {
                        if (filterSuffix.isEmpty() || filterSuffix.contains(suffix)) {
                            entryList.add(Pair.of(entry.getKey(), size.getFirst()));
                        } 
                    } 
                }
            }

           ...
           // sort

            JsonArray jsonArray = new JsonArray();
            for (Pair<String, Long> sortFile : entryList) {
                JsonObject fileItem = new JsonObject();
                fileItem.addProperty("entry-name", sortFile.getFirst());
                fileItem.addProperty("entry-size", sortFile.getSecond());
                jsonArray.add(fileItem);
            }
            // write to the result
            ((TaskJsonResult) taskResult).add("files", jsonArray);
            taskResult.setStartTime(startTime);
            taskResult.setEndTime(System.currentTimeMillis());
            return taskResult;
        } catch (Exception e) {
            throw new TaskExecuteException(e.getMessage(), e);
        }
    }
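The sorting step is elided in the listing above. One plausible way to do it, sketched here with plain java.util types rather than the Pair class used by the task, is to sort the collected entries by size in ascending or descending order. SizeSortSketch and sortBySize are names invented for this example and are not the actual Matrix implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SizeSortSketch {
    /** Returns the entries of sizes ordered by size, ascending or descending. */
    static List<Map.Entry<String, Long>> sortBySize(Map<String, Long> sizes, boolean ascending) {
        List<Map.Entry<String, Long>> list = new ArrayList<>(sizes.entrySet());
        Comparator<Map.Entry<String, Long>> bySize = Map.Entry.comparingByValue();
        list.sort(ascending ? bySize : bySize.reversed());
        return list;
    }
}
```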

MethodCountTask

Counts, for each dex file, the methods defined inside that dex file and the methods that are referenced but not defined in it.

public TaskResult call() throws TaskExecuteException {
        try {
            ...
            long startTime = System.currentTimeMillis();
            JsonArray jsonArray = new JsonArray();
            for (int i = 0; i < dexFileList.size(); i++) {
                RandomAccessFile dexFile = dexFileList.get(i);
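                // collect the method information of this dex
                countDex(dexFile);
                // methods whose class is defined inside this dex
                int totalInternalMethods = sumOfValue(classInternalMethod);
                // methods that belong to classes outside this dex
                int totalExternalMethods = sumOfValue(classExternalMethod);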
                int totalExternalMethods = sumOfValue(classExternalMethod);
                JsonObject jsonObject = new JsonObject();
                jsonObject.addProperty("dex-file", dexFileNameList.get(i));
                // group by class
                if (JobConstants.GROUP_CLASS.equals(group)) {
                    List<String> sortList = sortKeyByValue(classInternalMethod);
                    JsonArray classes = new JsonArray();
                    for (String className : sortList) {
                        JsonObject classObj = new JsonObject();
                        classObj.addProperty("name", className);
                        classObj.addProperty("methods", classInternalMethod.get(className));
                        classes.add(classObj);
                    }
                    jsonObject.add("internal-classes", classes);
                    // group by package
                } else if (JobConstants.GROUP_PACKAGE.equals(group)) {
                    String packageName;
                    for (Map.Entry<String, Integer> entry : classInternalMethod.entrySet()) {
                        packageName = ApkUtil.getPackageName(entry.getKey());
                        if (!Util.isNullOrNil(packageName)) {
                            if (!pkgInternalRefMethod.containsKey(packageName)) {
                                pkgInternalRefMethod.put(packageName, entry.getValue());
                            } else {
                                pkgInternalRefMethod.put(packageName, pkgInternalRefMethod.get(packageName) + entry.getValue());
                            }
                        }
                    }
                    List<String> sortList = sortKeyByValue(pkgInternalRefMethod);
                    JsonArray packages = new JsonArray();
                    for (String pkgName : sortList) {
                        JsonObject pkgObj = new JsonObject();
                        pkgObj.addProperty("name", pkgName);
                        pkgObj.addProperty("methods", pkgInternalRefMethod.get(pkgName));
                        packages.add(pkgObj);
                    }
                    jsonObject.add("internal-packages", packages);
                }
                jsonObject.addProperty("total-internal-classes", classInternalMethod.size());
                jsonObject.addProperty("total-internal-methods", totalInternalMethods);

                if (JobConstants.GROUP_CLASS.equals(group)) {
                    List<String> sortList = sortKeyByValue(classExternalMethod);
                    JsonArray classes = new JsonArray();
                    for (String className : sortList) {
                        JsonObject classObj = new JsonObject();
                        classObj.addProperty("name", className);
                        classObj.addProperty("methods", classExternalMethod.get(className));
                        classes.add(classObj);
                    }
                    jsonObject.add("external-classes", classes);

                } else if (JobConstants.GROUP_PACKAGE.equals(group)) {
                    String packageName = "";
                    for (Map.Entry<String, Integer> entry : classExternalMethod.entrySet()) {
                        packageName = ApkUtil.getPackageName(entry.getKey());
                        if (!Util.isNullOrNil(packageName)) {
                            if (!pkgExternalMethod.containsKey(packageName)) {
                                pkgExternalMethod.put(packageName, entry.getValue());
                            } else {
                                pkgExternalMethod.put(packageName, pkgExternalMethod.get(packageName) + entry.getValue());
                            }
                        }
                    }
                    List<String> sortList = sortKeyByValue(pkgExternalMethod);
                    JsonArray packages = new JsonArray();
                    for (String pkgName : sortList) {
                        JsonObject pkgObj = new JsonObject();
                        pkgObj.addProperty("name", pkgName);
                        pkgObj.addProperty("methods", pkgExternalMethod.get(pkgName));
                        packages.add(pkgObj);
                    }
                    jsonObject.add("external-packages", packages);

                }
                jsonObject.addProperty("total-external-classes", classExternalMethod.size());
                jsonObject.addProperty("total-external-methods", totalExternalMethods);
                jsonArray.add(jsonObject);
            }
            ((TaskJsonResult) taskResult).add("dex-files", jsonArray);
            taskResult.setStartTime(startTime);
            taskResult.setEndTime(System.currentTimeMillis());
            return taskResult;
        } catch (Exception e) {
            throw new TaskExecuteException(e.getMessage(), e);
        }
    }
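The per-package aggregation appears twice in the listing above with the same containsKey/put pattern. As a sketch of the same idea in a more compact form, Map.merge can do the accumulation. PackageGroupSketch and groupByPackage are hypothetical names, and the package name is derived here with a simple lastIndexOf('.'), which is only an assumption about what ApkUtil.getPackageName does.

```java
import java.util.HashMap;
import java.util.Map;

final class PackageGroupSketch {
    /** Sums per-class method counts into per-package method counts. */
    static Map<String, Integer> groupByPackage(Map<String, Integer> classMethodCount) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> entry : classMethodCount.entrySet()) {
            String className = entry.getKey();
            int dot = className.lastIndexOf('.');
            if (dot <= 0) {
                continue; // no package part, skip
            }
            // add this class's count to the running total of its package
            result.merge(className.substring(0, dot), entry.getValue(), Integer::sum);
        }
        return result;
    }
}
```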

The key part of this code is how the dex file is statically analyzed:

private void countDex(RandomAccessFile dexFile) throws IOException {
        classInternalMethod.clear();
        classExternalMethod.clear();
        pkgInternalRefMethod.clear();
        pkgExternalMethod.clear();
        DexData dexData = new DexData(dexFile);
        // load the dex data
        dexData.load();
        MethodRef[] methodRefs = dexData.getMethodRefs();
        ClassRef[] externalClassRefs = dexData.getExternalReferences();
        // get the obfuscation class mapping rules
        Map<String, String> proguardClassMap = config.getProguardClassMap();
        String className = null;
        for (ClassRef classRef : externalClassRefs) {
            className = ApkUtil.getNormalClassName(classRef.getName());
            if (proguardClassMap.containsKey(className)) {
                // original class name before obfuscation
                className = proguardClassMap.get(className);
            }
            if (className.indexOf('.') == -1) {
                continue;
            }
            classExternalMethod.put(className, 0);
        }
        for (MethodRef methodRef : methodRefs) {
            className = ApkUtil.getNormalClassName(methodRef.getDeclClassName());
            if (proguardClassMap.containsKey(className)) {
                className = proguardClassMap.get(className);
            }
            if (!Util.isNullOrNil(className)) {
                if (className.indexOf('.') == -1) {
                    continue;
                }
                if (classExternalMethod.containsKey(className)) {
                    classExternalMethod.put(className, classExternalMethod.get(className) + 1);
                } else if (classInternalMethod.containsKey(className)) {
                    classInternalMethod.put(className, classInternalMethod.get(className) + 1);
                } else {
                    classInternalMethod.put(className, 1);
                }
            }
        }

        //remove 0-method referenced class
        Iterator<String> iterator = classExternalMethod.keySet().iterator();
        while (iterator.hasNext()) {
            if (classExternalMethod.get(iterator.next()) == 0) {
                iterator.remove();
            }
        }
    }

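Let's first divide a DEX file into rough layers:

header : DEX file header; records information about the file and the offsets of the other data structures
string_ids : offsets of the strings
type_ids : index table for type information
proto_ids : index table for method prototypes
field_ids : index table for field information
method_ids : index table for method information (owning class, prototype and method name)
class_def : index table for class information
data : data section
link_data : statically linked data section

Everything between header and data is an array of offsets or indices rather than actual payload. All real data lives in the data section and is located through those offsets.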

header

The exact layout of the DEX file header is defined in DexFile.h:

struct DexHeader {
    u1  magic[8];           // magic number
    u4  checksum;           // adler32 checksum
    u1  signature[kSHA1DigestLen]; // SHA-1 signature
    u4  fileSize;           // size of the DEX file
    u4  headerSize;         // size of the DEX header
    u4  endianTag;          // byte order tag
    u4  linkSize;           // size of the link section
    u4  linkOff;            // offset of the link section
    u4  mapOff;             // offset of the DexMapList
    u4  stringIdsSize;      // number of DexStringId items
    u4  stringIdsOff;       // offset of the DexStringId list
    u4  typeIdsSize;        // number of DexTypeId items
    u4  typeIdsOff;         // offset of the DexTypeId list
    u4  protoIdsSize;       // number of DexProtoId items
    u4  protoIdsOff;        // offset of the DexProtoId list
    u4  fieldIdsSize;       // number of DexFieldId items
    u4  fieldIdsOff;        // offset of the DexFieldId list
    u4  methodIdsSize;      // number of DexMethodId items
    u4  methodIdsOff;       // offset of the DexMethodId list
    u4  classDefsSize;      // number of DexClassDef items
    u4  classDefsOff;       // offset of the DexClassDef list
    u4  dataSize;           // size of the data section
    u4  dataOff;            // offset of the data section
};

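Here u means unsigned: u1 is an 8-bit unsigned integer and u4 is a 32-bit unsigned integer.

magic is normally a constant that identifies a DEX file. It breaks down as:

file tag dex + newline + DEX version + 0

As a string this is dex\n035\0, or 0x6465780A30333500 in hex. checksum is the adler32 checksum of everything in the file except magic and checksum, and is used to detect whether the DEX file has been tampered with. signature is the SHA-1 hash of everything in the file except magic, checksum and signature. endianTag marks whether the DEX file is big-endian or little-endian. Because DEX files run on Android they are normally little-endian, and the value is the constant 0x12345678. The remaining fields give the count and the file offset of each of the other data structures, which makes their contents easy to reach. Following the DEX layout above, the first structure to look at is string_ids.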
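The magic and checksum rules can be checked with a few lines of Java using java.util.zip.Adler32. This is only a sketch: DexHeaderSketch and its methods are names invented here, the version digits are not validated, and the whole file is assumed to be in memory as a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.Adler32;

final class DexHeaderSketch {
    /** True if the first 8 bytes look like "dex\n" + three version bytes + 0. */
    static boolean hasDexMagic(byte[] dex) {
        return dex.length >= 8
                && dex[0] == 'd' && dex[1] == 'e' && dex[2] == 'x' && dex[3] == '\n'
                && dex[7] == 0;
    }

    /** Adler-32 over everything after magic (8 bytes) and checksum (4 bytes). */
    static int computeChecksum(byte[] dex) {
        Adler32 adler = new Adler32();
        adler.update(dex, 12, dex.length - 12);
        return (int) adler.getValue();
    }

    /** Compares the checksum stored at offset 8 with the computed one. */
    static boolean checksumMatches(byte[] dex) {
        int stored = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).getInt(8);
        return stored == computeChecksum(dex);
    }
}
```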

string_ids

struct DexStringId {
    u4 stringDataOff;
};

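string_ids is an array of offsets; stringDataOff is the offset of each string in the data section. The data found at that offset starts with the string length, encoded as ULEB128 (a single byte for short strings), followed by the string bytes. The logic is simple, so straight to the code: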

private void parseDexString() {
    log("\nparse DexString");
    try {
        int stringIdsSize = dex.getDexHeader().string_ids__size;
        for (int i = 0; i < stringIdsSize; i++) {
            int string_data_off = reader.readInt();
            byte size = dexData[string_data_off]; // first byte is the string length (assumes a one-byte ULEB128, i.e. length < 128); the string content follows
            String string_data = new String(Utils.copy(dexData, string_data_off + 1, size));
            DexString string = new DexString(string_data_off, string_data);
            dexStrings.add(string);
            log("string[%d] data: %s", i, string.string_data);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The pool contains variable names, method names, file names and so on, and it comes up again and again when parsing the other structures.
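The listing above reads the length from a single byte, which only works for strings shorter than 128 characters. A slightly more careful sketch decodes the ULEB128 length and then reads up to the terminating 0 byte. StringDataSketch and readStringData are illustrative names, and decoding with UTF-8 is a simplification: DEX actually uses MUTF-8, so this is only exact for strings without embedded NUL or supplementary characters.

```java
import java.nio.charset.StandardCharsets;

final class StringDataSketch {
    /** Reads the string_data_item at off: ULEB128 length, then bytes up to a 0 terminator. */
    static String readStringData(byte[] dex, int off) {
        int pos = off;
        int utf16Size = 0; // declared length in UTF-16 code units, decoded but not verified here
        int shift = 0;
        int cur;
        do {
            cur = dex[pos++] & 0xFF;
            utf16Size |= (cur & 0x7F) << shift;
            shift += 7;
        } while ((cur & 0x80) != 0 && shift < 35);
        int end = pos;
        while (dex[end] != 0) {
            end++; // the encoded data never contains a raw 0 byte before the terminator
        }
        return new String(dex, pos, end - pos, StandardCharsets.UTF_8);
    }
}
```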

type_ids

struct DexTypeId {
    u4  descriptorIdx;
};

type_ids holds type information; descriptorIdx points to an element of string_ids. The type can be resolved directly by indexing into the string pool read in the previous step:

private void parseDexType() {
    log("\nparse DexTypeId");
    try {
        int typeIdsSize = dex.getDexHeader().type_ids__size;
        for (int i = 0; i < typeIdsSize; i++) {
            int descriptor_idx = reader.readInt();
            DexTypeId dexTypeId = new DexTypeId(descriptor_idx, dexStringIds.get(descriptor_idx).string_data);
            dexTypeIds.add(dexTypeId);
            log("type[%d] data: %s", i, dexTypeId.string_data);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

proto_ids

struct DexProtoId {
    u4  shortyIdx;          /* index into stringIds for shorty descriptor */
    u4  returnTypeIdx;      /* index into typeIds list for return type */
    u4  parametersOff;      /* file offset to type_list for parameter types */
};

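proto_ids holds method prototype information. It has the following three fields:

shortyIdx : points into string_ids; the short-form (shorty) descriptor string of the method prototype
returnTypeIdx : points into type_ids; the return type of the method
parametersOff : offset of the method parameter list

The parameter list is represented by DexTypeList in DexFile.h: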

struct DexTypeList {
    u4  size;               /* #of entries in list */
    DexTypeItem list[1];    /* entries */
};

struct DexTypeItem {
    u2  typeIdx;            /* index into typeIds */
};

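size is the number of method parameters. Each parameter is a DexTypeItem with a single field, typeIdx, which points to the corresponding item in type_ids. The parsing code looks like this: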

private void parseDexProto() {
    log("\nparse DexProto");
    try {
        int protoIdsSize = dex.getDexHeader().proto_ids__size;
        for (int i = 0; i < protoIdsSize; i++) {
            int shorty_idx = reader.readInt();
            int return_type_idx = reader.readInt();
            int parameters_off = reader.readInt();

            DexProtoId dexProtoId = new DexProtoId(shorty_idx, return_type_idx, parameters_off);
            log("proto[%d]: %s %s %d", i, dexStringIds.get(shorty_idx).string_data,
                    dexTypeIds.get(return_type_idx).string_data, parameters_off);

            if (parameters_off > 0) {
                parseDexProtoParameters(parameters_off);
            }

            dexProtos.add(dexProtoId);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
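The listing calls parseDexProtoParameters without showing it. A minimal sketch of reading a DexTypeList, following the struct definitions above (a u4 size followed by size u2 type indices), might look like this. TypeListSketch and readTypeList are names chosen for this example, not the author's helper.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TypeListSketch {
    /** Reads the DexTypeList at off and returns the type_ids indices of the parameters. */
    static int[] readTypeList(ByteBuffer dex, int off) {
        dex.order(ByteOrder.LITTLE_ENDIAN);
        int size = dex.getInt(off);             // u4 size
        int[] typeIdx = new int[size];
        for (int i = 0; i < size; i++) {
            // each DexTypeItem is a single u2
            typeIdx[i] = dex.getShort(off + 4 + 2 * i) & 0xFFFF;
        }
        return typeIdx;
    }
}
```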

field_ids

struct DexFieldId {
    u2  classIdx;           /* index into typeIds list for defining class */
    u2  typeIdx;            /* index into typeIds for field type */
    u4  nameIdx;            /* index into stringIds for field name */
};

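field_ids holds field information: the class a field belongs to, its type and its name. It is defined as DexFieldId in DexFile.h, and its fields mean the following:

classIdx : points into type_ids; the class that defines the field
typeIdx : points into type_ids; the type of the field
nameIdx : points into string_ids; the name of the field

The parsing code is simple and is not listed here; the result looks like this: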

parse DexField
field[0]: LHello;->HELLO_WORLD;Ljava/lang/String;
field[1]: Ljava/lang/System;->out;Ljava/io/PrintStream;

method_ids

struct DexMethodId {
    u2  classIdx;           /* index into typeIds list for defining class */
    u2  protoIdx;           /* index into protoIds for method prototype */
    u4  nameIdx;            /* index into stringIds for method name */
};

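method_ids gives the class a method belongs to, its prototype and its name. It is represented by DexMethodId in DexFile.h, and its fields mean the following:

classIdx : points into type_ids; the class that declares the method
protoIdx : points into proto_ids; the method prototype
nameIdx : points into string_ids; the method name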
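Since every referenced method has exactly one DexMethodId, counting methods per class comes down to walking this table and grouping by classIdx, which is in spirit what MethodCountTask does through the dexdeps library. The following is a stand-alone sketch under that assumption; MethodIdSketch and countByClassIdx are invented names and the code does not use dexdeps.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;
import java.util.TreeMap;

final class MethodIdSketch {
    /** Counts DexMethodId items per classIdx (an index into type_ids). */
    static Map<Integer, Integer> countByClassIdx(ByteBuffer dex, int methodIdsOff, int methodIdsSize) {
        dex.order(ByteOrder.LITTLE_ENDIAN);
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < methodIdsSize; i++) {
            // each DexMethodId is 8 bytes: u2 classIdx, u2 protoIdx, u4 nameIdx
            int classIdx = dex.getShort(methodIdsOff + i * 8) & 0xFFFF;
            counts.merge(classIdx, 1, Integer::sum);
        }
        return counts;
    }
}
```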

class_def

struct DexClassDef {
    u4  classIdx;           /* index into typeIds for this class */
    u4  accessFlags;
    u4  superclassIdx;      /* index into typeIds for superclass */
    u4  interfacesOff;      /* file offset to DexTypeList */
    u4  sourceFileIdx;      /* index into stringIds for source file name */
    u4  annotationsOff;     /* file offset to annotations_directory_item */
    u4  classDataOff;       /* file offset to class_data_item */
    u4  staticValuesOff;    /* file offset to DexEncodedArray */
};

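class_def is the most complex and most central part of the DEX file structure. It describes everything about a class and corresponds to DexClassDef in DexFile.h:

classIdx : points into type_ids; the class itself
accessFlags : access flags
superclassIdx : points into type_ids; the superclass
interfacesOff : offset of a DexTypeList; the implemented interfaces
sourceFileIdx : points into string_ids; the source file name
annotationsOff : annotation information
classDataOff : offset of a DexClassData; the data part of the class
staticValuesOff : offset of a DexEncodedArray; the static values of the class

DexClassData

The key field is classDataOff. It points at the core data of a class, defined in the Android source as DexClassData, which lives in DexClass.h rather than DexFile.h: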

struct DexClassData {
    DexClassDataHeader header;
    DexField*          staticFields;
    DexField*          instanceFields;
    DexMethod*         directMethods;
    DexMethod*         virtualMethods;
};

DexClassDataHeader gives the number of fields and methods in the class. It is also defined in DexClass.h:

struct DexClassDataHeader {
    u4 staticFieldsSize;
    u4 instanceFieldsSize;
    u4 directMethodsSize;
    u4 virtualMethodsSize;
};

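staticFieldsSize : number of static fields
instanceFieldsSize : number of instance fields
directMethodsSize : number of direct methods
virtualMethodsSize : number of virtual methods

Note when reading that these values are stored as LEB128, a variable-length encoding. Each value takes 1 to 5 bytes, and only 7 bits of each byte carry data. If the high bit of the first byte is 1, a second byte follows; if the high bit of the second byte is 1, a third byte follows; and so on until a byte whose high bit is 0, up to 5 bytes in total. Besides the signed LEB128 there is also the unsigned variant ULEB128, which is the one used for these counts.

Why use this encoding? An int in Java is always 4 bytes, 32 bits, but in many cases far fewer are needed, and a variable-length encoding saves space. For something that runs on Android, every bit of saved space helps. The Java code below reads a ULEB128: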

public static int readUnsignedLeb128(byte[] src, int offset) {
    int result = 0;
    int count = 0;
    int cur;
    do {
        cur = copy(src, offset, 1)[0];
        cur &= 0xff;
        result |= (cur & 0x7f) << count * 7;
        count++;
        offset++;
        DexParser.POSITION++;
    } while ((cur & 0x80) == 128 && count < 5);
    return result;
}
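The reader above depends on an external copy helper and a global position counter. A self-contained variant, including the signed form mentioned earlier, could look like the following sketch. Leb128Sketch is a name made up for this example; the decoding rules are the standard LEB128 ones, with sign extension from bit 6 of the last byte for the signed case.

```java
final class Leb128Sketch {
    /** Decodes an unsigned LEB128 value (at most 5 bytes) starting at offset. */
    static int readUleb128(byte[] src, int offset) {
        int result = 0;
        int shift = 0;
        int cur;
        do {
            cur = src[offset++] & 0xFF;
            result |= (cur & 0x7F) << shift;   // low 7 bits carry data
            shift += 7;
        } while ((cur & 0x80) != 0 && shift < 35);
        return result;
    }

    /** Decodes a signed LEB128 value (at most 5 bytes) starting at offset. */
    static int readSleb128(byte[] src, int offset) {
        int result = 0;
        int shift = 0;
        int cur;
        do {
            cur = src[offset++] & 0xFF;
            result |= (cur & 0x7F) << shift;
            shift += 7;
        } while ((cur & 0x80) != 0 && shift < 35);
        if (shift < 32 && (cur & 0x40) != 0) {
            result |= -1 << shift;             // sign-extend from the last data bit
        }
        return result;
    }
}
```

For example, the three bytes 0xE5 0x8E 0x26 decode to 624485 as ULEB128, and the single byte 0x7F decodes to -1 as signed LEB128.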

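Back to DexClassData. The header gives the counts of the fields and methods, and after it come the actual data for the static fields, instance fields, direct methods and virtual methods, in that order. Fields are represented by DexField and methods by DexMethod.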

DexField

struct DexField {
    u4 fieldIdx;    /* index to a field_id_item */
    u4 accessFlags;
};

fieldIdx : points into field_ids; identifies the field
accessFlags : access flags
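One detail worth knowing when decoding these items from the file: in the on-disk class_data_item, each field is stored as two ULEB128 values, and the index is stored as a difference from the previous field's index (the first one is the index itself). The DexField struct above holds the already decoded values. The sketch below illustrates the accumulation; ClassDataSketch and its methods are names invented for this example.

```java
final class ClassDataSketch {
    /** Reads one ULEB128 value at pos[0] and advances pos[0] past it. */
    static int readUleb128(byte[] src, int[] pos) {
        int result = 0;
        int shift = 0;
        int cur;
        do {
            cur = src[pos[0]++] & 0xFF;
            result |= (cur & 0x7F) << shift;
            shift += 7;
        } while ((cur & 0x80) != 0 && shift < 35);
        return result;
    }

    /** Decodes count encoded fields at offset and returns their absolute field_ids indices. */
    static int[] decodeFieldIndices(byte[] src, int offset, int count) {
        int[] pos = {offset};
        int[] indices = new int[count];
        int last = 0;
        for (int i = 0; i < count; i++) {
            last += readUleb128(src, pos);   // field_idx_diff, relative to the previous field
            readUleb128(src, pos);           // access_flags, skipped in this sketch
            indices[i] = last;
        }
        return indices;
    }
}
```

Methods in the same structure follow the same difference scheme, with one extra ULEB128 for the code offset.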

DexMethod

struct DexMethod {
    u4 methodIdx;    /* index to a method_id_item */
    u4 accessFlags;
    u4 codeOff;      /* file offset to a code_item */
};

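methodIdx is an index into method_ids and identifies the method. accessFlags is the method's access flags. codeOff is the offset of a DexCode struct. If you have read this far, you may have noticed that the most important thing has not come up yet: the code, or instructions, that the DEX file contains, which here means the main method of Hello.java. DexCode is exactly what stores the details of a method together with its instructions.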

struct DexCode {
    u2  registersSize;  // number of registers
    u2  insSize;        // number of words of incoming arguments
    u2  outsSize;       // number of words of outgoing argument space used when calling other methods
    u2  triesSize;      // number of try items
    u4  debugInfoOff;   // offset of the debug info
    u4  insnsSize;      // size of the instruction list, in 16-bit code units
    u2  insns[1];       // the instructions
    /* followed by optional u2 padding */  // 2 bytes, for alignment
    /* followed by try_item[triesSize] */
    /* followed by uleb128 handlersSize */
    /* followed by catch_handler_item[handlersSize] */
};
