APK Reverse Engineering: AXML Parsing Failure 2: ResChunkHeader Size Error

1 Problem description

I ran into an AXML format obfuscation that I had not seen before. androguard reports the following errors:

[ERROR   ] androguard.axml: Error parsing resource header: declared header size is smaller than required size of 8! Offset=83208
[ERROR   ] androguard.apk: Error while parsing AndroidManifest.xml - is the file valid?

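In other words, the header size declared in the file is smaller than the minimum header size.

If you are not familiar with the AXML format, read an existing walkthrough first: blog.csdn.net/beyond702/a…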

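2 Analysis

Borrowing a file format diagram from another author:

Every chunk starts with a ResChunk_header. This header is at least 8 bytes long (as defined in the Android source) and contains the chunk type, the header size, and the total chunk size. A chunk's header may carry additional fields after these 8 bytes, which is why 8 is only the minimum. The structure is: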

struct ResChunk_header
{
    uint16_t type;        // chunk type
    uint16_t headerSize;  // size of the chunk header in bytes
    uint32_t size;        // total size of the chunk, header included
};

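Below is the parsing code from the androguard package. The unpack line reads the header and, as you can see, follows the standard layout. If the parsed header size is smaller than 8, it raises the error shown at the top.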

# This is the minimal size such a header must have. There might be other header data too!
SIZE = 2 + 2 + 4

def __init__(self, buff, expected_type=None):
    """
    :param androguard.core.bytecode.BuffHandle buff: the buffer set to the position where the header starts.
    :param int expected_type: the type of the header which is expected.
    """
    self.start = buff.get_idx()
    # Make sure we do not read over the buffer:
    if buff.size() < self.start + self.SIZE:
        raise ResParserError("Can not read over the buffer size! Offset={}".format(self.start))

    self._type, self._header_size, self._size = unpack('<HHL', buff.read(self.SIZE))

    if expected_type and self._type != expected_type:
        raise ResParserError("Header type is not equal the expected type: Got 0x{:04x}, wanted 0x{:04x}".format(self._type, expected_type))

    # Assert that the read data will fit into the chunk.
    # The total size must be equal or larger than the header size
    if self._header_size < self.SIZE:
        raise ResParserError(
            "declared header size is smaller than required size of {}! Offset={}".format(self.SIZE, self.start))

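Next, open the AXML file in 010 Editor and go to the offset from the error message, 83208, which is 0x14508 in hexadecimal.

The data at this position is not a ResChunk_header at all, so the parser must have computed a chunk position incorrectly. The next step is to check each chunk in turn from the start of the file (file header, string chunk, resource ID chunk, namespace chunk, ...).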
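
To see why bytes that are not a real header trip this check, the test can be reproduced outside androguard. The following is a minimal standalone sketch, not androguard's API: the function name `parse_chunk_header` and the "bad" sample bytes are made up for illustration, while the "good" values (type 0x0001, header size 28, chunk size 73984) correspond to the string chunk discussed in this article.

```python
import struct

HEADER_SIZE = 2 + 2 + 4  # type (uint16) + headerSize (uint16) + size (uint32)


def parse_chunk_header(data: bytes, offset: int = 0):
    """Return (type, header_size, size) read little-endian at `offset`."""
    if len(data) < offset + HEADER_SIZE:
        raise ValueError("cannot read past the buffer, offset={}".format(offset))
    chunk_type, header_size, size = struct.unpack_from("<HHL", data, offset)
    if header_size < HEADER_SIZE:
        raise ValueError(
            "declared header size {} is smaller than {}, offset={}".format(
                header_size, HEADER_SIZE, offset))
    return chunk_type, header_size, size


# A plausible start of a string pool chunk header.
good = struct.pack("<HHL", 0x0001, 28, 73984)
print(parse_chunk_header(good))  # (1, 28, 73984)

# Hypothetical non-header bytes whose second uint16 happens to be below 8.
bad = struct.pack("<HHL", 0x0061, 0x0003, 0x12345678)
try:
    parse_chunk_header(bad)
except ValueError as exc:
    print("rejected:", exc)
```

When the parser lands in the middle of unrelated data, whatever 16-bit value sits in the headerSize position is effectively arbitrary, so a value below 8 is a common symptom of a misplaced read pointer rather than of a genuinely malformed header.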

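A close look at the string chunk reveals the problem (see the figure). The start of the string pool, scStringPollOffset, is 1420, which is 0x58c in hexadecimal. This offset is relative to the start of the string chunk, and the string chunk is preceded by the 8-byte AXML file header, so the string pool starts at file offset 0x594. That matches the address of the first STRING_ITEM parsed by the 010 template. However, the string offset array scStringOffsets, which sits before the string pool, is 0x2970 bytes long, so it runs over part of the strings in STRING_ITEM. Clearly the length of scStringOffsets has been tampered with.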
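
The mismatch can also be checked arithmetically. Below is a minimal sketch that assumes the 28-byte string chunk header used in this article and assumes that the offset arrays sit directly between that header and the string data. The function name is mine; the numbers are the ones observed in this sample (0x2970 bytes of offsets is 2652 four-byte entries).

```python
STRING_POOL_HEADER_SIZE = 28  # string chunk header size, as used in this article
OFFSET_ENTRY_SIZE = 4         # each string/style offset is a uint32


def max_offset_entries(strings_offset: int) -> int:
    """Number of 4-byte offset entries that fit between the chunk header
    and the string data, which starts at strings_offset (chunk-relative)."""
    return (strings_offset - STRING_POOL_HEADER_SIZE) // OFFSET_ENTRY_SIZE


declared_string_count = 0x2970 // OFFSET_ENTRY_SIZE  # 2652, from the tampered header
style_count = 0                                      # no styles in this sample
strings_offset = 1420                                # 0x58c, chunk-relative

room = max_offset_entries(strings_offset)
print("entries that fit:", room)                                   # 348
print("entries declared:", declared_string_count + style_count)    # 2652
print("tampered:", declared_string_count + style_count > room)     # True
```

Under these assumptions only 348 offset entries fit in front of the string data, so a declared count of 2652 cannot be genuine.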

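Now look at another piece of androguard code, the part that parses from scStringOffsets onward. It first reads the string offsets using the wrong count (2652) and stores them in a list. The style count is 0 here, so that loop is skipped. It then computes the size of the string pool (this is derived from scStringPollOffset, so the value is correct) and finally reads the string pool data into a buffer.

Once the string pool buffer has been read, the file pointer is at 8 + 28 + 2652 * 4 + (73984 - 1420) = 83208, which matches the offset in the error message. (8 and 28 are the sizes of the AXML file header and the string chunk header, respectively.)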

        # Next, there is a list of string following.
        # This is only a list of offsets (4 byte each)
        for i in range(self.stringCount):
            self.m_stringOffsets.append(unpack('<I', buff.read(4))[0])

        # And a list of styles
        # again, a list of offsets
        for i in range(self.styleCount):
            self.m_styleOffsets.append(unpack('<I', buff.read(4))[0])

        # FIXME it is probably better to parse n strings and not calculate the size
        size = self.header.size - self.stringsOffset

        # if there are styles as well, we do not want to read them too.
        # Only read them, if no
        if self.stylesOffset != 0 and self.styleCount != 0:
            size = self.stylesOffset - self.stringsOffset

        if (size % 4) != 0:
            log.warning("Size of strings is not aligned by four bytes.")

        self.m_charbuff = buff.read(size)
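
The pointer arithmetic can be replayed step by step against the code above. This sketch only redoes the calculation with the values from this sample; the variable names are illustrative and not taken from androguard.

```python
AXML_HEADER_SIZE = 8          # file-level ResChunk_header
STRING_POOL_HEADER_SIZE = 28  # string chunk header
string_count = 2652           # tampered count used by the offsets loop
style_count = 0
chunk_size = 73984            # size field of the string chunk
strings_offset = 1420         # chunk-relative start of the string data

pos = AXML_HEADER_SIZE + STRING_POOL_HEADER_SIZE  # after both headers
pos += 4 * string_count                           # string offsets loop
pos += 4 * style_count                            # style offsets loop (no-op here)
pos += chunk_size - strings_offset                # buff.read(size)

# The string chunk begins right after the 8-byte file header,
# so the next chunk should start at 8 + chunk_size.
expected_next_chunk = AXML_HEADER_SIZE + chunk_size

print(pos, hex(pos))              # 83208 0x14508
print(expected_next_chunk)        # 73992
print(pos - expected_next_chunk)  # 9216
```

The 9216-byte overshoot equals 4 * (2652 - 348), i.e. exactly the surplus offset entries that the inflated count made the parser consume.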

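3 Summary

androguard does not parse the strings strictly according to the format: instead of starting to read strings at scStringPollOffset, it continues from wherever the scStringOffsets array ended, and that caused the failure. After changing the androguard source so that the string pool buffer is read from the correct position, the manifest parses correctly. JEB also fails to parse the manifest of this APK, but since JEB is not open source, there is no way to fix that yourself.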
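
The idea behind the fix can be illustrated on a synthetic chunk. This is a rough sketch and not the actual patch applied to androguard: it uses `io.BytesIO` instead of androguard's buffer class, the function name `read_string_data` is hypothetical, and clamping the string count to what fits is an extra precaution of mine. The key line is the seek to chunk start plus the strings offset before the string data is read.

```python
import io
import struct

# type, headerSize, size, stringCount, styleCount, flags, stringsOffset, stylesOffset
POOL_HEADER = struct.Struct("<HHLIIIII")  # 28 bytes


def read_string_data(buff, chunk_start):
    """Read a string pool chunk and return (offsets, raw string bytes).

    The string data is read from chunk_start + stringsOffset, so an
    inflated stringCount cannot shift the read position.
    """
    buff.seek(chunk_start)
    (_type, _hsize, size, string_count, style_count,
     _flags, strings_offset, styles_offset) = POOL_HEADER.unpack(
        buff.read(POOL_HEADER.size))

    # Trust stringsOffset over stringCount: keep only entries that fit.
    room = (strings_offset - POOL_HEADER.size) // 4
    count = max(0, min(string_count, room - style_count))
    offsets = [struct.unpack("<I", buff.read(4))[0] for _ in range(count)]

    end = styles_offset if (styles_offset and style_count) else size
    buff.seek(chunk_start + strings_offset)  # jump to the declared string data
    data = buff.read(end - strings_offset)
    buff.seek(chunk_start + size)            # the next chunk starts here
    return offsets, data


# Synthetic chunk: two real offset entries, but stringCount inflated to 5.
strings = b"ab\x00\x00cd\x00\x00"  # placeholder string bytes
real_offsets = struct.pack("<II", 0, 4)
strings_offset = POOL_HEADER.size + len(real_offsets)
size = strings_offset + len(strings)
chunk = (POOL_HEADER.pack(0x0001, 28, size, 5, 0, 0, strings_offset, 0)
         + real_offsets + strings)
blob = b"\x00" * 8 + chunk + b"NEXT"  # 8 filler bytes stand in for the file header

buff = io.BytesIO(blob)
offsets, data = read_string_data(buff, chunk_start=8)
print(offsets, data, buff.read(4))  # [0, 4] b'ab\x00\x00cd\x00\x00' b'NEXT'
```

A purely sequential reader would consume 5 offset entries here, read the string data from the wrong position, and leave the pointer past the start of the next chunk, which is the same failure pattern as in the real sample.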