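Study notes: retracing the fix for a fastjson bug

I recently watched the Bilibili uploader "硬核空间JAVA" fix a fastjson bug live and learned a great deal from it, so I followed the same line of reasoning and worked through the fix myself.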

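(Reproducible source provided, please fix) fastjson cannot deserialize classes that exceed a certain limit · Issue #2779 · alibaba/fastjson

Have a look at the issue first; then let's retrace the whole process.

Because the bug has already been fixed, after downloading the code we check out a commit from before the fix.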

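Next we reproduce the bug as described in the issue. The sample code uses the Lombok plugin; if you would rather not install it, write the getter and setter methods by hand instead. Running the code then produces exactly the error described in the issue.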
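Replacing Lombok with hand-written accessors could look like the following minimal sketch. The class name SampleDTO and its two fields are hypothetical and are not taken from the issue; the real reproduction class has its own, much longer, field list.

```java
/** Hypothetical DTO; the field names are illustrative and not taken from the issue. */
class SampleDTO {
    private String name;
    private int count;

    // Hand-written equivalents of the accessors Lombok would otherwise generate.
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }
}
```

Every field needs both a getter and a setter written this way.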

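Following the error shown in the console, we go to line 83 of the ASMDeserializerFactory class, set a breakpoint on that line, and start a debug run.

Now we can look at the bytecode of this class. This uses a debugger feature we rarely reach for, Evaluate Expression: select code in the debugger, right-click it, and choose Evaluate Expression.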

Then we need to write code out as a class file. For convenience we use Google's Files utility to do the writing.
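The same dump can also be done with the JDK's own java.nio.file.Files instead of Google's Files. This is only a sketch: the helper class ClassDump and its dump method are hypothetical names, and the only thing carried over from the debugging session is the byte array code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for dumping generated class bytes to disk. */
final class ClassDump {
    /** Writes the raw class bytes to fileName and returns the resulting path. */
    static Path dump(byte[] code, String fileName) {
        try {
            // Files.write creates the file, or truncates it if it already exists.
            return Files.write(Paths.get(fileName), code);
        } catch (IOException e) {
            // Rethrow unchecked so the helper is easy to call from a debugger expression.
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside Evaluate Expression, a one-liner along the lines of java.nio.file.Files.write(java.nio.file.Paths.get("bad.class"), code) should have the same effect; the file is written relative to the working directory of the debugged process.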

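After executing the expression, we locate bad.class. IDEA has many plugins for reading bytecode, ASM among them; here we introduce another handy tool, classpy, whose author also wrote the book 《自己动手写Java虚拟机》 (Write Your Own Java Virtual Machine). Following classpy's instructions, we open bad.class.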
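Before opening the dump in a viewer, it can be worth checking that the bytes really are a class file. The sketch below reads only the fixed header defined by the class-file format (a 4-byte magic number followed by the minor and major version); the class name ClassHeader and the method describe are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper that decodes the first 8 bytes of a class file. */
final class ClassHeader {
    static String describe(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();            // 0xCAFEBABE for a valid class file
            int minor = in.readUnsignedShort();  // minor_version
            int major = in.readUnsignedShort();  // major_version, e.g. 52 for Java 8
            return String.format("magic=%08X minor=%d major=%d", magic, minor, major);
        } catch (IOException e) {
            // Thrown (as EOFException) if fewer than 8 bytes are available.
            throw new UncheckedIOException(e);
        }
    }
}
```

A magic value other than CAFEBABE would mean the dump itself went wrong, which is cheaper to find out here than in a bytecode viewer.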

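At this point we still have no leads.

Going back to the original error, the message "Illegal target of jump or branch" is still not very informative, so we download the JDK source to see how the JDK produces it. It can be downloaded directly from the official site; I used OpenJDK 9. Why JDK 9, and would JDK 8 do? We will leave that hanging for now. A global search does turn up two occurrences.

Now we need to change a few things

so that we can see exactly where the error occurs.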

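After changing the code we need to compile it. Building on Windows is a hassle and I cannot afford a Mac, so the only option was to build on CentOS. Upload the whole OpenJDK source tree to the CentOS machine, then run this command: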

yum install freetype-devel cups-devel libX11-devel libXext-devel libXrender-devel libXrandr-devel libXtst-devel libXt-devel alsa-lib-devel libffi-devel autoconf gcc clang fontconfig-devel

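That prepares the environment. Once the upload has finished, enter the openjdk directory.

There we can see the source files. Run sh configure to find out which packages are still missing, and repeat until configure completes successfully.

Then the build can start. Building a JDK requires the previous JDK version, and JDK 8 was already installed, which is why I chose JDK 9 this time. The build command is make images, and this is a good moment to take a break while it runs. Halfway through, however, the build failed: collect2: error: ld terminated with signal 9 [Killed]

The linker had been killed because the swap file was too small. After enlarging the swap file and resuming the build, it completed without further problems.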

Next we run AbcDTO.java with the JDK 9 we just built. The commands on CentOS differ from those on Windows; the CentOS ones are shown below:

Compile command

 /root/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/javac -cp fastjson-1.2.47.jar  AbcDTO.java

Run command

 /root/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/java -cp fastjson-1.2.47.jar:. AbcDTO

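After running it, we can see that

the original error message has become "Illegal target of jump or branch 50 -32405". We now open classpy again, locate the deserialze method named in the error, and look for the instruction at offset 50.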

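Here we can see that the jump offset has overflowed. Looking at this part of the code, the instruction before the ifeq is a call to isEnabled, and the one after it is an aload.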
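The two numbers in the message are consistent with a 16-bit overflow. The sketch below assumes that 50 is the bytecode offset of the ifeq and -32405 is the computed target, and that the intended forward offset was 33081; that last value is not stated anywhere in the output, it is simply the offset that reproduces these numbers. The class name JumpOffsetOverflow is hypothetical.

```java
/** Hypothetical demo of how a branch offset wraps when stored in 2 bytes. */
final class JumpOffsetOverflow {
    /** Truncates an offset to a signed 16-bit value, as a 2-byte branch operand does. */
    static int asStoredOffset(int realOffset) {
        return (short) realOffset;
    }

    public static void main(String[] args) {
        int bci = 50;            // offset of the ifeq instruction (from the error message)
        int realOffset = 33081;  // assumed intended forward jump, larger than Short.MAX_VALUE
        int stored = asStoredOffset(realOffset);
        System.out.println(stored);        // prints -32455
        System.out.println(bci + stored);  // prints -32405, the target in the error message
    }
}
```

Any conditional branch whose target is more than 32767 bytes away wraps in the same way, because ifeq and its siblings carry only a signed 2-byte offset.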

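With that located, we look at the visitJumpInsn method.

It turns out to be a method from ASM (a Java bytecode manipulation framework), so we download the ASM source and take a look: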

public void visitJumpInsn(final int opcode, final Label label) {
  lastBytecodeOffset = code.length;
  // Add the instruction to the bytecode of the method.
  // Compute the 'base' opcode, i.e. GOTO or JSR if opcode is GOTO_W or JSR_W, otherwise opcode.
  int baseOpcode =
      opcode >= Constants.GOTO_W ? opcode - Constants.WIDE_JUMP_OPCODE_DELTA : opcode;
  boolean nextInsnIsJumpTarget = false;
  if ((label.flags & Label.FLAG_RESOLVED) != 0
      && label.bytecodeOffset - code.length < Short.MIN_VALUE) {
    // Case of a backward jump with an offset < -32768. In this case we automatically replace GOTO
    // with GOTO_W, JSR with JSR_W and IFxxx <l> with IFNOTxxx <L> GOTO_W <l> L:..., where
    // IFNOTxxx is the "opposite" opcode of IFxxx (e.g. IFNE for IFEQ) and where <L> designates
    // the instruction just after the GOTO_W.
    if (baseOpcode == Opcodes.GOTO) {
      code.putByte(Constants.GOTO_W);
    } else if (baseOpcode == Opcodes.JSR) {
      code.putByte(Constants.JSR_W);
    } else {
      // Put the "opposite" opcode of baseOpcode. This can be done by flipping the least
      // significant bit for IFNULL and IFNONNULL, and similarly for IFEQ ... IF_ACMPEQ (with a
      // pre and post offset by 1). The jump offset is 8 bytes (3 for IFNOTxxx, 5 for GOTO_W).
      code.putByte(baseOpcode >= Opcodes.IFNULL ? baseOpcode ^ 1 : ((baseOpcode + 1) ^ 1) - 1);
      code.putShort(8);
      // Here we could put a GOTO_W in theory, but if ASM specific instructions are used in this
      // method or another one, and if the class has frames, we will need to insert a frame after
      // this GOTO_W during the additional ClassReader -> ClassWriter round trip to remove the ASM
      // specific instructions. To not miss this additional frame, we need to use an ASM_GOTO_W
      // here, which has the unfortunate effect of forcing this additional round trip (which in
      // some case would not have been really necessary, but we can't know this at this point).
      code.putByte(Constants.ASM_GOTO_W);
      hasAsmInstructions = true;
      // The instruction after the GOTO_W becomes the target of the IFNOT instruction.
      nextInsnIsJumpTarget = true;
    }
    label.put(code, code.length - 1, true);
  } else if (baseOpcode != opcode) {
    // Case of a GOTO_W or JSR_W specified by the user (normally ClassReader when used to remove
    // ASM specific instructions). In this case we keep the original instruction.
    code.putByte(opcode);
    label.put(code, code.length - 1, true);
  } else {
    // Case of a jump with an offset >= -32768, or of a jump with an unknown offset. In these
    // cases we store the offset in 2 bytes (which will be increased via a ClassReader ->
    // ClassWriter round trip if it turns out that 2 bytes are not sufficient).
    code.putByte(baseOpcode);
    label.put(code, code.length - 1, false);
  }

  // If needed, update the maximum stack size and number of locals, and stack map frames.
  if (currentBasicBlock != null) {
    Label nextBasicBlock = null;
    if (compute == COMPUTE_ALL_FRAMES) {
      currentBasicBlock.frame.execute(baseOpcode, 0, null, null);
      // Record the fact that 'label' is the target of a jump instruction.
      label.getCanonicalInstance().flags |= Label.FLAG_JUMP_TARGET;
      // Add 'label' as a successor of the current basic block.
      addSuccessorToCurrentBasicBlock(Edge.JUMP, label);
      if (baseOpcode != Opcodes.GOTO) {
        // The next instruction starts a new basic block (except for GOTO: by default the code
        // following a goto is unreachable - unless there is an explicit label for it - and we
        // should not compute stack frame types for its instructions).
        nextBasicBlock = new Label();
      }
    } else if (compute == COMPUTE_INSERTED_FRAMES) {
      currentBasicBlock.frame.execute(baseOpcode, 0, null, null);
    } else if (compute == COMPUTE_MAX_STACK_AND_LOCAL_FROM_FRAMES) {
      // No need to update maxRelativeStackSize (the stack size delta is always negative).
      relativeStackSize += STACK_SIZE_DELTA[baseOpcode];
    } else {
      if (baseOpcode == Opcodes.JSR) {
        // Record the fact that 'label' designates a subroutine, if not already done.
        if ((label.flags & Label.FLAG_SUBROUTINE_START) == 0) {
          label.flags |= Label.FLAG_SUBROUTINE_START;
          hasSubroutines = true;
        }
        currentBasicBlock.flags |= Label.FLAG_SUBROUTINE_CALLER;
        // Note that, by construction in this method, a block which calls a subroutine has at
        // least two successors in the control flow graph: the first one (added below) leads to
        // the instruction after the JSR, while the second one (added here) leads to the JSR
        // target. Note that the first successor is virtual (it does not correspond to a possible
        // execution path): it is only used to compute the successors of the basic blocks ending
        // with a ret, in {@link Label#addSubroutineRetSuccessors}.
        addSuccessorToCurrentBasicBlock(relativeStackSize + 1, label);
        // The instruction after the JSR starts a new basic block.
        nextBasicBlock = new Label();
      } else {
        // No need to update maxRelativeStackSize (the stack size delta is always negative).
        relativeStackSize += STACK_SIZE_DELTA[baseOpcode];
        addSuccessorToCurrentBasicBlock(relativeStackSize, label);
      }
    }
    // If the next instruction starts a new basic block, call visitLabel to add the label of this
    // instruction as a successor of the current block, and to start a new basic block.
    if (nextBasicBlock != null) {
      if (nextInsnIsJumpTarget) {
        nextBasicBlock.flags |= Label.FLAG_JUMP_TARGET;
      }
      visitLabel(nextBasicBlock);
    }
    if (baseOpcode == Opcodes.GOTO) {
      endCurrentBasicBlockWithNoSuccessor();
    }
  }
}
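The rewrite described in the comments of visitJumpInsn (replace IFxxx with the opposite condition that skips over a GOTO_W) can be sketched in isolation as follows. This is a simplified illustration, not fastjson's actual patch and not ASM's implementation: the names FarJumpSketch and encodeIfeq are hypothetical, only ifeq is handled, and the offset is assumed to be known up front, whereas ASM takes this path inline only for already-resolved backward jumps and otherwise relies on a ClassReader -> ClassWriter round trip.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: encode "ifeq" with either a 2-byte or a 4-byte reach. */
final class FarJumpSketch {
    static final int IFEQ = 153;   // JVM opcode 0x99
    static final int IFNE = 154;   // JVM opcode 0x9a, the opposite of IFEQ
    static final int GOTO_W = 200; // JVM opcode 0xc8, takes a 4-byte offset

    /** offset is relative to the address of the original ifeq instruction. */
    static byte[] encodeIfeq(int offset) {
        ByteArrayOutputStream code = new ByteArrayOutputStream();
        if (offset >= Short.MIN_VALUE && offset <= Short.MAX_VALUE) {
            // Near jump: the plain 3-byte form is enough.
            code.write(IFEQ);
            putShort(code, offset);
        } else {
            // Far jump: "ifne +8" skips the 5-byte goto_w when the value is non-zero.
            code.write(IFNE);
            putShort(code, 8); // 3 bytes of ifne + 5 bytes of goto_w
            code.write(GOTO_W);
            // goto_w sits 3 bytes after the original position, so adjust the offset.
            putInt(code, offset - 3);
        }
        return code.toByteArray();
    }

    static void putShort(ByteArrayOutputStream out, int v) {
        out.write(v >>> 8); // write(int) keeps only the low 8 bits
        out.write(v);
    }

    static void putInt(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }
}
```

With an offset of 33081 the far form is 8 bytes long instead of 3, which also shifts every later instruction; that knock-on effect is part of why the real implementation is considerably more involved than this sketch.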

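As we can see, after several years of development the current ASM source has grown considerably; fixing the bug only requires changing the fastjson source along the same lines. I tried replacing all of fastjson's ASM code with the latest ASM code, but that touches too much and is beyond my current ability, so I still have a lot to learn. Peeling the problem apart layer by layer was eye-opening for me, which is why I wrote it down. ASM is also an excellent framework that is well worth studying. Finally, fastjson is currently up for a vote; if you like fastjson, you can vote at www.oschina.net/project/top…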

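Some points are still not covered in much detail. The material I referred to is listed below for anyone interested:

Bilibili video, which this whole write-up follows: www.bilibili.com/video/av773…

Bytecode: en.wikipedia.org/wiki/Java_b…
Enlarging swap space on Linux: www.cnblogs.com/cc11001100/…

Value ranges of some Java data types.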