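I recently watched the Bilibili uploader "硬核空间JAVA" fix a fastjson bug live and learned a great deal from it, so I followed the same line of reasoning and fixed the bug again myself.
The issue is "(已提供可复现的源码,求fix)fastjson无法反序列化超出某种限制的类 · Issue #2779 · alibaba/fastjson" (roughly: "reproducible source provided, fix wanted: fastjson cannot deserialize classes that exceed a certain limit"). Have a look at the issue first, and then let's walk back through the whole process.
Because the bug has already been fixed, after downloading the code we check out a revision from before the fix.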

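Next we reproduce the bug as described in the issue. The sample code uses the Lombok plugin; if you would rather not install it, replace the Lombok annotations with hand-written setter and getter methods. Running the code then produces exactly the error described (see the figure below).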
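Swapping Lombok out is mechanical. As a minimal sketch, assuming a made-up two-field class (SampleDTO and its fields are illustrative names, not the reproduction class from the issue), the hand-written accessors look like this:

```java
// Hypothetical DTO: the class and field names are illustrative only.
// With Lombok these accessors would be generated by @Data (or @Getter/@Setter).
final class SampleDTO {
    private String name;
    private int count;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    public static void main(String[] args) {
        SampleDTO dto = new SampleDTO();
        dto.setName("demo");
        dto.setCount(3);
        System.out.println(dto.getName() + " " + dto.getCount());
    }
}
```

The same pattern is repeated for every field of the real reproduction class.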

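Given this error, we first follow the stack trace in the console to line 83 of the ASMDeserializerFactory class and set a breakpoint on that line. Then we start a debug session, which looks like this: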

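Now we can look at the bytecode of the generated class. This uses a feature we rarely reach for, Evaluate: in the figure above, select code, right-click, and choose Evaluate Expression.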

Next we need to write code out as a class file. For convenience we use Google's Files class to do the writing.
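The dump step can be sketched as follows. The post uses Google's Files class (presumably Guava's com.google.common.io.Files, whose write(byte[], File) method does the same job); this sketch uses java.nio.file.Files from the standard library instead so that it has no dependencies. The class name ClassDump and the stand-in bytes are assumptions for illustration; in the debugger, the byte array would be the code variable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ClassDump {
    // Writes the raw bytes of a generated class to disk so external tools can open it.
    static Path dump(byte[] code, String fileName) throws IOException {
        Path target = Paths.get(fileName);
        return Files.write(target, code);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes for illustration; the real input is the generated class.
        byte[] code = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println("wrote " + dump(code, "bad.class").toAbsolutePath());
    }
}
```

Inside the Evaluate Expression dialog, a one-line call of this kind is enough, since the dialog can evaluate an expression that has side effects.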

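After executing it, we locate bad.class. IDEA has many plugins for reading bytecode, ASM-based ones among them, but here we introduce another handy tool, classpy, whose author also wrote the book 《自己动手写Java虚拟机》. Following classpy's instructions, we open the bad.class file (shown below).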
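Before opening the dump in a viewer, it can help to confirm that the file really is a class file. The following is a minimal sketch (the class name ClassHeader is an assumption for illustration) that reads the fixed header every class file starts with: the 4-byte magic number 0xCAFEBABE, then the minor and major version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class ClassHeader {
    // Returns the major version if the bytes start with a valid class-file header.
    static int majorVersion(byte[] classBytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("not a class file: bad magic number");
            }
            in.readUnsignedShort();        // minor version, ignored here
            return in.readUnsignedShort(); // major version, e.g. 52 for Java 8
        }
    }

    public static void main(String[] args) throws IOException {
        // Hand-built header for illustration: magic, minor 0, major 52.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(majorVersion(header)); // prints 52
    }
}
```

To check the real dump, pass it the result of java.nio.file.Files.readAllBytes for bad.class.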

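At this point we still have no leads.
So we go back to the original error. The message "Illegal target of jump or branch" is not very informative by itself, so we download the JDK source to see how the JDK produces it. It can be downloaded directly from the official site; I used OpenJDK 9. Why JDK 9, and would JDK 8 work? We'll come back to that. A global search of the source does turn up two occurrences of the message.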


Now we need to change a few things:

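With this change we can see exactly where the error occurs.
After changing the code we need to build the JDK. Building on Windows is a hassle and I can't afford a Mac, so I built on CentOS. Upload the whole openjdk directory to the CentOS machine, and run this command: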
yum install libfreetype6-dev libcups2-dev libx11-dev libxext-dev libxrender-dev libxrandr-dev libxtst-dev libxt-dev libasound2-dev libffi-dev autoconf gcc clang libfontconfig1-dev
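This prepares the build environment. Note that the -dev package names above follow the Debian/Ubuntu convention; on CentOS the corresponding packages usually end in -devel (for example freetype-devel, cups-devel, libX11-devel), so adjust the names if yum cannot find them. Once the upload has finished, enter the openjdk directory.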

We can see these files. Run sh configure to find out which software is still missing, install it, and repeat until we see the following:

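Now we can build. Building a JDK requires the previous JDK release as the boot JDK, and I already had JDK 8 installed, which is why I chose JDK 9 this time. The build command is make images; this is a good moment to take a break while the build runs. Halfway through, however, it failed with: collect2: error: ld terminated with signal 9 [Killed]
It turned out that the swap file was too small and the linker had been killed for lack of memory. After enlarging the swap file I resumed the build, and this time it completed successfully.
Next we use the freshly built JDK 9 to run AbcDTO.java. The commands on CentOS differ from those on Windows; the CentOS ones are shown below:
Compile command: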
编译命令
/root/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/javac -cp fastjson-1.2.47.jar AbcDTO.java
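Run command (the trailing colon leaves an empty classpath entry, which stands for the current directory):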
/root/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/java -cp fastjson-1.2.47.jar: AbcDTO
After running it, we see:

The original error message has now become "Illegal target of jump or branch 50 -32405". We go back to classpy, use the message to find the deserialze method, and look up position 50 in its bytecode.

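Here we can see that the jump target is out of range. We then locate the corresponding code: the call just before the ifeq is isEnabled, and the instruction just after it is aload.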
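Instructions such as ifeq and goto store their branch distance in a signed 2-byte operand, so only distances from -32768 to 32767 fit. As a minimal sketch of how a value like the one in the message can arise (the distance 33131 is a made-up number picked for illustration, and reading -32405 as the stored offset is an assumption), truncating an int to 16 bits turns a long forward jump into a negative one:

```java
final class BranchOffset {
    // Encodes a branch distance the way a 2-byte operand stores it: truncated to 16 bits.
    static short encode(int distance) {
        return (short) distance;
    }

    // True when the distance fits the signed 16-bit operand of ifeq, goto and similar.
    static boolean fits(int distance) {
        return distance >= Short.MIN_VALUE && distance <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        int distance = 33131; // hypothetical forward distance, for illustration only
        System.out.println(fits(distance));   // false
        System.out.println(encode(distance)); // -32405, since 33131 - 65536 = -32405
    }
}
```

A jump that lands at a negative position cannot be a valid instruction boundary, which is the kind of condition a bytecode verifier rejects.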

With that located, we look at the visitJumpInsn method.

We can tell that this method belongs to ASM (a Java bytecode manipulation framework), so we download the ASM source and take a look:
public void visitJumpInsn(final int opcode, final Label label) {
  lastBytecodeOffset = code.length;
  // Add the instruction to the bytecode of the method.
  // Compute the 'base' opcode, i.e. GOTO or JSR if opcode is GOTO_W or JSR_W, otherwise opcode.
  int baseOpcode =
      opcode >= Constants.GOTO_W ? opcode - Constants.WIDE_JUMP_OPCODE_DELTA : opcode;
  boolean nextInsnIsJumpTarget = false;
  if ((label.flags & Label.FLAG_RESOLVED) != 0
      && label.bytecodeOffset - code.length < Short.MIN_VALUE) {
    // Case of a backward jump with an offset < -32768. In this case we automatically replace GOTO
    // with GOTO_W, JSR with JSR_W and IFxxx <l> with IFNOTxxx <L> GOTO_W <l> L:..., where
    // IFNOTxxx is the "opposite" opcode of IFxxx (e.g. IFNE for IFEQ) and where <L> designates
    // the instruction just after the GOTO_W.
    if (baseOpcode == Opcodes.GOTO) {
      code.putByte(Constants.GOTO_W);
    } else if (baseOpcode == Opcodes.JSR) {
      code.putByte(Constants.JSR_W);
    } else {
      // Put the "opposite" opcode of baseOpcode. This can be done by flipping the least
      // significant bit for IFNULL and IFNONNULL, and similarly for IFEQ ... IF_ACMPEQ (with a
      // pre and post offset by 1). The jump offset is 8 bytes (3 for IFNOTxxx, 5 for GOTO_W).
      code.putByte(baseOpcode >= Opcodes.IFNULL ? baseOpcode ^ 1 : ((baseOpcode + 1) ^ 1) - 1);
      code.putShort(8);
      // Here we could put a GOTO_W in theory, but if ASM specific instructions are used in this
      // method or another one, and if the class has frames, we will need to insert a frame after
      // this GOTO_W during the additional ClassReader -> ClassWriter round trip to remove the ASM
      // specific instructions. To not miss this additional frame, we need to use an ASM_GOTO_W
      // here, which has the unfortunate effect of forcing this additional round trip (which in
      // some case would not have been really necessary, but we can't know this at this point).
      code.putByte(Constants.ASM_GOTO_W);
      hasAsmInstructions = true;
      // The instruction after the GOTO_W becomes the target of the IFNOT instruction.
      nextInsnIsJumpTarget = true;
    }
    label.put(code, code.length - 1, true);
  } else if (baseOpcode != opcode) {
    // Case of a GOTO_W or JSR_W specified by the user (normally ClassReader when used to remove
    // ASM specific instructions). In this case we keep the original instruction.
    code.putByte(opcode);
    label.put(code, code.length - 1, true);
  } else {
    // Case of a jump with an offset >= -32768, or of a jump with an unknown offset. In these
    // cases we store the offset in 2 bytes (which will be increased via a ClassReader ->
    // ClassWriter round trip if it turns out that 2 bytes are not sufficient).
    code.putByte(baseOpcode);
    label.put(code, code.length - 1, false);
  }
  // If needed, update the maximum stack size and number of locals, and stack map frames.
  if (currentBasicBlock != null) {
    Label nextBasicBlock = null;
    if (compute == COMPUTE_ALL_FRAMES) {
      currentBasicBlock.frame.execute(baseOpcode, 0, null, null);
      // Record the fact that 'label' is the target of a jump instruction.
      label.getCanonicalInstance().flags |= Label.FLAG_JUMP_TARGET;
      // Add 'label' as a successor of the current basic block.
      addSuccessorToCurrentBasicBlock(Edge.JUMP, label);
      if (baseOpcode != Opcodes.GOTO) {
        // The next instruction starts a new basic block (except for GOTO: by default the code
        // following a goto is unreachable - unless there is an explicit label for it - and we
        // should not compute stack frame types for its instructions).
        nextBasicBlock = new Label();
      }
    } else if (compute == COMPUTE_INSERTED_FRAMES) {
      currentBasicBlock.frame.execute(baseOpcode, 0, null, null);
    } else if (compute == COMPUTE_MAX_STACK_AND_LOCAL_FROM_FRAMES) {
      // No need to update maxRelativeStackSize (the stack size delta is always negative).
      relativeStackSize += STACK_SIZE_DELTA[baseOpcode];
    } else {
      if (baseOpcode == Opcodes.JSR) {
        // Record the fact that 'label' designates a subroutine, if not already done.
        if ((label.flags & Label.FLAG_SUBROUTINE_START) == 0) {
          label.flags |= Label.FLAG_SUBROUTINE_START;
          hasSubroutines = true;
        }
        currentBasicBlock.flags |= Label.FLAG_SUBROUTINE_CALLER;
        // Note that, by construction in this method, a block which calls a subroutine has at
        // least two successors in the control flow graph: the first one (added below) leads to
        // the instruction after the JSR, while the second one (added here) leads to the JSR
        // target. Note that the first successor is virtual (it does not correspond to a possible
        // execution path): it is only used to compute the successors of the basic blocks ending
        // with a ret, in {@link Label#addSubroutineRetSuccessors}.
        addSuccessorToCurrentBasicBlock(relativeStackSize + 1, label);
        // The instruction after the JSR starts a new basic block.
        nextBasicBlock = new Label();
      } else {
        // No need to update maxRelativeStackSize (the stack size delta is always negative).
        relativeStackSize += STACK_SIZE_DELTA[baseOpcode];
        addSuccessorToCurrentBasicBlock(relativeStackSize, label);
      }
    }
    // If the next instruction starts a new basic block, call visitLabel to add the label of this
    // instruction as a successor of the current block, and to start a new basic block.
    if (nextBasicBlock != null) {
      if (nextInsnIsJumpTarget) {
        nextBasicBlock.flags |= Label.FLAG_JUMP_TARGET;
      }
      visitLabel(nextBasicBlock);
    }
    if (baseOpcode == Opcodes.GOTO) {
      endCurrentBasicBlockWithNoSuccessor();
    }
  }
}
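The least obvious line in the method above is the expression that computes the "opposite" opcode when a conditional jump is rewritten as IFNOTxxx followed by a wide goto. The sketch below isolates that expression so it can be tried out by hand. The class name OppositeOpcode is an assumption for illustration; the opcode numbers are the standard JVM instruction values.

```java
final class OppositeOpcode {
    // Standard JVM opcode values for a few conditional branches.
    static final int IFEQ = 153;
    static final int IFNE = 154;
    static final int IFLT = 155;
    static final int IFGE = 156;
    static final int IFNULL = 198;
    static final int IFNONNULL = 199;

    // Same expression as in visitJumpInsn: flip the lowest bit, shifting by one
    // first for the IFEQ..IF_ACMPNE range because those pairs start on an odd value.
    static int opposite(int baseOpcode) {
        return baseOpcode >= IFNULL ? baseOpcode ^ 1 : ((baseOpcode + 1) ^ 1) - 1;
    }

    public static void main(String[] args) {
        System.out.println(opposite(IFEQ));   // 154 (IFNE)
        System.out.println(opposite(IFLT));   // 156 (IFGE)
        System.out.println(opposite(IFNULL)); // 199 (IFNONNULL)
    }
}
```

With the condition inverted, the short branch only has to skip over the wide goto, and the wide goto carries the 4-byte offset that the original 2-byte operand could not hold.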
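We can see that after several years of development the ASM source has grown considerably; fixing fastjson only requires adapting its code along the same lines. I tried replacing fastjson's ASM code wholesale with the latest ASM code, but that touched too much and my skills are not there yet, so I need to keep learning. Watching the problem get peeled apart layer by layer was eye-opening, which is why I wrote it down. ASM is also an excellent framework and well worth studying. Finally, fastjson is currently up for a vote; if you like fastjson you can vote at www.oschina.net/project/top….
Some of the topics above are still not explained in much detail. The material I referred to is listed below:
Bilibili video, which this whole post follows: www.bilibili.com/video/av773…
Bytecode: en.wikipedia.org/wiki/Java_b…
Increasing swap space on Linux: www.cnblogs.com/cc11001100/…
The value ranges of some Java data types.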