问题背景
JSON 是一种封闭的数据结构,通常以左花括号“{”开头,右花括号“}”结尾。封闭的数据结构,意味着一般情况下,前端对 JSON 的解析必须等 待 JSON 数据全部传输完成,否则会因为 JSON 数据不完整而导致解析报错。
这就产生了一个矛盾:
- 流式传输的优势是边接收边处理,可以快速响应用户
- 但 JSON 解析要求必须完整才能处理
- 结果就是:即使我们用流式方式获取数据,也得等所有数据到齐才能解析,流式传输的"快速响应"优势就失效了
解决方案:流式 JSON 解析器
为了解决这个问题,我们在 BFF(Backend For Frontend)层实现了一个流式 JSON 解析器。
主要功能
- 流式解析:逐字符解析 JSON,就像逐字阅读一样,不需要等待完整输入
- 增量输出:解析过程中实时发送结果,通过事件(EventEmitter)监听解析数据,实时发送给前端
- 例如:解析到
{"user/name": "张三"}时,立即输出{"user/name": "张三"} - 前端可以立即更新 UI,显示"张三"这个名字
- 例如:解析到
- 自动修复:可选功能,能处理常见的格式错误(如缺失引号、转义字符等)
- JSONURI 路径:为每个解析出的值提供路径标识
- 例如:
"user/name"表示这是user对象下的name属性 - 这样前端就知道这个值应该放在哪里
- 例如:
工作原理:像阅读一样逐字解析
核心原理:状态机模式
流式 JSON 解析器就像一个智能阅读器,它知道自己在读什么,也知道接下来应该读什么。
1. 状态机:定义"我在读什么"
解析器定义了 11 种状态,就像阅读时的不同阶段:
| 状态 | 说明 | 例子 |
|---|---|---|
| Begin | 开始,还没读到任何内容 | 准备开始 |
| Object | 正在读一个对象 | {...} |
| Array | 正在读一个数组 | [...] |
| Key | 正在读键名 | "name" |
| Value | 正在读值 | "张三" 或 123 |
| String | 正在读字符串值 | "这是一个字符串" |
| Number | 正在读数字值 | 123 或 3.14 |
| Boolean | 正在读布尔值 | true 或 false |
| Null | 正在读 null | null |
| Finish | 完成,整个 JSON 解析完毕 | 全部读完 |
| Breaker | 分隔符(逗号、冒号等) | , 或 : |
2. 状态栈:管理嵌套结构
想象你在读一本有章节、段落、句子的书:
- 当你进入一个新章节时,记住"我在第几章"
- 当你进入一个段落时,记住"我在第几章的哪一段"
- 当你读完一个段落,回到章节层级
- 当你读完一个章节,回到书的层级
状态栈就是这样的"记忆系统":
// 示例:解析 {"user": {"name": "张三", "age": 18}}
// 步骤1: 读到 {,进入 Object 状态
stateStack = ['Object'];
// 步骤2: 读到 "user",进入 Key 状态
stateStack = ['Object', 'Key'];
// 步骤3: 读到 :,进入 Value 状态,发现又是一个 {
stateStack = ['Object', 'Value', 'Object'];
// 步骤4: 读到 "name",进入 Key 状态
stateStack = ['Object', 'Value', 'Object', 'Key'];
// 步骤5: 读到 "张三",进入 String 状态,解析完成,输出 {"user/name": "张三"}
// 然后出栈,回到 Object 状态
stateStack = ['Object', 'Value', 'Object'];
// 步骤6: 继续解析 "age": 18...
关键点:
- 遇到
{或[时:入栈(进入新的层级) - 遇到
}或]时:出栈(回到上一层) - 支持多层嵌套,就像俄罗斯套娃一样
工作流程示例
让我们用一个具体例子看看解析器是如何工作的:
输入数据流(逐步到达):
1. {"na
2. me": "张
3. 三", "age": 1
4. 8}
解析过程:
时刻1: 收到 {"na
├─ 状态: Begin → Object
├─ 状态栈: ['Object']
└─ 输出: 无(数据不完整)
时刻2: 收到 me": "张
├─ 识别到完整的键: "name"
├─ 状态: Key → Value → String
├─ 状态栈: ['Object', 'Key', 'Value', 'String']
└─ 输出: 无(值还没完整)
时刻3: 收到 三", "age": 1
├─ 识别到完整的值: "张三"
├─ 状态: String → Breaker → Key
├─ 状态栈: ['Object', 'Key']
└─ 输出: {"name": "张三"} ✅ 立即发送给前端!
时刻4: 收到 8}
├─ 识别到完整的值: 18
├─ 状态: Number → Finish
├─ 状态栈: []
└─ 输出: {"age": 18} ✅ 立即发送给前端!
前端效果:
- 时刻 3:用户立即看到"张三"显示在界面上
- 时刻 4:用户立即看到"18"显示在界面上
- 不需要等待整个 JSON 完成!
优势总结
- 快速响应:数据边到边解析边显示,用户体验更好
- 内存友好:不需要缓存整个 JSON,节省内存
- 实时反馈:适合 AI 对话、实时数据展示等场景
- 灵活扩展:支持自动修复、自定义路径格式等功能
应用场景
- AI 对话:AI 生成回复时,可以逐字显示,不用等全部生成完
- 实时数据监控:数据变化时立即更新,不用等完整数据包
- 大文件解析:处理大型 JSON 文件时,不需要一次性加载到内存
完整代码
import EventEmitter from 'node:events';
const enum LexerStates {
Begin = 'Begin',
Object = 'Object',
Array = 'Array',
Key = 'Key',
Value = 'Value',
String = 'String',
Number = 'Number',
Boolean = 'Boolean',
Null = 'Null',
Finish = 'Finish',
Breaker = 'Breaker',
}
function isNumeric(str: unknown) {
return (
!isNaN(str as number) && // use type coercion to parse the _entirety_ of the string (`parseFloat` alone does not do this)...
!isNaN(parseFloat(str as string))
); // ...and ensure strings of whitespace fail
}
function isTrue(str: string) {
return str === 'true';
}
//判断空白符
function isWhiteSpace(str: string) {
return /^\s+$/.test(str);
}
function isQuotationMark(str: string) {
return str === '"' || str === '“' || str === '”' || str === '‘' || str === '’' || str === "'";
}
export class JSONParser extends EventEmitter {
private content: string[] = [];
private stateStack: LexerStates[] = [LexerStates.Begin];
private currentToken = '';
private keyPath: string[] = [];
private arrayIndexStack: any[] = [];
private objectTokenIndexStack: number[] = [];
private autoFix = false;
private debug = false;
private lastPopStateToken: { state: LexerStates; token: string } | null = null;
constructor(
options: { autoFix?: boolean; parentPath?: string | null; debug?: boolean } = {
autoFix: false,
parentPath: null,
debug: false,
}
) {
super();
this.autoFix = !!options.autoFix;
this.debug = !!options.debug;
if (options.parentPath) this.keyPath.push(options.parentPath);
}
get currentState() {
return this.stateStack[this.stateStack.length - 1];
}
get lastState() {
return this.stateStack[this.stateStack.length - 2];
}
get arrayIndex() {
return this.arrayIndexStack[this.arrayIndexStack.length - 1];
}
private log(...args: any[]) {
if (this.debug) {
console.log(...args, this.content.join(''), this.stateStack.join('->'));
}
}
private pushState(state: LexerStates) {
this.log('pushState', state);
this.stateStack.push(state);
if (state === LexerStates.Array) {
this.arrayIndexStack.push({ index: 0 });
}
if (state === LexerStates.Object || state === LexerStates.Array) {
this.objectTokenIndexStack.push(this.content.length - 1);
}
}
private popState() {
this.lastPopStateToken = { state: this.currentState, token: this.currentToken };
this.currentToken = '';
const state = this.stateStack.pop();
this.log('popState', state, this.currentState);
if (state === LexerStates.Value) {
this.keyPath.pop();
}
if (state === LexerStates.Array) {
this.arrayIndexStack.pop();
}
if (state === LexerStates.Object || state === LexerStates.Array) {
const idx = this.objectTokenIndexStack.pop();
if (idx != null && idx >= 0) {
const obj = JSON.parse(this.content.slice(idx).join(''));
this.emit('object-resolve', {
uri: this.keyPath.join('/'),
delta: obj,
});
}
}
return state;
}
private reduceState() {
const currentState = this.currentState;
if (currentState === LexerStates.Breaker) {
this.popState();
if (this.currentState === LexerStates.Value) {
this.popState();
}
} else if (currentState === LexerStates.String) {
const str = this.currentToken;
this.popState();
if (this.currentState === LexerStates.Key) {
this.keyPath.push(str);
} else if (this.currentState === LexerStates.Value) {
this.emit('string-resolve', {
uri: this.keyPath.join('/'),
delta: JSON.parse(`["${str}"]`)[0],
});
// this.popState();
}
} else if (currentState === LexerStates.Number) {
const num = Number(this.currentToken);
this.popState();
if (this.currentState === LexerStates.Value) {
// ...
this.emit('data', {
uri: this.keyPath.join('/'), // JSONURI https://github.com/aligay/jsonuri
delta: num,
});
this.popState();
}
} else if (currentState === LexerStates.Boolean) {
const str = this.currentToken;
this.popState();
if (this.currentState === LexerStates.Value) {
this.emit('data', {
uri: this.keyPath.join('/'),
delta: isTrue(str),
});
this.popState();
}
} else if (currentState === LexerStates.Null) {
this.popState();
if (this.currentState === LexerStates.Value) {
this.emit('data', {
uri: this.keyPath.join('/'),
delta: null,
});
this.popState();
}
} else if (currentState === LexerStates.Array || currentState === LexerStates.Object) {
this.popState();
if (this.currentState === LexerStates.Begin) {
this.popState();
this.pushState(LexerStates.Finish);
const data = new Function(`return ${this.content.join('')}`)();
this.emit('finish', data);
} else if (this.currentState === LexerStates.Value) {
// this.popState();
this.pushState(LexerStates.Breaker);
}
} else {
this.traceError(this.content.join(''));
}
}
private traceError(input: string) {
// console.error('Invalid Token', input);
this.content.pop();
if (this.autoFix) {
if (this.currentState === LexerStates.Begin || this.currentState === LexerStates.Finish) {
return;
}
if (this.currentState === LexerStates.Breaker) {
if (this.lastPopStateToken?.state === LexerStates.String) {
// 修复 token 引号转义
const lastPopStateToken = this.lastPopStateToken.token;
this.stateStack[this.stateStack.length - 1] = LexerStates.String;
this.currentToken = lastPopStateToken || '';
let traceToken = '';
for (let i = this.content.length - 1; i >= 0; i--) {
if (this.content[i].trim()) {
this.content.pop();
traceToken = '\\"' + traceToken;
break;
}
traceToken = this.content.pop() + traceToken;
}
this.trace(traceToken + input);
return;
}
}
if (this.currentState === LexerStates.String) {
// 回车的转义
if (input === '\n') {
if (this.lastState === LexerStates.Value) {
const currentToken = this.currentToken.trimEnd();
if (
currentToken.endsWith(',') ||
currentToken.endsWith(']') ||
currentToken.endsWith('}')
) {
// 这种情况下是丢失了最后一个引号
for (let i = this.content.length - 1; i >= 0; i--) {
if (this.content[i].trim()) {
break;
}
this.content.pop();
}
const token = this.content.pop() as string;
// console.log('retrace -> ', '"' + token + input);
this.trace('"' + token + input);
// 这种情况下多发送(emit)出去了一个特殊字符,前端需要修复,发送一个消息让前端能够修复
this.emit('data', {
uri: this.keyPath.join('/'),
delta: '',
error: {
token,
},
});
} else {
// this.currentToken += '\\n';
// this.content.push('\\n');
this.trace('\\n');
}
}
return;
}
}
if (this.currentState === LexerStates.Key) {
if (input !== '"') {
// 处理多余的左引号 eg. {""name": "bearbobo"}
if (this.lastPopStateToken?.token === '') {
this.content.pop();
this.content.push(input);
this.pushState(LexerStates.String);
}
}
// key 的引号后面还有多余内容,忽略掉
return;
}
if (this.currentState === LexerStates.Value) {
if (input === ',' || input === '}' || input === ']') {
// value 丢失了
this.pushState(LexerStates.Null);
this.currentToken = '';
this.content.push('null');
this.reduceState();
if (input !== ',') {
this.trace(input);
} else {
this.content.push(input);
}
} else {
// 字符串少了左引号
this.pushState(LexerStates.String);
this.currentToken = '';
this.content.push('"');
// 不处理 Value 的引号情况,因为前端修复更简单
// if(!isQuotationMark(input)) {
this.trace(input);
// }
}
return;
}
if (this.currentState === LexerStates.Object) {
// 直接缺少了 key
if (input === ':') {
this.pushState(LexerStates.Key);
this.pushState(LexerStates.String);
this.currentToken = '';
this.content.push('"');
this.trace(input);
return;
}
// 一般是key少了左引号
this.pushState(LexerStates.Key);
this.pushState(LexerStates.String);
this.currentToken = '';
this.content.push('"');
if (!isQuotationMark(input)) {
// 单引号和中文引号
this.trace(input);
}
return;
}
if (
this.currentState === LexerStates.Number ||
this.currentState === LexerStates.Boolean ||
this.currentState === LexerStates.Null
) {
// number, boolean 和 null 失败
const currentToken = this.currentToken;
this.stateStack.pop();
this.currentToken = '';
// this.currentToken = '';
for (let i = 0; i < [...currentToken].length; i++) {
this.content.pop();
}
// console.log('retrace', '"' + this.currentToken + input);
this.trace('"' + currentToken + input);
return;
}
}
// console.log('Invalid Token', input, this.currentToken, this.currentState, this.lastState, this.lastPopStateToken);
throw new Error('Invalid Token');
}
private traceBegin(input: string) {
// TODO: 目前只简单处理了对象和数组的情况,对于其他类型的合法JSON处理需要补充
if (input === '{') {
this.pushState(LexerStates.Object);
} else if (input === '[') {
this.pushState(LexerStates.Array);
} else {
this.traceError(input);
return; // recover
}
}
isBeforeStart() {
return this.currentState === LexerStates.Begin;
}
isAfterEnd() {
return this.currentState === LexerStates.Finish;
}
private traceObject(input: string) {
// this.currentToken = '';
if (isWhiteSpace(input) || input === ',') {
return;
}
if (input === '"') {
this.pushState(LexerStates.Key);
this.pushState(LexerStates.String);
} else if (input === '}') {
this.reduceState();
} else {
this.traceError(input);
}
}
private traceArray(input: string) {
if (isWhiteSpace(input)) {
return;
}
if (input === '"') {
this.keyPath.push((this.arrayIndex.index++).toString());
this.pushState(LexerStates.Value);
this.pushState(LexerStates.String);
} else if (input === '.' || input === '-' || isNumeric(input)) {
this.keyPath.push((this.arrayIndex.index++).toString());
this.currentToken += input;
this.pushState(LexerStates.Value);
this.pushState(LexerStates.Number);
} else if (input === 't' || input === 'f') {
this.keyPath.push((this.arrayIndex.index++).toString());
this.currentToken += input;
this.pushState(LexerStates.Value);
this.pushState(LexerStates.Boolean);
} else if (input === 'n') {
this.keyPath.push((this.arrayIndex.index++).toString());
this.currentToken += input;
this.pushState(LexerStates.Value);
this.pushState(LexerStates.Null);
} else if (input === '{') {
this.keyPath.push((this.arrayIndex.index++).toString());
this.pushState(LexerStates.Value);
this.pushState(LexerStates.Object);
} else if (input === '[') {
this.keyPath.push((this.arrayIndex.index++).toString());
this.pushState(LexerStates.Value);
this.pushState(LexerStates.Array);
} else if (input === ']') {
this.reduceState();
}
}
private traceString(input: string) {
if (input === '\n') {
this.traceError(input);
return;
}
const currentToken = this.currentToken.replace(/\\\\/g, ''); // 去掉转义的反斜杠
if (input === '"' && currentToken[this.currentToken.length - 1] !== '\\') {
// 字符串结束符
const lastState = this.lastState;
this.reduceState();
if (lastState === LexerStates.Value) {
this.pushState(LexerStates.Breaker);
}
} else if (
this.autoFix &&
input === ':' &&
currentToken[this.currentToken.length - 1] !== '\\' &&
this.lastState === LexerStates.Key
) {
// 默认这种情况下少了右引号,补一个
this.content.pop();
for (let i = this.content.length - 1; i >= 0; i--) {
if (this.content[i].trim()) {
break;
}
this.content.pop();
}
this.trace('":');
} else if (
this.autoFix &&
isQuotationMark(input) &&
input !== '"' &&
this.lastState === LexerStates.Key
) {
// 处理 key 中的中文引号和单引号
this.content.pop();
return;
} else {
if (this.lastState === LexerStates.Value) {
if (input !== '\\' && this.currentToken[this.currentToken.length - 1] !== '\\') {
// 如果不是反斜杠,且不构成转义符,则发送出去
this.emit('data', {
uri: this.keyPath.join('/'),
delta: input,
});
} else if (this.currentToken[this.currentToken.length - 1] === '\\') {
// 如果不是反斜杠,且可能构成转义,需要判断前面的\\的奇偶性
let count = 0;
for (let i = this.currentToken.length - 1; i >= 0; i--) {
if (this.currentToken[i] === '\\') {
count++;
} else {
break;
}
}
if (count % 2) {
// 奇数个反斜杠,构成转义
this.emit('data', {
uri: this.keyPath.join('/'),
delta: JSON.parse(`["\\${input}"]`)[0],
});
}
}
}
this.currentToken += input;
}
}
private traceKey(input: string) {
if (isWhiteSpace(input)) {
this.content.pop();
return;
}
if (input === ':') {
this.popState();
this.pushState(LexerStates.Value);
} else {
this.traceError(input);
}
}
private traceValue(input: string) {
if (isWhiteSpace(input)) {
return;
}
if (input === '"') {
this.pushState(LexerStates.String);
} else if (input === '{') {
this.pushState(LexerStates.Object);
} else if (input === '.' || input === '-' || isNumeric(input)) {
this.currentToken += input;
this.pushState(LexerStates.Number);
} else if (input === 't' || input === 'f') {
this.currentToken += input;
this.pushState(LexerStates.Boolean);
} else if (input === 'n') {
this.currentToken += input;
this.pushState(LexerStates.Null);
} else if (input === '[') {
this.pushState(LexerStates.Array);
} else {
this.traceError(input);
}
}
private traceNumber(input: string) {
if (isWhiteSpace(input)) {
return;
}
if (isNumeric(this.currentToken + input)) {
this.currentToken += input;
return;
}
if (input === ',') {
this.reduceState();
} else if (input === '}' || input === ']') {
this.reduceState();
this.content.pop();
this.trace(input);
} else {
this.traceError(input);
}
}
private traceBoolean(input: string) {
if (isWhiteSpace(input)) {
return;
}
if (input === ',') {
if (this.currentToken === 'true' || this.currentToken === 'false') {
this.reduceState();
} else {
this.traceError(input);
}
return;
}
if (input === '}' || input === ']') {
if (this.currentToken === 'true' || this.currentToken === 'false') {
this.reduceState();
this.content.pop();
this.trace(input);
} else {
this.traceError(input);
}
return;
}
if (
'true'.startsWith(this.currentToken + input) ||
'false'.startsWith(this.currentToken + input)
) {
this.currentToken += input;
return;
}
this.traceError(input);
}
private traceNull(input: string) {
if (isWhiteSpace(input)) {
return;
}
if (input === ',') {
if (this.currentToken === 'null') {
this.reduceState();
} else {
this.traceError(input);
}
return;
}
if (input === '}' || input === ']') {
this.reduceState();
this.content.pop();
this.trace(input);
return;
}
if ('null'.startsWith(this.currentToken + input)) {
this.currentToken += input;
return;
}
this.traceError(input);
}
private traceBreaker(input: string) {
if (isWhiteSpace(input)) {
return;
}
if (input === ',') {
this.reduceState();
} else if (input === '}' || input === ']') {
this.reduceState();
this.content.pop();
this.trace(input);
} else {
this.traceError(input);
}
}
public finish() {
// 结束解析
if (this.currentState !== LexerStates.Finish) {
throw new Error(`Parser not finished: ${this.currentState} | ${this.content.join('')}`);
}
}
public trace(input: string) {
const currentState = this.currentState;
this.log('trace', JSON.stringify(input), currentState, JSON.stringify(this.currentToken));
const inputArray = [...input];
if (inputArray.length > 1) {
inputArray.forEach(char => {
this.trace(char);
});
return;
}
this.content.push(input);
if (currentState === LexerStates.Begin) {
this.traceBegin(input);
} else if (currentState === LexerStates.Object) {
this.traceObject(input);
} else if (currentState === LexerStates.String) {
this.traceString(input);
} else if (currentState === LexerStates.Key) {
this.traceKey(input);
} else if (currentState === LexerStates.Value) {
this.traceValue(input);
} else if (currentState === LexerStates.Number) {
this.traceNumber(input);
} else if (currentState === LexerStates.Boolean) {
this.traceBoolean(input);
} else if (currentState === LexerStates.Null) {
this.traceNull(input);
} else if (currentState === LexerStates.Array) {
this.traceArray(input);
} else if (currentState === LexerStates.Breaker) {
this.traceBreaker(input);
} else if (!isWhiteSpace(input)) {
this.traceError(input);
}
}
}