The Conflict Between Streaming Transmission and Structured JSON

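Background

JSON is a closed data structure: a document typically opens with a left brace "{" and ends with a right brace "}". Because the structure is closed, the front end generally has to wait until the whole JSON payload has arrived before parsing it; parsing an incomplete payload fails with a syntax error.

This creates a conflict:

  • The advantage of streaming is that data is processed as it arrives, so the user gets a fast response
  • A standard JSON parser, however, needs the complete document before it can process anything
  • As a result, even when the data is fetched as a stream, parsing has to wait until everything has arrived, and the "fast response" advantage of streaming is lost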
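
To make the conflict concrete, here is a minimal sketch that feeds a chunked payload to the built-in JSON.parse. The chunk boundaries and the tryParse helper are illustrative assumptions, not part of the parser described in this article.

```typescript
// Result of one parse attempt on the buffer accumulated so far.
type ParseAttempt = { ok: true; value: unknown } | { ok: false };

// Hypothetical helper: wraps JSON.parse so that a failure is reported as a value.
function tryParse(text: string): ParseAttempt {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// Simulated network chunks of a single JSON document (sample data).
const chunks: string[] = ['{"na', 'me": "张', '三", "age": 1', '8}'];

let buffer = '';
const results: ParseAttempt[] = [];
for (const chunk of chunks) {
  buffer += chunk;
  // Every attempt except the last one fails, because the document is still open.
  results.push(tryParse(buffer));
}

console.log(results.map(r => r.ok));
```

Only the attempt made after the final chunk succeeds, which is exactly the waiting that a streaming parser is meant to avoid.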

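Solution: a streaming JSON parser

To solve this problem, we implemented a streaming JSON parser in the BFF (Backend For Frontend) layer.

Main features

  1. Streaming parsing: JSON is parsed character by character, like reading word by word, without waiting for the complete input
  2. Incremental output: results are emitted while parsing is still in progress; the parser is an EventEmitter, so listeners receive parsed data and can forward it to the front end in real time
    • Example: as soon as "张三" has been parsed in {"user": {"name": "张三"}}, the parser emits the path "user/name" together with the value "张三"
    • The front end can update the UI right away and display the name "张三"
  3. Auto-fix: an optional feature that handles common formatting errors (missing quotation marks, unescaped characters, and so on)
  4. JSONURI paths: every parsed value carries a path identifier
    • Example: "user/name" refers to the name property of the user object
    • This tells the front end where the value belongs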
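
As an illustration of how a consumer might use these paths, the sketch below writes a value into a nested object at the location named by a slash-separated path. The applyEvent function and the ParseEvent shape are assumptions made for this example (the full code emits objects with uri and delta fields); keys that themselves contain "/" would be ambiguous and are not handled.

```typescript
// Assumed event shape: a slash-separated path plus the parsed value.
interface ParseEvent {
  uri: string;
  delta: unknown;
}

// Hypothetical helper: stores event.delta in `target` at the location named by event.uri.
function applyEvent(target: Record<string, unknown>, event: ParseEvent): void {
  const segments = event.uri.split('/');
  let node: any = target;
  for (let i = 0; i < segments.length - 1; i++) {
    const segment = segments[i];
    if (node[segment] === null || typeof node[segment] !== 'object') {
      // Create the missing container: an array if the next segment is an index, else an object.
      node[segment] = /^\d+$/.test(segments[i + 1]) ? [] : {};
    }
    node = node[segment];
  }
  node[segments[segments.length - 1]] = event.delta;
}

const state: Record<string, unknown> = {};
applyEvent(state, { uri: 'user/name', delta: '张三' });
applyEvent(state, { uri: 'user/tags/0', delta: 'admin' });
console.log(JSON.stringify(state));
```

Because each event carries its own location, events can be applied in arrival order without the consumer knowing the overall document shape in advance.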

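How it works: parsing character by character, like reading

Core principle: the state machine pattern

The streaming JSON parser works like a smart reader: it knows what it is currently reading and what it should expect to read next.

1. State machine: defining "what am I reading"

The parser defines 11 states, which correspond to different stages of reading:

| State   | Description                               | Example             |
| Begin   | Start; nothing has been read yet          | ready to start      |
| Object  | Reading an object                         | {...}               |
| Array   | Reading an array                          | [...]               |
| Key     | Reading a key                             | "name"              |
| Value   | Reading a value                           | "张三", 123         |
| String  | Reading a string value                    | "this is a string"  |
| Number  | Reading a numeric value                   | 123, 3.14           |
| Boolean | Reading a boolean value                   | true, false         |
| Null    | Reading null                              | null                |
| Finish  | Done; the whole JSON document is parsed   | everything read     |
| Breaker | Separator (comma, colon, etc.)            | , :                 |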
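
To show how a state machine of this kind chooses its next state, the sketch below maps the first character of a value to the state that should read it. It loosely mirrors the dispatch in the traceValue method of the full code, but stateForFirstChar is a hypothetical name, and unlike the full code (which also accepts a leading "."), this version follows strict JSON number syntax.

```typescript
type ValueState = 'String' | 'Object' | 'Array' | 'Number' | 'Boolean' | 'Null';

// Hypothetical helper: decides which state reads a value, based on its first character.
// Returns null when the character cannot start a JSON value.
function stateForFirstChar(ch: string): ValueState | null {
  if (ch === '"') return 'String';
  if (ch === '{') return 'Object';
  if (ch === '[') return 'Array';
  if (ch === '-' || (ch >= '0' && ch <= '9')) return 'Number';
  if (ch === 't' || ch === 'f') return 'Boolean'; // start of true / false
  if (ch === 'n') return 'Null'; // start of null
  return null;
}

console.log(['"', '{', '[', '7', 't', 'n', 'x'].map(stateForFirstChar));
```

A single character is enough to pick the state because every JSON value type starts with a distinct character.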

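2. State stack: managing nested structures

Imagine reading a book made up of chapters, paragraphs, and sentences:

  • When you enter a new chapter, you remember "which chapter I am in"
  • When you enter a paragraph, you remember "which paragraph of which chapter I am in"
  • When you finish a paragraph, you return to the chapter level
  • When you finish a chapter, you return to the book level

The state stack is exactly this kind of "memory system":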

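// Example: parsing {"user": {"name": "张三", "age": 18}}
// (the Begin state at the bottom of the stack is omitted for brevity)

// Step 1: read {, enter the Object state
stateStack = ['Object'];

// Step 2: read "user", enter the Key state
stateStack = ['Object', 'Key'];

// Step 3: read :, enter the Value state, and find another {
stateStack = ['Object', 'Value', 'Object'];

// Step 4: read "name", enter the Key state
stateStack = ['Object', 'Value', 'Object', 'Key'];

// Step 5: read "张三", enter the String state; once the string is complete, output {"user/name": "张三"}
// then pop back to the Object state
stateStack = ['Object', 'Value', 'Object'];

// Step 6: continue with "age": 18...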

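Key points

  • On { or [: push (enter a new level)
  • On } or ]: pop (return to the previous level)
  • Multiple levels of nesting are supported, like Russian nesting dolls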
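
The push and pop rule can be sketched on its own, separately from the full parser. The nestingTrace function below is a simplified illustration (a hypothetical helper, not the article's implementation): it tracks only Object and Array frames, skips over string contents so that braces inside strings are ignored, and does not validate that closers match their openers.

```typescript
type Frame = 'Object' | 'Array';

// Hypothetical helper: returns a snapshot of the stack after every push or pop.
function nestingTrace(input: string): string[] {
  const stack: Frame[] = [];
  const snapshots: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of input) {
    if (inString) {
      // Inside a string: only look for the closing quote, honoring backslash escapes.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '{' || ch === '[') {
      stack.push(ch === '{' ? 'Object' : 'Array'); // enter a new level
      snapshots.push(stack.join(' > '));
    } else if (ch === '}' || ch === ']') {
      stack.pop(); // return to the previous level
      snapshots.push(stack.join(' > '));
    }
  }
  return snapshots;
}

console.log(nestingTrace('{"a": [1, {"b": "}"}]}'));
```

The "}" inside the string value does not pop anything, which is why string tracking has to sit in front of the bracket handling.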

Workflow example

Let's walk through a concrete example to see how the parser works.

Input stream (arriving chunk by chunk):

1. {"na
2. me": "张
3. 三", "age": 1
4. 8}

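Parsing process

Time 1: received {"na
  ├─ State: Begin → Object → Key → String (still inside the key)
  ├─ State stack: ['Object', 'Key', 'String']
  └─ Output: none (data incomplete)

Time 2: received me": "张
  ├─ Recognized a complete key: "name"
  ├─ State: String → Key → Value → String
  ├─ State stack: ['Object', 'Value', 'String']
  └─ Output: no complete value yet (the full code also emits each character of a string value, here "张", as a data delta)

Time 3: received 三", "age": 1
  ├─ Recognized a complete value: "张三"
  ├─ State: String → Breaker → Object → Key → Value → Number
  ├─ State stack: ['Object', 'Value', 'Number']
  └─ Output: {"name": "张三"}  ✅ sent to the front end immediately!

Time 4: received 8}
  ├─ Recognized a complete value: 18
  ├─ State: Number → Object → Finish
  ├─ State stack: ['Finish']
  └─ Output: {"age": 18}  ✅ sent to the front end immediately!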

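Front-end effect

  • Time 3: the user immediately sees "张三" on screen
  • Time 4: the user immediately sees "18" on screen
  • There is no need to wait for the whole JSON document to finish!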
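
On the receiving side, the deltas have to be merged into whatever the UI displays. The sketch below assumes the event shape used by the full code (uri plus delta), where string values arrive one character at a time and numbers, booleans, and null arrive whole. The applyDelta function and the flat Map used as view state are illustrative assumptions.

```typescript
type Scalar = string | number | boolean | null;

// Assumed event shape, matching the uri/delta payloads in the full code.
interface DataEvent {
  uri: string;
  delta: Scalar;
}

// Hypothetical helper: string deltas are appended to the existing text, other values replace it.
function applyDelta(view: Map<string, Scalar>, event: DataEvent): void {
  const previous = view.get(event.uri);
  if (typeof event.delta === 'string' && typeof previous === 'string') {
    view.set(event.uri, previous + event.delta);
  } else {
    view.set(event.uri, event.delta);
  }
}

// Events as they might arrive for {"name": "张三", "age": 18} (sample data).
const view = new Map<string, Scalar>();
const events: DataEvent[] = [
  { uri: 'name', delta: '张' },
  { uri: 'name', delta: '三' },
  { uri: 'age', delta: 18 },
];
for (const event of events) {
  applyDelta(view, event);
}

console.log(view.get('name'), view.get('age'));
```

After the first event the view already holds "张", so the UI can render partial text before the string is complete.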

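Summary of advantages

  1. Fast response: data is parsed and displayed as it arrives, which improves the user experience
  2. Memory friendly: values are handed off as soon as they are parsed, so consumers do not need to hold the complete document (note that the implementation below still keeps the received characters in its content array in order to build the final result)
  3. Real-time feedback: well suited to AI chat, live data display, and similar scenarios
  4. Extensible: supports auto-fix, custom path formats (such as a parentPath prefix), and other options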

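Use cases

  • AI chat: while the model is generating a reply, it can be displayed character by character instead of waiting for the full response
  • Real-time data monitoring: the UI updates as soon as data changes, without waiting for a complete packet
  • Large file parsing: values in a large JSON file can be consumed incrementally while the file is still being read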

Full code

import EventEmitter from 'node:events';

const enum LexerStates {
  Begin = 'Begin',
  Object = 'Object',
  Array = 'Array',
  Key = 'Key',
  Value = 'Value',
  String = 'String',
  Number = 'Number',
  Boolean = 'Boolean',
  Null = 'Null',
  Finish = 'Finish',
  Breaker = 'Breaker',
}

function isNumeric(str: unknown) {
  return (
    !isNaN(str as number) && // use type coercion to parse the _entirety_ of the string (`parseFloat` alone does not do this)...
    !isNaN(parseFloat(str as string))
  ); // ...and ensure strings of whitespace fail
}

function isTrue(str: string) {
  return str === 'true';
}

// Check whether the string consists only of whitespace
function isWhiteSpace(str: string) {
  return /^\s+$/.test(str);
}

function isQuotationMark(str: string) {
  return str === '"' || str === '“' || str === '”' || str === '‘' || str === '’' || str === "'";
}

export class JSONParser extends EventEmitter {
  private content: string[] = [];
  private stateStack: LexerStates[] = [LexerStates.Begin];
  private currentToken = '';
  private keyPath: string[] = [];
  private arrayIndexStack: any[] = [];
  private objectTokenIndexStack: number[] = [];
  private autoFix = false;
  private debug = false;
  private lastPopStateToken: { state: LexerStates; token: string } | null = null;

  constructor(
    options: { autoFix?: boolean; parentPath?: string | null; debug?: boolean } = {
      autoFix: false,
      parentPath: null,
      debug: false,
    }
  ) {
    super();
    this.autoFix = !!options.autoFix;
    this.debug = !!options.debug;
    if (options.parentPath) this.keyPath.push(options.parentPath);
  }

  get currentState() {
    return this.stateStack[this.stateStack.length - 1];
  }

  get lastState() {
    return this.stateStack[this.stateStack.length - 2];
  }

  get arrayIndex() {
    return this.arrayIndexStack[this.arrayIndexStack.length - 1];
  }

  private log(...args: any[]) {
    if (this.debug) {
      console.log(...args, this.content.join(''), this.stateStack.join('->'));
    }
  }

  private pushState(state: LexerStates) {
    this.log('pushState', state);
    this.stateStack.push(state);
    if (state === LexerStates.Array) {
      this.arrayIndexStack.push({ index: 0 });
    }
    if (state === LexerStates.Object || state === LexerStates.Array) {
      this.objectTokenIndexStack.push(this.content.length - 1);
    }
  }

  private popState() {
    this.lastPopStateToken = { state: this.currentState, token: this.currentToken };
    this.currentToken = '';
    const state = this.stateStack.pop();
    this.log('popState', state, this.currentState);
    if (state === LexerStates.Value) {
      this.keyPath.pop();
    }
    if (state === LexerStates.Array) {
      this.arrayIndexStack.pop();
    }
    if (state === LexerStates.Object || state === LexerStates.Array) {
      const idx = this.objectTokenIndexStack.pop();
      if (idx != null && idx >= 0) {
        const obj = JSON.parse(this.content.slice(idx).join(''));
        this.emit('object-resolve', {
          uri: this.keyPath.join('/'),
          delta: obj,
        });
      }
    }
    return state;
  }

  private reduceState() {
    const currentState = this.currentState;
    if (currentState === LexerStates.Breaker) {
      this.popState();
      if (this.currentState === LexerStates.Value) {
        this.popState();
      }
    } else if (currentState === LexerStates.String) {
      const str = this.currentToken;
      this.popState();
      if (this.currentState === LexerStates.Key) {
        this.keyPath.push(str);
      } else if (this.currentState === LexerStates.Value) {
        this.emit('string-resolve', {
          uri: this.keyPath.join('/'),
          delta: JSON.parse(`["${str}"]`)[0],
        });
        // this.popState();
      }
    } else if (currentState === LexerStates.Number) {
      const num = Number(this.currentToken);
      this.popState();
      if (this.currentState === LexerStates.Value) {
        // ...
        this.emit('data', {
          uri: this.keyPath.join('/'), // JSONURI https://github.com/aligay/jsonuri
          delta: num,
        });
        this.popState();
      }
    } else if (currentState === LexerStates.Boolean) {
      const str = this.currentToken;
      this.popState();
      if (this.currentState === LexerStates.Value) {
        this.emit('data', {
          uri: this.keyPath.join('/'),
          delta: isTrue(str),
        });
        this.popState();
      }
    } else if (currentState === LexerStates.Null) {
      this.popState();
      if (this.currentState === LexerStates.Value) {
        this.emit('data', {
          uri: this.keyPath.join('/'),
          delta: null,
        });
        this.popState();
      }
    } else if (currentState === LexerStates.Array || currentState === LexerStates.Object) {
      this.popState();
      if (this.currentState === LexerStates.Begin) {
        this.popState();
        this.pushState(LexerStates.Finish);
        const data = JSON.parse(this.content.join('')); // parse as data rather than evaluating the input as code
        this.emit('finish', data);
      } else if (this.currentState === LexerStates.Value) {
        // this.popState();
        this.pushState(LexerStates.Breaker);
      }
    } else {
      this.traceError(this.content.join(''));
    }
  }

  private traceError(input: string) {
    // console.error('Invalid Token', input);
    this.content.pop();
    if (this.autoFix) {
      if (this.currentState === LexerStates.Begin || this.currentState === LexerStates.Finish) {
        return;
      }
      if (this.currentState === LexerStates.Breaker) {
        if (this.lastPopStateToken?.state === LexerStates.String) {
          // Repair an unescaped quotation mark inside the string token
          const lastPopStateToken = this.lastPopStateToken.token;
          this.stateStack[this.stateStack.length - 1] = LexerStates.String;
          this.currentToken = lastPopStateToken || '';
          let traceToken = '';
          for (let i = this.content.length - 1; i >= 0; i--) {
            if (this.content[i].trim()) {
              this.content.pop();
              traceToken = '\\"' + traceToken;
              break;
            }
            traceToken = this.content.pop() + traceToken;
          }
          this.trace(traceToken + input);
          return;
        }
      }
      if (this.currentState === LexerStates.String) {
        // Escape a raw newline
        if (input === '\n') {
          if (this.lastState === LexerStates.Value) {
            const currentToken = this.currentToken.trimEnd();
            if (
              currentToken.endsWith(',') ||
              currentToken.endsWith(']') ||
              currentToken.endsWith('}')
            ) {
              // In this case the closing quotation mark is missing
              for (let i = this.content.length - 1; i >= 0; i--) {
                if (this.content[i].trim()) {
                  break;
                }
                this.content.pop();
              }
              const token = this.content.pop() as string;
              // console.log('retrace -> ', '"' + token + input);
              this.trace('"' + token + input);
              // In this case one extra special character has already been emitted; send a message so the front end can repair it
              this.emit('data', {
                uri: this.keyPath.join('/'),
                delta: '',
                error: {
                  token,
                },
              });
            } else {
              // this.currentToken += '\\n';
              // this.content.push('\\n');
              this.trace('\\n');
            }
          }
          return;
        }
      }
      if (this.currentState === LexerStates.Key) {
        if (input !== '"') {
          // Handle a redundant opening quotation mark, e.g. {""name": "bearbobo"}
          if (this.lastPopStateToken?.token === '') {
            this.content.pop();
            this.content.push(input);
            this.pushState(LexerStates.String);
          }
        }
        // Extra content after the key's closing quotation mark is ignored
        return;
      }

      if (this.currentState === LexerStates.Value) {
        if (input === ',' || input === '}' || input === ']') {
          // The value is missing
          this.pushState(LexerStates.Null);
          this.currentToken = '';
          this.content.push('null');
          this.reduceState();
          if (input !== ',') {
            this.trace(input);
          } else {
            this.content.push(input);
          }
        } else {
          // The string is missing its opening quotation mark
          this.pushState(LexerStates.String);
          this.currentToken = '';
          this.content.push('"');
          // Quotation-mark variants in a Value are not handled here, because fixing them on the front end is simpler
          // if(!isQuotationMark(input)) {
          this.trace(input);
          // }
        }
        return;
      }

      if (this.currentState === LexerStates.Object) {
        // The key is missing entirely
        if (input === ':') {
          this.pushState(LexerStates.Key);
          this.pushState(LexerStates.String);
          this.currentToken = '';
          this.content.push('"');
          this.trace(input);
          return;
        }
        // Usually the key is missing its opening quotation mark
        this.pushState(LexerStates.Key);
        this.pushState(LexerStates.String);
        this.currentToken = '';
        this.content.push('"');
        if (!isQuotationMark(input)) {
          // Single quotes and Chinese quotation marks are skipped; any other character is re-traced
          this.trace(input);
        }
        return;
      }

      if (
        this.currentState === LexerStates.Number ||
        this.currentState === LexerStates.Boolean ||
        this.currentState === LexerStates.Null
      ) {
        // A number, boolean, or null failed to parse; re-trace the token as a string
        const currentToken = this.currentToken;
        this.stateStack.pop();
        this.currentToken = '';
        // this.currentToken = '';
        for (let i = 0; i < [...currentToken].length; i++) {
          this.content.pop();
        }
        // console.log('retrace', '"' + this.currentToken + input);

        this.trace('"' + currentToken + input);
        return;
      }
    }
    // console.log('Invalid Token', input, this.currentToken, this.currentState, this.lastState, this.lastPopStateToken);
    throw new Error('Invalid Token');
  }

  private traceBegin(input: string) {
    // TODO: only objects and arrays are handled at the top level for now; other valid top-level JSON values still need support
    if (input === '{') {
      this.pushState(LexerStates.Object);
    } else if (input === '[') {
      this.pushState(LexerStates.Array);
    } else {
      this.traceError(input);
      return; // recover
    }
  }

  isBeforeStart() {
    return this.currentState === LexerStates.Begin;
  }

  isAfterEnd() {
    return this.currentState === LexerStates.Finish;
  }

  private traceObject(input: string) {
    // this.currentToken = '';
    if (isWhiteSpace(input) || input === ',') {
      return;
    }
    if (input === '"') {
      this.pushState(LexerStates.Key);
      this.pushState(LexerStates.String);
    } else if (input === '}') {
      this.reduceState();
    } else {
      this.traceError(input);
    }
  }

  private traceArray(input: string) {
    if (isWhiteSpace(input)) {
      return;
    }
    if (input === '"') {
      this.keyPath.push((this.arrayIndex.index++).toString());
      this.pushState(LexerStates.Value);
      this.pushState(LexerStates.String);
    } else if (input === '.' || input === '-' || isNumeric(input)) {
      this.keyPath.push((this.arrayIndex.index++).toString());
      this.currentToken += input;
      this.pushState(LexerStates.Value);
      this.pushState(LexerStates.Number);
    } else if (input === 't' || input === 'f') {
      this.keyPath.push((this.arrayIndex.index++).toString());
      this.currentToken += input;
      this.pushState(LexerStates.Value);
      this.pushState(LexerStates.Boolean);
    } else if (input === 'n') {
      this.keyPath.push((this.arrayIndex.index++).toString());
      this.currentToken += input;
      this.pushState(LexerStates.Value);
      this.pushState(LexerStates.Null);
    } else if (input === '{') {
      this.keyPath.push((this.arrayIndex.index++).toString());
      this.pushState(LexerStates.Value);
      this.pushState(LexerStates.Object);
    } else if (input === '[') {
      this.keyPath.push((this.arrayIndex.index++).toString());
      this.pushState(LexerStates.Value);
      this.pushState(LexerStates.Array);
    } else if (input === ']') {
      this.reduceState();
    }
  }

  private traceString(input: string) {
    if (input === '\n') {
      this.traceError(input);
      return;
    }
    const currentToken = this.currentToken.replace(/\\\\/g, ''); // strip escaped backslash pairs
    if (input === '"' && currentToken[currentToken.length - 1] !== '\\') {
      // End of string
      const lastState = this.lastState;
      this.reduceState();
      if (lastState === LexerStates.Value) {
        this.pushState(LexerStates.Breaker);
      }
    } else if (
      this.autoFix &&
      input === ':' &&
      currentToken[currentToken.length - 1] !== '\\' &&
      this.lastState === LexerStates.Key
    ) {
      // Assume the closing quotation mark is missing here and add one
      this.content.pop();
      for (let i = this.content.length - 1; i >= 0; i--) {
        if (this.content[i].trim()) {
          break;
        }
        this.content.pop();
      }
      this.trace('":');
    } else if (
      this.autoFix &&
      isQuotationMark(input) &&
      input !== '"' &&
      this.lastState === LexerStates.Key
    ) {
      // Handle Chinese quotation marks and single quotes inside a key
      this.content.pop();
      return;
    } else {
      if (this.lastState === LexerStates.Value) {
        if (input !== '\\' && this.currentToken[this.currentToken.length - 1] !== '\\') {
          // Not a backslash and not part of an escape sequence: emit the character
          this.emit('data', {
            uri: this.keyPath.join('/'),
            delta: input,
          });
        } else if (this.currentToken[this.currentToken.length - 1] === '\\') {
          // The previous character is a backslash, so this may be an escape sequence; check the parity of the preceding run of backslashes
          let count = 0;
          for (let i = this.currentToken.length - 1; i >= 0; i--) {
            if (this.currentToken[i] === '\\') {
              count++;
            } else {
              break;
            }
          }
          if (count % 2) {
            // An odd number of backslashes forms an escape sequence
            this.emit('data', {
              uri: this.keyPath.join('/'),
              delta: JSON.parse(`["\\${input}"]`)[0],
            });
          }
        }
      }
      this.currentToken += input;
    }
  }

  private traceKey(input: string) {
    if (isWhiteSpace(input)) {
      this.content.pop();
      return;
    }
    if (input === ':') {
      this.popState();
      this.pushState(LexerStates.Value);
    } else {
      this.traceError(input);
    }
  }

  private traceValue(input: string) {
    if (isWhiteSpace(input)) {
      return;
    }
    if (input === '"') {
      this.pushState(LexerStates.String);
    } else if (input === '{') {
      this.pushState(LexerStates.Object);
    } else if (input === '.' || input === '-' || isNumeric(input)) {
      this.currentToken += input;
      this.pushState(LexerStates.Number);
    } else if (input === 't' || input === 'f') {
      this.currentToken += input;
      this.pushState(LexerStates.Boolean);
    } else if (input === 'n') {
      this.currentToken += input;
      this.pushState(LexerStates.Null);
    } else if (input === '[') {
      this.pushState(LexerStates.Array);
    } else {
      this.traceError(input);
    }
  }

  private traceNumber(input: string) {
    if (isWhiteSpace(input)) {
      return;
    }
    if (isNumeric(this.currentToken + input)) {
      this.currentToken += input;
      return;
    }
    if (input === ',') {
      this.reduceState();
    } else if (input === '}' || input === ']') {
      this.reduceState();
      this.content.pop();
      this.trace(input);
    } else {
      this.traceError(input);
    }
  }

  private traceBoolean(input: string) {
    if (isWhiteSpace(input)) {
      return;
    }

    if (input === ',') {
      if (this.currentToken === 'true' || this.currentToken === 'false') {
        this.reduceState();
      } else {
        this.traceError(input);
      }
      return;
    }

    if (input === '}' || input === ']') {
      if (this.currentToken === 'true' || this.currentToken === 'false') {
        this.reduceState();
        this.content.pop();
        this.trace(input);
      } else {
        this.traceError(input);
      }
      return;
    }

    if (
      'true'.startsWith(this.currentToken + input) ||
      'false'.startsWith(this.currentToken + input)
    ) {
      this.currentToken += input;
      return;
    }

    this.traceError(input);
  }

  private traceNull(input: string) {
    if (isWhiteSpace(input)) {
      return;
    }

    if (input === ',') {
      if (this.currentToken === 'null') {
        this.reduceState();
      } else {
        this.traceError(input);
      }
      return;
    }

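    if (input === '}' || input === ']') {
      // Only accept the closer once the full literal has been read, as traceBoolean does
      if (this.currentToken === 'null') {
        this.reduceState();
        this.content.pop();
        this.trace(input);
      } else {
        this.traceError(input);
      }
      return;
    }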

    if ('null'.startsWith(this.currentToken + input)) {
      this.currentToken += input;
      return;
    }

    this.traceError(input);
  }

  private traceBreaker(input: string) {
    if (isWhiteSpace(input)) {
      return;
    }
    if (input === ',') {
      this.reduceState();
    } else if (input === '}' || input === ']') {
      this.reduceState();
      this.content.pop();
      this.trace(input);
    } else {
      this.traceError(input);
    }
  }

  public finish() {
    // Finish parsing
    if (this.currentState !== LexerStates.Finish) {
      throw new Error(`Parser not finished: ${this.currentState} | ${this.content.join('')}`);
    }
  }

  public trace(input: string) {
    const currentState = this.currentState;
    this.log('trace', JSON.stringify(input), currentState, JSON.stringify(this.currentToken));

    const inputArray = [...input];
    if (inputArray.length > 1) {
      inputArray.forEach(char => {
        this.trace(char);
      });
      return;
    }

    this.content.push(input);
    if (currentState === LexerStates.Begin) {
      this.traceBegin(input);
    } else if (currentState === LexerStates.Object) {
      this.traceObject(input);
    } else if (currentState === LexerStates.String) {
      this.traceString(input);
    } else if (currentState === LexerStates.Key) {
      this.traceKey(input);
    } else if (currentState === LexerStates.Value) {
      this.traceValue(input);
    } else if (currentState === LexerStates.Number) {
      this.traceNumber(input);
    } else if (currentState === LexerStates.Boolean) {
      this.traceBoolean(input);
    } else if (currentState === LexerStates.Null) {
      this.traceNull(input);
    } else if (currentState === LexerStates.Array) {
      this.traceArray(input);
    } else if (currentState === LexerStates.Breaker) {
      this.traceBreaker(input);
    } else if (!isWhiteSpace(input)) {
      this.traceError(input);
    }
  }
}
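
The traceString method above decides whether a character is escaped by counting the backslashes that precede it. That parity check can be isolated as a small standalone function, shown here as a sketch; endsWithEscape is a hypothetical name and is not part of the class.

```typescript
// Hypothetical helper: returns true when `token` ends with an unpaired backslash,
// meaning the next character belongs to an escape sequence such as \" or \n.
function endsWithEscape(token: string): boolean {
  let count = 0;
  for (let i = token.length - 1; i >= 0 && token[i] === '\\'; i--) {
    count++;
  }
  // An even run consists of complete \\ pairs; an odd run leaves one backslash open.
  return count % 2 === 1;
}

console.log(endsWithEscape('abc\\'), endsWithEscape('abc\\\\'), endsWithEscape('abc'));
```

Counting the whole run matters because a quotation mark after two backslashes closes the string, while one after a single backslash does not.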