A Deep Dive into Using streamSaver for Multi-File, Large-File ZIP Downloads on the Front End


Preface

As is well known (or so I've heard), streamSaver + fetch is the best practice for downloading large files on the front end and for zipping them on the fly. I ran into a similar scenario, but with an extra requirement: show each file's download status in real time, so that under poor network conditions a single failed file can be re-downloaded manually. After half a day of research, grilling ChatGPT, and debugging on my own, the code went through several iterations. The snippets below show only the key parts; for more detail, see the streamSaver examples.

Version 1 (sequential / multiple files / status lost)

This version downloads sequentially, iterating over the files one by one, but each file's download status is lost. It is also the most obvious approach suggested by the streamSaver examples; if you don't need per-file status, it is basically sufficient.

/**
 * Sequential download and zip [recommended]
 * @param zipName name of the zip archive
 * @param files file list, in the form [{"name": "file name", "url": "download URL"}, …]
 */
export const zipFiles = (zipName, files) => {
  console.log("zip download started at: " + new Date());
  // create the output stream for the zip archive
  const zipFileOutputStream = streamSaver.createWriteStream(zipName);
  // iterate over the files to download
  const fileIterator = files.values();
  const readableZipStream = new ZIP({
    // pull() is called once per entry, so the files are fetched one by one
    async pull(ctrl) {
      const fileInfo = fileIterator.next();
      if (fileInfo.done) {
        // iteration finished
        ctrl.close();
      } else {
        const { name, url } = fileInfo.value;
        return fetch(url).then((res) => {
          ctrl.enqueue({
            name,
            stream: () => res.body,
          });
        });
      }
    },
  });
  if (window.WritableStream && readableZipStream.pipeTo) {
    // start the download
    readableZipStream
      .pipeTo(zipFileOutputStream)
      .then(() => console.log("zip download finished at: " + new Date()));
  }
};

Version 2 (sequential / multiple files / status synced)

A sequential approach. Because it uses axios with a blob response, each file is held in memory while it downloads, so peak memory usage is determined by the largest single file. At the same time it supports per-file status management. Overall it is a reasonable compromise that meets the current requirement; it could also be optimized into a parallel download, but first estimate whether the largest files would exceed what the browser can handle.

const axios_download = async (file: fileListState): Promise<Blob | null> => {
  file.status = 'downloading';
  return axios({
    method: 'get',
    url: file.url,
    responseType: 'blob',
    // progress tracking is very convenient with axios
    onDownloadProgress(e: AxiosProgressEvent) {
      if (e.total) {
        file.progress = Math.round((e.loaded * 100) / e.total);
      }
    },
  })
    .then(data => {
      file.status = 'success';
      return data.data as Blob;
    })
    .catch(error => {
      console.log('error', error);
      if (file.shouldRetry) {
        console.log('wait_retry');
        file.status = 'wait_retry';
      } else {
        file.status = 'error';
      }
      file.error = error;
      // return null instead of rethrowing, so the caller can skip this file
      return null;
    });
};
const downloadZipMethodAxios = (fileList: fileListState[]) => {
  const filename = `${dayjs(new Date()).format('YYYYMMDDHHmmss')}.zip`;
  const fileStream = StreamSaver.createWriteStream(filename);
  const readableZipStream = new ZIP({
    async start(ctrl) {
      for (let i = 0; i < fileList.length; i++) {
        const file = fileList[i];
        const data = await axios_download(file);
        if (!data) {
          // the download failed; its status is already set for a manual retry, so skip it
          continue;
        }
        ctrl.enqueue({
          name: file.filename,
          stream: () => data.stream(),
        });
      }
      ctrl.close();
    },
  });

  if (window.WritableStream && readableZipStream.pipeTo) {
    return readableZipStream.pipeTo(fileStream).then(() => {
      console.log('download finished');
    });
  } else {
    // less optimized fallback for browsers without pipeTo
    const writer = fileStream.getWriter();
    const reader = readableZipStream.getReader();
    const pump = () => reader.read().then(res => (res.done ? writer.close() : writer.write(res.value).then(pump)));

    pump();
  }
};

Approach 3 (sequential / fetch / status synced)

A more ideal sequential approach: the whole pipeline is streaming. Its stability still needs verification, and the per-file status updates have issues, though they work correctly when the file list is small.

const downloadZipMethodFetch = (fileList: fileListState[]) => {
  const filename = `${dayjs(new Date()).format('YYYYMMDDHHmmss')}.zip`;
  const fileStream = StreamSaver.createWriteStream(filename);
  const readableZipStream = new ZIP({
    async start(ctrl) {
      for (let i = 0; i < fileList.length; i++) {
        const file = fileList[i];
        const name = file.filename;
        const resp = await fetch(file.url);
        // const stream = () => resp.body; // this works, but cannot update the per-file status
        const stream = () => {
          // defining stream this way is expected to update each file's status, but still needs verification
          return createReadableStream(file, resp);
        };
        ctrl.enqueue({ name, stream });
      }
      ctrl.close();
    },
  });

  if (window.WritableStream && readableZipStream.pipeTo) {
    return readableZipStream.pipeTo(fileStream).then(() => {
      console.log('download finished');
    });
  } else {
    // less optimized fallback for browsers without pipeTo
    const writer = fileStream.getWriter();
    const reader = readableZipStream.getReader();
    const pump = () => reader.read().then(res => (res.done ? writer.close() : writer.write(res.value).then(pump)));

    pump();
  }
};

function createReadableStream(file: fileListState, resp: Response) {
  let receivedLength = 0;
  file.status = 'downloading';
  file.progress = 0;
  const contentLength = Number(resp.headers.get('content-length'));
  const reader = resp.body!.getReader();
  // wrap a new ReadableStream around the response body to observe download progress
  const stream = new ReadableStream({
    async pull(ctrl) {
      try {
        // read one chunk per pull() so backpressure from the consumer is respected
        const chunk = await reader.read();
        if (chunk.done) {
          file.status = 'success';
          file.progress = 100;
          ctrl.close();
          return;
        }
        receivedLength += chunk.value.length;
        file.progress = Math.round((receivedLength / contentLength) * 100);
        // reading is required to measure the length, but a body stream can only be
        // read once, so the chunk must be written back into this wrapper stream
        // (alternatively, tee() can be used to split the stream)
        ctrl.enqueue(chunk.value);
      } catch (error) {
        console.log('🚀 ~ pull ~ error:', error);
        ctrl.error(error);
      }
    },
  });

  return stream;
}
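The wrapper above can be exercised outside the browser. The sketch below applies the same one-chunk-per-pull pattern to an in-memory stream instead of a fetch Response body; `wrapWithProgress`, `demo`, and the `file` shape are illustrative assumptions, not part of any library.

```javascript
// Hypothetical sketch: wrap any ReadableStream so that each chunk updates a
// progress object, reading one chunk per pull() to respect backpressure.
function wrapWithProgress(source, total, file) {
  const reader = source.getReader();
  let received = 0;
  file.status = 'downloading';
  file.progress = 0;
  return new ReadableStream({
    async pull(ctrl) {
      const { done, value } = await reader.read();
      if (done) {
        file.status = 'success';
        file.progress = 100;
        ctrl.close();
        return;
      }
      received += value.length;
      file.progress = Math.round((received / total) * 100);
      ctrl.enqueue(value); // write the chunk back so the consumer still sees it
    },
  });
}

// Demo with an in-memory source standing in for a fetch Response body.
async function demo() {
  const source = new ReadableStream({
    start(ctrl) {
      ctrl.enqueue(new Uint8Array(4));
      ctrl.enqueue(new Uint8Array(6));
      ctrl.close();
    },
  });
  const file = { status: 'pending', progress: 0 };
  const reader = wrapWithProgress(source, 10, file).getReader();
  let bytes = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    bytes += value.length;
  }
  return { bytes, file };
}
```

Consuming the wrapped stream yields the same bytes as the source, while `file.progress` climbs to 100 as a side effect.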

Approach 4 (concurrent / fetch / status synced)

If there are many files, you want the download to finish as quickly as possible, and the server can take the load, try this approach. The idea is roughly as below; all that remains is some task scheduling.

const downloadZipMethodFetchAsync = async (fileList: fileListState[]) => {
  const all = [] as Array<FetchRequest>;
  const filename = `${dayjs(new Date()).format('YYYYMMDDHHmmss')}.zip`;
  const fileStream = StreamSaver.createWriteStream(filename);
  // fileList = fileList.slice(0, 10); // for test
  const readableZipStream = new ZIP({
    async start(ctrl) {
      fileList.forEach(file => {
        const task: FetchRequest = {
          fetch: (): Promise<any> => {
            return taskStream(file, ctrl);
          },
        };
        all.push(task);
      });
      try {
        // run the tasks in batches, with at most 5 downloads in flight at once
        await handleBatchRequest(all, 5);
        ctrl.close();
      } catch (error) {
        console.log('🚀 ~ pull ~ error:', error);
        state.isDownloading = false;
        throw error;
      }
    },
  });

  if (window.WritableStream && readableZipStream.pipeTo) {
    return readableZipStream.pipeTo(fileStream).then(() => {
      console.log('download finished');
    });
  } else {
    // less optimized fallback for browsers without pipeTo
    const writer = fileStream.getWriter();
    const reader = readableZipStream.getReader();
    const pump = () => reader.read().then(res => (res.done ? writer.close() : writer.write(res.value).then(pump)));

    pump();
  }
};
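`handleBatchRequest` is used above but not shown in the article. One possible minimal implementation (an assumption for illustration, not the author's original helper) runs the task objects with a fixed pool of workers:

```javascript
// Hypothetical sketch of a batch scheduler like the handleBatchRequest used
// above: runs tasks (objects with a fetch() method returning a Promise) with
// at most `limit` of them in flight at any moment.
async function handleBatchRequest(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to claim
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claim a task; JS is single-threaded, so this is safe
      results[i] = await tasks[i].fetch();
    }
  }
  // start min(limit, tasks.length) workers and wait for all of them to drain
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}
```

Errors propagate: if any `fetch()` rejects, `Promise.all` rejects, which is what the `try/catch` around the call above relies on.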

// the stream-splitting (tee) version looks like this
const taskStream = async (file: fileListState, ctrl: ReadableStreamDefaultController) => {
  if (file.status === 'success' || file.status === 'downloading') {
    return;
  }
  const name = file.filename;
  const resp = await fetch(file.url);
  // use tee() to split the readable stream into two readable streams
  const [stream1, stream2] = resp.body!.tee();
  const stream = () => stream1; // stream1 feeds the zip; on its own it could not update the file status
  createReadableStreamTee(file, resp, stream2); // stream2 drives the progress updates
  ctrl.enqueue({ name, stream });
};

const createReadableStreamTee = async (file: fileListState, resp: Response, stream: ReadableStream) => {
  let receivedLength = 0;
  file.status = 'downloading';
  file.progress = 0;
  const contentLength = Number(resp.headers.get('content-length'));
  const reader = stream.getReader();
  while (true) {
    const chunk = await reader.read();
    if (chunk.done) {
      file.progress = 100; // download finished, set progress to 100%
      file.status = 'success';
      break;
    }
    receivedLength += chunk.value.length;
    file.progress = Math.round((receivedLength / contentLength) * 100); // update the download progress
  }
};
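The tee() split can also be tried outside the browser with an in-memory stream. In this sketch (all names are illustrative), one branch stands in for the payload the ZIP stream would consume, and the other is drained purely to update progress:

```javascript
// Hypothetical sketch of the tee() pattern above: split a stream, hand one
// branch to the consumer, and drain the other to track progress.
function teeWithProgress(source, total, file) {
  const [payload, monitor] = source.tee();
  const tracking = (async () => {
    file.status = 'downloading';
    file.progress = 0;
    const reader = monitor.getReader();
    let received = 0;
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      received += value.length;
      file.progress = Math.round((received / total) * 100);
    }
    file.status = 'success';
  })();
  return { payload, tracking };
}

async function demoTee() {
  const source = new ReadableStream({
    start(ctrl) {
      ctrl.enqueue(new Uint8Array(3));
      ctrl.enqueue(new Uint8Array(7));
      ctrl.close();
    },
  });
  const file = { status: 'pending', progress: 0 };
  const { payload, tracking } = teeWithProgress(source, 10, file);
  // consume the payload branch, as the ZIP stream would
  const reader = payload.getReader();
  let bytes = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    bytes += value.length;
  }
  await tracking;
  return { bytes, file };
}
```

One caveat of tee(): the two branches share a single source, and chunks are buffered internally until both branches have read them. If the zip stream consumes an entry long after its monitor branch has been drained, that file's data sits in memory in the meantime.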

Closing remarks

In actual use, the axios approach is good enough for most scenarios; the later ones are workarounds devised to keep the per-file status in sync, so how well they work depends on your situation. If you've read this far, you're the best; if this helped you, please like and bookmark.

Follow-up

After first finishing this article I ran into more problems, and in the end I still chose the axios + handleBatchRequest approach to optimize download speed. The main reason is that my batch downloads contain no huge files, only files ranging from a few KB to a few hundred MB, and the browser's memory can absorb fluctuations in that range. One side effect is that the download progress and speed shown by the Edge browser become slightly inaccurate (this does not happen with a concurrency of 1), but the download still completes normally.

There is also the problem of archives exceeding 4 GB. If you dig into the StreamSaver project you will certainly find this zip64 fix; with a few changes, large archives can be compressed and decompressed without header-field errors or corrupted files. Essentially, it makes zip64 the default format instead of mixing it with zip32.
