Large File Upload
Pain points of uploading large files
- The upload takes a long time
- If anything goes wrong midway, the whole upload has to start over
- Services usually impose a limit on file size
The solution to these problems: chunked upload.
Reading the File
import './App.css';
import Upload from './components/Upload';
import type { ChangeEvent } from 'react';

function App() {
  const handleFileChange = (e: ChangeEvent<HTMLInputElement>) => {
    console.log('File changed:');
    console.log(e.target.files);
  };
  return (
    <div className="App">
      <Upload />
      <input type="file" onChange={handleFileChange} />
    </div>
  );
}

export default App;
File Chunking
The core is the Blob object's slice method. The file obtained in the previous step is a File object, which inherits from Blob, so slice can be used to split the file.
Usage reference: Blob.slice() method - Web APIs | MDN
slice()
slice(start)
slice(start, end)
slice(start, end, contentType)
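For instance, slicing a small Blob extracts the byte range [start, end). Node 18+ exposes Blob globally, so the behavior is easy to verify outside the browser as well:

```javascript
// Blob.slice(start, end) copies the byte range [start, end) into a new Blob.
// Node 18+ exposes Blob globally, so this also runs outside the browser.
const blob = new Blob(["hello world"]);
const part = blob.slice(0, 5);

console.log(blob.size); // 11
console.log(part.size); // 5
part.text().then((text) => console.log(text)); // "hello"
```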
// chunk size
const CHUNK_SIZE = 1024 * 1024; // 1MB

// split the file into chunks
const createFileChunk = (file: File) => {
  const chunks: Blob[] = [];
  let cur = 0;
  while (cur < file.size) {
    const chunk = file.slice(cur, cur + CHUNK_SIZE);
    chunks.push(chunk);
    cur += CHUNK_SIZE;
  }
  return chunks;
};
Hash Calculation
// calculate the file hash
const calculateHash = (chunks: Blob[]) => {
  return new Promise((resolve) => {
    // 1. the first and last chunks are hashed in full
    // 2. for middle chunks, only the first 2 bytes, the middle 2 bytes,
    //    and the last 2 bytes take part in the calculation
    // all chunks (or samples) that take part in the calculation
    const targetChunks: Blob[] = [];
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();
    chunks.forEach((chunk, index) => {
      if (index === 0 || index === chunks.length - 1) {
        // first and last chunk in full
        targetChunks.push(chunk);
      } else {
        // middle chunks: 2 bytes from the front, middle, and back
        const firstChunks = chunk.slice(0, 2);
        const middleChunks = chunk.slice(CHUNK_SIZE / 2, CHUNK_SIZE / 2 + 2);
        const lastChunks = chunk.slice(CHUNK_SIZE - 2, CHUNK_SIZE);
        targetChunks.push(firstChunks, middleChunks, lastChunks);
      }
    });
    fileReader.readAsArrayBuffer(new Blob(targetChunks));
    fileReader.onload = (e) => {
      spark.append(e.target?.result as ArrayBuffer);
      resolve(spark.end());
    };
  });
};

const hash = await calculateHash(chunks);
console.log(hash);
File Upload
Suppose we upload a 1 GB file with 1 MB chunks: that makes 1024 chunks in total, and the browser should not fire off that many requests at once. Taking Chrome as an example, the default limit is 6 concurrent requests per origin, so creating more requests does not speed up the upload; it only puts a heavy burden on the browser. The frontend therefore needs to cap the number of in-flight requests.
How do we solve this?
Create requests up to a maximum concurrency, say 6: at any moment the browser is allowed only 6 in-flight requests, and as soon as one of them returns a result, the next request is launched, and so on until every request has been sent.
When uploading a file we also need a FormData object: the chunk itself and its accompanying metadata are placed into this FormData object.
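The concurrency scheme described above can be sketched as a small generic helper (`runWithConcurrency` is a hypothetical name, not part of the article's code): each task removes itself from the pool when it settles, and `Promise.race` blocks the loop whenever the pool is full.

```javascript
// Run async task factories with at most `limit` tasks in flight at once.
async function runWithConcurrency(taskFactories, limit) {
  const pool = [];    // currently in-flight promises
  const results = []; // one promise per task, in submission order
  for (const factory of taskFactories) {
    const p = factory().then((value) => {
      pool.splice(pool.indexOf(p), 1); // free this slot on completion
      return value;
    });
    pool.push(p);
    results.push(p);
    if (pool.length >= limit) {
      await Promise.race(pool); // wait until at least one slot frees up
    }
  }
  return Promise.all(results);
}
```

For the uploader this would be called roughly as `runWithConcurrency(formDataList.map((fd) => () => fetch('/upload', { method: 'POST', body: fd })), 6)`.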
Frontend Implementation
// upload the chunks
const uploadChunks = async (chunks: Blob[]) => {
  const data = chunks.map((chunk, index) => {
    return {
      fileName,
      chunk,
      chunkHash: `${fileHash}-${index}`,
      fileHash,
    };
  });
  const formData = data.map((item) => {
    const _formData = new FormData();
    _formData.append("fileName", item.fileName);
    _formData.append("chunk", item.chunk);
    _formData.append("chunkHash", item.chunkHash);
    _formData.append("fileHash", item.fileHash);
    return _formData;
  });
  const MAX_CONCURRENT = 6; // maximum number of concurrent requests
  let index = 0;
  const taskPool: Promise<void>[] = [];
  while (index < formData.length) {
    const task: Promise<void> = fetch('/upload', {
      method: 'POST',
      body: formData[index],
    }).then(() => {
      // once this request settles, free its slot in the pool
      taskPool.splice(taskPool.indexOf(task), 1);
    });
    taskPool.push(task);
    if (taskPool.length === MAX_CONCURRENT) {
      // wait until at least one in-flight request finishes
      await Promise.race(taskPool);
    }
    index++;
  }
  // wait for the remaining tasks (fewer than MAX_CONCURRENT)
  await Promise.all(taskPool);
};
Server Implementation
The backend uses the multiparty package: a Node.js library dedicated to parsing multipart/form-data payloads in HTTP requests.
When handling each uploaded chunk, the server should first store it in a temporary location so it can be read back later for merging. To distinguish the chunks of different files, the file's hash is used to name a directory, and all chunks of that file are stored inside it.
const express = require('express');
const cors = require('cors');
const multiparty = require('multiparty');
const path = require('path');
const fse = require('fs-extra');

const app = express();
app.use(express.json()); // parse JSON bodies (needed by /merge and /verify later)
app.use(cors());

const UPLOAD_DIR = path.resolve(__dirname, 'uploads');

app.post('/upload', (req, res) => {
  const form = new multiparty.Form();
  form.parse(req, async (err, fields, files) => {
    if (err) {
      return res.status(500).send('Error parsing form data');
    }
    console.log('Fields:', fields);
    console.log('Files:', files);
    // temporary directory for this file's chunks
    const fileHash = fields['fileHash'][0];
    const chunkHash = fields['chunkHash'][0];
    if (!fse.existsSync(UPLOAD_DIR)) {
      await fse.mkdir(UPLOAD_DIR);
    }
    const chunkPath = path.resolve(UPLOAD_DIR, fileHash);
    if (!fse.existsSync(chunkPath)) {
      await fse.mkdir(chunkPath);
    }
    const oldPath = files.chunk[0].path;
    console.log('Old Path:', oldPath);
    await fse.move(oldPath, path.resolve(chunkPath, chunkHash), { overwrite: true });
    res.status(200).send('Files uploaded successfully');
  });
});

app.listen(3000, () => {
  console.log('Server is running on http://localhost:3000');
});
Merging the File
Once every chunk has been uploaded to the server, all of the chunks need to be merged into one complete file.
Frontend Implementation
The frontend sends a merge request to the server; to identify which file to merge, the file's hash is passed along.
// merge request
const mergeRequest = () => {
  fetch("http://localhost:3000/merge", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      fileName: fileName.current,
      fileHash: fileHash.current,
      size: CHUNK_SIZE,
    }),
  }).then(() => {
    alert("Merge succeeded");
  });
};
Server Implementation
// extract the file extension, e.g. "video.mp4" -> ".mp4"
const fileExtension = (fileName) => {
  return fileName.slice(fileName.lastIndexOf("."));
};

app.post("/merge", async (req, res) => {
  const { fileName, fileHash, size } = req.body;
  // directory holding this file's chunks
  const chunkDir = path.resolve(UPLOAD_DIR, fileHash);
  // if the chunk directory does not exist, return an error straight away
  if (!fse.existsSync(chunkDir)) {
    return res.status(400).json({
      code: 400,
      message: "Merge failed, please upload again",
    });
  }
  // if the merged file already exists, there is nothing to do
  const filePath = path.resolve(UPLOAD_DIR, fileHash + fileExtension(fileName));
  if (fse.existsSync(filePath)) {
    res.status(200).json({
      code: 200,
      message: "Merge succeeded",
    });
    return;
  }
  // otherwise, merge the chunks
  const chunkPaths = await fse.readdir(chunkDir);
  // sort the chunk files by their index suffix
  chunkPaths.sort((a, b) => {
    return Number(a.split("-")[1]) - Number(b.split("-")[1]);
  });
  const list = chunkPaths.map((chunkName, index) => {
    return new Promise((resolve) => {
      const chunkPath = path.resolve(chunkDir, chunkName);
      const readStream = fse.createReadStream(chunkPath);
      // write each chunk at its own byte offset in the target file
      const writeStream = fse.createWriteStream(filePath, {
        start: index * size,
      });
      readStream.pipe(writeStream);
      readStream.on("end", async () => {
        await fse.unlink(chunkPath);
        resolve();
      });
    });
  });
  await Promise.all(list);
  await fse.remove(chunkDir);
  res.status(200).send("Merge request received");
});
At this point the basic large-file upload works, but it does not yet handle uploading the same file twice, and if the network drops partway through, every chunk has to be uploaded again. Solving these two problems requires instant upload and resumable upload.
Instant Upload & Resumable Upload
Hashing the same file always produces the same hash, and the server names an uploaded file after its hash. So before uploading we can check whether the file already exists on the server; if it does, there is no need to upload it again, and we simply tell the user the upload succeeded. To the user it feels like the upload finished instantly.
Frontend Implementation
Before uploading, the frontend sends the file's hash to the server. The server checks whether a matching file exists; if it does, it responds directly and the chunk upload is skipped entirely.
// check whether the file has been uploaded before
const verify = () => {
  return fetch("http://localhost:3000/verify", {
    method: "POST",
    headers: {
      "content-type": "application/json",
    },
    body: JSON.stringify({
      fileHash: fileHash.current,
      fileName: fileName.current,
    }),
  })
    .then((res) => res.json())
    .then((data) => {
      return data;
    });
};
Server Implementation
app.post("/verify", async (req, res) => {
  const { fileHash, fileName } = req.body;
  const filePath = path.resolve(UPLOAD_DIR, fileHash + fileExtension(fileName));
  if (fse.existsSync(filePath)) {
    // the file already exists: tell the frontend not to upload
    return res.status(200).json({
      ok: true,
      data: {
        shouldUpload: false,
      },
    });
  } else {
    // the file does not exist: tell the frontend to upload
    return res.status(200).json({
      ok: true,
      data: {
        shouldUpload: true,
      },
    });
  }
});
Resumable Upload
With the steps above in place, re-uploading the same file, even under a different name, immediately reports success, because the server already has the file. That solves the duplicate-upload problem.
But what if the network is interrupted in the middle of an upload?
If some chunks were already uploaded before the connection dropped, then before uploading again we only need to fetch the list of those chunks and filter them out, so they are not uploaded twice. In other words, only the chunks that never made it to the server need to be uploaded.
Frontend Implementation
// upload the chunks
const uploadChunks = async (chunks: Blob[], existsChunks: string[]) => {
  // ...
  // filter out chunks that already exist on the server
  const formData = data
    .filter((item) => !existsChunks.includes(item.chunkHash))
    .map((item) => {
      const _formData = new FormData();
      _formData.append("fileName", item.fileName);
      _formData.append("chunk", item.chunk);
      _formData.append("chunkHash", item.chunkHash);
      _formData.append("fileHash", item.fileHash);
      return _formData;
    });
  // ...
};

const handleFileChange = async (e: ChangeEvent<HTMLInputElement>) => {
  // ...
  // upload the chunks
  uploadChunks(chunks, data.data.existsChunks);
};
Server Implementation
app.post("/verify", async (req, res) => {
  const { fileHash, fileName } = req.body;
  const filePath = path.resolve(UPLOAD_DIR, fileHash + fileExtension(fileName));
  // chunks that have already been uploaded successfully
  const chunkDir = path.resolve(UPLOAD_DIR, fileHash);
  let chunkPaths = [];
  if (fse.existsSync(chunkDir)) {
    chunkPaths = await fse.readdir(chunkDir);
    console.log("chunkPaths:", chunkPaths);
  }
  if (fse.existsSync(filePath)) {
    // the file already exists: tell the frontend not to upload
    return res.status(200).json({
      ok: true,
      data: {
        shouldUpload: false,
      },
    });
  } else {
    // the file does not exist: tell the frontend to upload,
    // along with the chunks the server already has
    return res.status(200).json({
      ok: true,
      data: {
        shouldUpload: true,
        existsChunks: chunkPaths,
      },
    });
  }
});
You can see that the first chunk has already been uploaded successfully.
After filtering, only the chunks that do not yet exist on the server need to be uploaded.
Full Code
Frontend
import SparkMD5 from "spark-md5";
import { useRef } from "react";
import type { ChangeEvent } from "react";

function Upload() {
  // file name
  const fileName = useRef<string>("");
  // file hash
  const fileHash = useRef<string>("");
  // chunk size
  const CHUNK_SIZE = 1024 * 1024; // 1MB

  // split the file into chunks
  const createFileChunk = (file: File) => {
    const chunks: Blob[] = [];
    let cur = 0;
    while (cur < file.size) {
      const chunk = file.slice(cur, cur + CHUNK_SIZE);
      chunks.push(chunk);
      cur += CHUNK_SIZE;
    }
    return chunks;
  };

  // calculate the file hash
  const calculateHash = (chunks: Blob[]) => {
    return new Promise((resolve) => {
      // 1. the first and last chunks are hashed in full
      // 2. for middle chunks, only the first 2 bytes, the middle 2 bytes,
      //    and the last 2 bytes take part in the calculation
      // all chunks (or samples) that take part in the calculation
      const targetChunks: Blob[] = [];
      const spark = new SparkMD5.ArrayBuffer();
      const fileReader = new FileReader();
      chunks.forEach((chunk, index) => {
        if (index === 0 || index === chunks.length - 1) {
          // first and last chunk in full
          targetChunks.push(chunk);
        } else {
          // middle chunks: 2 bytes from the front, middle, and back
          const firstChunks = chunk.slice(0, 2);
          const middleChunks = chunk.slice(CHUNK_SIZE / 2, CHUNK_SIZE / 2 + 2);
          const lastChunks = chunk.slice(CHUNK_SIZE - 2, CHUNK_SIZE);
          targetChunks.push(firstChunks, middleChunks, lastChunks);
        }
      });
      fileReader.readAsArrayBuffer(new Blob(targetChunks));
      fileReader.onload = (e) => {
        spark.append(e.target?.result as ArrayBuffer);
        resolve(spark.end());
      };
    });
  };

  // merge request
  const mergeRequest = () => {
    fetch("http://localhost:3000/merge", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        fileName: fileName.current,
        fileHash: fileHash.current,
        size: CHUNK_SIZE,
      }),
    }).then(() => {
      alert("Merge succeeded");
    });
  };

  // upload the chunks
  const uploadChunks = async (chunks: Blob[], existsChunks: string[]) => {
    const data = chunks.map((chunk, index) => {
      return {
        fileName: fileName.current,
        chunk,
        chunkHash: `${fileHash.current}-${index}`,
        fileHash: fileHash.current,
      };
    });
    // filter out chunks that already exist on the server
    const formData = data
      .filter((item) => !existsChunks.includes(item.chunkHash))
      .map((item) => {
        const _formData = new FormData();
        _formData.append("fileName", item.fileName);
        _formData.append("chunk", item.chunk);
        _formData.append("chunkHash", item.chunkHash);
        _formData.append("fileHash", item.fileHash);
        return _formData;
      });
    const MAX_CONCURRENT = 6; // maximum number of concurrent requests
    let index = 0;
    const taskPool: Promise<void>[] = [];
    while (index < formData.length) {
      const task: Promise<void> = fetch("http://localhost:3000/upload", {
        method: "POST",
        body: formData[index],
      }).then(() => {
        // once this request settles, free its slot in the pool
        taskPool.splice(taskPool.indexOf(task), 1);
      });
      taskPool.push(task);
      if (taskPool.length === MAX_CONCURRENT) {
        // wait until at least one in-flight request finishes
        await Promise.race(taskPool);
      }
      index++;
    }
    // wait for the remaining tasks (fewer than MAX_CONCURRENT)
    await Promise.all(taskPool);
    mergeRequest();
  };

  // check whether the file has been uploaded before
  const verify = () => {
    return fetch("http://localhost:3000/verify", {
      method: "POST",
      headers: {
        "content-type": "application/json",
      },
      body: JSON.stringify({
        fileHash: fileHash.current,
        fileName: fileName.current,
      }),
    })
      .then((res) => res.json())
      .then((data) => {
        return data;
      });
  };

  const handleFileChange = async (e: ChangeEvent<HTMLInputElement>) => {
    if (!e.target.files || e.target.files.length === 0) return;
    // 1. read the file
    const file = e.target.files[0];
    fileName.current = file.name;
    // 2. split it into chunks
    const chunks = createFileChunk(file);
    // 3. calculate the hash
    const hash = await calculateHash(chunks);
    fileHash.current = hash as string;
    // check whether it has been uploaded before
    const data = await verify();
    console.log("data", data);
    if (!data.data.shouldUpload) {
      alert("This file has already been uploaded");
      return;
    }
    // upload the chunks
    uploadChunks(chunks, data.data.existsChunks);
  };

  return (
    <div>
      <input type="file" onChange={handleFileChange} />
    </div>
  );
}

export default Upload;
Server
const express = require("express");
const cors = require("cors");
const multiparty = require("multiparty");
const path = require("path");
const fse = require("fs-extra");

const app = express();
app.use(express.json());
app.use(cors());

const UPLOAD_DIR = path.resolve(__dirname, "uploads");

// extract the file extension, e.g. "video.mp4" -> ".mp4"
const fileExtension = (fileName) => {
  return fileName.slice(fileName.lastIndexOf("."));
};

app.post(`/upload`, (req, res) => {
  const form = new multiparty.Form();
  form.parse(req, async (err, fields, files) => {
    if (err) {
      return res.status(500).send("Error parsing form data");
    }
    // temporary directory for this file's chunks
    const fileHash = fields["fileHash"][0];
    const chunkHash = fields["chunkHash"][0];
    if (!fse.existsSync(UPLOAD_DIR)) {
      await fse.mkdir(UPLOAD_DIR);
    }
    const chunkPath = path.resolve(UPLOAD_DIR, fileHash);
    if (!fse.existsSync(chunkPath)) {
      await fse.mkdir(chunkPath);
    }
    const oldPath = files.chunk[0].path;
    await fse.move(oldPath, path.resolve(chunkPath, chunkHash), {
      overwrite: true,
    });
    res.status(200).json({
      code: 200,
      message: "Chunk uploaded successfully",
    });
  });
});

app.post("/merge", async (req, res) => {
  const { fileName, fileHash, size } = req.body;
  // directory holding this file's chunks
  const chunkDir = path.resolve(UPLOAD_DIR, fileHash);
  // if the chunk directory does not exist, return an error straight away
  if (!fse.existsSync(chunkDir)) {
    return res.status(400).json({
      code: 400,
      message: "Merge failed, please upload again",
    });
  }
  // if the merged file already exists, there is nothing to do
  const filePath = path.resolve(UPLOAD_DIR, fileHash + fileExtension(fileName));
  if (fse.existsSync(filePath)) {
    res.status(200).json({
      code: 200,
      message: "Merge succeeded",
    });
    return;
  }
  // otherwise, merge the chunks
  const chunkPaths = await fse.readdir(chunkDir);
  // sort the chunk files by their index suffix
  chunkPaths.sort((a, b) => {
    return Number(a.split("-")[1]) - Number(b.split("-")[1]);
  });
  const list = chunkPaths.map((chunkName, index) => {
    return new Promise((resolve) => {
      const chunkPath = path.resolve(chunkDir, chunkName);
      const readStream = fse.createReadStream(chunkPath);
      // write each chunk at its own byte offset in the target file
      const writeStream = fse.createWriteStream(filePath, {
        start: index * size,
      });
      readStream.pipe(writeStream);
      readStream.on("end", async () => {
        await fse.unlink(chunkPath);
        resolve();
      });
    });
  });
  await Promise.all(list);
  await fse.remove(chunkDir);
  res.status(200).send("Merge request received");
});

app.post("/verify", async (req, res) => {
  const { fileHash, fileName } = req.body;
  const filePath = path.resolve(UPLOAD_DIR, fileHash + fileExtension(fileName));
  // chunks that have already been uploaded successfully
  const chunkDir = path.resolve(UPLOAD_DIR, fileHash);
  let chunkPaths = [];
  if (fse.existsSync(chunkDir)) {
    chunkPaths = await fse.readdir(chunkDir);
    console.log("chunkPaths:", chunkPaths);
  }
  if (fse.existsSync(filePath)) {
    // the file already exists: tell the frontend not to upload
    return res.status(200).json({
      ok: true,
      data: {
        shouldUpload: false,
      },
    });
  } else {
    // the file does not exist: tell the frontend to upload,
    // along with the chunks the server already has
    return res.status(200).json({
      ok: true,
      data: {
        shouldUpload: true,
        existsChunks: chunkPaths,
      },
    });
  }
});

app.listen(3000, () => {
  console.log("Server is running on http://localhost:3000");
});