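415. Java File I/O Basics - Reading a Compressed Sonnet Collection Precisely: Efficiently Extracting a Given Sonnet from a Binary File
📚 Reading a Single Sonnet
In the previous lesson we compressed all 154 sonnets and stored them in a single binary file. In this lesson we look at how to read just one of them (for example, Sonnet 75).
🎯 Core Idea
The reading logic is actually simpler than the writing logic:
- Read the file header: get the total count plus the offset and length of each sonnet.
- Locate the target sonnet: skip the preceding bytes according to its offset.
- Read the compressed byte array.
- Decompress it and decode it into text.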
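The four steps above can be tried end to end on a tiny in-memory archive before touching the real file. What follows is only a sketch under stated assumptions: the class and helper names (TinyArchiveDemo, gzip, buildArchive, readEntry) are made up for illustration, the text is assumed to be UTF-8, and offsets are assumed to be counted from the first byte after the header, which is what the reading code in this lesson implies. Three short strings stand in for the sonnets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TinyArchiveDemo {

    // Compresses one text with GZIP (UTF-8 is an assumption of this sketch).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Assumed layout: count, then (offset, length) per entry, then the compressed data.
    static byte[] buildArchive(List<String> texts) throws IOException {
        List<byte[]> blobs = new ArrayList<>();
        for (String text : texts) {
            blobs.add(gzip(text));
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeInt(blobs.size());
            int offset = 0; // counted from the first byte after the header
            for (byte[] blob : blobs) {
                dos.writeInt(offset);
                dos.writeInt(blob.length);
                offset += blob.length;
            }
            for (byte[] blob : blobs) {
                dos.write(blob);
            }
        }
        return bos.toByteArray();
    }

    // Header, locate, read compressed bytes, decompress and decode (index is 0-based).
    static String readEntry(byte[] archive, int index) throws IOException {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(archive))) {
            int count = dis.readInt();
            int[] offsets = new int[count];
            int[] lengths = new int[count];
            for (int i = 0; i < count; i++) {
                offsets[i] = dis.readInt();
                lengths[i] = dis.readInt();
            }
            if (dis.skipBytes(offsets[index]) != offsets[index]) {
                throw new EOFException("Unexpected end of archive");
            }
            byte[] compressed = new byte[lengths[index]];
            dis.readFully(compressed);
            try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = buildArchive(List.of("first text", "second text", "third text"));
        System.out.println(readEntry(archive, 1)); // prints "second text"
    }
}
```

Because every entry is compressed on its own, reading one entry never requires decompressing the ones in front of it; only the small header has to be read in full.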
✅ Example code (reading the file header)
Path path = Paths.get("files/sonnets.bin");
try (InputStream file = Files.newInputStream(path);
     BufferedInputStream bis = new BufferedInputStream(file);
     // Wrap bis rather than file, so that every read goes through the same buffered stream
     DataInputStream dis = new DataInputStream(bis)) {
    int numberOfSonnets = dis.readInt();
    System.out.println("numberOfSonnets = " + numberOfSonnets);
    List<Integer> offsets = new ArrayList<>();
    List<Integer> lengths = new ArrayList<>();
    for (int i = 0; i < numberOfSonnets; i++) {
        offsets.add(dis.readInt());
        lengths.add(dis.readInt());
    }
    // At this point offsets and lengths are both available
} catch (IOException e) {
    e.printStackTrace();
}
👉 After running it, you will get output similar to:
numberOfSonnets = 154
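In addition, the offsets and lengths lists now hold the position information of every sonnet.
🔍 Utility methods for skipping and reading bytes
⚠️ Note:
- On a stream, skip(n) and read(bytes, off, len) may not finish the job in a single call (especially with large files): each call may skip or read fewer bytes than requested.
- So we write utility methods that make sure the requested number of bytes really is skipped or read.
Skipping a fixed number of bytes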
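static long skip(BufferedInputStream bis, int offset) throws IOException {
    long skipped = 0L;
    while (skipped < offset) {
        long n = bis.skip(offset - skipped);
        if (n == 0) {
            // skip() may return 0 before the end of the stream; read one byte to tell the cases apart
            if (bis.read() == -1) throw new EOFException("Unexpected end of file");
            n = 1;
        }
        skipped += n;
    }
    return skipped;
}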
Reading a fixed number of bytes
static byte[] readBytes(BufferedInputStream bis, int length) throws IOException {
    byte[] bytes = new byte[length];
    int copied = 0;
    while (copied < length) {
        int read = bis.read(bytes, copied, length - copied);
        if (read == -1) throw new EOFException("Unexpected end of file");
        copied += read;
    }
    return bytes;
}
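💡 I simplified the original code slightly: there is no intermediate buffer any more, and the bytes are read straight into the target array, which makes the logic easier to follow.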
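On recent JDKs the same guarantees are also available from InputStream itself: skipNBytes (Java 12+) and readNBytes(int) (Java 11+). The sketch below shows how the two hand-written helpers could be replaced; the class name ExactReadDemo and the method name slice are made up for illustration, and a ByteArrayInputStream stands in for the real file stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ExactReadDemo {

    // Skips exactly n bytes, then reads exactly len bytes, or throws EOFException.
    static byte[] slice(InputStream in, int n, int len) throws IOException {
        in.skipNBytes(n);                  // throws EOFException if the stream ends early
        byte[] bytes = in.readNBytes(len); // returns fewer bytes only at end of stream
        if (bytes.length < len) {
            throw new EOFException("Unexpected end of file");
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30, 40, 50, 60};
        byte[] part = slice(new ByteArrayInputStream(data), 2, 3);
        System.out.println(Arrays.toString(part)); // prints [30, 40, 50]
    }
}
```

Writing the loops by hand, as in this lesson, is still useful for seeing why a single skip() or read() call is not enough.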
✅ Example code (reading Sonnet 75)
// Runs inside the try block of the header example, right after the header has been read
int sonnetIndex = 75; // number of the sonnet to read (1-based)
int offset = offsets.get(sonnetIndex - 1);
int length = lengths.get(sonnetIndex - 1);
skip(bis, offset); // move to the start of the target sonnet
byte[] bytes = readBytes(bis, length); // read the compressed byte array
try (ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
     GZIPInputStream gzis = new GZIPInputStream(bais);
     InputStreamReader isr = new InputStreamReader(gzis);
     BufferedReader reader = new BufferedReader(isr)) {
    List<String> sonnetLines = reader.lines().toList();
    sonnetLines.forEach(System.out::println);
}
🚀 Output
After running it, you will see Sonnet 75 (the decompressed text) on the console:
So are you to my thoughts as food to life,
Or as sweet-season’d showers are to the ground;
And for the peace of you I hold such strife
As ’twixt a miser and his wealth is found.
Now proud as an enjoyer, and anon
Doubting the filching age will steal his treasure;
Now counting best to be with you alone,
Then better’d that the world may see my pleasure:
Sometime all full with feasting on your sight,
And by and by clean starved for a look;
Possessing or pursuing no delight,
Save what is had, or must from you be took.
Thus do I pine and surfeit day by day,
Or gluttoning on all, or all away.
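📌 Summary
- Read the file header → get the directory table (offset & length)
- Use skip() to skip exactly the right number of bytes
- Use readBytes() to guarantee a complete read
- Decompress and convert to lines of text
- Finally, print the target sonnet
That is the complete process of locating and reading a single sonnet from a compressed binary file.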
As a next step, the logic for reading a single sonnet can be wrapped in a SonnetFileReader utility class, so that learners can call it like an API, for example:
SonnetFileReader sfr = new SonnetFileReader("files/sonnets.bin");
List<String> sonnet75 = sfr.readSonnet(75);
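One possible shape for such a class is sketched below. It is not part of the lesson's code: the body is an assumption built from the steps in this lesson, it assumes that offsets are counted from the first byte after the header and that the text was written as UTF-8, and it relies on skipNBytes (Java 12+) and Stream.toList() (Java 16+).

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: the name and usage come from the two lines above, the body is a sketch.
class SonnetFileReader {
    private final Path path;

    SonnetFileReader(String fileName) {
        this.path = Path.of(fileName);
    }

    // Returns the lines of the sonnet with the given 1-based number.
    List<String> readSonnet(int sonnetNumber) throws IOException {
        try (InputStream file = Files.newInputStream(path);
             BufferedInputStream bis = new BufferedInputStream(file);
             DataInputStream dis = new DataInputStream(bis)) {

            int numberOfSonnets = dis.readInt();
            if (sonnetNumber < 1 || sonnetNumber > numberOfSonnets) {
                throw new IllegalArgumentException("No sonnet number " + sonnetNumber);
            }
            // Read the whole header, remembering only the entry we need.
            int offset = 0;
            int length = 0;
            for (int i = 1; i <= numberOfSonnets; i++) {
                int currentOffset = dis.readInt();
                int currentLength = dis.readInt();
                if (i == sonnetNumber) {
                    offset = currentOffset;
                    length = currentLength;
                }
            }
            // Assumption: offsets are counted from the first byte after the header.
            dis.skipNBytes(offset);
            byte[] compressed = new byte[length];
            dis.readFully(compressed);

            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(new ByteArrayInputStream(compressed)),
                    StandardCharsets.UTF_8))) {
                return reader.lines().toList();
            }
        }
    }
}
```

This version reopens the file on every call, which keeps the class stateless; caching the header in the constructor would be a reasonable alternative if many sonnets are read in a row.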