Exploring Approaches to Fast Updates of Large Text on Web Pages



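I am currently building CRUD APIs for large text on a web page, and my team lead asked me to research ways to implement fast updates or auto-save. I looked into two directions: how version control software updates files, and how some blog and article pages implement timed auto-save.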

Starting from git and svn

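Version control naturally brings to mind git and svn as the typical representatives. I use git all the time, including running git diff to compare text when merging, but git and svn store the changes between file versions in different ways.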

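  • git only cares whether a file's content as a whole has changed; it does not store the difference between the old and new versions. In other words, git diff only inspects the differences between files and its result is not saved; the full content of the changed file is stored instead. If a file's content is unchanged, git simply adds a pointer to the previously stored, unmodified version. (This describes the object model; packfiles may still delta-compress objects internally.)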

[Figure: 18333fig0105-tn.png]

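  • svn is an incremental version control system: it does not keep a complete copy of every version, only the differences between versions (similar to the output of git diff), and it applies those differences in order to update to or restore a specific version. This keeps server-side storage very low.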

[Figure: 18333fig0104-tn.png]

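  • One of the algorithms behind git diff is the Myers algorithm. Look it up if you are interested in how it works; I will not go into it here. What we mainly use is java-diff-utils, one of the libraries that implements diff based on this principle.
Add the following dependencies to pom.xml: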
<dependencies>
    <dependency>
        <groupId>io.github.java-diff-utils</groupId>
        <artifactId>java-diff-utils</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
</dependencies>
import com.github.difflib.DiffUtils;
import com.github.difflib.UnifiedDiffUtils;
import com.github.difflib.patch.Patch;
import com.github.difflib.patch.PatchFailedException;
import org.apache.commons.io.FileUtils;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

/**
 * @author 
 * @description: Tests java-diff-utils
 * @date 
 */
public class test_git_diff {

    public static void main(String[] args) throws IOException, PatchFailedException {
        List<String> text1 = Files.readAllLines(new File("D:\\test1.txt").toPath());
        List<String> text2 = Files.readAllLines(new File("D:\\test2.txt").toPath());
        List<String> diff = printDiff(text1, text2);
        mergeText(text1, diff);
    }

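    /**
     * Prints the differences between two files, writes them to a diff file and returns them.
     * @param original lines of the original file
     * @param revised lines of the revised file
     * @return the differences in unified diff format
     */
    private static List<String> printDiff(List<String> original, List<String> revised) throws IOException {
        // Compute the differences between the two files
        Patch<String> patch = DiffUtils.diff(original, revised);

        // Generate the unified diff format (0 lines of context around each change)
        List<String> unifiedDiff = UnifiedDiffUtils.generateUnifiedDiff("test1.txt", "test2.txt", original, patch, 0);

        unifiedDiff.forEach(System.out::println);

        // Write the diff to disk using the platform line separator
        String fileName = "D:\\diff_test1_test2.txt";
        FileUtils.writeLines(new File(fileName), StandardCharsets.UTF_8.name(), unifiedDiff, System.lineSeparator(), false);

        return unifiedDiff;
    }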

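    /**
     * Applies the diff to text1, which is equivalent to regenerating text2.
     * @param text1 lines of the original file
     * @param unifiedDiff the differences in unified diff format
     * @throws IOException if the result file cannot be written
     * @throws PatchFailedException if the patch does not apply to text1
     */
    private static void mergeText(List<String> text1, List<String> unifiedDiff) throws IOException, PatchFailedException {
        // Import the unified diff into a patch (here from memory; it could also be read from a file)
        Patch<String> importedPatch = UnifiedDiffUtils.parseUnifiedDiff(unifiedDiff);

        // Apply the patch to the original text (comparable to applying a patch with git apply)
        List<String> patchedText = DiffUtils.patch(text1, importedPatch);

        // Write the patched text to disk using the platform line separator
        String fileName = "D:\\test3.txt";
        FileUtils.writeLines(new File(fileName), StandardCharsets.UTF_8.name(), patchedText, System.lineSeparator(), false);

    }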

}

The input files test1.txt and test2.txt are shown below; test2.txt was produced from test1.txt by inserting and modifying some text:

--------------------test1.txt------------------------------------
 Full-text searching is performed using MATCH() AGAINST() syntax.
 MATCH() takes a comma-separated list that names the columns to be searched.
 AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. 
The search string must be a string value that is constant during query evaluation. 
This rules out, for example, a table column because that can differ for each row.

Previously, MySQL permitted the use of a rollup column with MATCH(), 
but queries employing this construct performed poorly and with unreliable results.
 (This is due to the fact that MATCH() is not implemented as a function of its arguments,
 but rather as a function of the row ID of the current row in the underlying scan of the base table.) 
As of MySQL 8.0.28, MySQL no longer allows such queries; more specifically, any query matching 
all of the criteria listed here is rejected with ER_FULLTEXT_WITH_ROLLUP: 
---------------------------------------------------------------

--------------------test2.txt----------------------------------
 Full-text searching is performed using MATCH() AGAINST() syntax.
 MATCH() takes a comma-separated list that names the columns to be searched.
 AGAINST takes a string to (insert1)search for, and an optional modifier that indicates what type of search to perform. 
The search string must be a string value that is constant during query evaluation. 
This rules out, for example, a table column because that can differ for each row.

insert2
insert3
Previously, MySQL permitted the use of a rollup column with (delete1), 
but queries employing this construct performed poorly and with unreliable results.
 (This is due to the fact that MATCH() is not implemented as a function of its arguments,
 but rather as a function of the row ID of the current row in the underlying scan of the base table.) 
As of MySQL 8.0.28, MySQL no longer allows such queries; more specifically, any query matching 
all of the criteria listed here is rejected with ER_FULLTEXT_WITH_ROLLUP: 
---------------------------------------------------------------

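The resulting diff_test1_test2.txt is shown below; it looks much like the output of git diff. The hunk header @@ -3,1 +3,1 @@ means the hunk starts at line 3 and spans 1 line in test1.txt, and starts at line 3 and spans 1 line in test2.txt; in other words, line 3 was modified.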

--- test1.txt
+++ test2.txt
@@ -3,1 +3,1 @@
- AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. 
+ AGAINST takes a string to (insert1)search for, and an optional modifier that indicates what type of search to perform. 
@@ -7,1 +7,3 @@
-Previously, MySQL permitted the use of a rollup column with MATCH(), 
+insert2
+insert3
+Previously, MySQL permitted the use of a rollup column with (delete1), 
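To make the hunk header notation concrete, here is a minimal sketch that parses a header such as @@ -7,1 +7,3 @@ using only the Java standard library. The class name HunkHeader and its field names are hypothetical and chosen for illustration; they are not part of java-diff-utils.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of java-diff-utils.
class HunkHeader {
    // Matches "@@ -start,count +start,count @@". In the unified format the
    // ",count" part may be omitted, in which case the count is 1.
    private static final Pattern HEADER =
            Pattern.compile("^@@ -(\\d+)(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@");

    final int originalStart;
    final int originalCount;
    final int revisedStart;
    final int revisedCount;

    private HunkHeader(int originalStart, int originalCount, int revisedStart, int revisedCount) {
        this.originalStart = originalStart;
        this.originalCount = originalCount;
        this.revisedStart = revisedStart;
        this.revisedCount = revisedCount;
    }

    /** Parses one hunk header line, or throws if the line is not a hunk header. */
    static HunkHeader parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a hunk header: " + line);
        }
        return new HunkHeader(
                Integer.parseInt(m.group(1)),
                m.group(2) == null ? 1 : Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4) == null ? 1 : Integer.parseInt(m.group(4)));
    }

    public static void main(String[] args) {
        HunkHeader h = parse("@@ -7,1 +7,3 @@");
        System.out.println("original: starts at line " + h.originalStart + ", " + h.originalCount + " line(s)");
        System.out.println("revised: starts at line " + h.revisedStart + ", " + h.revisedCount + " line(s)");
    }
}
```

For the second hunk above, this reports that 1 line starting at line 7 of test1.txt was replaced by 3 lines starting at line 7 of test2.txt.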

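I will not paste test3.txt, the file produced by applying the patch; its content is identical to test2.txt. This confirms that it is feasible to use java-diff-utils for incremental storage and to obtain the latest version of a document by applying patches.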
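The incremental storage idea can be sketched as a store that keeps one base version plus an ordered chain of deltas and rebuilds any version by replaying them. To stay free of dependencies, this sketch does not use java-diff-utils or the Myers algorithm: it swaps in a much cruder delta that only trims the common leading and trailing lines and stores the middle. DeltaVersionStore, Delta, commit and checkout are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: base version + chain of deltas, replayed in order.
class DeltaVersionStore {
    /** Keep `prefix` leading and `suffix` trailing lines, replace everything in between. */
    static final class Delta {
        final int prefix;
        final int suffix;
        final List<String> replacement;

        Delta(int prefix, int suffix, List<String> replacement) {
            this.prefix = prefix;
            this.suffix = suffix;
            this.replacement = replacement;
        }
    }

    private final List<String> base;
    private final List<Delta> deltas = new ArrayList<>();
    private List<String> latest;

    DeltaVersionStore(List<String> base) {
        this.base = new ArrayList<>(base);
        this.latest = this.base;
    }

    /** Records a new version by storing only its difference from the latest one. */
    void commit(List<String> revised) {
        deltas.add(diff(latest, revised));
        latest = new ArrayList<>(revised);
    }

    /** Rebuilds version n (0 is the base) by replaying the first n deltas in order. */
    List<String> checkout(int version) {
        if (version < 0 || version > deltas.size()) {
            throw new IllegalArgumentException("Unknown version: " + version);
        }
        List<String> text = base;
        for (int i = 0; i < version; i++) {
            text = apply(text, deltas.get(i));
        }
        return new ArrayList<>(text);
    }

    /** Crude delta: trims the common prefix and suffix (a stand-in for a real diff algorithm). */
    static Delta diff(List<String> original, List<String> revised) {
        int max = Math.min(original.size(), revised.size());
        int prefix = 0;
        while (prefix < max && original.get(prefix).equals(revised.get(prefix))) {
            prefix++;
        }
        int suffix = 0;
        while (suffix < max - prefix
                && original.get(original.size() - 1 - suffix).equals(revised.get(revised.size() - 1 - suffix))) {
            suffix++;
        }
        return new Delta(prefix, suffix, new ArrayList<>(revised.subList(prefix, revised.size() - suffix)));
    }

    static List<String> apply(List<String> original, Delta d) {
        List<String> result = new ArrayList<>(original.subList(0, d.prefix));
        result.addAll(d.replacement);
        result.addAll(original.subList(original.size() - d.suffix, original.size()));
        return result;
    }

    public static void main(String[] args) {
        DeltaVersionStore store = new DeltaVersionStore(Arrays.asList("a", "b", "c", "d"));
        store.commit(Arrays.asList("a", "b2", "c", "d"));
        store.commit(Arrays.asList("a", "b2", "x", "y", "c", "d"));
        System.out.println(store.checkout(1)); // prints [a, b2, c, d]
        System.out.println(store.checkout(2)); // prints [a, b2, x, y, c, d]
    }
}
```

The trade-off is the one described for svn: storage stays small because each version only adds its delta, but reading an old version costs one patch application per intermediate version. A real implementation would probably also store a full snapshot every so often to bound that cost.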

Auto-save strategies used by some blogs and online documents

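  • Use WebSocket for communication between the browser and the server
  • Use Redis to cache the article being edited in real time (caching the diff as well as the full article is also an option)
  • Write the cached data to the MySQL database from a scheduled task, when the window is closed, or when the save button is clicked, which keeps the update frequency under control
  • When the "New" page is first opened, fetch the data from the server over WebSocket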
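The buffering part of this strategy can be sketched in plain Java as follows. This is only an illustration under stated assumptions: two in-memory maps stand in for Redis and the MySQL table, the WebSocket layer is left out entirely, and AutoSaveBuffer with its method names is hypothetical rather than taken from any of the products surveyed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: edits land in a cache and reach the database only on flush.
class AutoSaveBuffer {
    // Stand-in for Redis: latest unsaved draft per article id.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the MySQL table: last persisted content per article id.
    private final Map<String, String> database = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /** Starts the periodic flush; the interval bounds how often the database is written. */
    void start(long intervalSeconds) {
        scheduler.scheduleAtFixedRate(this::flushAll, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    /** Called for every edit received from the browser; touches only the cache. */
    void onEdit(String articleId, String content) {
        cache.put(articleId, content);
    }

    /** Called when the user clicks save or closes the window. */
    void flush(String articleId) {
        String draft = cache.get(articleId);
        if (draft != null) {
            database.put(articleId, draft);
            // Remove only if no newer edit arrived in the meantime.
            cache.remove(articleId, draft);
        }
    }

    void flushAll() {
        for (String articleId : cache.keySet()) {
            flush(articleId);
        }
    }

    /** Called when the editor page opens: prefer the unsaved draft over the persisted copy. */
    String load(String articleId) {
        String draft = cache.get(articleId);
        return draft != null ? draft : database.get(articleId);
    }

    /** Returns what has actually been persisted, ignoring the cache. */
    String persisted(String articleId) {
        return database.get(articleId);
    }

    /** Flushes whatever is left and stops the scheduler. */
    void stop() {
        flushAll();
        scheduler.shutdown();
    }

    public static void main(String[] args) {
        AutoSaveBuffer buffer = new AutoSaveBuffer();
        buffer.onEdit("article-1", "draft v1");
        buffer.onEdit("article-1", "draft v2");
        System.out.println(buffer.persisted("article-1")); // prints null: nothing written yet
        buffer.flush("article-1");
        System.out.println(buffer.persisted("article-1")); // prints draft v2
        buffer.stop();
    }
}
```

Two edits reach the cache but only one write reaches the database, which is the point of buffering. Storing diffs in the cache instead of full text, as the second bullet suggests, would change only what onEdit puts into the map.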