Poor Results and Build Errors with the Third-Party Library jieba | Youth Training Camp Notes

This is my 1st note for the note-writing activity of the 3rd Youth Training Camp (backend track).

Problem Description

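Our group chose a search engine as our main project, so the first hurdle I hit after the initial requirements analysis involved the third-party library jieba.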

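Among the reference libraries listed in the requirements document, we found two Go versions of jieba. The one listed first, wangbin's, is too old: its most recent commit was in 2015, and its segmentation fails to recognize even some common personal names. With new-word mode off it seems to behave like a real stutterer (which is what "jieba" literally means), cutting text one character at a time. With new-word mode on, some source data ends up segmented differently from the search terms, which hurts the search results. It was simply unusable, so we promptly switched to yanyiwu's library.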
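To make the mismatch concrete, here is a minimal sketch with a toy inverted index. The tokenizers are hypothetical stand-ins written for this note (a greedy dictionary matcher and a per-character splitter), not the API of either jieba library; the point is only that the documents and the query must be segmented the same way.

```go
package main

import (
	"fmt"
	"sort"
)

// Tokenizer is a hypothetical stand-in for a segmenter such as a jieba binding.
type Tokenizer func(text string) []string

// charTokens cuts text one character at a time.
func charTokens(text string) []string {
	var out []string
	for _, r := range text {
		out = append(out, string(r))
	}
	return out
}

// dictTokens returns a greedy longest-match tokenizer over a toy dictionary.
// Anything not found in the dictionary falls back to a single character.
func dictTokens(dict map[string]bool, maxLen int) Tokenizer {
	return func(text string) []string {
		runes := []rune(text)
		var out []string
		for i := 0; i < len(runes); {
			n := 1
			for l := maxLen; l > 1; l-- {
				if i+l <= len(runes) && dict[string(runes[i:i+l])] {
					n = l
					break
				}
			}
			out = append(out, string(runes[i:i+n]))
			i += n
		}
		return out
	}
}

// buildIndex maps every token to the set of document IDs that contain it.
func buildIndex(docs []string, tok Tokenizer) map[string]map[int]bool {
	idx := make(map[string]map[int]bool)
	for id, d := range docs {
		for _, t := range tok(d) {
			if idx[t] == nil {
				idx[t] = make(map[int]bool)
			}
			idx[t][id] = true
		}
	}
	return idx
}

// search returns the sorted IDs of documents that contain every query token.
func search(idx map[string]map[int]bool, query string, tok Tokenizer) []int {
	tokens := tok(query)
	if len(tokens) == 0 {
		return nil
	}
	var hits []int
	for id := range idx[tokens[0]] {
		ok := true
		for _, t := range tokens[1:] {
			if !idx[t][id] {
				ok = false
				break
			}
		}
		if ok {
			hits = append(hits, id)
		}
	}
	sort.Ints(hits)
	return hits
}

func main() {
	dict := map[string]bool{"搜索引擎": true, "入门": true}
	words := dictTokens(dict, 4)
	idx := buildIndex([]string{"搜索引擎入门", "入门教程"}, words)

	// Same tokenizer for indexing and querying: the query token exists in the index.
	fmt.Println(search(idx, "搜索引擎", words))
	// Different tokenizer for the query: its single-character tokens were never indexed.
	fmt.Println(search(idx, "搜索引擎", charTokens))
}
```

With the word-level tokenizer on both sides, the query finds the first document; with the per-character tokenizer on the query side only, it finds nothing, even though the text is identical.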

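But yanyiwu's library has problems of its own. Its core is written in C++, which brings issues we cannot see into, so even when I use it in a perfectly ordinary way, the build prints a pile of C++ compiler messages (the excerpt below shows -Wclass-memaccess warnings), such as: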

# github.com/yanyiwu/gojieba
In file included from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/Unicode.hpp:9,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/DictTrie.hpp:15,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/QuerySegment.hpp:8,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/Jieba.hpp:4,
                 from jieba.cpp:5:
../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/limonp/LocalVector.hpp: In instantiation of ‘void limonp::LocalVector<T>::reserve(size_t) [with T = std::pair<long unsigned int, const cppjieba::DictUnit*>; size_t = long unsigned int]’:
../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/limonp/LocalVector.hpp:83:7:   required from ‘void limonp::LocalVector<T>::push_back(const T&) [with T = std::pair<long unsigned int, const cppjieba::DictUnit*>]’
../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/Trie.hpp:99:81:   required from here
../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/limonp/LocalVector.hpp:95:11: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<long unsigned int, const cppjieba::DictUnit*>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
   95 |     memcpy(ptr_, old, sizeof(T) * capacity_);
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/9/utility:70,
                 from /usr/include/c++/9/algorithm:60,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/QuerySegment.hpp:4,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/Jieba.hpp:4,
                 from jieba.cpp:5:
/usr/include/c++/9/bits/stl_pair.h:208:12: note: ‘struct std::pair<long unsigned int, const cppjieba::DictUnit*>’ declared here
  208 |     struct pair
      |            ^~~~
In file included from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/Unicode.hpp:9,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/DictTrie.hpp:15,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/QuerySegment.hpp:8,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/Jieba.hpp:4,
                 from jieba.cpp:5:
                 
...

../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/limonp/LocalVector.hpp:63:13: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct std::pair<long unsigned int, const cppjieba::DictUnit*>’ with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
   63 |       memcpy(ptr_, vec.ptr_, vec.size() * sizeof(T));
      |       ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/9/utility:70,
                 from /usr/include/c++/9/algorithm:60,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/QuerySegment.hpp:4,
                 from ../../../go/pkg/mod/github.com/yanyiwu/gojieba@v1.1.0/deps/cppjieba/Jieba.hpp:4,
                 from jieba.cpp:5:
/usr/include/c++/9/bits/stl_pair.h:208:12: note: ‘struct std::pair<long unsigned int, const cppjieba::DictUnit*>’ declared here
  208 |     struct pair
      |            ^~~~

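These messages do little harm but are extremely insulting. All I could do was dig through the vast sea of issues to see whether some other poor soul had run into the same thing, and sure enough, someone had.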

Solution

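Add the following line to the go.mod file (a replace directive with a version on the left of => applies only to that exact version, so it should match the gojieba version your module requires):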

replace github.com/yanyiwu/gojieba v1.1.2 => github.com/ttys3/gojieba v1.1.3

As a side gripe: this PR has clearly been merged, yet the library owner still won't cut a new release, which is exasperating.