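Preface
I have recently been reading about KBQA (Knowledge Base Question Answering). The task is to translate a user's natural-language question into SPARQL, the query language of a knowledge graph (KG), execute that SPARQL query on the KG, and return the result to the user.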
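The two stages of KBQA described above (question to SPARQL, then SPARQL to result) can be sketched as follows. This is only a toy illustration: the lookup table `TEMPLATES` and the functions `question_to_sparql` and `answer` are hypothetical names, the question wording is illustrative, and a real system would replace the lookup with a trained model. The query itself is the sample used later in this post.

```python
from typing import Callable, Dict, List

# Hypothetical lookup table standing in for a real question-to-SPARQL model.
# The question wording is illustrative; the query is the sample used later in this post.
TEMPLATES: Dict[str, str] = {
    "how many movies did stanley kubrick direct?": (
        "SELECT DISTINCT COUNT(?uri) WHERE {"
        "?uri <http://dbpedia.org/ontology/director> "
        "<http://dbpedia.org/resource/Stanley_Kubrick> . }"
    ),
}


def question_to_sparql(question: str) -> str:
    """Stage 1: map a natural-language question to a SPARQL query (toy lookup)."""
    key = question.strip().lower()
    if key not in TEMPLATES:
        raise KeyError(f"no SPARQL template for question: {question!r}")
    return TEMPLATES[key]


def answer(question: str, execute: Callable[[str], List[str]]) -> List[str]:
    """Stage 2: run the generated query on a KG through the supplied `execute` callable."""
    return execute(question_to_sparql(question))


if __name__ == "__main__":
    # A fake executor that echoes the query instead of contacting a KG.
    print(answer("How many movies did Stanley Kubrick direct?", lambda q: [q]))
```

Passing the executor in as a callable keeps the translation step independent of where the knowledge graph is hosted, whether online or local.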
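Introduction to SPARQL
SPARQL is a query language for graph data stored in knowledge graphs. Its syntax resembles SQL, the structured query language of relational databases. SQL, however, works on tables with a fixed schema, which limits where it can be used, whereas SPARQL matches patterns against a graph of triples, a more flexible data model.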
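To make the idea of matching patterns against triples more concrete, here is a minimal sketch in plain Python. It is not how a SPARQL engine is implemented; the graph `GRAPH` is a tiny hand-written sample (not an extract of DBpedia) and the helper `match` is a hypothetical name. Terms starting with `?` play the role of SPARQL variables.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# A tiny hand-written graph (illustrative, not an extract of DBpedia).
GRAPH: List[Triple] = [
    ("The_Shining", "director", "Stanley_Kubrick"),
    ("Barry_Lyndon", "director", "Stanley_Kubrick"),
    ("Jaws", "director", "Steven_Spielberg"),
    ("Stanley_Kubrick", "birthPlace", "New_York_City"),
]


def match(pattern: Triple, graph: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    for triple in graph:
        bindings: Dict[str, str] = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value everywhere.
                if bindings.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield bindings


# Rough analogue of: SELECT ?film WHERE { ?film director Stanley_Kubrick . }
films = [b["?film"] for b in match(("?film", "director", "Stanley_Kubrick"), GRAPH)]
print(films)  # ['The_Shining', 'Barry_Lyndon']
```

No table schema is declared anywhere: adding a new kind of fact only means appending another triple, which is the flexibility the comparison with SQL refers to.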
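To get results, a SPARQL query has to be submitted to a service that hosts the knowledge graph (just as a SQL query has to be sent to a MySQL server). That service can be set up locally or be a remote server reached over the network. For example, DBpedia, the well-known knowledge graph extracted from Wikipedia, offers an online SPARQL query page at dbpedia.org/sparql . You type a SPARQL statement into the text box and get the query result back.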
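Besides the text box, a query can also be submitted programmatically over HTTP. The sketch below assumes the endpoint follows the standard SPARQL protocol, in which a GET request carries the query in a `query` parameter and the `Accept` header asks for JSON results. The function names `build_request` and `run_query` are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.parse
import urllib.request

SPARQL_JSON = "application/sparql-results+json"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build an HTTP GET request carrying the query in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": SPARQL_JSON})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON result (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; call run_query(...) to actually send it.
    req = build_request("https://dbpedia.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    print(req.full_url)
```

Because the endpoint URL is just a parameter, the same code should work against a remote service or a locally hosted one.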
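Setting up a local SPARQL endpoint
Querying through the website is convenient, but DBpedia is hosted overseas and, for various reasons, cannot be reached reliably from our server. In addition, large-scale querying is likely to get your IP banned for excessive traffic. Running a local SPARQL server avoids both problems.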
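Setup with Fuseki
Setup with Virtuoso
DBpedia officially recommends Virtuoso, using Docker to bring up a local environment quickly.
First, install Docker (see: juejin.cn/post/706159… ) and Docker Compose (see: juejin.cn/post/712310… ) on the server.
Once Docker and Docker Compose are installed, run the following commands: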
git clone https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart.git
cd virtuoso-sparql-endpoint-quickstart
COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03 VIRTUOSO_ADMIN_PASSWD=YourSecretPassword docker-compose up
Here, you only need to change
VIRTUOSO_ADMIN_PASSWD=YourSecretPassword
to a password of your own.
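If you prefer to launch the stack from a script instead of typing the variables on the command line, something like the following sketch could be used. It assumes the repository has already been cloned and that `docker-compose` is on the PATH; the helper names `compose_env` and `start_quickstart` are hypothetical, while the two environment variables are the ones from the command above.

```python
import os
import subprocess
from typing import Dict

COLLECTION_URI = "https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03"


def compose_env(password: str, collection_uri: str = COLLECTION_URI) -> Dict[str, str]:
    """Return a copy of the current environment plus the two variables used above."""
    if not password:
        raise ValueError("VIRTUOSO_ADMIN_PASSWD must not be empty")
    env = dict(os.environ)
    env["COLLECTION_URI"] = collection_uri
    env["VIRTUOSO_ADMIN_PASSWD"] = password
    return env


def start_quickstart(repo_dir: str, password: str) -> None:
    """Run `docker-compose up` inside the cloned repository (blocks while it runs)."""
    subprocess.run(
        ["docker-compose", "up"],
        cwd=repo_dir,
        env=compose_env(password),
        check=True,
    )
```

A call such as `start_quickstart("virtuoso-sparql-endpoint-quickstart", getpass.getpass())` would keep the password out of the shell history.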
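If everything runs correctly, the quickstart downloads the required data automatically, which may take some time. Meanwhile, the terminal output looks like this: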
(base) jxqi@han-server-01:~/project/text2sparql/virtuoso/virtuoso-sparql-endpoint-quickstart$ COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03 VIRTUOSO_ADMIN_PASSWD=123456 docker-compose up
[+] Running 3/3
⠿ Container virtuoso-sparql-endpoint-quickstart-download-1 Recreated 0.2s
⠿ Container virtuoso-sparql-endpoint-quickstart-store-1 Rec... 0.2s
⠿ Container virtuoso-sparql-endpoint-quickstart-load-1 Recr... 0.1s
Attaching to virtuoso-sparql-endpoint-quickstart-download-1, virtuoso-sparql-endpoint-quickstart-load-1, virtuoso-sparql-endpoint-quickstart-store-1
virtuoso-sparql-endpoint-quickstart-load-1 | [INFO] Waiting for download to finish...
virtuoso-sparql-endpoint-quickstart-download-1 | Creating LOCK at /root/data
virtuoso-sparql-endpoint-quickstart-download-1 | Loading collection https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03
virtuoso-sparql-endpoint-quickstart-download-1 | GET: https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03 / ACCEPT: text/sparql
virtuoso-sparql-endpoint-quickstart-store-1 | Starting the Virtuoso Server
virtuoso-sparql-endpoint-quickstart-store-1 |
virtuoso-sparql-endpoint-quickstart-store-1 | Sat Jul 23 2022
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 plain version 1.2.3234 from OpenLink Software
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 SUCCESS plugin 1: loaded from ../hosting/geos.so }
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 plain version 1.1.3234 from OpenLink Software
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 Cartographic Projections support based on Frank Warmerdam's proj4 library
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 SUCCESS plugin 2: loaded from ../hosting/proj4.so }
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 ShapefileIO version 0.1virt71 from OpenLink Software
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 Shapefile support based on Frank Warmerdam's Shapelib
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 OpenLink Virtuoso Universal Server
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 Version 07.20.3234-pthreads for Linux as of May 18 2022
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 uses OpenSSL 1.1.1 11 Sep 2018
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 uses parts of PCRE, Html Tidy
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 Database version 3126
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:36 SQL Optimizer enabled (max 1000 layouts)
virtuoso-sparql-endpoint-quickstart-store-1 | 01:23:37 Compiler unit is timed at 0.000139 msec
virtuoso-sparql-endpoint-quickstart-download-1 | Collections resolved to query:
virtuoso-sparql-endpoint-quickstart-download-1 | PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
virtuoso-sparql-endpoint-quickstart-download-1 | PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
virtuoso-sparql-endpoint-quickstart-download-1 | PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
virtuoso-sparql-endpoint-quickstart-download-1 | PREFIX db: <https://databus.dbpedia.org/>
virtuoso-sparql-endpoint-quickstart-download-1 | PREFIX dcat: <http://www.w3.org/ns/dcat#>
virtuoso-sparql-endpoint-quickstart-download-1 | PREFIX dct: <http://purl.org/dc/terms/>
virtuoso-sparql-endpoint-quickstart-download-1 | PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
……
After that, you can query Virtuoso interactively through local port 8890.
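Since the download and load can take a while, a script that depends on the endpoint may want to wait until it responds. The sketch below assumes the endpoint is served at `http://localhost:8890/sparql` (port 8890 with the conventional `/sparql` path); `endpoint_is_up` and `wait_until_up` are hypothetical helper names. Note that an HTTP 200 only shows that the server answers queries, not that all data has finished loading.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# Assumed URL: Virtuoso's HTTP port 8890 with the conventional /sparql path.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def endpoint_is_up(endpoint: str = LOCAL_ENDPOINT, timeout: float = 5.0) -> bool:
    """Return True if a trivial ASK query gets an HTTP 200 response."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_up(probe: Callable[[], bool] = endpoint_is_up,
                  attempts: int = 60, delay: float = 10.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False
```

The probe is injectable so that the retry loop can be exercised without a running server.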
Here we enter a sample SPARQL query:
SELECT DISTINCT COUNT(?uri) WHERE {?uri <http://dbpedia.org/ontology/director> <http://dbpedia.org/resource/Stanley_Kubrick> . }
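This is a sample from the KBQA dataset LC-QuAD 1.0, which we now run.
In the Results Format field you can choose the format of the returned results, for example JSON. The output is shown below, and it is identical to what the online site returns: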
{ "head": { "link": [], "vars": ["callret-0"] },
"results": { "distinct": false, "ordered": true, "bindings": [
{ "callret-0": { "type": "typed-literal", "datatype": "http://www.w3.org/2001/XMLSchema#integer", "value": "16" }} ] } }
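To use such a response in code, the bindings need to be unpacked. The following sketch parses the JSON shown above with the standard library; the helper name `rows` is hypothetical. The variable name `callret-0` appears to be what Virtuoso assigns to the unnamed `COUNT` column, so treat it as endpoint-specific.

```python
import json
from typing import Any, Dict, List

# The JSON response shown above, pasted verbatim.
RAW = """
{ "head": { "link": [], "vars": ["callret-0"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "callret-0": { "type": "typed-literal",
                     "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                     "value": "16" }} ] } }
"""

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def rows(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a SPARQL JSON result into one {variable: value} dict per row."""
    out = []
    for binding in result["results"]["bindings"]:
        row = {}
        for var, cell in binding.items():
            value = cell["value"]
            # Convert integer-typed literals; leave everything else as a string.
            row[var] = int(value) if cell.get("datatype") == XSD_INTEGER else value
        out.append(row)
    return out


print(rows(json.loads(RAW)))  # [{'callret-0': 16}]
```

Values arrive as strings even for numeric literals, which is why the datatype is checked before converting.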