Setting Up a Local SPARQL Endpoint for Querying DBpedia



Preface

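I have recently been reading about KBQA (Knowledge Base Question Answering). The task is to translate a user's natural-language question into SPARQL, the query language for knowledge graphs, run that query against the KG, and return the result to the user.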

A Brief Introduction to SPARQL

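SPARQL is a query language for graph data stored in knowledge graphs, and its syntax resembles SQL, the structured query language of relational databases. SQL only works on data with a fixed tabular structure, which limits where it can be used. SPARQL instead matches patterns over the more flexible graph structure of a knowledge graph, which makes it the more expressive choice for this kind of data.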

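To get results, a SPARQL query has to be submitted to a service that hosts the knowledge graph (much as an SQL query has to be sent to a MySQL server). That service can be set up locally or reached remotely over the network. For example, DBpedia, a well-known knowledge graph extracted from Wikipedia, offers an online SPARQL query page at dbpedia.org/sparql . Users type a SPARQL statement into the text box and get the corresponding results back.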
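
To make the "submit the query to a service" step more concrete, here is a minimal Python sketch of how such a request could be assembled. It assumes the endpoint follows the SPARQL 1.1 Protocol (the query goes in the `query` URL parameter and the result format is requested through the `Accept` header). The helper name `build_sparql_request` is made up for illustration and is not part of the original setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) an HTTP GET request for a SPARQL query.

    Assumption: the endpoint accepts the standard `query` parameter and
    honours the Accept header for SPARQL JSON results.
    """
    url = endpoint + "?" + urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return Request(url, headers=headers)


# Hypothetical usage (this part would need network access):
#   from urllib.request import urlopen
#   req = build_sparql_request("https://dbpedia.org/sparql",
#                              "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
#   with urlopen(req, timeout=30) as resp:
#       print(resp.read().decode("utf-8"))
```

Nothing is sent until the request object is passed to `urlopen`, so the same helper can point at either the public endpoint or a local one.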


Setting Up a Local SPARQL Endpoint

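Querying through the website is convenient, but DBpedia is hosted overseas and, for various reasons, it cannot be reached reliably from our server. In addition, large-scale querying can generate enough traffic to get the client IP blocked. Running a local SPARQL server avoids both problems.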

Setting Up with Fuseki

Setting Up with Virtuoso

DBpedia officially recommends Virtuoso, using Docker to bring up a local environment quickly.

First, install Docker (see juejin.cn/post/706159… ) and Docker Compose (see juejin.cn/post/712310… ) on the server.

Once Docker and Docker Compose are installed, run the following commands:

git clone https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart.git
cd virtuoso-sparql-endpoint-quickstart
COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03 VIRTUOSO_ADMIN_PASSWD=YourSecretPassword docker-compose up

Here, you only need to change

VIRTUOSO_ADMIN_PASSWD=YourSecretPassword

to a password of your own.
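
If you would rather start the quickstart from a script than type the variables on the command line, the two settings can be passed as environment variables. The following is only a sketch: the variable names COLLECTION_URI and VIRTUOSO_ADMIN_PASSWD come from the command above, while the helper name `quickstart_env` and the `subprocess` call in the comment are assumptions added for illustration.

```python
import os
from typing import Mapping, Optional


def quickstart_env(collection_uri: str,
                   admin_password: str,
                   base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the two quickstart variables set.

    `base` defaults to the current process environment; passing a custom
    mapping is mainly useful for testing.
    """
    env = dict(os.environ if base is None else base)
    env["COLLECTION_URI"] = collection_uri
    env["VIRTUOSO_ADMIN_PASSWD"] = admin_password
    return env


# Hypothetical usage, run from the directory that contains the cloned repo:
#   import subprocess
#   env = quickstart_env(
#       "https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03",
#       "YourSecretPassword",
#   )
#   subprocess.run(["docker-compose", "up"],
#                  cwd="virtuoso-sparql-endpoint-quickstart", env=env, check=True)
```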

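If everything runs correctly, the quickstart downloads the required data automatically, which may take some time. Meanwhile, the terminal prints output like the following: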

(base) jxqi@han-server-01:~/project/text2sparql/virtuoso/virtuoso-sparql-endpoint-quickstart$ COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03 VIRTUOSO_ADMIN_PASSWD=123456 docker-compose up
[+] Running 3/3
 ⠿ Container virtuoso-sparql-endpoint-quickstart-download-1  Recreated                                                      0.2s
 ⠿ Container virtuoso-sparql-endpoint-quickstart-store-1     Rec...                                                         0.2s
 ⠿ Container virtuoso-sparql-endpoint-quickstart-load-1      Recr...                                                        0.1s
Attaching to virtuoso-sparql-endpoint-quickstart-download-1, virtuoso-sparql-endpoint-quickstart-load-1, virtuoso-sparql-endpoint-quickstart-store-1
virtuoso-sparql-endpoint-quickstart-load-1      | [INFO] Waiting for download to finish...
virtuoso-sparql-endpoint-quickstart-download-1  | Creating LOCK at /root/data
virtuoso-sparql-endpoint-quickstart-download-1  | Loading collection https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03
virtuoso-sparql-endpoint-quickstart-download-1  | GET: https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03 / ACCEPT: text/sparql
virtuoso-sparql-endpoint-quickstart-store-1     | Starting the Virtuoso Server
virtuoso-sparql-endpoint-quickstart-store-1     | 
virtuoso-sparql-endpoint-quickstart-store-1     |               Sat Jul 23 2022
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 { Loading plugin 1: Type `plain', file `geos' in `../hosting'
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   plain version 1.2.3234 from OpenLink Software
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   GEOS plugin based on Geometry Engine Open Source library from Open Source Geospatial Foundation
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   SUCCESS plugin 1: loaded from ../hosting/geos.so }
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 { Loading plugin 2: Type `plain', file `proj4' in `../hosting'
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   plain version 1.1.3234 from OpenLink Software
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   Cartographic Projections support based on Frank Warmerdam's proj4 library
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   SUCCESS plugin 2: loaded from ../hosting/proj4.so }
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 { Loading plugin 3: Type `plain', file `shapefileio' in `../hosting'
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   ShapefileIO version 0.1virt71 from OpenLink Software
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   Shapefile support based on Frank Warmerdam's Shapelib
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36   SUCCESS plugin 3: loaded from ../hosting/shapefileio.so }
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 OpenLink Virtuoso Universal Server
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 Version 07.20.3234-pthreads for Linux as of May 18 2022
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 uses OpenSSL 1.1.1  11 Sep 2018
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 uses parts of PCRE, Html Tidy
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 Database version 3126
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:36 SQL Optimizer enabled (max 1000 layouts)
virtuoso-sparql-endpoint-quickstart-store-1     | 01:23:37 Compiler unit is timed at 0.000139 msec
virtuoso-sparql-endpoint-quickstart-download-1  | Collections resolved to query:
virtuoso-sparql-endpoint-quickstart-download-1  | PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
virtuoso-sparql-endpoint-quickstart-download-1  | PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
virtuoso-sparql-endpoint-quickstart-download-1  | PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
virtuoso-sparql-endpoint-quickstart-download-1  | PREFIX db: <https://databus.dbpedia.org/>
virtuoso-sparql-endpoint-quickstart-download-1  | PREFIX dcat: <http://www.w3.org/ns/dcat#>
virtuoso-sparql-endpoint-quickstart-download-1  | PREFIX dct: <http://purl.org/dc/terms/>
virtuoso-sparql-endpoint-quickstart-download-1  | PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
……

After that, you can query Virtuoso interactively through local port 8890.
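
Because the download and load can take a while, a script may want to wait until the endpoint answers before sending real queries. Here is a rough sketch of such a wait loop. It assumes the endpoint is served at http://localhost:8890/sparql (port 8890 is from the text above, the /sparql path is Virtuoso's usual default), and the function names are invented for this example. Note that a successful response only shows that the HTTP endpoint is reachable, not that all data has finished loading.

```python
import time
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen


def endpoint_responds(endpoint: str = "http://localhost:8890/sparql",
                      timeout: float = 5.0) -> bool:
    """Return True if a trivial ASK query gets an HTTP 200 response."""
    url = endpoint + "?" + urlencode({"query": "ASK { ?s ?p ?o }"})
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(check: Callable[[], bool],
                     attempts: int = 30,
                     delay: float = 10.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False


# Hypothetical usage:
#   if wait_until_ready(endpoint_responds):
#       print("endpoint is reachable")
```

Passing the check as a callable keeps the retry logic separate from the network call, so the loop can be exercised without a running server.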


Here we enter a sample SPARQL query:

SELECT DISTINCT COUNT(?uri) WHERE {?uri <http://dbpedia.org/ontology/director> <http://dbpedia.org/resource/Stanley_Kubrick>  . }

This is a sample from the KBQA dataset LC-QuAD 1.0, which we run against the local endpoint.


In the Results Format field you can choose the format of the returned results, for example JSON. The output is as follows:


It is identical to the result returned by the online endpoint:

{ "head": { "link": [], "vars": ["callret-0"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "callret-0": { "type": "typed-literal", "datatype": "http://www.w3.org/2001/XMLSchema#integer", "value": "16" }} ] } }
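
In a KBQA pipeline the JSON above usually has to be turned into a plain value. The sketch below shows one way to read the first binding with Python's standard `json` module. The helper name `first_binding_value` is hypothetical, and the sample string is simply the output shown above.

```python
import json

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

SAMPLE_RESULT = """
{ "head": { "link": [], "vars": ["callret-0"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "callret-0": { "type": "typed-literal",
                     "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                     "value": "16" }} ] } }
"""


def first_binding_value(result_json: str):
    """Return the first variable's value from the first result row, or None."""
    data = json.loads(result_json)
    var = data["head"]["vars"][0]          # e.g. "callret-0"
    bindings = data["results"]["bindings"]
    if not bindings:
        return None
    cell = bindings[0][var]
    # Convert integer-typed literals to int; leave everything else as a string.
    if cell.get("datatype") == XSD_INTEGER:
        return int(cell["value"])
    return cell["value"]


print(first_binding_value(SAMPLE_RESULT))  # prints 16
```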