sqlite3vfshttp:用于通过 http(s) 查询数据库的 Go sqlite VFS

558 阅读6分钟

[

](#sqlite3vfshttp-a-go-sqlite-vfs-for-querying-databases-over-https)sqlite3vfshttp:一个Go sqlite VFS,用于通过http(s)查询数据库。

sqlite3vfshttp是一个sqlite3 VFS,用于通过http(s)查询远程数据库。这允许你执行查询,而不需要先下载完整的数据库。

你的数据库必须托管在一个支持HTTP范围请求的网络服务器上(比如Amazon S3)。

[

](#example)例子

参见sqlitehttpcli/sqlitehttpcli.go,这是一个简单的CLI工具,能够查询远程托管的sqlite数据库。

[

](#usage)使用方法

	vfs := sqlite3vfshttp.New(*url)

	err := sqlite3vfs.RegisterVFS("httpvfs", vfs)
	if err != nil {
		log.Fatalf("register vfs err: %s", err)
	}

	db, err := sql.Open("sqlite3", "not_a_read_name.db?vfs=httpvfs&mode=ro")
	if err != nil {
		log.Fatalf("open db err: %s", err)
	}

[

](#querying-a-database-in-s3)在S3中查询一个数据库

这个库的最初目的是让AWS Lambda函数能够查询存储在S3中的sqlite数据库,而无需下载整个数据库。

即使是存储在S3中的私有文件,通过生成一个预签名的URL并将其传递给这个库,也可以做到这一点。这允许客户端进行HTTP获取范围请求,而不需要知道如何签署S3请求。

[

](#building-a-loadable-extension-for-the-sqlite3-cli)为sqlite3 cli建立一个可加载的扩展。

sqlite3 cli支持运行时可加载扩展。我们可以把sqlite3vfshttp 构建为一个共享库,然后从sqlite3 cli中加载它,通过http连接交互式地查询sqlite数据库。共享库的代码位于sqlite3http-ext 目录中。更多细节见 sqlite3http-ext/README.md。

[

](#demo)演示

我已经上传了一个30MB的sqlite数据库到一个可公开访问的网络服务器上进行测试,基于 "国际收支投资状况。2021年3月季度 - CSV",来自www.stats.govt.nz/large-datas…。该模式为。

CREATE TABLE csv (series_reference,
period,
data_value,
suppressed,
status,
units,
magntude,
subject,
grp,
series_title_1);

你可以使用共享库扩展从sqlite3 cli工具查询这个数据集。

- load extention
sqlite> .load ./httpvfs
sqlite> - 使用vfs=httpvfs打开db,注意你必须使用以file://开头的sqlite uri语法 sq

lite

> .open file://foo.db?vfs=httpvfs
sqlite> - 从远程数据库查询
sqlite> SELECT * from csv where period > '2010' limit 10;
series_reference period data_value suppressed status units magntude subject grp series_title_1
------- ———- ———- ———- ———- ———- ———- ——————-------------------- --
--- BOPQ.S06AC000000000A 2010.03 17463 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2010.06 17260 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2010.09 15419 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ

.

S06AC000000000A 2010.12 17088 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2011.03 18516 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ

.

S06AC000000000A 2011.06 18835 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2011.09 16390 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ

.

S06AC000000000A 2011.12 18748 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2012.03 18477 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ

.

S06AC000000000A 2012.06 18270 F 美元 6 国际收支-BOP BPM6 季度,国际收支的主要组成部分 实际
">

$ cd sqlite3http-ext

# build httpvfs.so shared library
$ make
go build -tags SQLITE3VFS_LOADABLE_EXT -o sqlite3http_ext.a -buildmode=c-archive sqlite3http_ext.go
gcc -g -fPIC -shared -o httpvfs.so sqlite3http_ext.c sqlite3http_ext.a

# set url of sqlite3 db as environment variable SQLITE3VFSHTTP_URL:
$ export SQLITE3VFSHTTP_URL='https://www.sanford.io/demo.db'

$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> -- load extention
sqlite> .load ./httpvfs
sqlite> -- open db using vfs=httpvfs, note you must use the sqlite uri syntax which starts with file://
sqlite> .open file:///foo.db?vfs=httpvfs
sqlite> -- query from remote db
sqlite> SELECT * from csv where period > '2010' limit 10;
series_reference      period      data_value  suppressed  status      units       magntude    subject                    grp                                                   series_title_1
--------------------  ----------  ----------  ----------  ----------  ----------  ----------  -------------------------  ----------------------------------------------------  --------------
BOPQ.S06AC000000000A  2010.03     17463                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.06     17260                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.09     15419                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2010.12     17088                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.03     18516                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.06     18835                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.09     16390                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2011.12     18748                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2012.03     18477                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual
BOPQ.S06AC000000000A  2012.06     18270                   F           Dollars     6           Balance of Payments - BOP  BPM6 Quarterly, Balance of payments major components  Actual

另外,你可以使用sqlitehttpcli 例子工具查询这个数据集。

'2010' limit 10
" row: [BOPQ.S06AC000000000A 2010.03 17463 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.06 17260 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.09 15419 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2010.12 17088 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2011.03 18516 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2011.06 18835 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2011.09 16390 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行: [BOPQ.S06AC000000000A 2011.12 18748 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行: [BOPQ.S06AC000000000A 2012.03 18477 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行: [BOPQ.S06AC000000000A 2012.06 18270 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]

">

# query the sqlite schema table
$ ./sqlitehttpcli -url 'https://www.sanford.io/demo.db' -query 'select * from main.sqlite_master'
row: [table csv csv 2 CREATE TABLE csv (series_reference,
period,
data_value,
suppressed,
status,
units,
magntude,
subject,
grp,
series_title_1)]

# get 10 rows from the dataset
./sqlitehttpcli -url 'https://www.sanford.io/demo.db' -query "select * from csv limit 10"
row: [BOPQ.S06AC000000000A 1971.06 426  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1971.09 435  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1971.12 360  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.03 417  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.06 528  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.09 471  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.12 437  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1973.03 607  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1973.06 666  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1973.09 578  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]

# get 10 rows where the period is after 2010
$ ./sqlitehttpcli -url 'https://www.sanford.io/demo.db' -query "select * from csv where period > '2010' limit 10"
row: [BOPQ.S06AC000000000A 2010.03 17463  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.06 17260  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.09 15419  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.12 17088  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.03 18516  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.06 18835  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.09 16390  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.12 18748  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2012.03 18477  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2012.06 18270  F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]

GitHub

github.com/psanford/sq…