[
](#sqlite3vfshttp-a-go-sqlite-vfs-for-querying-databases-over-https)sqlite3vfshttp:一个Go sqlite VFS,用于通过http(s)查询数据库。
sqlite3vfshttp是一个sqlite3 VFS,用于通过http(s)查询远程数据库。这允许你执行查询,而不需要先下载完整的数据库。
你的数据库必须托管在一个支持HTTP范围请求的网络服务器上(比如Amazon S3)。
[
](#example)例子
参见sqlitehttpcli/sqlitehttpcli.go,这是一个简单的CLI工具,能够查询远程托管的sqlite数据库。
[
](#usage)使用方法
vfs := sqlite3vfshttp.New(*url)
err := sqlite3vfs.RegisterVFS("httpvfs", vfs)
if err != nil {
log.Fatalf("register vfs err: %s", err)
}
db, err := sql.Open("sqlite3", "not_a_read_name.db?vfs=httpvfs&mode=ro")
if err != nil {
log.Fatalf("open db err: %s", err)
}
[
](#querying-a-database-in-s3)在S3中查询一个数据库
这个库的最初目的是让AWS Lambda函数能够查询存储在S3中的sqlite数据库,而无需下载整个数据库。
即使是存储在S3中的私有文件,通过生成一个预签名的URL并将其传递给这个库,也可以做到这一点。这允许客户端进行HTTP获取范围请求,而不需要知道如何签署S3请求。
[
](#building-a-loadable-extension-for-the-sqlite3-cli)为sqlite3 cli建立一个可加载的扩展。
sqlite3 cli支持运行时可加载扩展。我们可以把sqlite3vfshttp 构建为一个共享库,然后从sqlite3 cli中加载它,通过http连接交互式地查询sqlite数据库。共享库的代码位于sqlite3http-ext 目录中。更多细节见 sqlite3http-ext/README.md。
[
](#demo)演示
我已经上传了一个30MB的sqlite数据库到一个可公开访问的网络服务器上进行测试,基于 "国际收支投资状况。2021年3月季度 - CSV",来自www.stats.govt.nz/large-datas…。该模式为。
CREATE TABLE csv (series_reference,
period,
data_value,
suppressed,
status,
units,
magntude,
subject,
grp,
series_title_1);
你可以使用共享库扩展从sqlite3 cli工具查询这个数据集。
- load extention
sqlite> .load ./httpvfs
sqlite> - 使用vfs=httpvfs打开db,注意你必须使用以file://开头的sqlite uri语法 sq
lite
> .open file://foo.db?vfs=httpvfs
sqlite> - 从远程数据库查询
sqlite> SELECT * from csv where period > '2010' limit 10;
series_reference period data_value suppressed status units magntude subject grp series_title_1
------- ———- ———- ———- ———- ———- ———- ——————-------------------- --
--- BOPQ.S06AC000000000A 2010.03 17463 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2010.06 17260 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2010.09 15419 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ
.
S06AC000000000A 2010.12 17088 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2011.03 18516 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ
.
S06AC000000000A 2011.06 18835 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2011.09 16390 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ
.
S06AC000000000A 2011.12 18748 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ.S06AC000000000A 2012.03 18477 F 美元 6 国际收支平衡-BOP BPM6 季度,国际收支主要组成部分 实际
BOPQ
.
S06AC000000000A 2012.06 18270 F 美元 6 国际收支-BOP BPM6 季度,国际收支的主要组成部分 实际
">
$ cd sqlite3http-ext
# build httpvfs.so shared library
$ make
go build -tags SQLITE3VFS_LOADABLE_EXT -o sqlite3http_ext.a -buildmode=c-archive sqlite3http_ext.go
gcc -g -fPIC -shared -o httpvfs.so sqlite3http_ext.c sqlite3http_ext.a
# set url of sqlite3 db as environment variable SQLITE3VFSHTTP_URL:
$ export SQLITE3VFSHTTP_URL='https://www.sanford.io/demo.db'
$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> -- load extention
sqlite> .load ./httpvfs
sqlite> -- open db using vfs=httpvfs, note you must use the sqlite uri syntax which starts with file://
sqlite> .open file:///foo.db?vfs=httpvfs
sqlite> -- query from remote db
sqlite> SELECT * from csv where period > '2010' limit 10;
series_reference period data_value suppressed status units magntude subject grp series_title_1
-------------------- ---------- ---------- ---------- ---------- ---------- ---------- ------------------------- ---------------------------------------------------- --------------
BOPQ.S06AC000000000A 2010.03 17463 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2010.06 17260 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2010.09 15419 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2010.12 17088 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2011.03 18516 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2011.06 18835 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2011.09 16390 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2011.12 18748 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2012.03 18477 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
BOPQ.S06AC000000000A 2012.06 18270 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual
另外,你可以使用sqlitehttpcli 例子工具查询这个数据集。
'2010' limit 10
" row: [BOPQ.S06AC000000000A 2010.03 17463 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.06 17260 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.09 15419 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2010.12 17088 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2011.03 18516 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2011.06 18835 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行:[BOPQ.S06AC000000000A 2011.09 16390 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行: [BOPQ.S06AC000000000A 2011.12 18748 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行: [BOPQ.S06AC000000000A 2012.03 18477 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
行: [BOPQ.S06AC000000000A 2012.06 18270 F 美元 6 国际收支-BOP BPM6 季度,国际收支主要组成部分 实际]
">
# query the sqlite schema table
$ ./sqlitehttpcli -url 'https://www.sanford.io/demo.db' -query 'select * from main.sqlite_master'
row: [table csv csv 2 CREATE TABLE csv (series_reference,
period,
data_value,
suppressed,
status,
units,
magntude,
subject,
grp,
series_title_1)]
# get 10 rows from the dataset
./sqlitehttpcli -url 'https://www.sanford.io/demo.db' -query "select * from csv limit 10"
row: [BOPQ.S06AC000000000A 1971.06 426 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1971.09 435 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1971.12 360 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.03 417 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.06 528 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.09 471 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1972.12 437 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1973.03 607 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1973.06 666 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 1973.09 578 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
# get 10 rows where the period is after 2010
$ ./sqlitehttpcli -url 'https://www.sanford.io/demo.db' -query "select * from csv where period > '2010' limit 10"
row: [BOPQ.S06AC000000000A 2010.03 17463 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.06 17260 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.09 15419 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2010.12 17088 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.03 18516 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.06 18835 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.09 16390 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2011.12 18748 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2012.03 18477 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]
row: [BOPQ.S06AC000000000A 2012.06 18270 F Dollars 6 Balance of Payments - BOP BPM6 Quarterly, Balance of payments major components Actual]