Deploying Presto with Docker and Configuring Connectors

Presto Deployment

The Simplest Setup

Required Files

  1. Prepare the Presto startup script presto-start.sh with the following content:
#!/bin/bash

# References
# https://prestodb.io/docs/current/installation/deploy-docker.html
# https://prestodb.io/docs/current/security/password-file.html

# Open the Presto CLI
# docker exec -it some-presto presto-cli

# Switch to the directory containing this script so the mounts resolve from any working directory
cd "$(dirname "$0")" || exit 1
ROOT_DIR="$(pwd)"

docker run -d \
--privileged=true \
--name some-presto \
-p 8080:8080 \
-e TZ=Asia/Shanghai \
-v "$ROOT_DIR/config.properties:/opt/presto-server/etc/config.properties" \
-v "$ROOT_DIR/jvm.config:/opt/presto-server/etc/jvm.config" \
-v "$ROOT_DIR/catalog:/opt/presto-server/etc/catalog" \
prestodb/presto:latest
  2. Prepare config.properties in the same directory as the startup script, with the following content:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
  3. Prepare jvm.config in the same directory as the startup script, with the following content:
-server
-Xmx2G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true

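  4. Create a catalog directory, also in the same directory as the startup script, to hold the Connector configurations. An empty directory is enough for now.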
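
When a host path passed to docker run -v does not exist, Docker creates it as an empty directory, so a missing or misspelled config.properties or jvm.config tends to show up as a confusing startup failure inside the container. A minimal sketch of a pre-flight check for the three mounts used above; check_presto_files is a hypothetical helper name, not part of Presto or Docker:

```shell
#!/bin/sh
# check_presto_files DIR
# Hypothetical helper: verifies that the files and the directory mounted by
# presto-start.sh exist under DIR. Prints each problem to stderr and returns
# non-zero if anything is missing.
check_presto_files() {
    dir=$1
    rc=0
    for f in config.properties jvm.config; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f" >&2
            rc=1
        fi
    done
    if [ ! -d "$dir/catalog" ]; then
        echo "missing directory: $dir/catalog" >&2
        rc=1
    fi
    return "$rc"
}
```

In presto-start.sh this could be called as check_presto_files "$ROOT_DIR" || exit 1 right before docker run.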

Startup Test

  1. Start the container.
$ bash presto-start.sh
afc0c44a72b2d52e9ce5d296da7683a88ee7dd2623af55f07bcdfc836d4f294d
  2. Open http://127.0.0.1:8080 in a browser to check the web UI.

image.png

  3. Test with the Presto client. Enter the container and run the Presto CLI.
$ docker exec -it some-presto bash
[root@afc0c44a72b2 /]#
[root@afc0c44a72b2 /]# presto-cli
presto> show catalogs;
 Catalog
---------
 system
(1 row)

Query 20240808_064334_00000_bs393, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
[Latency: client-side: 0:01, server-side: 0:01] [0 rows, 0B] [0 rows/s, 0B/s]

presto> show schemas from system;
       Schema
--------------------
 information_schema
 jdbc
 metadata
 runtime
(4 rows)

Query 20240808_064350_00001_bs393, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
[Latency: client-side: 0:01, server-side: 0:01] [4 rows, 57B] [8 rows/s, 114B/s]

presto> show tables from system.metadata;
       Table
--------------------
 analyze_properties
 catalogs
 column_properties
 schema_properties
 table_properties
(5 rows)

Query 20240808_064432_00002_bs393, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
[Latency: client-side: 377ms, server-side: 357ms] [5 rows, 166B] [14 rows/s, 464B/s]

presto> select * from system.metadata.catalogs;
 catalog_name | connector_id
--------------+--------------
 system       | system
(1 row)

Query 20240808_064452_00003_bs393, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
[Latency: client-side: 177ms, server-side: 145ms] [1 rows, 12B] [6 rows/s, 82B/s]
  4. Test with DBeaver.

image.png

image.png

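Username/Password Authentication

Configuring SSL

Presto only allows username/password authentication over HTTPS, so SSL has to be enabled first. Generate a jks file for SSL. When generating it, set the CN to a wildcard domain (*.ed.com here), because clients later have to connect through a hostname that matches it, and set a keystore password, for example presto_ssl, which is needed later.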

$ keytool -genkeypair -alias presto_ssl -keyalg RSA -keystore presto.jks -validity 36500
Enter keystore password:
Re-enter new password:
What is your first and last name?
  [Unknown]:  *.ed.com
What is the name of your organizational unit?
  [Unknown]:
What is the name of your organization?
  [Unknown]:
What is the name of your City or Locality?
  [Unknown]:
What is the name of your State or Province?
  [Unknown]:
What is the two-letter country code for this unit?
  [Unknown]:
Is CN=*.ed.com, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
  [no]:  yes

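Configuring the Password File

Create the password file password.db and add users and passwords to it. This requires the htpasswd tool; install it if it is not already on the machine. Use the following command to add a user.

htpasswd -B -C 10 password.db username

The example below adds the user root with the password toor; afterwards password.db contains the entry for root.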

$ touch password.db
$ htpasswd -B -C 10 password.db root
New password:
Re-type new password:
Adding password for user root
$ cat password.db
root:$2y$10$eellteFAhaRrAZSA3weVVeK0u6vM8EYhvtOeV/m4Ep.CXMCFYhv4W
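
Before mounting password.db into the container, it can help to confirm that every entry carries the bcrypt prefix and cost that htpasswd -B -C 10 should produce. A minimal sketch; check_password_db is a hypothetical helper, and the minimum cost of 8 is an assumption based on the Presto password file documentation, so check it against the version in use:

```shell
#!/bin/sh
# check_password_db FILE
# Hypothetical helper: succeeds only if FILE is non-empty and every line looks
# like "user:$2y$NN$..." with a bcrypt cost NN of at least 8 (assumed minimum).
check_password_db() {
    awk -F: '
        # Field 2 must start with the bcrypt marker followed by a 2-digit cost.
        $2 !~ /^\$2y\$[0-9][0-9]\$/ { bad = 1; next }
        # Characters 5-6 of the hash field hold the cost, e.g. "10".
        substr($2, 5, 2) + 0 < 8 { bad = 1 }
        END { if (bad || NR == 0) exit 1 }
    ' "$1"
}
```

Usage: check_password_db password.db || echo "password.db needs attention".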

File Changes

  1. presto-start.sh, with the following content:
#!/bin/bash

# References
# https://prestodb.io/docs/current/installation/deploy-docker.html
# https://prestodb.io/docs/current/security/password-file.html

# Open the Presto CLI
# docker exec -it some-presto presto-cli

# Switch to the directory containing this script so the mounts resolve from any working directory
cd "$(dirname "$0")" || exit 1
ROOT_DIR="$(pwd)"

docker run -d \
--privileged=true \
--name some-presto \
-p 8443:8443 \
-e TZ=Asia/Shanghai \
-v "$ROOT_DIR/config.properties:/opt/presto-server/etc/config.properties" \
-v "$ROOT_DIR/jvm.config:/opt/presto-server/etc/jvm.config" \
-v "$ROOT_DIR/password-authenticator.properties:/opt/presto-server/etc/password-authenticator.properties" \
-v "$ROOT_DIR/password.db:/opt/presto-server/etc/password.db" \
-v "$ROOT_DIR/presto.jks:/opt/presto-server/etc/presto.jks" \
-v "$ROOT_DIR/catalog:/opt/presto-server/etc/catalog" \
prestodb/presto:latest
  2. config.properties, as follows:
coordinator=true
node-scheduler.include-coordinator=true
# Disabling HTTP caused errors in testing, so it is left enabled
# http-server.http.enabled=false
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080

# Enable SSL
http-server.https.enabled=true
http-server.https.port=8443
# Path to the jks file
http-server.https.keystore.path=/opt/presto-server/etc/presto.jks
# Password of the jks file
http-server.https.keystore.key=presto_ssl

# Enable password authentication
http-server.authentication.type=PASSWORD
  3. jvm.config, unchanged from above.
  4. Add a password-authenticator.properties file that sets the authenticator type to file and points to the password file.
password-authenticator.name=file
file.password-file=/opt/presto-server/etc/password.db

The resulting directory layout:

$ tree
.
├── catalog
├── config.properties
├── jvm.config
├── password-authenticator.properties
├── password.db
├── presto-start.sh
└── presto.jks

Startup Test

  1. Start the container.
$ bash presto-start.sh
e61286a7000cf1d03474ad21f50320d97a15b094aa388285e0faa75da38c837b
  2. Access it from a browser.

image.png

image.png

  3. Access it from DBeaver. Set the JDBC URL to jdbc:presto://127.0.0.1:8443?SSL=true&SSLKeyStorePath=/tmp/presto.jks&SSLKeyStorePassword=presto_ssl, where presto.jks is the jks file generated above and presto_ssl is its password.

image.png

Running a query at this point fails with the error below because, as noted earlier, the generated jks file restricts the domain to *.ed.com.

image.png

To fix this, add a hosts entry and use a domain name in the JDBC URL. Here we add the domain local.ed.com and change the JDBC URL to jdbc:presto://local.ed.com:8443?SSL=true&SSLKeyStorePath=/tmp/presto.jks&SSLKeyStorePassword=presto_ssl; after that, the query succeeds.

$ cat /etc/hosts
127.0.0.1    localhost
127.0.0.1 local.ed.com

image.png
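
The JDBC URL has several parts that must stay in step with the server setup: a host that matches the certificate CN, the HTTPS port, and the keystore path and password. A minimal sketch that assembles the URL from those parts; presto_jdbc_url is a hypothetical helper name, and the parameter names SSL, SSLKeyStorePath and SSLKeyStorePassword are the ones already used in the URL above:

```shell
#!/bin/sh
# presto_jdbc_url HOST PORT KEYSTORE_PATH KEYSTORE_PASSWORD
# Hypothetical helper: prints a Presto JDBC URL with SSL enabled.
# No URL-encoding is done, so the values must not contain '&', '=' or spaces.
presto_jdbc_url() {
    printf 'jdbc:presto://%s:%s?SSL=true&SSLKeyStorePath=%s&SSLKeyStorePassword=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Example with the values from this article.
presto_jdbc_url local.ed.com 8443 /tmp/presto.jks presto_ssl
```

Keeping the host as a separate argument makes it harder to fall back to 127.0.0.1 by accident, which is what triggered the certificate error above.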

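Side Note

With SSL configured, a browser can reach the HTTPS endpoint without being given the certificate, while JDBC access requires it, even though the certificate is self-signed in both cases. Why the difference? I asked ChatGPT and got the following reply: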

image.png

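Kerberos Authentication

Kerberos Configuration

  1. Client and server Keytabs. Enabling Kerberos authentication requires at least two Keytabs: one for the Presto server and one for the client. The client can use any valid Keytab; it only establishes mutual trust between client and server and does not determine the identity that runs queries. The System Access Control section of the official documentation states this clearly: authentication and execution identity are separate, the execution identity is specified with --user, and the client only needs a valid Keytab.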

Presto separates the concept of the principal who authenticates to the coordinator from the username that is responsible for running queries. When running the Presto CLI, for example, the Presto username can be specified using the --user option. By default, the Presto coordinator allows any principal to run queries as any Presto user. In a secure environment, this is probably not desirable behavior and likely requires customization.

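The server Keytab needs special attention; I was stuck on it for a long time. A standard Principal generally has the form username/hostname@realm, with three parts:

  • username: the service name, corresponding to Presto's http.server.authentication.krb5.service-name.
  • hostname: the host name, corresponding to Presto's http.authentication.krb5.principal-hostname. Kerberos authentication requires SSL, which brings a domain name with it; this value must match that domain name.
  • realm: the Kerberos realm.

We access Presto through the domain local.ed.com, the Kerberos realm is EXAMPLE.COM, and the service name is presto, so the server Principal has to be presto/local.ed.com@EXAMPLE.COM.
  2. The krb5.conf file. It is not covered in detail here; one pitfall is described in the "Kerberos Authentication Failure" section.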
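
Splitting a Principal into the three parts described above makes it easy to compare each part with the matching Presto setting. A minimal sketch using plain shell parameter expansion; split_principal is a hypothetical helper name:

```shell
#!/bin/sh
# split_principal PRINCIPAL
# Hypothetical helper: prints the service name, hostname and realm of a
# Principal of the form service/hostname@REALM, one per line.
split_principal() {
    principal=$1
    case $principal in
        */*@*) ;;                       # require both separators
        *) echo "unexpected principal format: $principal" >&2; return 1 ;;
    esac
    realm=${principal##*@}              # text after the last '@'
    rest=${principal%@*}                # text before the last '@'
    printf '%s\n%s\n%s\n' "${rest%%/*}" "${rest#*/}" "$realm"
}

# Prints presto, local.ed.com and EXAMPLE.COM on separate lines.
split_principal presto/local.ed.com@EXAMPLE.COM
```

The first line should equal the configured service name and the second the configured principal hostname.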

File Changes

  1. config.properties, with the following content:
coordinator=true
node-scheduler.include-coordinator=true
# Disabling HTTP caused errors in testing, so it is left enabled
# http-server.http.enabled=false
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080

# Enable SSL
http-server.https.enabled=true
http-server.https.port=8443
# Path to the jks file
http-server.https.keystore.path=/opt/presto-server/etc/presto.jks
# Password of the jks file
http-server.https.keystore.key=presto_ssl

# Enable Kerberos authentication
http-server.authentication.type=KERBEROS
# Service name, corresponding to the username part of the Principal
http.server.authentication.krb5.service-name=presto
# Hostname part of the Principal. Note that the official documentation gives this option as http.server.authentication.krb5.service-hostname,
# but Presto reports that it does not exist. According to the source (com.facebook.airlift.http.server.KerberosConfig), the option below is the correct one.
# Its value must match the domain name.
http.authentication.krb5.principal-hostname=local.ed.com
# Keytab used by the server
http.server.authentication.krb5.keytab=/opt/presto-server/etc/presto.keytab
http.authentication.krb5.config=/opt/presto-server/etc/krb5.conf
  2. jvm.config, with the following content. sun.security.krb5.debug turns on Kerberos debug logging; enable it only when needed.
-server
-Xmx2G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
-Dsun.security.krb5.debug=true
-Djava.security.krb5.conf=/opt/presto-server/etc/krb5.conf

  3. presto-start.sh, with the following content:

#!/bin/bash

# References
# https://prestodb.io/docs/current/installation/deploy-docker.html
# https://prestodb.io/docs/current/security/password-file.html

# Open the Presto CLI
# docker exec -it some-presto presto-cli

# Switch to the directory containing this script so the mounts resolve from any working directory
cd "$(dirname "$0")" || exit 1
ROOT_DIR="$(pwd)"

docker run -d \
--privileged=true \
--name some-presto \
--hostname docker \
-p 8443:8443 \
-e TZ=Asia/Shanghai \
--net common-network \
-v "$ROOT_DIR/config.properties:/opt/presto-server/etc/config.properties" \
-v "$ROOT_DIR/jvm.config:/opt/presto-server/etc/jvm.config" \
-v "$ROOT_DIR/presto.jks:/opt/presto-server/etc/presto.jks" \
-v "$ROOT_DIR/krb5.conf:/opt/presto-server/etc/krb5.conf" \
-v "$ROOT_DIR/presto.keytab:/opt/presto-server/etc/presto.keytab" \
-v "$ROOT_DIR/catalog:/opt/presto-server/etc/catalog" \
prestodb/presto:latest

The directory layout after these changes:

$ tree
.
├── catalog
├── config.properties
├── jvm.config
├── krb5.conf
├── presto-start.sh
├── presto.jks
└── presto.keytab

Startup Test

  1. Start the container.
$ bash presto-start.sh
7dfa60090c52c5977bdeffc1dcf0a3af37ff5764e39c476c01a2b4cde1c0d20f
  2. Access it from a browser. This now fails with a Kerberos authentication error.

image.png

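  3. Access it through the CLI jar. The following files and options are involved:
    • sun.security.krb5.debug: whether to enable Kerberos debug logging.
    • presto-cli-*-executable.jar: the Presto CLI executable jar, which has to be downloaded separately.
    • --krb5-disable-remote-service-hostname-canonicalization: disables hostname canonicalization when present. Without it, the client resolves the canonical hostname of the domain being accessed and uses that as the hostname in the server Principal; if it differs from what the server is configured with, authentication fails.
    • --krb5-config-path: the krb5.conf file, the same one the server uses.
    • --krb5-principal: the client Principal; it must match the client Keytab.
    • --krb5-keytab-path: the client Keytab; hive.keytab is used here.
    • --krb5-remote-service-name: the service name in the server Principal.
    • --keystore-path: the jks file used for SSL.
    • --keystore-password: the jks password.
    • --debug: whether to enable debug mode.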
java \
-Dsun.security.krb5.debug=true \
-jar /path/presto-cli-0.288-executable.jar \
--server https://local.ed.com:8443 \
--krb5-disable-remote-service-hostname-canonicalization \
--krb5-config-path /path/krb5.conf \
--krb5-principal hive/dev@EXAMPLE.COM \
--krb5-keytab-path /path/hive.keytab \
--krb5-remote-service-name presto \
--keystore-path /path/presto.jks \
--keystore-password presto_ssl

The test session looks like this:

$ java \
-jar /path/presto-cli-0.288-executable.jar \
--server https://local.ed.com:8443 \
--krb5-disable-remote-service-hostname-canonicalization \
--krb5-config-path /path/krb5.conf \
--krb5-principal hive/dev@EXAMPLE.COM \
--krb5-keytab-path /path/hive.keytab \
--krb5-remote-service-name presto \
--keystore-path /path/presto.jks \
--keystore-password presto_ssl
presto>
presto> show catalogs;
 Catalog
---------
 system
(1 row)

Query 20240810_020517_00001_gwrsm, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
[Latency: client-side: 186ms, server-side: 112ms] [0 rows, 0B] [0 rows/s, 0B/s]

presto> select * from system.jdbc.catalogs;
 table_cat
-----------
 system
(1 row)

Query 20240810_020548_00002_gwrsm, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
[Latency: client-side: 389ms, server-side: 270ms] [1 rows, 6B] [3 rows/s, 22B/s]

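Side Note

The --krb5-disable-remote-service-hostname-canonicalization option was mentioned above: without it, the client looks up the canonical name of the domain being accessed. The corresponding logic is shown below. If the option is omitted, or the hostname in the server Principal differs from the domain used for access, make sure that the fullHostName computed from the domain matches the hostname in the server Principal. Note also that makeServicePrincipal does not return the full server Principal: it returns only the service name and hostname, joined as service@hostname, without the realm.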

private static String makeServicePrincipal(String serviceName, String hostName, boolean useCanonicalHostname)
{
    String serviceHostName = hostName;
    if (useCanonicalHostname) {
        serviceHostName = canonicalizeServiceHostName(hostName);
    }
    return format("%s@%s", serviceName, serviceHostName.toLowerCase(Locale.US));
}

private static String canonicalizeServiceHostName(String hostName)
{
    try {
        InetAddress address = InetAddress.getByName(hostName);
        String fullHostName;
        if ("localhost".equalsIgnoreCase(address.getHostName())) {
            fullHostName = InetAddress.getLocalHost().getCanonicalHostName();
        }
        else {
            fullHostName = address.getCanonicalHostName();
        }
        if (fullHostName.equalsIgnoreCase("localhost")) {
            throw new ClientException("Fully qualified name of localhost should not resolve to 'localhost'. System configuration error?");
        }
        return fullHostName;
    }
    catch (UnknownHostException e) {
        throw new ClientException("Failed to resolve host: " + hostName, e);
    }
}
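
With canonicalization disabled, the logic above reduces to joining the service name with the lowercased host exactly as given in --server. A minimal shell sketch of that one branch, handy for checking by hand what the client will ask for; make_service_name and matches_server_principal are hypothetical names, the canonicalizing branch (the DNS lookup) is deliberately left out, and lowercasing the server side as well is a convenience assumption:

```shell
#!/bin/sh
# make_service_name SERVICE HOST
# Mirrors the useCanonicalHostname=false branch: service@lowercased-host.
make_service_name() {
    printf '%s@%s\n' "$1" "$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
}

# matches_server_principal SERVICE HOST PRINCIPAL
# Succeeds if PRINCIPAL (service/host@REALM) names the same service and host.
matches_server_principal() {
    wanted=$(make_service_name "$1" "$2")     # e.g. presto@local.ed.com
    sp=${3%@*}                                # drop the realm
    actual=$(make_service_name "${sp%%/*}" "${sp#*/}")
    [ "$wanted" = "$actual" ]
}
```

For example, matches_server_principal presto local.ed.com presto/local.ed.com@EXAMPLE.COM succeeds, while passing the container hostname docker as the host does not.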

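Connector Configuration

With Presto deployed, the next step is configuring Connectors to attach external data sources. MySQL and a Kerberos-secured Hive are used as examples here; for other data sources, see the official documentation.

MySQL Catalog

Configuration File

To configure a MySQL Catalog, add a configuration file to the catalog directory created earlier; the file name becomes the catalog name. The example here is catalog/mysql_catalog.properties. connector.name must not be changed; adjust the other values to your environment.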

connector.name=mysql
connection-url=jdbc:mysql://some-mysql:3306?useSSL=false&useUnicode=true&characterEncoding=utf8
connection-user=root
connection-password=toor

The final file layout:

$ tree
.
├── catalog
│   └── mysql_catalog.properties
├── config.properties
├── jvm.config
├── password-authenticator.properties
├── password.db
├── presto-start.sh
└── presto.jks
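
Because the file name determines the catalog name, generating the file from a few parameters keeps the two in step. A minimal sketch; write_mysql_catalog is a hypothetical helper name, and the property keys are the four shown in the configuration above:

```shell
#!/bin/sh
# write_mysql_catalog CATALOG_DIR CATALOG_NAME JDBC_URL USER PASSWORD
# Hypothetical helper: writes CATALOG_DIR/CATALOG_NAME.properties for the
# MySQL connector. The file name (without .properties) becomes the catalog name.
write_mysql_catalog() {
    mkdir -p "$1" || return 1
    cat > "$1/$2.properties" <<EOF
connector.name=mysql
connection-url=$3
connection-user=$4
connection-password=$5
EOF
}
```

With the values from this article the call would be write_mysql_catalog ./catalog mysql_catalog 'jdbc:mysql://some-mysql:3306?useSSL=false&useUnicode=true&characterEncoding=utf8' root toor. Quote the URL so the shell does not interpret the & characters.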

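Verification

Connecting with the presto-cli client shows a new Catalog, mysql_catalog, and a query using the three-part name (catalog.schema.table) returns data as expected.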

# presto-cli
presto> show catalogs;
    Catalog
---------------
 mysql_catalog
 system
(2 rows)

Query 20240808_090852_00000_tpag6, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
[Latency: client-side: 0:01, server-side: 0:01] [0 rows, 0B] [0 rows/s, 0B/s]

presto> select * from mysql_catalog.sys.host_summary limit 1 \G
-[ RECORD 1 ]----------+-----------
host                   | 172.18.0.1
statement_latency      | 127.43 ms
statement_avg_latency  | 3.27 ms
file_io_latency        | 9.69 ms
unique_users           | 1
current_memory         | 0 bytes
total_memory_allocated | 0 bytes

Query 20240808_091053_00013_tpag6, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
[Latency: client-side: 174ms, server-side: 146ms] [3 rows, 0B] [20 rows/s, 0B/s]

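Hive Catalog

This example connects a Kerberos-secured Hive.

Configuration File

As with MySQL, first add a Hive configuration file to the catalog directory; its file name again becomes the Catalog name. The example here is catalog/hive_catalog.properties. connector.name must not be changed; set the other values to match your environment. hive.metastore.service.principal is the Principal of the Hive MetaStore server, while hive.metastore.client.principal and hive.metastore.client.keytab are the client Principal and Keytab. hive.hdfs.presto.principal and hive.hdfs.presto.keytab are the Principal and Keytab for HDFS. hive.config.resources must point to the client's core-site.xml and hdfs-site.xml.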

connector.name=hive-hadoop2

hive.metastore.authentication.type=KERBEROS
hive.metastore.uri=thrift://hostname:9083
hive.metastore.username=hive
hive.metastore.service.principal=hive/_HOST@DEMO.COM
hive.metastore.client.principal=hive/dev@DEMO.COM
hive.metastore.client.keytab=/opt/presto-server/etc/catalog/hive/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.presto.principal=hdfs/dev@DEMO.COM
hive.hdfs.presto.keytab=/opt/presto-server/etc/catalog/hive/hdfs.keytab

hive.config.resources=/opt/presto-server/etc/catalog/hive/core-site.xml,/opt/presto-server/etc/catalog/hive/hdfs-site.xml

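Create a hive directory under the catalog directory and add the files mentioned above: hive.keytab, hdfs.keytab, core-site.xml and hdfs-site.xml. krb5.conf also has to be added for Kerberos authentication. The final file layout: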

$ tree
.
├── catalog
│   ├── hive
│   │   ├── core-site.xml
│   │   ├── hdfs-site.xml
│   │   ├── hdfs.keytab
│   │   ├── hive.keytab
│   │   └── krb5.conf
│   └── hive_catalog.properties
├── config.properties
├── jvm.config
├── password-authenticator.properties
├── password.db
├── presto-start.sh
└── presto.jks

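Because Kerberos authentication is required, krb5.conf must either be placed in /etc as the default configuration or be pointed to at any location with the JVM startup parameter -Djava.security.krb5.conf. The latter is used here, which means adjusting jvm.config. Two parameters are added: -Dsun.security.krb5.debug and -Djava.security.krb5.conf. The former enables Kerberos debug logging, which is written to the console and helps when troubleshooting Kerberos authentication.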

-server
-Xmx2G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
-Dsun.security.krb5.debug=true
-Djava.security.krb5.conf=/opt/presto-server/etc/catalog/hive/krb5.conf

Verification

Connecting with the presto-cli client shows a new Catalog, hive_catalog.

# presto-cli --debug
presto> show catalogs;
   Catalog
--------------
 hive_catalog
 system
(2 rows)

Query 20240809_023936_00002_kagg9, FINISHED, 1 node
http://localhost:8080/ui/query.html?20240809_023936_00002_kagg9
Splits: 19 total, 19 done (100.00%)
[Latency: client-side: 240ms, server-side: 208ms] [0 rows, 0B] [0 rows/s, 0B/s]

presto> select * from hive_catalog.intern_new.ads_student limit 1;
Query 20240809_023941_00003_kagg9 failed: Unable to create input format org.apache.hadoop.mapred.TextInputFormat
com.facebook.presto.spi.PrestoException: Unable to create input format org.apache.hadoop.mapred.TextInputFormat
    at com.facebook.presto.hive.HiveUtil.getInputFormat(HiveUtil.java:361)
    at com.facebook.presto.hive.StoragePartitionLoader.loadPartition(StoragePartitionLoader.java:281)
    at com.facebook.presto.hive.DelegatingPartitionLoader.loadPartition(DelegatingPartitionLoader.java:78)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:192)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:40)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:121)
    at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:48)
    at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:21)
    at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:36)
    at com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at com.facebook.presto.hive.HiveUtil.getInputFormat(HiveUtil.java:358)
    ... 12 more
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    ... 15 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 20 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at com.facebook.presto.hive.CopyOnFirstWriteConfiguration.getClassByName(CopyOnFirstWriteConfiguration.java:328)
    at com.facebook.presto.hive.WrapperJobConf.getClassByName(WrapperJobConf.java:53)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    ... 22 more

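As the output above shows, the query fails for a TextFile table. A similar issue in the community (8840) indicates that copying hadoop-lzo-xxx.jar into the Hive plugin directory /opt/presto-server/plugin/hive-hadoop2 resolves it.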

image.png

image.png

After copying the jar, the same query succeeds.

# presto-cli
presto> select * from hive_catalog.intern_new.ads_student limit 1;
  id   | age | name
-------+-----+------
 10001 |  12 | gsc
(1 row)

Query 20240809_024825_00000_6hi4w, FINISHED, 1 node
Splits: 517 total, 53 done (10.25%)
[Latency: client-side: 0:08, server-side: 0:07] [36 rows, 468B] [4 rows/s, 62B/s]

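Other Issues

Time Zone

The container's default time zone is not UTC+8. It can be set at container startup with -e TZ=Asia/Shanghai.

Logging

The Presto server logs go straight to the console and can be viewed with docker logs some-presto. To see the full error output on the client side, add the debug flag, that is, run presto-cli --debug.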

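Kerberos Authentication Failure

With JDK 11, the security policy forbids Principals that use weak encryption types. The Docker container log shows the message below: the Keytab contains three encryption types, all of them reported as unsupported, so the login keeps retrying in a loop and finally fails.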

image.png

This can be resolved by adding allow_weak_crypto = true to the [libdefaults] section of krb5.conf.
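
Since the setting only takes effect inside [libdefaults], a quick check of the krb5.conf that is actually mounted can save a restart cycle. A minimal sketch; has_allow_weak_crypto is a hypothetical helper name, and it only recognizes the literal value true written on a line of its own:

```shell
#!/bin/sh
# has_allow_weak_crypto KRB5_CONF
# Hypothetical helper: succeeds if "allow_weak_crypto = true" appears inside
# the [libdefaults] section of the given krb5.conf.
has_allow_weak_crypto() {
    awk '
        # Any section header switches the flag on or off.
        /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults\]/) }
        in_lib && /^[ \t]*allow_weak_crypto[ \t]*=[ \t]*true[ \t]*$/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

Usage: has_allow_weak_crypto krb5.conf || echo "allow_weak_crypto is not enabled in [libdefaults]".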

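Hive TextFile Tables Cannot Be Read

Querying a TextFile table from Presto fails with the error shown in the Hive Catalog section (Compression codec com.hadoop.compression.lzo.LzoCodec not found). The fix found in the community is to copy hadoop-lzo-*.jar into plugin/hive-hadoop2.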
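
Whether the jar is already in place can be checked before running a query. A minimal sketch; has_lzo_jar is a hypothetical helper name, and it only looks at file names, not at the jar contents:

```shell
#!/bin/sh
# has_lzo_jar PLUGIN_DIR
# Hypothetical helper: succeeds if PLUGIN_DIR contains at least one
# hadoop-lzo-*.jar file.
has_lzo_jar() {
    for jar in "$1"/hadoop-lzo-*.jar; do
        # An unmatched glob stays literal, so test that the path really exists.
        [ -f "$jar" ] && return 0
    done
    return 1
}
```

Inside the container this could be run as has_lzo_jar /opt/presto-server/plugin/hive-hadoop2. If it fails, copy the jar in (for example with docker cp) and restart the container, on the assumption that plugin jars are only picked up at startup.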

References