Hive 基础 03 HiveServer2/beeline 配置使用

312 阅读4分钟

持续创作,加速成长!这是我参与「掘金日新计划 · 6 月更文挑战」的第29天,点击查看活动详情

Hive 内置了 HiveServer 和 HiveServer2 服务,两者都允许客户端使用多种编程语言进行连接,但是 HiveServer 不能处理多个客户端的并发请求,因此产生了 HiveServer2。HiveServer2(HS2)允许远程客户端可以使用各种编程语言向 Hive 提交请求并检索结果,支持多客户端并发访问和身份验证。

HS2 是由多个服务组成的单个进程,其包括基于 Thrift 的 Hive 服务(TCP 或 HTTP)和用于 Web UI 的 Jetty Web 服务。

HiveServer2 拥有自己的 CLI 工具 Beeline。Beeline 是一个基于 SQLLine 的 JDBC 客户端。由于目前 HiveServer2 是 Hive 开发维护的重点(Hive 0.15 后就不再支持 HiveServer),所以官方更加推荐使用 Beeline 而不是 Hive CLI。【也就是说HiveServer服务的客户端工具叫CLI,HiveServer2服务的客户端叫 Beeline 直白看是蜜蜂线,谁知道却是直线】:)

1 修改Hadoop配置

修改 hadoop 集群的 core-site.xml 配置文件,增加如下配置,指定 hadoop 的 root 用户可以代理本机上所有的用户。Hadoop的安装可参考《Linux环境 Hadoop3.1.3 单机版安装部署》

<property>
	<name>hadoop.proxyuser.root.hosts</name>
	<value>*</value>
</property>
<property>
	<name>hadoop.proxyuser.root.groups</name>
	<value>*</value>
</property>

之所以要配置这一步,是因为 hadoop 2.0 以后引入了安全伪装机制,使得 hadoop 不允许上层系统(如 hive)直接将实际用户传递到 hadoop 层,而应该将实际用户传递给一个超级代理,由该代理在hadoop 上执行操作,以避免任意客户端随意操作 hadoop。如果不配置这一步,在之后的连接中可能会抛出 AuthorizationException 异常。

关于 Hadoop 的用户代理机制,可以参考: hadoop 的用户代理机制 或 Superusers Acting OnBehalf Of Other Users

2 启动hiveserver2服务

由于上面已经配置过环境变量,这里直接启动即可:

[root@tcloud ~]# nohup hiveserver2 &

非后台启动会显示:

[root@tcloud ~]# hiveserver2
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/java/bin:/usr/local/mysql/bin:/usr/local/zookeeper/bin:/usr/local/zookeeper/sbin:/usr/local/hadoop-3.1.3/bin:/usr/local/hadoop-3.1.3/sbin:/usr/local/spark/bin:/usr/local/hive/bin:/root/bin)
2021-08-03 15:03:17: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 2353799a-5285-44fd-a212-9c882d547099
Hive Session ID = 9349415b-0e9a-469f-9044-fb32914675d6
Hive Session ID = 2d98aa22-0ac6-474d-ac87-e7d4db10d309
Hive Session ID = a1047f0b-d61d-4e3c-8a05-d764dcc86957

3 使用beeline

Beeline 拥有更多可使用参数,在${HIVE_HOME}/bin下可以使用 beeline --help 查看,完整参数如下:

[root@tcloud bin]# ./beeline --help
# Hive的日志信息不再贴出
   -u <database url>               the JDBC URL to connect to
   -c <named url>                  the named JDBC URL to connect to,
                                   which should be present in beeline-site.xml
                                   as the value of beeline.hs2.jdbc.url.<namedUrl>
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --escapeCRLF=[true/false]       show carriage return and line feeds in query results as escaped \r and \n
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showDbInPrompt=[true/false]   display the current database name in the prompt
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --incremental=[true/false]      Defaults to false. When set to false, the entire result set
                                   is fetched and buffered before being displayed, yielding optimal
                                   display column sizing. When set to true, result rows are displayed
                                   immediately as they are fetched, yielding lower latency and
                                   memory usage at the price of extra display column padding.
                                   Setting --incremental=true is recommended if you encounter an OutOfMemory
                                   on the client side (due to the fetched result set size being large).
                                   Only applicable if --outputformat=table.
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --delimiter=DELIMITER           set the query delimiter; multi-char delimiters are allowed, but quotation
                                   marks, slashes, and -- are not allowed; defaults to ;
   --convertBinaryArrayToString=[true/false]    display binary column data as string or as byte array
   --help                          display this message

   Example:
    1. Connect using simple authentication to HiveServer2 on localhost:10000
    $ beeline -u jdbc:hive2://localhost:10000 username password

    2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password
    $ beeline -n username -p password -u jdbc:hive2://hs2.local:10012

    3. Connect using Kerberos authentication with hive/localhost@mydomain.com as HiveServer2 principal
    $ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/localhost@mydomain.com"

    4. Connect using SSL connection to HiveServer2 on localhost at 10000
    $ beeline "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"

    5. Connect using LDAP authentication
    $ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>

在 Hive CLI 中支持的参数,Beeline 都支持,常用的参数如下,参数使用可参考《Hive CLI命令行工具使用举例》请添加图片描述 可以在 ${HIVE_HOME}/bin 下使用以下命令进入 beeline 交互式命令行,出现 Connected 则代表连接成功。

[root@tcloud bin]# ./beeline -u jdbc:hive2://tcloud:10000 -n root
# 启动信息
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.ja                                                                                                                        r!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.3/share/hadoop/common/li                                                                                                                        b/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://tcloud:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive

# 使用界面信息
0: jdbc:hive2://tcloud:10000> show databases;
INFO  : Compiling command(queryId=root_20210803151804_247eece6-9924-4d66-867c-a3238619a31d): show databases
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=root_20210803151804_247eece6-9924-4d66-867c-a3238619a31d); Time taken: 0.228 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20210803151804_247eece6-9924-4d66-867c-a3238619a31d): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=root_20210803151804_247eece6-9924-4d66-867c-a3238619a31d); Time taken: 0.186 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.14 seconds)