This article applies to Flink 1.10-1.12. Flink 1.13 and later appear to have changed how the Hadoop configuration directory is handled, so verify the details against your own version before following these steps.
Flink on K8s in APP mode
In APP mode of Flink on K8s, reading from Hadoop requires the Hadoop dependency jar first. I dropped flink-shaded-hadoop-2-uber-2.8.3-10.0.jar into Flink's lib directory; if you run a different Hadoop version, add the matching shaded jar instead.
The HDFS configuration files also have to be baked into the image at build time, essentially just core-site.xml and hdfs-site.xml. The default Hadoop configuration directory inside the container is /etc/hadoop/conf; put the two files there and Flink picks them up on its own.
COPY --chown=flink:flink $hadoop_conf/ /etc/hadoop/conf
Reading Hive additionally requires pointing the SQL at the directory that holds hive-site.xml, so hive-site.xml must be baked into the image as well. It can live in any directory Flink can read; just reference that directory in the catalog definition.
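For reference, a minimal Dockerfile sketch of the image build described above; the base image tag, the build-context paths ($hadoop_conf, hive-conf/) and the /opt/hive-conf target directory are assumptions, adjust them to your environment:

FROM flink:1.12.7-scala_2.12
# build arg pointing at the local Hadoop config directory in the build context (assumed name)
ARG hadoop_conf=hadoop-conf
# shaded Hadoop dependency onto Flink's classpath (use the jar matching your Hadoop version)
COPY --chown=flink:flink flink-shaded-hadoop-2-uber-2.8.3-10.0.jar /opt/flink/lib/
# core-site.xml and hdfs-site.xml into the default directory Flink reads
COPY --chown=flink:flink $hadoop_conf/ /etc/hadoop/conf
# hive-site.xml can go into any directory Flink can read; reference it later via 'hive-conf-dir'
COPY --chown=flink:flink hive-conf/ /opt/hive-conf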
The reason for writing catalog.dbName.tableName is simply convenience. The example on the Flink website uses USE CATALOG catalogName and then refers to tableName, which is more cumbersome: once you are done you have to switch back to Flink's default catalog, otherwise the other sources cannot be used. Fully qualifying the name as catalogName.dbName.tableName avoids the switching entirely, as the sketch below shows.
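To make the difference concrete, a small sketch of the two styles (myhive, test and the table names are just placeholders; default_catalog and default_database are Flink's built-in defaults):

-- Flink-docs style: switch catalogs, then remember to switch back
USE CATALOG myhive;
INSERT INTO test.titanic_sink SELECT * FROM default_catalog.default_database.titanic_source;
USE CATALOG default_catalog;

-- fully qualified names: no switching, other sources stay usable
INSERT INTO myhive.test.titanic_sink SELECT * FROM titanic_source;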
An example of the SQL:
CREATE CATALOG myhive WITH (
    'type' = 'hive',
    'default-database' = 'test',
    'hive-conf-dir' = '<directory containing hive-site.xml inside the container>'
);

CREATE TABLE titanic_source (
    passengerid INT,
    survived INT,
    pclass INT,
    name STRING,
    sex STRING,
    age DOUBLE,
    sibsp INT,
    parch INT,
    ticket STRING,
    fare DOUBLE,
    cabin STRING,
    embarked STRING
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://ip:port/db_name',
    'username' = 'root',
    'password' = 'xxx',
    'table-name' = 'tablename'
);

INSERT INTO myhive.test.titanic_sink SELECT * FROM titanic_source;
Flink on K8s in SESSION mode
In SESSION mode of Flink on K8s, accessing Hadoop only requires that the submitting machine has a working Hadoop client; accessing Hive additionally requires the hive-site.xml configuration file.
This is because in SESSION mode the JobGraph is built locally and then executed on the cluster; all configuration is read while the JobGraph is being built, so the cluster only needs the Hadoop jars and the configuration files can be omitted there. Of course, having the configuration files in the image does no harm, and keeps APP mode working as well.
The SQL is the same as above. The difference is that in SESSION mode the configuration is read from the web (submitting) machine, i.e. 'hive-conf-dir' = '<local configuration directory>': it points at the local directory containing hive-site.xml, and the local Hadoop environment must be usable.
CREATE CATALOG myhive WITH (
    'type' = 'hive',
    'default-database' = 'test',
    'hive-conf-dir' = '<directory containing the local hive-site.xml>'
);

CREATE TABLE titanic_source (
    `passengerid` INT,
    `survived` INT,
    `pclass` INT,
    `name` STRING,
    `sex` STRING,
    `age` DOUBLE,
    `sibsp` INT,
    `parch` INT,
    `ticket` STRING,
    `fare` DOUBLE,
    `cabin` STRING,
    `embarked` STRING
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://ip:port/db_name',
    'username' = 'root',
    'password' = 'xxx',
    'table-name' = 'tablename'
);

INSERT INTO myhive.test.titanic_sink SELECT * FROM titanic_source;
Configuration file contents
Contents of hive-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://ip:port/dbName</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>HiveServer2 Thrift host (IP or domain name)</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>port</value> <!-- HiveServer2 remote connection port, default 10000 -->
<description>Port number of HiveServer2 Thrift interface.
Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>HDFS warehouse directory</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://ip:port</value>
</property>
</configuration>
Hadoop configuration
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip:port</value>
</property>
<!-- Change the default location where Hadoop stores its data -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.10.1/tmp</value>
</property>
<property>
<name>hadoop.http.filter.initializers</name>
<value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
<name>hadoop.http.authentication.type</name>
<value>simple</value>
</property>
<property>
<name>hadoop.http.authentication.token.validity</name>
<value>3600</value>
</property>
<property>
<name>hadoop.http.authentication.signature.secret.file</name>
<value>/home/package/hadoop-2.10.1/etc/hadoop/secret</value>
</property>
<property>
<name>hadoop.http.authentication.cookie.domain</name>
<value></value>
</property>
<property>
<name>hadoop.http.authentication.simple.anonymous.allowed</name>
<value>false</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-2.10.1/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.10.1/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Number of replicas for each HDFS data block; the default is 3</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>disable HDFS permission checks</description>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>ip:port</value>
</property>
</configuration>