Spark has no access to table mytable. Clients can access this table only if they have the following capabilities: CONNECTORREAD . . .
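Our company recently planned to upgrade its Cloudera platform from CDH to CDP. During the impact analysis, we found that Spark queries against Hive managed tables fail with the following error: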
Spark has no access to table `mytable`. Clients can access this table only if they have the following capabilities: CONNECTORREAD . . .
This table may be a Hive-managed ACID table, or require some other capability that Spark currently does not implement;
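The official recommendation is to use HWC (Hive Warehouse Connector) to access Hive managed tables, but this approach has two drawbacks:
- It requires extensive code changes, including changes to users' custom components.
- It only supports writing in ORC format. Other formats are not supported, and our user data includes PARQUET and AVRO.
So we looked for a way to stay compatible without changing any code: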
spark-shell --jars /opt/cloudera/parcels/CDP/jars/hive-warehouse-connector-assembly-1.0.0.7.1.8.5-1.jar --conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" --conf "spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator"
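With this approach, both Hive managed tables and external tables can be accessed normally. No code changes are needed; only the spark-submit launch parameters have to change.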
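
Because only the launch parameters change, the same settings can be carried over to spark-submit. The following is a minimal sketch rather than a definitive setup. The wrapper name `submit_with_acid_read` and the `SPARK_SUBMIT` override variable are hypothetical, introduced only for illustration. The jar path is the one from the spark-shell command above, and the two properties are spelled `spark.sql.extensions` and `HiveAcidKyroRegistrator`, which is the spelling we believe Spark and the connector expect.

```shell
# Path of the HWC assembly jar, as used in the spark-shell command above.
# Adjust it to the jar that actually exists in your parcel directory.
HWC_JAR="/opt/cloudera/parcels/CDP/jars/hive-warehouse-connector-assembly-1.0.0.7.1.8.5-1.jar"

# Hypothetical wrapper: prepends the connector jar and the two settings,
# then forwards every remaining argument to spark-submit unchanged.
# SPARK_SUBMIT is an illustrative override (for example, for a dry run);
# it is not a variable that Spark itself reads.
submit_with_acid_read() {
  "${SPARK_SUBMIT:-spark-submit}" \
    --jars "$HWC_JAR" \
    --conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" \
    --conf "spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator" \
    "$@"
}
```

A job would then be launched as, for example, `submit_with_acid_read --class com.example.MyJob my-job.jar`, where the class and jar names are placeholders. Setting `SPARK_SUBMIT=echo` beforehand prints the assembled arguments instead of submitting anything, which is a cheap way to check the command before running it on the cluster.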