Bugs encountered while debugging Apache Griffin


1. Database error: Table 'quartz.DATACONNECTOR' doesn't exist

2021-01-18 14:54:54.135 ERROR 122541 --- [http-nio-8081-exec-8] o.a.c.c.C.[.[.[.[dispatcherServlet]     [175] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.transaction.TransactionSystemException: Could not commit JPA transaction; nested exception is javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.0.v20150309-bf26070): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'quartz.DATACONNECTOR' doesn't exist
Error Code: 1146
Call: INSERT INTO DATACONNECTOR (ID, CONFIG, CREATEDDATE, DATAFRAMENAME, DATATIMEZONE, DATAUNIT, MODIFIEDDATE, NAME, TYPE, VERSION) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        bind => [7, {"database":"griffin_demo","table.name":"demo_tgt","where":"dt=#YYYYMMdd# AND hour=#HH#"}, 1610952894112, null, GMT+8, 1hour, null, target1610952607162, HIVE, 1.2]
Query: InsertObjectQuery(DataConnector{name=target1610952607162type=HIVE, version='1.2', config={"database":"griffin_demo","table.name":"demo_tgt","where":"dt=#YYYYMMdd# AND hour=#HH#"}})] with root cause

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'quartz.DATACONNECTOR' doesn't exist

This one needs case-by-case analysis, but most likely the value being inserted is too long and exceeds the database column length limit, so JPA fails to create the table, and subsequent inserts then fail because the table does not exist. Try modifying the DataConnector class in the source code:

    @JsonIgnore
    @Transient
    private String defaultDataUnit = "365000d";

    // add columnDefinition = "TEXT" so that JPA creates the config column as TEXT instead of a length-limited VARCHAR
    @JsonIgnore
    @Column(length = 20480, columnDefinition = "TEXT")
    private String config;

    @Transient
    private Map<String, Object> configMap;

The DataConnector class lives at ./service/src/main/java/org/apache/griffin/core/measure/entity/DataConnector.java.

2. No main manifest attribute in spring-boot-01-helloworld-1.0-SNAPSHOT.jar

When building and installing the Griffin source with Maven, the resulting jar may have no main manifest attribute. Add the Spring Boot build plugin to the pom and rerun mvn install.

<build>
  <plugins>
    <plugin>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-maven-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <goal>repackage</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

In griffin-0.7.0 this plugin is actually already present; you only need to change the goal to repackage. Remember to clean first and then install so the project is rebuilt.
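
From the Griffin source root that is simply (a plain rebuild; -DskipTests is optional if you want to skip the unit tests):

mvn clean install -DskipTests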

3. Livy error: Could not find Livy jars directory

This is most likely because the wrong package was downloaded. Do not grab the incubating-livy package from the official site, since it contains no jars; download and install livy-server-0.x.x.jar instead.

4. MySQL error: Failed to open file 'xxx.sql', error

On Linux, paths inside the mysql client are resolved relative to the directory the client was started from. For example, if we launched MySQL from /usr/local/tomcat, every path is resolved relative to /usr/local/tomcat: the command we wrote earlier, source /sqlfile/xxx.sql;, tells MySQL to look for /usr/local/tomcat/sqlfile/xxx.sql. So when you want to run a sql file from mysql, cd into the directory that holds it first, then start MySQL and run source xxx.sql;
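
For example (the paths are illustrative):

cd /path/to/sqlfile        # the directory that actually contains xxx.sql
mysql -u root -p
mysql> source xxx.sql;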

5. Spark startup error: java.net.ConnectException: Call From xxx to xxx:8020 failed on connection exception: Connection refused

1. Make sure the firewall is turned off, otherwise many of the ports you have configured will be unreachable.

systemctl stop firewalld
systemctl status firewalld
systemctl disable firewalld

2. Make sure the hosts file contains the IP-to-hostname mappings of the cluster machines, otherwise you have to type the full IP address every time instead of the hostname.

vim /etc/hosts
192.168.239.131 Hadoop101
192.168.239.132 Hadoop102
192.168.239.133 Hadoop103

3. Check $HADOOP_HOME/etc/hadoop/core-site.xml (or hdfs-site.xml) and confirm the HDFS NameNode hostname and, above all, the port. The port must match the one in spark.eventLog.dir in spark/conf/spark-defaults.conf and the one in -Dspark.history.fs.logDirectory in spark/conf/spark-env.sh; 8020 and 9000, for instance, will never match.

-Dspark.history.fs.logDirectory=hdfs://hadoop101:9000/spark_directory"
spark.eventLog.dir               hdfs://hadoop101:9000/spark_directory
<!-- Address of the NameNode in HDFS -->
  <property>
    <name>fs.defaultFS</name>
    <!-- hdfs is the protocol, hadoop101 is the NameNode hostname, 9000 is the port -->
    <value>hdfs://hadoop101:9000</value>
  </property>

6. | xargs kill never seems to kill the process

When setting up a big-data cluster you often write shell scripts to start or stop a whole group of processes. In my Flume stop script, the Flume consumer process could never be killed, probably because the Kafka it consumes from had already been shut down. In any case, you can append -9 to the batch | xargs kill to force-kill the processes:

"stop"){
    for i in hadoop103
    do
        echo " --------停止 $i 消费 flume-------"
        ssh $i "ps -ef | grep kafka-flume-hdfs | grep -v grep |awk '{print \$2}' | xargs kill -9"
        done
};;

Briefly: ps -ef lists all processes; grep filters on the given pattern, and adding -v inverts the match; awk '{print $2}' prints the second field, i.e. the process ID; xargs kill then kills those processes in bulk, and the -9 makes it a forced kill (SIGKILL) rather than a polite termination.

7. Elasticsearch 5.2 startup error

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c5330000, 986513408, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 986513408 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /usr/local/elasticsearch/hs_err_pid1675.log

Elasticsearch 5.2 allocates 1g of JVM heap by default; my VM did not have that much memory, so I reduced the JVM heap allocation:

# vim config/jvm.options  
-Xms1g  
-Xmx1g  

Change this to:

-Xms512m  
-Xmx512m  

Spark also has JVM memory settings; it is an in-memory compute engine, after all, so it inevitably consumes memory at runtime. Under test conditions you can shrink Spark's memory too (set spark.driver.memory in spark/conf/spark-defaults.conf to, say, 512m), otherwise it will also fail. Still, it is best to give the VMs in the cluster more memory, especially the host running the Hadoop NameNode, since it stores a large amount of metadata; if it also has to run compute tasks, give it even more.
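
For example, a low-memory test VM might use something like this (illustrative values, not a production recommendation):

# spark/conf/spark-defaults.conf
spark.driver.memory      512m
spark.executor.memory    512m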

8. Scala version mismatch

Griffin's core code is written in Scala, and Spark is also built on Scala. When you run spark-submit to execute a Griffin data-quality job, it uses the Scala bundled with Spark, which may not match the Scala version specified in Griffin's pom file. This is easy to overlook, and the Griffin website mentions it as well. The current griffin-0.7.0 parent pom specifies Scala 2.11, so install Spark 2.3 or 2.4 and do not move up to Spark 3.0, which bundles Scala 2.12 and will fail at runtime. The relevant property is scala.binary.version in pom.xml:

<properties>
        <encoding>UTF-8</encoding>
        <project.build.sourceEncoding>${encoding}</project.build.sourceEncoding>
        <project.reporting.outputEncoding>${encoding}</project.reporting.outputEncoding>

        <java.version>1.8</java.version>
        <scala.binary.version>2.11</scala.binary.version>
        <scala211.binary.version>2.11</scala211.binary.version>
        <scala.version>${scala.binary.version}.0</scala.version>
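
To confirm which Scala version your Spark installation actually bundles, the spark-submit version banner reports it (assuming spark-submit is on the PATH):

# the banner printed by this command includes a "Using Scala version ..." line
spark-submit --version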

9. java.lang.AssertionError: assertion failed: Connector is undefined or invalid

21/06/28 18:54:28 ERROR measure.Application$: assertion failed: Connector is undefined or invalid
java.lang.AssertionError: assertion failed: Connector is undefined or invalid
        at scala.Predef$.assert(Predef.scala:170)
        at org.apache.griffin.measure.configuration.dqdefinition.DataSourceParam.validate(DQConfig.scala:100)
        at org.apache.griffin.measure.configuration.dqdefinition.DQConfig$$anonfun$validate$5.apply(DQConfig.scala:74)
        at org.apache.griffin.measure.configuration.dqdefinition.DQConfig$$anonfun$validate$5.apply(DQConfig.scala:74)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.configuration.dqdefinition.DQConfig.validate(DQConfig.scala:74)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamReader$class.validate(ParamReader.scala:43)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader.validate(ParamFileReader.scala:33)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader$$anonfun$readConfig$1.apply(ParamFileReader.scala:40)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader$$anonfun$readConfig$1.apply(ParamFileReader.scala:36)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader.readConfig(ParamFileReader.scala:36)
        at org.apache.griffin.measure.Application$.readParamFile(Application.scala:127)
        at org.apache.griffin.measure.Application$.main(Application.scala:61)
        at org.apache.griffin.measure.Application.main(Application.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/06/28 18:54:28 INFO util.ShutdownHookManager: Shutdown hook called
21/06/28 18:54:28 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f62d0e16-1189-49ce-9c4c-565ff330dfb8

In the dq.json from the official docs, the field under data.sources is written as "connectors", with an s, and its value is accordingly wrapped in "[ ]" to allow multiple connectors. Sometimes, however, this throws an error to the effect that Object[].class cannot be converted to Object.class, which clearly means only a single connector is accepted. Be careful when writing this part of the config: if you hit that error, drop the s (write "connector"), remove the "[ ]", and configure exactly one connector.

 "process.type": "batch",
        "data.sources": [
                {
                        "name": "src",
                        "baseline": true,
                        "connector":
                                {
                                        "type": "hive",
                                        "version": "3.1",
                                        "config": {
                                                "database": "default",
                                                "table.name": "demo_src"
                                        }
                                }
                },

10. Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found

The error message:

Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
        at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
        at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:296)
        at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:84)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.prepareShuffleDependency(ShuffleExchangeExec.scala:321)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:91)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
        ... 88 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        ... 139 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
        ... 144 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
        ... 146 more

The simplest fix is to put the hadoop-lzo-0.4.20.jar into Spark's jars directory.
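
A minimal sketch of that copy (the source path of the hadoop-lzo jar is an assumption; use wherever it lives in your Hadoop installation):

# copy the LZO codec jar into Spark's classpath, then resubmit the job
cp /path/to/hadoop-lzo-0.4.20.jar $SPARK_HOME/jars/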

11. ERROR curator.ConnectionState: Connection timed out for connection string (zk:2181) and timeout (15000) / elapsed (47080)

21/06/28 19:00:24 WARN curator.ConnectionState: Connection attempt unsuccessful after 67036 (greater than max timeout of 60000). Resetting connection and trying again with a new connection.
21/06/28 19:00:24 ERROR offset.OffsetCheckpointInZK: delete /lock error: zk: 未知的名称或服务
21/06/28 19:00:39 ERROR curator.ConnectionState: Connection timed out for connection string (zk:2181) and timeout (15000) / elapsed (15059)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141)
        at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)
        at org.apache.curator.framework.imps.NamespaceImpl.fixForNamespace(NamespaceImpl.java:74)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.fixForNamespace(CuratorFrameworkImpl.java:574)
        at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:194)
        at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.org$apache$griffin$measure$context$streaming$checkpoint$offset$OffsetCheckpointInZK$$delete(OffsetCheckpointInZK.scala:204)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK$$anonfun$delete$1.apply(OffsetCheckpointInZK.scala:124)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK$$anonfun$delete$1.apply(OffsetCheckpointInZK.scala:123)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.delete(OffsetCheckpointInZK.scala:123)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.clear(OffsetCheckpointInZK.scala:130)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.init(OffsetCheckpointInZK.scala:90)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$$anonfun$init$1.apply(OffsetCheckpointClient.scala:34)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$$anonfun$init$1.apply(OffsetCheckpointClient.scala:34)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$.init(OffsetCheckpointClient.scala:34)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply$mcV$sp(StreamingDQApp.scala:70)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply(StreamingDQApp.scala:55)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply(StreamingDQApp.scala:55)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp.init(StreamingDQApp.scala:55)
        at org.apache.griffin.measure.Application$.main(Application.scala:82)
        at org.apache.griffin.measure.Application.main(Application.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/06/28 19:00:55 ERROR curator.ConnectionState: Connection timed out for connection string (zk:2181) and timeout (15000) / elapsed (31072)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141)
        at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)
        at org.apache.curator.framework.imps.NamespaceImpl.fixForNamespace(NamespaceImpl.java:74)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.fixForNamespace(CuratorFrameworkImpl.java:574)
        at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:194)
        at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.org$apache$griffin$measure$context$streaming$checkpoint$offset$OffsetCheckpointInZK$$delete(OffsetCheckpointInZK.scala:204)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK$$anonfun$delete$1.apply(OffsetCheckpointInZK.scala:124)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK$$anonfun$delete$1.apply(OffsetCheckpointInZK.scala:123)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.delete(OffsetCheckpointInZK.scala:123)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.clear(OffsetCheckpointInZK.scala:130)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.init(OffsetCheckpointInZK.scala:90)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$$anonfun$init$1.apply(OffsetCheckpointClient.scala:34)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$$anonfun$init$1.apply(OffsetCheckpointClient.scala:34)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$.init(OffsetCheckpointClient.scala:34)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply$mcV$sp(StreamingDQApp.scala:70)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply(StreamingDQApp.scala:55)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply(StreamingDQApp.scala:55)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp.init(StreamingDQApp.scala:55)
        at org.apache.griffin.measure.Application$.main(Application.scala:82)
        at org.apache.griffin.measure.Application.main(Application.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/06/28 19:01:11 ERROR curator.ConnectionState: Connection timed out for connection string (zk:2181) and timeout (15000) / elapsed (47080)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141)
        at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)
        at org.apache.curator.framework.imps.NamespaceImpl.fixForNamespace(NamespaceImpl.java:74)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.fixForNamespace(CuratorFrameworkImpl.java:574)
        at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:194)
        at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.org$apache$griffin$measure$context$streaming$checkpoint$offset$OffsetCheckpointInZK$$delete(OffsetCheckpointInZK.scala:204)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK$$anonfun$delete$1.apply(OffsetCheckpointInZK.scala:124)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK$$anonfun$delete$1.apply(OffsetCheckpointInZK.scala:123)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.delete(OffsetCheckpointInZK.scala:123)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.clear(OffsetCheckpointInZK.scala:130)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointInZK.init(OffsetCheckpointInZK.scala:90)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$$anonfun$init$1.apply(OffsetCheckpointClient.scala:34)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$$anonfun$init$1.apply(OffsetCheckpointClient.scala:34)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.griffin.measure.context.streaming.checkpoint.offset.OffsetCheckpointClient$.init(OffsetCheckpointClient.scala:34)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply$mcV$sp(StreamingDQApp.scala:70)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply(StreamingDQApp.scala:55)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp$$anonfun$init$1.apply(StreamingDQApp.scala:55)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.griffin.measure.launch.streaming.StreamingDQApp.init(StreamingDQApp.scala:55)
        at org.apache.griffin.measure.Application$.main(Application.scala:82)
        at org.apache.griffin.measure.Application.main(Application.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

In griffin/measure/target/env.json, under the griffin.checkpoint entry, change hosts to the address of your ZooKeeper server, x.x.x.x:2181:

"griffin.checkpoint": [
    {
      "type": "zk",
      "config": {
        "hosts": "hadoop101:2181",
        "namespace": "griffin/infocache",
        "lock.path": "lock",
        "mode": "persist",
        "init.clear": true,
        "close.clear": false
      }
    }
  ]

12. java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.elasticsearch.spark.sql.DefaultSource15 not found

This problem is fairly rare. After some googling I found the two posts below that mention it; they explain the fix from two angles, but it all comes down to one thing: incompatible versions.

  • github.com/elastic/ela… This one is about version problems when configuring jars such as elasticsearch-hadoop and elasticsearch-spark-20_2.10. Since I was running the spark-streaming case and had not even started ES, this was not my problem.
  • The other angle, www.reddit.com/r/apachespa… , is that the versions in spark-streaming-kafka-0-10_2.11:2.4.3 must line up one to one: Spark must be 2.2.x or later, Kafka 0.10, and Scala 2.11.
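
To make the alignment concrete, the artifact name itself encodes the Kafka API generation and the Scala version; a hedged pom sketch using the coordinates named in that post:

<!-- "0-10" is the Kafka API generation, "2.11" the Scala version; 2.4.3 must match your Spark -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.4.3</version>
</dependency>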

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It offers simple parallelism, a 1:1 mapping between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API rather than the simple API, usage differs noticeably, and this version of the integration is marked experimental, so the API may change. Spark Integration For Kafka exists in the Maven repository only for 0.8 and 0.10. Kafka 0.8 is something of a watershed release and has plenty of bugs and patches of its own: for example a 0.9 kafka client against a 0.8 kafka server needs to be downgraded, and a stuck producer may require adding host.name to server.properties, which I will not go into here. In short, I could not get Kafka 0.8 to work, and Kafka 0.11 or later is incompatible with Griffin itself, so I switched to the more stable Kafka 0.10. If you switch to Kafka 0.10 as well, the package to download is Scala 2.11 - kafka_2.11-0.10.2.2.tgz.

One more reminder: after Kafka is set up, run it first and make sure it can produce and consume data normally (start ZooKeeper first). If there are problems, consider whether you need to delete ZooKeeper's version-2 data directory. By default ZK stores its data under dataDir=/tmp/zookeeper; this varies per setup, and mine was changed to zookeeper/zkData/. Just delete it.
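
A quick smoke test with the Kafka 0.10 console tools (the broker and ZooKeeper addresses and the topic name are placeholders):

# create a throwaway topic (Kafka 0.10 still creates topics through ZooKeeper)
bin/kafka-topics.sh --create --zookeeper hadoop101:2181 --replication-factor 1 --partitions 1 --topic test
# type a few lines into the console producer
bin/kafka-console-producer.sh --broker-list hadoop101:9092 --topic test
# in another shell, read them back
bin/kafka-console-consumer.sh --bootstrap-server hadoop101:9092 --topic test --from-beginning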

13. Griffin version issues

The Griffin docs list the supported versions. Apart from the Kafka version covered in the previous item, Spark 2.4 with Hadoop 2.7 also works; the key point is that Spark 2.4 bundles Scala 2.11, which is fine. Do not use Spark 3.0 or later, since it bundles Scala 2.12, which the latest Griffin does not yet support (whether it will in the future, I do not know). I also recommend Griffin 0.6.0. I have not tried 0.5.0 or earlier so I cannot say, but avoid the latest 0.7.0 on GitHub: after building it I found the HDFS sink would not write out files.

Once the job succeeds, you can follow its progress in the Spark UI, find the bad records under missRecord, and see this run's results recorded under _Metrics.


14. com.google.gson.JsonIOException: JSON document was not fully consumed.

Exception in thread "main" com.google.gson.JsonIOException: JSON document was not fully consumed.
	at com.google.gson.Gson.assertFullConsumption(Gson.java:905)
	at com.google.gson.Gson.fromJson(Gson.java:898)
	at com.google.gson.Gson.fromJson(Gson.java:846)
	at com.google.gson.Gson.fromJson(Gson.java:817)
	at com.unionpay.magpie.util.GsonTest.main(GsonTest.java:13)

In my case this error occurred because the Kafka producer emitted data with gaps (pauses in the stream that produce empty JSON strings terminated by "\0"), which Gson refuses to parse. It can be worked around with JsonReader: JsonReader reads token by token and stops parsing once it reaches the final "}", so the trailing junk no longer matters. See the code in the next item.

15. Caused by: java.lang.NullPointerException

Here the value parsed from the JSON may be null; check for that up front and simply return an empty string "" if it is. See the code below.

public static String handleData(String line){
        try {
                if (line != null && !line.equals("")){
                    // setLenient() tolerates malformed JSON, setDateFormat() fixes date parsing (see items 16 and 17)
                    Gson gson = new GsonBuilder().setLenient().setDateFormat("yyyy-MM-dd_HH:mm:ss").create();
                    // JsonReader parses token by token and stops at the closing "}",
                    // so trailing garbage such as "\0" no longer triggers JsonIOException
                    JsonReader reader = new JsonReader(new StringReader(line));
                    Student student = gson.fromJson(reader, Student.class);
                    // ra is a shared java.util.Random field of the class; Student is a plain POJO matching the JSON
                    int rand = ra.nextInt(10) + 1;
                    if (rand > 8) student.setName(student.getName() + "_" + ra.nextInt(10));
                    return gson.toJson(student);
                }
                else return "";
        }catch (Exception e){
            return "";
        }
    }

16. Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 1 path $

This can happen when some value in the JSON string is null (as it was in my case). Add setLenient() to the GsonBuilder() chain:

public static String handleData(String line){
        try {
                if (line!=null&& !line.equals("")){
                    Gson gson = new GsonBuilder().setLenient().setDateFormat("yyyy-MM-dd_HH:mm:ss").create();
                    JsonReader reader = new JsonReader(new StringReader(line));
                    Student student = gson.fromJson(reader, Student.class);
                    int rand = ra.nextInt(10) + 1;
                    if (rand > 8) student.setName(student.getName() + "_" + ra.nextInt(10));
                    return gson.toJson(student);
                }
                else return "";
        }catch (Exception e){
            return "";
        }
    }

17. com.google.gson.JsonSyntaxException

In my case this was caused by a wrong date format. The fix is to specify the date conversion format when creating the Gson object (setDateFormat); see the code in the previous item.