持续创作,加速成长!这是我参与「掘金日新计划 · 6 月更文挑战」的第34天,点击查看活动详情。
1.简单说明
Greenplum Stream Server (GPSS)是一个ETL(提取、转换、加载)工具。GPSS服务器的一个实例从一个或多个客户机接收流数据,使用Greenplum数据库可读的外部表将数据转换并插入到目标Greenplum表中。数据源和数据格式是特定于客户机的。数据源和数据格式由客户端指定。
2.问题详情
2.1 多线程导致的问题
INTERNAL: Load task is already opened
和 Need to open a load task before write
问题详细报错如下:
io.grpc.StatusRuntimeException: INTERNAL: Load task is already opened
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
at api.GpssGrpc$GpssBlockingStub.open(GpssGrpc.java:640)
io.grpc.StatusRuntimeException: INTERNAL: Need to open a load task before write
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
at api.GpssGrpc$GpssBlockingStub.write(GpssGrpc.java:650)
GPSS本身就是做数据批量入库的流服务,不要使用多线程对相同表进行数据入库操作,可以使用多线程同时处理多张表。
2.2 字段缺少导致的问题
pq: segment reject limit reached, aborting operation (seg4 slice1...
问题详细报错如下:
io.grpc.StatusRuntimeException: INTERNAL: pq: segment reject limit reached, aborting operation (seg4 slice1 xxx.xxx.x.xxx:6004 pid=216169)
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
at api.GpssGrpc$GpssBlockingStub.close(GpssGrpc.java:660)
GPSS自定义客户端的字段和值必须一一对应,字段无值则需要处理为nullvalue
,官网的 Data Type Mapping :
The GPSS Data API service definition includes messages that represent rows and columns of supported Greenplum Database data types. Because Greenplum Database supports more data types than protobuf, the GPSS Data API provides a mapping between the types as follows:
gRPC Type | Greenplum Type |
---|---|
Int32Value | integer, serial |
Int64Value | bigint, bigserial |
Float32Value | real |
Float64Value | double |
StringValue | text (any kind of data) |
BytesValue | bytea |
TimeStampValue | time, timestamp (without time zone) |
Java处理代码如下:
// Row.Builder
Row.Builder builder = Row.newBuilder();
// 封装一条数据
columnList.forEach(columnInfo -> {
// 使其忽略大小写
String columnName = columnInfo.getName();
Object columnVal = insertRow.get(columnName);
if (columnVal == null) {
String columnNameUpperCase = columnName.toUpperCase();
columnVal = insertRow.get(columnNameUpperCase);
}
String columnType = columnInfo.getDatabaseType();
if (columnVal != null) {
String value = columnVal.toString();
switch (columnType) {
case "VARCHAR":
builder.addColumns(DBValue.newBuilder().setStringValue(value));
break;
case "INT2":
case "INT4":
builder.addColumns(DBValue.newBuilder().setInt32Value(Integer.parseInt(value)));
break;
case "INT8":
builder.addColumns(DBValue.newBuilder().setInt64Value(Long.parseLong(value)));
break;
case "FLOAT4":
builder.addColumns(DBValue.newBuilder().setFloat32Value(Float.parseFloat(value)));
break;
case "FLOAT8":
builder.addColumns(DBValue.newBuilder().setFloat64Value(Double.parseDouble(value)));
break;
case "TIMESTAMP":
java.sql.Timestamp timestampVal = java.sql.Timestamp.valueOf(columnVal.toString());
long time = timestampVal.getTime();
Timestamp timestamp = Timestamp.newBuilder().setSeconds(time / 1000).setNanos((int) ((time % 1000) * 1000000)).build();
builder.addColumns(DBValue.newBuilder().setTimeStampValue(timestamp));
break;
default:
builder.addColumns(DBValue.newBuilder().setNullValueValue(11));
break;
}
} else {
switch (columnType) {
case "VARCHAR":
builder.addColumns(DBValue.newBuilder().setNullValueValue(7));
break;
case "INT2":
case "INT4":
builder.addColumns(DBValue.newBuilder().setNullValueValue(1));
break;
case "INT8":
builder.addColumns(DBValue.newBuilder().setNullValueValue(2));
break;
case "FLOAT4":
builder.addColumns(DBValue.newBuilder().setNullValueValue(5));
break;
case "FLOAT8":
builder.addColumns(DBValue.newBuilder().setNullValueValue(6));
break;
case "TIMESTAMP":
builder.addColumns(DBValue.newBuilder().setNullValueValue(10));
break;
default:
builder.addColumns(DBValue.newBuilder().setNullValueValue(11));
break;
}
}
2.3 主键冲突导致的问题
pq: duplicate key value violates unique constraint "xxx"
问题详细报错如下:
io.grpc.StatusRuntimeException: INTERNAL: pq: duplicate key value violates unique constraint "xxx" (seg2 192.168.0.246:6002 pid=231712)
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
at api.GpssGrpc$GpssBlockingStub.close(GpssGrpc.java:660)
如果数据无法保证主键唯一,可使用GPSS的update
或merge
操作,Java代码如下:
// 根据类型创建不同操作(1.创建操作对象;2.设置操作字段;3.设置操作类型。)
if (GpssOperationType.INSERT.symbol.equals(type)) {
InsertOption.Builder builderInsert = InsertOption.newBuilder()
.setErrorLimitCount(props.errorLimitCount)
.setErrorLimitPercentage(props.errorLimitPercentage)
.setTruncateTable(props.truncateTable);
builderInsert.addAllInsertColumns(columnNameList);
InsertOption insertOpt = builderInsert.build();
openReq = openRequestBuilder.setInsertOption(insertOpt).build();
} else if (GpssOperationType.UPDATE.symbol.equals(type)) {
UpdateOption.Builder updateOption = UpdateOption.newBuilder()
.setErrorLimitCount(props.errorLimitCount)
.setErrorLimitPercentage(props.errorLimitPercentage);
// 需设置 matchColumns 字段
List<String> finalMatchFieldsList = matchFieldsList;
columnNameList.forEach(columnName -> {
if (finalMatchFieldsList.contains(columnName)) {
updateOption.addMatchColumns(columnName);
} else {
updateOption.addUpdateColumns(columnName);
}
});
UpdateOption updateOpt = updateOption.build();
openReq = openRequestBuilder.setUpdateOption(updateOpt).build();
} else if (GpssOperationType.MERGE.symbol.equals(type)) {
// 操作对象 matchColumns 参数需配置
MergeOption.Builder mergeOption = MergeOption.newBuilder()
.setErrorLimitCount(props.errorLimitCount)
.setErrorLimitPercentage(props.errorLimitPercentage);
matchFieldsList.forEach(mergeOption::addMatchColumns);
mergeOption.addAllInsertColumns(columnNameList);
mergeOption.addAllUpdateColumns(columnNameList);
MergeOption mergeOpt = mergeOption.build();
openReq = openRequestBuilder.setMergeOption(mergeOpt).build();
}
2.4 setFloat64Value导致的问题
查看一下源码:
// DBValue.newBuilder().setFloat32Value()
public DBValue.Builder setFloat32Value(float value) {
this.dBTypeCase_ = 5;
this.dBType_ = value;
this.onChanged();
return this;
}
// DBValue.newBuilder().setFloat64Value()
public DBValue.Builder setFloat64Value(double value) {
this.dBTypeCase_ = 6;
this.dBType_ = value;
this.onChanged();
return this;
}
导致图片中问题的原因是:
// 正确用法
DBValue.newBuilder().setFloat64Value(Double.parseDouble(value))
// 错误用法
DBValue.newBuilder().setFloat64Value(Float.parseFloat(value))
3.更新记录
- 2022.06.10 多线程导致的问题
- 2022.06.24 添加字段缺少和主键冲突导致的问题
- 2022.06.27 添加setFloat64Value导致的值问题