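In part 1 of this blog series, we discussed best practices and patterns for efficiently deploying machine learning models for inference with Google Cloud Dataflow. Among other techniques, it showed efficient batching of the inputs and the use of shared.py to make efficient use of a model.

In this post, we walk through the use of the RunInference API from tfx-bsl, a utility transform from TensorFlow Extended (TFX), which abstracts us away from manually implementing the patterns described in part 1.

The following four patterns are covered:

Using RunInference to make ML prediction calls.
Post-processing RunInference results. Making predictions is often the first part of a multistep flow in a business process. Here we process the results into a form that can be used downstream.
Attaching a key. Along with the data that is passed to the model, there is often a need for an identifier (for example, an IoT device ID or a customer identifier) that is used later in the process even if it is not used by the model itself. We show how this can be accomplished.
Inference with multiple models in the same pipeline. Often you may need to run multiple models within the same pipeline, be it in parallel or as a sequence of predict - process - predict calls. We walk through a simple example.

Creating a simple model

In order to illustrate these patterns, we'll use a simple toy model that lets us concentrate on the data engineering needed for the input and output of the pipeline. This model will be trained to approximate multiplication by the number 5.

Please note that the following code snippets can be run as cells within a notebook environment.

Step 1 - Set up libraries and imports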

%pip install tfx_bsl==0.29.0 --quiet
import argparse

import tensorflow as tf
from tensorflow import keras
from tensorflow_serving.apis import prediction_log_pb2

import apache_beam as beam
import tfx_bsl
from tfx_bsl.public.beam import RunInference
from tfx_bsl.public import tfxio
from tfx_bsl.public.proto import model_spec_pb2

import numpy

from typing import Dict, Text, Any, Tuple, List

from apache_beam.options.pipeline_options import PipelineOptions

project = "<your project>"
bucket = "<your bucket>"

save_model_dir_multiply = f'gs://{bucket}/tfx-inference/model/multiply_five/v1/'
save_model_dir_multiply_ten = f'gs://{bucket}/tfx-inference/model/multiply_ten/v1/'

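Step 2 - Create the example data

In this step we create a small dataset that includes a range of values from 0 to 99 and labels that correspond to each value multiplied by 5.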

"""
Create our training data which represents the 5 times multiplication table for 0 to 99. x is the data and y the labels. 

x is a range of values from 0 to 99.
y is a list of 5x

value_to_predict includes values outside of the training data
"""
x = numpy.arange(0, 100)
y = x * 5

Step 3 - Create a simple model, compile, and fit it

"""
Build a simple linear regression model.
Note the model has a shape of (1,) for its input layer; it expects a single float32 value.
"""
input_layer = keras.layers.Input(shape=(1,), dtype=tf.float32, name='x')
output_layer= keras.layers.Dense(1)(input_layer)

model = keras.Model(input_layer, output_layer)
model.compile(optimizer=tf.optimizers.Adam(), loss='mean_absolute_error')
model.summary()

Let's teach the model about multiplication by 5.

model.fit(x, y, epochs=2000)

Next, check how well the model performs using some test data.

value_to_predict = numpy.array([105, 108, 1000, 1013], dtype=numpy.float32)
model.predict(value_to_predict)

From the results below, it looks like this simple model has learned its 5 times table closely enough for our needs.

OUTPUT: 

array([[ 524.9939],
       [ 539.9937],
       [4999.935 ],
       [5064.934 ]], dtype=float32)
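To put a number on "close enough", the short sketch below computes the relative error of each prediction against the exact product. It uses only the values printed above; the helper name `relative_errors` is an illustrative assumption and not part of the original notebook.

```python
# Inputs and predictions copied from the model.predict output shown above.
inputs = [105.0, 108.0, 1000.0, 1013.0]
predictions = [524.9939, 539.9937, 4999.935, 5064.934]


def relative_errors(xs, preds, factor=5.0):
    """Return |prediction - factor * x| / (factor * x) for each pair."""
    return [abs(p - factor * x) / (factor * x) for x, p in zip(xs, preds)]


errors = relative_errors(inputs, predictions)
worst = max(errors)
print(f"worst relative error: {worst:.2e}")
```

A relative error this small is fine for a toy model whose only job is to give the pipeline something to call.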

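Step 4 - Convert the input to tf.example

In the model we just built, we used a simple list to generate the data and pass it to the model. In this next step we make the model more robust by using tf.example objects in the model training.

tf.example is a serializable dictionary (or mapping) from names to tensors, which ensures the model can still function even when new features are added to the base examples. Using tf.example also brings the benefit that the data is portable across models in an efficient, serialized format.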
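To see why parsing features by name tolerates newly added features, here is a plain-Python analogy. It is not TensorFlow code: `serialize` and `parse_by_spec` are hypothetical stand-ins for serializing an Example and parsing it with a feature spec, and JSON is used only to keep the sketch dependency-free.

```python
import json
from typing import Dict, List


def serialize(features: Dict[str, List[float]]) -> bytes:
    """Stand-in for serializing an Example: a name -> values mapping as bytes."""
    return json.dumps(features, sort_keys=True).encode("utf-8")


def parse_by_spec(record: bytes, spec: List[str]) -> Dict[str, List[float]]:
    """Stand-in for parsing with a feature spec: keep only the named features."""
    decoded = json.loads(record.decode("utf-8"))
    missing = [name for name in spec if name not in decoded]
    if missing:
        raise KeyError(f"record is missing required features: {missing}")
    return {name: decoded[name] for name in spec}


# The second record carries an extra (hypothetical) feature the model never asked for.
old_record = serialize({"x": [105.0]})
new_record = serialize({"x": [105.0], "device_temp": [21.5]})

predict_spec = ["x"]
parsed_old = parse_by_spec(old_record, predict_spec)
parsed_new = parse_by_spec(new_record, predict_spec)
print(parsed_old == parsed_new)
```

Because the consumer asks for features by name, a producer can add fields without breaking it, which is the property the paragraph above relies on.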

To use tf.example for this example, we first need to create a helper class, ExampleProcessor, that is used to serialize the data points.

class ExampleProcessor:
  
   def create_example_with_label(self, feature: numpy.float32,
                            label: numpy.float32)-> tf.train.Example:
       return tf.train.Example(
           features=tf.train.Features(
                 feature={'x': self.create_feature(feature),
                          'y' : self.create_feature(label)
                 }))

   def create_example(self, feature: numpy.float32):
       return tf.train.Example(
           features=tf.train.Features(
                 feature={'x' : self.create_feature(feature)})
           )

   def create_feature(self, element: numpy.float32):
       return tf.train.Feature(float_list=tf.train.FloatList(value=[element]))

Using the ExampleProcessor class, the in-memory list can now be moved to disk.

# Create our labeled example file for 5 times table

example_five_times_table = 'example_five_times_table.tfrecord'

with tf.io.TFRecordWriter(example_five_times_table) as writer:
 for i in zip(x, y):
   example = ExampleProcessor().create_example_with_label(
       feature=i[0], label=i[1])
   writer.write(example.SerializeToString())

# Create a file containing the values to predict

predict_values_five_times_table = 'predict_values_five_times_table.tfrecord'

with tf.io.TFRecordWriter(predict_values_five_times_table) as writer:
 for i in value_to_predict:
   example = ExampleProcessor().create_example(feature=i)
   writer.write(example.SerializeToString())

With the new examples stored in TFRecord files on disk, we can use the Dataset API to prepare the data so it is ready for consumption by the model.

RAW_DATA_TRAIN_SPEC = {
'x': tf.io.FixedLenFeature([], tf.float32),
'y': tf.io.FixedLenFeature([], tf.float32)
}

RAW_DATA_PREDICT_SPEC = {
'x': tf.io.FixedLenFeature([], tf.float32),
}

With the feature spec in place, we can train the model as before.

dataset = tf.data.TFRecordDataset(example_five_times_table) 
dataset = dataset.map(lambda e : tf.io.parse_example(e, RAW_DATA_TRAIN_SPEC)) 
dataset = dataset.map(lambda t : (t['x'], t['y'])) 
dataset = dataset.batch(100) 
dataset = dataset.repeat()
model.fit(dataset, epochs=500, steps_per_epoch=1)

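Note that these steps would be done automatically for us if we had built the model using a TFX pipeline, rather than hand-crafting the model as we did here.

Step 5 - Save the model

Now that we have a model, we need to save it for use with the RunInference transform. RunInference accepts TensorFlow saved model pb files as part of its configuration. The saved model file must be stored in a location that can be accessed by the RunInference transform. In a notebook this can be the local file system; however, to run the pipeline on Dataflow, the file needs to be accessible by all the workers, so here we use a Google Cloud Storage (GCS) bucket.

Note that the gs:// scheme is directly supported by the tf.keras.models.save_model API.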

tf.keras.models.save_model(model, save_model_dir_multiply)

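During development it's useful to be able to inspect the contents of the saved model file. For this, we use the saved_model_cli that comes with TensorFlow. You can run this command from a cell: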

!saved_model_cli show --dir {save_model_dir_multiply} --all

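Abbreviated output from the saved model file is shown below. Note the signature def 'serving_default', which accepts a tensor of float type. We will change this in the next section to accept another type.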

OUTPUT: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['example'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: serving_default_example:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

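RunInference passes a serialized tf.example to the model rather than a tensor of float type, so we have one more step to prepare the model: creating a specific signature.

Signatures are a powerful feature, as they let us control how calling programs interact with the model. From the TensorFlow documentation:

"The optional signatures argument controls which methods in obj will be available to programs which consume SavedModels, for example, serving APIs. Python functions may be decorated with @tf.function(input_signature=...) and passed as signatures directly, or lazily with a call to get_concrete_function on the method decorated with @tf.function."

In our case, the following code creates a signature that accepts a tf.string data type with the name 'examples'. This signature is then saved with the model, which replaces the previously saved model.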

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string , name='examples')])
def serve_tf_examples_fn(serialized_tf_examples):
 """Returns the output to be used in the serving signature."""
 features = tf.io.parse_example(serialized_tf_examples, RAW_DATA_PREDICT_SPEC)
 return model(features, training=False)

signature = {'serving_default': serve_tf_examples_fn}

tf.keras.models.save_model(model, save_model_dir_multiply, signatures=signature)

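If you run the saved_model_cli command again, you will see that the input signature has changed to DT_STRING.

Pattern 1: RunInference for predictions

Step 1 - Use RunInference within the pipeline

Now that the model is ready, the RunInference transform can be plugged into an Apache Beam pipeline. The pipeline below uses TFXIO TFExampleRecord, which it converts to a transform via [RawRecordBeamSource](https://www.tensorflow.org/tfx/tfx_bsl/api_docs/python/tfx_bsl/public/tfxio/TFExampleBeamRecord)(). The saved model location and signature are passed to the RunInference API as a [SavedModelSpec](https://www.tensorflow.org/tfx/tfx_bsl/api_docs/python/tfx_bsl/public/proto/model_spec_pb2/SavedModelSpec) configuration object.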

pipeline = beam.Pipeline()

tfexample_beam_record = tfx_bsl.public.tfxio.TFExampleRecord(file_pattern=predict_values_five_times_table)

with pipeline as p:
   _ = (p | tfexample_beam_record.RawRecordBeamSource()
          | RunInference(
              model_spec_pb2.InferenceSpecType(
                  saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
          | beam.Map(print)
       )

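Note:

You can perform two types of inference using RunInference:

In-process inference from a SavedModel instance. Used when the saved_model_spec field is set in inference_spec_type.
Remote inference by using a service endpoint. Used when the ai_platform_prediction_model_spec field is set in inference_spec_type.

Below is a snippet of the output. The values here are a little difficult to interpret because they are in their raw, unprocessed format. In the next section the raw results are post-processed.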

OUTPUT: 

predict_log {
  request { 
model_spec { signature_name: "serving_default" }
                inputs {
      key: "examples"
... 
       string_val: "\n\022\n\020\n\007example\022\005\032\003\n\001i"
...
response {
    outputs {
      key: "output_0"
      value {
   ...
        float_val: 524.993896484375

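Pattern 2: Post-processing RunInference results

The RunInference API returns a PredictionLog object, which contains the serialized input and the output from the call to the model. Having access to both the input and the output lets you create a simple tuple during post-processing for use downstream in the pipeline. Also worth noting is that RunInference takes into account whether the model is amenable to batching (and batches inference for performance) transparently for you.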
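To make the batching idea concrete, here is a minimal plain-Python sketch of grouping elements so that the model is called once per batch rather than once per element. The fixed `batch_size`, the helper names, and the `times_five` stand-in model are all assumptions for illustration; this is not how RunInference is implemented internally.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def infer_in_batches(model_fn: Callable[[List[float]], List[float]],
                     items: Iterable[float],
                     batch_size: int) -> Iterator[float]:
    """Call model_fn once per batch and emit the per-element results in order."""
    for batch in batched(items, batch_size):
        yield from model_fn(batch)


calls = []  # records the size of every batch the stand-in model receives


def times_five(batch: List[float]) -> List[float]:
    """Stand-in for a model call that accepts a whole batch."""
    calls.append(len(batch))
    return [5.0 * value for value in batch]


results = list(infer_in_batches(times_five, [105.0, 108.0, 1000.0, 1013.0, 7.0], batch_size=2))
print(results, calls)
```

Five inputs with a batch size of two lead to three model calls instead of five, and the caller still sees one result per element.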

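The PredictionProcessor beam.DoFn takes the output of RunInference and produces formatted text with the questions and answers as output. Of course, in a production system the output would more normally be a Tuple[input, output], or simply the output, depending on the use case.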

class PredictionProcessor(beam.DoFn):

   def process(
           self,
           element: prediction_log_pb2.PredictionLog):
       predict_log = element.predict_log
       input_value = tf.train.Example.FromString(predict_log.request.inputs['examples'].string_val[0])
       output_value = predict_log.response.outputs
       yield (f"input is [{input_value.features.feature['x'].float_list.value}] output is {output_value['output_0'].float_val}")

pipeline = beam.Pipeline()

tfexample_beam_record = tfx_bsl.public.tfxio.TFExampleRecord(file_pattern=predict_values_five_times_table)

with pipeline as p:
   _ = (p | tfexample_beam_record.RawRecordBeamSource()
          | RunInference(
              model_spec_pb2.InferenceSpecType(
                  saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
          | beam.ParDo(PredictionProcessor())
          | beam.Map(print)
       )

Now the output contains both the original input and the model's output values.

OUTPUT: 

input is [[105.]] output is [523.6328735351562]
input is [[108.]] output is [538.5157470703125]
input is [[1000.]] output is [4963.6787109375]
input is [[1013.]] output is [5028.1708984375]
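As noted earlier, a production pipeline would more typically emit a Tuple[input, output] than formatted text. The sketch below shows that shape of post-processing in plain Python. The `FakeRequest`, `FakeResponse`, and `FakePredictLog` dataclasses are hypothetical stand-ins that only mimic the nesting of the real PredictionLog proto; they are not part of tfx-bsl or TensorFlow Serving.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class FakeRequest:
    x: List[float]  # stands in for the parsed 'x' feature of the input Example


@dataclass
class FakeResponse:
    output_0: List[float]  # stands in for outputs['output_0'].float_val


@dataclass
class FakePredictLog:
    request: FakeRequest
    response: FakeResponse


def to_input_output(log: FakePredictLog) -> Iterator[Tuple[float, float]]:
    """Yield one (input, output) tuple per prediction, as a DoFn.process might."""
    for x, y in zip(log.request.x, log.response.output_0):
        yield (x, y)


# Values taken from the first line of the output above.
log = FakePredictLog(FakeRequest([105.0]), FakeResponse([523.6328735351562]))
pairs = list(to_input_output(log))
print(pairs)
```

Emitting tuples rather than strings keeps the values typed, so a downstream step can compare, aggregate, or write them without parsing text.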

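Pattern 3: Attaching a key

A useful pattern is the ability to pass information, often a unique identifier, with the input to the model and have access to this identifier from the output. For example, in an IoT use case you could associate a device ID with the input data being passed into the model. Often this type of key is not useful for the model itself and thus should not be passed into the first layer.

RunInference takes care of this for us by accepting a Tuple[key, value] and outputting a Tuple[key, PredictionLog].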
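The idea behind keyed inference can be sketched in a few lines of plain Python: strip the keys, send only the values to the model, then zip the keys back onto the outputs. `run_keyed` and `times_five` are hypothetical names used for illustration, and the sketch assumes the model returns exactly one output per input, in order.

```python
from typing import Callable, List, Sequence, Tuple


def run_keyed(model_fn: Callable[[List[float]], List[float]],
              keyed_inputs: Sequence[Tuple[bytes, float]]) -> List[Tuple[bytes, float]]:
    """Run model_fn on the values only and re-attach each key to its output."""
    keys = [key for key, _ in keyed_inputs]
    values = [value for _, value in keyed_inputs]
    outputs = model_fn(values)  # the model never sees the keys
    if len(outputs) != len(keys):
        raise ValueError("model must return one output per input")
    return list(zip(keys, outputs))


def times_five(values: List[float]) -> List[float]:
    """Stand-in for the multiply-by-five model."""
    return [5.0 * value for value in values]


keyed = [(b"first_question", 105.0), (b"second_question", 108.0)]
results = run_keyed(times_five, keyed)
print(results)
```

The key travels alongside the data rather than through the model, which is why it does not need to appear in the model's input signature.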

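Step 1 - Create a source with an attached key

Since we need a key with the data that we are sending in for prediction, in this step we create a table in BigQuery that has two columns. One holds the key and the second holds the test value.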

CREATE OR REPLACE TABLE
  maths.maths_problems_1 ( key STRING OPTIONS(description="A unique key for the maths problem"),
    value FLOAT64 OPTIONS(description="Our maths problem" ) );
INSERT INTO
  maths.maths_problems_1
VALUES
  ( "first_question", 105.00),
  ( "second_question", 108.00),
  ( "third_question", 1000.00),
  ( "fourth_question", 1013.00)

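Step 2 - Modify the post-processor and pipeline

In this step we:

Modify the pipeline to read from the new BigQuery source table
Add a map transform, which converts a table row into a Tuple[bytes, Example]
Modify the post-inference processor to output the results along with the key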

class PredictionWithKeyProcessor(beam.DoFn):

   def __init__(self):
       beam.DoFn.__init__(self)

   def process(
           self,
           element: Tuple[bytes, prediction_log_pb2.PredictionLog]):
       predict_log = element[1].predict_log
       input_value = tf.train.Example.FromString(predict_log.request.inputs['examples'].string_val[0])
       output_value = predict_log.response.outputs
       yield (f"key is {element[0]} input is {input_value.features.feature['x'].float_list.value} output is { output_value['output_0'].float_val[0]}" )

pipeline_options = PipelineOptions().from_dictionary({'temp_location':f'gs://{bucket}/tmp'})
pipeline = beam.Pipeline(options=pipeline_options)

with pipeline as p:
 _ = (p | beam.io.gcp.bigquery.ReadFromBigQuery(table=f'{project}:maths.maths_problems_1')
         | beam.Map(lambda x : (bytes(x['key'], 'utf-8'), ExampleProcessor().create_example(numpy.float32(x['value']))))
         | RunInference(
             model_spec_pb2.InferenceSpecType(
                 saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
         | beam.ParDo(PredictionWithKeyProcessor())
         | beam.Map(print)
 )

OUTPUT:

key is b'first_question' input is [105.] output is 524.0875854492188
key is b'second_question' input is [108.] output is 539.0093383789062
key is b'third_question' input is [1000.] output is 4975.75830078125
key is b'fourth_question' input is [1013.] output is 5040.41943359375

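Pattern 4: Inference with multiple models in the same pipeline

In part 1 of the series, the "join results from multiple models" pattern covered the various branching techniques in Apache Beam that make it possible to run data through multiple models.

Those techniques are applicable to the RunInference API, which can easily be used by multiple branches within a pipeline, with the same or different models. This is similar in function to cascade ensembling, although here the data flows through multiple models in a single Apache Beam DAG.

Inference with multiple models in parallel

In this example, the same data is run through two different models: the one that we've been using to multiply by 5, and a new model, which will learn to multiply by 10.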

"""
Create multiply by 10 table.

x is a range of values from 0 to 999.
y is a list of x * 10

value_to_predict includes values outside of the training data
"""
x = numpy.arange( 0, 1000)
y = x * 10

# Create our labeled example file for 10 times table

example_ten_times_table = 'example_ten_times_table.tfrecord'

with tf.io.TFRecordWriter( example_ten_times_table ) as writer:
 for i in zip(x, y):
   example = ExampleProcessor().create_example_with_label(
       feature=i[0], label=i[1])
   writer.write(example.SerializeToString())

dataset = tf.data.TFRecordDataset(example_ten_times_table) 
dataset = dataset.map(lambda e : tf.io.parse_example(e, RAW_DATA_TRAIN_SPEC)) 
dataset = dataset.map(lambda t : (t['x'], t['y'])) 
dataset = dataset.batch(100)
dataset = dataset.repeat() 

model.fit(dataset, epochs=500, steps_per_epoch=10, verbose=0)

tf.keras.models.save_model(model,
                           save_model_dir_multiply_ten,
                           signatures=signature)

Now that we have two models, we apply them to our source data.

pipeline_options = PipelineOptions().from_dictionary(
                                     {'temp_location':f'gs://{bucket}/tmp'})

pipeline = beam.Pipeline(options=pipeline_options)

with pipeline as p:
 questions = p | beam.io.gcp.bigquery.ReadFromBigQuery(
                                   table=f'{project}:maths.maths_problems_1')

 multiply_five = ( questions
             | "CreateMultiplyFiveTuple" >>
             beam.Map(lambda x : (bytes('{}{}'.format(x['key'],' * 5'),'utf-8'),
                                   ExampleProcessor().create_example(x['value'])))
            
             | "Multiply Five" >> RunInference(
                 model_spec_pb2.InferenceSpecType(
                 saved_model_spec=model_spec_pb2.SavedModelSpec(
                                           model_path=save_model_dir_multiply)))
     )
 multiply_ten = ( questions
         | "CreateMultiplyTenTuple" >>
         beam.Map(lambda x : (bytes('{}{}'.format(x['key'],'* 10'), 'utf-8'),
                              ExampleProcessor().create_example(x['value'])))
         | "Multiply Ten" >> RunInference(
             model_spec_pb2.InferenceSpecType(
             saved_model_spec=model_spec_pb2.SavedModelSpec(
                                         model_path=save_model_dir_multiply_ten)))
 )
 _ = ((multiply_five, multiply_ten) | beam.Flatten()
                                    | beam.ParDo(PredictionWithKeyProcessor())
                                    | beam.Map(print))

Output:

key is b'first_question * 5' input is [105.] output is 524.0875854492188
key is b'second_question * 5' input is [108.] output is 539.0093383789062
key is b'third_question * 5' input is [1000.] output is 4975.75830078125
key is b'fourth_question * 5' input is [1013.] output is 5040.41943359375
key is b'first_question* 10' input is [105.] output is 1054.333984375
key is b'second_question* 10' input is [108.] output is 1084.3131103515625
key is b'third_question* 10' input is [1000.] output is 9998.0908203125
key is b'fourth_question* 10' input is [1013.] output is 10128.0009765625
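The shape of the branching above can be mimicked in plain Python: one source feeds two branches, each branch tags the key with the model it used, and the results are merged into a single stream. The `branch` helper and the lambda models are illustrative assumptions. Unlike `itertools.chain`, Beam's Flatten makes no promise about the order of the merged elements.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, Tuple

questions = [("first_question", 105.0), ("second_question", 108.0)]


def branch(rows: Iterable[Tuple[str, float]],
           suffix: str,
           model_fn: Callable[[float], float]) -> Iterator[Tuple[bytes, float, float]]:
    """Tag each key with the branch suffix and apply that branch's model."""
    for key, value in rows:
        yield (f"{key} {suffix}".encode("utf-8"), value, model_fn(value))


# Both branches read the same source rows, as in the pipeline above.
multiply_five = branch(questions, "* 5", lambda v: 5.0 * v)
multiply_ten = branch(questions, "* 10", lambda v: 10.0 * v)

merged = list(chain(multiply_five, multiply_ten))
for row in merged:
    print(row)
```

Tagging the key per branch is what lets a single downstream processor tell the two models' results apart after they are merged.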

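Inference with multiple models in sequence

In a sequential pattern, data is sent to one or more models in sequence, with the output from each model chaining to the next model.

Here are the steps:

Read the data from BigQuery
Map the data
RunInference with the multiply-by-5 model
Process the results
RunInference with the multiply-by-10 model
Process the results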

pipeline_options = PipelineOptions().from_dictionary(
                                       {'temp_location':f'gs://{bucket}/tmp'})

pipeline = beam.Pipeline(options=pipeline_options)

def process_interim_inference(element : Tuple[
                                        bytes, prediction_log_pb2.PredictionLog
                                        ])-> Tuple[bytes, tf.train.Example]:
  
  key = '{} original input is {}'.format(
             element[0], str(tf.train.Example.FromString(
                 element[1].predict_log.request.inputs['examples'].string_val[0]
                 ).features.feature['x'].float_list.value[0]))
  
  value = ExampleProcessor().create_example(
              element[1].predict_log.response.outputs['output_0'].float_val[0])
  
  return (bytes(key,'utf-8'),value)

with pipeline as p:
  
 questions = p | beam.io.gcp.bigquery.ReadFromBigQuery(
                                   table=f'{project}:maths.maths_problems_1')

 multiply = ( questions
             | "CreateMultiplyTuple" >>
             beam.Map(lambda x : (bytes(x['key'],'utf-8'),
                                   ExampleProcessor().create_example(x['value'])))
             | "MultiplyFive" >> RunInference(
                 model_spec_pb2.InferenceSpecType(
                 saved_model_spec=model_spec_pb2.SavedModelSpec(
                                   model_path=save_model_dir_multiply)))
            
     )

 _ = ( multiply
         | "Extract result " >> 
         beam.Map(lambda x : process_interim_inference(x))
         | "MultiplyTen" >> RunInference(
             model_spec_pb2.InferenceSpecType(
             saved_model_spec=model_spec_pb2.SavedModelSpec(
                             model_path=save_model_dir_multiply_ten)))
         | beam.ParDo(PredictionWithKeyProcessor())
         | beam.Map(print)
 )

Output: 

key is b"b'first_question' original input is 105.0" input is [524.9771118164062] output is 5249.7822265625
key is b"b'second_question' original input is 108.0" input is [539.9765014648438] output is 5399.7763671875
key is b"b'third_question' original input is 1000.0" input is [4999.7841796875] output is 49997.9453125
key is b"b'fourth_question' original input is 1013.0" input is [5064.78125] output is 50647.91796875
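The chaining steps above can be sketched in plain Python: the first model's output becomes the second model's input, and the original input is folded into the key so it survives the second call. `chain_models` and the lambda models are hypothetical stand-ins for the two RunInference steps and the interim processing function.

```python
from typing import Callable, List, Sequence, Tuple


def chain_models(keyed_inputs: Sequence[Tuple[str, float]],
                 first_model: Callable[[float], float],
                 second_model: Callable[[float], float]) -> List[Tuple[str, float, float]]:
    """Feed each value through first_model, then feed that result to second_model."""
    results = []
    for key, value in keyed_inputs:
        interim = first_model(value)
        # Carry the original input forward in the key, as the pipeline above does.
        new_key = f"{key} original input is {value}"
        results.append((new_key, interim, second_model(interim)))
    return results


results = chain_models(
    [("first_question", 105.0)],
    first_model=lambda v: 5.0 * v,
    second_model=lambda v: 10.0 * v,
)
print(results)
```

Only the key can carry context across model calls here, because the second model receives nothing but the first model's output as its input.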

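Running the pipeline on Dataflow

Until now the pipeline has been run locally using the direct runner, which is used implicitly when a pipeline runs with the default configuration. The same examples can be run using the production Dataflow runner by passing in configuration parameters, including --runner. Details and an example can be found in the Dataflow runner documentation.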
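As a rough sketch of what those configuration parameters look like, the helper below assembles a list of command-line style flags. `dataflow_flags` is a hypothetical helper, and the project, bucket, region, and job name values are placeholders; the flags themselves (--runner, --project, --region, --temp_location, --job_name) are standard Beam pipeline options.

```python
from typing import List


def dataflow_flags(project: str, bucket: str, region: str, job_name: str) -> List[str]:
    """Build pipeline flags that select the Dataflow runner."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location=gs://{bucket}/tmp",
        f"--job_name={job_name}",
    ]


# Placeholder values; substitute your own project, bucket, and region.
flags = dataflow_flags("my-project", "my-bucket", "us-central1", "tfx-inference-demo")
print(flags)

# In an environment with apache_beam installed, the list can be passed as:
#   pipeline_options = PipelineOptions(flags)
#   pipeline = beam.Pipeline(options=pipeline_options)
```

Keeping the flags in one function makes it easy to switch between the direct runner (no flags) and Dataflow without touching the pipeline definition.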

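Figure: an example of the multi-model pipeline graph running on the Dataflow service.

With the Dataflow runner you also get access to pipeline monitoring as well as metrics that have been output from the RunInference transform. The following table shows some of these metrics, taken from a much larger list available from the library.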

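Conclusion

In this blog, part 2 of our series, we explored the use of the tfx-bsl RunInference API within some common scenarios, from standard inference, to post-processing, to using the RunInference API in multiple locations in the pipeline.

To learn more, review the Dataflow and TFX documentation; you can also try out TFX with Google Cloud AI Platform Pipelines.

Acknowledgements

None of this would be possible without the hard work of many folks across the Dataflow, TFX, and TF teams. From the TFX and TF teams we would especially like to thank Konstantinos Katsiapis, Zohar Yahav, Vilobh Meshram, Jiayi Zhao, Zhitao Li, and Robert Crowe. From the Dataflow team I would like to thank Ahmet Altay for his support and input throughout.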

Using TFX inference with Dataflow for large scale ML inference patterns