Tensor
The name tensorflow is literally tensor + flow. This article looks at what a Tensor is and how it is implemented.
A tensor in tensorflow can be abstractly viewed as a triple <n-dimensional array, element type (dtype), size of each dimension (shape)>, together with a set of operations on that triple: create, destroy, copy, reshape, slice, and so on. A simple C++ definition:
struct Tensor {
  std::vector<int> shape; // size of each dimension, e.g. for a 3-D array: shape={2,3,4}
  int dtype;              // element type; data is interpreted as an array of this type
  void *data;             // contiguous memory holding all the elements
};
shape can be changed: a 2x3 array can become 3x2, as long as the element count stays the same.
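That invariant can be sketched with a small check (a hypothetical helper, not part of TensorFlow's API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: total number of elements implied by a shape.
long long NumElements(const std::vector<int>& shape) {
  long long n = 1;
  for (int d : shape) n *= d;
  return n;
}

// A reshape is legal iff the total element count is unchanged.
bool CanReshape(const std::vector<int>& from, const std::vector<int>& to) {
  return NumElements(from) == NumElements(to);
}
```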
data is a contiguous block of memory, just like a C++ array T[2][3]. If dtype is an integer type, it is effectively int data[2][3]. data itself is a pointer: cast it to int* and then data, data+1, data+2, ..., data+5 address the individual elements.
Slicing is also supported: taking index 1 along the first dimension of data gives data[1]; since this is a 2x3 array, data[0] and data[1] each hold 3 elements. A slice still references the original tensor's memory, and reference counting guarantees that even after the original tensor is released, the slice remains valid.
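A minimal sketch of this sharing, with std::shared_ptr standing in for TensorFlow's reference-counted TensorBuffer (all names here are illustrative, not TensorFlow's):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Illustrative sketch: a slice shares the parent's buffer via refcounting,
// so releasing the parent does not invalidate the slice.
struct SimpleTensor {
  std::shared_ptr<std::vector<int>> buf;  // shared, refcounted storage
  int* data;                              // view into buf
  int rows, cols;

  SimpleTensor SubSlice(int index) const {
    SimpleTensor s;
    s.buf = buf;                   // bumps the refcount, no copy
    s.data = data + index * cols;  // point at row `index`
    s.rows = 1;
    s.cols = cols;
    return s;
  }
};
```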
Tensor implementation
In a real implementation, however, there is more to consider: the alignment of data (e.g. 8-byte alignment), and how data memory is allocated; tensorflow defines an Allocator interface so that different allocation strategies can be plugged in. To support saving model parameters and checkpoints, a tensor must also be serializable; tensorflow serializes tensors with protobuf.
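What alignment means for an allocator can be shown with C++17's std::aligned_alloc; this is a sketch of the idea, not TensorFlow code (std::aligned_alloc requires the size to be a multiple of the alignment, hence the rounding):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Sketch of an aligned allocation: round num_bytes up to a multiple of
// `alignment`, as std::aligned_alloc requires, then allocate.
void* AllocAligned(std::size_t alignment, std::size_t num_bytes) {
  std::size_t rounded = (num_bytes + alignment - 1) / alignment * alignment;
  return std::aligned_alloc(alignment, rounded);
}
```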
The Tensor implementation lives in:
ls tensorflow/core/framework/tensor.*
tensor.cc tensor.h tensor.proto
Basic operations
Tensor members
TensorShape shape_; // the shape
TensorBuffer* buf_; // the data
Construction
- Default constructor: the result is not a scalar; shape is {0} and NumElements() == 0.
- type + shape, allocates memory: Tensor(DataType type, const TensorShape& shape); uses CPUAllocator by default
- allocator + type + shape: Tensor(Allocator* a, DataType type, const TensorShape& shape);
- from an existing buffer: Tensor(DataType type, const TensorShape& shape, TensorBuffer* buf);
- scalar constructors, with many overloads: explicit Tensor(float scalar_value)
Slicing
Slices along the first dimension without copying data; the result is not guaranteed to be aligned (IsAligned):
Tensor Slice(int64_t dim0_start, int64_t dim0_limit) const;
Tensor SubSlice(int64_t index) const;
Serialization
bool FromProto(const TensorProto& other) TF_MUST_USE_RESULT;
bool FromProto(Allocator* a, const TensorProto& other) TF_MUST_USE_RESULT;
/// \brief Fills in `proto` with `*this` tensor's content.
///
/// `AsProtoField()` fills in the repeated field for `proto.dtype()`, while
/// `AsProtoTensorContent()` encodes the content in `proto.tensor_content()`
/// in a compact form.
void AsProtoField(TensorProto* proto) const;
void AsProtoTensorContent(TensorProto* proto) const;
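Conceptually, the compact tensor_content path packs dtype, shape, and the raw element bytes. A toy round-trip in that spirit (an illustrative format only, not the real TensorProto wire format, and all names here are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Illustrative stand-in for AsProtoTensorContent/FromProto: keep dtype,
// shape, and the raw element bytes together.
struct MiniProto {
  int dtype;
  std::vector<int> shape;
  std::string tensor_content;  // raw bytes, like TensorProto.tensor_content
};

MiniProto AsProto(int dtype, const std::vector<int>& shape,
                  const float* data, std::size_t n) {
  MiniProto p;
  p.dtype = dtype;
  p.shape = shape;
  p.tensor_content.assign(reinterpret_cast<const char*>(data),
                          n * sizeof(float));
  return p;
}

std::vector<float> FromProto(const MiniProto& p) {
  std::vector<float> out(p.tensor_content.size() / sizeof(float));
  std::memcpy(out.data(), p.tensor_content.data(), p.tensor_content.size());
  return out;
}
```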
Copying
- Copy and move construction are both supported
- operator= supports both copy and move assignment
Access
/// Returns the data type.
DataType dtype() const { return shape_.data_type(); }
/// Returns the shape of the tensor.
const TensorShape& shape() const { return shape_; }
/// \brief Convenience accessor for the tensor shape.
///
/// For all shape accessors, see comments for relevant methods of
/// `TensorShape` in `tensor_shape.h`.
int dims() const { return shape().dims(); }
/// Convenience accessor for the tensor shape.
int64_t dim_size(int d) const { return shape().dim_size(d); }
/// Convenience accessor for the tensor shape.
int64_t NumElements() const { return shape().num_elements(); }
size_t AllocatedBytes() const;
bool IsAligned() const;
bool CopyFrom(const Tensor& other,
              const TensorShape& shape);
Tensor t;
auto s = t.scalar<float>();  // access as a scalar: s()
auto v = t.vec<float>();     // access as a 1-D array: v(0)
auto m = t.matrix<float>();  // access as a matrix: m(2, 3)
// element-by-element access
auto flat = t.flat<float>();
auto d = flat.data();
for (auto i = 0; i < t.NumElements(); i++) { /* use d[i] */ }
template <typename T>
typename TTypes<T>::Flat flat() {
return shaped<T, 1>({NumElements()});
}
template <typename T>
typename TTypes<T>::UnalignedFlat unaligned_flat() {
return unaligned_shaped<T, 1>({NumElements()});
}
// for use with memcpy
/// REQUIRES: `DataTypeCanUseMemcpy(dtype())`.
StringPiece tensor_data() const;
void* data() const;
Debug information
std::string SummarizeValue(int64_t max_entries, bool print_v2 = false) const;
std::string DebugString(int num_values) const;
std::string DebugString() const { return DebugString(3); }
std::string DeviceSafeDebugString() const;
void FillDescription(TensorDescription* description) const;
The Tensor, shape, and type implementations live in the following files (in tensor.h, TensorBuffer owns the underlying data memory):
$ ls tensorflow/core/framework/tensor*
tensorflow/core/framework/tensor.cc tensorflow/core/framework/tensor_shape.proto tensorflow/core/framework/tensor_testutil.h
tensorflow/core/framework/tensor.h tensorflow/core/framework/tensor_shape_test.cc tensorflow/core/framework/tensor_testutil_test.cc
tensorflow/core/framework/tensor.proto tensorflow/core/framework/tensor_slice.cc tensorflow/core/framework/tensor_types.h
tensorflow/core/framework/tensor_description.proto tensorflow/core/framework/tensor_slice.h tensorflow/core/framework/tensor_util.cc
tensorflow/core/framework/tensor_key.h tensorflow/core/framework/tensor_slice.proto tensorflow/core/framework/tensor_util.h
tensorflow/core/framework/tensor_reference.h tensorflow/core/framework/tensor_slice_test.cc tensorflow/core/framework/tensor_util_test.cc
tensorflow/core/framework/tensor_shape.cc tensorflow/core/framework/tensor_test.cc
tensorflow/core/framework/tensor_shape.h tensorflow/core/framework/tensor_testutil.cc
$ ls tensorflow/core/framework/shape*
tensorflow/core/framework/shape_inference.cc tensorflow/core/framework/shape_inference_test.cc tensorflow/core/framework/shape_inference_testutil.h
tensorflow/core/framework/shape_inference.h tensorflow/core/framework/shape_inference_testutil.cc tensorflow/core/framework/shape_inference_testutil_test.cc
$ ls tensorflow/core/framework/type*
tensorflow/core/framework/type_index.h tensorflow/core/framework/typed_allocator.cc tensorflow/core/framework/types.cc tensorflow/core/framework/types.proto
tensorflow/core/framework/type_traits.h tensorflow/core/framework/typed_allocator.h tensorflow/core/framework/types.h tensorflow/core/framework/types_test.cc
Data types supported by Tensor
Defined in tensorflow/core/framework/types.proto:
enum DataType {
// Not a legal value for DataType. Used to indicate a DataType field
// has not been set.
DT_INVALID = 0;
// Data types that all computation devices are expected to be
// capable to support.
DT_FLOAT = 1;
DT_DOUBLE = 2;
DT_INT32 = 3;
DT_UINT8 = 4;
DT_INT16 = 5;
DT_INT8 = 6;
DT_STRING = 7;
DT_COMPLEX64 = 8; // Single-precision complex
DT_INT64 = 9;
DT_BOOL = 10;
DT_QINT8 = 11; // Quantized int8
DT_QUINT8 = 12; // Quantized uint8
DT_QINT32 = 13; // Quantized int32
DT_BFLOAT16 = 14; // Float32 truncated to 16 bits. Only for cast ops.
DT_QINT16 = 15; // Quantized int16
DT_QUINT16 = 16; // Quantized uint16
DT_UINT16 = 17;
DT_COMPLEX128 = 18; // Double-precision complex
DT_HALF = 19;
DT_RESOURCE = 20;
DT_VARIANT = 21; // Arbitrary C++ data types
DT_UINT32 = 22;
DT_UINT64 = 23;
}
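In types.h, TensorFlow maps C++ types to these enum values at compile time (DataTypeToEnum). The mechanism can be sketched with template specializations; this is a toy version, not the real trait:

```cpp
#include <cassert>

// Toy subset of the DataType enum above.
enum DataType { DT_INVALID = 0, DT_FLOAT = 1, DT_DOUBLE = 2, DT_INT32 = 3 };

// Toy type -> DataType mapping via template specialization: the primary
// template yields DT_INVALID; each supported type gets a specialization.
template <typename T>
struct DataTypeToEnum { static constexpr DataType value = DT_INVALID; };
template <> struct DataTypeToEnum<float>  { static constexpr DataType value = DT_FLOAT; };
template <> struct DataTypeToEnum<double> { static constexpr DataType value = DT_DOUBLE; };
template <> struct DataTypeToEnum<int>    { static constexpr DataType value = DT_INT32; };
```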
Serialization: tensor.proto
A serialized tensor becomes a protobuf (PB) message.
// Protocol buffer representing a tensor.
message TensorProto {
DataType dtype = 1;
TensorShapeProto tensor_shape = 2;
int32 version_number = 3;
bytes tensor_content = 4;
repeated int32 half_val = 13 [packed = true];
// DT_FLOAT.
repeated float float_val = 5 [packed = true];
// DT_DOUBLE.
repeated double double_val = 6 [packed = true];
// DT_INT32, DT_INT16, DT_UINT16, DT_INT8, DT_UINT8.
repeated int32 int_val = 7 [packed = true];
// DT_STRING
repeated bytes string_val = 8;
// DT_COMPLEX64. scomplex_val(2*i) and scomplex_val(2*i+1) are real
// and imaginary parts of i-th single precision complex.
repeated float scomplex_val = 9 [packed = true];
// DT_INT64
repeated int64 int64_val = 10 [packed = true];
// DT_BOOL
repeated bool bool_val = 11 [packed = true];
// DT_COMPLEX128. dcomplex_val(2*i) and dcomplex_val(2*i+1) are real
// and imaginary parts of i-th double precision complex.
repeated double dcomplex_val = 12 [packed = true];
// DT_RESOURCE
repeated ResourceHandleProto resource_handle_val = 14;
// DT_VARIANT
repeated VariantTensorDataProto variant_val = 15;
// DT_UINT32
repeated uint32 uint32_val = 16 [packed = true];
// DT_UINT64
repeated uint64 uint64_val = 17 [packed = true];
}
// Protocol buffer representing the serialization format of DT_VARIANT tensors.
message VariantTensorDataProto {
// Name of the type of objects being serialized.
string type_name = 1;
// Portions of the object that are not Tensors.
bytes metadata = 2;
// Tensors contained within objects being serialized.
repeated TensorProto tensors = 3;
}
tensor_util.h and tensor_util_test.cc contain examples of working with tensors.
They provide:
- deep copy of a tensor
- deep copy of a slice
- Concat: concatenation
- Split: splitting
- ConcatSplitStrings: string concatenation
- CreatesStringTensorProto: deserializing a dtype=DT_STRING tensor from a protobuf
- CreatesInt32TensorProto
- CreatesInt64TensorProto
- CreatesUInt32TensorProto
- CreatesUInt64TensorProto
- ... deserialization is covered for every type
- CompressTensorProtoInPlaceTooSmall: various forms of tensor proto compression
- CompressTensorProtoInPlaceAllEqual
- CompressTensorProtoConstantTail
- CompressTensorProtoNegatizeZero
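For memcpy-able dtypes, concatenating along the first dimension amounts to appending the flat row-major buffers, provided the remaining dimensions agree. A simplified sketch of that idea (not the real Concat implementation):

```cpp
#include <cassert>
#include <vector>

// Simplified sketch: concatenating two row-major tensors along dim 0
// just appends their flat buffers (the remaining dims must match).
std::vector<int> ConcatDim0(const std::vector<int>& a,
                            const std::vector<int>& b) {
  std::vector<int> out(a);
  out.insert(out.end(), b.begin(), b.end());
  return out;
}
```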
Tensor memory allocation: Allocator
Interface
tensorflow/core/framework/allocator.h
class Allocator {
public:
static constexpr size_t kAllocatorAlignment = 64;
virtual ~Allocator();
virtual std::string Name() = 0;
virtual void* AllocateRaw(size_t alignment, size_t num_bytes) = 0;
virtual void* AllocateRaw(size_t alignment, size_t num_bytes,
const AllocationAttributes& allocation_attr) {
return AllocateRaw(alignment, num_bytes);
}
virtual void DeallocateRaw(void* ptr) = 0;
virtual size_t AllocatedSize(const void* ptr) const {
return RequestedSize(ptr);
}
};
You can inherit from Allocator to implement your own allocator.
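A sketch of such a subclass, written against a trimmed-down copy of the interface above (the live-allocation counter and the "counting" name are illustrative, not TensorFlow code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Trimmed-down copy of the Allocator interface, just enough to subclass.
class Allocator {
 public:
  virtual ~Allocator() = default;
  virtual std::string Name() = 0;
  virtual void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) = 0;
  virtual void DeallocateRaw(void* ptr) = 0;
};

// Illustrative custom allocator: counts live allocations.
class CountingAllocator : public Allocator {
 public:
  std::string Name() override { return "counting"; }
  void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) override {
    (void)alignment;  // sketch: ignore alignment and use plain malloc
    ++live_;
    return std::malloc(num_bytes);
  }
  void DeallocateRaw(void* ptr) override {
    --live_;
    std::free(ptr);
  }
  int live() const { return live_; }

 private:
  int live_ = 0;
};
```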
CPUAllocator
tensorflow/core/framework/cpu_allocator_impl.h
class CPUAllocator : public Allocator {
public:
CPUAllocator()
: single_allocation_warning_count_(0),
total_allocation_warning_count_(0) {}
~CPUAllocator() override {}
string Name() override { return "cpu"; }
void* AllocateRaw(size_t alignment, size_t num_bytes) override {
void* p = port::AlignedMalloc(num_bytes, alignment);
return p;
}
void DeallocateRaw(void* ptr) override {
port::AlignedFree(ptr);
}
};
// register the cpu allocator
REGISTER_MEM_ALLOCATOR("DefaultCPUAllocator", 100, CPUAllocatorFactory);
Allocator registration
tensorflow/core/framework/allocator_registry.h
class AllocatorFactoryRegistry {
public:
AllocatorFactoryRegistry() {}
~AllocatorFactoryRegistry() {}
void Register(const char* source_file, int source_line, const string& name,
int priority, AllocatorFactory* factory);
Allocator* GetAllocator();
private:
std::vector<FactoryEntry> factories_ TF_GUARDED_BY(mu_);
};
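Factories are registered with a priority, and lookup picks the best one. A toy sketch of that selection logic (all names illustrative, not the real registry):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy registry: entries carry a priority; a GetAllocator-style lookup
// returns the name registered with the highest priority.
struct FactoryEntry {
  std::string name;
  int priority;
};

class ToyRegistry {
 public:
  void Register(const std::string& name, int priority) {
    factories_.push_back({name, priority});
  }
  std::string Best() const {
    const FactoryEntry* best = nullptr;
    for (const auto& e : factories_)
      if (!best || e.priority > best->priority) best = &e;
    return best ? best->name : "";
  }

 private:
  std::vector<FactoryEntry> factories_;
};
```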