A linear regression demo with TensorFlow.js in a Node environment

Using TensorFlow.js in a Node environment

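Source code: link on Gitee (gitee.com).

JavaScript can do almost anything, and that includes machine learning. The main JS machine learning library is Google's tfjs, but most tutorials target the browser. This post provides a tfjs machine learning demo that runs on Node; the project also depends on typescript, nodemon, and a few other packages.

The advantage of Node is that it bypasses the browser, which leaves more performance available. On Linux, Node can also use the GPU for acceleration (GPU support comes from the separate @tensorflow/tfjs-node-gpu package). My system is Ubuntu 20.04 and I use @tensorflow/tfjs-node. prettier is the code formatter and nodemon is the file watcher. API reference: TensorFlow.js API.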

The package.json, for reference:

{
  "name": "ts-node",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "watch": "tsc --watch",
    "start": "nodemon --config ./nodemon.json",
    "http-server": "http-server ."
  },
  "keywords": [
    "tfjs"
  ],
  "author": "masaikk",
  "license": "ISC",
  "dependencies": {
    "@tensorflow/tfjs-vis": "^1.5.1",
    "@types/node": "^17.0.23",
    "http-server": "^14.1.0",
    "nodemon": "^2.0.15",
    "typescript": "^4.6.3"
  },
  "devDependencies": {
    "@tensorflow/tfjs-node": "^3.15.0",
    "prettier": "^2.6.2"
  }
}

A short example to check that the npm install succeeded:

import * as tf from "@tensorflow/tfjs-node";
import type { Tensor } from "@tensorflow/tfjs-node";

const verbose: boolean = false;

// Create a rank-2 tensor (a matrix) from a nested array.
const a: Tensor = tf.tensor([
  [1, 2],
  [3, 4],
]);
console.log("shape:", a.shape);
// a.print();

// Or you can create a tensor from a flat array and specify a shape.
const shape = [2, 2];
const b: Tensor = tf.tensor([1, 2, 3, 101], shape);
console.log("shape:", b.shape);
// b.print();

const c: Tensor = tf.add(a, b);

c.print(verbose);
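
To make the two construction styles above concrete without needing tfjs at all, here is a minimal plain-TypeScript sketch. The helpers reshape2d and add2d are hypothetical names introduced only for illustration; they are not tfjs functions. They mimic, on ordinary arrays, what tf.tensor(flat, shape) and tf.add do in this 2x2 case.

```typescript
// Hypothetical helper: lay a flat array out as rows x cols (row-major),
// which is how a [rows, cols] shape interprets flat values.
function reshape2d(flat: number[], rows: number, cols: number): number[][] {
  if (flat.length !== rows * cols) {
    throw new Error(`cannot reshape ${flat.length} values into [${rows}, ${cols}]`);
  }
  const out: number[][] = [];
  for (let r = 0; r < rows; r++) {
    out.push(flat.slice(r * cols, (r + 1) * cols));
  }
  return out;
}

// Hypothetical helper: element-wise sum of two equally shaped matrices.
function add2d(a: number[][], b: number[][]): number[][] {
  return a.map((row, r) => row.map((value, c) => value + b[r][c]));
}

const a2d: number[][] = [
  [1, 2],
  [3, 4],
];
const b2d: number[][] = reshape2d([1, 2, 3, 101], 2, 2);
const sum2d: number[][] = add2d(a2d, b2d);
console.log(sum2d); // [ [ 2, 4 ], [ 6, 105 ] ]
```

If the tfjs script prints the same four values, the install is working.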

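Using a dataset

Use tf.data.csv to load a CSV file as a dataset. Note that reading the rows (for example with toArray()) returns a promise.

http-server can act as a local data server.

The import code is as follows: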

import * as tf from "@tensorflow/tfjs-node";
// The CSVDataset type is available as tf.data.CSVDataset, so no deep import is needed.

const houseScaleData: tf.data.CSVDataset = tf.data.csv(
  "http://127.0.0.1:8080/src/testapis/dataset/kc_house_data.csv"
);
const data = houseScaleData.take(10);
// Take the first ten rows.

export { houseScaleData, data };

Note the async function in index.ts:

import * as tf from "@tensorflow/tfjs-node";
import type { Sequential, Tensor2D, Tensor } from "@tensorflow/tfjs-node";
import { houseScaleData, data } from "./testapis";

async function run() {
  const data10 = await houseScaleData.take(10).toArray();
  console.log(data10);
}

run();

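The resulting array of objects can be traversed with map. *I ran into a typing problem here, though: the record passed to the map callback had to be typed as any.* Tentative code: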

import * as tf from "@tensorflow/tfjs-node";
import type { Sequential, Tensor2D, Tensor } from "@tensorflow/tfjs-node";
import { houseScaleData, data } from "./testapis";

async function run() {
  const data10 = houseScaleData.take(10);
  const data10Objs = await data10.toArray();

  const points: any[] = data10Objs.map((record: any) => {
    // console.log(record);
    let sqft_living: number = record.sqft_living;
    let price: number = record.price;

    // console.log(typeof record);
    return Object({
      x: sqft_living,
      y: price,
    });
  });

  const featureValues = points.map((p) => p.x);
  const featureTensor: Tensor2D = tf.tensor2d(featureValues, [
    featureValues.length,
    1,
  ]);

  const labelValues = points.map((p) => p.y);
  const labelTensor: Tensor2D = tf.tensor2d(labelValues, [
    labelValues.length,
    1,
  ]);

  featureTensor.print();
  labelTensor.print();
}

run();

Add a definition of the point interface. Note the difference between number and Number:

import * as tf from "@tensorflow/tfjs-node";
import type { Sequential, Tensor2D, Tensor } from "@tensorflow/tfjs-node";
import { houseScaleData, data } from "./testapis";

interface point {
  x: number;
  y: number;
}

async function run() {
  const data10 = houseScaleData.take(10);

  const points: point[] = (await houseScaleData.toArray()).map(
    (record: any) => {
      let sqft_living: number = record.sqft_living;
      let price: number = record.price;
      return Object({
        x: sqft_living,
        y: price,
      });
    }
  );

  const featureValues = points.map((p) => p.x);
  const featureTensor: Tensor2D = tf.tensor2d(featureValues, [
    featureValues.length,
    1,
  ]);

  const labelValues = points.map((p) => p.y);
  const labelTensor: Tensor2D = tf.tensor2d(labelValues, [
    labelValues.length,
    1,
  ]);

  featureTensor.print();
  labelTensor.print();
}

run();
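
The number versus Number distinction mentioned above can be sketched in plain TypeScript. The HouseRecord interface and toPoint helper are hypothetical names I introduce here to show a typed alternative to any, and the sample values are made up for illustration, not taken from the CSV file.

```typescript
// `number` is the primitive type; `Number` is the wrapper object type.
const primitive: number = 42;
const boxed: Number = new Number(42);

const primitiveKind: string = typeof primitive; // "number"
const boxedKind: string = typeof boxed; // "object"

// A `number` can be assigned to `Number`, but not the other way round:
// const bad: number = boxed; // compile error if uncommented
const unboxed: number = boxed.valueOf();

interface Point {
  x: number;
  y: number;
}

// Hypothetical record shape: only the two columns used in this post.
interface HouseRecord {
  sqft_living: number;
  price: number;
}

// Hypothetical helper: a typed mapping that returns a plain object literal.
function toPoint(record: HouseRecord): Point {
  return { x: record.sqft_living, y: record.price };
}

// Made-up values, purely for illustration.
const samplePoint: Point = toPoint({ sqft_living: 1000, price: 200000 });
console.log(primitiveKind, boxedKind, unboxed, samplePoint);
```

Interfaces and annotations should use the lowercase primitive number, which is what the point interface above does.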

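Normalisation

Define the normalisation function normalise in ts-node/src/testapis/utils/TensorUtils.ts and export it. It applies min-max scaling, which maps the values into the range [0, 1].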

import type { Tensor } from "@tensorflow/tfjs-node";

function normalise(tensor: Tensor): Tensor {
  const min: Tensor = tensor.min();
  const max: Tensor = tensor.max();
  const normalisedTensor: Tensor = tensor.sub(min).div(max.sub(min));
  return normalisedTensor;
}

export { normalise };

Define the denormalise function and export it:

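// Inverse of normalise: map values from [0, 1] back to the original range.
function denormalise(tensor: Tensor, min: Tensor, max: Tensor): Tensor {
  const denormalisedTensor: Tensor = tensor.mul(max.sub(min)).add(min);
  return denormalisedTensor;
}

export { denormalise };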
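
As a sanity check of the two formulas, here is a plain-TypeScript sketch that works on ordinary arrays instead of tensors. normaliseArray and denormaliseArray are hypothetical names used only in this sketch. It also shows why it is handy for the normalising step to hand back min and max: the inverse needs them.

```typescript
// Plain-array min-max scaling: (v - min) / (max - min).
function normaliseArray(values: number[]): { scaled: number[]; min: number; max: number } {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  if (range === 0) {
    throw new Error("all values are equal, so min-max scaling is undefined");
  }
  return { scaled: values.map((v) => (v - min) / range), min, max };
}

// Inverse mapping: v * (max - min) + min.
function denormaliseArray(scaled: number[], min: number, max: number): number[] {
  return scaled.map((v) => v * (max - min) + min);
}

const normalised = normaliseArray([10, 20, 30]);
console.log(normalised.scaled); // [ 0, 0.5, 1 ]

const restored = denormaliseArray(normalised.scaled, normalised.min, normalised.max);
console.log(restored); // [ 10, 20, 30 ]
```

The guard against a zero range is my own addition. The tensor version above would divide by zero if every value were identical.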

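Splitting the dataset

Use the tf.split method. Note that when the second argument is a number, the size along the split axis must be evenly divisible by it; otherwise tf.split throws an error.

Example code: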

  const [trainNormalisedFeatureTensor, testNormalisedFeatureTensor]: Tensor[] =
    tf.split(normalisedFeature.tensor, 2, 0);

This splits the tensor into two equal parts along axis 0.
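
The divisibility constraint can be illustrated on plain arrays. In this sketch, splitEvenly and halfSizes are hypothetical helpers rather than tfjs functions. tf.split also accepts an array of sizes as its second argument (the sizes must add up to the length of the axis), so a pair like the one halfSizes returns is one possible way to handle an odd number of rows.

```typescript
// Hypothetical helper that mirrors the even-split constraint on plain arrays.
function splitEvenly<T>(rows: T[], parts: number): T[][] {
  if (rows.length % parts !== 0) {
    throw new Error(`${rows.length} rows cannot be split evenly into ${parts} parts`);
  }
  const size = rows.length / parts;
  const out: T[][] = [];
  for (let i = 0; i < parts; i++) {
    out.push(rows.slice(i * size, (i + 1) * size));
  }
  return out;
}

// Hypothetical helper: sizes for a two-way split that also works for odd totals.
function halfSizes(total: number): [number, number] {
  const first = Math.ceil(total / 2);
  return [first, total - first];
}

const evenParts = splitEvenly([1, 2, 3, 4], 2);
console.log(evenParts); // [ [ 1, 2 ], [ 3, 4 ] ]

const oddSizes = halfSizes(11);
console.log(oddSizes); // [ 6, 5 ]
```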

Creating the model

Taking linear regression as the example, create the model with const model: Sequential = tf.sequential();

Example code:

import * as tf from "@tensorflow/tfjs-node";
import type { Sequential } from "@tensorflow/tfjs-node";

function createModel(): Sequential {
  const model: Sequential = tf.sequential();
  model.add(
    tf.layers.dense({
      units: 1,
      useBias: true,
      activation: "linear",
      inputDim: 1,
    })
  );

  return model;
}

export { createModel };

See ts-node/src/testapis/models/LinearModel.ts.

Print the model's basic information:

const model: Sequential = createModel();
model.summary();
__________________________________________________________________________________________
Layer (type)                Input Shape               Output shape              Param #   
==========================================================================================
dense_Dense1 (Dense)        [[null,1]]                [null,1]                  2         
==========================================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
__________________________________________________________________________________________
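
The "Param #" of 2 in the summary is one weight plus one bias: a dense layer with inputDim 1, units 1 and a linear activation computes y = w * x + b. A small plain-TypeScript sketch of that arithmetic follows; denseParamCount and denseForward1d are hypothetical names used only here.

```typescript
// Parameter count of a dense layer: one weight per (input, unit) pair,
// plus one bias per unit when useBias is true.
function denseParamCount(inputDim: number, units: number, useBias: boolean): number {
  const weights = inputDim * units;
  const biases = useBias ? units : 0;
  return weights + biases;
}

// Forward pass of the 1-input, 1-unit linear layer used in this post.
function denseForward1d(x: number, w: number, b: number): number {
  return w * x + b;
}

const paramCount = denseParamCount(1, 1, true);
console.log(paramCount); // 2

const prediction = denseForward1d(2, 3, 1);
console.log(prediction); // 7
```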

Add compile options to the model to specify how it is trained:

  const optim: tf.SGDOptimizer = tf.train.sgd(0.1);

  model.compile({
    loss: "meanSquaredError",
    optimizer: optim,
  });
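
To give a feel for what this optimizer and loss pair does, here is a hand-written sketch of one gradient-descent step for the model y = w * x + b under mean squared error. It is only an illustration: tfjs derives gradients automatically and works on mini-batches, while sgdStep and LinearParams are hypothetical names and the gradients below are derived by hand for this one model.

```typescript
interface LinearParams {
  w: number;
  b: number;
}

// One full-batch gradient-descent step on loss = mean((w * x + b - y)^2).
function sgdStep(
  params: LinearParams,
  xs: number[],
  ys: number[],
  learningRate: number
): LinearParams {
  const n = xs.length;
  let gradW = 0;
  let gradB = 0;
  for (let i = 0; i < n; i++) {
    const error = params.w * xs[i] + params.b - ys[i];
    gradW += (2 / n) * error * xs[i]; // d(loss)/dw
    gradB += (2 / n) * error; // d(loss)/db
  }
  return {
    w: params.w - learningRate * gradW,
    b: params.b - learningRate * gradB,
  };
}

// Same learning rate as tf.train.sgd(0.1) above; the data is a toy example.
const stepped = sgdStep({ w: 0, b: 0 }, [0, 1], [0, 1], 0.1);
console.log(stepped); // w and b both move from 0 to about 0.1
```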

Training the model

Training

Use the model.fit() method, for example:

async function trainModel(
  model: Sequential,
  trainingFeatureTensor: Tensor,
  trainingLabelTensor: Tensor
): Promise<void> {
  // Await the promise so that trainModel resolves only after training has finished.
  await model.fit(trainingFeatureTensor, trainingLabelTensor);
}

The fit method accepts further options, such as the number of epochs and callbacks.

model.fit(trainingFeatureTensor, trainingLabelTensor, {
  epochs: 20,
  callbacks: {
    onEpochEnd: (epoch: number, log: Logs | undefined) => {
      console.log(`Epoch ${epoch} with loss: ${(log as Logs).loss}`);
    },
  },
});

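Note that model.fit() is asynchronous, so the trainModel() function defined above is asynchronous as well.

The reference code is in ts-node/src/testapis/models/modelHandler.ts.

Return value

model.fit() resolves to a tf.History object, declared as follows: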

export declare class History extends BaseCallback {
    epoch: number[];
    history: {
        [key: string]: Array<number | Tensor>;
    };
    onTrainBegin(logs?: UnresolvedLogs): Promise<void>;
    onEpochEnd(epoch: number, logs?: UnresolvedLogs): Promise<void>;
    /**
     * Await the values of all losses and metrics.
     */
    syncData(): Promise<void>;
}
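
A rough sketch of reading such an object: the per-epoch values live under history, keyed by name, with the training loss under "loss". The HistoryLike interface below is a simplified stand-in that I made up for this sketch (the real declaration also allows Tensor values), summariseLoss is a hypothetical helper, and the loss values are invented.

```typescript
// Simplified stand-in for tf.History: numbers only, no Tensor values.
interface HistoryLike {
  epoch: number[];
  history: { [key: string]: number[] };
}

// Hypothetical helper: report the last loss and the epoch with the lowest loss.
function summariseLoss(h: HistoryLike): { finalLoss: number; bestEpoch: number } {
  const losses = h.history["loss"] ?? [];
  if (losses.length === 0) {
    throw new Error("no loss values recorded");
  }
  let best = 0;
  for (let i = 1; i < losses.length; i++) {
    if (losses[i] < losses[best]) {
      best = i;
    }
  }
  return { finalLoss: losses[losses.length - 1], bestEpoch: h.epoch[best] };
}

// Invented values, only to exercise the helper.
const exampleHistory: HistoryLike = {
  epoch: [0, 1, 2],
  history: { loss: [0.5, 0.2, 0.25] },
};
const lossSummary = summariseLoss(exampleHistory);
console.log(lossSummary); // { finalLoss: 0.25, bestEpoch: 1 }
```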

Validation

Add a validation split ratio:

async function trainModel(
  model: Sequential,
  trainingFeatureTensor: Tensor,
  trainingLabelTensor: Tensor
): Promise<History> {
  return model.fit(trainingFeatureTensor, trainingLabelTensor, {
    batchSize: 32,
    epochs: 20,
    validationSplit: 0.2,
    callbacks: {
      onEpochEnd: (epoch: number, log: Logs | undefined) => {
        console.log(`Epoch ${epoch} with loss: ${(log as Logs).loss}`);
      },
    },
  });
}
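
As I understand the tfjs documentation, validationSplit holds out the last fraction of the samples passed to fit, before any shuffling. The plain-array sketch below illustrates that idea. holdOutLast is a hypothetical helper, and the exact rounding (Math.floor of the training count) is my assumption about the behaviour, not something this post verifies.

```typescript
// Hypothetical helper: keep the first part for training and hold out the
// last `validationSplit` fraction for validation.
function holdOutLast<T>(
  samples: T[],
  validationSplit: number
): { train: T[]; validation: T[] } {
  if (validationSplit <= 0 || validationSplit >= 1) {
    throw new Error("validationSplit must be strictly between 0 and 1");
  }
  // Assumed rounding: floor of the number of training samples.
  const splitAt = Math.floor(samples.length * (1 - validationSplit));
  return {
    train: samples.slice(0, splitAt),
    validation: samples.slice(splitAt),
  };
}

const heldOut = holdOutLast([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.2);
console.log(heldOut.train.length); // 8
console.log(heldOut.validation); // [ 9, 10 ]
```

If the rows of the CSV file are ordered (by date or by price, say), the held-out tail may not be representative, so shuffling the data before calling fit is worth considering.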

Testing the model

Evaluation

Use the model.evaluate() method to evaluate the model.

  const lossTensor: tf.Scalar | tf.Scalar[] = model.evaluate(
    testNormalisedFeatureTensor,
    testNormalisedLabelTensor
  );
  const loss: Uint8Array | Int32Array | Float32Array = (
    lossTensor as tf.Scalar
  ).dataSync();
  console.log("testing loss: " + loss);
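
Because the model was compiled with the meanSquaredError loss and evaluated on normalised tensors, the printed number is a mean squared error on the [0, 1] scale. The sketch below shows that calculation on plain arrays and one way to translate it back into the label's original units. meanSquaredError and rmseInOriginalUnits are hypothetical helpers, and the numbers are toy values.

```typescript
// Mean squared error on plain arrays.
function meanSquaredError(predictions: number[], labels: number[]): number {
  if (predictions.length !== labels.length || labels.length === 0) {
    throw new Error("predictions and labels must be non-empty and equally long");
  }
  let total = 0;
  for (let i = 0; i < labels.length; i++) {
    const diff = predictions[i] - labels[i];
    total += diff * diff;
  }
  return total / labels.length;
}

// Min-max scaling divides errors by (max - min), so the root of a
// normalised MSE scales back by multiplying with that range.
function rmseInOriginalUnits(normalisedMse: number, min: number, max: number): number {
  return Math.sqrt(normalisedMse) * (max - min);
}

const toyMse = meanSquaredError([1, 2], [1, 4]);
console.log(toyMse); // 2

const toyRmse = rmseInOriginalUnits(0.01, 0, 100);
console.log(toyRmse); // about 10
```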