Backend Task Scheduling Platform: Integration and Deployment Guide

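1. Integration Steps

1.1 Request a new application from the backend administrator and set its configuration

Insert a row for the new application into the app_config table in the database (there is no admin UI yet, so the database has to be edited by hand).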

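| Field | Purpose | Example |
| --- | --- | --- |
| app_name | Unique application identifier; the frontend sends it as a request parameter | test-app |
| concurrent_num | Number of jobs that may run concurrently | 1 |
| cmd | Command line used to run the Python file; paths must be absolute | /usr/bin/python3 /Documents/project/main.py |
| timeout | Timeout, in minutes | 600 |
| local_path | Directory where files are stored | /data/test-app |
| email_template | Email format, as a JSON string. Its keys are subject (title), success, ...; %jobId% and %time% are replaced dynamically by the program. | {"failed": "Job failed", "running": "Job is running", "subject": "【DeepBIO Result Notice】- JOBID: %jobId%", "success": "Job succeeded %jobId% %time%", "timeout": "Job timed out", "waiting": "Job created, waiting to run"} |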
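
The placeholder substitution described for email_template can be pictured with the short Python sketch below. It only illustrates the idea: the backend does the real substitution itself, and the helper name render_email and the time format are assumptions, not taken from the platform.

```python
import json
from datetime import datetime


def render_email(template_json, status, job_id, when=None):
    """Return (subject, body) for one job status.

    Hypothetical helper: only the template keys and the %jobId% / %time%
    placeholders come from the configuration items above.
    """
    template = json.loads(template_json)
    when = when or datetime.now()
    values = {
        "%jobId%": job_id,
        "%time%": when.strftime("%Y-%m-%d %H:%M:%S"),  # assumed time format
    }

    def fill(text):
        # Replace every known placeholder in the given text
        for placeholder, value in values.items():
            text = text.replace(placeholder, value)
        return text

    return fill(template["subject"]), fill(template[status])


template_json = json.dumps({
    "subject": "【DeepBIO Result Notice】- JOBID: %jobId%",
    "success": "Job succeeded %jobId% %time%",
    "failed": "Job failed",
})
subject, body = render_email(
    template_json, "success", "20230614004405179167e1",
    datetime(2023, 6, 14, 0, 44, 5),
)
print(subject)
print(body)
```

Note that the placeholders are case-sensitive in this sketch, so %jobid% would be left untouched.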

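1.2 Integrating from Python

You only need to override start_real() in the code below. The setting dict contains the parameters passed in by the backend, and the backend callback URL can be changed as needed.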

import json
import sys
import time

import requests


def start_real(setting):
    print("Start running the Python program here")
    print("setting:" + json.dumps(setting))
    return {'test': '1'}


def retry(func):
    # Call func once; while the result is falsy, retry up to max_retry more times
    def inner(*args, **kwargs):
        max_retry = 5
        number = 0
        ret = func(*args, **kwargs)
        while not ret and number < max_retry:
            number += 1
            print("Retry attempt {}".format(number))
            time.sleep(0.5)
            ret = func(*args, **kwargs)
        return ret

    return inner


@retry
def do_post(url, data):
    try:
        return requests.post(url=url, data=data).json()
    except Exception:
        # Network errors and non-JSON responses both count as a failed attempt
        return None


class CallbackEntity:
    def __init__(self, job_id, status, result=None):
        self.job_id = job_id
        self.status = status
        self.result = result

    def dict(self):
        return {
            'jobId': self.job_id,
            'status': self.status,
            'result': self.result
        }


# Entry point for integrating with the backend callbacks; only start_real() needs to be overridden
def start_backend_filter():
    # Job status update URL of the backend scheduling platform; confirm the actual IP, port, and path before use
    backend_url = "http://211.87.232.152:6688/backend_platform/job/status/update"

    # Parameters: some fixed fields plus whatever the frontend passed
    setting = json.loads(sys.argv[2])

    # jobId
    job_id = setting["jobId"]
    # Path of the request data
    request_data_path = setting["requestDataPath"]
    # Path where results are saved
    save_path = setting["resultDataPath"]
    # Job type
    job_type = setting["type"]

    # Call back the backend and set the job status to 1: Python has started executing the job
    callback_res = do_post(backend_url, CallbackEntity(job_id, 1).dict())
    if callback_res is None or callback_res['code'] != 0:
        return

    try:
        # Run the Python program
        result = start_real(setting)

        # Call back the backend and set the job status to 2: the job succeeded; send the result along
        callback_res = do_post(backend_url, CallbackEntity(job_id, 2, json.dumps(result)).dict())
        if callback_res is None or callback_res['code'] != 0:
            return
    except Exception as e:
        print(e)
        # Call back the backend and set the job status to -1: the job raised an exception
        do_post(backend_url, CallbackEntity(job_id, -1).dict())


if __name__ == '__main__':
    start_backend_filter()
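
As an illustration of what an overridden start_real() might look like, here is a minimal sketch that reads the request file and writes a result file. The setting keys match the ones used above; everything else (one sequence per line, the result.json file name, the returned keys) is an assumption made for the example.

```python
import json
import os
import tempfile


def start_real(setting):
    """Hypothetical job body: collect the non-empty lines of the request file."""
    with open(setting["requestDataPath"], encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]

    # Write a result file into the directory chosen by the backend
    os.makedirs(setting["resultDataPath"], exist_ok=True)
    out_file = os.path.join(setting["resultDataPath"], "result.json")
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump({"lines": lines}, f)

    # The return value must be JSON-serializable because the caller
    # passes it to json.dumps() before sending it to the backend
    return {"count": len(lines), "resultFile": out_file}


# Local dry run with a setting dict shaped like the one the backend sends
with tempfile.TemporaryDirectory() as tmp:
    request_path = os.path.join(tmp, "job-demo-dataStr.txt")
    with open(request_path, "w", encoding="utf-8") as f:
        f.write("MKTAYIAK\n\nGSHMLEDP\n")
    demo_setting = {
        "jobId": "demo",
        "type": 0,
        "requestDataPath": request_path,
        "resultDataPath": os.path.join(tmp, "result"),
    }
    demo_result = start_real(demo_setting)
    print(demo_result["count"])
```

Running the job body locally like this, without the callbacks, is a convenient way to debug it before wiring it to the platform.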

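2. Overview of the Backend Platform's Execution Logic

2.1 Overview

2.2 API Reference

2.2.1 Create a job

POST /job/create

Creates an empty job; files can then be uploaded in batches against this job's ID. Sample response: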

{
  "code": 0,
  "message": "success",
  "data": {
    "jobId": "20230614004405179167e1",
    "status": "creating",
    "param": {},
    "result": null,
    "requestTime": null,
    "createTime": "2023-06-14 00:44:05",
    "completeTime": null,
    "type": null
  }
}

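2.2.2 Upload files in batches

POST /job/file/upload

Uploading many files in a single request tends to time out and fail, so use this endpoint together with the create-job endpoint /job/create to upload the files in batches.

The backend passes the files to Python in the form {"fileKey": "absolute file path"}.

| Parameter | Description | Nullable |
| --- | --- | --- |
| jobId | Job ID | |
| fileKey | File key, passed on to Python | |
| file | Data file | |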

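2.2.3 Submit a job

POST /job/submit

A runnable job is only really created when this endpoint is called after the files have been uploaded. If there is just a single file to submit, you can skip the endpoints in 2.2.1 and 2.2.2 and call this one directly.

| Parameter | Description | Nullable |
| --- | --- | --- |
| appName | Application name | |
| jobId | ID of the empty job created earlier. If empty, a job is created directly. | |
| dataStr | Protein sequence string | |
| dataFile | Protein sequence file; carries the same content as dataStr, and dataFile takes precedence | |
| param | JSON string that lets the frontend pass parameters to Python more flexibly | |
| mail | Email address | |
| type | Job type, default 0 | |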
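
The create, upload, and submit sequence can be sketched as a small Python client. This is a sketch under assumptions, not the platform's official client: it assumes that all three endpoints accept form fields and reply with the same {"code": 0, ...} envelope as the sample responses, that /job/create needs no parameters, and that the base URL is a placeholder. The post argument is any callable with the signature of requests.post, which keeps the example runnable without a server.

```python
import os
import tempfile


def submit_job_with_files(post, base_url, app_name, files, param_json="{}"):
    """Create an empty job, upload files one by one, then submit the job.

    post  -- a callable with the signature of requests.post
    files -- dict mapping fileKey to a local file path
    """
    def call(path, **kwargs):
        body = post(base_url + path, **kwargs).json()
        if body.get("code") != 0:
            raise RuntimeError("{} failed: {}".format(path, body.get("message")))
        return body.get("data")

    job_id = call("/job/create")["jobId"]
    for file_key, local_path in files.items():
        # One request per file, so a slow upload cannot time out the whole batch
        with open(local_path, "rb") as fh:
            call("/job/file/upload",
                 data={"jobId": job_id, "fileKey": file_key},
                 files={"file": fh})
    call("/job/submit",
         data={"appName": app_name, "jobId": job_id, "param": param_json})
    return job_id


# Dry run against a stub instead of a real server
class FakeResponse:
    def json(self):
        return {"code": 0, "message": "success", "data": {"jobId": "demo-job"}}


called_paths = []


def fake_post(url, data=None, files=None):
    called_paths.append(url.rsplit("/backend_platform", 1)[1])
    return FakeResponse()


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "a.fasta")
    with open(sample, "w") as fh:
        fh.write(">seq1\nMKTAYIAK\n")
    job_id = submit_job_with_files(
        fake_post, "http://localhost:6688/backend_platform", "test-app",
        {"fastaA": sample},
    )
print(job_id, called_paths)
```

Against a real deployment you would pass requests.post as the first argument and the actual base URL as the second.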

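2.2.4 Update job status

POST /job/status/update

Updates the job status. This is the endpoint that Python calls back.

| Parameter | Description | Nullable |
| --- | --- | --- |
| jobId | Job ID | |
| status | Job status; the sample Python code in 1.2 sends 1 when execution starts, 2 on success, and -1 on an exception | |
| result | Job execution result | |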

2.2.5 Get job details

GET /job/info/{jobId}

Returns the details of a job. Sample response:

{
  "code": 0,
  "message": "success",
  "data": {
    "jobId": "20230613233449f3a64284",
    "status": "success",
    "param": {
      "a": 1,
      "jobId": "20230613233449f3a64284",
      "b": "tt",
      "type": 0,
      "requestDataPath": "/Users/yurui/tmp/request/job-20230613233449f3a64284-dataStr.txt",
      "resultDataPath": "/Users/yurui/tmp/result/"
    },
    "result": {
      "test": "1"
    },
    "requestTime": "2023-06-13 23:37:07",
    "createTime": "2023-06-13 23:34:50",
    "completeTime": "2023-06-13 23:37:07",
    "type": 0
  }
}
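
A caller typically polls this endpoint until the job finishes. The sketch below shows one way to do that in Python. The set of final statuses is an assumption: "success" and "failed" appear in the sample responses, while "timeout" is inferred from the email_template keys. The get_json argument is any callable that fetches a URL and returns the decoded JSON body.

```python
import time


def wait_for_job(get_json, base_url, job_id, interval=5.0, max_polls=120):
    """Poll GET /job/info/{jobId} until the job reaches a final status.

    get_json -- callable taking a URL and returning the decoded JSON body,
                for example: lambda url: requests.get(url).json()
    """
    final_statuses = {"success", "failed", "timeout"}  # assumed to be final
    for _ in range(max_polls):
        body = get_json("{}/job/info/{}".format(base_url, job_id))
        if body["code"] != 0:
            raise RuntimeError(body["message"])
        job = body["data"]
        if job["status"] in final_statuses:
            return job
        time.sleep(interval)
    raise TimeoutError("job {} did not finish in time".format(job_id))


# Dry run: the stub reports three successive statuses
statuses = iter(["waiting", "running", "success"])


def fake_get_json(url):
    return {
        "code": 0,
        "message": "success",
        "data": {"jobId": "demo", "status": next(statuses),
                 "result": {"test": "1"}},
    }


job = wait_for_job(fake_get_json, "http://localhost:6688/backend_platform",
                   "demo", interval=0)
print(job["status"], job["result"])
```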

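2.2.6 List jobs

GET /job/list

| Parameter | Description | Nullable |
| --- | --- | --- |
| appName | Application name | |
| type | Job type, default 0 | |
| filterCreating | Whether to filter out jobs that are still being created, default true | |
| page | Page number, default 1 | |
| size | Page size, default 100 | |

Sample response:
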
{
    "code": 0,
    "message": "success",
    "data": [
        {
            "jobId": "20230613233449f3a64284",
            "status": "success",
            "requestTime": "2023-06-13 23:37:07",
            "createTime": "2023-06-13 23:34:50",
            "completeTime": "2023-06-13 23:37:07",
            "type": 0
        },
        {
            "jobId": "2023061322521045b6d73b",
            "status": "failed",
            "requestTime": "2023-06-13 22:52:29",
            "createTime": "2023-06-13 22:52:11",
            "completeTime": null,
            "type": 0
        }
    ]
}
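
Since the sample response carries no total count, a client that wants every job has to keep requesting pages until one comes back short. A minimal Python sketch of that loop follows; the stopping rule and the placeholder base URL are assumptions, and get_json is any callable that fetches a URL and returns the decoded JSON body.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def iter_jobs(get_json, base_url, app_name, job_type=0, size=100):
    """Yield every job of one application by walking /job/list page by page."""
    page = 1
    while True:
        query = urlencode({"appName": app_name, "type": job_type,
                           "page": page, "size": size})
        body = get_json("{}/job/list?{}".format(base_url, query))
        if body["code"] != 0:
            raise RuntimeError(body["message"])
        jobs = body["data"] or []
        for job in jobs:
            yield job
        if len(jobs) < size:
            return  # a short (or empty) page is taken to be the last one
        page += 1


# Dry run: a stub server that holds five jobs
all_jobs = [{"jobId": "job-{}".format(n), "status": "success"} for n in range(5)]


def fake_get_json(url):
    query = parse_qs(urlparse(url).query)
    page, size = int(query["page"][0]), int(query["size"][0])
    return {"code": 0, "message": "success",
            "data": all_jobs[(page - 1) * size: page * size]}


job_ids = [job["jobId"] for job in iter_jobs(
    fake_get_json, "http://localhost:6688/backend_platform", "test-app", size=2)]
print(job_ids)
```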

3. Deployment

Git repository: gitee.com/skyyemperor…

Current deployment path: /home/weilab/backend

3.1 Create the database

Run the table-creation SQL in db.sql.

3.2 Deploy the jar

  • Start: systemctl start backend_platform.service, or /home/weilab/backend/backend_platform/start.sh start
  • Stop: systemctl stop backend_platform.service, or /home/weilab/backend/backend_platform/start.sh stop
  • Port: 6688; URL: inner.wei-group.net/backend_pla…

1. Build the jar with Maven

mvn clean package -Dmaven.test.skip=true

2. On the server, set up config/application.yml, replacing every xxx with the actual value

server:
  port: 8000
  servlet:
    context-path: /xxx

spring:
  application:
    name: AI-BACKEND-TEMPLATE
  datasource:
    type: com.zaxxer.hikari.HikariDataSource
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost:3306/xxx?useUnicode=true&characterEncoding=utf-8&useSSL=false&allowMultiQueries=true&serverTimezone=GMT%2B8&zeroDateTimeBehavior=convertToNull&allowPublicKeyRetrieval=true
    username: xxx
    password: xxx
    hikari:
      pool-name: MyHikariCP
      minimum-idle: 10
      maximum-pool-size: 10
      idle-timeout: 600000
      max-lifetime: 800000
      connection-timeout: 30000
      auto-commit: true
      connection-test-query: SELECT 1
    dbcp2:
      test-on-borrow: true
      validation-query: SELECT 1
  servlet:
    multipart:
      max-file-size: 2GB
      max-request-size: 2GB
  mail:
    username: xxx
    password: xxx
    host: xxx
    port: 465
    properties:
      mail.smtp.socketFactory.class: xxx
      mail.smtp.socketFactory.port: 465
      mail.smtp.auth: true
      mail.smtp.starttls.enable: true
      mail.smtp.starttls.required: true

mybatis-plus:
  type-aliases-package: com.weilab.biology.core.data.po
  mapper-locations: classpath:mapper/*.xml
  configuration:
    map-underscore-to-camel-case: true

3. Create the startup script start.sh

#!/bin/bash

names=("backend_platform")
cmds=("java -jar -Xms100M -Xmx100M -Xmn80M backend_platform.jar")
SHELL_PATH="/home/weilab/backend/backend_platform"

pre_start() {
  case "$1" in
  0)
#    mvn clean package -Dmaven.test.skip=true
#    cp "$SHELL_PATH/target/ai-backend-template.jar" "${names[0]}.jar"
    ;;
  *)
    ;;
  esac
}

start() {
    cd $SHELL_PATH
    for((i=0;i<${#cmds[@]};i++)); do
        if [ "$1" ] && [ "${names[i]}" != "$1" ]; then continue; fi
        pre_start $i
        pid=`ps -ef|grep "${cmds[i]}"|grep -v grep|awk '{print $2}'`
        if [ ! $pid ]; then
            log=`nohup ${cmds[i]} >> ${names[i]}.log 2>&1 &`
            sleep 0.5
            pid=`ps -ef|grep "${cmds[i]}"|grep -v grep|awk '{print $2}'`
            if [ $pid ]; then
                echo "${names[i]} start success!! pid is $pid"
            else
                echo "${names[i]} start failed."
                echo "$log"
            fi
        elif [ $pid ]; then
            echo "${names[i]} has started"
        else
            echo "${names[i]} does not exist!"
        fi
    done
}

stop() {
    for((i=0;i<${#cmds[@]};i++)); do
        if [ "$1" ] && [ "${names[i]}" != "$1" ]; then
            continue
        fi
        pid=`ps -ef|grep "${cmds[i]}"|grep -v grep|awk '{print $2}'`
        if [ $pid ]; then
            kill -9 $pid
            echo "${names[i]} stop success!!"
        else
            echo "${names[i]} has stopped."
        fi
    done
}


status() {
    for((i=0;i<${#cmds[@]};i++)); do
        if [ "$1" ] && [ "${names[i]}" != "$1" ]; then
            continue
        fi
        pid=`ps -ef|grep "${cmds[i]}"|grep -v grep|awk '{print $2}'`
        if [ $pid ]; then
            echo "${names[i]} is running!! pid is $pid"
        else
            echo "${names[i]} has stopped."
        fi
    done
}

case "$1" in
start)
    start $2
    ;;
stop)
    stop $2
    ;;
restart)
    stop $2
    echo "$2 is restarting..."
    start $2
    ;;
status)
    status $2
    ;;
*)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
esac

4. Make the script executable and run it

chmod +x start.sh
./start.sh start

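3.3 Deploying the intranet tunnel

On the intranet server, the deployment path is /home/weilab/backend/intranet

On the public server, the deployment path is /root/backend

Direct nps connections get blocked by the IT office, so a hysteria proxy is layered on top. Look up online tutorials for both tools to learn how they work.

How to start:

Start hysteria first: /home/weilab/backend/intranet/hysteria/start.sh start

Then start npc: /home/weilab/backend/intranet/npc/start.sh start

I have configured start-on-boot: systemctl start autoboot.service

It may not have taken effect, though, possibly because of a disk-mount issue at boot. If the tunnel still fails to start on boot, come find me and I will help set it up with Docker.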

4. Troubleshooting Common Problems

4.1 The hysteria client reports an error

2023-08-26T10:39:59+08:00 [INFO] [retry:40] [interval:10] Retrying...
2023-08-26T10:40:11+08:00 [ERRO] [error:CRYPTO_ERROR (0x12a): x509: certificate has expired or is not yet valid: current time 2023-08-26T10:40:11+08:00 is after 2023-08-12T01:38:22Z] Failed to initialize client
2023-08-26T10:40:11+08:00 [INFO] [retry:41] [interval:10] Retrying...
2023-08-26T10:40:21+08:00 [ERRO] [error:CRYPTO_ERROR (0x12a): x509: certificate has expired or is not yet valid: current time 2023-08-26T10:40:21+08:00 is after 2023-08-12T01:38:22Z] Failed to initialize client

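Solution (the log shows that the certificate expired on 2023-08-12)

  • Restart the hysteria server on the public server (path /root/backend/hysteria)
  • If that does not help, update hysteria's certificate configuration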
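
To spot this failure quickly in a long client log, a small Python sketch like the one below can extract the two timestamps from the error line and report how long ago the certificate expired. The regular expression is written against the log lines quoted above and is an assumption about their format, not something documented by hysteria.

```python
import re
from datetime import datetime

# Matches the "current time ... is after ..." part of the x509 error
EXPIRED = re.compile(r"current time (\S+) is after ([^\]\s]+)\]")


def parse_time(text):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def cert_expired_for(log_line):
    """Return how long ago the certificate expired as a timedelta,
    or None if the line is not an x509 expiry error."""
    match = EXPIRED.search(log_line)
    if match is None:
        return None
    now, not_after = (parse_time(t) for t in match.groups())
    return now - not_after


line = ("2023-08-26T10:40:11+08:00 [ERRO] [error:CRYPTO_ERROR (0x12a): x509: "
        "certificate has expired or is not yet valid: current time "
        "2023-08-26T10:40:11+08:00 is after 2023-08-12T01:38:22Z] "
        "Failed to initialize client")
age = cert_expired_for(line)
print(age)
```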