Rust Application Development: the Apalis Distributed Task Framework in Practice, and a Bug in Version 0.7.1


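In day-to-day work we sometimes need to process a large computing task that can be split into multiple subtasks, each of which depends on computing resources and can run in parallel. This kind of distributed computing task can be abstracted as the following model: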

Apalis.drawio.svg

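Here, the business system (App) publishes distributed jobs (Job) to a task queue (Backend), which can be a storage system, message middleware, and so on. Idle Workers fetch jobs that have not yet been executed from the queue and run them, which is how the distributed task gets executed.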
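The model can be sketched with nothing but the standard library. This is a minimal illustration, not Apalis code: the `Job` struct, the `Backend` alias, and the `publish` and `run_workers` helpers are hypothetical names, and the Backend is assumed to be an in-process queue rather than real storage or middleware.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical job payload, for illustration only.
#[derive(Debug)]
struct Job {
    id: u32,
}

/// Stand-in for the Backend: a shared FIFO queue guarded by a mutex.
type Backend = Arc<Mutex<VecDeque<Job>>>;

/// App side: publish `count` jobs to the backend.
fn publish(backend: &Backend, count: u32) {
    let mut queue = backend.lock().unwrap();
    for id in 0..count {
        queue.push_back(Job { id });
    }
}

/// Worker side: start `workers` threads that pop jobs until the queue is empty.
/// Returns the ids that were processed, sorted, so the caller can check that
/// every job ran exactly once.
fn run_workers(backend: &Backend, workers: usize) -> Vec<u32> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let backend = Arc::clone(backend);
            thread::spawn(move || {
                let mut done = Vec::new();
                loop {
                    // Popping under the lock hands each job to exactly one worker.
                    let job = backend.lock().unwrap().pop_front();
                    match job {
                        Some(job) => done.push(job.id),
                        None => break,
                    }
                }
                done
            })
        })
        .collect();

    let mut processed: Vec<u32> = handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect();
    processed.sort_unstable();
    processed
}

fn main() {
    let backend: Backend = Arc::new(Mutex::new(VecDeque::new()));
    publish(&backend, 10);
    let processed = run_workers(&backend, 4);
    println!("processed jobs: {:?}", processed);
}
```

The property to notice is that removing a job from the queue and handing it to a worker happen as one step, so no job is ever given to two workers. A real backend has to provide the same guarantee across processes.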

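Apalis provides this kind of functionality. It describes itself as a simple, extensible multithreaded job and message processing library, with the following features: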

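  1. A simple, predictable task handling model with built-in concurrent and parallel workflows.
  2. Workers are easy to scale and support graceful shutdown.
  3. Work queues can be backed by Redis, Sqlite, Postgres, MySQL, and others.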

Apalis also provides a web interface and UI for managing your distributed tasks visually.

image.png

Apalis works much like the model described above:

sequenceDiagram
    participant App
    participant Worker
    participant Backend

    App->>+Backend: Add job to queue
    Backend-->>+Worker: Job data
    Worker->>+Backend: Update job status to 'Running'
    Worker->>+App: Started job
    loop job execution
        Worker-->>-App: Report job progress
    end
    Worker->>+Backend: Update job status to 'completed'

The current Apalis version is 0.7.1.

Example

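Add the dependencies to your project's Cargo.toml. The example below also uses tokio, serde, and env_logger, which need to be declared as well: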

[dependencies]
apalis = { version = "0.7", features = ["limit"] } # Limit for concurrency
apalis-redis = { version = "0.7" } # Use redis for persistence

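Here we define a Worker that handles emails. It uses Redis as the work queue, which stores Email values as job data for the Worker to consume. send_email() is the function that actually processes a job: whenever the Worker consumes an Email that has not been executed yet, send_email() is invoked.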

use apalis::prelude::*;
use apalis_redis::RedisStorage;
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]
struct Email {
    to: String,
}

/// A function called for every job
async fn send_email(job: Email, data: Data<usize>) -> Result<(), Error> {
    // Execute the job here
    Ok(())
}

#[tokio::main]
async fn main() {
    std::env::set_var("RUST_LOG", "debug");
    env_logger::init();
    let redis_url = std::env::var("REDIS_URL").expect("Missing env variable REDIS_URL");
    let conn = apalis_redis::connect(redis_url).await.expect("Could not connect");
    let storage = RedisStorage::new(conn);
    WorkerBuilder::new("email-worker")
      .concurrency(2)
      .data(0usize)
      .backend(storage)
      .build_fn(send_email)
      .run()
      .await;
}

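Next, we need an App acting as the producer that publishes the distributed jobs. The App is usually a business process running on another server. It publishes Email data to the Redis work queue through produce_route_jobs().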

// This can live in another part of the program or in another application, e.g. an HTTP server
async fn produce_route_jobs(storage: &mut RedisStorage<Email>) -> anyhow::Result<()> {
    storage
        .push(Email {
            to: "test@example.com".to_string(),
        })
        .await?;
    // The job has been queued successfully
    Ok(())
}

With that, a complete example program is in place.

BUG@0.7.1

While trying out apalis-mysql I ran into a bug, one that almost any ordinary use of apalis-mysql will currently hit. It has been reported to the developers.

The bug issue can be found at github.com/geofmureith… . The code that triggers the bug is as follows:

use std::time::Duration;

use anyhow::Result;

use apalis::layers::retry::backoff::ExponentialBackoffMaker;
use apalis::layers::retry::backoff::MakeBackoff;
use apalis::layers::retry::RetryPolicy;
use apalis::prelude::*;
use apalis_redis::RedisStorage;
use apalis_sql::mysql::MySqlPool;
use apalis_sql::mysql::MysqlStorage;
use email_service::{send_email, Email};
use tokio::signal::ctrl_c;
use tokio::time::sleep;

async fn produce_mysql_jobs(storage: &MysqlStorage<Email>) -> Result<()> {
    let mut storage = storage.clone();

    sleep(Duration::from_millis(100)).await;
    storage
        .push(Email {
            to: format!("test@example.com"),
            text: "Test background job from apalis".to_string(),
            subject: "Background email job".to_string(),
        })
        .await?;

    Ok(())
}

async fn mysql() -> Result<()> {
    std::env::set_var("RUST_LOG", "debug,sqlx::query=error");
    tracing_subscriber::fmt::init();
    let database_url = std::env::var("DATABASE_URL")
        .unwrap_or_else(|_| "mysql://root:strong_password@localhost:3306/apalis-jobs".to_string());
    let pool = MySqlPool::connect(&database_url).await?;

    // Setup migrations
    MysqlStorage::setup(&pool).await?;

    // Create a storage that consumes `Email`
    let mysql: MysqlStorage<Email> = MysqlStorage::new(pool);

    Monitor::new()
        .register({
            WorkerBuilder::new("tasty-avocado")
                .concurrency(8)
                .enable_tracing()
                .backend(mysql)
                .build_fn(send_email)
        })
        .run_with_signal(ctrl_c())
        .await?;
    Ok(())
}

async fn mysql_producer() -> Result<()> {
    std::env::set_var("RUST_LOG", "debug,sqlx::query=error");
    tracing_subscriber::fmt::init();
    let database_url = std::env::var("DATABASE_URL")
        .unwrap_or_else(|_| "mysql://root:strong_password@localhost:3306/apalis-jobs".to_string());
    let pool = MySqlPool::connect(&database_url).await?;

    // Setup migrations
    MysqlStorage::setup(&pool).await?;

    // Create a storage that consumes `Email`
    let mysql: MysqlStorage<Email> = MysqlStorage::new(pool);
    produce_mysql_jobs(&mysql).await?;
    Ok(())
}

async fn redis() -> Result<()> {
    std::env::set_var("RUST_LOG", "debug,sqlx::query=error");
    tracing_subscriber::fmt::init();
    let redis_url = std::env::var("REDIS_URL")
        .unwrap_or_else(|_| "redis://localhost:6379".to_string());
    let conn = apalis_redis::connect(redis_url)
        .await
        .expect("Could not connect");
    let storage = RedisStorage::new(conn);

    Monitor::new()
        .register({
            WorkerBuilder::new("email-worker-shadow")
                .enable_tracing()
                .concurrency(8)
                .backend(storage)
                .build_fn(send_email)
        })
        .run_with_signal(ctrl_c())
        .await?;
    Ok(())
}

async fn produce_redis_jobs(storage: &RedisStorage<Email>) -> Result<()> {
    let mut storage = storage.clone();

    sleep(Duration::from_millis(100)).await;
    storage
        .push(Email {
            to: format!("test@example.com"),
            text: "Test background job from apalis".to_string(),
            subject: "Background email job".to_string(),
        })
        .await?;

    Ok(())
}

async fn redis_producer() -> Result<()> {
    std::env::set_var("RUST_LOG", "debug,sqlx::query=error");
    tracing_subscriber::fmt::init();
    let redis_url = std::env::var("REDIS_URL")
        .unwrap_or_else(|_| "redis://localhost:6379".to_string());
    let conn = apalis_redis::connect(redis_url)
        .await
        .expect("Could not connect");
    let storage = RedisStorage::new(conn);
    produce_redis_jobs(&storage).await?;
    Ok(())
}

#[tokio::main]
async fn main() -> Result<()> {
    mysql_producer().await
}

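Here, redis() and redis_producer() form a distributed system that uses Redis as the work queue, while mysql() and mysql_producer() form one that uses MySQL as the work queue. Change the function called in main() to choose which of them a given process runs.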

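Expected behavior, shown with apalis-redis

First, we can observe the expected behavior of the example with apalis-redis:

  1. In process 1, run redis() to start a Worker that subscribes to Email jobs.
  2. In process 2, run redis_producer() to produce one Email job.

In the output below, attempt=1 throughout (attempt is the number of times this job has been attempted). The expected behavior is that the job is fetched and executed by exactly one Worker thread.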

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.60s
     Running `target/debug/example`
2025-05-07T08:16:02.549371Z DEBUG task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:16:04.550793Z  INFO task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: email_service: Sending email to test@example.com, is_shutting_down false, count 1    
2025-05-07T08:16:06.553781Z  INFO task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: email_service: Sending email to test@example.com, is_shutting_down false, count 2    
2025-05-07T08:16:08.556312Z  INFO task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: email_service: Sending email to test@example.com, is_shutting_down false, count 3    
2025-05-07T08:16:10.558700Z  INFO task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: email_service: Sending email to test@example.com, is_shutting_down false, count 4    
.......  
2025-05-07T08:16:20.567444Z  INFO task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: email_service: Shutting down email job    
2025-05-07T08:16:20.567517Z  INFO task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: email_service: Shut down email job    
2025-05-07T08:16:20.567689Z DEBUG task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: apalis::layers::tracing::on_response: task.done done_in=18018ms result=()
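To check this expectation mechanically rather than by eye, the distinct attempt values per task_id can be collected from the log. The sketch below is a standalone helper, not part of Apalis: `field` and `attempts_per_task` are hypothetical names, and the parsing assumes log lines shaped like the tracing output above (`task_id="..."` followed by `attempt=N}`).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Extract the value following `key` (e.g. `attempt=`) up to the next
/// space, `}` or end of line. Surrounding double quotes are stripped.
fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    let end = rest
        .find(|c: char| c == ' ' || c == '}')
        .unwrap_or(rest.len());
    Some(rest[..end].trim_matches('"'))
}

/// Collect the distinct `attempt` values seen for each `task_id`.
/// Lines without both fields are skipped.
fn attempts_per_task(log: &str) -> BTreeMap<String, BTreeSet<u32>> {
    let mut seen: BTreeMap<String, BTreeSet<u32>> = BTreeMap::new();
    for line in log.lines() {
        let (Some(id), Some(attempt)) = (field(line, "task_id="), field(line, "attempt="))
        else {
            continue;
        };
        if let Ok(n) = attempt.parse::<u32>() {
            seen.entry(id.to_string()).or_default().insert(n);
        }
    }
    seen
}

fn main() {
    // Two lines copied from the apalis-redis output above.
    let log = r#"2025-05-07T08:16:02.549371Z DEBUG task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:16:20.567689Z DEBUG task{task_id="01JTMX1SZC9EHD3JQ95CD46BMD" attempt=1}: apalis::layers::tracing::on_response: task.done done_in=18018ms result=()"#;

    for (task, attempts) in attempts_per_task(log) {
        println!("{task}: attempts seen {attempts:?}");
    }
}
```

A task whose set contains more than one attempt value, without a failure in between, was picked up more than once.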

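The bug, shown with apalis-mysql

Compared with the apalis-redis run, the scenario that triggers the bug only swaps the work queue Backend in the example, from Redis to MySQL:

  1. In process 1, run mysql() to start a Worker that subscribes to Email jobs.
  2. In process 2, run mysql_producer() to produce one Email job.

The output after running is as follows: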

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.45s
     Running `target/debug/example`
2025-05-07T08:17:30.010863Z DEBUG sqlx_mysql::connection::tls: not performing TLS upgrade: TLS support not compiled in
2025-05-07T08:17:30.063521Z DEBUG sqlx_mysql::connection::tls: not performing TLS upgrade: TLS support not compiled in
2025-05-07T08:17:30.063620Z DEBUG sqlx_mysql::connection::tls: not performing TLS upgrade: TLS support not compiled in
2025-05-07T08:17:42.757838Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=1}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:17:42.872156Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=2}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:17:42.984470Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=3}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:17:43.213097Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=4}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:17:43.321480Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=5}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:17:43.432377Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=6}: apalis::layers::tracing::on_request: task.start
2025-05-07T08:17:44.760357Z  INFO task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=1}: email_service: Sending email to test@example.com, is_shutting_down false, count 1    
2025-05-07T08:17:44.874307Z  INFO task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=2}: email_service: Sending email to test@example.com, is_shutting_down false, count 1    
...
2025-05-07T08:17:53.330770Z  INFO task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=5}: email_service: Shut down email job    
2025-05-07T08:17:53.330820Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=5}: apalis::layers::tracing::on_response: task.done done_in=10009ms result=()
2025-05-07T08:17:53.443285Z  INFO task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=6}: email_service: Sending email to test@example.com, is_shutting_down true, count 5    
2025-05-07T08:17:53.443420Z  INFO task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=6}: email_service: Shutting down email job    
2025-05-07T08:17:53.443478Z  INFO task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=6}: email_service: Shut down email job    
2025-05-07T08:17:53.443541Z DEBUG task{task_id="01JTMX4VT0BZHV10TGEJXQ2RSB" attempt=6}: apalis::layers::tracing::on_response: task.done done_in=10011ms result=()

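Some of the differences in log output are, according to the developer, caused by my calling .concurrency(8) and .enable_tracing() in a different order in the two examples. The key bug, however, is the appearance of attempt = 1, 2, 3, ..., 6: a job that is already running should not be executed by multiple Workers, just as one email should not be delivered several times. In Apalis, a job is attempted (that is, picked up for execution) in only two situations:

  1. The job status is Pending: the job has been submitted but has not run yet.
  2. The job status is Failed and the number of attempts is below the maximum; this is effectively a retry-on-failure mechanism.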
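The two conditions can be written down as a small predicate. This is a simplified model for illustration, not the Apalis implementation: the `Status` enum and `is_fetchable` function are hypothetical, and the run_at and job_type filters that the real fetch query also applies are left out.

```rust
/// Job states named in this post. Hypothetical enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Pending,
    Running,
    Failed,
    Done,
    Killed,
}

/// Returns true when a job in this state may be picked up for execution:
/// either it has never run, or it failed and still has attempts left.
fn is_fetchable(status: Status, attempts: u32, max_attempts: u32) -> bool {
    match status {
        Status::Pending => true,
        Status::Failed => attempts < max_attempts,
        // A job that is running or finished must never be handed out again.
        Status::Running | Status::Done | Status::Killed => false,
    }
}

fn main() {
    let cases = [
        (Status::Pending, 0),
        (Status::Running, 1),
        (Status::Failed, 1),
        (Status::Failed, 3),
        (Status::Done, 1),
        (Status::Killed, 3),
    ];
    for (status, attempts) in cases {
        println!(
            "{:?} after {} attempt(s): fetchable = {}",
            status,
            attempts,
            is_fetchable(status, attempts, 3)
        );
    }
}
```

Under this model a Running job is never fetchable, which is why six overlapping attempts on one task point to a problem in how the job's state is tracked.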

Source: calculate_status

pub fn calculate_status<Res>(ctx: &SqlContext, res: &Response<Res>) -> State {
    match &res.inner {
        Ok(_) => State::Done,
        Err(e) => match &e {
            Error::Abort(_) => State::Killed,
            Error::Failed(_) if ctx.max_attempts() as usize <= res.attempt.current() => {
                State::Killed
            }
            _ => State::Failed,
        },
    }
}
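To see what this function decides at the boundary, here is a simplified, self-contained model of the same match with a few worked cases. The `State` and `JobError` enums and the `status_after` function are hypothetical stand-ins: the real function takes a SqlContext and a Response, and its catch-all arm also covers error variants not modeled here.

```rust
/// Outcome states used by the function above. Hypothetical stand-in enum.
#[derive(Debug, PartialEq)]
enum State {
    Done,
    Failed,
    Killed,
}

/// Only the two error variants the function above matches on explicitly.
#[derive(Debug)]
enum JobError {
    Abort,
    Failed,
}

/// Simplified model: decide the stored state from the job result,
/// the current attempt number, and the configured maximum.
fn status_after(
    result: Result<(), JobError>,
    current_attempt: usize,
    max_attempts: usize,
) -> State {
    match result {
        Ok(()) => State::Done,
        // An abort is final regardless of how many attempts remain.
        Err(JobError::Abort) => State::Killed,
        // A failure on the last allowed attempt is final as well.
        Err(JobError::Failed) if max_attempts <= current_attempt => State::Killed,
        // Otherwise the job stays eligible for a retry.
        Err(JobError::Failed) => State::Failed,
    }
}

fn main() {
    let max_attempts = 3;
    println!("{:?}", status_after(Ok(()), 1, max_attempts));
    println!("{:?}", status_after(Err(JobError::Failed), 1, max_attempts));
    println!("{:?}", status_after(Err(JobError::Failed), 3, max_attempts));
    println!("{:?}", status_after(Err(JobError::Abort), 1, max_attempts));
}
```

With a maximum of 3, a failure on attempt 1 or 2 leaves the job in Failed and therefore retryable, while a failure on attempt 3 ends in Killed.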

Source: stream_jobs

let fetch_query = "SELECT id FROM Jobs
                    WHERE (status = 'Pending' OR (status = 'Failed' AND attempts < max_attempts)) AND run_at < ?1 AND job_type = ?2 ORDER BY priority DESC LIMIT ?3";

While a job is executing normally, its status should be Running, so it should not be picked up for execution again.
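The invariant can be illustrated with a small in-process simulation: if moving a job from Pending to Running is a single atomic step, only one of several competing workers can win it. This is a generic sketch of that idea, not the way apalis-sql claims jobs and not a proposed fix. The status constants and the `try_claim` and `run_contending_workers` names are hypothetical, and an atomic integer stands in for the status column.

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical status codes for a single job.
const PENDING: u8 = 0;
const RUNNING: u8 = 1;
const DONE: u8 = 2;

/// Try to claim the job. Succeeds only for the one caller that
/// flips the status from PENDING to RUNNING.
fn try_claim(status: &AtomicU8) -> bool {
    status
        .compare_exchange(PENDING, RUNNING, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

/// Let `workers` threads compete for one pending job and
/// return how many times the job was actually executed.
fn run_contending_workers(workers: usize) -> usize {
    let status = Arc::new(AtomicU8::new(PENDING));
    let executions = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let status = Arc::clone(&status);
            let executions = Arc::clone(&executions);
            thread::spawn(move || {
                if try_claim(&status) {
                    // Only the winning worker reaches this point.
                    executions.fetch_add(1, Ordering::SeqCst);
                    status.store(DONE, Ordering::Release);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    executions.load(Ordering::SeqCst)
}

fn main() {
    let executions = run_contending_workers(8);
    println!("job executed {} time(s) by 8 competing workers", executions);
}
```

Once the status has left PENDING, every later claim fails, so the execution count cannot exceed one. The MySQL log above shows the opposite outcome: the same task_id started six times within about a second.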

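That covers the basics of this bug in the current Apalis version. For more detail, see the issue linked above where it was discussed with the developers, or debug through the source. Because of this bug, the current version of apalis-mysql is effectively unusable, which deserves particular attention :)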