Official APScheduler documentation: apscheduler.readthedocs.io/en/3.x/user…
1. Introduction to APScheduler
APScheduler (Advanced Python Scheduler) is a lightweight task-scheduling framework for Python (a Python library).
1.1 The four components of APScheduler
- triggers: contain the scheduling logic and decide when a job should run; apart from their initial configuration, they are completely stateless
- job stores: hold the scheduled jobs. The default job store keeps jobs in memory, but jobs can also be persisted to a database; non-default stores do not keep job data in memory. Note that a job store must never be shared between schedulers
- executors: submit the callable specified in a job to a thread or process pool; when the job finishes, they notify the scheduler so it can perform the follow-up work
- schedulers: bind the other components together. A program usually has only one scheduler; through its interface you configure the job stores and executors, and add, modify, and remove jobs
1.2 Common schedulers
- BlockingScheduler: use when the scheduler is the only thing your program runs, because start() blocks the current thread
- BackgroundScheduler: use when you are not using any of the frameworks below and want the scheduler to run in the background of your application
- AsyncIOScheduler: use when your application uses the asyncio module
- GeventScheduler: use when your application uses gevent
- TornadoScheduler: use when you are building a Tornado application
- TwistedScheduler: use when you are building a Twisted application
- QtScheduler: use when you are building a Qt application
1.3 Choosing the right job store
If all your jobs are re-created on every application start, the stateless in-memory store (MemoryJobStore) is enough.
If you want jobs to survive restarts or crashes, choose a persistent store; the official recommendation is SQLAlchemyJobStore backed by PostgreSQL.
1.4 Choosing the right executor
If you use one of the frameworks above, it provides a matching executor. Otherwise use ThreadPoolExecutor, or ProcessPoolExecutor for CPU-bound work that can take advantage of multiple CPU cores; the two can even be combined by adding the process pool as a secondary executor.
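APScheduler's ThreadPoolExecutor and ProcessPoolExecutor wrap the standard library's concurrent.futures pools, so the trade-off can be sketched with the stdlib alone (`cpu_bound` is a made-up stand-in for real work):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n):
    # Simulated CPU-bound work: sum of squares below n
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # Threads share one interpreter (GIL): a good fit for I/O-bound jobs
    with ThreadPoolExecutor(max_workers=4) as pool:
        thread_results = list(pool.map(cpu_bound, [10_000] * 4))

    # Separate processes sidestep the GIL: better for CPU-bound jobs on multi-core machines
    with ProcessPoolExecutor(max_workers=4) as pool:
        process_results = list(pool.map(cpu_bound, [10_000] * 4))

    print(thread_results == process_results)  # same values, different execution model
```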
1.5 Choosing the right trigger
APScheduler has three built-in trigger types:
- date: run a job once at a specific point in time
- interval: run a job at fixed time intervals
- cron: run a job periodically at particular times on particular dates, cron-style
Triggers can also be combined, so that a job runs when any one of them, or all of them at once, would fire: apscheduler.readthedocs.io/en/3.x/modu…
2. Configuring the scheduler
2.1 Create a BackgroundScheduler with the default MemoryJobStore
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
2.2 Configure a scheduler with two job stores and two executors
Suppose you want two job stores using two executors, and you also want to tweak the default values for new jobs and set a different timezone. The configuration below will get you:
- a MongoDBJobStore named “mongo”
- an SQLAlchemyJobStore named “default” (using SQLite)
- a ThreadPoolExecutor named “default”, with a worker count of 20
- a ProcessPoolExecutor named “processpool”, with a worker count of 5
- UTC as the scheduler’s timezone
- coalescing turned off for new jobs by default
- a default maximum instance limit of 3 for new jobs
The example below configures the scheduler through sub-dictionaries passed as constructor arguments.
You can also configure everything via an options dict, or create the scheduler instance first and configure it afterwards; see the official documentation for the details.
This example requires two extra packages: SQLAlchemy and PyMongo.
Code example:
import unittest


class TestConfigScheduler(unittest.TestCase):
    def test_config_default(self):
        from apscheduler.schedulers.background import BackgroundScheduler
        scheduler = BackgroundScheduler()

    def test_config_scheduler(self):
        from pytz import utc
        from apscheduler.schedulers.background import BackgroundScheduler
        from apscheduler.jobstores.mongodb import MongoDBJobStore
        from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
        from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor

        jobstores = {
            'mongo': MongoDBJobStore(),
            'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')
        }
        executors = {
            'default': ThreadPoolExecutor(20),
            'processpool': ProcessPoolExecutor(5)
        }
        job_defaults = {
            'coalesce': False,
            'max_instances': 3
        }
        scheduler = BackgroundScheduler(jobstores=jobstores, executors=executors,
                                        job_defaults=job_defaults, timezone=utc)
3. Starting the scheduler
Call start() on the scheduler instance. A BlockingScheduler blocks the current thread once started, so all setup must be done beforehand; the other schedulers return immediately from start(), and jobs can still be added afterwards. Once a scheduler has been started, however, its configuration can no longer be changed.
4. Adding jobs to the scheduler
There are two ways to add a job to a scheduler:
- call scheduler.add_job()
- use the scheduled_job() decorator
add_job() returns the corresponding Job instance, so the job can be modified after it is added; the decorator can only add jobs that will not be touched afterwards, its advantage being conciseness.
Code example:
import unittest
from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler


class TestStartScheduler(unittest.TestCase):
    def test_date_trigger(self):
        """
        Start a date-triggered job on a BlockingScheduler.
        BlockingScheduler runs synchronously and blocks the current thread.
        """
        def my_job(text):
            print(text)

        sched = BlockingScheduler()
        # If no run_date is given, the job runs immediately
        sched.add_job(my_job, 'date', args=['text'])
        # sched.add_job(my_job, 'date', run_date=datetime(2023, 2, 4, 22, 23, 0), args=['text'])
        sched.start()

    def test_add_job_by_decoration(self):
        sched = BlockingScheduler()

        @sched.scheduled_job('date', args=['text'])
        def my_job(text):
            print(text)

        sched.start()
5. Removing jobs from the scheduler
There are two ways to remove a job:
- call remove_job() on the scheduler instance, passing the job_id
- call remove() on the Job instance
Since only add_job() returns a Job instance, a job added via the decorator can only be removed by id, so an explicit job_id must be specified when adding it.
Code example:
import time
import unittest
from apscheduler.schedulers.background import BackgroundScheduler


class TestRemoveScheduler(unittest.TestCase):
    def test_remove_job(self):
        """
        Remove a job added via add_job(); no explicit id is needed,
        a random id is generated automatically.
        """
        sched = BackgroundScheduler()

        def myfunc():
            print("myfunc")

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2)
        sched.start()
        time.sleep(6)
        print(added_job1.id)
        added_job1.remove()
        time.sleep(4)

    def test_remove_job1(self):
        """
        Remove a job added via the scheduled_job decorator;
        since no Job instance is returned, an explicit id is required.
        """
        sched = BackgroundScheduler()

        @sched.scheduled_job('interval', seconds=2, id='myfunc')
        def myfunc():
            print("myfunc")

        sched.start()
        time.sleep(6)
        sched.remove_job(job_id='myfunc')
        time.sleep(4)
6. Pausing and resuming jobs
There are two ways to pause/resume a job:
- call pause_job()/resume_job() on the scheduler instance, passing the job_id
- call pause()/resume() on the Job instance
class PauseAndResumeJob(unittest.TestCase):
    def test_pause_and_resume_job1(self):
        sched = BackgroundScheduler()

        def myfunc():
            print("myfunc")

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2)
        sched.start()
        time.sleep(6)
        added_job1.pause()
        print("sleep 6 sec start")
        time.sleep(6)
        print("sleep 6 sec end")
        added_job1.resume()
        time.sleep(4)

    def test_pause_and_resume_job2(self):
        sched = BackgroundScheduler()

        def myfunc():
            print("myfunc")

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2, id='myfunc')
        sched.start()
        time.sleep(6)
        sched.pause_job(job_id="myfunc")
        print("sleep 6 sec start")
        time.sleep(6)
        print("sleep 6 sec end")
        sched.resume_job(job_id="myfunc")
        time.sleep(4)
7. Listing the currently scheduled jobs
The scheduler instance has two methods that list the currently scheduled jobs:
- get_jobs()
- print_jobs(): prints a nicely formatted listing, including each job's next run time
class TestListScheduledJob(unittest.TestCase):
    def test_list_job(self):
        sched = BackgroundScheduler()

        def myfunc1():
            print("myfunc1")

        def myfunc2(text):
            print("myfunc2" + text)

        added_job1 = sched.add_job(myfunc1, 'interval', seconds=2, id='myfunc1')
        sched.start()
        time.sleep(5)
        added_job1.pause()
        added_job2 = sched.add_job(myfunc2, 'interval', seconds=2, id='myfunc2', args=['print'])
        print(sched.get_jobs())
        sched.print_jobs()  # print_jobs() prints directly and returns None
        time.sleep(6)
8. Modifying jobs
If you only need to change a job's own attributes, the job's modify() method is enough.
If you need to change anything trigger-related, the job has to be rescheduled: call reschedule_job() on the scheduler with the job_id, or reschedule() on the Job instance.
import datetime
import time
import unittest
from apscheduler.schedulers.background import BackgroundScheduler


class TestModifyJob(unittest.TestCase):
    def test_modify_job(self):
        sched = BackgroundScheduler()

        def myfunc(text):
            print("myfunc1:" + datetime.datetime.now().strftime('%Y%m%d %H:%M:%S') + ":" + text)

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2, id='myfunc1', args=['tttttt'])
        sched.start()
        print(added_job1.name, added_job1.args)
        added_job1.modify(name="jobForMyFunc", args=['test'])
        print(added_job1.name, added_job1.args)

    def test_reschedule_job(self):
        sched = BackgroundScheduler()

        def myfunc(text):
            print(text + ":" + "myfunc1:" + datetime.datetime.now().strftime('%y%m%d %H:%M:%S'))

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2, id='myfunc1', args=['tttttt'])
        sched.start()
        time.sleep(5)
        added_job1.reschedule('interval', seconds=5)
        time.sleep(15)

    def test_reschedule_job2(self):
        sched = BackgroundScheduler()

        def myfunc(text):
            print(text + ":" + "myfunc1:" + datetime.datetime.now().strftime('%y%m%d %H:%M:%S'))

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2, id='myfunc1', args=['tttttt'])
        sched.start()
        time.sleep(5)
        sched.reschedule_job('myfunc1', trigger='interval', seconds=5)
        time.sleep(15)
9. Shutting down the scheduler
Calling shutdown() on the scheduler instance shuts down the job stores and executors, waiting for currently running jobs to finish. If you do not want to wait, pass wait=False.
class TestShutdownScheduler(unittest.TestCase):
    def test_shutdown_scheduler_wait_finish(self):
        sched = BackgroundScheduler()

        def myfunc(text):
            print(text + ":" + "myfunc1:" + datetime.datetime.now().strftime('%y%m%d %H:%M:%S'))
            time.sleep(5)
            print(text + ":" + "myfunc1:" + datetime.datetime.now().strftime('%y%m%d %H:%M:%S') + 'exec end')

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2, id='myfunc1', args=['tttttt'])
        sched.start()
        time.sleep(11)
        sched.shutdown()
        print("shutted")
        print(datetime.datetime.now().strftime('%y%m%d %H:%M:%S'))

    def test_shutdown_scheduler_now(self):
        sched = BackgroundScheduler()

        def myfunc(text):
            print(text + ":" + "myfunc1:" + datetime.datetime.now().strftime('%y%m%d %H:%M:%S'))
            time.sleep(5)
            print(text + ":" + "myfunc1:" + datetime.datetime.now().strftime('%y%m%d %H:%M:%S') + 'exec end')

        added_job1 = sched.add_job(myfunc, 'interval', seconds=2, id='myfunc1', args=['tttttt'])
        sched.start()
        time.sleep(11)
        sched.shutdown(wait=False)
        print("shutted")
        print(datetime.datetime.now().strftime('%y%m%d %H:%M:%S'))
        # If the main thread does not exit, the background job keeps running to completion
        # time.sleep(10)
10. Pausing and resuming the scheduler
The scheduler instance itself can be paused and resumed:
- pause: sched.pause()
- resume: sched.resume()
You can also start a scheduler directly in the paused state: sched.start(paused=True)
class TestPauseAndResumeScheduler(unittest.TestCase):
    def test_pause_and_resume_scheduler(self):
        sched = BackgroundScheduler()

        def myfunc1():
            print("myfunc1")

        def myfunc2(text):
            print("myfunc2" + text)

        added_job1 = sched.add_job(myfunc1, 'interval', seconds=2, id='myfunc1')
        sched.start()
        added_job2 = sched.add_job(myfunc2, 'interval', seconds=2, id='myfunc2', args=['print'])
        time.sleep(6)
        sched.pause()
        print("sleep 6 sec start")
        time.sleep(6)
        print("sleep 6 sec end")
        sched.resume()
        time.sleep(5)

    def test_start_a_paused_scheduler(self):
        sched = BackgroundScheduler()

        def myfunc1():
            print("myfunc1")

        def myfunc2(text):
            print("myfunc2" + text)

        added_job1 = sched.add_job(myfunc1, 'interval', seconds=2, id='myfunc1')
        sched.start(paused=True)
        added_job2 = sched.add_job(myfunc2, 'interval', seconds=2, id='myfunc2', args=['print'])
        print("sleep 6 sec start")
        time.sleep(6)
        print("sleep 6 sec end")
        sched.resume()
        time.sleep(5)
11. Limiting the number of concurrently executing instances
When running the test_shutdown_scheduler_now test, you will notice that some runs are skipped: by default the same job is not allowed to run more than once concurrently. Passing max_instances=3 to add_job() allows up to 3 instances of the job to run at the same time.

12. Missed runs and coalescing
Sometimes a job misses its scheduled run time. The most common case: a job persisted in the database comes due while the scheduler is shut down, and the scheduler is only started again after the run time has passed. The scheduler then checks each job's misfire_grace_time (an integer number of seconds, settable globally or per job) to decide whether the missed run should still be executed. A job that missed several runs may in this way end up being executed several times in a row.
That can itself be a problem, because some jobs do not need those catch-up runs at all. The coalesce option merges queued runs of the same job into one: with coalesce=True, a pile of missed runs collapses into a single run, and no misfire event is fired for the dropped ones.
Note: if a run is delayed because the thread/process pool is full, the executor skips it. To avoid this, add more threads/processes to the pool, or raise the misfire_grace_time value.
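The grace-time decision itself is plain datetime arithmetic; the sketch below uses illustrative names, not APScheduler's internals:

```python
from datetime import datetime, timedelta

def should_still_run(scheduled_time, now, misfire_grace_time):
    """Return True if a missed run is still within its grace window (seconds)."""
    return now - scheduled_time <= timedelta(seconds=misfire_grace_time)

scheduled = datetime(2023, 2, 4, 12, 0, 0)
# 30 seconds late with a 60-second grace: the run still happens
print(should_still_run(scheduled, scheduled + timedelta(seconds=30), 60))  # True
# 90 seconds late with a 60-second grace: the run is discarded as missed
print(should_still_run(scheduled, scheduled + timedelta(seconds=90), 60))  # False
```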
13. Scheduler events
With add_listener() you can attach listeners to a scheduler; when one of the specified scheduler events occurs, the given callback is invoked.
| Constant | Description | Event class |
|---|---|---|
| EVENT_SCHEDULER_STARTED | The scheduler was started | SchedulerEvent |
| EVENT_SCHEDULER_SHUTDOWN | The scheduler was shut down | SchedulerEvent |
| EVENT_SCHEDULER_PAUSED | Job processing in the scheduler was paused | SchedulerEvent |
| EVENT_SCHEDULER_RESUMED | Job processing in the scheduler was resumed | SchedulerEvent |
| EVENT_EXECUTOR_ADDED | An executor was added to the scheduler | SchedulerEvent |
| EVENT_EXECUTOR_REMOVED | An executor was removed from the scheduler | SchedulerEvent |
| EVENT_JOBSTORE_ADDED | A job store was added to the scheduler | SchedulerEvent |
| EVENT_JOBSTORE_REMOVED | A job store was removed from the scheduler | SchedulerEvent |
| EVENT_ALL_JOBS_REMOVED | All jobs were removed from either all job stores or one particular job store | SchedulerEvent |
| EVENT_JOB_ADDED | A job was added to a job store | JobEvent |
| EVENT_JOB_REMOVED | A job was removed from a job store | JobEvent |
| EVENT_JOB_MODIFIED | A job was modified from outside the scheduler | JobEvent |
| EVENT_JOB_SUBMITTED | A job was submitted to its executor to be run | JobSubmissionEvent |
| EVENT_JOB_MAX_INSTANCES | A job being submitted to its executor was not accepted by the executor because the job has already reached its maximum concurrently executing instances | JobSubmissionEvent |
| EVENT_JOB_EXECUTED | A job was executed successfully | JobExecutionEvent |
| EVENT_JOB_ERROR | A job raised an exception during execution | JobExecutionEvent |
| EVENT_JOB_MISSED | A job’s execution was missed | JobExecutionEvent |
| EVENT_ALL | A catch-all mask that includes every event type | N/A |
14. Creating a persistent job store
14.1 Persisting jobs with PostgreSQL
Preparation:
- pip install psycopg2
- run a PostgreSQL instance via Docker:
docker run --name postgres -e POSTGRES_PASSWORD=123456 -p 5432:5432 -d postgres
A table named apscheduler_jobs is created in the target database, with three columns: id | next_run_time | job_state
import time
import unittest


class TestPGJobStore(unittest.TestCase):
    @staticmethod
    def myfunc1():
        print("myfunc1")

    def test_create_pg_job_store(self):
        from apscheduler.schedulers.background import BackgroundScheduler
        from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

        scheduler = BackgroundScheduler()
        scheduler.configure(jobstores={'default': SQLAlchemyJobStore(
            url="postgresql://postgres:123456@192.168.1.101:5432/postgres")})
        # Before the scheduler starts, jobs in the database table cannot be retrieved
        print(scheduler.get_jobs())
        scheduler.start()
        # Jobs can only be removed from the database after the scheduler has started
        scheduler.remove_all_jobs()
        print(scheduler.get_jobs())
        # Running the test again after a successful add_job raises an "id already exists"
        # error; commenting out the add_job line still runs the previously added job,
        # showing that it was persisted to the database
        added_job1 = scheduler.add_job(TestPGJobStore.myfunc1, 'interval', seconds=2, id='myfunc1')
        print(scheduler.get_jobs())
        time.sleep(5)
        scheduler.shutdown(wait=False)
Data stored in the PostgreSQL database (screenshot not preserved).
14.2 How the job store creates its engine
SQLAlchemyJobStore's initializer takes two parameters, engine and url. engine can be an Engine instance, or a string reference to one in the form "module:variable" (the part before the ":" names the module, the part after it the attribute to import from that module). If no engine is passed, one is created by parsing the url parameter.
Usually specifying the url is enough; let's look at how an engine instance is created from it.
14.2.1 Creating the engine
u = _url.make_url(url) first builds an engine.url.URL instance from the url string. If the url passed to SQLAlchemyJobStore is already a URL object, it is returned as-is; if it is a string, it is parsed into the username, password, host IP, port, driver name and so on, and a URL object is created and returned from those parts.
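You can observe that parsing step directly (the credentials below are the same dummy values used earlier in this article):

```python
from sqlalchemy.engine.url import make_url

u = make_url("postgresql://postgres:123456@192.168.1.101:5432/postgres")
# The URL object exposes each parsed component as an attribute
print(u.drivername, u.username, u.host, u.port, u.database)
# postgresql postgres 192.168.1.101 5432 postgres
```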