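1 Scheduling classes
The Linux kernel discussed here implements the following four scheduling classes, listed from highest to lowest priority (kernels since 3.14 also have a deadline class, dl_sched_class, which sits between the stop class and the real-time class and is not covered here):
| Scheduling class | Name | Priority |
|---|---|---|
| stop_sched_class | Stop class | - |
| rt_sched_class | Real-time class | 0-99 |
| fair_sched_class | Completely fair class | 100-139 |
| idle_sched_class | Idle class | - |
The scheduler first tries to pick a task from the stop class. If that class has no runnable task, it moves on to the real-time class, and so on. This order is visible in the pick_next_task function in kernel/sched/core.c.
The other scheduling classes are fairly simple, so this article covers only the completely fair scheduling class (CFS).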
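2 Scheduling latency and minimum granularity
CFS uses a dynamic time-slice algorithm to decide how much CPU time each process gets. The scheduling latency is the interval between two consecutive runs of any runnable process, that is, the window within which every runnable process is expected to run once. For example, with a scheduling latency of 20 ms, two runnable processes can each run for 10 ms; with four processes, each can run for 5 ms. The scheduling latency is held in sysctl_sched_latency and exposed in /proc/sys/kernel/sched_latency_ns, in nanoseconds.
With many processes, each one might run only very briefly per turn, and a large share of time would be wasted on scheduling itself. The minimum granularity was introduced for this reason: unless a process blocks or voluntarily yields the CPU, it runs for at least the minimum granularity. This value is held in sysctl_sched_min_granularity and exposed in /proc/sys/kernel/sched_min_granularity_ns, in nanoseconds.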
\[ sched\_nr\_latency=\frac{sysctl\_sched\_latency}{sysctl\_sched\_min\_granularity} \]
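This ratio is the largest number of runnable processes that fit into one scheduling latency. If the number of runnable processes does not exceed sched_nr_latency, the scheduling period equals the scheduling latency. If it exceeds sched_nr_latency, the kernel gives up on the scheduling latency and guarantees the minimum granularity instead; the scheduling period then equals the minimum granularity multiplied by the number of runnable processes. The function in kernel/sched/fair.c that computes the scheduling period shows this: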
/*
 * The idea is to set a period in which each task runs once.
 *
 * When there are too many tasks (sched_nr_latency) we have to stretch
 * this period because otherwise the slices get too small.
 *
 * p = (nr <= nl) ? l : l*nr/nl
 */
static u64 __sched_period(unsigned long nr_running)
{
	if (unlikely(nr_running > sched_nr_latency))
		return nr_running * sysctl_sched_min_granularity;
	else
		return sysctl_sched_latency;
}
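To make the rule concrete, here is a minimal user-space sketch of the same period calculation. The values 6 ms and 0.75 ms are assumptions chosen for illustration (the real values depend on the kernel version and the number of CPUs), and the names `LATENCY_NS`, `MIN_GRANULARITY_NS`, `nr_latency` and `period_ns` are made up for this example.

```c
#include <stdint.h>

/* Assumed example values in nanoseconds, not read from a running kernel. */
#define LATENCY_NS         6000000ULL  /* scheduling latency: 6 ms */
#define MIN_GRANULARITY_NS  750000ULL  /* minimum granularity: 0.75 ms */

/* Largest number of tasks that still fits into one latency window. */
static uint64_t nr_latency(void)
{
	return LATENCY_NS / MIN_GRANULARITY_NS;
}

/* Period in which every runnable task runs once, following __sched_period(). */
static uint64_t period_ns(uint64_t nr_running)
{
	if (nr_running > nr_latency())
		return nr_running * MIN_GRANULARITY_NS;
	return LATENCY_NS;
}
```

With these assumed values nr_latency() is 8, so period_ns(4) stays at 6 ms while period_ns(10) stretches to 7.5 ms.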
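3 Process weight
Once every process has a weight, the run time of each process can be computed: \[ runtime = period \times \frac{weight}{\sum weight} \]
Every process in Linux has a nice value in the range [-20, 19]. The higher the nice value, the "nicer" the process is to others and the lower its priority. Because the kernel cannot use floating-point arithmetic, kernel/sched/core.c defines a precomputed table that maps nice values to weights. The rule this table follows is given in the comment.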
/*
 * Nice levels are multiplicative, with a gentle 10% change for every
 * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
 * nice 1, it will get ~10% less CPU time than another CPU-bound task
 * that remained on nice 0.
 *
 * The "10% effect" is relative and cumulative: from _any_ nice level,
 * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
 * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
 * If a task goes up by ~10% and another task goes down by ~10% then
 * the relative distance between them is ~25%.)
 */
const int sched_prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};
/*
 * Inverse (2^32/x) values of the sched_prio_to_weight[] array, precalculated.
 *
 * In cases where the weight does not change often, we can use the
 * precalculated inverse to speed up arithmetics by turning divisions
 * into multiplications:
 */
const u32 sched_prio_to_wmult[40] = {
 /* -20 */     48388,     59856,     76040,     92818,    118348,
 /* -15 */    147320,    184698,    229616,    287308,    360437,
 /* -10 */    449829,    563644,    704093,    875809,   1099582,
 /*  -5 */   1376151,   1717300,   2157191,   2708050,   3363326,
 /*   0 */   4194304,   5237765,   6557202,   8165337,  10153587,
 /*   5 */  12820798,  15790321,  19976592,  24970740,  31350126,
 /*  10 */  39045157,  49367440,  61356676,  76695844,  95443717,
 /*  15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
};
Linux provides the getpriority and setpriority functions to read and change the nice value.
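As a small illustration, the two calls can be wrapped as follows. This is a sketch for the calling process only; the helper names `current_nice` and `set_nice` and the error value -100 are made up for this example.

```c
#define _XOPEN_SOURCE 700   /* make getpriority()/setpriority() visible */
#include <errno.h>
#include <sys/resource.h>

/*
 * Return the nice value of the calling process, or -100 on error.
 * getpriority() may legitimately return -1, so errno has to be checked.
 */
static int current_nice(void)
{
	int value;

	errno = 0;
	value = getpriority(PRIO_PROCESS, 0);
	if (value == -1 && errno != 0)
		return -100;
	return value;
}

/* Set the nice value of the calling process: 0 on success, -1 on error. */
static int set_nice(int value)
{
	return setpriority(PRIO_PROCESS, 0, value);
}
```

Raising the nice value is normally allowed for any process, while lowering it usually requires extra privilege, so a call such as set_nice(-5) is expected to fail for an ordinary user.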
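4 Time slices and virtual runtime
In Linux, each CPU has its own run queue. If the queue holds several runnable processes, how is the one that gets the CPU chosen?
The idea behind completely fair scheduling is to give every process, as far as possible, the same amount of run time: the scheduler always picks the process in the queue that has run the least so far. Because priorities exist, Linux uses a weighted run time as the criterion. This weighted run time is called the virtual runtime (vruntime), while the real run time is called sum_exec_runtime.
\[ vruntime = sum\_exec\_runtime\times \frac{NICE\_0\_LOAD}{weight} \]
NICE_0_LOAD is the weight of a process with nice value 0, i.e. 1024. At every scheduling decision the process with the smallest vruntime is picked.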
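The formula can be tried out in user space. The sketch below computes the virtual-time increment for a given amount of real run time in two ways: with a plain division, and with a precomputed inverse of the form 2^32/weight, which is the idea behind the sched_prio_to_wmult table. The function names are made up for this example, and the code assumes inputs small enough that the 64-bit product does not overflow.

```c
#include <stdint.h>

#define NICE_0_LOAD 1024ULL

/* Weighted run time using a plain division: delta * NICE_0_LOAD / weight. */
static uint64_t vruntime_delta_div(uint64_t delta_ns, uint64_t weight)
{
	return delta_ns * NICE_0_LOAD / weight;
}

/*
 * Same quantity using a precomputed inverse inv_weight = 2^32 / weight,
 * so the division becomes a multiplication followed by a shift.
 * Assumes delta_ns is small enough for the product to fit in 64 bits.
 */
static uint64_t vruntime_delta_inv(uint64_t delta_ns, uint32_t inv_weight)
{
	return (delta_ns * NICE_0_LOAD * inv_weight) >> 32;
}
```

For a nice 0 task (weight 1024, inverse 4194304) virtual time advances exactly as fast as real time. A task with a lower weight accumulates vruntime faster and is therefore picked less often.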
include/linux/sched.h defines the scheduling entity. It contains some fields related to group scheduling, which is mentioned later.
struct sched_entity {
	struct load_weight	load;		/* for load-balancing */
	struct rb_node		run_node;
	struct list_head	group_node;
	unsigned int		on_rq;
	u64			exec_start;
	u64			sum_exec_runtime;
	u64			vruntime;
	u64			prev_sum_exec_runtime;
	u64			nr_migrations;
#ifdef CONFIG_SCHEDSTATS
	struct sched_statistics statistics;
#endif
#ifdef CONFIG_FAIR_GROUP_SCHED
	int			depth;
	struct sched_entity	*parent;
	/* rq on which this entity is (to be) queued: */
	struct cfs_rq		*cfs_rq;
	/* rq "owned" by this entity/group: */
	struct cfs_rq		*my_q;
#endif
#ifdef CONFIG_SMP
	/*
	 * Per entity load average tracking.
	 *
	 * Put into separate cache line so it does not
	 * collide with read-mostly values above.
	 */
	struct sched_avg	avg ____cacheline_aligned_in_smp;
#endif
};
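kernel/sched/fair.c defines the functions of the completely fair scheduler. Among them, sched_slice computes the real (wall-clock) run time a process should receive in the current scheduling period.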
/*
 * delta_exec * weight / lw.weight
 *   OR
 * (delta_exec * (weight * lw->inv_weight)) >> WMULT_SHIFT
 *
 * Either weight := NICE_0_LOAD and lw \e sched_prio_to_wmult[], in which case
 * we're guaranteed shift stays positive because inv_weight is guaranteed to
 * fit 32 bits, and NICE_0_LOAD gives another 10 bits; therefore shift >= 22.
 *
 * Or, weight =< lw.weight (because lw.weight is the runqueue weight), thus
 * weight/lw.weight <= 1, and therefore our shift will also be positive.
 */
static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)
{
	u64 fact = scale_load_down(weight);
	int shift = WMULT_SHIFT;
	__update_inv_weight(lw);
	if (unlikely(fact >> 32)) {
		while (fact >> 32) {
			fact >>= 1;
			shift--;
		}
	}
	/* hint to use a 32x32->64 mul */
	fact = (u64)(u32)fact * lw->inv_weight;
	while (fact >> 32) {
		fact >>= 1;
		shift--;
	}
	return mul_u64_u32_shr(delta_exec, fact, shift);
}
/*
 * We calculate the wall-time slice from the period by taking a part
 * proportional to the weight.
 *
 * s = p*P[w/rw]
 */
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
	for_each_sched_entity(se) {
		struct load_weight *load;
		struct load_weight lw;
		cfs_rq = cfs_rq_of(se);
		load = &cfs_rq->load;
		if (unlikely(!se->on_rq)) {
			lw = cfs_rq->load;
			update_load_add(&lw, se->load.weight);
			load = &lw;
		}
		slice = __calc_delta(slice, se->load.weight, load);
	}
	return slice;
}
The kernel periodically uses the value computed by sched_slice to check whether a process has used up its time slice. If it has, a preemption should take place.
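For a flat run queue without group scheduling, the slice reduces to the weight formula from section 3. The following sketch shows that simplified case only; the function name `slice_ns` is made up, and the hierarchy walk done by for_each_sched_entity is deliberately left out.

```c
#include <stdint.h>

/*
 * Wall-clock slice of one task: period * weight / total_weight.
 * total_weight must include the task's own weight.
 */
static uint64_t slice_ns(uint64_t period_ns, uint64_t weight,
			 uint64_t total_weight)
{
	return period_ns * weight / total_weight;
}
```

With an assumed period of 6 ms, a nice 0 task (weight 1024) and a nice 1 task (weight 820) get roughly 3.33 ms and 2.67 ms, about 55% and 45% of the period.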
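The scheduling class implements an update_curr function that updates the run-time statistics. These statistics include the minimum virtual runtime of a run queue (min_vruntime), whose role is explained later. The kernel keeps the scheduling entities in a red-black tree, and the entity with the smallest virtual runtime is always the leftmost node of that tree.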
/*
 * Update the current task's runtime statistics.
 */
static void update_curr(struct cfs_rq *cfs_rq)
{
	struct sched_entity *curr = cfs_rq->curr;
	u64 now = rq_clock_task(rq_of(cfs_rq));
	u64 delta_exec;
	if (unlikely(!curr))
		return;
	delta_exec = now - curr->exec_start;
	if (unlikely((s64)delta_exec <= 0))
		return;
	curr->exec_start = now;
	schedstat_set(curr->statistics.exec_max,
		      max(delta_exec, curr->statistics.exec_max));
	curr->sum_exec_runtime += delta_exec;
	schedstat_add(cfs_rq, exec_clock, delta_exec);
	curr->vruntime += calc_delta_fair(delta_exec, curr);
	update_min_vruntime(cfs_rq);
	if (entity_is_task(curr)) {
		struct task_struct *curtask = task_of(curr);
		trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
		cpuacct_charge(curtask, delta_exec);
		account_group_exec_runtime(curtask, delta_exec);
	}
	account_cfs_rq_runtime(cfs_rq, delta_exec);
}
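Stripped of statistics and group accounting, the bookkeeping in update_curr comes down to a few lines. The sketch below is a user-space approximation: `struct entity`, `account` and `advance_min_vruntime` are hypothetical names, the weight scaling uses a plain division, and counter wraparound is ignored.

```c
#include <stdint.h>

/* Hypothetical, much smaller stand-in for struct sched_entity. */
struct entity {
	uint64_t weight;
	uint64_t exec_start;
	uint64_t sum_exec_runtime;
	uint64_t vruntime;
};

/* Charge the time since exec_start to the entity. */
static void account(struct entity *e, uint64_t now)
{
	uint64_t delta;

	if (now <= e->exec_start)
		return;
	delta = now - e->exec_start;
	e->exec_start = now;
	e->sum_exec_runtime += delta;             /* real run time */
	e->vruntime += delta * 1024 / e->weight;  /* weighted run time */
}

/* Advance the queue-wide minimum; it never moves backwards. */
static uint64_t advance_min_vruntime(uint64_t min_vruntime, uint64_t candidate)
{
	return candidate > min_vruntime ? candidate : min_vruntime;
}
```

A task with weight 2048 that runs for 1000 ns is charged 1000 ns of real time but only 500 ns of virtual time.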
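5 Periodic scheduling
A periodic tick checks whether the current process has used up its time slice, to decide whether a preemption should be triggered. Every scheduling class implements a task_tick function; for CFS it is task_tick_fair, also defined in kernel/sched/fair.c. On each timer interrupt, tick_handle_periodic is called first and eventually calls the scheduling class's task_tick. task_tick checks whether preemption is due and, if so, sets the need_resched flag, which tells the kernel to call schedule as soon as possible. task_tick does not switch processes itself; it only sets the flag. When interrupt handling finishes, the kernel checks need_resched and, if it is set, calls schedule to perform the switch.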
/*
 * scheduler tick hitting a task of our scheduling class:
 */
static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
{
	struct cfs_rq *cfs_rq;
	struct sched_entity *se = &curr->se;
	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
		entity_tick(cfs_rq, se, queued);
	}
	if (static_branch_unlikely(&sched_numa_balancing))
		task_tick_numa(rq, curr);
}
static void
entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
{
	/*
	 * Update run-time statistics of the 'current'.
	 */
	update_curr(cfs_rq);
	/*
	 * Ensure that runnable average is periodically updated.
	 */
	update_load_avg(curr, 1);
	update_cfs_shares(cfs_rq);
#ifdef CONFIG_SCHED_HRTICK
	/*
	 * queued ticks are scheduled to match the slice, so don't bother
	 * validating it and just reschedule.
	 */
	if (queued) {
		resched_curr(rq_of(cfs_rq));
		return;
	}
	/*
	 * don't let the period tick interfere with the hrtick preemption
	 */
	if (!sched_feat(DOUBLE_TICK) &&
			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
		return;
#endif
	/*
	 * If more than one task is runnable, check whether the
	 * current task should be preempted.
	 */
	if (cfs_rq->nr_running > 1)
		check_preempt_tick(cfs_rq, curr);
}
The check_preempt_tick function called here decides whether preemption should happen. If so, it sets the need_resched flag so that the current process is preempted.
/*
 * Preempt the current task with a newly woken task if needed:
 */
static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
	unsigned long ideal_runtime, delta_exec;
	struct sched_entity *se;
	s64 delta;
	/* the time slice for this round */
	ideal_runtime = sched_slice(cfs_rq, curr);
	/* the time already run in this round */
	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
	/* if the time already run exceeds the slice, reschedule */
	if (delta_exec > ideal_runtime) {
		/* resched_curr sets the need_resched flag */
		resched_curr(rq_of(cfs_rq));
		/*
		 * The current task ran long enough, ensure it doesn't get
		 * re-elected due to buddy favours.
		 */
		clear_buddies(cfs_rq, curr);
		return;
	}
	/*
	 * Ensure that a task that missed wakeup preemption by a
	 * narrow margin doesn't have to wait for a full slice.
	 * This also mitigates buddy induced latencies under load.
	 */
	if (delta_exec < sysctl_sched_min_granularity)
		return;
	se = __pick_first_entity(cfs_rq);
	delta = curr->vruntime - se->vruntime;
	if (delta < 0)
		return;
	if (delta > ideal_runtime)
		resched_curr(rq_of(cfs_rq));
}
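The decision made by check_preempt_tick can be restated as a pure function, which makes the three cases easier to see. This is a sketch with made-up names (`tick_should_resched` and its parameters); it takes the slice, the minimum granularity and the two virtual runtimes as plain arguments instead of reading them from a run queue, and it leaves out the buddy handling.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide at a tick whether the running task should be rescheduled.
 * ran_ns: time run since the task was last picked
 * ideal_ns: its ideal slice for this period
 * curr_vruntime / leftmost_vruntime: virtual runtimes of the running task
 * and of the waiting task with the smallest vruntime
 */
static bool tick_should_resched(uint64_t ran_ns, uint64_t ideal_ns,
				uint64_t min_granularity_ns,
				uint64_t curr_vruntime,
				uint64_t leftmost_vruntime)
{
	int64_t delta;

	if (ran_ns > ideal_ns)
		return true;    /* slice used up */
	if (ran_ns < min_granularity_ns)
		return false;   /* too early to preempt */
	delta = (int64_t)(curr_vruntime - leftmost_vruntime);
	if (delta < 0)
		return false;   /* running task still has the smallest vruntime */
	return (uint64_t)delta > ideal_ns;  /* waiting task lags too far behind */
}
```

Returning true corresponds to the places where the kernel code calls resched_curr; the actual switch still happens later in schedule.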
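6 Creating a new process or waking a process
When a new process is created, how should it be scheduled? If its vruntime started at 0, it would keep a scheduling advantage for a long time, which is clearly unreasonable. kernel/sched/core.c defines sched_fork to handle newly created processes.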
/*
 * fork()/clone()-time setup:
 */
int sched_fork(unsigned long clone_flags, struct task_struct *p)
{
	unsigned long flags;
	int cpu = get_cpu();
	__sched_fork(clone_flags, p);
	/*
	 * We mark the process as NEW here. This guarantees that
	 * nobody will actually run it, and a signal or other external
	 * event cannot wake it up and insert it on the runqueue either.
	 */
	p->state = TASK_NEW;
	/*
	 * Make sure we do not leak PI boosting priority to the child.
	 */
	p->prio = current->normal_prio;
	/*
	 * Revert to default priority/policy on fork if requested.
	 */
	if (unlikely(p->sched_reset_on_fork)) {
		if (task_has_dl_policy(p) || task_has_rt_policy(p)) {
			p->policy = SCHED_NORMAL;
			p->static_prio = NICE_TO_PRIO(0);
			p->rt_priority = 0;
		} else if (PRIO_TO_NICE(p->static_prio) < 0)
			p->static_prio = NICE_TO_PRIO(0);
		p->prio = p->normal_prio = __normal_prio(p);
		set_load_weight(p);
		/*
		 * We don't need the reset flag anymore after the fork. It has
		 * fulfilled its duty:
		 */
		p->sched_reset_on_fork = 0;
	}
	if (dl_prio(p->prio)) {
		put_cpu();
		return -EAGAIN;
	} else if (rt_prio(p->prio)) {
		p->sched_class = &rt_sched_class;
	} else {
		p->sched_class = &fair_sched_class;
	}
	init_entity_runnable_average(&p->se);
	/*
	 * The child is not yet in the pid-hash so no cgroup attach races,
	 * and the cgroup is pinned to this child due to cgroup_fork()
	 * is ran before sched_fork().
	 *
	 * Silence PROVE_RCU.
	 */
	raw_spin_lock_irqsave(&p->pi_lock, flags);
	/*
	 * We're setting the cpu for the first time, we don't migrate,
	 * so use __set_task_cpu().
	 */
	__set_task_cpu(p, cpu);
	if (p->sched_class->task_fork)
		p->sched_class->task_fork(p);
	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
#ifdef CONFIG_SCHED_INFO
	if (likely(sched_info_on()))
		memset(&p->sched_info, 0, sizeof(p->sched_info));
#endif
#if defined(CONFIG_SMP)
	p->on_cpu = 0;
#endif
	init_task_preempt_count(p);
#ifdef CONFIG_SMP
	plist_node_init(&p->pushable_tasks, MAX_PRIO);
	RB_CLEAR_NODE(&p->pushable_dl_tasks);
#endif
	put_cpu();
	return 0;
}
For the completely fair class, the task_fork hook called here is task_fork_fair in kernel/sched/fair.c.
/*
 * called on fork with the child task as argument from the parent's context
 *  - child not yet on the tasklist
 *  - preemption disabled
 */
static void task_fork_fair(struct task_struct *p)
{
	struct cfs_rq *cfs_rq;
	struct sched_entity *se = &p->se, *curr;
	struct rq *rq = this_rq();
	raw_spin_lock(&rq->lock);
	update_rq_clock(rq);
	cfs_rq = task_cfs_rq(current);
	curr = cfs_rq->curr;
	if (curr) {
		update_curr(cfs_rq);
		se->vruntime = curr->vruntime;
	}
	/* adjust the virtual runtime */
	place_entity(cfs_rq, se, 1);
	if (sysctl_sched_child_runs_first && curr && entity_before(curr, se)) {
		/*
		 * Upon rescheduling, sched_class::put_prev_task() will place
		 * 'current' within the tree based on its new key value.
		 */
		swap(curr->vruntime, se->vruntime);
		resched_curr(rq);
	}
	se->vruntime -= cfs_rq->min_vruntime;
	raw_spin_unlock(&rq->lock);
}
static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
	u64 vruntime = cfs_rq->min_vruntime;
	/*
	 * The 'current' period is already promised to the current tasks,
	 * however the extra weight of the new task will slow them down a
	 * little, place the new task so that it fits in the slot that
	 * stays open at the end.
	 */
	if (initial && sched_feat(START_DEBIT))
		vruntime += sched_vslice(cfs_rq, se);
	/* sleeps up to a single latency don't count. */
	if (!initial) {
		unsigned long thresh = sysctl_sched_latency;
		/*
		 * Halve their sleep time's effect, to allow
		 * for a gentler effect of sleepers:
		 */
		if (sched_feat(GENTLE_FAIR_SLEEPERS))
			thresh >>= 1;
		vruntime -= thresh;
	}
	/* ensure we never gain time by being placed backwards. */
	se->vruntime = max_vruntime(se->vruntime, vruntime);
}
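If START_DEBIT is not enabled, the child's virtual runtime is the larger of the parent's virtual runtime and the minimum virtual runtime of the CFS run queue, because place_entity takes the maximum of the two. If START_DEBIT is set, the newly created process is penalised by increasing its virtual runtime; the amount added is one virtual time slice.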
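The placement rule can be written as a small pure function, which also covers the wakeup case discussed further down. This is a sketch under stated assumptions: the name `place` and its parameters are made up, the virtual slice and the wakeup threshold are passed in rather than computed, and the scheduler features are plain booleans.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the starting vruntime for an entity being placed on a queue.
 * own: the vruntime the entity already carries (a child inherits its parent's)
 * vslice: one virtual slice, used only when start_debit is set
 * thresh: wakeup credit, e.g. half of the scheduling latency
 */
static uint64_t place(uint64_t own, uint64_t min_vruntime, bool initial,
		      bool start_debit, uint64_t vslice, uint64_t thresh)
{
	uint64_t v = min_vruntime;

	if (initial && start_debit)
		v += vslice;    /* penalise the new task by one slice */
	if (!initial)
		v -= thresh;    /* give a sleeper a bounded bonus */

	/* Never move an entity backwards in virtual time. */
	return (int64_t)(own - v) > 0 ? own : v;
}
```

For example, with a queue minimum of 800 and an inherited value of 1000, a new task starts at 1000 without the debit and at 1100 with a debit of 300.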
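Note the line that tests sysctl_sched_child_runs_first: setting /proc/sys/kernel/sched_child_runs_first to 1 makes the child preferred for scheduling, while 0 prefers the parent. This is only a preference, not a guarantee.
Now look at this line: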
se->vruntime -= cfs_rq->min_vruntime;
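On a multiprocessor system, a newly created process does not necessarily end up on the same CPU as its parent, and the min_vruntime values of the two run queues may differ widely. To compensate, the min_vruntime of the current CPU's run queue is subtracted before migration, and the min_vruntime of the destination CPU's run queue is added after migration. The value is added back on enqueue: for the completely fair class, enqueue_task corresponds to enqueue_task_fair (not task_fork_fair), which performs the addition in enqueue_entity.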
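The effect of this subtract-then-add step can be illustrated with two tiny helpers. They are a sketch only; `detach_vruntime` and `attach_vruntime` are hypothetical names, and the arithmetic relies on unsigned wraparound in the same way a u64 field does.

```c
#include <stdint.h>

/* Make a vruntime relative to the queue the entity is leaving. */
static uint64_t detach_vruntime(uint64_t vruntime, uint64_t src_min_vruntime)
{
	return vruntime - src_min_vruntime;
}

/* Make it absolute again on the queue the entity joins. */
static uint64_t attach_vruntime(uint64_t rel, uint64_t dst_min_vruntime)
{
	return rel + dst_min_vruntime;
}
```

An entity that was 500 ahead of the minimum on its old queue is again 500 ahead of the minimum on its new queue, regardless of how far apart the two minimums are.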
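try_to_wake_up wakes up a sleeping process; its code is in kernel/sched/core.c, and it also goes through enqueue_task_fair. In place_entity, when initial is 0 (a wakeup), the virtual runtime is set to the minimum virtual runtime minus half or all of one scheduling latency, unless the entity's own virtual runtime is already larger.
Whether a process is woken by try_to_wake_up or newly created, check_preempt_wakeup is eventually called to check whether the woken or newly created process may preempt the current one.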
/*
 * Preempt the current task with a newly woken task if needed:
 */
static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
	struct task_struct *curr = rq->curr;
	struct sched_entity *se = &curr->se, *pse = &p->se;
	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
	int scale = cfs_rq->nr_running >= sched_nr_latency;
	int next_buddy_marked = 0;
	if (unlikely(se == pse))
		return;
	/*
	 * This is possible from callers such as attach_tasks(), in which we
	 * unconditionally check_prempt_curr() after an enqueue (which may have
	 * lead to a throttle). This both saves work and prevents false
	 * next-buddy nomination below.
	 */
	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
		return;
	if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK)) {
		set_next_buddy(pse);
		next_buddy_marked = 1;
	}
	/*
	 * We can come here with TIF_NEED_RESCHED already set from new task
	 * wake up path.
	 *
	 * Note: this also catches the edge-case of curr being in a throttled
	 * group (e.g. via set_curr_task), since update_curr() (in the
	 * enqueue of curr) will have resulted in resched being set. This
	 * prevents us from potentially nominating it as a false LAST_BUDDY
	 * below.
	 */
	if (test_tsk_need_resched(curr))
		return;
	/* Idle tasks are by definition preempted by non-idle tasks. */
	if (unlikely(curr->policy == SCHED_IDLE) &&
	    likely(p->policy != SCHED_IDLE))
		goto preempt;
	/*
	 * Batch and idle tasks do not preempt non-idle tasks (their preemption
	 * is driven by the tick):
	 */
	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
		return;
	find_matching_se(&se, &pse);
	update_curr(cfs_rq_of(se));
	BUG_ON(!pse);
	if (wakeup_preempt_entity(se, pse) == 1) {
		/*
		 * Bias pick_next to pick the sched entity that is
		 * triggering this preemption.
		 */
		if (!next_buddy_marked)
			set_next_buddy(pse);
		goto preempt;
	}
	return;
preempt:
	resched_curr(rq);
	/*
	 * Only set the backward buddy when the current task is still
	 * on the rq. This can happen when a wakeup gets interleaved
	 * with schedule on the ->pre_schedule() or idle_balance()
	 * point, either of which can drop the rq lock.
	 *
	 * Also, during early boot the idle thread is in the fair class,
	 * for obvious reasons its a bad idea to schedule back to it.
	 */
	if (unlikely(!se->on_rq || curr == rq->idle))
		return;
	if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se))
		set_last_buddy(se);
}
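The helper wakeup_preempt_entity is called above but its body is not shown. The following is a simplified sketch of the kind of comparison it performs, based on the virtual runtimes of the two entities. The name `should_preempt_on_wakeup` is made up, and the wakeup granularity is passed in as an assumed input instead of being derived from the entity's weight.

```c
#include <stdint.h>

/*
 * Returns  1 if the woken entity should preempt the current one,
 *          0 if it is ahead, but by no more than the granularity,
 *         -1 if the current entity has the smaller or equal vruntime.
 * gran is a wakeup granularity in virtual-time units (assumed input).
 */
static int should_preempt_on_wakeup(uint64_t curr_vruntime,
				    uint64_t woken_vruntime, int64_t gran)
{
	int64_t vdiff = (int64_t)(curr_vruntime - woken_vruntime);

	if (vdiff <= 0)
		return -1;
	if (vdiff > gran)
		return 1;
	return 0;
}
```

The granularity keeps a freshly woken task from preempting the running task over a negligible vruntime difference, which would otherwise cause frequent context switches.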
7 Group scheduling in the completely fair class
Linux implements cgroups as a file system that can be mounted. Most distributions mount it already; the following command shows the mounts:
[root@localhost ~]# mount -t cgroup
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
If it is not mounted, you can mount it yourself:
mkdir cgroup
mount -t tmpfs cgroup_root ./cgroup
mkdir cgroup/cpuset
mount -t cgroup -ocpuset cpuset ./cgroup/cpuset/
mkdir cgroup/cpu
mount -t cgroup -ocpu cpu ./cgroup/cpu/
mkdir cgroup/memory
mount -t cgroup -omemory memory ./cgroup/memory/
See the documentation for details.
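8 Scheduling policies for real-time processes
Real-time processes use either the first-in-first-out SCHED_FIFO policy or the round-robin SCHED_RR policy. Ordinary processes use SCHED_OTHER, which covers the processes with nice values from -20 to 19 discussed above. A process can also be set to SCHED_BATCH, but this is not a real-time policy; since the \(O(1)\) scheduler, it behaves almost the same as SCHED_OTHER. SCHED_IDLE has a very low weight, even lower than the weight 15 of nice value 19: it uses a weight of 3.
The scheduling policy and priority can be set with sched_setscheduler.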
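As a minimal sketch of such a call, the helper below switches a process to SCHED_FIFO. The name `make_fifo` is made up for this example, and clamping the priority to the range reported by the system is a design choice of the sketch, not something the API requires.

```c
#define _POSIX_C_SOURCE 200809L   /* make the sched_* declarations visible */
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Switch a process (pid 0 means the caller) to SCHED_FIFO with the given
 * priority, clamped to the valid range. Returns 0 on success, -1 on error
 * with errno set (typically EPERM without sufficient privilege).
 */
static int make_fifo(pid_t pid, int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);
	struct sched_param param;

	if (lo == -1 || hi == -1)
		return -1;
	if (prio < lo)
		prio = lo;
	if (prio > hi)
		prio = hi;
	param.sched_priority = prio;
	return sched_setscheduler(pid, SCHED_FIFO, &param);
}
```

Calling make_fifo(0, 10) as an ordinary user is expected to fail with EPERM, since real-time policies normally require privilege.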
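If ordinary processes should still get a small share of CPU time while real-time processes are running, instead of waiting until all real-time processes have finished, two settings can be adjusted: kernel.sched_rt_period_us and kernel.sched_rt_runtime_us.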