sched: Implement interface for cgroup unified hierarchy
While the cpu controller doesn't have any functional problems, there
are a couple interface issues which can be addressed in the v2
interface.
* cpuacct being a separate controller. This separation is artificial
and rather pointless as demonstrated by most use cases co-mounting
the two controllers. It also forces certain information to be
accounted twice.
* Use of different time units. Writable control knobs use
microseconds, some stat fields use nanoseconds while other cpuacct
stat fields use centiseconds.
* Control knobs which can't be used in the root cgroup still show up
in the root.
* Control knob names and semantics aren't consistent with other
controllers.
This patchset implements cpu controller's interface on the unified
hierarchy which adheres to the controller file conventions described
in Documentation/cgroups/unified-hierarchy.txt. Overall, the
following changes are made.
* cpuacct is implictly enabled and disabled by cpu and its information
is reported through "cpu.stat" which now uses microseconds for all
time durations. All time duration fields now have "_usec" appended
to them for clarity. While this doesn't solve the double accounting
immediately, once majority of users switch to v2, cpu can directly
account and report the relevant stats and cpuacct can be disabled on
the unified hierarchy.
Note that cpuacct.usage_percpu is currently not included in
"cpu.stat". If this information is actually called for, it can be
added later.
* "cpu.shares" is replaced with "cpu.weight" and operates on the
standard scale defined by CGROUP_WEIGHT_MIN/DFL/MAX (1, 100, 10000).
The weight is scaled to scheduler weight so that 100 maps to 1024
and the ratio relationship is preserved - if weight is W and its
scaled value is S, W / 100 == S / 1024. While the mapped range is a
bit smaller than the orignal scheduler weight range, the dead zones
on both sides are relatively small and covers wider range than the
nice value mappings. This file doesn't make sense in the root
cgroup and isn't create on root.
* "cpu.cfs_quota_us" and "cpu.cfs_period_us" are replaced by "cpu.max"
which contains both quota and period.
* "cpu.rt_runtime_us" and "cpu.rt_period_us" are replaced by
"cpu.rt.max" which contains both runtime and period.
v2: cpu_stats_show() was incorrectly using CONFIG_FAIR_GROUP_SCHED for
CFS bandwidth stats and also using raw division for u64. Use
CONFIG_CFS_BANDWITH and do_div() instead.
The semantics of "cpu.rt.max" is not fully decided yet. Dropped
for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
3 files changed