cgroup: Changes for v6.8

- Yafang Shao added task_get_cgroup1() helper to enable a similar BPF helper
  so that BPF progs can be more useful on cgroup1 hierarchies. While cgroup1
  is mostly in maintenance mode, this addition is very small while having an
  outsized usefulness for users who are still on cgroup1. Yafang also
  optimized root cgroup list access by making it RCU protected in the
  process.

- Waiman Long optimized rstat operation, leading to substantially lower and
  more consistent lock hold time while flushing the hierarchical statistics.
  As the lock can be acquired briefly in various hot paths, this reduction
  has cascading benefits.

- Waiman also improved the quality of isolation for cpuset's isolated
  partitions. CPUs allocated to isolated partitions are now excluded from
  running unbound work items, and the cpu_is_isolated() test, which vmstat
  and memcg use to reduce interference, now includes cpuset isolated CPUs.
  While it isn't there yet, the hope is to eventually reach parity with the
  isolation level provided by the `isolcpus` boot param, but in a dynamic
  manner.

  This involved a couple of workqueue patches which were applied directly to
  cgroup/for-6.8 rather than ping-ponged through the wq tree. This was
  because the wq code change was small and the area is usually very static
  and unlikely to cause conflicts. However, luck had it that there was a wq
  bug fix in the area during the 6.7 cycle which caused a conflict. The
  conflict is contextual but can be a bit confusing to resolve, so there is
  one merge from wq/for-6.7-fixes.

cgroup: Move rcu_head up near the top of cgroup_root

Commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU
safe") adds a new rcu_head to the cgroup_root structure and kvfree_rcu()
for freeing the cgroup_root.

The current implementation of kvfree_rcu(), however, has the limitation
that the offset of the rcu_head structure within the larger data
structure must be less than 4096 or the compilation will fail. See the
macro definition of __is_kvfree_rcu_offset() in include/linux/rcupdate.h
for more information.

With rcu_head placed below the large cgroup structure, any change to the
cgroup structure that makes it larger runs the risk of causing a build
failure under certain configurations. Commit 77070eeb8821 ("cgroup:
Avoid false cacheline sharing of read mostly rstat_cpu") happens to be
the last straw that breaks it. Fix this problem by moving the rcu_head
structure up before the cgroup structure.
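
The offset constraint can be illustrated with a minimal user-space sketch.
It is not the kernel code: the struct names, the 5000-byte payload, and
the offset_ok() helper are hypothetical stand-ins, and the 4096 limit is
taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limit, mirroring the "less than 4096" rule described above. */
#define KVFREE_RCU_MAX_OFFSET 4096

/* Hypothetical stand-in for struct rcu_head: two pointer-sized fields. */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/* Hypothetical stand-in for a struct cgroup grown past 4096 bytes. */
struct cgroup_sketch {
	char payload[5000];
};

/* Old layout: rcu_head sits after the large embedded structure. */
struct root_before {
	struct cgroup_sketch cgrp;
	struct rcu_head_sketch rcu;
};

/* New layout: rcu_head sits before the large embedded structure. */
struct root_after {
	struct rcu_head_sketch rcu;
	struct cgroup_sketch cgrp;
};

/* Returns 1 when the offset would pass the check, 0 otherwise. */
static inline int offset_ok(size_t offset)
{
	return offset < KVFREE_RCU_MAX_OFFSET;
}

/* The new layout passes at compile time however large the payload grows. */
static_assert(offsetof(struct root_after, rcu) < KVFREE_RCU_MAX_OFFSET,
	      "rcu_head must stay within the first 4096 bytes");
```

In this sketch, offset_ok(offsetof(struct root_before, rcu)) is false
because the member lands past the limit, while the reordered layout keeps
it at offset 0. In the kernel the check is made at build time, so the old
layout shows up as a compilation failure rather than a runtime problem.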

Fixes: d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/lkml/20231207143806.114e0a74@canb.auug.org.au/
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
1 file changed