workqueue: Implement localize-to-issuing-CPU for unbound workqueues
The non-strict cache affinity scope provides a reasonable default behavior
for improving execution locality while avoiding strict utilization limits
and the overhead of too-fine-grained scopes. However, it ignores L1/L2 cache
locality, which may benefit some workloads.
This patch implements workqueue_attrs->localize which, when turned on, tries
to put the worker on the work item's issuing CPU when starting execution in
the same way non-strict cache affinity is implemented. As it uses the same
task_struct->wake_cpu, the same caveats apply. It isn't clear whether this
is an acceptable use of the scheduler property and there is a small race
window where the setting from position_worker() may be ignored.
To locate a worker on the work item's issuing CPU, we need to pre-assign the
work item to the worker before waking it up; otherwise, we can't know which
exact worker the work item is going to be assigned to. For work items that
request localization, this patch updates kick_pool() to pre-assign each work
item to an idle worker and take the worker out of the idle state before waking
it up. In turn, worker_thread() proceeds directly to work item execution if
IDLE was already clear when it woke up.
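
The pre-assignment flow described above can be modeled roughly as follows.
This is a userspace toy sketch, not the kernel code: struct pool, struct
worker, kick_pool_model() and worker_wakeup_model() are hypothetical
stand-ins, and the wake_cpu field only imitates task_struct->wake_cpu. It
ignores locking and the race window mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_WORKERS 4
#define NO_CPU (-1)

/* Hypothetical work item: remembers the CPU it was issued from. */
struct work {
	int issuing_cpu;
	bool localize;		/* models workqueue_attrs->localize */
};

/* Hypothetical worker: wake_cpu imitates task_struct->wake_cpu. */
struct worker {
	bool idle;
	int wake_cpu;
	struct work *assigned;	/* pre-assigned work item, or NULL */
};

struct pool {
	struct worker workers[NR_WORKERS];
};

static void pool_init(struct pool *pool)
{
	for (size_t i = 0; i < NR_WORKERS; i++) {
		pool->workers[i].idle = true;
		pool->workers[i].wake_cpu = NO_CPU;
		pool->workers[i].assigned = NULL;
	}
}

/*
 * Model of the kick path: pick the first idle worker. For a localizing
 * work item, pre-assign the item, clear the idle state and hint the
 * issuing CPU before the (simulated) wake-up. Returns the worker that
 * would be woken, or NULL if none is idle.
 */
static struct worker *kick_pool_model(struct pool *pool, struct work *work)
{
	for (size_t i = 0; i < NR_WORKERS; i++) {
		struct worker *worker = &pool->workers[i];

		if (!worker->idle)
			continue;
		if (work->localize) {
			worker->assigned = work;
			worker->idle = false;
			worker->wake_cpu = work->issuing_cpu;
		}
		return worker;
	}
	return NULL;
}

/*
 * Model of the worker's wake-up path: if idle was already cleared by the
 * kicker, run the pre-assigned item directly; otherwise leave idle and
 * take the item at the head of the queue.
 */
static struct work *worker_wakeup_model(struct worker *worker,
					struct work *queue_head)
{
	if (!worker->idle) {
		struct work *work = worker->assigned;

		assert(work != NULL);
		worker->assigned = NULL;
		return work;
	}
	worker->idle = false;
	return queue_head;
}
```

In this model the kicker knows exactly which worker will run the work item
before the wake-up, which is what makes it possible to set the CPU hint for
that specific worker.
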
In theory, localizing to the issuing CPU without any hard restrictions
should be the best option: it tells the scheduler which CPU is best for
locality without constraining its future decisions. In practice, however,
it doesn't work out that way due to loss of work conservation.
As such, this patch isn't for upstream yet. See the cover letter for further
discussion.
NOT_FOR_UPSTREAM
5 files changed