refs/heads/affinity-scopes-v1 - linux/kernel/git/tj/wq

commit	a3f00298b035bbf9701d2f3db74bcadfb346a504
author	Tejun Heo <tj@kernel.org>	Thu May 18 11:25:58 2023 -1000
committer	Tejun Heo <tj@kernel.org>	Thu May 18 11:25:58 2023 -1000
tree	42e35b883552aef8bded6bf34357013d34ea739a
parent	bc4644bddd95499d03c9d7ac27b5c84059a8cc0e

workqueue: Implement localize-to-issuing-CPU for unbound workqueues

The non-strict cache affinity scope provides a reasonable default behavior for improving execution locality while avoiding strict utilization limits and the overhead of too-fine-grained scopes. However, it ignores L1/L2 locality, which may benefit some workloads.

This patch implements workqueue_attrs->localize which, when turned on, tries to put the worker on the work item's issuing CPU when starting execution, in the same way non-strict cache affinity is implemented. As it uses the same task_struct->wake_cpu, the same caveats apply: it isn't clear whether this is an acceptable use of the scheduler property, and there is a small race window where the setting from position_worker() may be ignored.

To locate a worker on the work item's issuing CPU, we need to pre-assign the work item to the worker before waking it up; otherwise, we can't know which exact worker the work item is going to be assigned to. For work items that request localization, this patch updates kick_pool() to pre-assign each work item to an idle worker and take that worker out of the idle state before waking it up. In turn, worker_thread() proceeds directly to work item execution if IDLE was already clear when it woke up.

Theoretically, localizing to the issuing CPU without any hard restrictions should be the best option, as it tells the scheduler the best CPU to use for locality without placing any restrictions on future scheduler decisions. In practice, however, it doesn't work out that way due to loss of work conservation. As such, this patch isn't for upstream yet. See the cover letter for further discussion.

NOT_FOR_UPSTREAM
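
The pre-assign-then-wake ordering described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the struct layouts, kick_pool_sketch(), first_idle_worker() and worker_woke_with_work() are hypothetical stand-ins. The wake_cpu field here is a plain integer that merely models the task_struct->wake_cpu hint, and no real scheduling or locking takes place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WORKER_IDLE 0x1u   /* hypothetical flag bit modelling the IDLE state */
#define NO_CPU      (-1)   /* no placement hint recorded */

struct work_item {
	int issuing_cpu;          /* CPU that queued the item */
	bool localize;            /* models workqueue_attrs->localize */
};

struct worker {
	unsigned int flags;
	int wake_cpu;             /* models the task_struct->wake_cpu hint */
	struct work_item *assigned; /* work pre-assigned before wake-up, if any */
};

struct pool {
	struct worker *workers;
	size_t nr_workers;
};

/* Return the first worker still marked idle, or NULL if there is none. */
static struct worker *first_idle_worker(struct pool *pool)
{
	for (size_t i = 0; i < pool->nr_workers; i++)
		if (pool->workers[i].flags & WORKER_IDLE)
			return &pool->workers[i];
	return NULL;
}

/*
 * Pick an idle worker for @work and return it (the caller would wake it).
 * For localized work, the item is assigned and IDLE is cleared *before*
 * the wake-up so that the placement hint targets the exact worker that
 * will run the item.
 */
static struct worker *kick_pool_sketch(struct pool *pool, struct work_item *work)
{
	struct worker *w = first_idle_worker(pool);

	if (!w)
		return NULL;

	if (work->localize) {
		assert(!w->assigned);            /* an idle worker holds no work */
		w->assigned = work;              /* pre-assign */
		w->flags &= ~WORKER_IDLE;        /* leave idle before waking */
		w->wake_cpu = work->issuing_cpu; /* hint: run where it was issued */
	}
	return w;
}

/*
 * Models the check on the wake-up path: if IDLE was already clear and
 * work is attached, the worker can go straight to execution.
 */
static bool worker_woke_with_work(const struct worker *w)
{
	return !(w->flags & WORKER_IDLE) && w->assigned != NULL;
}
```

In this model, a localized item leaves the chosen worker non-idle with wake_cpu set to the issuing CPU, while a non-localized item leaves the worker untouched, so the usual assign-after-wake path would apply.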