cpu: Stop newly offlined CPU from using RCU readers

RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
(They -can- use SRCU, but not RCU.)  This means that any use of RCU
during or after the call to arch_cpu_idle_dead() is unsafe.
Unfortunately, commit 2ed53c0d6cc99 added a complete() call, which
will contain RCU read-side critical sections if there is a task waiting
to be awakened.

Which, as it turns out, there almost never is.  In my qemu/KVM testing,
the to-be-awakened task is not yet asleep more than 99.5% of the time.
In current mainline, failure is even harder to reproduce, requiring a
virtualized environment that delays the outgoing CPU by at least three
jiffies between the time it exits its stop_machine() task at CPU_DYING
time and the time it calls arch_cpu_idle_dead() from the idle loop.

This suggests moving back to the polling loop, but using a one-jiffy wait
instead of the old 100-millisecond wait.  Most of the time, the loop
will exit without waiting at all, and almost all of the remaining cases
will wait only one jiffy.  Of course, if this proves to be a problem,
it would be easy to make the first few passes through the loop wait only
(say) ten microseconds.
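
For illustration only, the shape of such a polling loop can be sketched
in user-space C.  This is an analogy, not the kernel change itself: the
names cpu_dead_flag, report_cpu_dead() and wait_for_cpu_dead() are
hypothetical, and nanosleep() merely stands in for a one-jiffy sleep.
The point is that the outgoing side only stores to a flag and never
performs a wakeup, so it has no need for RCU read-side critical sections.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a per-CPU "I am dead" indication. */
static atomic_bool cpu_dead_flag;

/* Outgoing side: a plain store, with no wakeup and so no RCU readers. */
static void report_cpu_dead(void)
{
	atomic_store_explicit(&cpu_dead_flag, true, memory_order_release);
}

/*
 * Surviving side: poll the flag, sleeping sleep_ns between checks.
 * Returns the number of sleeps taken before the flag was seen set
 * (0 means no waiting at all), or -1 if it was still clear after
 * max_polls sleeps.
 */
static int wait_for_cpu_dead(int max_polls, long sleep_ns)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };
	int sleeps;

	assert(max_polls > 0 && sleep_ns >= 0 && sleep_ns < 1000000000L);

	for (sleeps = 0; sleeps < max_polls; sleeps++) {
		if (atomic_load_explicit(&cpu_dead_flag, memory_order_acquire))
			return sleeps;
		nanosleep(&ts, NULL);	/* stand-in for a one-jiffy wait */
	}

	/* One last check after the final sleep. */
	if (atomic_load_explicit(&cpu_dead_flag, memory_order_acquire))
		return max_polls;
	return -1;
}
```

In this sketch the common case, where the flag is already set by the
time the waiter looks, returns 0 without sleeping, which mirrors the
observation above that the loop usually exits without waiting at all.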

This commit therefore reverts back to a polling loop, but with a one-jiffy
wait instead of the old 100-millisecond wait.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
1 file changed