smp: Make CSD lock acquisition atomic for debug mode

Commit b0473dcd4b1d ("smp: Improve smp_call_function_single()
CSD-lock diagnostics") changed smp_call_function_single() so that,
when CSD lock debugging is enabled, async !wait calls use the
destination CPU's csd_data. That improves diagnostics, but it also
removes the single-writer property that made the old csd_lock() safe:
multiple CPUs can now prepare the same destination CPU's CSD concurrently.

csd_lock() currently waits for CSD_FLAG_LOCK to clear and then sets the
bit with a non-atomic read-modify-write. Two senders can both see an
unlocked CSD, set the bit, overwrite the callback fields, and enqueue
the same llist node. Re-adding a node that is already the queue head can
make node->next point to itself, leaving the target CPU stuck walking
call_single_queue. Later synchronous work, such as a TLB shootdown, can
then remain queued and trigger soft-lockup warnings or panics.
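
The self-loop can be reproduced with a simplified, single-threaded
userspace model of a list push. This is only an illustrative sketch:
struct node, struct list, push() and walk_terminates() are hypothetical
names, not the kernel llist code, and the double push stands in for two
senders enqueueing the same CSD node. Pushing a node that is already the
head makes node->next point to the node itself, so a walk never reaches
NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an llist node and head. */
struct node { struct node *next; };
struct list { struct node *first; };

/* Push n at the head, as a lock-less list add does once its cmpxchg succeeds. */
static void push(struct list *l, struct node *n)
{
	n->next = l->first;	/* if n is already the head, n->next becomes n */
	l->first = n;
}

/* Walk at most max_steps links; return true only if the walk reaches NULL. */
static bool walk_terminates(const struct list *l, size_t max_steps)
{
	const struct node *n = l->first;

	for (size_t i = 0; i < max_steps && n; i++)
		n = n->next;
	return n == NULL;
}
```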

Keep the single csd_lock() implementation, but when CSD lock debugging is
enabled, acquire CSD_FLAG_LOCK with try_cmpxchg_acquire(). This makes the
destination CPU's CSD a real atomic lock in the only configuration where
it can be shared by multiple remote senders, while preserving the existing
non-debug fast path.
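
As a rough userspace illustration of the acquisition pattern, and not the
code in this patch, the sketch below uses C11 atomics in place of the
kernel's try_cmpxchg_acquire(). struct demo_csd, DEMO_FLAG_LOCK and the
demo_csd_*() helpers are hypothetical names. The point is that testing the
lock bit and setting it happen in one atomic compare-and-exchange, so only
one of several concurrent senders can win.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_FLAG_LOCK 0x01u	/* hypothetical stand-in for CSD_FLAG_LOCK */

struct demo_csd { atomic_uint flags; };

/* One acquisition attempt: succeeds only if the bit was clear and we set it. */
static bool demo_csd_trylock(struct demo_csd *csd)
{
	unsigned int old = atomic_load_explicit(&csd->flags, memory_order_relaxed);

	if (old & DEMO_FLAG_LOCK)
		return false;	/* already held by another sender */
	return atomic_compare_exchange_strong_explicit(&csd->flags, &old,
						       old | DEMO_FLAG_LOCK,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Spin until the lock bit has been acquired atomically. */
static void demo_csd_lock(struct demo_csd *csd)
{
	while (!demo_csd_trylock(csd))
		;	/* busy-wait; a real implementation would relax the CPU */
}

/* Clear the lock bit with release ordering. */
static void demo_csd_unlock(struct demo_csd *csd)
{
	atomic_fetch_and_explicit(&csd->flags, ~DEMO_FLAG_LOCK,
				  memory_order_release);
}
```

With a plain load followed by a separate store, two callers could both
pass the "bit is clear" check; with the compare-and-exchange, the second
caller's exchange fails because the flags word no longer matches the
value it read.
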
Fixes: b0473dcd4b1d ("smp: Improve smp_call_function_single() CSD-lock diagnostics")
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
1 file changed