refs/heads/frederic.2023.12.07a - linux/kernel/git/paulmck/linux-rcu

commit	826b41c02aeda516a69f97448ddac0c9df9b199a
author	Frederic Weisbecker <frederic@kernel.org>	Thu Dec 07 15:23:08 2023 +0100
committer	Paul E. McKenney <paulmck@kernel.org>	Thu Dec 07 14:16:25 2023 -0800
tree	1c26213bfcd12529be38600ce01531a15e2e9cb3
parent	2cc14f52aeb78ce3f29677c2de1f06c0e91471ab

RCU :)

On Wed, Dec 06, 2023 at 08:38:00PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 06, 2023 at 11:45:18AM +0100, Thomas Gleixner wrote:
> > On Tue, Dec 05 2023 at 20:46, Paul E. McKenney wrote:
> > > On Tue, Dec 05, 2023 at 06:43:37AM -0800, Paul E. McKenney wrote:
> > >> On Tue, Dec 05, 2023 at 02:38:39PM +0100, Frederic Weisbecker wrote:
> > >> > On Mon, Dec 04, 2023 at 09:54:25AM -0800, Paul E. McKenney wrote:
> > >> > > And bisection fingers this bad boy:
> > >> > >
> > >> > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > >> > >
> > >> > > Double checking...
> > >> >
> > >> > If so I can't figure out why that would cause a problem, I don't see right now
> > >> > how hrtimer interact specifically with rcu boost. Or does that trigger with
> > >> > other scenarios?
> > >>
> > >> It is entirely possible that I have a false bisection.  Although running
> > >> v6.7-rc3 with that commit reverted did complete overnight with no
> > >> failures.  Restarted it to get a second opinion.
> > >
> > > And it passed the second set of tests as well.  This is 2400 hours of
> > > TREE03 with the commit reverted.  In contrast, without the revert,
> > > I would get multiple failures in 1000 hours of TREE03.
> >
> > That's odd. Let me stare at that...
>
> Much appreciated!  Please let me know if there is any debug that would
> be helpful for me to add.

Here is what I could find so far:

[  565.732537]   <idle>-0         1d..7. 470811850us : enqueue_hrtimer: enqueue timer: 00000000672d54bf func: sched_rt_period_timer cpu: 1
[  565.757270]   <idle>-0         1d..7. 470811929us : <stack trace>
[  565.757270]  => __ftrace_trace_stack
[  565.757270]  => enqueue_hrtimer
[  565.757270]  => hrtimer_start_range_ns
[  565.757270]  => enqueue_task_rt
[  565.757270]  => activate_task
[  565.757270]  => ttwu_do_activate.isra.138
[  565.757270]  => try_to_wake_up
[  565.757270]  => swake_up_locked.part.50
[  565.757270]  => swake_up_one
[  565.757270]  => rcutree_report_cpu_dead
[  565.757270]  => cpuhp_report_idle_dead
[  565.757270]  => do_idle
[  565.757270]  => cpu_startup_entry
[  565.757270]  => start_secondary
[  565.757270]  => secondary_startup_64_no_verify

This timer is enqueued _after_ hrtimers have been migrated.
The task being awakened here is likely rcu_exp_gp_kworker:

 rcutree_report_cpu_dead()
    rcu_preempt_deferred_qs()
       rcu_report_exp_rnp()
          swake_up_one(&rcu_state.expedited_wq);

I'm not sure about the role played by this sched rt timer, but if its being
retained on the dead CPU prevents the wakeup from happening, that could be the
cause of the issue. But does that mean we are enqueueing a task on a dead rt
runqueue?
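
The ordering problem described above can be illustrated with a small userspace
model. This is only a sketch and not kernel code: struct model_cpu_base,
model_enqueue(), model_migrate(), model_late_enqueue_demo() and the
"example_timer" name are all hypothetical, and the "online" flag merely stands
in for whatever state the real hrtimer code would consult. It shows a timer
being enqueued on a CPU after its pending timers were already migrated away,
and a check that flags such a late enqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_TIMERS 8

/* Hypothetical userspace stand-in for a per-CPU hrtimer base. */
struct model_cpu_base {
	int cpu;
	bool online;			/* cleared once timers were migrated away */
	int nr_timers;
	const char *timers[MAX_TIMERS];	/* handler names of queued timers */
	int offline_enqueues;		/* number of times the check fired */
};

/* Queue a timer; flag (but still perform) an enqueue on an offline base. */
static bool model_enqueue(struct model_cpu_base *base, const char *func)
{
	bool ok = base->online;

	if (!ok) {
		base->offline_enqueues++;
		fprintf(stderr, "WARN: %s enqueued on offline cpu %d\n",
			func, base->cpu);
	}
	assert(base->nr_timers < MAX_TIMERS);
	base->timers[base->nr_timers++] = func;
	return ok;
}

/* Move every pending timer to a surviving base and mark the source offline. */
static void model_migrate(struct model_cpu_base *from,
			  struct model_cpu_base *to)
{
	for (int i = 0; i < from->nr_timers; i++) {
		assert(to->nr_timers < MAX_TIMERS);
		to->timers[to->nr_timers++] = from->timers[i];
	}
	from->nr_timers = 0;
	from->online = false;
}

/* Replay the sequence from the trace; returns how often the check fired. */
int model_late_enqueue_demo(void)
{
	struct model_cpu_base cpu0 = { .cpu = 0, .online = true };
	struct model_cpu_base cpu1 = { .cpu = 1, .online = true };

	model_enqueue(&cpu1, "example_timer");		/* normal enqueue */
	model_migrate(&cpu1, &cpu0);			/* CPU 1 is going away */
	model_enqueue(&cpu1, "sched_rt_period_timer");	/* late wakeup path */

	/* The late timer is stranded on cpu1; nothing will migrate it now. */
	return cpu1.offline_enqueues;
}
```

Calling model_late_enqueue_demo() from a main() prints one warning for
sched_rt_period_timer on cpu 1, which corresponds to the enqueue seen in the
trace after migration had already happened.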

In any case we should have some debugging to warn about similar cases, like:

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2 files changed
