RCU :)

On Wed, Dec 06, 2023 at 08:38:00PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 06, 2023 at 11:45:18AM +0100, Thomas Gleixner wrote:
> > On Tue, Dec 05 2023 at 20:46, Paul E. McKenney wrote:
> > > On Tue, Dec 05, 2023 at 06:43:37AM -0800, Paul E. McKenney wrote:
> > >> On Tue, Dec 05, 2023 at 02:38:39PM +0100, Frederic Weisbecker wrote:
> > >> > On Mon, Dec 04, 2023 at 09:54:25AM -0800, Paul E. McKenney wrote:
> > >> > > And bisection fingers this bad boy:
> > >> > >
> > >> > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > >> > >
> > >> > > Double checking...
> > >> >
> > >> > If so I can't figure out why that would cause a problem, I don't see right now
> > >> > how hrtimer interact specifically with rcu boost. Or does that trigger with
> > >> > other scenarios?
> > >>
> > >> It is entirely possible that I have a false bisection.  Although running
> > >> v6.7-rc3 with that commit reverted did complete overnight with no
> > >> failures.  Restarted it to get a second opinion.
> > >
> > > And it passed the second set of tests as well.  This is 2400 hours of
> > > TREE03 with the commit reverted.  In contrast, without the revert,
> > > I would get multiple failures in 1000 hours of TREE03.
> >
> > That's odd. Let me stare at that...
>
> Much appreciated!  Please let me know if there is any debug that would
> be helpful for me to add.

Here is what I could find so far:

[  565.732537]   <idle>-0         1d..7. 470811850us : enqueue_hrtimer: enqueue timer: 00000000672d54bf func: sched_rt_period_timer cpu: 1
[  565.757270]   <idle>-0         1d..7. 470811929us : <stack trace>
[  565.757270]  => __ftrace_trace_stack
[  565.757270]  => enqueue_hrtimer
[  565.757270]  => hrtimer_start_range_ns
[  565.757270]  => enqueue_task_rt
[  565.757270]  => activate_task
[  565.757270]  => ttwu_do_activate.isra.138
[  565.757270]  => try_to_wake_up
[  565.757270]  => swake_up_locked.part.50
[  565.757270]  => swake_up_one
[  565.757270]  => rcutree_report_cpu_dead
[  565.757270]  => cpuhp_report_idle_dead
[  565.757270]  => do_idle
[  565.757270]  => cpu_startup_entry
[  565.757270]  => start_secondary
[  565.757270]  => secondary_startup_64_no_verify

This timer is enqueued _after_ the hrtimers have been migrated away from
the outgoing CPU. The task being woken up here is likely rcu_exp_gp_kworker:

 rcutree_report_cpu_dead()
    rcu_preempt_deferred_qs()
       rcu_report_exp_rnp()
          swake_up_one(&rcu_state.expedited_wq);

I'm not sure what role this sched rt timer plays, but if the fact that it
stays queued on the outgoing CPU prevents the wake-up from happening, that
could be the cause of the issue. But does that mean we are enqueueing a task
on a dead rt runqueue?

In any case we should have some debugging to warn about similar cases, like:
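
As an illustration of the kind of check meant here, below is a minimal
user-space sketch, not kernel code and not the actual patch. It assumes a
per-CPU timer base that carries an "online" flag which is cleared when the
timers are migrated away, so that any later enqueue on that CPU can be
reported. Every name in it (struct timer_base, model_enqueue_timer(),
model_migrate_timers(), NR_CPUS, MAX_TIMERS) is hypothetical and only models
the ordering seen in the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS    2	/* hypothetical: tiny model machine */
#define MAX_TIMERS 8	/* hypothetical: per-CPU queue capacity */

/* Hypothetical stand-in for a per-CPU hrtimer base. */
struct timer_base {
	bool online;		/* cleared once timers were migrated away */
	int  nr_queued;		/* timers currently queued on this CPU */
	int  queued[MAX_TIMERS];	/* timer ids, in enqueue order */
	int  nr_late;		/* enqueues seen after migration */
};

static struct timer_base bases[NR_CPUS] = {
	{ .online = true },
	{ .online = true },
};

/*
 * Queue a timer on @cpu. The timer is queued in both cases, but if the
 * base already went offline a warning is printed and false is returned,
 * because nothing will expire that timer anymore.
 */
static bool model_enqueue_timer(int cpu, int timer_id)
{
	struct timer_base *base;

	assert(cpu >= 0 && cpu < NR_CPUS);
	base = &bases[cpu];
	assert(base->nr_queued < MAX_TIMERS);

	base->queued[base->nr_queued++] = timer_id;

	if (!base->online) {
		base->nr_late++;
		fprintf(stderr, "WARNING: timer %d enqueued on offline CPU %d\n",
			timer_id, cpu);
		return false;
	}
	return true;
}

/* Move every queued timer from @from to @to, then mark @from offline. */
static void model_migrate_timers(int from, int to)
{
	struct timer_base *src, *dst;
	int i;

	assert(from >= 0 && from < NR_CPUS);
	assert(to >= 0 && to < NR_CPUS);
	assert(from != to);
	src = &bases[from];
	dst = &bases[to];

	for (i = 0; i < src->nr_queued; i++) {
		assert(dst->nr_queued < MAX_TIMERS);
		dst->queued[dst->nr_queued++] = src->queued[i];
	}
	src->nr_queued = 0;
	src->online = false;
}
```

In this model, calling model_migrate_timers(1, 0) and then
model_enqueue_timer(1, ...) reproduces the sequence from the trace: the
enqueue still happens, but it is flagged because CPU 1 no longer has anyone
to expire the timer.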

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 files changed