Clocksource watchdog commits for v6.3

This pull request contains the following:

o	Improvements to clocksource-watchdog console messages.

o	Loosening of the clocksource-watchdog skew criteria to match
	those of NTP (500 parts per million, relaxed from 400 parts
	per million).  If it is good enough for NTP, it is good enough
	for the clocksource watchdog.

o	Suspend clocksource-watchdog checking temporarily when high
	memory latencies are detected.  This avoids the false-positive
	clock-skew events that have been seen on production systems
	running memory-intensive workloads.

o	On systems where the TSC is deemed trustworthy, use it as the
	watchdog timesource.  This permits clock-skew events to be
	detected without forcing workloads to fall back to the slow
	HPET and ACPI PM timers.  On the one hand, these two timers
	are slow enough that falling back to them causes systems to be
	needlessly marked bad.  On the other hand, real skew does
	sometimes happen on production systems running production
	workloads, and sometimes it is the fault of the TSC, or at
	least of the firmware that told the kernel to program the TSC
	with the wrong frequency.

o	Add a tsc=recalibrate kernel boot parameter to allow the kernel
	to diagnose cases where the TSC hardware works fine, but was told
	by firmware to tick at the wrong frequency.  Such cases are rare,
	but they really have happened on production systems.

x86/tsc: Add option to force frequency recalibration with HW timer

The kernel assumes that the TSC frequency provided by the hardware /
firmware via MSRs or CPUID(0x15) is correct after applying a few basic
consistency checks, and therefore skips the TSC recalibration against
the HPET or PM timer.

As a result, there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected. In one case, a user
measured the TSC frequency against an atomic clock and reported an
inaccuracy, which was later fixed in firmware.

Add a 'recalibrate' option to the 'tsc' kernel parameter to force TSC
frequency recalibration against the HPET or PM timer, and warn if the
result deviates from the previous value by more than about 500 PPM.
This provides a way to verify the data from hardware / firmware.
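
The "about 500 PPM" deviation check described above can be sketched in
user-space C as follows. This is a minimal illustration, not the patch
itself: the helper name freq_deviation_too_large() and its parameters are
hypothetical, and using a right shift by 11 (1/2048, about 488 PPM) as a
cheap integer stand-in for "about 500 PPM" is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: return true if measured_khz differs from ref_khz
 * by more than roughly 500 PPM.  ref_khz >> 11 equals ref_khz / 2048,
 * which is about 488 PPM of ref_khz (an assumed approximation).
 */
static bool freq_deviation_too_large(unsigned long ref_khz,
				     unsigned long measured_khz)
{
	/* Absolute difference without relying on signed arithmetic. */
	unsigned long delta = ref_khz > measured_khz ?
			      ref_khz - measured_khz :
			      measured_khz - ref_khz;

	return delta > (ref_khz >> 11);
}
```

For example, freq_deviation_too_large(1896000, 1975000) returns true,
because the 79000 kHz difference far exceeds 1896000 >> 11 = 925 kHz.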

There is no functional change to the existing workflow.

Recently there was a real-world case: "The 40ms/s divergence between
TSC and HPET was observed on hardware that is quite recent" [1]. On
that platform, the TSC frequency of 1896 MHz was obtained from
CPUID(0x15), while forced recalibration against both HPET and PMTIMER
produced a value of 1975 MHz, which also matched a check from the
'chronyd' software, indicating a BIOS or firmware problem.
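
The figures in this case can be cross-checked with a little arithmetic.
The sketch below is illustrative only: the helper name deviation_ppm() is
hypothetical, and measuring the deviation relative to the recalibrated
value (rather than the CPUID-reported one) is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper: deviation of reported_khz from measured_khz,
 * expressed in parts per million of measured_khz.  Returns 0 if
 * measured_khz is 0, to avoid dividing by zero.
 */
static uint64_t deviation_ppm(uint64_t reported_khz, uint64_t measured_khz)
{
	uint64_t delta;

	if (measured_khz == 0)
		return 0;

	delta = reported_khz > measured_khz ?
		reported_khz - measured_khz :
		measured_khz - reported_khz;

	/* 64-bit math keeps delta * 1000000 from overflowing here. */
	return delta * 1000000ULL / measured_khz;
}
```

With the values above, deviation_ppm(1896000, 1975000) evaluates to
40000 PPM, that is, 40 ms per second, which is consistent with the quoted
divergence and far beyond the roughly 500 PPM warning threshold.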

[Thanks to tglx for helping improve the commit log]
[ paulmck: Wordsmith Kconfig help text. ]

[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <x86@kernel.org>
Cc: <linux-doc@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 files changed