A set of fixes for X86:

 - Ensure that the PIT is set up when the local APIC is disabled or
   configured in legacy mode. This fixes an ordering issue introduced by
   the recent changes which skip PIT initialization when the TSC and APIC
   frequencies are already known.

 - Handle malformed SRAT tables during early ACPI parsing which caused an
   infinite loop and a boot hang.

 - Fix a long standing race in the affinity setting code which affects PCI
   devices with non-maskable MSI interrupts. The problem is caused by the
   non-atomic writes of the MSI address (destination APIC id) and data
   (vector) fields which the device uses to construct the MSI message. The
   non-atomic writes are mandated by PCI.

   If both fields change and the device raises an interrupt after writing
   address and before writing data, then the MSI block constructs an
   inconsistent message which causes interrupts to be lost and subsequent
   malfunction of the device.

   The fix is to redirect the interrupt to the new vector on the current
   CPU first and then switch it over to the new target CPU. This allows an
   interrupt raised in the transitional stage (old CPU, new vector) to be
   observed in the APIC IRR and retriggered on the new target CPU and the
   new vector. The potential spurious interrupts caused by this are
   harmless and can in the worst case expose a buggy driver (all handlers
   have to be able to deal with spurious interrupts as they can and do
   happen for various reasons).

 - Add the missing suspend/resume mechanism for the HYPERV hypercall page.
   Its absence prevents resume from hibernation on HYPERV guests. This
   change got lost before the merge window.

 - Mask the IOAPIC before disabling the local APIC to prevent potentially
   stale IOAPIC remote IRR bits which cause stale interrupt lines after
   resume.
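
The two-step affinity change for non-maskable MSI described above can be
illustrated with a small model. This is a toy sketch, not the kernel
implementation: struct msi_msg, torn_message() and the other helpers are
hypothetical names, and the model assumes the device writes the address
field before the data field and that handlers exist only for the old
target, the new target and the transitional (old CPU, new vector) pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two MSI fields the device reads. */
struct msi_msg {
	int dest_cpu;	/* address field: destination APIC id */
	int vector;	/* data field: interrupt vector */
};

static bool msg_eq(struct msi_msg a, struct msi_msg b)
{
	return a.dest_cpu == b.dest_cpu && a.vector == b.vector;
}

/*
 * Message the device sends when it raises an interrupt after the address
 * write but before the data write: new address, old data.
 */
static struct msi_msg torn_message(struct msi_msg cur, struct msi_msg next)
{
	struct msi_msg torn = { next.dest_cpu, cur.vector };

	return torn;
}

/* Assumption: only these three (CPU, vector) pairs have a handler. */
static bool is_handled(struct msi_msg m, struct msi_msg old,
		       struct msi_msg next)
{
	struct msi_msg transitional = { old.dest_cpu, next.vector };

	return msg_eq(m, old) || msg_eq(m, next) || msg_eq(m, transitional);
}

/* Update address and data in one go: the torn message may hit nothing. */
static bool direct_update_can_lose_irq(struct msi_msg old,
				       struct msi_msg next)
{
	return !is_handled(torn_message(old, next), old, next);
}

/*
 * Step 1 changes only the vector on the old CPU, step 2 changes only the
 * CPU. Each step modifies a single field, so no torn message can occur.
 */
static bool two_step_update_can_lose_irq(struct msi_msg old,
					 struct msi_msg next)
{
	struct msi_msg step1 = { old.dest_cpu, next.vector };

	return !is_handled(torn_message(old, step1), old, next) ||
	       !is_handled(torn_message(step1, next), old, next);
}
```

In this model, moving an interrupt from (CPU 0, vector 0x30) to (CPU 1,
vector 0x41) directly can produce the unhandled message (CPU 1, vector
0x30), while the two-step variant only ever produces messages which one of
the three handlers covers.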

x86/apic: Mask IOAPIC entries when disabling the local APIC

When a system suspends, the local APIC is disabled in the suspend sequence,
but the IOAPIC is left in the current state. This means unmasked interrupt
lines stay unmasked. This is usually the case for IOAPIC pin 9 to which the
ACPI interrupt is connected.

That means that in suspended state the IOAPIC can respond to an external
interrupt, e.g. the wakeup via keyboard/RTC/ACPI, but the interrupt message
cannot be handled by the disabled local APIC. As a consequence the Remote
IRR bit is set, but the local APIC does not send an EOI to acknowledge
it. This causes the affected interrupt line to become stale and the stale
Remote IRR bit will cause a hang when __synchronize_hardirq() is invoked
for that interrupt line.

To prevent this, mask all IOAPIC entries before disabling the local
APIC. The resume code already contains the matching unmask operation.
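
The effect of the ordering can be shown with a simplified model. This is an
illustrative sketch only, not the code of this patch: struct apic_model and
all helper names are hypothetical, and the Remote IRR handling is reduced
to "set on delivery, cleared when an enabled local APIC sends the EOI".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PINS		24
#define ACPI_PIN	9	/* pin the ACPI interrupt is connected to */

/* Hypothetical, heavily simplified IOAPIC plus local APIC state. */
struct apic_model {
	bool masked[NR_PINS];
	bool remote_irr[NR_PINS];
	bool lapic_enabled;
};

static void model_init(struct apic_model *m)
{
	for (size_t pin = 0; pin < NR_PINS; pin++) {
		m->masked[pin] = (pin != ACPI_PIN);	/* only pin 9 unmasked */
		m->remote_irr[pin] = false;
	}
	m->lapic_enabled = true;
}

/* An external event asserts an IOAPIC pin. */
static void raise_irq(struct apic_model *m, size_t pin)
{
	assert(pin < NR_PINS);

	if (m->masked[pin])
		return;			/* masked pins deliver nothing */

	m->remote_irr[pin] = true;	/* message sent, EOI pending */
	if (m->lapic_enabled)
		m->remote_irr[pin] = false;	/* handled and EOI'ed */
}

static void mask_all_pins(struct apic_model *m)
{
	for (size_t pin = 0; pin < NR_PINS; pin++)
		m->masked[pin] = true;
}

/* Suspend path, with or without masking the IOAPIC first. */
static void model_suspend(struct apic_model *m, bool mask_ioapic_first)
{
	if (mask_ioapic_first)
		mask_all_pins(m);
	m->lapic_enabled = false;
}

/* True if a wakeup event on pin 9 leaves a stale Remote IRR bit behind. */
static bool wakeup_leaves_stale_irr(bool mask_ioapic_first)
{
	struct apic_model m;

	model_init(&m);
	model_suspend(&m, mask_ioapic_first);
	raise_irq(&m, ACPI_PIN);

	return m.remote_irr[ACPI_PIN];
}
```

In this model wakeup_leaves_stale_irr(false) reports a stale bit, which
corresponds to the old suspend sequence, while wakeup_leaves_stale_irr(true)
does not.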

[ tglx: Massaged changelog ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579076539-7267-1-git-send-email-TonyWWang-oc@zhaoxin.com

1 file changed