Generic:

- Use memdup_array_user() to harden against overflow.

- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.

- Clean up Kconfigs that all KVM architectures were selecting.

- New functionality around "guest_memfd", a new userspace API that
  creates an anonymous file and returns a file descriptor that refers
  to it.  guest_memfd files are bound to their owning virtual machine,
  cannot be mapped, read, or written by userspace, and cannot be resized.
  guest_memfd files do however support PUNCH_HOLE, which can be used to
  switch a memory area between guest_memfd and regular anonymous memory.

- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
  per-page attributes for a given page of guest memory; right now the
  only attribute is whether the guest expects to access memory via
  guest_memfd or not, which in confidential VMs backed by SEV-SNP,
  TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
  confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
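
As a rough illustration of the overflow hardening in the first item of
this list, the sketch below models an overflow-checked array
duplication in plain userspace C.  dup_array() and its err out-parameter
are hypothetical stand-ins, not the kernel helper: memdup_array_user()
copies from a userspace pointer inside the kernel, whereas this model
copies from ordinary memory.  The only point it tries to show is that
the element count and element size are multiplied with an explicit
overflow check before anything is allocated.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model of an overflow-checked array duplication.
 * dup_array() is a hypothetical name used for illustration only.
 * On success it returns a heap copy of n elements of the given size
 * and sets *err to 0; on failure it returns NULL and sets *err.
 */
static void *dup_array(const void *src, size_t n, size_t size, int *err)
{
	size_t bytes;
	void *dst;

	/* Reject n * size if the product does not fit in size_t. */
	if (size != 0 && n > SIZE_MAX / size) {
		*err = EOVERFLOW;
		return NULL;
	}
	bytes = n * size;

	/* malloc(0) may return NULL, so always request at least one byte. */
	dst = malloc(bytes ? bytes : 1);
	if (!dst) {
		*err = ENOMEM;
		return NULL;
	}

	memcpy(dst, src, bytes);
	*err = 0;
	return dst;
}
```

A caller that previously computed n * size by hand and passed the
product to a plain duplicate helper would instead pass n and size
separately, so a wrapped product can no longer yield an undersized
buffer.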

x86:

- Support for "software-protected VMs" that can use the new guest_memfd
  and page attributes infrastructure.  This is mostly useful for testing,
  since there is no pKVM-like infrastructure to provide a meaningfully
  reduced TCB.

- Fix a relatively benign off-by-one error when splitting huge pages during
  CLEAR_DIRTY_LOG.

- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
  TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.

- Use more generic lockdep assertions in paths that don't actually care
  about whether the caller is a reader or a writer.

- Let Xen guests opt out of having PV clock reported as "based on a stable TSC",
  because some of them don't expect the "TSC stable" bit (added to the pvclock
  ABI by KVM, but never set by Xen) to be set.

- Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.

- Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
  flushes on nested transitions, i.e. always satisfies flush requests.  This
  allows running bleeding edge versions of VMware Workstation on top of KVM.

- Sanity check that the CPU supports flush-by-ASID when enabling SEV support.

- On AMD machines with vNMI, always rely on hardware instead of intercepting
  IRET in some cases to detect unmasking of NMIs.

- Support for virtualizing Linear Address Masking (LAM).

- Fix a variety of vPMU bugs where KVM fails to stop/reset counters and other
  state prior to refreshing the vPMU model.

- Fix a double-overflow PMU bug by tracking emulated counter events using a
  dedicated field instead of snapshotting the "previous" counter.  If the
  hardware PMC count triggers overflow that is recognized in the same VM-Exit
  that KVM manually bumps an event count, KVM would pend PMIs for both the
  hardware-triggered overflow and for KVM-triggered overflow.

- Turn off KVM_WERROR by default for all configs so that it's not
  inadvertently enabled by non-KVM developers, which can be problematic for
  subsystems that require no regressions for W=1 builds.

- Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
  "features".

- Don't force a masterclock update when a vCPU synchronizes to the current TSC
  generation, as updating the masterclock can cause kvmclock's time to "jump"
  unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.

- Use RIP-relative addressing to read kvm_rebooting in the VM-Enter fault paths,
  partly as a super minor optimization, but mostly to make KVM play nice with
  position independent executable builds.

- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
  CONFIG_HYPERV as a minor optimization, and to self-document the code.

- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
  at build time.

ARM64:

- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
  base granule sizes. Branch shared with the arm64 tree.

- Large Fine-Grained Trap rework, bringing some sanity to the
  feature, although there is more to come. This comes with
  a prefix branch shared with the arm64 tree.

- Some additional Nested Virtualization groundwork, mostly
  introducing the NV2 VNCR support and retargeting the NV
  support to that version of the architecture.

- A small set of vgic fixes and associated cleanups.

LoongArch:

- Optimization for memslot hugepage checking

- Cleanup and fix some HW/SW timer issues

- Add LSX/LASX (128bit/256bit SIMD) support

RISC-V:

- KVM_GET_REG_LIST improvement for vector registers

- Generate ISA extension reg_list using macros in get-reg-list selftest

- Support for reporting steal time along with selftest

s390:

- Bugfixes

Selftests:

- Fix an annoying goof where the NX hugepage test prints out garbage
  instead of the magic token needed to run the test.

- Fix build errors when a header is deleted/moved due to a missing flag
  in the Makefile.

- Detect if KVM bugged/killed a selftest's VM and print out a helpful
  message instead of complaining that a random ioctl() failed.

- Annotate the guest printf/assert helpers with __printf(), and fix the
  various bugs that were lurking due to lack of said annotation.

There are two non-KVM patches buried in the middle of guest_memfd support:

  fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
  mm: Add AS_UNMOVABLE to mark mapping as completely unmovable

The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).

x86/kvm: Do not try to disable kvmclock if it was not enabled

kvm_guest_cpu_offline() tries to disable kvmclock regardless of whether
it is present in the VM. This leads to a write to an MSR that doesn't
exist on some configurations, namely in a TDX guest:

	unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
	at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)

kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.

Do not disable kvmclock if it was not enabled.
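
The intended behaviour can be sketched as a small userspace model.
This is an illustration under stated assumptions, not the kernel
change itself: fake_wrmsr(), the has_clocksource* flags and the
model_* functions are hypothetical stand-ins for the real MSR write
and paravirt feature checks.  The model records which system-time MSR
was selected when kvmclock was enabled, treats zero as "never
enabled", and skips the write on disable in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* System-time MSR indices from the KVM paravirt ABI. */
#define MSR_KVM_SYSTEM_TIME      0x12
#define MSR_KVM_SYSTEM_TIME_NEW  0x4b564d01

/* Hypothetical stand-ins for the guest's paravirt feature bits. */
static bool has_clocksource;
static bool has_clocksource2;

/* Hypothetical stand-in for an MSR write; it only records the call. */
static unsigned int wrmsr_calls;
static uint32_t last_wrmsr;

static void fake_wrmsr(uint32_t msr, uint64_t val)
{
	(void)val;
	wrmsr_calls++;
	last_wrmsr = msr;
}

/* Zero means kvmclock was never enabled in this model. */
static uint32_t system_time_msr;

/* Pick the MSR only if one of the two clocksource features exists. */
static void model_kvmclock_init(void)
{
	if (has_clocksource2)
		system_time_msr = MSR_KVM_SYSTEM_TIME_NEW;
	else if (has_clocksource)
		system_time_msr = MSR_KVM_SYSTEM_TIME;
	else
		system_time_msr = 0;
}

/* Write the MSR on disable only if kvmclock was enabled earlier. */
static void model_kvmclock_disable(void)
{
	if (system_time_msr)
		fake_wrmsr(system_time_msr, 0);
}
```

With neither feature present, as in the TDX guest case described
above, model_kvmclock_disable() performs no write at all.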

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Message-Id: <20231205004510.27164-6-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1 file changed