Generic:
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
switch a memory area between guest_memfd and regular anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new guest_memfd
and page attributes infrastructure. This is mostly useful for testing,
since there is no pKVM-like infrastructure to provide a meaningfully
reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually care
about whether the caller is a reader or a writer.
- let Xen guests opt out of having PV clock reported as "based on a stable TSC",
because some of them don't expect the "TSC stable" bit (added to the pvclock
ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
flushes on nested transitions, i.e. always satisfies flush requests. This
allows running bleeding edge versions of VMware Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
- On AMD machines with vNMI, always rely on hardware instead of intercepting
IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events using a
dedicated field instead of snapshotting the "previous" counter. If the
hardware PMC count triggers overflow that is recognized in the same VM-Exit
that KVM manually bumps an event count, KVM would pend PMIs for both the
hardware-triggered overflow and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertantly enabled by non-KVM developers, which can be problematic for
subsystems that require no regressions for W=1 builds.
- Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
"features".
- Don't force a masterclock update when a vCPU synchronizes to the current TSC
generation, as updating the masterclock can cause kvmclock's time to "jump"
unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
partly as a super minor optimization, but mostly to make KVM play nice with
position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is delete/moved due to a missing flag
in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix the
various bugs that were lurking due to lack of said annotation.
There are two non-KVM patches buried in the middle of guest_memfd support:
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
x86/kvm: Do not try to disable kvmclock if it was not enabled
kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
present in the VM. It leads to write to a MSR that doesn't exist on some
configurations, namely in TDX guest:
unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)
kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.
Do not disable kvmclock if it was not enabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Message-Id: <20231205004510.27164-6-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1 file changed