  3. May 24, 2019
    • kvm: Check irqchip mode before assign irqfd · 654f1f13
      Peter Xu authored
      
      When assigning a kvm irqfd we didn't check the irqchip mode, so
      KVM_IRQFD could succeed in all irqchip modes.  However it does not
      make much sense to create an irqfd without a kernel irqchip.  Let's
      provide an arch-dependent helper to check whether a specific irqfd
      is allowed by the arch.  At least for x86, it makes sense to check:
      
      - when irqchip mode is NONE, all irqfds should be disallowed, and,
      
      - when irqchip mode is SPLIT, irqfds that are with resamplefd should
        be disallowed.
      
      In either case, we previously ignored the irq or the irq ack event
      silently if the irqchip mode was incorrect.  However that can cause
      mysterious guest behaviors which are hard to triage.  Let's fail
      KVM_IRQFD even earlier to detect these incorrect configurations.
      
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Radim Krčmář <rkrcmar@redhat.com>
      CC: Alex Williamson <alex.williamson@redhat.com>
      CC: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
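The x86 policy described above can be sketched as a small userspace model. The enum values and function name below are illustrative, not the kernel's actual code:

```c
#include <stdbool.h>

/* Toy model of the x86 irqfd policy described above; the enum and
 * function names are illustrative, not the kernel's. */
enum irqchip_mode { IRQCHIP_NONE, IRQCHIP_KERNEL, IRQCHIP_SPLIT };

static bool irqfd_allowed(enum irqchip_mode mode, bool has_resamplefd)
{
    if (mode == IRQCHIP_NONE)
        return false;   /* no kernel chip at all: disallow every irqfd */
    if (mode == IRQCHIP_SPLIT && has_resamplefd)
        return false;   /* resamplefd needs the in-kernel chip */
    return true;
}
```

With such a check in place, a misconfigured KVM_IRQFD fails at setup time instead of silently dropping interrupts later.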
    • kvm: fix compilation on s390 · d30b214d
      Paolo Bonzini authored
      
      s390 does not have memremap, even though in this particular case it
      would be useful.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Fix spinlock taken warning during host resume · 2eb06c30
      Wanpeng Li authored
      
       WARNING: CPU: 0 PID: 13554 at kvm/arch/x86/kvm//../../../virt/kvm/kvm_main.c:4183 kvm_resume+0x3c/0x40 [kvm]
        CPU: 0 PID: 13554 Comm: step_after_susp Tainted: G           OE     5.1.0-rc4+ #1
        RIP: 0010:kvm_resume+0x3c/0x40 [kvm]
        Call Trace:
         syscore_resume+0x63/0x2d0
         suspend_devices_and_enter+0x9d1/0xa40
         pm_suspend+0x33a/0x3b0
         state_store+0x82/0xf0
         kobj_attr_store+0x12/0x20
         sysfs_kf_write+0x4b/0x60
         kernfs_fop_write+0x120/0x1a0
         __vfs_write+0x1b/0x40
         vfs_write+0xcd/0x1d0
         ksys_write+0x5f/0xe0
         __x64_sys_write+0x1a/0x20
         do_syscall_64+0x6f/0x6c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Commit ca84d1a2 ("KVM: x86: Add clock sync request to hardware enable")
      mentioned that "we always hold kvm_lock when hardware_enable is called.
      The one place that doesn't need to worry about it is resume, as resuming
      a frozen CPU, the spinlock won't be taken."  However, commit 6706dae9
      ("virt/kvm: Replace spin_is_locked() with lockdep") introduced a bug:
      it asserts that the lock is held, which is contrary to the original
      goal.
      
      This patch fixes it with a WARN_ON that fires when the lock is
      unexpectedly held during resume.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Fixes: 6706dae9 ("virt/kvm: Replace spin_is_locked() with lockdep")
      [Wrap with #ifdef CONFIG_LOCKDEP - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
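The inverted assertion can be modeled in userspace; here a plain flag stands in for lockdep_is_held(), and all names are illustrative:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the fixed check: resume must run with the lock NOT held,
 * so we warn when it IS held (the buggy conversion asserted the
 * opposite).  A bool stands in for lockdep_is_held(). */
static bool kvm_count_lock_held;

static int resume_lock_check(void)
{
    if (kvm_count_lock_held) {
        fprintf(stderr, "WARNING: lock unexpectedly held in resume\n");
        return 1;           /* WARN_ON fired */
    }
    return 0;               /* expected path on resume */
}
```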
    • KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation · 623e1528
      James Morse authored
      
      KVM has helpers to handle the condition codes of trapped aarch32
      instructions. These are marked __hyp_text and used from HYP, but they
      aren't built by the 'hyp' Makefile, which has all the runes to avoid ASAN
      and KCOV instrumentation.
      
      Move this code to a new hyp/aarch32.c to avoid a hyp-panic when starting
      an aarch32 guest on a host built with the ASAN/KCOV debug options.
      
      Fixes: 021234ef ("KVM: arm64: Make kvm_condition_valid32() accessible from EL2")
      Fixes: 8cebe750 ("arm64: KVM: Make kvm_skip_instr32 available to HYP")
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  5. May 17, 2019
    • kvm: fix compilation on aarch64 · c011d23b
      Paolo Bonzini authored
      
      Commit e45adf66 ("KVM: Introduce a new guest mapping API", 2019-01-31)
      introduced a build failure on aarch64 defconfig:
      
      $ make -j$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=out defconfig \
                      Image.gz
      ...
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
          In function '__kvm_map_gfn':
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:9: error:
          implicit declaration of function 'memremap'; did you mean 'memset_p'?
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:46: error:
          'MEMREMAP_WB' undeclared (first use in this function)
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:
          In function 'kvm_vcpu_unmap':
      ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1795:3: error:
          implicit declaration of function 'memunmap'; did you mean 'vm_munmap'?
      
      because these functions are declared in <linux/io.h> rather than <asm/io.h>,
      and the former was being pulled in already on x86 but not on aarch64.
      
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. Apr 30, 2019
    • KVM: Introduce a new guest mapping API · e45adf66
      KarimAllah Ahmed authored
      
      In KVM, especially for nested guests, there is a dominant pattern of:
      
      	=> map guest memory -> do_something -> unmap guest memory
      
      Besides the boilerplate noise this adds to the code, most of the time
      the mapping function does not properly handle memory that is not
      backed by "struct page".  This new guest mapping API encapsulates
      most of that boilerplate and also handles guest memory that is not
      backed by "struct page".
      
      The current implementation of this API uses memremap for memory that
      is not backed by a "struct page", which would lead to a huge slow-down
      if it were used for high-frequency mapping operations.  The API has no
      effect on current setups where guest memory is backed by a
      "struct page".  Further patches will also introduce a pfn-cache which
      should significantly improve the performance of the memremap case.
      
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
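The map -> do_something -> unmap pattern the API encapsulates can be sketched in userspace. The struct and function names below are illustrative (the commit itself adds kvm_vcpu_map()/kvm_vcpu_unmap() operating on a struct kvm_host_map), and plain heap memory stands in for guest pages:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative model of the mapping pattern described above. */
struct host_map {
    void *hva;            /* host virtual address of the mapped memory */
    bool  backed_by_page; /* selects kmap()/kunmap() vs memremap()/memunmap() */
};

static bool guest_map(struct host_map *map, size_t len, bool has_struct_page)
{
    /* The kernel picks kmap() or memremap() here; malloc stands in. */
    map->hva = malloc(len);
    map->backed_by_page = has_struct_page;
    return map->hva != NULL;
}

static void guest_unmap(struct host_map *map)
{
    free(map->hva);       /* kunmap() or memunmap() in the kernel */
    map->hva = NULL;
}
```

Callers then do their work between the two calls, with the backing-type decision hidden inside the helpers.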
    • kvm_main: fix some comments · b8b00220
      Jiang Biao authored
      
      is_dirty has been renamed to flush, but the comment for it is
      outdated.  The description of the @flush parameter for
      kvm_clear_dirty_log_protect() is also missing; add it in this
      patch as well.
      
      Signed-off-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size · 65c4189d
      Paolo Bonzini authored
      
      If a memory slot's size is not a multiple of 64 pages (256K), then
      the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
      either requires the requested page range to go beyond memslot->npages,
      or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
      requires log->num_pages to be both in range and aligned.
      
      To allow this case, allow log->num_pages not to be a multiple of 64 if
      it ends exactly on the last page of the slot.
      
      Reported-by: Peter Xu <peterx@redhat.com>
      Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size · 76d58e0f
      Paolo Bonzini authored
      
      If a memory slot's size is not a multiple of 64 pages (256K), then
      the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
      either requires the requested page range to go beyond memslot->npages,
      or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
      requires log->num_pages to be both in range and aligned.
      
      To allow this case, allow log->num_pages not to be a multiple of 64 if
      it ends exactly on the last page of the slot.
      
      Reported-by: Peter Xu <peterx@redhat.com>
      Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
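The relaxed validation rule can be expressed as plain arithmetic. The function below is an illustrative userspace model of the checks described above, not the kernel's kvm_clear_dirty_log_protect():

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the relaxed KVM_CLEAR_DIRTY_LOG validation:
 * num_pages must be a multiple of 64 unless the requested range ends
 * exactly at the slot's last page. */
static bool clear_log_range_ok(uint64_t first_page, uint64_t num_pages,
                               uint64_t slot_npages)
{
    if (first_page % 64)
        return false;                  /* start must stay 64-page aligned */
    if (first_page + num_pages > slot_npages)
        return false;                  /* range must stay inside the slot */
    if (num_pages % 64 && first_page + num_pages != slot_npages)
        return false;                  /* unaligned tail only at slot end */
    return true;
}
```

For a 100-page slot, clearing pages 64..99 (36 pages) is now accepted even though 36 is not a multiple of 64, because the range ends on the last page.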
    • KVM: Introduce a 'release' method for KVM devices · 2bde9b3e
      Cédric Le Goater authored
      
      When a P9 sPAPR VM boots, the CAS negotiation process determines which
      interrupt mode to use (XICS legacy or XIVE native) and invokes a
      machine reset to activate the chosen mode.
      
      To be able to switch from one interrupt mode to another, we introduce
      the capability to release a KVM device without destroying the VM. The
      KVM device interface is extended with a new 'release' method which is
      called when the file descriptor of the device is closed.
      
      Once 'release' is called, the 'destroy' method will not be called
      anymore as the device is removed from the device list of the VM.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: Introduce a 'mmap' method for KVM devices · a1cd3f08
      Cédric Le Goater authored
      
      Some KVM devices will want to handle special mappings related to the
      underlying HW. For instance, the XIVE interrupt controller of the
      POWER9 processor has MMIO pages for thread interrupt management and
      for interrupt source control that need to be exposed to the guest when
      the OS has the required support.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  10. Apr 25, 2019
    • KVM: arm/arm64: Don't emulate virtual timers on userspace ioctls · 6bc21000
      Christoffer Dall authored
      
      When a VCPU never runs before a guest exists, but we set timer registers
      up via ioctls, the associated hrtimer might never get cancelled.
      
      Since we moved vcpu_load/put into the arch-specific implementations and
      only have load/put for KVM_RUN, we won't ever have a scheduled hrtimer
      for emulating a timer when modifying the timer state via an ioctl from
      user space.  All we need to do is make sure that we pick up the right
      state when we load the timer state next time userspace calls KVM_RUN
      again.
      
      We also do not need to worry about this interacting with the bg_timer,
      because if we were in WFI from the guest, and somehow ended up in a
      kvm_arm_timer_set_reg, it means that:
      
       1. the VCPU thread has received a signal,
       2. we have called vcpu_load when being scheduled in again,
       3. we have called vcpu_put when we returned to userspace for it to issue
          another ioctl
      
      And therefore we will not have a bg_timer programmed, and the event
      is treated as a spurious wakeup from WFI if userspace decides to run
      the vcpu again, even if there are no virtual interrupts.
      
      This fixes stray virtual timer interrupts triggered by an expiring
      hrtimer, which happens after a failed live migration, for instance.
      
      Fixes: bee038a6 ("KVM: arm/arm64: Rework the timer code to use a timer_map")
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Reported-by: Andre Przywara <andre.przywara@arm.com>
      Tested-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP · 2e8010bb
      Suzuki K Poulose authored
      
      With commit a80868f3, we no longer ensure that the
      THP page is properly aligned in the guest IPA. Skip the stage2
      huge mapping for unaligned IPA backed by transparent hugepages.
      
      Fixes: a80868f3 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed")
      Reported-by: Eric Auger <eric.auger@redhat.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Zenghui Yu <yuzenghui@huawei.com>
      Cc: Zheng Xiang <zhengxiang9@huawei.com>
      Cc: Andrew Murray <andrew.murray@arm.com>
      Cc: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: Ensure vcpu target is unset on reset failure · 811328fc
      Andrew Jones authored
      
      A failed KVM_ARM_VCPU_INIT should not set the vcpu target,
      as the vcpu target is used by kvm_vcpu_initialized() to
      determine if other vcpu ioctls may proceed. We need to set
      the target before calling kvm_reset_vcpu(), but if that call
      fails, we should then unset it and clear the feature bitmap
      while we're at it.
      
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      [maz: Simplified patch, completed commit message]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  11. Apr 24, 2019
    • arm64: KVM: Enable VHE support for :G/:H perf event modifiers · 435e53fb
      Andrew Murray authored
      
      With VHE, different exception levels are used between the host (EL2)
      and guest (EL1), with a shared exception level for userspace (EL0).
      We can take advantage of this and use the PMU's exception level
      filtering to avoid enabling/disabling counters in the world-switch
      code.  Instead we just modify the counter type to include or exclude
      EL0 at vcpu_{load,put} time.
      
      We also ensure that trapped PMU system register writes do not re-enable
      EL0 when reconfiguring the backing perf events.
      
      This approach completely avoids blackout windows seen with !VHE.
      
      Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • arm64: KVM: Encapsulate kvm_cpu_context in kvm_host_data · 630a1685
      Andrew Murray authored
      
      The virt/arm core allocates a kvm_cpu_context_t percpu; at present
      this is a typedef to kvm_cpu_context and is used to store the host
      cpu context.  The kvm_cpu_context structure is also used elsewhere
      to hold vcpu context.  In order to use the percpu to hold additional
      future host information, we encapsulate kvm_cpu_context in a new
      structure and rename the typedef and percpu to match.
      
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: Context-switch ptrauth registers · 384b40ca
      Mark Rutland authored
      
      When pointer authentication is supported, a guest may wish to use it.
      This patch adds the necessary KVM infrastructure for this to work, with
      a semi-lazy context switch of the pointer auth state.
      
      The pointer authentication feature is only enabled when VHE is built
      into the kernel and present in the CPU implementation, so only VHE
      code paths are modified.
      
      When we schedule a vcpu, we disable guest usage of pointer
      authentication instructions and accesses to the keys. While these are
      disabled, we avoid context-switching the keys. When we trap the guest
      trying to use pointer authentication functionality, we change to eagerly
      context-switching the keys, and enable the feature. The next time the
      vcpu is scheduled out/in, we start again.  However, the host key save
      is optimized and implemented inside the ptrauth instruction/register
      access trap.
      
      Pointer authentication consists of address authentication and generic
      authentication, and CPUs in a system might have varied support for
      either. Where support for either feature is not uniform, it is hidden
      from guests via ID register emulation, as a result of the cpufeature
      framework in the host.
      
      Unfortunately, address authentication and generic authentication cannot
      be trapped separately, as the architecture provides a single EL2 trap
      covering both. If we wish to expose one without the other, we cannot
      prevent a (badly-written) guest from intermittently using a feature
      which is not uniformly supported (when scheduled on a physical CPU which
      supports the relevant feature).  Hence, this patch expects both types
      of authentication to be present in a CPU.
      
      This switch of keys is done from guest enter/exit assembly in
      preparation for the upcoming in-kernel pointer authentication
      support.  Hence, these key switching routines are not implemented in
      C code, as they may cause pointer authentication key signing errors
      in some situations.
      
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      [Only VHE, key switch in full assembly, vcpu_has_ptrauth checks,
      save host key in ptrauth exception trap]
      Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      [maz: various fixups]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  16. Mar 30, 2019
    • KVM: arm/arm64: arch_timer: Fix CNTP_TVAL calculation · 8fa76162
      Wei Huang authored
      
      Recently the generic timer test of kvm-unit-tests failed to complete
      (stalled) when a physical timer is being used.  This issue is caused
      by an incorrect update of CNTP_CVAL when CNTP_TVAL is being accessed,
      introduced by commit 84135d3d ("KVM: arm/arm64: consolidate arch
      timer trap handlers").  According to the Arm ARM, the read/write
      behavior of accesses to the TVAL registers is expected to be:
      
        * READ:  TimerValue = CompareValue - (Counter - Offset)
        * WRITE: CompareValue = (Counter - Offset) + SignExtend(TimerValue)
      
      This patch fixes the TVAL read/write code path according to the
      specification.
      
      Fixes: 84135d3d ("KVM: arm/arm64: consolidate arch timer trap handlers")
      Signed-off-by: Wei Huang <wei@redhat.com>
      [maz: commit message tidy-up]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
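The two relations above can be checked directly with 64/32-bit arithmetic. This is a userspace sketch of the math only, not the kernel trap handler:

```c
#include <stdint.h>

/* Userspace sketch of the TVAL relations quoted above.  TVAL is the
 * signed 32-bit down-counter view of the 64-bit comparator CVAL. */
static int32_t tval_read(uint64_t cval, uint64_t counter, uint64_t offset)
{
    /* TimerValue = CompareValue - (Counter - Offset), truncated to 32 bits */
    return (int32_t)(cval - (counter - offset));
}

static uint64_t tval_write(int32_t tval, uint64_t counter, uint64_t offset)
{
    /* CompareValue = (Counter - Offset) + SignExtend(TimerValue) */
    return (counter - offset) + (uint64_t)(int64_t)tval;
}
```

Writing a negative TVAL places CVAL in the past, so the timer fires immediately; a write followed by a read returns the written value, which is the round-trip the buggy code broke.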
  17. Mar 29, 2019
    • KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl · 7dd32a0d
      Dave Martin authored
      
      Some aspects of vcpu configuration may be too complex to be
      completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
      requirement for userspace to do some additional configuration
      before various other ioctls will work in a consistent way.
      
      In particular this will be the case for SVE, where userspace will
      need to negotiate the set of vector lengths to be made available to
      the guest before the vcpu becomes fully usable.
      
      In order to provide an explicit way for userspace to confirm that
      it has finished setting up a particular vcpu feature, this patch
      adds a new ioctl KVM_ARM_VCPU_FINALIZE.
      
      When userspace has opted into a feature that requires finalization,
      typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
      matching call to KVM_ARM_VCPU_FINALIZE is now required before
      KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
      impose additional restrictions where appropriate.
      
      No existing vcpu features are affected by this, so current
      userspace implementations will continue to work exactly as before,
      with no need to issue KVM_ARM_VCPU_FINALIZE.
      
      As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
      placeholder: no finalizable features exist yet, so the ioctl is not
      required and will always yield EINVAL.  Subsequent patches will add
      the finalization logic to make use of this ioctl for SVE.
      
      No functional change for existing userspace.
      
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
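The ordering contract described above (opt in at init, finalize, then run) can be modeled as a tiny state machine. All names and error codes here are illustrative of the described behaviour, not the kernel implementation:

```c
#include <errno.h>
#include <stdbool.h>

/* Tiny state machine for the init -> finalize -> run ordering rule. */
struct vcpu_state { bool needs_finalize; bool finalized; };

static void vcpu_init(struct vcpu_state *v, bool finalizable_feature)
{
    v->needs_finalize = finalizable_feature;   /* feature flag at init */
    v->finalized = false;                      /* init resets finalization */
}

static int vcpu_finalize(struct vcpu_state *v)
{
    if (!v->needs_finalize)
        return -EINVAL;     /* no finalizable feature: always EINVAL today */
    v->finalized = true;
    return 0;
}

static int vcpu_run(struct vcpu_state *v)
{
    if (v->needs_finalize && !v->finalized)
        return -EPERM;      /* refused until the feature is finalized */
    return 0;
}
```

Existing userspace that never opts into a finalizable feature takes the first vcpu_run() path unchanged, matching the "no functional change" claim.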
    • KVM: arm/arm64: Add hook for arch-specific KVM initialisation · 0f062bfe
      Dave Martin authored
      
      This patch adds a kvm_arm_init_arch_resources() hook to perform
      subarch-specific initialisation when starting up KVM.
      
      This will be used in a subsequent patch for global SVE-related
      setup on arm64.
      
      No functional change.
      
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  19. Mar 20, 2019
    • KVM: arm/arm64: vgic-its: Make attribute accessors static · d9ea27a3
      YueHaibing authored
      
      Fix sparse warnings:
      
      arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning:
       symbol 'vgic_its_has_attr_regs' was not declared. Should it be static?
      arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning:
       symbol 'vgic_its_attr_regs_access' was not declared. Should it be static?
      
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      [maz: fixed subject]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: Fix handling of stage2 huge mappings · 3c3736cd
      Suzuki K Poulose authored
      
      We rely on the mmu_notifier callbacks to handle the split/merge of
      huge pages, and thus we are guaranteed that, while creating a block
      mapping, either the entire block is unmapped at stage2 or it is
      missing permission.
      
      However, we miss a case where the block mapping is split for the
      dirty logging case and could later be turned back into a block
      mapping if we cancel dirty logging.  This not only creates
      inconsistent TLB entries for the pages in the block, but also leaks
      the table pages at the PMD level.
      
      Handle this corner case for the huge mappings at stage2 by
      unmapping the non-huge mapping for the block. This could potentially
      release the upper level table. So we need to restart the table walk
      once we unmap the range.
      
      Fixes: ad361f09 ("KVM: ARM: Support hugetlbfs backed huge pages")
      Reported-by: Zheng Xiang <zhengxiang9@huawei.com>
      Cc: Zheng Xiang <zhengxiang9@huawei.com>
      Cc: Zenghui Yu <yuzenghui@huawei.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>