Skip to content
Snippets Groups Projects
  1. Apr 21, 2019
  2. Apr 19, 2019
  3. Apr 17, 2019
  4. Apr 12, 2019
  5. Apr 09, 2019
  6. Apr 07, 2019
  7. Apr 06, 2019
  8. Apr 05, 2019
  9. Apr 04, 2019
  10. Apr 03, 2019
  11. Mar 31, 2019
  12. Mar 29, 2019
  13. Mar 28, 2019
    • Paolo Bonzini's avatar
      Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION · e2788c4a
      Paolo Bonzini authored
      
      The documentation does not mention how to delete a slot, add the
      information.
      
      Reported-by: default avatarNathaniel McCallum <npmccallum@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e2788c4a
    • Sean Christopherson's avatar
      KVM: doc: Document the life cycle of a VM and its resources · 919f6cd8
      Sean Christopherson authored
      The series to add memcg accounting to KVM allocations[1] states:
      
        There are many KVM kernel memory allocations which are tied to the
        life of the VM process and should be charged to the VM process's
        cgroup.
      
      While it is correct to account KVM kernel allocations to the cgroup of
      the process that created the VM, it's technically incorrect to state
      that the KVM kernel memory allocations are tied to the life of the VM
      process.  This is because the VM itself, i.e. struct kvm, is not tied to
      the life of the process which created it, rather it is tied to the life
      of its associated file descriptor.  In other words, kvm_destroy_vm() is
      not invoked until fput() decrements its associated file's refcount to
      zero.  A simple example is to fork() in Qemu and have the child sleep
      indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
      descriptor *and* the rogue child is killed.
      
      The allocations are guaranteed to be *accounted* to the process which
      created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
      ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
      the VM "alive" but can't do anything useful with its reference.
      
      Note that because 'struct kvm' also holds a reference to the mm_struct
      of its owner, the above behavior also applies to userspace allocations.
      
      Given that mucking with a VM's file descriptor can lead to subtle and
      undesirable behavior, e.g. memcg charges persisting after a VM is shut
      down, explicitly document a VM's lifecycle and its impact on the VM's
      resources.
      
      Alternatively, KVM could aggressively free resources when the creating
      process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
      isn't guaranteed to be available, and freeing resources when the creator
      exits is likely to be error prone and fragile as KVM would need to
      ensure that it only freed resources that are truly out of reach. In
      practice, the existing behavior shouldn't be problematic as a properly
      configured system will prevent a child process from being moved out of
      the appropriate cgroup hierarchy, i.e. prevent hiding the process from
      the OOM killer, and will prevent an unprivileged user from being able to
      to hold a reference to struct kvm via another method, e.g. debugfs.
      
      [1]https://patchwork.kernel.org/patch/10806707/
      
      
      
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      919f6cd8
    • Sean Christopherson's avatar
      KVM: Reject device ioctls from processes other than the VM's creator · ddba9180
      Sean Christopherson authored
      
      KVM's API requires thats ioctls must be issued from the same process
      that created the VM.  In other words, userspace can play games with a
      VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the
      creator can do anything useful.  Explicitly reject device ioctls that
      are issued by a process other than the VM's creator, and update KVM's
      API documentation to extend its requirements to device ioctls.
      
      Fixes: 852b6d57 ("kvm: add device control API")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ddba9180
    • Sean Christopherson's avatar
      KVM: doc: Fix incorrect word ordering regarding supported use of APIs · 5e124900
      Sean Christopherson authored
      Per Paolo[1], instantiating multiple VMs in a single process is legal;
      but this conflicts with KVM's API documentation, which states:
      
        The only supported use is one virtual machine per process, and one
        vcpu per thread.
      
      However, an earlier section in the documentation states:
      
         Only run VM ioctls from the same process (address space) that was used
         to create the VM.
      
      and:
      
         Only run vcpu ioctls from the same thread that was used to create the
         vcpu.
      
      This suggests that the conflicting documentation is simply an incorrect
      ordering of of words, i.e. what's really meant is that a virtual machine
      can't be shared across multiple processes and a vCPU can't be shared
      across multiple threads.
      
      Tweak the blurb on issuing ioctls to use a more assertive tone, and
      rewrite the "supported use" sentence to reference said blurb instead of
      poorly restating it in different terms.
      
      Opportunistically add missing punctuation.
      
      [1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com
      
      
      
      Fixes: 9c1b96e3 ("KVM: Document basic API")
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      [Improve notes on asynchronous ioctl]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5e124900
    • Sean Christopherson's avatar
      KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' · 47c42e6b
      Sean Christopherson authored
      
      The cr4_pae flag is a bit of a misnomer, its purpose is really to track
      whether the guest PTE that is being shadowed is a 4-byte entry or an
      8-byte entry.  Prior to supporting nested EPT, the size of the gpte was
      reflected purely by CR4.PAE.  KVM fudged things a bit for direct sptes,
      but it was mostly harmless since the size of the gpte never mattered.
      Now that a spte may be tracking an indirect EPT entry, relying on
      CR4.PAE is wrong and ill-named.
      
      For direct shadow pages, force the gpte_size to '1' as they are always
      8-byte entries; EPT entries can only be 8-bytes and KVM always uses
      8-byte entries for NPT and its identity map (when running with EPT but
      not unrestricted guest).
      
      Likewise, nested EPT entries are always 8-bytes.  Nested EPT presents a
      unique scenario as the size of the entries are not dictated by CR4.PAE,
      but neither is the shadow page a direct map.  To handle this scenario,
      set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
      to denote a nested EPT shadow page.  Use the information to avoid
      incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
      
      Providing a consistent and accurate gpte_size fixes a bug reported by
      Vitaly where fast_cr3_switch() always fails when switching from L2 to
      L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
      whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Reported-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      47c42e6b
    • David Howells's avatar
      vfs: Update mount API docs · 7d6ab823
      David Howells authored
      
      Update the mount API docs to reflect recent changes to the code.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7d6ab823
  14. Mar 27, 2019
  15. Mar 26, 2019
Loading