• Jann Horn's avatar
    binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot · a07279c9
    Jann Horn authored
    
    
    In both binfmt_elf and binfmt_elf_fdpic, use a new helper
    dump_vma_snapshot() to take a snapshot of the VMA list (including the gate
    VMA, if we have one) while protected by the mmap_lock, and then use that
    snapshot instead of walking the VMA list without locking.
    
    An alternative approach would be to keep the mmap_lock held across the
    entire core dumping operation; however, keeping the mmap_lock locked while
    we may be blocked for an unbounded amount of time (e.g.  because we're
    dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock
    blocks things like the ->release handler of userfaultfd, and we don't
    really want critical system daemons to grind to a halt just because
    someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or
    something like that.
    
    Since both the normal ELF code and the FDPIC ELF code need this
    functionality (and if any other binfmt wants to add coredump support in
    the future, they'd probably need it, too), implement this with a common
    helper in fs/coredump.c.
    
    A downside of this approach is that we now need a bigger amount of kernel
    memory per userspace VMA in the normal ELF case, and that we need O(n)
    kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't
    be terribly bad.
    
    There currently is a data race between stack expansion and anything that
    reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to
    mitigate that for core dumping, take the mmap_lock in write mode when
    taking a snapshot of the VMA hierarchy.  (If we only took the mmap_lock in
    read mode, we could end up with a corrupted core dump if someone does
    get_user_pages_remote() concurrently.  Not really a major problem, but
    taking the mmap_lock either way works here, so we might as well avoid the
    issue.) (This doesn't do anything about the existing data races with stack
    expansion in other mm code.)
    
    Signed-off-by: default avatarJann Horn <jannh@google.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: "Eric W . Biederman" <ebiederm@xmission.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com
    
    
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    a07279c9