Skip to content
Snippets Groups Projects
  1. Jul 19, 2019
    • Yang Shi's avatar
      mm: thp: fix false negative of shmem vma's THP eligibility · c0630669
      Yang Shi authored
      Commit 7635d9cb ("mm, thp, proc: report THP eligibility for each
      vma") introduced THPeligible bit for processes' smaps.  But, when
      checking the eligibility for shmem vma, __transparent_hugepage_enabled()
      is called to override the result from shmem_huge_enabled().  It may
      result in the anonymous vma's THP flag override shmem's.  For example,
      running a simple test which create THP for shmem, but with anonymous THP
      disabled, when reading the process's smaps, it may show:
      
        7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
        Size:               4096 kB
        ...
        [snip]
        ...
        ShmemPmdMapped:     4096 kB
        ...
        [snip]
        ...
        THPeligible:    0
      
      And, /proc/meminfo does show THP allocated and PMD mapped too:
      
        ShmemHugePages:     4096 kB
        ShmemPmdMapped:     4096 kB
      
      This doesn't make too much sense.  The shmem objects should be treated
      separately from anonymous THP.  Calling shmem_huge_enabled() with
      checking MMF_DISABLE_THP sounds good enough.  And, we could skip stack
      and dax vma check since we already checked if the vma is shmem already.
      
      Also check if vma is suitable for THP by calling
      transhuge_vma_suitable().
      
      And minor fix to smaps output format and documentation.
      
      Link: http://lkml.kernel.org/r/1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com
      
      
      Fixes: 7635d9cb ("mm, thp, proc: report THP eligibility for each vma")
      Signed-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c0630669
    • Yang Shi's avatar
      mm: thp: make transhuge_vma_suitable available for anonymous THP · 43675e6f
      Yang Shi authored
      transhuge_vma_suitable() was only available for shmem THP, but anonymous
      THP has the same check except pgoff check.  And, it will be used for THP
      eligible check in the later patch, so make it available for all kind of
      THPs.  This also helps reduce code duplication slightly.
      
      Since anonymous THP doesn't have to check pgoff, so make pgoff check
      shmem vma only.
      
      And regroup some functions in include/linux/mm.h to solve compile issue
      since transhuge_vma_suitable() needs call vma_is_anonymous() which was
      defined after huge_mm.h is included.
      
      [akpm@linux-foundation.org: fix typo]
      [yang.shi@linux.alibaba.com: v4]
        Link: http://lkml.kernel.org/r/1563400758-124759-2-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1560401041-32207-2-git-send-email-yang.shi@linux.alibaba.com
      
      
      Signed-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      43675e6f
    • Wei Yang's avatar
      mm/sparse.c: set section nid for hot-add memory · 26f26bed
      Wei Yang authored
      In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in
      section_to_node_table[].  While for hot-add memory, this is missed.
      Without this information, page_to_nid() may not give the right node id.
      
      BTW, current online_pages works because it leverages nid in
      memory_block.  But the granularity of node id should be mem_section
      wide.
      
      Link: http://lkml.kernel.org/r/20190618005537.18878-1-richardw.yang@linux.intel.com
      
      
      Signed-off-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      26f26bed
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section · b9bf8d34
      David Hildenbrand authored
      The parameter is unused, so let's drop it.  Memory removal paths should
      never care about zones.  This is the job of memory offlining and will
      require more refactorings.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-12-david@redhat.com
      
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9bf8d34
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove memory block devices before arch_remove_memory() · 4c4b7f9b
      David Hildenbrand authored
      Let's factor out removing of memory block devices, which is only
      necessary for memory added via add_memory() and friends that created
      memory block devices.  Remove the devices before calling
      arch_remove_memory().
      
      This finishes factoring out memory block device handling from
      arch_add_memory() and arch_remove_memory().
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.com
      
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4c4b7f9b
    • David Hildenbrand's avatar
      mm/memory_hotplug: drop MHP_MEMBLOCK_API · 05f800a0
      David Hildenbrand authored
      No longer needed, the callers of arch_add_memory() can handle this
      manually.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-9-david@redhat.com
      
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      05f800a0
    • David Hildenbrand's avatar
      mm/memory_hotplug: create memory block devices after arch_add_memory() · db051a0d
      David Hildenbrand authored
      Only memory to be added to the buddy and to be onlined/offlined by user
      space using /sys/devices/system/memory/...  needs (and should have!)
      memory block devices.
      
      Factor out creation of memory block devices.  Create all devices after
      arch_add_memory() succeeded.  We can later drop the want_memblock
      parameter, because it is now effectively stale.
      
      Only after memory block devices have been added, memory can be onlined
      by user space.  This implies, that memory is not visible to user space
      at all before arch_add_memory() succeeded.
      
      While at it
       - use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
       - introduce find_memory_block_by_id() to search via block id
       - Use find_memory_block_by_id() in init_memory_block() to catch
         duplicates
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-8-david@redhat.com
      
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      db051a0d
    • David Hildenbrand's avatar
      mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      David Hildenbrand authored
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
      
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      80ec922d
    • David Hildenbrand's avatar
      mm/memory_hotplug: simplify and fix check_hotplug_memory_range() · cec3ebd0
      David Hildenbrand authored
      Patch series "mm/memory_hotplug: Factor out memory block devicehandling", v3.
      
      We only want memory block devices for memory to be onlined/offlined
      (add/remove from the buddy).  This is required so user space can
      online/offline memory and kdump gets notified about newly onlined
      memory.
      
      Let's factor out creation/removal of memory block devices.  This helps
      to further cleanup arch_add_memory/arch_remove_memory() and to make
      implementation of new features easier - especially sub-section memory
      hot add from Dan.
      
      Anshuman Khandual is currently working on arch_remove_memory().  I added
      a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
      implementation", that is sufficient as a firsts tep in the context of
      this series.  (we don't cleanup page tables in case anything goes wrong
      already)
      
      Did a quick sanity test with DIMM plug/unplug, making sure all devices
      and sysfs links properly get added/removed.  Compile tested on s390x and
      x86-64.
      
      This patch (of 11):
      
      By converting start and size to page granularity, we actually ignore
      unaligned parts within a page instead of properly bailing out with an
      error.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-2-david@redhat.com
      
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cec3ebd0
  2. Jul 17, 2019
  3. Jul 15, 2019
  4. Jul 12, 2019
    • Tetsuo Handa's avatar
      mm/oom_kill.c: remove redundant OOM score normalization in select_bad_process() · 2c207985
      Tetsuo Handa authored
      Since commit bbbe4802 ("mm, oom: remove 'prefer children over
      parent' heuristic") removed the
      
        "%s: Kill process %d (%s) score %u or sacrifice child\n"
      
      line, oc->chosen_points is no longer used after select_bad_process().
      
      Link: http://lkml.kernel.org/r/1560853435-15575-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      
      
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2c207985
    • Shakeel Butt's avatar
      oom: decouple mems_allowed from oom_unkillable_task · ac311a14
      Shakeel Butt authored
      Commit ef08e3b4 ("[PATCH] cpusets: confine oom_killer to
      mem_exclusive cpuset") introduces a heuristic where a potential
      oom-killer victim is skipped if the intersection of the potential victim
      and the current (the process triggered the oom) is empty based on the
      reason that killing such victim most probably will not help the current
      allocating process.
      
      However the commit 7887a3da ("[PATCH] oom: cpuset hint") changed the
      heuristic to just decrease the oom_badness scores of such potential
      victim based on the reason that the cpuset of such processes might have
      changed and previously they may have allocated memory on mems where the
      current allocating process can allocate from.
      
      Unintentionally 7887a3da ("[PATCH] oom: cpuset hint") introduced a
      side effect as the oom_badness is also exposed to the user space through
      /proc/[pid]/oom_score, so, readers with different cpusets can read
      different oom_score of the same process.
      
      Later, commit 6cf86ac6 ("oom: filter tasks not sharing the same
      cpuset") fixed the side effect introduced by 7887a3da by moving the
      cpuset intersection back to only oom-killer context and out of
      oom_badness.  However the combination of ab290adb ("oom: make
      oom_unkillable_task() helper function") and 26ebc984 ("oom:
      /proc/<pid>/oom_score treat kernel thread honestly") unintentionally
      brought back the cpuset intersection check into the oom_badness
      calculation function.
      
      Other than doing cpuset/mempolicy intersection from oom_badness, the memcg
      oom context is also doing cpuset/mempolicy intersection which is quite
      wrong and is caught by syzcaller with the following report:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
      RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
      RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
      RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
      Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00
      00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f
      85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
      RSP: 0018:ffff888000127490 EFLAGS: 00010a03
      RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
      RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
      RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
      R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
      R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
      FS:  00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000607304 CR3: 000000009237e000 CR4: 00000000001426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Call Trace:
        oom_evaluate_task+0x49/0x520 mm/oom_kill.c:321
        mem_cgroup_scan_tasks+0xcc/0x180 mm/memcontrol.c:1169
        select_bad_process mm/oom_kill.c:374 [inline]
        out_of_memory mm/oom_kill.c:1088 [inline]
        out_of_memory+0x6b2/0x1280 mm/oom_kill.c:1035
        mem_cgroup_out_of_memory+0x1ca/0x230 mm/memcontrol.c:1573
        mem_cgroup_oom mm/memcontrol.c:1905 [inline]
        try_charge+0xfbe/0x1480 mm/memcontrol.c:2468
        mem_cgroup_try_charge+0x24d/0x5e0 mm/memcontrol.c:6073
        mem_cgroup_try_charge_delay+0x1f/0xa0 mm/memcontrol.c:6088
        do_huge_pmd_wp_page_fallback+0x24f/0x1680 mm/huge_memory.c:1201
        do_huge_pmd_wp_page+0x7fc/0x2160 mm/huge_memory.c:1359
        wp_huge_pmd mm/memory.c:3793 [inline]
        __handle_mm_fault+0x164c/0x3eb0 mm/memory.c:4006
        handle_mm_fault+0x3b7/0xa90 mm/memory.c:4053
        do_user_addr_fault arch/x86/mm/fault.c:1455 [inline]
        __do_page_fault+0x5ef/0xda0 arch/x86/mm/fault.c:1521
        do_page_fault+0x71/0x57d arch/x86/mm/fault.c:1552
        page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1156
      RIP: 0033:0x400590
      Code: 06 e9 49 01 00 00 48 8b 44 24 10 48 0b 44 24 28 75 1f 48 8b 14 24 48
      8b 7c 24 20 be 04 00 00 00 e8 f5 56 00 00 48 8b 74 24 08 <89> 06 e9 1e 01
      00 00 48 8b 44 24 08 48 8b 14 24 be 04 00 00 00 8b
      RSP: 002b:00007fff7bc49780 EFLAGS: 00010206
      RAX: 0000000000000001 RBX: 0000000000760000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 000000002000cffc RDI: 0000000000000001
      RBP: fffffffffffffffe R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000075 R11: 0000000000000246 R12: 0000000000760008
      R13: 00000000004c55f2 R14: 0000000000000000 R15: 00007fff7bc499b0
      Modules linked in:
      ---[ end trace a65689219582ffff ]---
      RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
      RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
      RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
      RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
      Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00
      00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f
      85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
      RSP: 0018:ffff888000127490 EFLAGS: 00010a03
      RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
      RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
      RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
      R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
      R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
      FS:  00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2f823000 CR3: 000000009237e000 CR4: 00000000001426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      
      The fix is to decouple the cpuset/mempolicy intersection check from
      oom_unkillable_task() and make sure cpuset/mempolicy intersection check is
      only done in the global oom context.
      
      [shakeelb@google.com: change function name and update comment]
        Link: http://lkml.kernel.org/r/20190628152421.198994-3-shakeelb@google.com
      Link: http://lkml.kernel.org/r/20190624212631.87212-3-shakeelb@google.com
      
      
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Reported-by: default avatar <syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com>
      Acked-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ac311a14
    • Shakeel Butt's avatar
      mm, oom: remove redundant task_in_mem_cgroup() check · 6ba749ee
      Shakeel Butt authored
      oom_unkillable_task() can be called from three different contexts i.e.
      global OOM, memcg OOM and oom_score procfs interface.  At the moment
      oom_unkillable_task() does a task_in_mem_cgroup() check on the given
      process.  Since there is no reason to perform task_in_mem_cgroup()
      check for global OOM and oom_score procfs interface, those contexts
      provide NULL memcg and skips the task_in_mem_cgroup() check.  However
      for memcg OOM context, the oom_unkillable_task() is always called from
      mem_cgroup_scan_tasks() and thus task_in_mem_cgroup() check becomes
      redundant and effectively dead code.  So, just remove the
      task_in_mem_cgroup() check altogether.
      
      Link: http://lkml.kernel.org/r/20190624212631.87212-2-shakeelb@google.com
      
      
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6ba749ee
    • Shakeel Butt's avatar
      mm, oom: refactor dump_tasks for memcg OOMs · 5eee7e1c
      Shakeel Butt authored
      dump_tasks() traverses all the existing processes even for the memcg OOM
      context which is not only unnecessary but also wasteful.  This imposes a
      long RCU critical section even from a contained context which can be quite
      disruptive.
      
      Change dump_tasks() to be aligned with select_bad_process and use
      mem_cgroup_scan_tasks to selectively traverse only processes of the target
      memcg hierarchy during memcg OOM.
      
      Link: http://lkml.kernel.org/r/20190617231207.160865-1-shakeelb@google.com
      
      
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5eee7e1c
    • Tetsuo Handa's avatar
      mm: memcontrol: use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks() · f168a9a5
      Tetsuo Handa authored
      Since commit c03cd773 ("cgroup: Include dying leaders with live
      threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works,
      mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check
      only one thread from each thread group.
      
      [penguin-kernel@I-love.SAKURA.ne.jp: remove thread group leader check in oom_evaluate_task()]
        Link: http://lkml.kernel.org/r/1560853257-14934-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Link: http://lkml.kernel.org/r/c763afc8-f0ae-756a-56a7-395f625b95fc@i-love.sakura.ne.jp
      
      
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f168a9a5
    • Jane Chu's avatar
      mm/memory-failure.c: clarify error message · 135e5351
      Jane Chu authored
      Some user who install SIGBUS handler that does longjmp out therefore
      keeping the process alive is confused by the error message
      
        "[188988.765862] Memory failure: 0x1840200: Killing cellsrv:33395 due to hardware memory corruption"
      
      Slightly modify the error message to improve clarity.
      
      Link: http://lkml.kernel.org/r/1558403523-22079-1-git-send-email-jane.chu@oracle.com
      
      
      Signed-off-by: default avatarJane Chu <jane.chu@oracle.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarPankaj Gupta <pagupta@redhat.com>
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      135e5351
    • Roman Gushchin's avatar
      mm: vmalloc: show number of vmalloc pages in /proc/meminfo · 97105f0a
      Roman Gushchin authored
      Vmalloc() is getting more and more used these days (kernel stacks, bpf and
      percpu allocator are new top users), and the total % of memory consumed by
      vmalloc() can be pretty significant and changes dynamically.
      
      /proc/meminfo is the best place to display this information: its top goal
      is to show top consumers of the memory.
      
      Since the VmallocUsed field in /proc/meminfo is not in use for quite a
      long time (it has been defined to 0 by a5ad88ce ("mm: get rid of
      'vmalloc_info' from /proc/meminfo")), let's reuse it for showing the
      actual physical memory consumption of vmalloc().
      
      Link: http://lkml.kernel.org/r/20190417194002.12369-3-guro@fb.com
      
      
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      97105f0a
    • Konstantin Khlebnikov's avatar
      mm: use down_read_killable for locking mmap_sem in access_remote_vm · 1e426fe2
      Konstantin Khlebnikov authored
      This function is used by ptrace and proc files like /proc/pid/cmdline and
      /proc/pid/environ.
      
      Access_remote_vm never returns error codes, all errors are ignored and
      only size of successfully read data is returned.  So, if current task was
      killed we'll simply return 0 (bytes read).
      
      Mmap_sem could be locked for a long time or forever if something goes
      wrong.  Using a killable lock permits cleanup of stuck tasks and
      simplifies investigation.
      
      Link: http://lkml.kernel.org/r/156007494202.3335.16782303099589302087.stgit@buzz
      
      
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: default avatarMichal Koutný <mkoutny@suse.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1e426fe2
    • Yang Shi's avatar
      mm: vmscan: correct some vmscan counters for THP swapout · 98879b3b
      Yang Shi authored
      Commit bd4c82c2 ("mm, THP, swap: delay splitting THP after swapped
      out"), THP can be swapped out in a whole.  But, nr_reclaimed and some
      other vm counters still get inc'ed by one even though a whole THP (512
      pages) gets swapped out.
      
      This doesn't make too much sense to memory reclaim.
      
      For example, direct reclaim may just need reclaim SWAP_CLUSTER_MAX
      pages, reclaiming one THP could fulfill it.  But, if nr_reclaimed is not
      increased correctly, direct reclaim may just waste time to reclaim more
      pages, SWAP_CLUSTER_MAX * 512 pages in worst case.
      
      And, it may cause pgsteal_{kswapd|direct} is greater than
      pgscan_{kswapd|direct}, like the below:
      
      pgsteal_kswapd 122933
      pgsteal_direct 26600225
      pgscan_kswapd 174153
      pgscan_direct 14678312
      
      nr_reclaimed and nr_scanned must be fixed in parallel otherwise it would
      break some page reclaim logic, e.g.
      
      vmpressure: this looks at the scanned/reclaimed ratio so it won't change
      semantics as long as scanned & reclaimed are fixed in parallel.
      
      compaction/reclaim: compaction wants a certain number of physical pages
      freed up before going back to compacting.
      
      kswapd priority raising: kswapd raises priority if we scan fewer pages
      than the reclaim target (which itself is obviously expressed in order-0
      pages).  As a result, kswapd can falsely raise its aggressiveness even
      when it's making great progress.
      
      Other than nr_scanned and nr_reclaimed, some other counters, e.g.
      pgactivate, nr_skipped, nr_ref_keep and nr_unmap_fail need to be fixed too
      since they are user visible via cgroup, /proc/vmstat or trace points,
      otherwise they would be underreported.
      
      When isolating pages from LRUs, nr_taken has been accounted in base page,
      but nr_scanned and nr_skipped are still accounted in THP.  It doesn't make
      too much sense too since this may cause trace point underreport the
      numbers as well.
      
      So accounting those counters in base page instead of accounting THP as one
      page.
      
      nr_dirty, nr_unqueued_dirty, nr_congested and nr_writeback are used by
      file cache, so they are not impacted by THP swap.
      
      This change may result in lower steal/scan ratio in some cases since THP
      may get split during page reclaim, then a part of tail pages get reclaimed
      instead of the whole 512 pages, but nr_scanned is accounted by 512,
      particularly for direct reclaim.  But, this should be not a significant
      issue.
      
      Link: http://lkml.kernel.org/r/1559025859-72759-2-git-send-email-yang.shi@linux.alibaba.com
      
      
      Signed-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      98879b3b
    • Yang Shi's avatar
      mm: vmscan: remove double slab pressure by inc'ing sc->nr_scanned · af5d4403
      Yang Shi authored
      Commit 9092c71b ("mm: use sc->priority for slab shrink targets") has
      broken up the relationship between sc->nr_scanned and slab pressure.
      The sc->nr_scanned can't double slab pressure anymore.  So, it sounds no
      sense to still keep sc->nr_scanned inc'ed.  Actually, it would prevent
      from adding pressure on slab shrink since excessive sc->nr_scanned would
      prevent from scan->priority raise.
      
      The bonnie test doesn't show this would change the behavior of slab
      shrinkers.
      
      				w/		w/o
      			  /sec    %CP      /sec      %CP
      Sequential delete: 	3960.6    94.6    3997.6     96.2
      Random delete: 		2518      63.8    2561.6     64.6
      
      The slight increase of "/sec" without the patch would be caused by the
      slight increase of CPU usage.
      
      Link: http://lkml.kernel.org/r/1559025859-72759-1-git-send-email-yang.shi@linux.alibaba.com
      
      
      Signed-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      af5d4403
Loading