    mm/slub: fix panic in slab_alloc_node() · 22e4663e
    Laurent Dufour authored
    While doing a memory hot-unplug operation on a PowerPC VM running 1024 CPUs
    with 11TB of RAM, I hit the following panic:
    
        BUG: Kernel NULL pointer dereference on read at 0x00000007
        Faulting instruction address: 0xc000000000456048
        Oops: Kernel access of bad area, sig: 11 [#2]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries
        Modules linked in: rpadlpar_io rpaphp
        CPU: 160 PID: 1 Comm: systemd Tainted: G      D           5.9.0 #1
        NIP:  c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350
        REGS: c00006028d1b77a0 TRAP: 0300   Tainted: G      D            (5.9.0)
        MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004228  XER: 00000000
        CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0
        GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000
        GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320
        GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000
        GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a
        GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1
        GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8
        GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000
        GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001
        NIP [c000000000456048] __kmalloc_node+0x108/0x790
        LR [c000000000455fd4] __kmalloc_node+0x94/0x790
        Call Trace:
          kvmalloc_node+0x58/0x110
          mem_cgroup_css_online+0x10c/0x270
          online_css+0x48/0xd0
          cgroup_apply_control_enable+0x2c4/0x470
          cgroup_mkdir+0x408/0x5f0
          kernfs_iop_mkdir+0x90/0x100
          vfs_mkdir+0x138/0x250
          do_mkdirat+0x154/0x1c0
          system_call_exception+0xf8/0x200
          system_call_common+0xf0/0x27c
        Instruction dump:
        e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a
        2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78
    
    This points to the following code:
    
        mm/slub.c:2851
                if (unlikely(!object || !node_match(page, node))) {
        c000000000456038:       00 00 bc 2f     cmpdi   cr7,r28,0
        c00000000045603c:       18 00 9e 41     beq     cr7,c000000000456054 <__kmalloc_node+0x114>
        node_match():
        mm/slub.c:2491
                if (node != NUMA_NO_NODE && page_to_nid(page) != node)
        c000000000456040:       30 02 92 41     beq     cr4,c000000000456270 <__kmalloc_node+0x330>
        page_to_nid():
        include/linux/mm.h:1294
        c000000000456044:       10 00 27 e9     ld      r9,16(r7)
        c000000000456048:       07 00 29 89     lbz     r9,7(r9)	<<<< r9 = NULL
        node_match():
        mm/slub.c:2491
        c00000000045604c:       00 48 99 7f     cmpw    cr7,r25,r9
        c000000000456050:       20 02 9e 41     beq     cr7,c000000000456270 <__kmalloc_node+0x330>
    
    The panic occurred in slab_alloc_node() when checking for the page's node:
    
    	object = c->freelist;
    	page = c->page;
    	if (unlikely(!object || !node_match(page, node))) {
    		object = __slab_alloc(s, gfpflags, node, addr, c);
    		stat(s, ALLOC_SLOWPATH);
    
    The issue is that object is not NULL while page is NULL.  This is odd but
    may happen if the cache flush occurred after object was loaded but before
    page was loaded.  Thus the page pointer must be checked too.
    
    The cache flush is done through an inter-processor interrupt when a
    piece of memory is off-lined.  That interrupt is triggered when a memory
    hot-unplug operation is initiated and offline_pages() calls the
    slub's MEM_GOING_OFFLINE callback slab_mem_going_offline_callback(),
    which calls flush_cpu_slab().  If that interrupt is caught between
    the reading of c->freelist and the reading of c->page, this could lead
    to such a situation.  That situation is expected and the later call to
    this_cpu_cmpxchg_double() will detect the change to c->freelist and redo
    the whole operation.
    
    In commit 6159d0f5 ("mm/slub.c: page is always non-NULL in
    node_match()"), the check on the page pointer was removed on the
    assumption that page is always valid when node_match() is called.  That
    assumption does not hold in this particular case, so check for page
    before calling node_match() here.
    
    Fixes: 6159d0f5 ("mm/slub.c: page is always non-NULL in node_match()")
    Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Christoph Lameter <cl@linux.com>
    Cc: Wei Yang <richard.weiyang@gmail.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Nathan Lynch <nathanl@linux.ibm.com>
    Cc: Scott Cheloha <cheloha@linux.ibm.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com
    
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>