    page_frag: Recover from memory pressure · d8c19014
    Dongli Zhang authored
    The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
    This ends up calling page_frag_alloc() to allocate skb->data from
    page_frag_cache->va.
    
    Under memory pressure, page_frag_cache->va may be allocated from a
    pfmemalloc page. As a result, skb->pfmemalloc is always true because
    skb->data comes from page_frag_cache->va. The skb will be dropped if the
    sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
    under memory pressure.
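
    The drop rule described above can be sketched as a small userspace
    model. The types and the function name below are hypothetical
    stand-ins, not the kernel's struct sk_buff, struct sock, or any real
    kernel API; only the two flags that matter for the decision are modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct sk_buff. */
struct model_skb {
	bool pfmemalloc;	/* skb->data came from a pfmemalloc page */
};

/* Hypothetical, simplified stand-in for struct sock. */
struct model_sock {
	bool sock_memalloc;	/* models the SOCK_MEMALLOC flag */
};

/*
 * Returns true when the receiver should drop the skb: the data came
 * from the emergency reserves and the socket is not allowed to use them.
 */
static bool model_should_drop(const struct model_skb *skb,
			      const struct model_sock *sk)
{
	return skb->pfmemalloc && !sk->sock_memalloc;
}
```

    In this model an skb is dropped only when both conditions hold, so a
    stale pfmemalloc flag is enough to make an ordinary socket drop every
    packet.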
    
    However, once the kernel is no longer under memory pressure (suppose a
    large number of pages have just been reclaimed), page_frag_alloc() may
    still re-use the prior pfmemalloc page_frag_cache->va to allocate
    skb->data. As a result, skb->pfmemalloc stays true until
    page_frag_cache->va is re-allocated, even though the kernel is no longer
    under memory pressure.
    
    Here is how the kernel runs into the issue.
    
    1. The kernel is under memory pressure and the PAGE_FRAG_CACHE_MAX_ORDER
    allocation in __page_frag_cache_refill() fails. Instead, a pfmemalloc
    page is allocated for page_frag_cache->va.
    
    2. Every skb whose skb->data comes from page_frag_cache->va (pfmemalloc)
    has skb->pfmemalloc=true. Such an skb is always dropped by a sock without
    SOCK_MEMALLOC. This is expected behaviour.
    
    3. Suppose a large number of pages are reclaimed and the kernel is no
    longer under memory pressure. We expect that skbs are no longer dropped
    because of skb->pfmemalloc.
    
    4. Unfortunately, page_frag_alloc() does not proactively re-allocate
    page_frag_cache->va and always re-uses the prior pfmemalloc page.
    skb->pfmemalloc stays true even though the kernel is no longer under
    memory pressure.
    
    Fix this by freeing and re-allocating the page instead of recycling it.
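
    The idea of the fix can be illustrated with a minimal userspace sketch.
    It is not the kernel code: the names model_frag_cache, model_refill,
    model_frag_alloc and model_under_pressure are hypothetical, malloc()
    stands in for the page allocator, and reference counting is left out
    (the model assumes no fragment is still in use when the buffer is
    exhausted).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_CACHE_SIZE 4096

/* Hypothetical stand-in for "the allocator had to use its reserves". */
static bool model_under_pressure;

/* Simplified stand-in for struct page_frag_cache (no refcounting). */
struct model_frag_cache {
	char *va;		/* backing buffer, models page_frag_cache->va */
	int offset;		/* fragments are carved from the end downwards */
	bool pfmemalloc;	/* buffer was obtained under memory pressure */
};

/* Get a fresh backing buffer and record whether it is a pfmemalloc one. */
static bool model_refill(struct model_frag_cache *nc)
{
	nc->va = malloc(MODEL_CACHE_SIZE);
	if (!nc->va)
		return false;
	nc->offset = MODEL_CACHE_SIZE;
	nc->pfmemalloc = model_under_pressure;
	return true;
}

/* Returns a fragment of 'size' bytes, or NULL on failure. */
static void *model_frag_alloc(struct model_frag_cache *nc, int size)
{
	if (size <= 0 || size > MODEL_CACHE_SIZE)
		return NULL;
	if (!nc->va && !model_refill(nc))
		return NULL;

	if (nc->offset - size < 0) {
		if (nc->pfmemalloc) {
			/* The fix: free and re-allocate, do not recycle. */
			free(nc->va);
			nc->va = NULL;
			if (!model_refill(nc))
				return NULL;
		} else {
			/* Ordinary buffer: recycle by resetting the offset. */
			nc->offset = MODEL_CACHE_SIZE;
		}
	}

	nc->offset -= size;
	return nc->va + nc->offset;
}
```

    In this model, a buffer obtained while model_under_pressure was set is
    dropped as soon as it is exhausted, so the next refill sees the current
    memory state and the pfmemalloc flag clears once the pressure is gone.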
    
    References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
    References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
    
    
    Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
    Cc: Bert Barbe <bert.barbe@oracle.com>
    Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
    Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
    Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
    Cc: Joe Jin <joe.jin@oracle.com>
    Cc: SRINIVAS <srinivas.eeda@oracle.com>
    Fixes: 79930f58 ("net: do not deplete pfmemalloc reserve")
    Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com
    
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>