    mm: allow a controlled amount of unfairness in the page lock · 5ef64cc8
    Linus Torvalds authored
    Commit 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic") made
    the page locking entirely fair, in that if a waiter came in while the
    lock was held, the lock would be transferred to the lockers strictly in
    order.
    
    That was intended to finally get rid of the long-reported watchdog
    failures that involved the page lock under extreme load, where a process
    could end up waiting essentially forever, as other page lockers stole
    the lock from under it.
    
    It also improved some benchmarks, but it ended up causing huge
    performance regressions on others, simply because fair lock behavior
    doesn't end up giving out the lock as aggressively, causing better
    worst-case latency, but potentially much worse average latencies and
    throughput.
    
    Instead of reverting that change entirely, this introduces a controlled
    amount of unfairness, with a sysctl knob to tune it if somebody needs
    to.  But the default value should hopefully be good for any normal load,
    allowing a few rounds of lock stealing, but enforcing the strict
    ordering before the lock has been stolen too many times.
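
    As a rough illustration of that policy (a simplified model, not the
    actual kernel code), each waiter can carry a budget of tolerated
    steals and switch to strict ordering once the budget is used up.  The
    struct and function names below are hypothetical; only the default
    value of 5 comes from this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Default taken from the commit message. */
#define DEFAULT_PAGE_LOCK_UNFAIRNESS 5

/* Hypothetical per-waiter state; not the kernel's real wait-queue entry. */
struct waiter {
    int steals_tolerated; /* how many more times we accept losing the lock */
    bool strict;          /* once true, the lock must be handed over in order */
};

/* Start a wait with the given unfairness budget. */
static void waiter_init(struct waiter *w, int unfairness)
{
    assert(w);
    w->steals_tolerated = unfairness;
    /* A budget of zero (or less) means fully fair behavior from the start. */
    w->strict = (unfairness <= 0);
}

/* Called each time the waiter wakes up and finds the lock was stolen. */
static void waiter_lock_stolen(struct waiter *w)
{
    assert(w);
    if (w->steals_tolerated > 0)
        w->steals_tolerated--;
    /* Budget exhausted: from now on, insist on ordered handoff. */
    if (w->steals_tolerated == 0)
        w->strict = true;
}
```

    In this model a waiter started with the default budget tolerates five
    steals before it insists on ordered handoff, and a budget of 0 gives
    the fully fair behavior from the start.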
    
    There is also a hint from Matthieu Baerts that the fair page locking
    may end up exposing an ABBA deadlock that is hidden by the usual
    optimistic lock stealing, and while the unfairness doesn't fix the
    fundamental issue (and I'm still looking at that), it avoids it in
    practice.
    
    The amount of unfairness can be modified by writing a new value to the
    'sysctl_page_lock_unfairness' variable (default value of 5, exposed
    through /proc/sys/vm/page_lock_unfairness), but that is hopefully
    something we'd use mainly for debugging rather than being necessary for
    any deep system tuning.
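
    For debugging, the knob can be read or written like any other procfs
    file.  The sketch below assumes a kernel that carries this change and
    uses hypothetical helper names; only the path comes from this commit
    message, and writing to the real file typically requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Path named in the commit message. */
#define PAGE_LOCK_UNFAIRNESS_PATH "/proc/sys/vm/page_lock_unfairness"

/*
 * Hypothetical helper: read an integer value from a procfs-style file.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 */
static int read_int_sysctl(const char *path, int *out)
{
    FILE *f;
    int ok;

    assert(path && out);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: write a new integer value to a procfs-style file.
 * Returns 0 on success, -1 on failure.
 */
static int write_int_sysctl(const char *path, int value)
{
    FILE *f;
    int ok;

    assert(path);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = (fprintf(f, "%d\n", value) > 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

    Calling read_int_sysctl(PAGE_LOCK_UNFAIRNESS_PATH, &value) on such a
    kernel should report the current setting, and a return of -1 simply
    means the file is missing or unreadable.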
    
    This whole issue has exposed just how critical the page lock can be, and
    how contended it gets under certain loads.  And the main contention
    doesn't really seem to be anything related to IO (which was the origin
    of this lock), but for things like just verifying that the page file
    mapping is stable while faulting in the page into a page table.
    
    Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
    Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
    Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
    
    Reported-and-tested-by: Michael Larabel <Michael@michaellarabel.com>
    Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Chris Mason <clm@fb.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>