  1. Jan 30, 2019
    • cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · b284909a
      Josh Poimboeuf authored
      
      With the following commit:
      
        73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      
      ... the hotplug code attempted to detect when SMT was disabled by BIOS,
      in which case it reported SMT as permanently disabled.  However, that
      code broke a virt hotplug scenario, where the guest is booted with only
      primary CPU threads, and a sibling is brought online later.
      
      The problem is that there doesn't seem to be a way to reliably
      distinguish between the HW "SMT disabled by BIOS" case and the virt
      "sibling not yet brought online" case.  So the above-mentioned commit
      was a bit misguided, as it permanently disabled SMT for both cases,
      preventing future virt sibling hotplugs.
      
      Going back and reviewing the original problems that the above commit
      attempted to solve, when SMT was disabled in BIOS:
      
        1) /sys/devices/system/cpu/smt/control showed "on" instead of
           "notsupported"; and
      
        2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
      
      I'd propose that we instead consider #1 above to not actually be a
      problem.  Because, at least in the virt case, it's possible that SMT
      wasn't disabled by BIOS and a sibling thread could be brought online
      later.  So it makes sense to just always default the smt control to "on"
      to allow for that possibility (assuming cpuid indicates that the CPU
      supports SMT).
      
      The real problem is #2, which has a simple fix: change vmx_vm_init() to
      query the actual current SMT state -- i.e., whether any siblings are
      currently online -- instead of looking at the SMT "control" sysfs value.
      
      So fix it by:
      
        a) reverting the original "fix" and its followup fix:
      
           73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
           bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")
      
           and
      
        b) changing vmx_vm_init() to query the actual current SMT state --
           instead of the sysfs control value -- to determine whether the L1TF
           warning is needed.  This also requires the 'sched_smt_present'
           variable to be exported, instead of 'cpu_smt_control'.
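
      A minimal sketch of the idea behind b) (simplified, not the literal
      diff; sched_smt_active() is assumed here to be the accessor wrapping
      the exported 'sched_smt_present' state, and L1TF_MSG_SMT is the
      existing warning string):

          static int vmx_vm_init(struct kvm *kvm)
          {
                  /*
                   * Warn about L1TF only if a sibling thread is actually
                   * online right now, not based on the sysfs control value.
                   */
                  if (boot_cpu_has(X86_BUG_L1TF) && sched_smt_active())
                          pr_warn_once(L1TF_MSG_SMT);

                  return 0;
          }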
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
  2. Aug 07, 2018
    • cpu/hotplug: Fix SMT supported evaluation · bc2d8d26
      Thomas Gleixner authored
      
      Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
      cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
      on the kernel command line as it cannot differentiate between SMT disabled
      by BIOS and SMT soft disable via 'nosmt'. That wrecks the state and
      makes the sysfs interface unusable.
      
      Rework this so that during bringup of the non boot CPUs the availability of
      SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
      'primary' thread then set the local cpu_smt_available marker and evaluate
      this explicitly right after the initial SMP bringup has finished.
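
      A rough sketch of that idea (illustrative only; the helper names here
      are made up for the sketch, while topology_is_primary_thread(),
      cpu_smt_control and CPU_SMT_NOT_SUPPORTED are existing symbols):

          static bool cpu_smt_available __read_mostly;

          /* Called for each CPU as it is brought up (simplified). */
          static void cpu_smt_note_thread(unsigned int cpu)
          {
                  /* Seeing a non-primary sibling means SMT is available. */
                  if (!topology_is_primary_thread(cpu))
                          cpu_smt_available = true;
          }

          /* Called once right after the initial SMP bringup has finished. */
          static void __init cpu_smt_finalize_state(void)
          {
                  if (!cpu_smt_available)
                          cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
          }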
      
      SMT evaluation on x86 is a trainwreck as the firmware has all the
      information _before_ booting the kernel, but there is no interface to query
      it.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. Aug 29, 2017
    • smp: Avoid using two cache lines for struct call_single_data · 966a9671
      Ying Huang authored
      
      struct call_single_data is used in IPIs to transfer information between
      CPUs.  Its size is bigger than sizeof(unsigned long) and less than
      cache line size.  Currently it is not allocated with any explicit alignment
      requirements.  This makes it possible for allocated call_single_data to
      cross two cache lines, which doubles the number of cache lines that
      need to be transferred among CPUs.
      
      This can be fixed by requiring call_single_data to be aligned with the
      size of call_single_data. Currently the size of call_single_data is a
      power of 2.  If we add new fields to call_single_data, we may need to
      add padding to make sure the size of the new definition is a power of 2
      as well.
      
      Fortunately, this is enforced by GCC, which will report bad sizes.
      
      To set alignment requirements of call_single_data to the size of
      call_single_data, a struct definition and a typedef are used.
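
      The resulting definition looks roughly like this (a sketch of the
      approach; note that __aligned() itself requires the alignment to be a
      power of 2, which is why the size constraint above matters):

          struct __call_single_data {
                  struct llist_node llist;
                  smp_call_func_t func;
                  void *info;
                  unsigned int flags;
          };

          /* Use __aligned() to avoid using two cache lines for one csd. */
          typedef struct __call_single_data call_single_data_t
                  __aligned(sizeof(struct __call_single_data));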
      
      To test the effect of the patch, I used the vm-scalability multiple
      thread swap test case (swap-w-seq-mt).  The test will create multiple
      threads and each thread will eat memory until all RAM and part of swap
      is used, so that a huge number of IPIs are triggered when unmapping
      memory.  In the test, the throughput of memory writing improves ~5%
      compared with misaligned call_single_data, because of faster IPIs.
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Huang, Ying <ying.huang@intel.com>
      [ Add call_single_data_t and align with size of call_single_data. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. May 23, 2017
    • smp, cpumask: Use non-atomic cpumask_{set,clear}_cpu() · 6c8557bd
      Peter Zijlstra authored
      
      The cpumasks in smp_call_function_many() are private and not subject
      to concurrency, so atomic bitops are pointless and expensive.
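
      A sketch of the resulting pattern (the helper name cfd_prepare_mask()
      is made up for illustration; the cpumask accessors are the existing
      ones):

          static void cfd_prepare_mask(struct call_function_data *cfd,
                                       const struct cpumask *mask)
          {
                  /*
                   * cfd->cpumask is private to this call, so the non-atomic
                   * __cpumask_clear_cpu() is sufficient and cheaper than the
                   * locked cpumask_clear_cpu() variant.
                   */
                  cpumask_and(cfd->cpumask, mask, cpu_online_mask);
                  __cpumask_clear_cpu(smp_processor_id(), cfd->cpumask);
          }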
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • smp: Avoid sending needless IPI in smp_call_function_many() · 3fc5b3b6
      Aaron Lu authored
      
      An Inter-Processor Interrupt (IPI) is needed when a page is unmapped
      and the process' mm_cpumask() shows the process has ever run on other
      CPUs. Page migration and page reclaim also need IPIs. The number of
      IPIs that need to be sent to different CPUs is especially large for
      multi-threaded workloads, since mm_cpumask() is per process.
      
      For smp_call_function_many(), whenever a CPU queues a CSD to a target
      CPU, it will send an IPI to let the target CPU handle the work.
      This isn't necessary - we only need to send an IPI when queueing a CSD
      to an empty call_single_queue.
      
      The reason:
      
      flush_smp_call_function_queue() that is called upon a CPU receiving an
      IPI will empty the queue and then handle all of the CSDs there. So if
      the target CPU's call_single_queue is not empty, we know that:
      i.  An IPI for the target CPU has already been sent by 'previous queuers';
      ii. flush_smp_call_function_queue() hasn't emptied that CPU's queue yet.
      Thus, it's safe for us to just queue our CSD there without sending an
      additional IPI. And for the 'previous queuers', we can limit it to the
      first queuer.
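
      The key observation maps onto a very small code change; roughly (the
      surrounding code in kernel/smp.c is elided):

          /*
           * llist_add() returns true only if the list was empty before the
           * add, i.e. we are the first queuer - only then is an IPI needed.
           */
          if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
                  arch_send_call_function_single_ipi(cpu);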
      
      To demonstrate the effect of this patch, a multi-threaded workload
      that spawns 80 threads to equally consume 100G of memory is used. This
      is tested on a 2-node Broadwell-EP with 44 cores/88 threads and 32G of
      memory, so once the 32G of memory is used up, a lot of page reclaim
      starts to happen.
      
      With this patch, the number of IPIs dropped by 88% and throughput
      increased by about 15% for the above workload.
      
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170519075331.GE2084@aaronlu.sh.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. Mar 01, 2016
    • cpu/hotplug: Create hotplug threads · 4cb28ced
      Thomas Gleixner authored
      
      In order to let the hotplugged cpu take care of the setup/teardown, we need a
      separate hotplug thread.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.454541272@linutronix.de
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. Nov 07, 2015
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd · d0164adc
      Mel Gorman authored
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
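
      For reference, the gfpflags_allow_blocking() helper mentioned above is
      essentially a one-line check on the new flag (a sketch, close to but
      not necessarily identical to the upstream definition):

          static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
          {
                  /* Blocking is allowed iff direct reclaim is allowed. */
                  return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
          }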
      
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. Apr 17, 2015
    • smp: Fix smp_call_function_single_async() locking · 8053871d
      Linus Torvalds authored
      
      The current smp_call_function code suffers from a number of problems,
      most notably that smp_call_function_single_async() is broken.
      
      The problem is that flush_smp_call_function_queue() does csd_unlock()
      _after_ calling csd->func(). This means that a caller cannot properly
      synchronize the csd usage as it has to.
      
      Change the code to release the csd before calling ->func() for the
      async case, and put a WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK) in
      smp_call_function_single_async() to warn us of improper serialization,
      because any waiting there can result in deadlocks when called with
      IRQs disabled.
      
      Rename the (currently) unused WAIT flag to SYNCHRONOUS and (re)use it
      such that we know what to do in flush_smp_call_function_queue().
      
      Rework csd_{,un}lock() to use smp_load_acquire() / smp_store_release()
      to avoid some full barriers while more clearly providing lock
      semantics.
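
      A simplified sketch of the resulting lock/unlock pair (not the exact
      upstream code, which carries additional sanity checks):

          static void csd_lock_wait(struct call_single_data *csd)
          {
                  /* ACQUIRE: pairs with the RELEASE in csd_unlock(). */
                  while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
                          cpu_relax();
          }

          static void csd_unlock(struct call_single_data *csd)
          {
                  /*
                   * RELEASE: all prior accesses to the csd are completed
                   * before the owner sees it unlocked and may reuse it.
                   */
                  smp_store_release(&csd->flags, 0);
          }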
      
      Finally move the csd maintenance out of generic_exec_single() into its
      callers for clearer code.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [ Added changelog. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Rafael David Tinoco <inaddy@ubuntu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/CA+55aFz492bzLFhdbKN-Hygjcreup7CjMEYk3nTSfRWjppz-OA@mail.gmail.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. Sep 19, 2014
    • smp: Add new wake_up_all_idle_cpus() function · c6f4459f
      Chuansheng Liu authored
      
      Currently kick_all_cpus_sync() can break non-polling idle cpus out of
      idle through IPIs.

      But sometimes we need to break the polling idle cpus out of idle
      immediately so that they reselect a suitable C-state, while for
      non-idle cpus nothing should be done when we try to wake them up.

      Add a new function, wake_up_all_idle_cpus(), which takes all cpus out
      of idle, based on wake_up_if_idle().
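
      A sketch of the new helper (simplified, not necessarily the literal
      implementation):

          void wake_up_all_idle_cpus(void)
          {
                  int cpu;

                  preempt_disable();
                  for_each_online_cpu(cpu) {
                          /* Waking ourselves up is pointless. */
                          if (cpu == smp_processor_id())
                                  continue;

                          /* A no-op for CPUs that are not idle. */
                          wake_up_if_idle(cpu);
                  }
                  preempt_enable();
          }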
      
      Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: daniel.lezcano@linaro.org
      Cc: rjw@rjwysocki.net
      Cc: linux-pm@vger.kernel.org
      Cc: changcheng.liu@intel.com
      Cc: xiaoming.wang@intel.com
      Cc: souvik.k.chakravarty@intel.com
      Cc: luto@amacapital.net
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1409815075-4180-2-git-send-email-chuansheng.liu@intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. Jun 23, 2014
    • CPU hotplug, smp: flush any pending IPI callbacks before CPU offline · 8d056c48
      Srivatsa S. Bhat authored
      
      There is a race between the CPU offline code (within stop-machine) and
      the smp-call-function code, which can lead to getting IPIs on the
      outgoing CPU, *after* it has gone offline.
      
      Specifically, this can happen when using
      smp_call_function_single_async() to send the IPI, since this API allows
      sending asynchronous IPIs from IRQ disabled contexts.  The exact race
      condition is described below.
      
      During CPU offline, in stop-machine, we don't enforce any rule in the
      _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
      the other CPUs disable their local interrupts.  Due to this, we can
      encounter a situation in which an IPI is sent by one of the other CPUs
      to the outgoing CPU (while it is *still* online), but the outgoing CPU
      ends up noticing it only *after* it has gone offline.
      
                    CPU 1                                         CPU 2
                (Online CPU)                               (CPU going offline)
      
             Enter _PREPARE stage                          Enter _PREPARE stage
      
                                                           Enter _DISABLE_IRQ stage
      
                                                         =
             Got a device interrupt, and                 | Didn't notice the IPI
             the interrupt handler sent an               | since interrupts were
             IPI to CPU 2 using                          | disabled on this CPU.
             smp_call_function_single_async()            |
                                                         =
      
             Enter _DISABLE_IRQ stage
      
             Enter _RUN stage                              Enter _RUN stage
      
                                        =
             Busy loop with interrupts  |                  Invoke take_cpu_down()
             disabled.                  |                  and take CPU 2 offline
                                        =
      
             Enter _EXIT stage                             Enter _EXIT stage
      
             Re-enable interrupts                          Re-enable interrupts
      
                                                           The pending IPI is noted
                                                           immediately, but alas,
                                                           the CPU is offline at
                                                           this point.
      
      This of course, makes the smp-call-function IPI handler code running on
      CPU 2 unhappy and it complains about "receiving an IPI on an offline
      CPU".
      
      One real example of the scenario on CPU 1 is the block layer's
      complete-request call-path:
      
      	__blk_complete_request() [interrupt-handler]
      	    raise_blk_irq()
      	        smp_call_function_single_async()
      
      However, if we look closely, the block layer does check that the target
      CPU is online before firing the IPI.  So in this case, it is actually
      the unfortunate ordering/timing of events in the stop-machine phase that
      leads to receiving IPIs after the target CPU has gone offline.
      
      In reality, getting a late IPI on an offline CPU is not too bad by
      itself (this can happen even due to hardware latencies in IPI
      send-receive).  It is a bug only if the target CPU really went offline
      without executing all the callbacks queued on its list.  (Note that a
      CPU is free to execute its pending smp-call-function callbacks in a
      batch, without waiting for the corresponding IPIs to arrive for each one
      of those callbacks).
      
      So, fixing this issue can be broken up into two parts:
      
      1. Ensure that a CPU goes offline only after executing all the
         callbacks queued on it.
      
      2. Modify the warning condition in the smp-call-function IPI handler
         code such that it warns only if an offline CPU got an IPI *and* that
         CPU had gone offline with callbacks still pending in its queue.
      
      Achieving part 1 is straightforward - just flush (execute) all the
      queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
      including those callbacks for which the source CPU's IPIs might not have
      been received on the outgoing CPU yet.  Once we do this, an IPI that
      arrives late on the CPU going offline (either due to the race mentioned
      above, or due to hardware latencies) will be completely harmless, since
      the outgoing CPU would have executed all the queued callbacks before
      going offline.
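
      Concretely, the flush boils down to something like the following
      fragment of the hotplug notifier's switch statement (a sketch; the
      warn_cpu_offline parameter follows the description in part 2):

          case CPU_DYING:
          case CPU_DYING_FROZEN:
                  /*
                   * IPIs queued by other CPUs may still be in flight. Flush
                   * the pending callbacks now, and skip the "offline CPU"
                   * warning since being offline is expected at this point.
                   */
                  flush_smp_call_function_queue(false);
                  break;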
      
      Overall, this fix (parts 1 and 2 put together) additionally guarantees
      that we will see a warning only when the *IPI-sender code* is buggy -
      that is, if it queues the callback _after_ the target CPU has gone
      offline.
      
      [1].  The CPU_DYING part needs a little more explanation: by the time we
      execute the CPU_DYING notifier callbacks, the CPU would have already
      been marked offline.  But we want to flush out the pending callbacks at
      this stage, ignoring the fact that the CPU is offline.  So restructure
      the IPI handler code so that we can by-pass the "is-cpu-offline?" check
      in this particular case.  (Of course, the right solution here is to fix
      CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
      notifiers, but this requires a lot of audit to ensure that this change
      doesn't break any existing code; hence let's go with the solution
      proposed above until that is done).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Sachin Kamat <sachin.kamat@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. Jun 16, 2014
    • irq_work: Implement remote queueing · 47885016
      Frederic Weisbecker authored
      
      irq work currently only supports local callbacks. However its code
      is mostly ready to run remote callbacks and we have some potential users.
      
      The full nohz subsystem currently open codes its own remote irq work
      on top of the scheduler IPI when it wants a CPU to reevaluate its next
      tick. However this ad hoc solution bloats the scheduler IPI.
      
      Let's just extend the irq work subsystem to support remote queuing on top
      of the generic SMP IPI to handle this kind of user. This shouldn't add
      noticeable overhead.
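
      Usage then looks roughly like this (the per-cpu work and kick helper
      names are illustrative; irq_work_queue_on() is the new interface):

          static void nohz_full_kick_func(struct irq_work *work)
          {
                  /* Runs on the target CPU from irq-work context. */
          }

          static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
                  .func = nohz_full_kick_func,
          };

          /* Ask @cpu to reevaluate its tick; usable from irq context. */
          static void kick_remote_cpu(int cpu)
          {
                  irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
          }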
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  11. Jun 06, 2014
    • smp: print more useful debug info upon receiving IPI on an offline CPU · a219ccf4
      Srivatsa S. Bhat authored
      
      There is a longstanding problem related to CPU hotplug which causes IPIs
      to be delivered to offline CPUs, and the smp-call-function IPI handler
      code prints out a warning whenever this is detected.  Every once in a
      while this (usually harmless) warning gets reported on LKML, but so far
      it has not been completely fixed.  Usually the solution involves finding
      out the IPI sender and fixing it by adding appropriate synchronization
      with CPU hotplug.
      
      However, while going through one such internal bug report, I found that
      there is a significant bug in the receiver side itself (more
      specifically, in stop-machine) that can lead to this problem even when
      the sender code is perfectly fine.  This patchset fixes that
      synchronization problem in the CPU hotplug stop-machine code.
      
      Patch 1 adds some additional debug code to the smp-call-function
      framework, to help debug such issues easily.
      
      Patch 2 modifies the stop-machine code to ensure that any IPIs that were
      sent while the target CPU was online, would be noticed and handled by
      that CPU without fail before it goes offline.  Thus, this avoids
      scenarios where IPIs are received on offline CPUs (as long as the sender
      uses proper hotplug synchronization).
      
      In fact, I debugged the problem by using Patch 1, and found that the
      payload of the IPI was always the block layer's trigger_softirq()
      function.  But I was not able to find anything wrong with the block
      layer code.  That's when I started looking at the stop-machine code and
      realized that there is a race-window which makes the IPI _receiver_ the
      culprit, not the sender.  Patch 2 fixes that race and hence this should
      put an end to most of the hard-to-debug IPI-to-offline-CPU issues.
      
      This patch (of 2):
      
      Today the smp-call-function code just prints a warning if we get an IPI
      on an offline CPU.  This info is sufficient to let us know that
      something went wrong, but often it is very hard to debug exactly who
      sent the IPI and why, from this info alone.
      
      In most cases, we get the warning about the IPI to an offline CPU,
      immediately after the CPU going offline comes out of the stop-machine
      phase and reenables interrupts.  Since all online CPUs participate in
      stop-machine, the information regarding the sender of the IPI is already
      lost by the time we exit the stop-machine loop.  So even if we dump the
      stack on each CPU at this point, we won't find anything useful since all
      of them will show the stack-trace of the stopper thread.  So we need a
      better way to figure out who sent the IPI and why.
      
      To achieve this, when we detect an IPI targeted to an offline CPU, loop
      through the call-single-data linked list and print out the payload
      (i.e., the name of the function which was supposed to be executed by the
      target CPU).  This would give us an insight as to who might have sent
      the IPI and help us debug this further.
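
      The debug print is essentially a walk of the flushed csd list (a
      sketch; 'entry' is assumed to hold the llist of csds taken from this
      CPU's call_single_queue, and %pS prints the function's symbol name):

          struct call_single_data *csd;

          WARN(1, "IPI on offline CPU %d\n", smp_processor_id());

          llist_for_each_entry(csd, entry, llist)
                  pr_warn("IPI callback %pS sent to offline CPU\n",
                          csd->func);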
      
      [akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. Feb 24, 2014
    • smp: Rename __smp_call_function_single() to smp_call_function_single_async() · c46fff2a
      Frederic Weisbecker authored
      
      The name __smp_call_function_single() doesn't tell much about the
      properties of this function, especially when compared to
      smp_call_function_single().
      
      The comments above the implementation are also misleading. The main
      point of this function is actually not to be able to embed the csd
      in an object. This is actually a requirement that results from the
      purpose of this function, which is to raise an IPI asynchronously.
      
      As such it can be called with interrupts disabled. And this feature
      comes at the cost of the caller who then needs to serialize the
      IPIs on this csd.
      
      Let's rename the function and enhance the comments so that they reflect
      these properties.
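
      After the rename, the asynchronous pattern reads roughly like this (a
      sketch; the object and callback names are made up for illustration):

          struct my_req {
                  struct call_single_data csd;    /* embedded, caller-owned */
                  int payload;
          };

          static void my_req_done(void *info)
          {
                  struct my_req *req = info;

                  /* Runs on the target CPU, from IPI context. */
                  pr_info("payload %d handled remotely\n", req->payload);
          }

          static void my_req_fire(struct my_req *req, int cpu)
          {
                  /*
                   * Fire-and-forget IPI; safe with IRQs disabled.  The
                   * caller must serialize: req->csd must not be reused
                   * until this invocation has completed.
                   */
                  req->csd.func = my_req_done;
                  req->csd.info = req;
                  smp_call_function_single_async(cpu, &req->csd);
          }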
      
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • smp: Remove wait argument from __smp_call_function_single() · fce8ad15
      Frederic Weisbecker authored
      
      The main point of calling __smp_call_function_single() is to send
      an IPI in a pure asynchronous way. By embedding a csd in an object,
      a caller can send the IPI without waiting for a previous one to complete
      as is required by smp_call_function_single() for example. As such,
      sending this kind of IPI can be safe even when irqs are disabled.
      
      This flexibility comes at the expense of the caller who then needs to
      synchronize the csd lifecycle itself and make sure that IPIs on a
      single csd are serialized.
      
      This is how __smp_call_function_single() works when wait = 0 and this
      usecase is relevant.
      
      Now there doesn't seem to be any use case with wait = 1 that can't be
      covered by smp_call_function_single() instead, which is safer. Let's
      look at the two possible scenarios:
      
      1) The user calls __smp_call_function_single(wait = 1) on a csd embedded
         in an object. It looks like a nice and convenient pattern at first
         sight because we can then retrieve the object from the IPI handler
         easily.
      
         But actually it is a waste of memory space in the object since the csd
         can be allocated from the stack by smp_call_function_single(wait = 1)
         and the object can be passed as the IPI argument.
      
         Besides that, embedding the csd in an object is more error prone
         because the caller must take care of the serialization of the IPIs
         for this csd.
      
      2) The user calls __smp_call_function_single(wait = 1) on a csd that
         is allocated on the stack. It's ok but smp_call_function_single()
         can do it as well and it already takes care of the allocation on the
         stack. Again it's simpler and less error prone.
      
      Therefore, using the underscore-prepended API version with wait = 1
      is a bad pattern and a sign that the caller can do something safer and
      simpler.
      
      There was a single user of that which has just been converted.
      So let's remove this option to discourage further users.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • smp: Move __smp_call_function_single() below its safe version · d7877c03
      Frederic Weisbecker authored
      
      Move __smp_call_function_single() closer to smp_call_function_single().
      These functions have very similar behavior and should be displayed in
      the same block for clarity.
      
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • smp: Consolidate the various smp_call_function_single() declensions · 8b28499a
      Frederic Weisbecker authored
      
      __smp_call_function_single() and smp_call_function_single() share some
      code that can be factorized: execute inline when the target is local,
      check if the target is online, lock the csd, call generic_exec_single().
      
      Let's move the common parts to generic_exec_single().
      
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • smp: Teach __smp_call_function_single() to check for offline cpus · 08eed44c
      Jan Kara authored
      
      Align __smp_call_function_single() with smp_call_function_single() so
      that it also checks whether the requested cpu is still online.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • smp: Iterate functions through llist_for_each_entry_safe() · 5fd77595
      Jan Kara authored
      
      The IPI function llist iteration is open coded. Let's simplify this
      by using an llist iterator.
      
      Also we want to keep the iteration safe against possible
      csd.llist->next value reuse from the IPI handler. At least the block
      subsystem used to do such things, so let's stay careful and use
      llist_for_each_entry_safe().
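
      The resulting iteration looks roughly like this (a sketch; 'head' is
      the per-cpu call_single_queue):

          struct call_single_data *csd, *csd_next;
          struct llist_node *entry;

          entry = llist_del_all(head);

          /*
           * _safe variant: the callback may reuse or re-queue its own csd,
           * so the next node is fetched before the csd is handed back.
           */
          llist_for_each_entry_safe(csd, csd_next, entry, llist) {
                  csd->func(csd->info);
                  csd_unlock(csd);
          }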
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>