Skip to content
Snippets Groups Projects
  1. Jun 14, 2018
    • Christoph Hellwig's avatar
      dma-mapping: move all DMA mapping code to kernel/dma · cf65a0f6
      Christoph Hellwig authored
      
      Currently the code is split over various files with dma- prefixes in the
      lib/ and drives/base directories, and the number of files keeps growing.
      Move them into a single directory to keep the code together and remove
      the file name prefixes.  To match the irq infrastructure this directory
      is placed under the kernel/ directory.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      cf65a0f6
  2. Jun 08, 2018
  3. Jun 06, 2018
    • Mathieu Desnoyers's avatar
      rseq: Introduce restartable sequences system call · d7822b1e
      Mathieu Desnoyers authored
      Expose a new system call allowing each thread to register one userspace
      memory area to be used as an ABI between kernel and user-space for two
      purposes: user-space restartable sequences and quick access to read the
      current CPU number value from user-space.
      
      * Restartable sequences (per-cpu atomics)
      
      Restartables sequences allow user-space to perform update operations on
      per-cpu data without requiring heavy-weight atomic operations.
      
      The restartable critical sections (percpu atomics) work has been started
      by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
      critical sections. [1] [2] The re-implementation proposed here brings a
      few simplifications to the ABI which facilitates porting to other
      architectures and speeds up the user-space fast path.
      
      Here are benchmarks of various rseq use-cases.
      
      Test hardware:
      
      arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
      x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading
      
      The following benchmarks were all performed on a single thread.
      
      * Per-CPU statistic counter increment
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:                344.0                 31.4          11.0
      x86-64:                15.3                  2.0           7.7
      
      * LTTng-UST: write event 32-bit header, 32-bit payload into tracer
                   per-cpu buffer
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:               2502.0                 2250.0         1.1
      x86-64:               117.4                   98.0         1.2
      
      * liburcu percpu: lock-unlock pair, dereference, read/compare word
      
                      getcpu+atomic (ns/op)    rseq (ns/op)    speedup
      arm32:                751.0                 128.5          5.8
      x86-64:                53.4                  28.6          1.9
      
      * jemalloc memory allocator adapted to use rseq
      
      Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
      rseq 2016 implementation):
      
      The production workload response-time has 1-2% gain avg. latency, and
      the P99 overall latency drops by 2-3%.
      
      * Reading the current CPU number
      
      Speeding up reading the current CPU number on which the caller thread is
      running is done by keeping the current CPU number up do date within the
      cpu_id field of the memory area registered by the thread. This is done
      by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
      current thread. Upon return to user-space, a notify-resume handler
      updates the current CPU value within the registered user-space memory
      area. User-space can then read the current CPU number directly from
      memory.
      
      Keeping the current cpu id in a memory area shared between kernel and
      user-space is an improvement over current mechanisms available to read
      the current CPU number, which has the following benefits over
      alternative approaches:
      
      - 35x speedup on ARM vs system call through glibc
      - 20x speedup on x86 compared to calling glibc, which calls vdso
        executing a "lsl" instruction,
      - 14x speedup on x86 compared to inlined "lsl" instruction,
      - Unlike vdso approaches, this cpu_id value can be read from an inline
        assembly, which makes it a useful building block for restartable
        sequences.
      - The approach of reading the cpu id through memory mapping shared
        between kernel and user-space is portable (e.g. ARM), which is not the
        case for the lsl-based x86 vdso.
      
      On x86, yet another possible approach would be to use the gs segment
      selector to point to user-space per-cpu data. This approach performs
      similarly to the cpu id cache, but it has two disadvantages: it is
      not portable, and it is incompatible with existing applications already
      using the gs segment selector for other purposes.
      
      Benchmarking various approaches for reading the current CPU number:
      
      ARMv7 Processor rev 4 (v7l)
      Machine model: Cubietruck
      - Baseline (empty loop):                                    8.4 ns
      - Read CPU from rseq cpu_id:                               16.7 ns
      - Read CPU from rseq cpu_id (lazy register):               19.8 ns
      - glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
      - getcpu system call:                                     234.9 ns
      
      x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
      - Baseline (empty loop):                                    0.8 ns
      - Read CPU from rseq cpu_id:                                0.8 ns
      - Read CPU from rseq cpu_id (lazy register):                0.8 ns
      - Read using gs segment selector:                           0.8 ns
      - "lsl" inline assembly:                                   13.0 ns
      - glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
      - getcpu system call:                                      53.9 ns
      
      - Speed (benchmark taken on v8 of patchset)
      
      Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
      expectations, that enabling CONFIG_RSEQ slightly accelerates the
      scheduler:
      
      Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
      2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
      saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
      kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
      restartable sequences series applied.
      
      * CONFIG_RSEQ=n
      
      avg.:      41.37 s
      std.dev.:   0.36 s
      
      * CONFIG_RSEQ=y
      
      avg.:      40.46 s
      std.dev.:   0.33 s
      
      - Size
      
      On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
      567 bytes, and the data size increase of vmlinux is 5696 bytes.
      
      [1] https://lwn.net/Articles/650333/
      [2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
      
      
      
      Signed-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-api@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
      Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
      Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
      d7822b1e
  4. May 28, 2018
    • Masahiro Yamada's avatar
      kconfig: replace $(UNAME_RELEASE) with function call · 2972666a
      Masahiro Yamada authored
      
      Now that 'shell' function is supported, this can be self-contained in
      Kconfig.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarUlf Magnusson <ulfalizer@gmail.com>
      2972666a
    • Masahiro Yamada's avatar
      kconfig: reference environment variables directly and remove 'option env=' · 104daea1
      Masahiro Yamada authored
      
      To get access to environment variables, Kconfig needs to define a
      symbol using "option env=" syntax.  It is tedious to add a symbol entry
      for each environment variable given that we need to define much more
      such as 'CC', 'AS', 'srctree' etc. to evaluate the compiler capability
      in Kconfig.
      
      Adding '$' for symbol references is grammatically inconsistent.
      Looking at the code, the symbols prefixed with 'S' are expanded by:
       - conf_expand_value()
         This is used to expand 'arch/$ARCH/defconfig' and 'defconfig_list'
       - sym_expand_string_value()
         This is used to expand strings in 'source' and 'mainmenu'
      
      All of them are fixed values independent of user configuration.  So,
      they can be changed into the direct expansion instead of symbols.
      
      This change makes the code much cleaner.  The bounce symbols 'SRCARCH',
      'ARCH', 'SUBARCH', 'KERNELVERSION' are gone.
      
      sym_init() hard-coding 'UNAME_RELEASE' is also gone.  'UNAME_RELEASE'
      should be replaced with an environment variable.
      
      ARCH_DEFCONFIG is a normal symbol, so it should be simply referenced
      without '$' prefix.
      
      The new syntax is addicted by Make.  The variable reference needs
      parentheses, like $(FOO), but you can omit them for single-letter
      variables, like $F.  Yet, in Makefiles, people tend to use the
      parenthetical form for consistency / clarification.
      
      At this moment, only the environment variable is supported, but I will
      extend the concept of 'variable' later on.
      
      The variables are expanded in the lexer so we can simplify the token
      handling on the parser side.
      
      For example, the following code works.
      
      [Example code]
      
        config MY_TOOLCHAIN_LIST
                string
                default "My tools: CC=$(CC), AS=$(AS), CPP=$(CPP)"
      
      [Result]
      
        $ make -s alldefconfig && tail -n 1 .config
        CONFIG_MY_TOOLCHAIN_LIST="My tools: CC=gcc, AS=as, CPP=gcc -E"
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      104daea1
    • Masahiro Yamada's avatar
      kbuild: remove CONFIG_CROSS_COMPILE support · f1089c92
      Masahiro Yamada authored
      
      Kbuild provides a couple of ways to specify CROSS_COMPILE:
      
      [1] Command line
      [2] Environment
      [3] arch/*/Makefile (only some architectures)
      [4] CONFIG_CROSS_COMPILE
      
      [4] is problematic for the compiler capability tests in Kconfig.
      CONFIG_CROSS_COMPILE allows users to change the compiler prefix from
      'make menuconfig', etc.  It means, the compiler options would have
      to be all re-calculated everytime CONFIG_CROSS_COMPILE is changed.
      
      To avoid complexity and performance issues, I'd like to evaluate
      the shell commands statically, i.e. only parsing Kconfig files.
      
      I guess the majority is [1] or [2].  Currently, there are only
      5 defconfig files that specify CONFIG_CROSS_COMPILE.
        arch/arm/configs/lpc18xx_defconfig
        arch/hexagon/configs/comet_defconfig
        arch/nds32/configs/defconfig
        arch/openrisc/configs/or1ksim_defconfig
        arch/openrisc/configs/simple_smp_defconfig
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      f1089c92
  5. May 26, 2018
  6. May 18, 2018
  7. May 17, 2018
  8. May 14, 2018
  9. May 12, 2018
  10. May 07, 2018
  11. Apr 11, 2018
  12. Apr 08, 2018
  13. Apr 06, 2018
    • Steven Rostedt (VMware)'s avatar
      init, tracing: Have printk come through the trace events for initcall_debug · 4e37958d
      Steven Rostedt (VMware) authored
      
      With trace events set before and after the initcall function calls, instead
      of having a separate routine for printing out the initcalls when
      initcall_debug is specified on the kernel command line, have the code
      register a callback to the tracepoints where the initcall trace events are.
      
      This removes the need for having a separate function to do the initcalls as
      the tracepoint callbacks can handle the printk. It also includes other
      initcalls that are not called by the do_one_initcall() which includes
      console and security initcalls.
      
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      4e37958d
    • Steven Rostedt (VMware)'s avatar
      init, tracing: Add initcall trace events · 4ee7c60d
      Steven Rostedt (VMware) authored
      
      Being able to trace the start and stop of initcalls is useful to see where
      the timings are an issue. There is already an "initcall_debug" parameter,
      but that can cause a large overhead itself, as the printing of the
      information may take longer than the initcall functions.
      
      Adding in a start and finish trace event around the initcall functions, as
      well as a trace event that records the level of the initcalls, one can get a
      much finer measurement of the times and interactions of the initcalls
      themselves, as trace events are much lighter than printk()s.
      
      Suggested-by: default avatarAbderrahmane Benbachir <abderrahmane.benbachir@polymtl.ca>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      4ee7c60d
  14. Apr 05, 2018
    • Dominik Brodowski's avatar
      syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls · 7303e30e
      Dominik Brodowski authored
      
      It may be useful for an architecture to override the definitions of the
      COMPAT_SYSCALL_DEFINE0() and __COMPAT_SYSCALL_DEFINEx() macros in
      <linux/compat.h>, in particular to use a different calling convention
      for syscalls. This patch provides a mechanism to do so, based on the
      previously introduced CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled,
      <asm/sycall_wrapper.h> is included in <linux/compat.h> and may be used
      to define the macros mentioned above. Moreover, as the syscall calling
      convention may be different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set,
      the compat syscall function prototypes in <linux/compat.h> are #ifndef'd
      out in that case.
      
      As some of the syscalls and/or compat syscalls may not be present,
      the COND_SYSCALL() and COND_SYSCALL_COMPAT() macros in kernel/sys_ni.c
      as well as the SYS_NI() and COMPAT_SYS_NI() macros in
      kernel/time/posix-stubs.c can be re-defined in <asm/syscall_wrapper.h> iff
      CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
      
      Signed-off-by: default avatarDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-5-linux@dominikbrodowski.net
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      7303e30e
    • Dominik Brodowski's avatar
      syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y · 1bd21c6c
      Dominik Brodowski authored
      
      It may be useful for an architecture to override the definitions of the
      SYSCALL_DEFINE0() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>,
      in particular to use a different calling convention for syscalls.
      
      This patch provides a mechanism to do so: It introduces
      CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled, <asm/sycall_wrapper.h>
      is included in <linux/syscalls.h> and may be used to define the macros
      mentioned above. Moreover, as the syscall calling convention may be
      different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set, the syscall function
      prototypes in <linux/syscalls.h> are #ifndef'd out in that case.
      
      Signed-off-by: default avatarDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180405095307.3730-3-linux@dominikbrodowski.net
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1bd21c6c
  15. Apr 02, 2018
Loading