Skip to content
Snippets Groups Projects
  1. Apr 06, 2019
    • Kirill Smelkov's avatar
      fs: stream_open - opener for stream-like files so that read and write can run... · 10dce8af
      Kirill Smelkov authored
      fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
      
      Commit 9c225f26 ("vfs: atomic f_pos accesses as per POSIX") added
      locking for file.f_pos access and in particular made concurrent read and
      write not possible - now both those functions take f_pos lock for the
      whole run, and so if e.g. a read is blocked waiting for data, write will
      deadlock waiting for that read to complete.
      
      This caused regression for stream-like files where previously read and
      write could run simultaneously, but after that patch could not do so
      anymore. See e.g. commit 581d21a2 ("xenbus: fix deadlock on writes
      to /proc/xen/xenbus") which fixes such regression for particular case of
      /proc/xen/xenbus.
      
      The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
      safety for read/write/lseek and added the locking to file descriptors of
      all regular files. In 2014 that thread-safety problem was not new as it
      was already discussed earlier in 2006.
      
      However even though 2006'th version of Linus's patch was adding f_pos
      locking "only for files that are marked seekable with FMODE_LSEEK (thus
      avoiding the stream-like objects like pipes and sockets)", the 2014
      version - the one that actually made it into the tree as 9c225f26 -
      is doing so irregardless of whether a file is seekable or not.
      
      See
      
          https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
          https://lwn.net/Articles/180387
          https://lwn.net/Articles/180396
      
      for historic context.
      
      The reason that it did so is, probably, that there are many files that
      are marked non-seekable, but e.g. their read implementation actually
      depends on knowing current position to correctly handle the read. Some
      examples:
      
      	kernel/power/user.c		snapshot_read
      	fs/debugfs/file.c		u32_array_read
      	fs/fuse/control.c		fuse_conn_waiting_read + ...
      	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
      	arch/s390/hypfs/inode.c		hypfs_read_iter
      	...
      
      Despite that, many nonseekable_open users implement read and write with
      pure stream semantics - they don't depend on passed ppos at all. And for
      those cases where read could wait for something inside, it creates a
      situation similar to xenbus - the write could be never made to go until
      read is done, and read is waiting for some, potentially external, event,
      for potentially unbounded time -> deadlock.
      
      Besides xenbus, there are 14 such places in the kernel that I've found
      with semantic patch (see below):
      
      	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
      	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
      	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
      	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
      	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
      	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
      	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
      	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
      	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
      	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
      	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
      	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
      	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
      	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
      
      In addition to the cases above another regression caused by f_pos
      locking is that now FUSE filesystems that implement open with
      FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
      stream-like files - for the same reason as above e.g. read can deadlock
      write locking on file.f_pos in the kernel.
      
      FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990 ("fuse:
      implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
      in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
      write routines not depending on current position at all, and with both
      read and write being potentially blocking operations:
      
      See
      
          https://github.com/libfuse/osspd
          https://lwn.net/Articles/308445
      
          https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
          https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
          https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
      
      Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
      "somewhat pipe-like files ..." with read handler not using offset.
      However that test implements only read without write and cannot exercise
      the deadlock scenario:
      
          https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
          https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
          https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216
      
      I've actually hit the read vs write deadlock for real while implementing
      my FUSE filesystem where there is /head/watch file, for which open
      creates separate bidirectional socket-like stream in between filesystem
      and its user with both read and write being later performed
      simultaneously. And there it is semantically not easy to split the
      stream into two separate read-only and write-only channels:
      
          https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
      
      Let's fix this regression. The plan is:
      
      1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
         doing so would break many in-kernel nonseekable_open users which
         actually use ppos in read/write handlers.
      
      2. Add stream_open() to kernel to open stream-like non-seekable file
         descriptors. Read and write on such file descriptors would never use
         nor change ppos. And with that property on stream-like files read and
         write will be running without taking f_pos lock - i.e. read and write
         could be running simultaneously.
      
      3. With semantic patch search and convert to stream_open all in-kernel
         nonseekable_open users for which read and write actually do not
         depend on ppos and where there is no other methods in file_operations
         which assume @offset access.
      
      4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
         steam_open if that bit is present in filesystem open reply.
      
         It was tempting to change fs/fuse/ open handler to use stream_open
         instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
         grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
         and in particular GVFS which actually uses offset in its read and
         write handlers
      
      	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
      	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
      
      
      
         so if we would do such a change it will break a real user.
      
      5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
         from v3.14+ (the kernel where 9c225f26 first appeared).
      
         This will allow to patch OSSPD and other FUSE filesystems that
         provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
         in their open handler and this way avoid the deadlock on all kernel
         versions. This should work because fs/fuse/ ignores unknown open
         flags returned from a filesystem and so passing FOPEN_STREAM to a
         kernel that is not aware of this flag cannot hurt. In turn the kernel
         that is not aware of FOPEN_STREAM will be < v3.14 where just
         FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
         write deadlock.
      
      This patch adds stream_open, converts /proc/xen/xenbus to it and adds
      semantic patch to automatically locate in-kernel places that are either
      required to be converted due to read vs write deadlock, or that are just
      safe to be converted because read and write do not use ppos and there
      are no other funky methods in file_operations.
      
      Regarding semantic patch I've verified each generated change manually -
      that it is correct to convert - and each other nonseekable_open instance
      left - that it is either not correct to convert there, or that it is not
      converted due to current stream_open.cocci limitations.
      
      The script also does not convert files that should be valid to convert,
      but that currently have .llseek = noop_llseek or generic_file_llseek for
      unknown reason despite file being opened with nonseekable_open (e.g.
      drivers/input/mousedev.c)
      
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yongzhi Pan <panyongzhi@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Nikolaus Rath <Nikolaus@rath.org>
      Cc: Han-Wen Nienhuys <hanwen@google.com>
      Signed-off-by: default avatarKirill Smelkov <kirr@nexedi.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      10dce8af
  2. Mar 29, 2019
  3. Mar 28, 2019
  4. Mar 17, 2019
    • Masahiro Yamada's avatar
      kconfig: remove stale lxdialog/.gitignore · c71bb9f8
      Masahiro Yamada authored
      
      When this .gitignore was added, lxdialog was an independent hostprogs-y.
      
      Now that all objects in lxdialog/ are directly linked to mconf, the
      lxdialog is no longer generated.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      c71bb9f8
    • Masahiro Yamada's avatar
      kbuild: force all architectures except um to include mandatory-y · 037fc336
      Masahiro Yamada authored
      
      Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes
      the common Kbuild.asm file. Factor out the duplicated include directives
      to scripts/Makefile.asm-generic so that no architecture would opt out
      of the mandatory-y mechanism.
      
      um is not forced to include mandatory-y since it is a very exceptional
      case which does not support UAPI.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      037fc336
    • Masahiro Yamada's avatar
      kbuild: warn redundant generic-y · 7cbbbb8b
      Masahiro Yamada authored
      
      The generic-y is redundant under the following condition:
      
       - arch has its own implementation
      
       - the same header is added to generated-y
      
       - the same header is added to mandatory-y
      
      If a redundant generic-y is found, the warning like follows is displayed:
      
        scripts/Makefile.asm-generic:20: redundant generic-y found in arch/arm/include/asm/Kbuild: timex.h
      
      I fixed up arch Kbuild files found by this.
      
      Suggested-by: default avatarSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      7cbbbb8b
    • Douglas Anderson's avatar
      Revert "modsign: Abort modules_install when signing fails" · f84dde10
      Douglas Anderson authored
      This reverts commit caf6fe91.
      
      The commit was fine but is no longer needed as of commit 3a2429e1
      ("kbuild: change if_changed_rule for multi-line recipe").  Let's go
      back to using ";" to be consistent.
      
      For some discussion, see:
      
      https://lkml.kernel.org/r/CAK7LNASde0Q9S5GKeQiWhArfER4S4wL1=R_FW8q0++_X3T5=hQ@mail.gmail.com
      
      
      
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      f84dde10
    • Arseny Maslennikov's avatar
      kbuild: deb-pkg: avoid implicit effects · f6d9db63
      Arseny Maslennikov authored
      
      * The man page for dpkg-source(1) notes:
      
      >      -b, --build directory [format-specific-parameters]
      >             Build  a  source  package  (--build since dpkg 1.17.14).
      >             <...>
      >
      >             dpkg-source will build the source package with the first
      >             format found in this ordered list: the format  indicated
      >             with  the  --format  command  line  option,  the  format
      >             indicated in debian/source/format, “1.0”.  The  fallback
      >             to “1.0” is deprecated and will be removed at some point
      >             in the future, you should always  document  the  desired
      >             source   format  in  debian/source/format.  See  section
      >             SOURCE PACKAGE FORMATS for an extensive  description  of
      >             the various source package formats.
      
        Thus it would be more foolproof to explicitly use 1.0 (as we always
        did) than to rely on dpkg-source's defaults.
      
      * In a similar vein, debian/rules is not made executable by mkdebian,
        and dpkg-source warns about that but still silently fixes the file.
        Let's be explicit once again.
      
      Signed-off-by: default avatarArseny Maslennikov <ar@cs.msu.ru>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      f6d9db63
    • Wen Yang's avatar
      coccinelle: semantic code search for missing put_device() · da9cfb87
      Wen Yang authored
      
      The of_find_device_by_node() takes a reference to the underlying device
      structure, we should release that reference.
      The implementation of this semantic code search is:
      In a function, for a local variable returned by calling
      of_find_device_by_node(),
      a, if it is released by a function such as
         put_device()/of_dev_put()/platform_device_put() after the last use,
         it is considered that there is no reference leak;
      b, if it is passed back to the caller via
         dev_get_drvdata()/platform_get_drvdata()/get_device(), etc., the
         reference will be released in other functions, and the current function
         also considers that there is no reference leak;
      c, for the rest of the situation, the current function should release the
         reference by calling put_device, this code search will report the
         corresponding error message.
      
      By using this semantic code search, we have found some object reference leaks,
      such as:
      commit 11907e9d ("ASoC: fsl-asoc-card: fix object reference leaks in
      fsl_asoc_card_probe")
      commit a12085d1 ("mtd: rawnand: atmel: fix possible object reference leak")
      commit 11493f26 ("mtd: rawnand: jz4780: fix possible object reference leak")
      
      There are still dozens of reference leaks in the current kernel code.
      
      Further, for the case of b, the object returned to other functions may also
      have a reference leak, we will continue to develop other cocci scripts to
      further check the reference leak.
      
      Signed-off-by: default avatarWen Yang <wen.yang99@zte.com.cn>
      Reviewed-by: default avatarJulia Lawall <Julia.Lawall@lip6.fr>
      Reviewed-by: default avatarMarkus Elfring <Markus.Elfring@web.de>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      da9cfb87
  5. Mar 13, 2019
  6. Mar 11, 2019
  7. Mar 08, 2019
  8. Mar 07, 2019
  9. Mar 06, 2019
  10. Mar 04, 2019
    • Kees Cook's avatar
      gcc-plugins: structleak: Generalize to all variable types · 81a56f6d
      Kees Cook authored
      
      This adjusts structleak to also work with non-struct types when they
      are passed by reference, since those variables may leak just like
      anything else. This is exposed via an improved set of Kconfig options.
      (This does mean structleak is slightly misnamed now.)
      
      Building with CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL should give the
      kernel complete initialization coverage of all stack variables passed
      by reference, including padding (see lib/test_stackinit.c).
      
      Using CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE to count added initializations
      under defconfig:
      
      	..._BYREF:      5945 added initializations
      	..._BYREF_ALL: 16606 added initializations
      
      There is virtually no change to text+data size (both have less than 0.05%
      growth):
      
         text    data     bss     dec     hex filename
      19502103        5051456 1917000 26470559        193e89f vmlinux.stock
      19513412        5051456 1908808 26473676        193f4cc vmlinux.byref
      19516974        5047360 1900616 26464950        193d2b6 vmlinux.byref_all
      
      The measured performance difference is in the noise for hackbench and
      kernel build benchmarks:
      
      Stock:
      
      	5x hackbench -g 20 -l 1000
      	Mean:   10.649s
      	Std Dev: 0.339
      
      	5x kernel build (4-way parallel)
      	Mean:  261.98s
      	Std Dev: 1.53
      
      CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF:
      
      	5x hackbench -g 20 -l 1000
      	Mean:   10.540s
      	Std Dev: 0.233
      
      	5x kernel build (4-way parallel)
      	Mean:  260.52s
      	Std Dev: 1.31
      
      CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL:
      
      	5x hackbench -g 20 -l 1000
      	Mean:   10.320
      	Std Dev: 0.413
      
      	5x kernel build (4-way parallel)
      	Mean:  260.10
      	Std Dev: 0.86
      
      This does not yet solve missing padding initialization for structures
      on the stack that are never passed by reference (which should be a tiny
      minority). Hopefully this will be more easily addressed by upstream
      compiler fixes after clarifying the C11 padding initialization
      specification.
      
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      81a56f6d
    • Masahiro Yamada's avatar
      kbuild: clean up scripts/gcc-version.sh · fa7295ab
      Masahiro Yamada authored
      
      Now that the Kconfig is the only user of this script, we can drop
      unneeded code.
      
      Remove the -p option, and stop prepending the output with zero,
      so that Kconfig can directly use the output from this script.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      fa7295ab
    • Masahiro Yamada's avatar
      kbuild: remove cc-version macro · d3a918c6
      Masahiro Yamada authored
      
      There is no more direct user of this macro; it is only used by
      cc-ifversion.
      
      Calling this macro is not efficient since it invokes the compiler to
      get the compiler version. CONFIG_GCC_VERSION is already calculated in
      the Kconfig stage, so Makefile can reuse it.
      
      Here is a note about the slight difference between cc-version and
      CONFIG_GCC_VERSION:
      
      When using Clang, cc-version is evaluated to '0402' because Clang
      defines __GNUC__ and __GNUC__MINOR__, and looks like GCC 4.2 in the
      version point of view. On the other hand, CONFIG_GCC_VERSION=0
      when $(CC) is clang.
      
      There are currently two users of cc-ifversion:
        arch/mips/loongson64/Platform
        arch/powerpc/Makefile
      
      They are not affected by this change.
      
      The format of cc-version is <major><minor>, while CONFIG_GCC_VERSION
      <major><minor><patch>. I adjusted cc-ifversion for the difference of
      the number of digits.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      d3a918c6
    • Masahiro Yamada's avatar
      kbuild: update comment block of scripts/clang-version.sh · 00250b52
      Masahiro Yamada authored
      
      Commit 469cb737 ("kconfig: add CC_IS_CLANG and CLANG_VERSION")
      changed the code, but missed to update the comment block.
      
      The -p option was gone, and the output is 5-digit (or 6-digit when
      Clang 10 is released).
      
      Update the comment now.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      00250b52
  11. Mar 01, 2019
    • Arnd Bergmann's avatar
      kasan: turn off asan-stack for clang-8 and earlier · 6baec880
      Arnd Bergmann authored
      Building an arm64 allmodconfig kernel with clang results in over 140
      warnings about overly large stack frames, the worst ones being:
      
        drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
        drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
        drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
        drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
        drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
        drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'
      
      None of these happen with gcc today, and almost all of these are the
      result of a single known issue in llvm.  Hopefully it will eventually
      get fixed with the clang-9 release.
      
      In the meantime, the best idea I have is to turn off asan-stack for
      clang-8 and earlier, so we can produce a kernel that is safe to run.
      
      I have posted three patches that address the frame overflow warnings
      that are not addressed by turning off asan-stack, so in combination with
      this change, we get much closer to a clean allmodconfig build, which in
      turn is necessary to do meaningful build regression testing.
      
      It is still possible to turn on the CONFIG_ASAN_STACK option on all
      versions of clang, and it's always enabled for gcc, but when
      CONFIG_COMPILE_TEST is set, the option remains invisible, so
      allmodconfig and randconfig builds (which are normally done with a
      forced CONFIG_COMPILE_TEST) will still result in a mostly clean build.
      
      Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de
      Link: https://bugs.llvm.org/show_bug.cgi?id=38809
      
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarQian Cai <cai@lca.pw>
      Reviewed-by: default avatarMark Brown <broonie@kernel.org>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6baec880
  12. Feb 28, 2019
Loading