Skip to content
Snippets Groups Projects
  1. May 21, 2019
  2. Apr 30, 2019
  3. Feb 15, 2019
  4. Jan 22, 2019
  5. Nov 30, 2018
    • Maximilian Heyne's avatar
      fs: fix lost error code in dio_complete · 41e817bc
      Maximilian Heyne authored
      commit e2592217 ("fs: simplify the
      generic_write_sync prototype") reworked callers of generic_write_sync(),
      and ended up dropping the error return for the directio path. Prior to
      that commit, in dio_complete(), an error would be bubbled up the stack,
      but after that commit, errors passed on to dio_complete were eaten up.
      
      This was reported on the list earlier, and a fix was proposed in
      https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
      
      , but
      never followed up with.  We recently hit this bug in our testing where
      fencing io errors, which were previously erroring out with EIO, were
      being returned as success operations after this commit.
      
      The fix proposed on the list earlier was a little short -- it would have
      still called generic_write_sync() in case `ret` already contained an
      error. This fix ensures generic_write_sync() is only called when there's
      no pending error in the write. Additionally, transferred is replaced
      with ret to bring this code in line with other callers.
      
      Fixes: e2592217 ("fs: simplify the generic_write_sync prototype")
      Reported-by: default avatarRavi Nankani <rnankani@amazon.com>
      Signed-off-by: default avatarMaximilian Heyne <mheyne@amazon.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      CC: Torsten Mehlan <tomeh@amazon.de>
      CC: Uwe Dannowski <uwed@amazon.de>
      CC: Amit Shah <aams@amazon.de>
      CC: David Woodhouse <dwmw@amazon.co.uk>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      41e817bc
  6. Nov 26, 2018
  7. Nov 07, 2018
  8. Oct 23, 2018
  9. May 14, 2018
  10. Apr 06, 2018
  11. Mar 12, 2018
  12. Feb 26, 2018
    • Jan Kara's avatar
      direct-io: Fix sleep in atomic due to sync AIO · d9c10e5b
      Jan Kara authored
      
      Commit e864f395 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
      way for direct IO to become synchronous and thus trigger fsync from the
      IO completion handler. Then commit 9830f4be "fs: Use RWF_* flags for
      AIO operations" allowed these flags to be set for AIO as well. However
      that commit forgot to update the condition checking whether the IO
      completion handling should be defered to a workqueue and thus AIO DIO
      with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
      sleep in atomic.
      
      Fix the problem by checking directly iocb flags (the same way as it is
      done in dio_complete()) instead of checking all conditions that could
      lead to IO being synchronous.
      
      CC: Christoph Hellwig <hch@lst.de>
      CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
      CC: stable@vger.kernel.org
      Reported-by: default avatarMark Rutland <mark.rutland@arm.com>
      Tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Fixes: 9830f4be
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d9c10e5b
  13. Jan 08, 2018
    • Darrick J. Wong's avatar
      iomap: report collisions between directio and buffered writes to userspace · 5a9d929d
      Darrick J. Wong authored
      
      If two programs simultaneously try to write to the same part of a file
      via direct IO and buffered IO, there's a chance that the post-diowrite
      pagecache invalidation will fail on the dirty page.  When this happens,
      the dio write succeeded, which means that the page cache is no longer
      coherent with the disk!
      
      Programs are not supposed to mix IO types and this is a clear case of
      data corruption, so store an EIO which will be reflected to userspace
      during the next fsync.  Replace the WARN_ON with a ratelimited pr_crit
      so that the developers have /some/ kind of breadcrumb to track down the
      offending program(s) and file(s) involved.
      
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      5a9d929d
  14. Nov 03, 2017
  15. Oct 25, 2017
    • Mark Rutland's avatar
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland authored
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6aa7de05
  16. Oct 17, 2017
  17. Oct 16, 2017
    • Eryu Guan's avatar
      fs: invalidate page cache after end_io() in dio completion · 5e25c269
      Eryu Guan authored
      
      Commit 332391a9 ("fs: Fix page cache inconsistency when mixing
      buffered and AIO DIO") moved page cache invalidation from
      iomap_dio_rw() to iomap_dio_complete() for iomap based direct write
      path, but before the dio->end_io() call, and it re-introdued the bug
      fixed by commit c771c14b ("iomap: invalidate page caches should
      be after iomap_dio_complete() in direct write").
      
      I found this because fstests generic/418 started failing on XFS with
      v4.14-rc3 kernel, which is the regression test for this specific
      bug.
      
      So similarly, fix it by moving dio->end_io() (which does the
      unwritten extent conversion) before page cache invalidation, to make
      sure next buffer read reads the final real allocations not unwritten
      extents. I also add some comments about why should end_io() go first
      in case we get it wrong again in the future.
      
      Note that, there's no such problem in the non-iomap based direct
      write path, because we didn't remove the page cache invalidation
      after the ->direct_IO() in generic_file_direct_write() call, but I
      decided to fix dio_complete() too so we don't leave a landmine
      there, also be consistent with iomap_dio_complete().
      
      Fixes: 332391a9 ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
      Signed-off-by: default avatarEryu Guan <eguan@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarLukas Czerner <lczerner@redhat.com>
      5e25c269
  18. Oct 11, 2017
  19. Sep 25, 2017
    • Lukas Czerner's avatar
      fs: Fix page cache inconsistency when mixing buffered and AIO DIO · 332391a9
      Lukas Czerner authored
      
      Currently when mixing buffered reads and asynchronous direct writes it
      is possible to end up with the situation where we have stale data in the
      page cache while the new data is already written to disk. This is
      permanent until the affected pages are flushed away. Despite the fact
      that mixing buffered and direct IO is ill-advised it does pose a thread
      for a data integrity, is unexpected and should be fixed.
      
      Fix this by deferring completion of asynchronous direct writes to a
      process context in the case that there are mapped pages to be found in
      the inode. Later before the completion in dio_complete() invalidate
      the pages in question. This ensures that after the completion the pages
      in the written area are either unmapped, or populated with up-to-date
      data. Also do the same for the iomap case which uses
      iomap_dio_complete() instead.
      
      This has a side effect of deferring the completion to a process context
      for every AIO DIO that happens on inode that has pages mapped. However
      since the consensus is that this is ill-advised practice the performance
      implication should not be a problem.
      
      This was based on proposal from Jeff Moyer, thanks!
      
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      332391a9
  20. Aug 23, 2017
    • Christoph Hellwig's avatar
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      74d46992
  21. Jun 27, 2017
  22. Jun 20, 2017
    • Goldwyn Rodrigues's avatar
      block: return on congested block device · 03a07c92
      Goldwyn Rodrigues authored
      
      A new bio operation flag REQ_NOWAIT is introduced to identify bio's
      orignating from iocb with IOCB_NOWAIT. This flag indicates
      to return immediately if a request cannot be made instead
      of retrying.
      
      Stacked devices such as md (the ones with make_request_fn hooks)
      currently are not supported because it may block for housekeeping.
      For example, an md can have a part of the device suspended.
      For this reason, only request based devices are supported.
      In the future, this feature will be expanded to stacked devices
      by teaching them how to handle the REQ_NOWAIT flags.
      
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      03a07c92
  23. Jun 09, 2017
  24. Feb 28, 2017
  25. Jan 10, 2017
    • Chandan Rajendra's avatar
      do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned · dd545b52
      Chandan Rajendra authored
      
      The code currently uses sdio->blkbits to compute the number of blocks to
      be cleaned. However sdio->blkbits is derived from the logical block size
      of the underlying block device (Refer to the definition of
      do_blockdev_direct_IO()). Due to this, generic/299 test would rarely
      fail when executed on an ext4 filesystem with 64k as the block size and
      when using a virtio based disk (having 512 byte as the logical block
      size) inside a kvm guest.
      
      This commit fixes the bug by using inode->i_blkbits to compute the
      number of blocks to be cleaned.
      
      Signed-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      
      Fixed up by Jeff Moyer to only use/evaluate inode->i_blkbits once,
      to avoid issues with block size changes with IO in flight.
      
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      dd545b52
  26. Nov 30, 2016
  27. Nov 11, 2016
  28. Nov 04, 2016
  29. Nov 01, 2016
  30. Oct 04, 2016
  31. Jun 07, 2016
  32. May 27, 2016
    • Eryu Guan's avatar
      direct-io: fix direct write stale data exposure from concurrent buffered read · 9ecd10b7
      Eryu Guan authored
      Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are
      not allowed to allocate blocks(get_more_blocks() sets 'create' to 0
      before calling get_block() callback), if it's a sparse file, direct
      writes fall back to buffered writes to avoid stale data exposure from
      concurrent buffered read.  But there're two cases that can result in
      stale data exposure are not correctly detected.
      
      1. The detection for "writing inside i_size" is not sufficient,
         writes can be treated as "extending writes" wrongly.  For example,
         direct write 1FSB (file system block) to a 1FSB sparse file on
         ext2/3/4, starting from offset 0, in this case it's writing inside
         i_size, but 'create' is non-zero, because 'block_in_file' and
         '(i_size_read(inode) >> blkbits' are both zero.
      
      2. Direct writes starting from or beyong i_size (not inside i_size)
         also could trigger block allocation and expose stale data.  For
         example, consider a sparse file with i_size of 2k, and a write to
         offset 2k or 3k into the file, with a filesystem block size of 4k.
         (Thanks to Jeff Moyer for pointing this case out in his review.)
      
      The first problem can be demostrated by running ltp-aiodio test ADSP045
      many times.  When testing on extN filesystems, I see test failures
      occasionally, buffered read could read non-zero (stale) data.
      
      ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1
      
      dio_sparse    0  TINFO  :  Dirtying free blocks
      dio_sparse    0  TINFO  :  Starting I/O tests
      non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa
      non-zero read at offset 0
      dio_sparse    0  TINFO  :  Killing childrens(s)
      dio_sparse    1  TFAIL  :  dio_sparse.c:191: 1 children(s) exited abnormally
      
      The second problem can also be reproduced easily by a hacked dio_sparse
      program, which accepts an option to specify the write offset.
      
      What we should really do is to disable block allocation for writes that
      could result in filling holes inside i_size.
      
      Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@gmail.com
      
      
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9ecd10b7
  33. May 01, 2016
Loading