Commits · ebc41018d84bc526e9f873a80228aba24c9f9189 · Tobias Schramm / linux-pinebook-pro

Feb 12, 2019

blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue · aef1897c

Jianchao Wang authored 6 years ago

When requeue, if RQF_DONTPREP, rq has contained some driver
specific data, so insert it to hctx dispatch list to avoid any
merge. Take scsi as example, here is the trace event log (no
io scheduler, because RQF_STARTED would prevent merging),

kworker/0:1H-339 [000] ...1 2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987 [000] .... 2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987 [000] ...2 2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
kworker/0:1H-339 [000] .... 2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996 [000] ..s1 2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996 [000] .Ns1 2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
kworker/0:1H-339 [000] ...1 2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
kworker/0:1H-339 [000] ...1 2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986 [000] ..s1 2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]

(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
the sdb only contained the part of (32768 + 8), then only that part
was completed. The lucky thing was that scsi_io_completion detected
it and requeued the remaining part. So we didn't get corrupted data.
However, the requeue of (32776 + 8) is not expected.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

aef1897c

Feb 08, 2019

blk-mq: remove duplicated definition of blk_mq_freeze_queue · 26984841

Liu Bo authored 6 years ago


As the prototype has been defined in "include/linux/blk-mq.h", the one
in "block/blk-mq.h" can be removed then.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26984841

Blk-iolatency: warn on negative inflight IO counter · 391f552a

Liu Bo authored 6 years ago


This is to catch any unexpected negative value of inflight IO counter.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

391f552a

blk-iolatency: fix IO hang due to negative inflight counter · 8c772a9b

Liu Bo authored 6 years ago


Our test reported the following stack, and vmcore showed that
->inflight counter is -1.

[ffffc9003fcc38d0] __schedule at ffffffff8173d95d
[ffffc9003fcc3958] schedule at ffffffff8173de26
[ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
[ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
[ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
[ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
[ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
[ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
[ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
[ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
[ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
[ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
[ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
[ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
[ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
[ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
[ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
[ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
[ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e

The ->inflight counter may be negative (-1) if

1) blk-iolatency was disabled when the IO was issued,

2) blk-iolatency was enabled before this IO reached its endio,

3) the ->inflight counter is decreased from 0 to -1 in endio()

In fact the hang can be easily reproduced by the below script,

H=/sys/fs/cgroup/unified/
P=/sys/fs/cgroup/unified/test

echo "+io" > $H/cgroup.subtree_control
mkdir -p $P

echo $$ > $P/cgroup.procs

xfs_io -f -d -c "pwrite 0 4k" /dev/sdg

echo "`cat /sys/block/sdg/dev` target=1000000" > $P/io.latency

xfs_io -f -d -c "pwrite 0 4k" /dev/sdg

This fixes the problem by freezing the queue so that while
enabling/disabling iolatency, there is no inflight rq running.

Note that quiesce_queue is not needed as this only updating iolatency
configuration about which dispatching request_queue doesn't care.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8c772a9b

Jan 31, 2019

blk-mq: protect debugfs_create_files() from failures · 36991ca6

Greg Kroah-Hartman authored 6 years ago

If debugfs were to return a non-NULL error for a debugfs call, using
that pointer later in debugfs_create_files() would crash.

Fix that by properly checking the pointer before referencing it.

Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-and-tested-by: <syzbot+b382ba6a802a3d242790@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

36991ca6

Jan 30, 2019

blk-mq: fix a hung issue when fsync · 85bd6e61

Jianchao Wang authored 6 years ago


Florian reported a io hung issue when fsync(). It should be
triggered by following race condition.

data + post flush         a flush

blk_flush_complete_seq
  case REQ_FSEQ_DATA
    blk_flush_queue_rq
    issued to driver      blk_mq_dispatch_rq_list
                            try to issue a flush req
                            failed due to NON-NCQ command
                            .queue_rq return BLK_STS_DEV_RESOURCE

request completion
  req->end_io // doesn't check RESTART
  mq_flush_data_end_io
    case REQ_FSEQ_POSTFLUSH
      blk_kick_flush
        do nothing because previous flush
        has not been completed
     blk_mq_run_hw_queue
                              insert rq to hctx->dispatch
                              due to RESTART is still set, do nothing

To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
with blk_mq_sched_restart to check and clear the RESTART flag.

Fixes: bd166ef1 (blk-mq-sched: add framework for MQ capable IO schedulers)
Reported-by: Florian Stecker <m19@florianstecker.de>
Tested-by: Florian Stecker <m19@florianstecker.de>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

85bd6e61

block: pass no-op callback to INIT_WORK(). · 2e3c18d0

Tetsuo Handa authored 6 years ago

syzbot is hitting flush_work() warning caused by commit 4d43d395
("workqueue: Try to catch flush_work() without INIT_WORK().") [1].
Although that commit did not expect INIT_WORK(NULL) case, calling
flush_work() without setting a valid callback should be avoided anyway.
Fix this problem by setting a no-op callback instead of NULL.

[1] https://syzkaller.appspot.com/bug?id=e390366bc48bc82a7c668326e0663be3b91cbd29



Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: syzbot <syzbot+ba2a929dcf8e704c180e@syzkaller.appspotmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2e3c18d0

Jan 27, 2019

Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED" · 947b7ac1

Jens Axboe authored 6 years ago


We can't touch a bio after ->make_request_fn(), for all we know it could
already have been completed by the time this function returns.

This reverts commit 698cef17.

Reported-by:  <syzbot+4df6ca820108fd248943@syzkaller.appspotmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

947b7ac1

Jan 24, 2019

blk-wbt: Declare local functions static · c83f536a

Bart Van Assche authored 6 years ago


This patch avoids that sparse reports the following warnings:

  CHECK   block/blk-wbt.c
block/blk-wbt.c:600:6: warning: symbol 'wbt_issue' was not declared. Should it be static?
block/blk-wbt.c:620:6: warning: symbol 'wbt_requeue' was not declared. Should it be static?
  CC      block/blk-wbt.o
block/blk-wbt.c:600:6: warning: no previous prototype for wbt_issue [-Wmissing-prototypes]
 void wbt_issue(struct rq_qos *rqos, struct request *rq)
      ^~~~~~~~~
block/blk-wbt.c:620:6: warning: no previous prototype for wbt_requeue [-Wmissing-prototypes]
 void wbt_requeue(struct rq_qos *rqos, struct request *rq)
      ^~~~~~~~~~~

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c83f536a

blk-mq: fix the cmd_flag_name array · 1c26010c

Jianchao Wang authored 6 years ago


Swap REQ_NOWAIT and REQ_NOUNMAP and add REQ_HIPRI.

Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1c26010c

Jan 22, 2019

block: cover another queue enter recursion via BIO_QUEUE_ENTERED · 698cef17

Ming Lei authored 6 years ago


Except for blk_queue_split(), bio_split() is used for splitting bio too,
then the remained bio is often resubmit to queue via generic_make_request().
So the same queue enter recursion exits in this case too. Unfortunatley
commit cd4a4ae4 doesn't help this case.

This patch covers the above case by setting BIO_QUEUE_ENTERED before calling
q->make_request_fn.

In theory the per-bio flag is used to simulate one stack variable, it is
just fine to clear it after q->make_request_fn is returned. Especially
the same bio can't be submitted from another context.

Fixes: cd4a4ae4 ("block: don't use blocking queue entered for recursive bio submits")
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: NeilBrown <neilb@suse.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

698cef17

Jan 18, 2019

block: Cleanup license notice · 38197ca1

Thomas Gleixner authored 6 years ago


Remove the imprecise and sloppy:

  "This files is licensed under the GPL."

license notice in the top level comment.

1) The file already contains a SPDX license identifier which clearly
   states that the license of the file is GPL V2 only

2) The notice resolves to GPL v1 or later for scanners which is just
   contrary to the intent of SPDX identifiers to provide clear and non
   ambiguous license information. Aside of that the value add of this
   notice is below zero,

Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Fixes: 6a5ac984 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

38197ca1

Jan 16, 2019

block: don't lose track of REQ_INTEGRITY flag · 7809167d

Ming Lei authored 6 years ago

We need to pass bio->bi_opf after bio intergrity preparing, otherwise
the flag of REQ_INTEGRITY may not be set on the allocated request, then
breaks block integrity.

Fixes: f9afca4d ("blk-mq: pass in request/bio flags to queue mapping")
Cc: Hannes Reinecke <hare@suse.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7809167d

Jan 14, 2019

block, bfq: fix comments on __bfq_deactivate_entity · 5bf85908

Paolo Valente authored 6 years ago


Comments on function __bfq_deactivate_entity contains two imprecise or
wrong statements:
1) The function performs the deactivation of the entity.
2) The function must be invoked only if the entity is on a service tree.

This commits replaces both statements with the correct ones:
1) The functions updates sched_data and service trees for the entity,
so as to represent entity as inactive (which is only part of the steps
needed for the deactivation of the entity).
2) The function must be invoked on every entity being deactivated.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5bf85908

Jan 09, 2019

block: fix kerneldoc comment for blk_attempt_plug_merge() · 649d4968

Jonathan Corbet authored 6 years ago


Commit 5f0ed774 ("block: sum requests in the plug structure") removed
the request_count parameter from block_attempt_plug_merge(), but did not
remove the associated kerneldoc comment, introducing this warning to the
docs build:

  ./block/blk-core.c:685: warning: Excess function parameter 'request_count' description in 'blk_attempt_plug_merge'

Remove the obsolete description and make things a little quieter.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

649d4968

block: clarify documentation for blk_{start|finish}_plug · 40405851

Jeff Moyer authored 6 years ago


There was some confusion about what these functions did.  Make it clear
that this is a hint for upper layers to pass to the block layer, and
that it does not guarantee that I/O will not be submitted between a
start and finish plug.

Reported-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

40405851

Dec 21, 2018

bsg: deprecate BIDI support in bsg · 2e5b2d7c

Christoph Hellwig authored 6 years ago

Besides the OSD command set that never got traction, the only SCSI
command using bidirectional buffers is XDWRITEREAD in the 10 and 32 byte
variants, which is extremely esoteric and has been removed from the spec
again as of SBC4r15. It probably doesn't make sense to keep the support
code around just for that, so start deprecating the support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2e5b2d7c

blkcg: remove unused __blkg_release_rcu() · 6b450535

Dennis Zhou authored 6 years ago


An earlier commit 7fcf2b03 ("blkcg: change blkg reference counting
to use percpu_ref") moved around the release call from blkg_put() to be
a part of the percpu_ref cleanup. Remove the additional unused code
which should have been removed earlier.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6b450535

blkcg: clean up blkg_tryget_closest() · 6ab21879

Dennis Zhou authored 6 years ago

The implementation of blkg_tryget_closest() wasn't super obvious and
became a point of suspicion when debugging [1]. So let's clean it up so
it's obviously not the problem.

Also add missing RCU read locking to bio_clone_blkg_association(), which
got exposed by adding the RCU read lock held check in
blkg_tryget_closest().

[1] https://lore.kernel.org/linux-block/a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk/



Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6ab21879

treewide: surround Kconfig file paths with double quotes · 8636a1f9

Masahiro Yamada authored 6 years ago


The Kconfig lexer supports special characters such as '.' and '/' in
the parameter context. In my understanding, the reason is just to
support bare file paths in the source statement.

I do not see a good reason to complicate Kconfig for the room of
ambiguity.

The majority of code already surrounds file paths with double quotes,
and it makes sense since file paths are constant string literals.

Make it treewide consistent now.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@kernel.org>

8636a1f9

Dec 20, 2018

kyber: use sbitmap add_wait_queue/list_del wait helpers · 00203ba4

Jens Axboe authored 6 years ago


sbq_wake_ptr() checks sbq->ws_active to know if it needs to loop
the wait indexes or not. This requires the use of the sbitmap
waitqueue wrappers, but kyber doesn't use those for its domain
token waitqueue handling.

Convert kyber to use the helpers. This fixes a hang with waiting
for domain tokens.

Fixes: 5d2ee712 ("sbitmap: optimize wakeup check")
Tested-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

00203ba4

Dec 19, 2018

block: save irq state in blkg_lookup_create() · 3a762de5

Ming Lei authored 6 years ago


blkg_lookup_create() may be called from pool_map() in which
irq state is saved, so we have to do that in blkg_lookup_create().

Otherwise, the following lockdep warning can be triggered:

[  104.258537] ================================
[  104.259129] WARNING: inconsistent lock state
[  104.259725] 4.20.0-rc6+ #545 Not tainted
[  104.260268] --------------------------------
[  104.260865] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  104.261727] swapper/49/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[  104.262444] 00000000db365b5d (&(&pool->lock)->rlock#3){+.?.}, at: thin_endio+0xcf/0x2a3 [dm_thin_pool]
[  104.263747] {SOFTIRQ-ON-W} state was registered at:
[  104.264417]   _raw_spin_unlock_irq+0x29/0x4c
[  104.265014]   blkg_lookup_create+0xdc/0xe6
[  104.265609]   bio_associate_blkg_from_css+0xd3/0x13f
[  104.266312]   bio_associate_blkg+0x15a/0x1bb
[  104.266913]   pool_map+0xe8/0x103 [dm_thin_pool]
[  104.267572]   __map_bio+0x98/0x29c [dm_mod]
[  104.268162]   __split_and_process_non_flush+0x29e/0x306 [dm_mod]
[  104.269003]   __split_and_process_bio+0x16a/0x25b [dm_mod]
[  104.269971]   __dm_make_request.isra.14+0xdc/0x124 [dm_mod]
[  104.270973]   generic_make_request+0x3f5/0x68b
[  104.271676]   process_prepared_mapping+0x166/0x1ef [dm_thin_pool]
[  104.272531]   schedule_zero+0x239/0x273 [dm_thin_pool]
[  104.273245]   process_cell+0x60c/0x6f1 [dm_thin_pool]
[  104.273967]   do_worker+0x60c/0xca8 [dm_thin_pool]
[  104.274635]   process_one_work+0x4eb/0x834
[  104.275203]   worker_thread+0x318/0x484
[  104.275740]   kthread+0x1d1/0x1e1
[  104.276203]   ret_from_fork+0x3a/0x50
[  104.276714] irq event stamp: 170003
[  104.277201] hardirqs last  enabled at (170002): [<ffffffff81bcc33e>] _raw_spin_unlock_irqrestore+0x44/0x6b
[  104.278535] hardirqs last disabled at (170003): [<ffffffff81bcc1ad>] _raw_spin_lock_irqsave+0x20/0x55
[  104.280273] softirqs last  enabled at (169978): [<ffffffff810d13d4>] irq_enter+0x4c/0x73
[  104.281617] softirqs last disabled at (169979): [<ffffffff810d1479>] irq_exit+0x7e/0x11d
[  104.282744]
[  104.282744] other info that might help us debug this:
[  104.283640]  Possible unsafe locking scenario:
[  104.283640]
[  104.284452]        CPU0
[  104.284803]        ----
[  104.285150]   lock(&(&pool->lock)->rlock#3);
[  104.285762]   <Interrupt>
[  104.286130]     lock(&(&pool->lock)->rlock#3);
[  104.286750]
[  104.286750]  *** DEADLOCK ***
[  104.286750]
[  104.287564] no locks held by swapper/49/0.
[  104.288129]
[  104.288129] stack backtrace:
[  104.288738] CPU: 49 PID: 0 Comm: swapper/49 Not tainted 4.20.0-rc6+ #545
[  104.289700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[  104.290858] Call Trace:
[  104.291204]  <IRQ>
[  104.291502]  dump_stack+0x9a/0xe6
[  104.291968]  mark_lock+0x56c/0x7a6
[  104.292442]  ? check_usage_backwards+0x209/0x209
[  104.293086]  __lock_acquire+0x400/0x15bf
[  104.293662]  ? check_chain_key+0x150/0x1aa
[  104.294236]  lock_acquire+0x1a6/0x1e3
[  104.294768]  ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
[  104.295444]  ? _raw_spin_unlock_irqrestore+0x44/0x6b
[  104.296143]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
[  104.297031]  _raw_spin_lock_irqsave+0x46/0x55
[  104.297659]  ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
[  104.298335]  thin_endio+0xcf/0x2a3 [dm_thin_pool]
[  104.298997]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
[  104.299886]  ? check_flags+0x20a/0x20a
[  104.300408]  ? lock_acquire+0x1a6/0x1e3
[  104.300954]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
[  104.301865]  clone_endio+0x1bb/0x22d [dm_mod]
[  104.302491]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
[  104.303200]  ? bio_disassociate_blkg+0xc6/0x15f
[  104.303836]  ? bio_endio+0x2b2/0x2da
[  104.304349]  clone_endio+0x1f3/0x22d [dm_mod]
[  104.304978]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
[  104.305709]  ? bio_disassociate_blkg+0xc6/0x15f
[  104.306333]  ? bio_endio+0x2b2/0x2da
[  104.306853]  clone_endio+0x1f3/0x22d [dm_mod]
[  104.307476]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
[  104.308185]  ? bio_disassociate_blkg+0xc6/0x15f
[  104.308817]  ? bio_endio+0x2b2/0x2da
[  104.309319]  blk_update_request+0x2de/0x4cc
[  104.309927]  blk_mq_end_request+0x2a/0x183
[  104.310498]  blk_done_softirq+0x16a/0x1a6
[  104.311051]  ? blk_softirq_cpu_dead+0xe2/0xe2
[  104.311653]  ? __lock_is_held+0x2a/0x87
[  104.312186]  __do_softirq+0x250/0x4e8
[  104.312705]  irq_exit+0x7e/0x11d
[  104.313157]  call_function_single_interrupt+0xf/0x20
[  104.313860]  </IRQ>
[  104.314163] RIP: 0010:native_safe_halt+0x2/0x3
[  104.314792] Code: 63 02 df f0 83 44 24 fc 00 48 89 df e8 cc 3f 7a ff 48 8b 03 a8 08 74 0b 65 81 25 9d 31 45 7e ff ff ff 7f 5b 5d 41 5c c3 fb f4 <c3> f4 c3 0f 1f 44 00 00 41 56 41 55 41 54 55 53 e8 a2 0d 5c ff e8
[  104.317339] RSP: 0018:ffff888106c9fdc0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[  104.318390] RAX: 1ffff11020d92100 RBX: 0000000000000000 RCX: ffffffff81159ac7
[  104.319366] RDX: 1ffffffff05d5e69 RSI: 0000000000000007 RDI: ffff888106c90d1c
[  104.320339] RBP: 0000000000000000 R08: dffffc0000000000 R09: 0000000000000001
[  104.321313] R10: ffffed1025d57ba0 R11: ffffed1025d57b9f R12: 1ffff11020d93fbf
[  104.322328] R13: 0000000000000031 R14: ffff888106c90040 R15: 0000000000000000
[  104.323307]  ? lockdep_hardirqs_on+0x26b/0x278
[  104.323927]  default_idle+0xd9/0x1a8
[  104.324427]  do_idle+0x162/0x2b2
[  104.324891]  ? arch_cpu_idle_exit+0x28/0x28
[  104.325467]  ? mark_held_locks+0x28/0x7f
[  104.326031]  ? _raw_spin_unlock_irqrestore+0x44/0x6b
[  104.326719]  cpu_startup_entry+0x1d/0x1f
[  104.327261]  start_secondary+0x2cb/0x308
[  104.327806]  ? set_cpu_sibling_map+0x8a3/0x8a3
[  104.328421]  secondary_startup_64+0xa4/0xb0

Fixes: b978962a ("blkcg: update blkg_lookup_create() to do locking")
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3a762de5

scsi: block: remove the cluster flag · 38417468

Christoph Hellwig authored 6 years ago


Now that the the SCSI layer replaced the use of the cluster flag with
segment size limits and the DMA boundary we can remove the cluster flag
from the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

38417468

Dec 18, 2018

block: make request_to_qc_t public · 7b7ab780

Sagi Grimberg authored 6 years ago


block consumers will need it for polling requests that
are sent with blk_execute_rq_nowait. Also, get rid of
blk_tag_to_qc_t and open-code it instead.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

7b7ab780

blk-mq: enable IO poll if .nr_queues of type poll > 0 · cd19181b

Ming Lei authored 6 years ago

The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues
is bigger than zero, so enhance the constraint by checking .nr_queues of type poll
before enabling IO poll.

Otherwise IO race & timeout can be observed when running block/007.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cd19181b

blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() · 3c94d83c

Jens Axboe authored 6 years ago


There's a single user of this function, dm, and dm just wants
to check if IO is inflight, not that it's just allocated.

This fixes a hang with srp/002 in blktests with dm, where it tries
to suspend but waits for inflight IO to finish first. As it checks
for just allocated requests, this fails.

Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3c94d83c

Dec 17, 2018

blk-mq: skip zero-queue maps in blk_mq_map_swqueue · e5edd5f2

Ming Lei authored 6 years ago

From 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping
table won't be initialized actually if map->nr_queues is zero, so
we can't use blk_mq_map_queue_type() to retrieve hctx any more.

This way still may cause broken mapping, fix it by skipping zero-queues
maps in blk_mq_map_swqueue().

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e5edd5f2

block: fix blk-iolatency accounting underflow · 13369816

Dennis Zhou authored 6 years ago

The blk-iolatency controller measures the time from rq_qos_throttle() to
rq_qos_done_bio() and attributes this time to the first bio that needs
to create the request. This means if a bio is plug-mergeable or
bio-mergeable, it gets to bypass the blk-iolatency controller.

The recent series [1], to tag all bios w/ blkgs undermined how iolatency
was determining which bios it was charging and should process in
rq_qos_done_bio(). Because all bios are being tagged, this caused the
atomic_t for the struct rq_wait inflight count to underflow and result
in a stall.

This patch adds a new flag BIO_TRACKED to let controllers know that a
bio is going through the rq_qos path. blk-iolatency now checks if this
flag is set to see if it should process the bio in rq_qos_done_bio().

Overloading BLK_QUEUE_ENTERED works, but makes the flag rules confusing.
BIO_THROTTLED was another candidate, but the flag is set for all bios
that have gone through blk-throttle code. Overloading a flag comes with
the burden of making sure that when either implementation changes, a
change in setting rules for one doesn't cause a bug in the other. So
here, we unfortunately opt for adding a new flag.

[1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/

Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

13369816

blk-mq: fix dispatch from sw queue · c16d6b5a

Ming Lei authored 6 years ago


When a request is added to rq list of sw queue(ctx), the rq may be from
a different type of hctx, especially after multi queue mapping is
introduced.

So when dispach request from sw queue via blk_mq_flush_busy_ctxs() or
blk_mq_dequeue_from_ctx(), one request belonging to other queue type of
hctx can be dispatched to current hctx in case that read queue or poll
queue is enabled.

This patch fixes this issue by introducing per-queue-type list.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Changed by me to not use separately cacheline aligned lists, just
place them all in the same cacheline where we had just the one list
and lock before.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

c16d6b5a

block: mq-deadline: Fix write completion handling · 7211aef8

Damien Le Moal authored 6 years ago


For a zoned block device using mq-deadline, if a write request for a
zone is received while another write was already dispatched for the same
zone, dd_dispatch_request() will return NULL and the newly inserted
write request is kept in the scheduler queue waiting for the ongoing
zone write to complete. With this behavior, when no other request has
been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to
__blk_mq_free_request() call of blk_mq_sched_restart() to not run the
queue when the already dispatched write request completes. The newly
dispatched request stays stuck in the scheduler queue until eventually
another request is submitted.

This problem does not affect SCSI disk as the SCSI stack handles queue
restart on request completion. However, this problem is can be triggered
the nullblk driver with zoned mode enabled.

Fix this by always requesting a queue restart in dd_dispatch_request()
if no request was dispatched while WRITE requests are queued.

Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Add missing export of blk_mq_sched_restart()

Signed-off-by: Jens Axboe <axboe@kernel.dk>

7211aef8

blk-mq: only dispatch to non-defauly queue maps if they have queues · 5aceaeb2

Christoph Hellwig authored 6 years ago


We should check if a given queue map actually has queues enabled before
dispatching to it.  This allows drivers to not initialize optional but
not used map types, which subsequently will allow fixing problems with
queue map rebuilds for that case.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5aceaeb2

blk-mq: export hctx->type in debugfs instead of sysfs · 346fc108

Ming Lei authored 6 years ago


Now we only export hctx->type via sysfs, and there isn't such info
in hctx entry under debugfs. We often use debugfs only to diagnose
queue mapping issue, so add the support in debugfs.

Queue mapping becomes a bit more complicated after multiple queue
mapping is supported, we may write blktest to verify if queue mapping
is valid based on blk-mq-debugfs.

Given not necessary to export hctx->type twice, so remove the export
from sysfs.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

346fc108

blk-mq: fix allocation for queue mapping table · 07b35eb5

Ming Lei authored 6 years ago

Type of each element in queue mapping table is 'unsigned int,
intead of 'struct blk_mq_queue_map)', so fix it.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

07b35eb5

blk-wbt: export internal state via debugfs · d19afebc

Ming Lei authored 6 years ago


This information is helpful to either investigate issues, or understand
wbt's internal behaviour.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d19afebc

blk-mq-debugfs: support rq_qos · cc56694f

Ming Lei authored 6 years ago

blk-mq-debugfs has been proved as very helpful for debug some
tough issues, such as IO hang.

We have seen blk-wbt related IO hang several times, even inside
Red Hat BZ, there is such report not sovled yet, so this patch
adds support debugfs on rq_qos.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cc56694f

Dec 16, 2018

block: clear REQ_HIPRI if polling is not supported · d04c406f

Christoph Hellwig authored 6 years ago


This prevents a HIPRI bio from being submitted through a stacking
driver that does not support polling and thus won't poll for I/O
completion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d04c406f

blk-mq: replace and kill blk_mq_request_issue_directly · d6a51a97

Jianchao Wang authored 6 years ago


Replace blk_mq_request_issue_directly with blk_mq_try_issue_directly
in blk_insert_cloned_request and kill it as nobody uses it any more.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d6a51a97

blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests · 5b7a6f12

Jianchao Wang authored 6 years ago


It is not necessary to issue request directly with bypass 'true'
in blk_mq_sched_insert_requests and handle the non-issued requests
itself. Just set bypass to 'false' and let blk_mq_try_issue_directly
handle them totally. Remove the blk_rq_can_direct_dispatch check,
because blk_mq_try_issue_directly can handle it well.If request is
direct-issued unsuccessfully, insert the reset.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5b7a6f12

blk-mq: refactor the code of issue request directly · 7f556a44

Jianchao Wang authored 6 years ago


Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly
into one interface to unify the interfaces to issue requests
directly. The merged interface takes over the requests totally,
it could insert, end or do nothing based on the return value of
.queue_rq and 'bypass' parameter. Then caller needn't any other
handling any more and then code could be cleaned up.

And also the commit c616cbee ( blk-mq: punt failed direct issue
to dispatch list ) always inserts requests to hctx dispatch list
whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is
overkill and will harm the merging. We just need to do that for
the requests that has been through .queue_rq. This patch also
could fix this.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7f556a44

block: remove the bio_integrity_advance export · 4c9770c9
Christoph Hellwig authored 6 years ago
```
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
```
4c9770c9

Admin message