- Jun 29, 2018
-
-
Jens Axboe authored
Some devices have different queue limits depending on the type of IO. A classic case is SATA NCQ, where some commands can queue, but others cannot. If we have NCQ commands inflight and encounter a non-queueable command, the driver returns busy.

Currently we attempt to dispatch more from the scheduler, if we were able to queue some commands. But for the case where we ended up stopping due to BUSY, we should not attempt to retrieve more from the scheduler. If we do, we can get into a situation where we attempt to queue a non-queueable command, get BUSY, then successfully retrieve more commands from that scheduler and queue those. This can repeat forever, starving the non-queueable command indefinitely.

Fix this by NOT attempting to pull more commands from the scheduler, if we get a BUSY return. This should also be more optimal in terms of letting requests stay in the scheduler for as long as possible, if we get a BUSY due to the regular out-of-tags condition.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
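The dispatch rule described above can be sketched in user-space C. This is a simplified model, not the blk-mq code: the types, the `sketch_` names and the array standing in for the scheduler are all assumptions made for illustration. The point it shows is that the loop stops pulling from the scheduler as soon as the device reports BUSY, so a non-queueable command cannot be overtaken by the commands queued behind it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the driver's return value. */
enum sketch_status { SKETCH_OK, SKETCH_BUSY };

/* Hypothetical request: only records whether it may be queued with others. */
struct sketch_request {
    bool queueable;
};

/* Hypothetical device: only tracks how many commands are in flight. */
struct sketch_device {
    int inflight;
};

/* Device model: a non-queueable command is refused while others are in flight. */
static enum sketch_status sketch_queue_rq(struct sketch_device *dev,
                                          const struct sketch_request *rq)
{
    if (!rq->queueable && dev->inflight > 0)
        return SKETCH_BUSY;
    dev->inflight++;
    return SKETCH_OK;
}

/*
 * Pull requests from the scheduler (modelled as an array) in order and
 * return how many were dispatched. On BUSY we stop instead of fetching
 * more, which leaves the refused request at the head for the next run.
 */
static size_t sketch_dispatch(struct sketch_device *dev,
                              const struct sketch_request *sched, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (sketch_queue_rq(dev, &sched[i]) == SKETCH_BUSY)
            break;
    }
    return i;
}
```

With a scheduler holding a queueable request followed by a non-queueable one and two more queueable ones, only the first is dispatched; the remaining requests stay in the scheduler until the in-flight command completes.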
-
- Jun 28, 2018
-
-
Bart Van Assche authored
This patch prevents the following kernel bug from being triggered when a path controlled by the dm-mpath driver is removed while mkfs is running:

kernel BUG at block/blk-core.c:3347!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 20 PID: 24369 Comm: mkfs.ext4 Not tainted 4.18.0-rc1-dbg+ #2
RIP: 0010:blk_end_request_all+0x68/0x70
Call Trace:
 <IRQ>
 dm_softirq_done+0x326/0x3d0 [dm_mod]
 blk_done_softirq+0x19b/0x1e0
 __do_softirq+0x128/0x60d
 irq_exit+0x100/0x110
 smp_call_function_single_interrupt+0x90/0x330
 call_function_single_interrupt+0xf/0x20
 </IRQ>

Fixes: f9d03f96 ("block: improve handling of the magic discard payload")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 23, 2018
-
-
Bart Van Assche authored
Make sure that RQF_TIMED_OUT is cleared when a request is reused after a block driver timeout handler has returned BLK_EH_DONE.

Fixes: da661267 ("blk-mq: don't time out requests again that are in the timeout handler")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
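The lifecycle of such a flag can be sketched as follows. This is a user-space model under stated assumptions: the bit value, the struct and the `sketch_` functions are hypothetical, and only the flag name echoes RQF_TIMED_OUT from the message above. It shows why the flag must be cleared on reuse: otherwise a recycled request would never reach the timeout handler again.

```c
#include <stdbool.h>

/* Illustrative bit, not the kernel's actual value for RQF_TIMED_OUT. */
#define SKETCH_RQF_TIMED_OUT (1u << 0)

/* Hypothetical request: flags plus a counter so the effect is observable. */
struct sketch_request {
    unsigned int rq_flags;
    int handler_calls;
};

/*
 * Hand the request to the (modelled) timeout handler at most once per use.
 * Returns true if the handler ran as a result of this call.
 */
static bool sketch_timeout(struct sketch_request *rq)
{
    if (rq->rq_flags & SKETCH_RQF_TIMED_OUT)
        return false;   /* already handed over, do not time out again */
    rq->rq_flags |= SKETCH_RQF_TIMED_OUT;
    rq->handler_calls++;
    return true;
}

/* Reusing the request must clear the flag so it can time out again. */
static void sketch_reuse(struct sketch_request *rq)
{
    rq->rq_flags &= ~SKETCH_RQF_TIMED_OUT;
}
```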
-
- Jun 20, 2018
-
-
Dan Carpenter authored
resp->num is the number of tokens in resp->tok[]. It gets set in response_parse(). So if n == resp->num then we're reading beyond the end of the data.

Fixes: 455a7b23 ("block: Add Sed-opal library")
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Tested-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dan Carpenter authored
If rq_state == ARRAY_SIZE() then we read one element beyond the end of the blk_mq_rq_state_name_array[] array.

Fixes: ec6dcf63 ("blk-mq-debugfs: Show more request state information")
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
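The off-by-one can be illustrated with a small stand-alone lookup. The table contents and the `sketch_` names are made up for the example; only the `ARRAY_SIZE()` idiom mirrors the kernel. Valid indices run from 0 to ARRAY_SIZE - 1, so the guard has to reject an index equal to ARRAY_SIZE.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative state names; the real table lives in the blk-mq debugfs code. */
static const char *const sketch_state_names[] = {
    "idle", "in_flight", "complete",
};

/*
 * Return the name for a state, or "???" for anything out of range.
 * The comparison is "<", not "<=": an index equal to ARRAY_SIZE would
 * read one element past the end of the array.
 */
static const char *sketch_state_name(size_t state)
{
    if (state < ARRAY_SIZE(sketch_state_names))
        return sketch_state_names[state];
    return "???";
}
```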
-
- Jun 19, 2018
-
-
Bart Van Assche authored
Commit 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()") breaks the dm driver. end_clone_bio() detects whether or not a bio is the last bio associated with a request by checking the .bi_next field. Commit 0ba99ca4 clears that field before end_clone_bio() has had a chance to inspect that field. Hence revert commit 0ba99ca4.

This patch prevents KASAN from reporting the following complaint when running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq):

==================================================================
BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0
Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9

CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xa4/0xf5
 print_address_description+0x6f/0x270
 kasan_report+0x241/0x360
 __asan_load4+0x78/0x80
 bio_advance+0x11b/0x1d0
 blk_update_request+0xa7/0x5b0
 scsi_end_request+0x56/0x320 [scsi_mod]
 scsi_io_completion+0x7d6/0xb20 [scsi_mod]
 scsi_finish_command+0x1c0/0x280 [scsi_mod]
 scsi_softirq_done+0x19a/0x230 [scsi_mod]
 blk_mq_complete_request+0x160/0x240
 scsi_mq_done+0x50/0x1a0 [scsi_mod]
 srp_recv_done+0x515/0x1330 [ib_srp]
 __ib_process_cq+0xa0/0xf0 [ib_core]
 ib_poll_handler+0x38/0xa0 [ib_core]
 irq_poll_softirq+0xe8/0x1f0
 __do_softirq+0x128/0x60d
 run_ksoftirqd+0x3f/0x60
 smpboot_thread_fn+0x352/0x460
 kthread+0x1c1/0x1e0
 ret_from_fork+0x24/0x30

Allocated by task 1918:
 save_stack+0x43/0xd0
 kasan_kmalloc+0xad/0xe0
 kasan_slab_alloc+0x11/0x20
 kmem_cache_alloc+0xfe/0x350
 mempool_alloc_slab+0x15/0x20
 mempool_alloc+0xfb/0x270
 bio_alloc_bioset+0x244/0x350
 submit_bh_wbc+0x9c/0x2f0
 __block_write_full_page+0x299/0x5a0
 block_write_full_page+0x16b/0x180
 blkdev_writepage+0x18/0x20
 __writepage+0x42/0x80
 write_cache_pages+0x376/0x8a0
 generic_writepages+0xbe/0x110
 blkdev_writepages+0xe/0x10
 do_writepages+0x9b/0x180
 __filemap_fdatawrite_range+0x178/0x1c0
 file_write_and_wait_range+0x59/0xc0
 blkdev_fsync+0x46/0x80
 vfs_fsync_range+0x66/0x100
 do_fsync+0x3d/0x70
 __x64_sys_fsync+0x21/0x30
 do_syscall_64+0x77/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9:
 save_stack+0x43/0xd0
 __kasan_slab_free+0x137/0x190
 kasan_slab_free+0xe/0x10
 kmem_cache_free+0xd3/0x380
 mempool_free_slab+0x17/0x20
 mempool_free+0x63/0x160
 bio_free+0x81/0xa0
 bio_put+0x59/0x60
 end_bio_bh_io_sync+0x5d/0x70
 bio_endio+0x1a7/0x360
 blk_update_request+0xd0/0x5b0
 end_clone_bio+0xa3/0xd0 [dm_mod]
 bio_endio+0x1a7/0x360
 blk_update_request+0xd0/0x5b0
 scsi_end_request+0x56/0x320 [scsi_mod]
 scsi_io_completion+0x7d6/0xb20 [scsi_mod]
 scsi_finish_command+0x1c0/0x280 [scsi_mod]
 scsi_softirq_done+0x19a/0x230 [scsi_mod]
 blk_mq_complete_request+0x160/0x240
 scsi_mq_done+0x50/0x1a0 [scsi_mod]
 srp_recv_done+0x515/0x1330 [ib_srp]
 __ib_process_cq+0xa0/0xf0 [ib_core]
 ib_poll_handler+0x38/0xa0 [ib_core]
 irq_poll_softirq+0xe8/0x1f0
 __do_softirq+0x128/0x60d

The buggy address belongs to the object at ffff8801300e0640
 which belongs to the cache bio-0 of size 200
The buggy address is located 144 bytes inside of
 200-byte region [ffff8801300e0640, ffff8801300e0708)
The buggy address belongs to the page:
page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0
flags: 0x8000000000008100(slab|head)
raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00
raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
 ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                 ^
 ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Cc: Kent Overstreet <kent.overstreet@gmail.com>
Fixes: 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()")
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_mq_complete_request can only be called for blk-mq drivers, but when removing the BLK_EH_HANDLED return value, two legacy request timeout methods incorrectly got switched to call blk_mq_complete_request. Call __blk_complete_request instead to reinstate the previous behavior. For that, __blk_complete_request needs to be exported.

Fixes: 1fc2b62e ("scsi_transport_fc: complete requests from ->timeout")
Fixes: 0df0bb08 ("null_blk: complete requests from ->timeout")
Reported-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 15, 2018
-
-
Anatoliy Glagolev authored
The existing implementation allows races between bsg_unregister and bsg_open paths. bsg_unregister and request_queue cleanup and deletion may start and complete right after bsg_get_device (in bsg_open path) retrieves bsg_class_device and releases the mutex. Then bsg_open path touches freed memory of bsg_class_device and request_queue.

One possible fix is to hold the mutex all the way through bsg_get_device instead of releasing it after bsg_class_device retrieval.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-Off-By: Anatoliy Glagolev <glagolig@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
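The locking pattern behind this fix can be sketched with a pthread mutex. Everything here is a stand-in: the device table, the reference counter and the `sketch_` functions are hypothetical and do not reproduce the bsg data structures. The sketch only shows the idea of doing the lookup and taking the reference within one critical section, so an unregister cannot slip in between the two steps.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_DEVS 4

/* Hypothetical device record: a reference count and a "still registered" bit. */
struct sketch_dev {
    int refcount;
    int registered;
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_dev sketch_table[SKETCH_MAX_DEVS];

/*
 * Look the device up and take a reference while holding the mutex the
 * whole time. Dropping the lock between the lookup and the reference
 * would reopen the window in which unregister can run.
 */
static struct sketch_dev *sketch_get_device(size_t minor)
{
    struct sketch_dev *dev = NULL;

    pthread_mutex_lock(&sketch_lock);
    if (minor < SKETCH_MAX_DEVS && sketch_table[minor].registered) {
        dev = &sketch_table[minor];
        dev->refcount++;
    }
    pthread_mutex_unlock(&sketch_lock);
    return dev;
}

/* Unregister takes the same mutex, so it serializes against the open path. */
static void sketch_unregister(size_t minor)
{
    pthread_mutex_lock(&sketch_lock);
    if (minor < SKETCH_MAX_DEVS)
        sketch_table[minor].registered = 0;
    pthread_mutex_unlock(&sketch_lock);
}
```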
-
Christoph Hellwig authored
This function is entirely unused, so remove it and the tag_queue_busy member of struct request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 14, 2018
-
-
Christoph Hellwig authored
Unused now that nvme stopped using it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We can currently call the timeout handler again on a request that has already been handed over to the timeout handler. Prevent that with a new flag.

Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 11, 2018
-
-
Roman Pen authored
It is not allowed to reinit the q->tag_set_list list entry before the RCU grace period has completed, otherwise the following soft lockup in blk_mq_sched_restart() happens:

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664] <IRQ>
[ 1064.256824] blk_mq_free_request+0xea/0x100
[ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669] ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007] irq_poll_softirq+0xb7/0xe0
[ 1064.258165] __do_softirq+0x106/0x2a2
[ 1064.258328] irq_exit+0x92/0xa0
[ 1064.258509] do_IRQ+0x4a/0xd0
[ 1064.258660] common_interrupt+0x7a/0x7a
[ 1064.258818] </IRQ>

Meanwhile, another context frees another queue that uses the same set of shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash D 0 5910 5820 0x00000000
[ 1288.202016] Call Trace:
[ 1288.202315] schedule+0x32/0x80
[ 1288.202462] schedule_timeout+0x1e5/0x380
[ 1288.203838] wait_for_completion+0xb0/0x120
[ 1288.204137] __wait_rcu_gp+0x125/0x160
[ 1288.204287] synchronize_sched+0x6e/0x80
[ 1288.204770] blk_mq_free_queue+0x74/0xe0
[ 1288.204922] blk_cleanup_queue+0xc7/0x110
[ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548] kernfs_fop_write+0x109/0x180
[ 1288.206328] vfs_write+0xb3/0x1a0
[ 1288.206476] SyS_write+0x52/0xc0
[ 1288.206624] do_syscall_64+0x68/0x1d0
[ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.
2. One queue is about to be freed and now the task is in blk_mq_del_queue_tag_set().
3. Another CPU is in blk_mq_sched_restart() and loops over all queues in the tag list in order to find an hctx to restart.

Because the linked list entry was modified in blk_mq_del_queue_tag_set() without properly waiting for a grace period, blk_mq_sched_restart() never ends, spinning in list_for_each_entry_rcu_rr(), thus the soft lockup.

The fix is simple: reinit the list entry after an RCU grace period has elapsed.

Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: stable@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 09, 2018
-
-
Jens Axboe authored
A recent commit reused the original request flags for the flush queue handling. However, for some of the kick flush cases, the original request was already completed. This caused a use after free, if blk-mq wasn't used.

Fixes: 84fca1b0 ("block: pass failfast and driver-specific flags to flush requests")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 08, 2018
-
-
Jens Axboe authored
Add a helper that allows a caller to initialize a new bio_set, using the settings from an existing bio_set.

Reported-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 07, 2018
-
-
Hannes Reinecke authored
blk_partition_remap() will only clear bi_partno if an actual remapping has happened. But flush requests and the like don't have an actual size, so the remapping doesn't happen and bi_partno is never cleared. So for stacked devices blk_partition_remap() will be called on each level. If (as is the case for native nvme multipathing) one of the lower-level devices does _not_ support partitioning, a spurious I/O error is generated.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 06, 2018
-
-
Hannes Reinecke authored
If flush requests are being sent to the device we need to inherit the failfast and driver-specific flags, too, otherwise I/O will fail.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 05, 2018
-
-
Wang YanQing authored
I recently hit a strange filesystem corruption issue; the cause was overlapping partitions in the cmdline partition argument.

This patch adds a verifier for cmdline partitions: if any partitions overlap, cmdline_partition logs a warning. We don't treat overlapping partitions as an error:

"
Caizhiyong <caizhiyong@hisilicon.com> said:
Partition overlap was intentionally designed in this cmdline partition.
reference http://lists.infradead.org/pipermail/linux-mtd/2013-August/048092.html
"

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
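A verifier of this kind can be sketched as a pairwise interval check. The struct layout, the `sketch_` names and the warning text are assumptions for illustration and are not taken from the patch. In line with the message above, an overlap only produces a warning; the function reports how many overlapping pairs it saw and never rejects the table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical partition description: name, start offset and length. */
struct sketch_part {
    const char *name;
    unsigned long long from;
    unsigned long long size;
};

/* Two half-open ranges [from, from + size) overlap if each starts before the other ends. */
static bool sketch_overlaps(const struct sketch_part *a,
                            const struct sketch_part *b)
{
    return a->from < b->from + b->size && b->from < a->from + a->size;
}

/*
 * Compare every pair of partitions and warn about overlaps.
 * Returns the number of overlapping pairs; overlap is not an error.
 */
static size_t sketch_verify(const struct sketch_part *parts, size_t n)
{
    size_t i, j, found = 0;

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            if (sketch_overlaps(&parts[i], &parts[j])) {
                fprintf(stderr, "warning: partitions %s and %s overlap\n",
                        parts[i].name, parts[j].name);
                found++;
            }
        }
    }
    return found;
}
```

The quadratic loop is a deliberate simplification; partition tables passed on the command line are short, so sorting first would add complexity without a practical benefit.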
-
- Jun 04, 2018
-
-
Jianchao Wang authored
If a hardware queue is stopped, it should not be run again before explicitly started. Ignore stopped queues in blk_mq_run_work_fn(), fixing a regression recently introduced when the START_ON_RUN bit was removed.

Fixes: 15fe8a90 ("blk-mq: remove blk_mq_delay_queue()")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
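In outline, the rule is an early return in the work function. The following is a toy model with hypothetical names; the real work function operates on a hardware context and a delayed work item, neither of which is modelled here.

```c
#include <stdbool.h>

/* Hypothetical hardware queue: a stopped bit and a run counter for observation. */
struct sketch_hctx {
    bool stopped;
    int runs;
};

/* Deferred-run handler: a stopped queue is ignored until it is started again. */
static void sketch_run_work_fn(struct sketch_hctx *hctx)
{
    if (hctx->stopped)
        return;
    hctx->runs++;   /* stands in for actually running the queue */
}
```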
-
- Jun 03, 2018
-
-
Ming Lei authored
Currently we set up q->nr_requests when switching to a new scheduler, but not when switching to 'none', so q->nr_requests may be incorrect for 'none'.

This patch fixes the issue by always updating 'nr_requests' when switching to 'none'.

Cc: Marco Patalano <mpatalan@redhat.com>
Cc: "Ewan D. Milne" <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we end up splitting a bio and the queue goes away between the initial submission and the later split submission, then we can block forever in blk_queue_enter() waiting for the reference to drop to zero. This will never happen, since we already hold a reference.

Mark a split bio as already having entered the queue, so we can just use the live non-blocking queue enter variant.

Thanks to Tetsuo Handa for the analysis.

Reported-by: <syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 02, 2018
-
-
Christoph Hellwig authored
For the upcoming removal of buffer heads in XFS we need to keep track of the number of outstanding writeback requests per page. For this we need to know if bio_add_page merged a region with the previous bvec or not. Instead of adding additional arguments, this refactors bio_add_page to be implemented using three lower level helpers which users like XFS can use directly if they care about the merge decisions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- Jun 01, 2018
-
-
Christoph Hellwig authored
There is almost no shared logic, which leads to a very confusing code flow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Both callers take it just around this function call, so move it in. Also remove the now pointless blk_mq_sched_init wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
These are only used by the block core. Also move the declarations to block/blk.h.

Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
No point in doing this in elevator_init.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- May 31, 2018
-
-
Davide Sapienza authored
BFQ can deem a bfq_queue as soft real-time only if the queue
- periodically becomes completely idle, i.e., empty and with no still-outstanding I/O request;
- after becoming idle, gets new I/O only after a special reference time soft_rt_next_start.

In this respect, after commit "block, bfq: consider also past I/O in soft real-time detection", the value of soft_rt_next_start can never decrease. This causes a problem with the following special updating case for soft_rt_next_start: to prevent queues that are not completely idle from being wrongly detected as soft real-time (when they become non-empty again), soft_rt_next_start is temporarily set to infinity for empty queues with still outstanding I/O requests. But, if such an update is actually performed, then, because of the above commit, soft_rt_next_start will be stuck at infinity forever, and the queue will have no more chance to be considered soft real-time.

On slow systems, this problem does cause actual soft real-time applications to be occasionally not detected as such.

This commit addresses this issue by eliminating the pushing of soft_rt_next_start to infinity, and by changing the way non-empty queues are prevented from being wrongly detected as soft real-time. Simply, a queue that becomes non-empty again can now be detected as soft real-time only if it has no outstanding I/O request.

Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
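The resulting condition can be written as a small predicate. This is a sketch under assumptions: the struct, the field name `dispatched` and the plain integer timestamps are hypothetical simplifications (no jiffies wraparound handling), and BFQ applies further criteria that are not modelled. It only captures the two requirements stated above: no outstanding I/O, and arrival after soft_rt_next_start.

```c
#include <stdbool.h>

/* Hypothetical view of a queue at the moment it becomes non-empty again. */
struct sketch_bfqq {
    unsigned int dispatched;                /* requests still outstanding */
    unsigned long long soft_rt_next_start;  /* reference time, arbitrary ticks */
};

/*
 * A queue qualifies only if it is completely idle (nothing outstanding)
 * and the new I/O arrives after the reference time. The reference time
 * itself is never pushed to infinity, so it cannot get stuck there.
 */
static bool sketch_may_be_soft_rt(const struct sketch_bfqq *q,
                                  unsigned long long now)
{
    return q->dispatched == 0 && now > q->soft_rt_next_start;
}
```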
-
Davide Sapienza authored
The maximum possible duration of the weight-raising period for interactive applications is limited to 13 seconds, as this is the time needed to load the largest application that we considered when tuning weight raising.

Unfortunately, in such an evaluation, we did not consider the case of very slow virtual machines. For example, on a QEMU/KVM virtual machine
- running in a slow PC;
- with a virtual disk stacked on a slow low-end 5400rpm HDD;
- serving a heavy I/O workload, such as the sequential reading of several files;
mplayer takes 23 seconds to start, if constantly weight-raised.

To address this issue, this commit conservatively sets the upper limit for weight-raising duration to 25 seconds.

Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
BFQ computes the duration of weight raising for interactive applications automatically, using some reference parameters. In particular, BFQ uses the best durations (see comments in the code for how these durations have been assessed) for two classes of systems: slow and fast ones. Examples of slow systems are old phones or systems using micro HDDs. Fast systems are all the remaining ones. Using these parameters, BFQ computes the actual duration of the weight raising, for the system at hand, as a function of the relative speed of the system w.r.t. the speed of a reference system, belonging to the same class of systems as the system at hand.

This slow vs fast differentiation proved to be useful in the past, but happens to have little meaning with current hardware. Even worse, it does cause problems in virtual systems, where the speed of the system can vary frequently, and so widely as to confuse the class-detection mechanism, and, as we have verified experimentally, to cause BFQ to compute nonsensical weight-raising durations.

This commit addresses this issue by removing the slow class and the class-detection mechanism.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
A description of how weight raising works is missing in BFQ sources. In addition, the code for handling weight raising is scattered across a few functions. This makes it rather hard to understand the mechanism and its rationale. This commit adds such a description at the beginning of the main source file.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Adam Manzanares authored
Aio per command iopriority support introduces a second interface between userland and the kernel capable of passing iopriority. The aio interface also needs the ability to verify that the submitting context has sufficient privileges to submit IOPRIO_RT commands. This patch creates the ioprio_check_cap function to be used by the ioprio_set system call and also by the aio interface.

Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
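The idea of a single shared check can be sketched in user space as below. This is not the kernel function: the `sketch_` names, the encoding constants and the boolean `privileged` parameter (standing in for a capability test on the submitting context) are assumptions made for the example. It shows one helper that both entry points could call before accepting a real-time priority.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical priority encoding: class in the high bits, level in the low bits. */
#define SKETCH_CLASS_SHIFT 13
#define SKETCH_PRIO_VALUE(cls, level) (((cls) << SKETCH_CLASS_SHIFT) | (level))

enum {
    SKETCH_CLASS_NONE,
    SKETCH_CLASS_RT,
    SKETCH_CLASS_BE,
    SKETCH_CLASS_IDLE,
};

/*
 * Shared privilege check: real-time priorities require a privileged
 * submitter. Returns 0 when the priority may be used, -EPERM otherwise.
 */
static int sketch_ioprio_check(int ioprio, bool privileged)
{
    int cls = ioprio >> SKETCH_CLASS_SHIFT;

    if (cls == SKETCH_CLASS_RT && !privileged)
        return -EPERM;
    return 0;
}
```

Keeping the check in one helper means a second submission interface cannot drift from the rules applied by the first.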
-
Filippo Muzzini authored
Since bfq_finish_request() is always called on the request 'next', after bfq_requests_merged() is finished, and bfq_finish_request() removes 'next' from its bfq_queue if needed, it isn't necessary to do such a removal in advance in bfq_requests_merged().

This commit removes such a useless 'next' removal.

Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
The request rq passed to the function bfq_requests_merged is always in a bfq_queue, so the check !RB_EMPTY_NODE(&rq->rb_node) at the beginning of bfq_requests_merged always succeeds, and the control flow systematically skips to the end of the function. This implies that the body of the function is never executed, i.e., the repositioning of rq is never performed.

Conversely, a check is missing in the body of the function: 'next' must be removed only if it is inside a bfq_queue.

This commit removes the wrong check on rq, and adds the missing check on 'next'. In addition, this commit adds comments on bfq_requests_merged.

Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Filippo Muzzini authored
In bfq_requests_merged(), there is a deadlock because the lock on bfqq->bfqd->lock is held by the calling function, but the code of this function tries to grab the lock again.

This deadlock is currently hidden by another bug (fixed by the next commit for this source file), which causes the body of bfq_requests_merged() never to be executed.

This commit removes the deadlock by removing the lock/unlock pair.

Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Missed converting the bioset_integrity_create() bounce bio set call.

Fixes: 338aa96d ("block: convert bounce, q->bio_split to bioset_init()/mempool_init()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- May 30, 2018
-
-
Kent Overstreet authored
All users have been converted to bioset_init(), kill off the old API.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Convert the core block functionality to embedded bio sets.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengguang Xu authored
Change to return true/false only for bool type return code.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We already check for started commands in all callbacks, but we should also protect against already completed commands. Do this by taking the checks to common code.

Acked-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Liu Bo authored
tg in throtl_select_dispatch() is used before it is checked. Since tg may be NULL, this risks a NULL pointer dereference. Fix it by checking tg before using it.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
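The ordering problem is generic and can be shown in a few lines. The struct and function below are hypothetical and are not the throttling code; the sketch only illustrates testing the pointer before the first access through it.

```c
#include <stddef.h>

/* Hypothetical throttle group: just a count of queued items. */
struct sketch_tg {
    int nr_queued;
};

/*
 * Check first, use second. Reading tg->nr_queued before the NULL test
 * would dereference a NULL pointer when no group is available.
 */
static int sketch_select_dispatch(const struct sketch_tg *tg)
{
    if (!tg)
        return 0;
    return tg->nr_queued;
}
```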
-