- May 16, 2019
-
-
Jackie Liu authored
Use the new struct_size() helper to keep code simple. Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Jackie Liu <liuyun01@kylinos.cn> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 04, 2019
-
-
Ming Lei authored
Now freeing hw queue resource is moved to hctx's release handler, we don't need to worry about the race between blk_cleanup_queue and run queue any more. So don't drain in-progress dispatch in blk_cleanup_queue(). This is basically revert of c2856ae2 ("blk-mq: quiesce queue before freeing queue"). Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Hannes Reinecke <hare@suse.com> Tested-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
hctx is always released after requeue is freed. With holding queue's kobject refcount, it is safe for driver to run queue, so one run queue might be scheduled after blk_sync_queue() is done. So moving the cancel of hctx->run_work into blk_mq_hw_sysfs_release() for avoiding run released queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Hannes Reinecke <hare@suse.com> Tested-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
In normal queue cleanup path, hctx is released after request queue is freed, see blk_mq_release(). However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because of hw queues shrinking. This way is easy to cause use-after-free, because: one implicit rule is that it is safe to call almost all block layer APIs if the request queue is alive; and one hctx may be retrieved by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues(); finally use-after-free is triggered. Fixes this issue by always freeing hctx after releasing request queue. If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce a per-queue list to hold them, then try to resuse these hctxs if numa node is matched. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by:
Hannes Reinecke <hare@suse.com> Tested-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Split blk_mq_alloc_and_init_hctx into two parts, and one is blk_mq_alloc_hctx() for allocating all hctx resources, another is blk_mq_init_hctx() for initializing hctx, which serves as counter-part of blk_mq_exit_hctx(). Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K . Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Tested-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Once blk_cleanup_queue() returns, tags shouldn't be used any more, because blk_mq_free_tag_set() may be called. Commit 45a9c9d9 ("blk-mq: Fix a use-after-free") fixes this issue exactly. However, that commit introduces another issue. Before 45a9c9d9, we are allowed to run queue during cleaning up queue if the queue's kobj refcount is held. After that commit, queue can't be run during queue cleaning up, otherwise oops can be triggered easily because some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue(). We have invented ways for addressing this kind of issue before, such as: 8dc765d4 ("SCSI: fix queue cleanup race before queue initialization is done") c2856ae2 ("blk-mq: quiesce queue before freeing queue") But still can't cover all cases, recently James reports another such kind of issue: https://marc.info/?l=linux-scsi&m=155389088124782&w=2 This issue can be quite hard to address by previous way, given scsi_run_queue() may run requeues for other LUNs. Fixes the above issue by freeing hctx's resources in its release handler, and this way is safe becasue tags isn't needed for freeing such hctx resource. This approach follows typical design pattern wrt. kobject's release handler. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reported-by:
James Smart <james.smart@broadcom.com> Fixes: 45a9c9d9 ("blk-mq: Fix a use-after-free") Cc: stable@vger.kernel.org Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Tested-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
With holding queue's kobject refcount, it is safe for driver to schedule requeue. However, blk_mq_kick_requeue_list() may be called after blk_sync_queue() is done because of concurrent requeue activities, then requeue work may not be completed when freeing queue, and kernel oops is triggered. So moving the cancel of requeue_work into blk_mq_release() for avoiding race between requeue and freeing queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Tested-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Just like aio/io_uring, we need to grab 2 refcount for queuing one request, one is for submission, another is for completion. If the request isn't queued from plug code path, the refcount grabbed in generic_make_request() serves for submission. In theroy, this refcount should have been released after the sumission(async run queue) is done. blk_freeze_queue() works with blk_sync_queue() together for avoiding race between cleanup queue and IO submission, given async run queue activities are canceled because hctx->run_work is scheduled with the refcount held, so it is fine to not hold the refcount when running the run queue work function for dispatch IO. However, if request is staggered into plug list, and finally queued from plug code path, the refcount in submission side is actually missed. And we may start to run queue after queue is removed because the queue's kobject refcount isn't guaranteed to be grabbed in flushing plug list context, then kernel oops is triggered, see the following race: blk_mq_flush_plug_list(): blk_mq_sched_insert_requests() insert requests to sw queue or scheduler queue blk_mq_run_hw_queue Because of concurrent run queue, all requests inserted above may be completed before calling the above blk_mq_run_hw_queue. Then queue can be freed during the above blk_mq_run_hw_queue(). Fixes the issue by grab .q_usage_counter before calling blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is safe because the queue is absolutely alive before inserting request. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Tested-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 02, 2019
-
-
Raul E Rangel authored
The comment was out of date. Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Raul E Rangel <rrangel@chromium.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 30, 2019
-
-
Christoph Hellwig authored
Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those. Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
All these files have some form of the usual GPLv2 or later boilerplate. Switch them to use SPDX tags instead. Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
All these files have some form of the usual GPLv2 boilerplate. Switch them to use SPDX tags instead. Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Share the bi_size update by moving the done label up, and duplicate the bv_len update in the two callers to get rid of the bvec_merge label. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We are never called with file system pages by defintions for the passthrough interface, and we also never undo any addition later these days. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The same page optimization is a rather odd corner case, which is not used outside bio.c and which really should not be used outside of bio.c either - we have better highlevel helpers like the rq/bio mapping helpers. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by:
Matthew Wilcox <willy@infradead.org> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Acked-by:
David Sterba <dsterba@suse.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Acked-by:
Coly Li <colyli@suse.de> Reviewed-by:
Matthew Wilcox <willy@infradead.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 25, 2019
-
-
Kimberly Brown authored
The kobj_type default_attrs field is being replaced by the default_groups field. Replace all of the ktype default_attrs fields in the block subsystem with default_groups and use the ATTRIBUTE_GROUPS macro to create the default groups. Remove default_ctx_attrs[] because it doesn't contain any attributes. This patch was tested by verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by:
Kimberly Brown <kimbrownkd@gmail.com> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Apr 24, 2019
-
-
Ming Lei authored
The refcount has been increased for pages retrieved from non-bvec iov iter via __bio_iov_iter_get_pages(), so don't need to do that again. Otherwise, IO pages are leaked easily. Cc: Christoph Hellwig <hch@lst.de> Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Fixes: 7321ecbf ("block: change how we get page references in bio_iov_iter_get_pages") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 23, 2019
-
-
Ming Lei authored
bio_add_page() and __bio_add_page() are capable of adding pages into bio, and now we have at least two such usages alreay: - __bio_iov_bvec_add_pages() - nvmet_bdev_execute_rw(). So update comments on these two helpers. The thing is a bit special for __bio_try_merge_page(), given the caller needs to know if the new added page is same with the last added page, then it isn't safe to pass multi-page in case that 'same_page' is true, so adds warning on potential misuse, and updates comment on __bio_try_merge_page(). Cc: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Christoph Hellwig <hch@infradead.org> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 22, 2019
-
-
Weiping Zhang authored
If the low level driver has no timeout handler, the /sys/block/<disk>/queue/io_timeout will not be displayed. Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Weiping Zhang <zhangweiping@didiglobal.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
While we generally allow scatterlists to have offsets larger than page size for an entry, and other subsystems like the crypto code make use of that, the block layer isn't quite ready for that. Flip the switch back to avoid them for now, and revisit that decision early in a merge window once the known offenders are fixed. Fixes: 8a96a0e4 ("block: rewrite blk_bvec_map_sg to avoid a nth_page call") Reviewed-by:
Ming Lei <ming.lei@redhat.com> Tested-by:
Guenter Roeck <linux@roeck-us.net> Reported-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
commit 2da78092 "block: Fix dev_t minor allocation lifetime" specifically moved blk_free_devt(dev->devt) call to part_release() to avoid reallocating device number before the device is fully shutdown. However, it can cause use-after-free on gendisk in get_gendisk(). We use md device as example to show the race scenes: Process1 Worker Process2 md_free blkdev_open del_gendisk add delete_partition_work_fn() to wq __blkdev_get get_gendisk put_disk disk_release kfree(disk) find part from ext_devt_idr get_disk_and_module(disk) cause use after free delete_partition_work_fn put_device(part) part_release remove part from ext_devt_idr Before <devt, hd_struct pointer> is removed from ext_devt_idr by delete_partition_work_fn(), we can find the devt and then access gendisk by hd_struct pointer. But, if we access the gendisk after it have been freed, it can cause in use-after-freeon gendisk in get_gendisk(). We fix this by adding a new helper blk_invalidate_devt() in delete_partition() and del_gendisk(). It replaces hd_struct pointer in idr with value 'NULL', and deletes the entry from idr in part_release() as we do now. Thanks to Jan Kara for providing the solution and more clear comments for the code. Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime") Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Keith Busch <keith.busch@intel.com> Reviewed-by:
Jan Kara <jack@suse.cz> Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 15, 2019
-
-
Yufen Yu authored
commit 2da78092 "block: Fix dev_t minor allocation lifetime" specifically moved blk_free_devt(dev->devt) call to part_release() to avoid reallocating device number before the device is fully shutdown. However, it can cause use-after-free on gendisk in get_gendisk(). We use md device as example to show the race scenes: Process1 Worker Process2 md_free blkdev_open del_gendisk add delete_partition_work_fn() to wq __blkdev_get get_gendisk put_disk disk_release kfree(disk) find part from ext_devt_idr get_disk_and_module(disk) cause use after free delete_partition_work_fn put_device(part) part_release remove part from ext_devt_idr Before <devt, hd_struct pointer> is removed from ext_devt_idr by delete_partition_work_fn(), we can find the devt and then access gendisk by hd_struct pointer. But, if we access the gendisk after it have been freed, it can cause in use-after-freeon gendisk in get_gendisk(). We fix this by adding a new helper blk_invalidate_devt() in delete_partition() and del_gendisk(). It replaces hd_struct pointer in idr with value 'NULL', and deletes the entry from idr in part_release() as we do now. Thanks to Jan Kara for providing the solution and more clear comments for the code. Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime") Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Keith Busch <keith.busch@intel.com> Reviewed-by:
Jan Kara <jack@suse.cz> Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 14, 2019
-
-
Jens Axboe authored
A previous commit moved the shallow depth and BFQ depth map calculations to be done at init time, moving it outside of the hotter IO path. This potentially causes hangs if the users changes the depth of the scheduler map, by writing to the 'nr_requests' sysfs file for that device. Add a blk-mq-sched hook that allows blk-mq to inform the scheduler if the depth changes, so that the scheduler can update its internal state. Tested-by:
Kai Krakow <kai@kaishome.de> Reported-by:
Paolo Valente <paolo.valente@linaro.org> Fixes: f0635b8a ("bfq: calculate shallow depths at init time") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 12, 2019
-
-
Martin Wilck authored
Drivers now report to the block layer if they support media change events. If this is not the case, there's no need to allocate the event structure, and all event handling code can effectively be skipped. This simplifies code flow in particular for non-removable sd devices. This effectively reverts commit 75e3f3ee ("block: always allocate genhd->ev if check_events is implemented"). The sysfs files for the events are kept in place even if no events are supported, as user space may rely on them being present. The only difference is that an error code is now returned if the user tries to set poll_msecs. Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Martin Wilck authored
Currently, an empty disk->events field tells the block layer not to forward media change events to user space. This was done in commit 7c88a168 ("block: don't propagate unlisted DISK_EVENTs to userland") in order to avoid events from "fringe" drivers to be forwarded to user space. By doing so, the block layer lost the information which events were supported by a particular block device, and most importantly, whether or not a given device supports media change events at all. Prepare for not interpreting the "events" field this way in the future any more. This is done by adding an additional field "event_flags" to struct gendisk, and two flag bits that can be set to have the device treated like one that had the "events" field set to a non-zero value before. This applies only to the sd and sr drivers, which are changed to set the new flags. The new flags are DISK_EVENT_FLAG_POLL to enforce polling of the device for synchronous events, and DISK_EVENT_FLAG_UEVENT to tell the blocklayer to generate udev events from kernel events. In order to add the event_flags field to struct gendisk, the events field is converted to an "unsigned short"; it doesn't need to hold values bigger than 2 anyway. This patch doesn't change behavior. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Martin Wilck authored
The async_events field, intended to be used for drivers that support asynchronous notifications about disk events (aka media change events), isn't currently used by any driver, and apparently that has been that way for a long time (if not forever). Remove it. Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We currently have to call nth_page when iterating over pages inside a bio_vec. Jens complained a while ago that this is fairly expensive. To mitigate this we can check that that the actual page structures are contiguous when adding them to the bio, and just do check pointer arithmetics later on. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Instead of needing a special macro to iterate over all pages in a bvec just do a second passs over the whole bio. This also matches what we do on the release side. The release side helper is moved up to where we need the get helper to clearly express the symmetry. Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
No caller uses bio_iov_iter_get_pages multiple times on a given bio, and that funtionality isn't all that useful. Removing it will make some future changes a little easier and also simplifies the function a bit. Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Return early on error, and add an unlikely annotation for that case. Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The offset in scatterlists is allowed to be larger than the page size, so don't go to great length to avoid that case and simplify the arithmetics. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 10, 2019
-
-
Jérôme Glisse authored
When bio_add_pc_page() fails in bio_copy_user_iov() we should free the page we just allocated otherwise we are leaking it. Cc: linux-block@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Jérôme Glisse <jglisse@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
In NVMe's error handler, follows the typical steps of tearing down hardware for recovering controller: 1) stop blk_mq hw queues 2) stop the real hw queues 3) cancel in-flight requests via blk_mq_tagset_busy_iter(tags, cancel_request, ...) cancel_request(): mark the request as abort blk_mq_complete_request(req); 4) destroy real hw queues However, there may be race between #3 and #4, because blk_mq_complete_request() may run q->mq_ops->complete(rq) remotelly and asynchronously, and ->complete(rq) may be run after #4. This patch introduces blk_mq_complete_request_sync() for fixing the above race. Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Bart Van Assche <bvanassche@acm.org> Cc: James Smart <james.smart@broadcom.com> Cc: linux-nvme@lists.infradead.org Reviewed-by:
Keith Busch <keith.busch@intel.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
The function bfq_bfqq_expire() invokes the function __bfq_bfqq_expire(), and the latter may free the in-service bfq-queue. If this happens, then no other instruction of bfq_bfqq_expire() must be executed, or a use-after-free will occur. Basing on the assumption that __bfq_bfqq_expire() invokes bfq_put_queue() on the in-service bfq-queue exactly once, the queue is assumed to be freed if its refcounter is equal to one right before invoking __bfq_bfqq_expire(). But, since commit 9dee8b3b ("block, bfq: fix queue removal from weights tree") this assumption is false. __bfq_bfqq_expire() may also invoke bfq_weights_tree_remove() and, since commit 9dee8b3b ("block, bfq: fix queue removal from weights tree"), also the latter function may invoke bfq_put_queue(). So __bfq_bfqq_expire() may invoke bfq_put_queue() twice, and this is the actual case where the in-service queue may happen to be freed. To address this issue, this commit moves the check on the refcounter of the queue right around the last bfq_put_queue() that may be invoked on the queue. Fixes: 9dee8b3b ("block, bfq: fix queue removal from weights tree") Reported-by:
Dmitrii Tcvetkov <demfloro@demfloro.ru> Reported-by:
Douglas Anderson <dianders@chromium.org> Tested-by:
Dmitrii Tcvetkov <demfloro@demfloro.ru> Tested-by:
Douglas Anderson <dianders@chromium.org> Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 08, 2019
-
-
Ming Lei authored
Commit f6970f83 ("block: don't check if adjacent bvecs in one bio can be mergeable") changes bvec merge by only considering two bvecs from different bios. However, if the former bio doesn't inlcude any io bvec, then the following warning may be triggered: warning: ‘bvec.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized] In practice, it shouldn't be triggered. Fixes it by adding check on former bio, the check shouldn't add any cost given 'bio->bi_iter' can be hit in cache. Reported-by:
Jens Axboe <axboe@kernel.dk> Fixes: f6970f83 ("block: don't check if adjacent bvecs in one bio can be mergeable") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Angelo Ruocco authored
Some of the comments in the bfq files had typos. This patch fixes them. Signed-off-by:
Angelo Ruocco <angeloruocco90@gmail.com> Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Hisao Tanabe authored
The 'def' local variable became unused after commit f382fb0b ("block: remove legacy IO schedulers"), let's remove it. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Hisao Tanabe <xtanabe@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 06, 2019
-
-
David Kozub authored
As the function is responsible for executing the individual steps supplied in the steps argument, execute_steps is a more descriptive name than the rather generic next. Signed-off-by:
David Kozub <zub@linux.fjfi.cvut.cz> Reviewed-by:
Scott Bauer <sbauer@plzdonthack.me> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jon Derrick <jonathan.derrick@intel.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-