[V2,00/16] Introduce the BFQ I/O scheduler

Message ID 20170331124743.3530-1-paolo.valente@linaro.org

Paolo Valente March 31, 2017, 12:47 p.m. UTC
Hi,
with respect to the previous submission [1], this new patch series:
- contains all the changes suggested by Jens and Bart [1], apart from
  those about which I raised doubts that have either been acknowledged
  or not yet received a reply (I will of course also apply the latter
  changes if those threads restart);
- contains a fix to the bug causing the failure reported by Jens [2].

As for major changes, this patch series:
- solves the nesting problem between scheduler and io-context locks, by
  not taking any reference to io contexts anymore [3];
- splits the original, single source file into three files.

These last two contributions are provided by two additional patches in
the series. I've not merged these changes into the other patches for
the following reasons:

- Merging these changes would have meant splitting them into further,
  smaller pieces, applying each piece to the right previous patch, and
  resolving all the conflicts generated by each per-patch
  modification. This would have taken a great deal of time, with a
  real risk of introducing subtle errors (I tried for a few days, and
  then abandoned this approach).

- The removal of extra io-context references is a non-trivial change
  to code that has worked the other way, probably for about a decade,
  in CFQ.  The change seems to be fine, but in case of errors, it is
  probably much easier to find and cleanly fix them if they are
  confined to a single commit.

- A dedicated commit for the removal of extra io-context references
  also documents how it has been obtained, and what assumptions have
  been made.

- Similarly, an explicit split of the source file shows where each
  piece has gone, instead of exposing only the result of the split,
  with possible mistakes buried in it.

I have run all the tests I could.

Some patches still generate warnings with checkpatch.pl, but these
warnings seem to be either unavoidable for the involved pieces of code
(which the patches just extend), or false positives.

Thanks,
Paolo

[1] https://lkml.org/lkml/2017/3/4/148
[2] https://lkml.org/lkml/2017/3/6/887
[3] https://lkml.org/lkml/2017/3/18/34

Arianna Avanzini (4):
  block, bfq: add full hierarchical scheduling and cgroups support
  block, bfq: add Early Queue Merge (EQM)
  block, bfq: reduce idling only in symmetric scenarios
  block, bfq: handle bursts of queue activations

Paolo Valente (12):
  block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler
  block, bfq: improve throughput boosting
  block, bfq: modify the peak-rate estimator
  block, bfq: add more fairness with writes and slow processes
  block, bfq: improve responsiveness
  block, bfq: reduce I/O latency for soft real-time applications
  block, bfq: preserve a low latency also with NCQ-capable drives
  block, bfq: reduce latency during request-pool saturation
  block, bfq: boost the throughput on NCQ-capable flash-based devices
  block, bfq: boost the throughput with random I/O on NCQ-capable HDDs
  block, bfq: remove all get and put of I/O contexts
  block, bfq: split bfq-iosched.c into multiple source files

 Documentation/block/00-INDEX        |    2 +
 Documentation/block/bfq-iosched.txt |  531 ++++
 block/Kconfig.iosched               |   21 +
 block/Makefile                      |    1 +
 block/bfq-cgroup.c                  | 1139 ++++++++
 block/bfq-iosched.c                 | 5030 +++++++++++++++++++++++++++++++++++
 block/bfq-iosched.h                 |  942 +++++++
 block/bfq-wf2q.c                    | 1616 +++++++++++
 include/linux/blkdev.h              |    2 +-
 9 files changed, 9283 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/block/bfq-iosched.txt
 create mode 100644 block/bfq-cgroup.c
 create mode 100644 block/bfq-iosched.c
 create mode 100644 block/bfq-iosched.h
 create mode 100644 block/bfq-wf2q.c

--
2.10.0

Comments

Bart Van Assche April 10, 2017, 4:56 p.m. UTC | #1
On Fri, 2017-03-31 at 14:47 +0200, Paolo Valente wrote:
> [ ... ]


Hello Paolo,

Is the git tree that is available at https://github.com/Algodev-github/bfq-mq
appropriate for testing BFQ? If I merge that tree with v4.11-rc6 and if I run
the srp-test software against that tree as follows:

    ./run_tests -e bfq-mq -t 02-mq

then the following appears on the console:

[ 2748.650352] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[ 2748.650442] IP: __bfq_insert_request+0x26/0x650 [bfq_mq_iosched]
[ 2748.650509] PGD 0 
[ 2748.650511] 
[ 2748.650585] Oops: 0000 [#1] SMP
[ 2748.651107] CPU: 9 PID: 10772 Comm: kworker/9:2H Tainted: G          I     4.11.0-rc6-dbg+ #1
[ 2748.651191] Workqueue: kblockd blk_mq_requeue_work
[ 2748.651228] task: ffff88037c808040 task.stack: ffffc90003b4c000
[ 2748.651268] RIP: 0010:__bfq_insert_request+0x26/0x650 [bfq_mq_iosched]
[ 2748.651307] RSP: 0018:ffffc90003b4f9d8 EFLAGS: 00010002
[ 2748.651345] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
[ 2748.651383] RDX: 0000000000000001 RSI: ffff880377f52e80 RDI: ffff880401f774e8
[ 2748.651423] RBP: ffffc90003b4fa80 R08: 9093955f00000000 R09: 0000000000000001
[ 2748.651464] R10: ffffc90003b4fa00 R11: ffffffffa06d0d53 R12: ffff880401f77840
[ 2748.651506] R13: ffff880401f774e8 R14: ffff880378a451e0 R15: 0000000000000000
[ 2748.651547] FS:  0000000000000000(0000) GS:ffff88046f040000(0000) knlGS:0000000000000000
[ 2748.651588] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2748.651626] CR2: 00000000000000d0 CR3: 0000000001c0f000 CR4: 00000000001406e0
[ 2748.651664] Call Trace:
[ 2748.651778]  bfq_insert_request+0x83/0x280 [bfq_mq_iosched]
[ 2748.651934]  bfq_insert_requests+0x50/0x70 [bfq_mq_iosched]
[ 2748.651975]  blk_mq_sched_insert_request+0x11e/0x170
[ 2748.652015]  blk_insert_cloned_request+0xb6/0x1f0
[ 2748.652361]  map_request+0x13c/0x290 [dm_mod]
[ 2748.652403]  dm_mq_queue_rq+0x90/0x160 [dm_mod]
[ 2748.652441]  blk_mq_dispatch_rq_list+0x1f2/0x3e0
[ 2748.652479]  blk_mq_sched_dispatch_requests+0xf1/0x190
[ 2748.652516]  __blk_mq_run_hw_queue+0x12d/0x1c0
[ 2748.652553]  __blk_mq_delay_run_hw_queue+0xe3/0xf0
[ 2748.652593]  blk_mq_run_hw_queues+0x5c/0x80
[ 2748.652632]  blk_mq_requeue_work+0x132/0x150
[ 2748.652671]  process_one_work+0x206/0x6a0
[ 2748.652709]  worker_thread+0x49/0x4a0
[ 2748.652745]  kthread+0x107/0x140
[ 2748.652854]  ret_from_fork+0x2e/0x40
[ 2748.652891] Code: ff 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 c4 80 8b 87 58 03 00 00 48 8b 9e b0 00 00 00 85 c0 0f 84 8b 04 00 00 <48> 8b 83 d0 00 00 00 48 85 c0 0f 84 63 04 00 00 48 83 e8 10 48
[ 2748.653049] RIP: __bfq_insert_request+0x26/0x650 [bfq_mq_iosched] RSP: ffffc90003b4f9d8
[ 2748.653090] CR2: 00000000000000d0
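
A note for reading the dump above: CR2 holds the faulting address
(0xd0) and RBX is 0. A small fault address like this is what one would
typically see when a member at that offset is read through a NULL
structure pointer. The sketch below only illustrates that arithmetic;
`struct demo_queue` and its layout are hypothetical, not BFQ's actual
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0xd0 bytes of other fields, then a pointer member. */
struct demo_queue {
	char other_fields[0xd0];
	void *member;
};

/*
 * Address the CPU would access when reading base->member, computed
 * without dereferencing anything: base address plus member offset.
 */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct demo_queue, member);
}
```

With a NULL base, member_address(0) evaluates to 0xd0, the same value
as the address reported in CR2.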

The crash address corresponds to the following source code according to gdb:

(gdb) list *(__bfq_insert_request+0x26)
0xd6f6 is in __bfq_insert_request (block/bfq-mq-iosched.c:4430).
4425
4426    static void __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
4427    {
4428            struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
4429
4430            assert_spin_locked(&bfqd->lock);
4431
4432            bfq_log_bfqq(bfqd, bfqq, "__insert_req: rq %p bfqq %p", rq, bfqq);
4433
4434            /*

Bart.
Paolo Valente April 11, 2017, 8:43 a.m. UTC | #2
> On 10 Apr 2017, at 18:56, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> [ ... ]

Hi Bart,
I've tried to figure out how to deal with this crash, but I couldn't
find any sensible way forward, for the following two reasons.

First, unless I'm missing something, I don't yet have the hardware
required to run srp-test, so I cannot easily reproduce this
failure.  Actually, BFQ is not yet suitable, and maybe never will be
in its current design, for very high-speed hardware such as InfiniBand
and NVMe devices.

Second, a NULL-pointer fault at the line you report is rather weird.
In fact, the sequence of C statements executed up to that line
is:

struct bfq_data *bfqd = q->elevator->elevator_data;
...
spin_lock_irq(&bfqd->lock);
__bfq_insert_request(bfqd, rq);
	/* inside the __bfq_insert_request function: */
	struct bfq_queue *bfqq = RQ_BFQQ(rq), ...;
	assert_spin_locked(&bfqd->lock);

So, how can the last line cause a NULL-pointer-dereference exception
on the same address, &bfqd->lock, on which spin_lock_irq(&bfqd->lock);
was happy to work to get a spin lock?
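
To make the argument concrete, here is a minimal user-space sketch of
that sequence. All names (demo_lock, demo_data, demo_request and the
functions) are hypothetical stand-ins, not the kernel's types or API.
It only shows that the lock call and the assertion operate on the same
address, and that the queue pointer is merely loaded from the request,
not dereferenced, before the assertion.

```c
#include <assert.h>
#include <stddef.h>

struct demo_lock { int held; };                 /* toy lock, not a real spinlock */
struct demo_data { struct demo_lock lock; };    /* stand-in for struct bfq_data */
struct demo_request { void *queue; };           /* stand-in for struct request */

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Mirrors the first statements of the inner insert function. */
static int demo_inner_insert(struct demo_data *d, struct demo_request *rq)
{
	void *queue = rq->queue;   /* may be NULL; only loaded here */

	assert(d->lock.held);      /* same address the caller just locked */
	return queue != NULL;      /* report whether a queue was attached */
}

/* Mirrors the caller: take the lock, call the inner function, unlock. */
static int demo_insert(struct demo_data *d, struct demo_request *rq)
{
	int has_queue;

	demo_lock_acquire(&d->lock);
	has_queue = demo_inner_insert(d, rq);
	demo_lock_release(&d->lock);
	return has_queue;
}
```

In this model the assertion cannot fault once the lock has been taken,
even when the request carries a NULL queue pointer. One possibility
(an assumption, not verified here) is that the faulting access belongs
to a nearby statement that dereferences a NULL bfqq, since the
address-to-line mapping reported by gdb can be imprecise for optimized
code.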

Any idea on how to proceed?  If this strange bug remains hard to spot,
then, if you agree, I will in the meantime go ahead and submit a
new version of the patch series, which addresses your other issues.

Thanks,
Paolo
