[BUGFIX/IMPROVEMENT,V2,0/3] three bfq fixes restoring service guarantees with random sync writes in bg

Message ID 20170831064631.2223-1-paolo.valente@linaro.org

Message

Paolo Valente Aug. 31, 2017, 6:46 a.m. UTC
[SECOND TAKE, with just the name of one of the testers fixed]

Hi,
while testing the read-write unfairness issues reported by Mel, I
found BFQ failing to guarantee good responsiveness against heavy
random sync writes in the background, i.e., multiple writers doing
random writes and systematic fdatasync [1]. The failure was caused by
three related bugs, because of which BFQ failed to guarantee to
high-weight processes the expected fraction of the throughput.

The three patches in this series fix these bugs. These fixes restore
the usual BFQ service guarantees (and thus optimal responsiveness
too), against the above background workload and, probably, against
other similar workloads.
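
For readers who want a feel for the background workload described above (multiple writers doing random writes, each followed by fdatasync), here is a minimal Python sketch. It is not the tool used in the report at [1]; the function names, file sizes, block size and the use of threads instead of separate processes are all assumptions made for illustration.

```python
import os
import random
import tempfile
import threading


def random_sync_writer(path, file_size, block_size, n_writes, seed):
    """Write n_writes blocks at random aligned offsets, syncing after each one."""
    rng = random.Random(seed)
    block = bytes([seed % 256]) * block_size
    n_blocks = file_size // block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # preallocate the file's logical size
        for _ in range(n_writes):
            offset = rng.randrange(n_blocks) * block_size
            os.pwrite(fd, block, offset)
            os.fdatasync(fd)  # the "systematic fdatasync" part of the workload
    finally:
        os.close(fd)


def run_background_writers(directory, n_writers=4, file_size=1 << 20,
                           block_size=4096, n_writes=64):
    """Start n_writers concurrent writers, one file each, and wait for them."""
    paths = [os.path.join(directory, f"writer{i}.dat") for i in range(n_writers)]
    threads = [
        threading.Thread(
            target=random_sync_writer,
            args=(path, file_size, block_size, n_writes, i),
        )
        for i, path in enumerate(paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths


if __name__ == "__main__":
    # Hypothetical small run in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for p in run_background_writers(tmp):
            print(p, os.path.getsize(p))
```

To observe the effect the series is about, one would run something like this on a BFQ-scheduled disk while starting an interactive application in the foreground; the sketch itself only generates the write load and measures nothing.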

Thanks,
Paolo

[1] https://lkml.org/lkml/2017/8/9/957

Paolo Valente (3):
  block, bfq: make lookup_next_entity push up vtime on expirations
  block, bfq: remove direct switch to an entity in higher class
  block, bfq: guarantee update_next_in_service always returns an
    eligible entity

 block/bfq-iosched.c |  4 +--
 block/bfq-iosched.h |  4 +--
 block/bfq-wf2q.c    | 91 ++++++++++++++++++++++++++++++++---------------------
 3 files changed, 60 insertions(+), 39 deletions(-)

--
2.10.0

Comments

Jens Axboe Aug. 31, 2017, 2:21 p.m. UTC | #1
On 08/31/2017 12:46 AM, Paolo Valente wrote:
> [SECOND TAKE, with just the name of one of the tester fixed]
>
> Hi,
> while testing the read-write unfairness issues reported by Mel, I
> found BFQ failing to guarantee good responsiveness against heavy
> random sync writes in the background, i.e., multiple writers doing
> random writes and systematic fdatasync [1]. The failure was caused by
> three related bugs, because of which BFQ failed to guarantee to
> high-weight processes the expected fraction of the throughput.
>
> The three patches in this series fix these bugs. These fixes restore
> the usual BFQ service guarantees (and thus optimal responsiveness
> too), against the above background workload and, probably, against
> other similar workloads.

Applied for 4.14, thanks Paolo.

-- 
Jens Axboe
Mel Gorman Aug. 31, 2017, 2:42 p.m. UTC | #2
On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
> [SECOND TAKE, with just the name of one of the tester fixed]
>
> Hi,
> while testing the read-write unfairness issues reported by Mel, I
> found BFQ failing to guarantee good responsiveness against heavy
> random sync writes in the background, i.e., multiple writers doing
> random writes and systematic fdatasync [1]. The failure was caused by
> three related bugs, because of which BFQ failed to guarantee to
> high-weight processes the expected fraction of the throughput.
>

Queued on top of Ming's most recent series even though that's still a work
in progress. I should know in a few days how things stand.

-- 
Mel Gorman
SUSE Labs
Mike Galbraith Aug. 31, 2017, 5:06 p.m. UTC | #3
On Thu, 2017-08-31 at 15:42 +0100, Mel Gorman wrote:
> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
> > [SECOND TAKE, with just the name of one of the tester fixed]
> >
> > Hi,
> > while testing the read-write unfairness issues reported by Mel, I
> > found BFQ failing to guarantee good responsiveness against heavy
> > random sync writes in the background, i.e., multiple writers doing
> > random writes and systematic fdatasync [1]. The failure was caused by
> > three related bugs, because of which BFQ failed to guarantee to
> > high-weight processes the expected fraction of the throughput.
> >
>
> Queued on top of Ming's most recent series even though that's still a work
> in progress. I should know in a few days how things stand.

It seems to have cured an interactivity issue I regularly meet during
kbuild final link/depmod phase of fat kernel kbuild, especially bad
with evolution mail usage during that on spinning rust.  Can't really
say for sure given this is not based on measurement.

	-Mike
Paolo Valente Aug. 31, 2017, 5:12 p.m. UTC | #4
> On 31 Aug 2017, at 19:06, Mike Galbraith <efault@gmx.de> wrote:
>
> On Thu, 2017-08-31 at 15:42 +0100, Mel Gorman wrote:
>> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
>>> [SECOND TAKE, with just the name of one of the tester fixed]
>>>
>>> Hi,
>>> while testing the read-write unfairness issues reported by Mel, I
>>> found BFQ failing to guarantee good responsiveness against heavy
>>> random sync writes in the background, i.e., multiple writers doing
>>> random writes and systematic fdatasync [1]. The failure was caused by
>>> three related bugs, because of which BFQ failed to guarantee to
>>> high-weight processes the expected fraction of the throughput.
>>>
>>
>> Queued on top of Ming's most recent series even though that's still a work
>> in progress. I should know in a few days how things stand.
>
> It seems to have cured an interactivity issue I regularly meet during
> kbuild final link/depmod phase of fat kernel kbuild, especially bad
> with evolution mail usage during that on spinning rust.  Can't really
> say for sure given this is not based on measurement.
>

Great!  Actually, when I found these bugs, I thought also about the
issues you told me you experienced with updatedb running.  But then I
forgot to tell you that these fixes might help.

Thanks,
Paolo

> 	-Mike
Mike Galbraith Aug. 31, 2017, 5:31 p.m. UTC | #5
On Thu, 2017-08-31 at 19:12 +0200, Paolo Valente wrote:
> > On 31 Aug 2017, at 19:06, Mike Galbraith <efault@gmx.de> wrote:
> >
> > On Thu, 2017-08-31 at 15:42 +0100, Mel Gorman wrote:
> >> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
> >>> [SECOND TAKE, with just the name of one of the tester fixed]
> >>>
> >>> Hi,
> >>> while testing the read-write unfairness issues reported by Mel, I
> >>> found BFQ failing to guarantee good responsiveness against heavy
> >>> random sync writes in the background, i.e., multiple writers doing
> >>> random writes and systematic fdatasync [1]. The failure was caused by
> >>> three related bugs, because of which BFQ failed to guarantee to
> >>> high-weight processes the expected fraction of the throughput.
> >>>
> >>
> >> Queued on top of Ming's most recent series even though that's still a work
> >> in progress. I should know in a few days how things stand.
> >
> > It seems to have cured an interactivity issue I regularly meet during
> > kbuild final link/depmod phase of fat kernel kbuild, especially bad
> > with evolution mail usage during that on spinning rust.  Can't really
> > say for sure given this is not based on measurement.
> >
>
> Great!  Actually, when I found these bugs, I thought also about the
> issues you told me you experienced with updatedb running.  But then I
> forgot to tell you that these fixes might help.

I'm going to actively test that, because that is every bit as
infuriating as the evolution thing, only updatedb is nukable.  In fact,
it infuriated me to the point that it no longer has a crontab entry,
runs only when I decide to run it.  At this point, I'll be pretty
surprised if that rotten <naughty words> is still alive.

	-Mike
Mel Gorman Sept. 4, 2017, 8:14 a.m. UTC | #6
On Thu, Aug 31, 2017 at 03:42:57PM +0100, Mel Gorman wrote:
> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
> > [SECOND TAKE, with just the name of one of the tester fixed]
> >
> > Hi,
> > while testing the read-write unfairness issues reported by Mel, I
> > found BFQ failing to guarantee good responsiveness against heavy
> > random sync writes in the background, i.e., multiple writers doing
> > random writes and systematic fdatasync [1]. The failure was caused by
> > three related bugs, because of which BFQ failed to guarantee to
> > high-weight processes the expected fraction of the throughput.
> >
>
> Queued on top of Ming's most recent series even though that's still a work
> in progress. I should know in a few days how things stand.
>

The problems with parallel heavy writers seem to have disappeared with this
series. Revisions are still taking place on Ming's series, so the overall
setting of legacy vs mq is still a work in progress, but this series looks good.

Thanks.

-- 
Mel Gorman
SUSE Labs
Paolo Valente Sept. 4, 2017, 8:55 a.m. UTC | #7
> On 4 Sep 2017, at 10:14, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Thu, Aug 31, 2017 at 03:42:57PM +0100, Mel Gorman wrote:
>> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
>>> [SECOND TAKE, with just the name of one of the tester fixed]
>>>
>>> Hi,
>>> while testing the read-write unfairness issues reported by Mel, I
>>> found BFQ failing to guarantee good responsiveness against heavy
>>> random sync writes in the background, i.e., multiple writers doing
>>> random writes and systematic fdatasync [1]. The failure was caused by
>>> three related bugs, because of which BFQ failed to guarantee to
>>> high-weight processes the expected fraction of the throughput.
>>>
>>
>> Queued on top of Ming's most recent series even though that's still a work
>> in progress. I should know in a few days how things stand.
>>
>
> The problems with parallel heavy writers seem to have disappeared with this
> series. There are still revisions taking place on Ming's to overall setting
> of legacy vs mq is still a work in progress but this series looks good.
>

Great news!

Thanks for testing,
Paolo

> Thanks.
>
> --
> Mel Gorman
> SUSE Labs
Ming Lei Sept. 4, 2017, 9:07 a.m. UTC | #8
On Mon, Sep 4, 2017 at 4:14 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> On Thu, Aug 31, 2017 at 03:42:57PM +0100, Mel Gorman wrote:
>> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
>> > [SECOND TAKE, with just the name of one of the tester fixed]
>> >
>> > Hi,
>> > while testing the read-write unfairness issues reported by Mel, I
>> > found BFQ failing to guarantee good responsiveness against heavy
>> > random sync writes in the background, i.e., multiple writers doing
>> > random writes and systematic fdatasync [1]. The failure was caused by
>> > three related bugs, because of which BFQ failed to guarantee to
>> > high-weight processes the expected fraction of the throughput.
>> >
>>
>> Queued on top of Ming's most recent series even though that's still a work
>> in progress. I should know in a few days how things stand.
>>
>
> The problems with parallel heavy writers seem to have disappeared with this
> series. There are still revisions taking place on Ming's to overall setting
> of legacy vs mq is still a work in progress but this series looks good.

Hi Mel and Paolo,

BTW, no actual functional change in V4.

Also, could you provide a Tested-by, since it looks like you are using
it in your tests?

Thanks,
Ming Lei