Message ID | 20200909151149.490589-14-kwolf@redhat.com |
---|---|
State | New |
Series | monitor: Optionally run handlers in coroutines |
On Wed, Sep 09, 2020 at 05:11:49PM +0200, Kevin Wolf wrote:
> @@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
>          return;
>      }
>
> -    aio_context = bdrv_get_aio_context(bs);
> -    aio_context_acquire(aio_context);
> +    old_ctx = bdrv_co_move_to_aio_context(bs);
>
>      if (size < 0) {
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");

Is it safe to call blk_new() outside the BQL since it mutates global state?

In other words, could another thread race with us?

> @@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
>      bdrv_drained_end(bs);
>
>  out:
> +    aio_co_reschedule_self(old_ctx);
>      blk_unref(blk);
> -    aio_context_release(aio_context);

The following precondition is violated by the blk_unref -> bdrv_drain ->
AIO_WAIT_WHILE() call if blk->refcnt is 1 here:

 * The caller's thread must be the IOThread that owns @ctx or the main loop
 * thread (with @ctx acquired exactly once).

blk_unref() is called from the main loop thread without having acquired
blk's AioContext.

Normally blk->refcnt will be > 1 so bdrv_drain() won't be called, but
I'm not sure if that can be guaranteed.

The following seems safer although it's uglier:

  aio_context = bdrv_get_aio_context(bs);
  aio_context_acquire(aio_context);
  blk_unref(blk);
  aio_context_release(aio_context);

On 15.09.2020 at 16:57, Stefan Hajnoczi wrote:
> On Wed, Sep 09, 2020 at 05:11:49PM +0200, Kevin Wolf wrote:
> > @@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
> >          return;
> >      }
> >
> > -    aio_context = bdrv_get_aio_context(bs);
> > -    aio_context_acquire(aio_context);
> > +    old_ctx = bdrv_co_move_to_aio_context(bs);
> >
> >      if (size < 0) {
> >          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
>
> Is it safe to call blk_new() outside the BQL since it mutates global state?
>
> In other words, could another thread race with us?

Hm, probably not.

Would it be safer to have the bdrv_co_move_to_aio_context() call only
immediately before the drain?

> > @@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
> >      bdrv_drained_end(bs);
> >
> >  out:
> > +    aio_co_reschedule_self(old_ctx);
> >      blk_unref(blk);
> > -    aio_context_release(aio_context);
>
> The following precondition is violated by the blk_unref -> bdrv_drain ->
> AIO_WAIT_WHILE() call if blk->refcnt is 1 here:
>
>  * The caller's thread must be the IOThread that owns @ctx or the main loop
>  * thread (with @ctx acquired exactly once).
>
> blk_unref() is called from the main loop thread without having acquired
> blk's AioContext.
>
> Normally blk->refcnt will be > 1 so bdrv_drain() won't be called, but
> I'm not sure if that can be guaranteed.
>
> The following seems safer although it's uglier:
>
>   aio_context = bdrv_get_aio_context(bs);
>   aio_context_acquire(aio_context);
>   blk_unref(blk);
>   aio_context_release(aio_context);

May we actually acquire aio_context if blk is in the main thread? I
think we must only do this if it's in a different iothread because we'd
end up with a recursive lock and drain would hang.

Kevin

On Fri, Sep 25, 2020 at 06:07:50PM +0200, Kevin Wolf wrote:
> On 15.09.2020 at 16:57, Stefan Hajnoczi wrote:
> > On Wed, Sep 09, 2020 at 05:11:49PM +0200, Kevin Wolf wrote:
> > > @@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
> > >          return;
> > >      }
> > >
> > > -    aio_context = bdrv_get_aio_context(bs);
> > > -    aio_context_acquire(aio_context);
> > > +    old_ctx = bdrv_co_move_to_aio_context(bs);
> > >
> > >      if (size < 0) {
> > >          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
> >
> > Is it safe to call blk_new() outside the BQL since it mutates global state?
> >
> > In other words, could another thread race with us?
>
> Hm, probably not.
>
> Would it be safer to have the bdrv_co_move_to_aio_context() call only
> immediately before the drain?

Yes, sounds good.

> > > @@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
> > >      bdrv_drained_end(bs);
> > >
> > >  out:
> > > +    aio_co_reschedule_self(old_ctx);
> > >      blk_unref(blk);
> > > -    aio_context_release(aio_context);
> >
> > The following precondition is violated by the blk_unref -> bdrv_drain ->
> > AIO_WAIT_WHILE() call if blk->refcnt is 1 here:
> >
> >  * The caller's thread must be the IOThread that owns @ctx or the main loop
> >  * thread (with @ctx acquired exactly once).
> >
> > blk_unref() is called from the main loop thread without having acquired
> > blk's AioContext.
> >
> > Normally blk->refcnt will be > 1 so bdrv_drain() won't be called, but
> > I'm not sure if that can be guaranteed.
> >
> > The following seems safer although it's uglier:
> >
> >   aio_context = bdrv_get_aio_context(bs);
> >   aio_context_acquire(aio_context);
> >   blk_unref(blk);
> >   aio_context_release(aio_context);
>
> May we actually acquire aio_context if blk is in the main thread? I
> think we must only do this if it's in a different iothread because we'd
> end up with a recursive lock and drain would hang.

Right :).

Maybe an aio_context_acquire_once() API would help.

Stefan
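
For readers following the thread, the "acquire exactly once" idea can be sketched as a toy model. This is not QEMU code: ToyContext and every toy_* function are hypothetical names made up for illustration, the lock is reduced to a single-threaded depth counter, and aio_context_acquire_once() itself is only a suggestion in this thread, not an existing API. The sketch assumes the drain precondition can be modelled as "lock depth is exactly 1".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a recursive context lock; NOT QEMU's AioContext. */
typedef struct ToyContext {
    int depth; /* how many times the calling thread holds the lock */
} ToyContext;

/* Unconditional, recursive acquire (models a recursive lock). */
static void toy_acquire(ToyContext *ctx)
{
    ctx->depth++;
}

static void toy_release(ToyContext *ctx)
{
    assert(ctx->depth > 0);
    ctx->depth--;
}

/*
 * Hypothetical "acquire once" helper: take the lock only if the caller
 * does not hold it yet. Returns true if this call took the lock, so the
 * caller knows whether it has to release it again.
 */
static bool toy_acquire_once(ToyContext *ctx)
{
    if (ctx->depth > 0) {
        return false;
    }
    toy_acquire(ctx);
    return true;
}

/* Models the precondition quoted above: lock held exactly once. */
static bool toy_can_drain(const ToyContext *ctx)
{
    return ctx->depth == 1;
}

/* Models an unref that may drain; returns whether the precondition held. */
static bool toy_unref_with_drain(ToyContext *ctx)
{
    bool took = toy_acquire_once(ctx);
    bool ok = toy_can_drain(ctx);

    if (took) {
        toy_release(ctx);
    }
    return ok;
}
```

In this model the precondition holds whether or not the caller already owns the lock, whereas an unconditional toy_acquire() on top of an existing hold pushes the depth to 2, which is the recursive-lock case Kevin describes.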
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0345f6f2d2..d3e49c9419 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1302,7 +1302,8 @@
 { 'command': 'block_resize',
   'data': { '*device': 'str',
             '*node-name': 'str',
-            'size': 'int' } }
+            'size': 'int' },
+  'coroutine': true }

 ##
 # @NewImageMode:
diff --git a/blockdev.c b/blockdev.c
index 7f2561081e..064989fc2d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2439,14 +2439,14 @@ BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
     return ret;
 }

-void qmp_block_resize(bool has_device, const char *device,
-                      bool has_node_name, const char *node_name,
-                      int64_t size, Error **errp)
+void coroutine_fn qmp_block_resize(bool has_device, const char *device,
+                                   bool has_node_name, const char *node_name,
+                                   int64_t size, Error **errp)
 {
     Error *local_err = NULL;
     BlockBackend *blk = NULL;
     BlockDriverState *bs;
-    AioContext *aio_context;
+    AioContext *old_ctx;

     bs = bdrv_lookup_bs(has_device ? device : NULL,
                         has_node_name ? node_name : NULL,
@@ -2456,8 +2456,7 @@ void qmp_block_resize(bool has_device, const char *device,
         return;
     }

-    aio_context = bdrv_get_aio_context(bs);
-    aio_context_acquire(aio_context);
+    old_ctx = bdrv_co_move_to_aio_context(bs);

     if (size < 0) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size");
@@ -2479,8 +2478,8 @@ void qmp_block_resize(bool has_device, const char *device,
     bdrv_drained_end(bs);

 out:
+    aio_co_reschedule_self(old_ctx);
     blk_unref(blk);
-    aio_context_release(aio_context);
 }

 void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 60f395c276..ac360b73f6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -76,6 +76,7 @@ ERST
         .params = "device size",
         .help = "resize a block image",
         .cmd = hmp_block_resize,
+        .coroutine = true,
     },

 SRST
block_resize performs some I/O that could potentially take quite some
time, so use it as an example for the new 'coroutine': true annotation
in the QAPI schema.

bdrv_truncate() requires that we're already in the right AioContext for
the BlockDriverState if called in coroutine context. So instead of just
taking the AioContext lock, move the QMP handler coroutine to the
context.

Call blk_unref() only after switching back because blk_unref() may only
be called in the main thread.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json |  3 ++-
 blockdev.c           | 13 ++++++-------
 hmp-commands.hx      |  1 +
 3 files changed, 9 insertions(+), 8 deletions(-)
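
The ordering described in the commit message (move to the node's context, do the I/O, switch back, then unref) can be sketched as a toy single-threaded model. All names here (ToyCtx, toy_move_to, toy_reschedule_self, toy_truncate, toy_unref, toy_block_resize) are hypothetical stand-ins, not QEMU functions. The real bdrv_co_move_to_aio_context() and aio_co_reschedule_self() yield the coroutine and resume it in another thread, which this model reduces to updating a "current context" pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy execution context; NOT QEMU's AioContext. */
typedef struct ToyCtx {
    const char *name;
} ToyCtx;

static ToyCtx toy_main_ctx = { "main" };
static ToyCtx toy_iothread_ctx = { "iothread0" };

/* Context the (single) toy coroutine is currently running in. */
static ToyCtx *toy_current = &toy_main_ctx;

/* Switch to the target context and return the one we came from. */
static ToyCtx *toy_move_to(ToyCtx *target)
{
    ToyCtx *old = toy_current;
    toy_current = target;
    return old;
}

/* Switch back to a previously saved context. */
static void toy_reschedule_self(ToyCtx *ctx)
{
    toy_current = ctx;
}

/* Models "must already be in the node's context": 0 on success. */
static int toy_truncate(ToyCtx *node_ctx)
{
    return toy_current == node_ctx ? 0 : -1;
}

/* Models "may only be called in the main thread": 0 on success. */
static int toy_unref(void)
{
    return toy_current == &toy_main_ctx ? 0 : -1;
}

/* Same ordering as the patched handler: move, work, move back, unref. */
static int toy_block_resize(ToyCtx *node_ctx)
{
    ToyCtx *old_ctx = toy_move_to(node_ctx);
    int ret = toy_truncate(node_ctx);

    toy_reschedule_self(old_ctx);
    if (toy_unref() < 0) {
        return -1;
    }
    return ret;
}
```

In the model, swapping the last two steps would make toy_unref() fail whenever the node lives in toy_iothread_ctx, which mirrors the reason given above for calling blk_unref() only after switching back.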