Message ID | 20201102153010.11979-1-peterx@redhat.com |
---|---|
Headers | show |
Series | migration: Two extra fixes | expand |
* Peter Xu (peterx@redhat.com) wrote: > The new migrate_send_rp_req_pages_pending() call should greatly improve > destination responsiveness because it will resync faulted address after > postcopy recovery. However it is also the 1st place to initiate the page > request from the main thread. > > One thing is overlooked on that migrate_send_rp_message_req_pages() is not > designed to be thread-safe. So if we wake the fault thread before syncing all > the faulted pages in the main thread, it means they can race. > > Postpone the wake up operation after the sync of faulted addresses. > > Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery") > Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> > Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/savevm.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/migration/savevm.c b/migration/savevm.c > index e8834991ec..5f937a2762 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -2069,12 +2069,9 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > /* > * This means source VM is ready to resume the postcopy migration. > - * It's time to switch state and release the fault thread to > - * continue service page faults. > */ > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER, > MIGRATION_STATUS_POSTCOPY_ACTIVE); > - qemu_sem_post(&mis->postcopy_pause_sem_fault); > > trace_loadvm_postcopy_handle_resume(); > > @@ -2095,6 +2092,14 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > */ > migrate_send_rp_req_pages_pending(mis); > > + /* > + * It's time to switch state and release the fault thread to continue > + * service page faults. Note that this should be explicitly after the > + * above call to migrate_send_rp_req_pages_pending(). In short: > + * migrate_send_rp_message_req_pages() is not thread safe, yet. > + */ > + qemu_sem_post(&mis->postcopy_pause_sem_fault); > + > return 0; > } > > -- > 2.26.2 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Peter Xu (peterx@redhat.com) wrote: > This should fix intermittent hang of migration-test due to the latest update to > postcopy recovery. > > Thanks, Queued > > Peter Xu (2): > migration: Unify reset of last_rb on destination node when recover > migration: Postpone the kick of the fault thread after recover > > migration/postcopy-ram.c | 2 -- > migration/savevm.c | 17 ++++++++++++++--- > 2 files changed, 14 insertions(+), 5 deletions(-) > > -- > 2.26.2 > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Peter Xu (peterx@redhat.com) wrote: > This should fix intermittent hang of migration-test due to the latest update to > postcopy recovery. > > Thanks, Queued > > Peter Xu (2): > migration: Unify reset of last_rb on destination node when recover > migration: Postpone the kick of the fault thread after recover > > migration/postcopy-ram.c | 2 -- > migration/savevm.c | 17 ++++++++++++++--- > 2 files changed, 14 insertions(+), 5 deletions(-) > > -- > 2.26.2 > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK