mbox series

[0/2] migration: Two extra fixes

Message ID 20201102153010.11979-1-peterx@redhat.com
Headers show
Series migration: Two extra fixes | expand

Message

Peter Xu Nov. 2, 2020, 3:30 p.m. UTC
This should fix intermittent hang of migration-test due to the latest update to
postcopy recovery.

Thanks,

Peter Xu (2):
  migration: Unify reset of last_rb on destination node when recover
  migration: Postpone the kick of the fault thread after recover

 migration/postcopy-ram.c |  2 --
 migration/savevm.c       | 17 ++++++++++++++---
 2 files changed, 14 insertions(+), 5 deletions(-)

-- 
2.26.2

Comments

Dr. David Alan Gilbert Nov. 2, 2020, 6:24 p.m. UTC | #1
* Peter Xu (peterx@redhat.com) wrote:
> The new migrate_send_rp_req_pages_pending() call should greatly improve

> destination responsiveness because it will resync faulted address after

> postcopy recovery.  However it is also the 1st place to initiate the page

> request from the main thread.

> 

> One thing is overlooked on that migrate_send_rp_message_req_pages() is not

> designed to be thread-safe.  So if we wake the fault thread before syncing all

> the faulted pages in the main thread, it means they can race.

> 

> Postpone the wake up operation after the sync of faulted addresses.

> 

> Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")

> Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>

> Signed-off-by: Peter Xu <peterx@redhat.com>


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> ---

>  migration/savevm.c | 11 ++++++++---

>  1 file changed, 8 insertions(+), 3 deletions(-)

> 

> diff --git a/migration/savevm.c b/migration/savevm.c

> index e8834991ec..5f937a2762 100644

> --- a/migration/savevm.c

> +++ b/migration/savevm.c

> @@ -2069,12 +2069,9 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)

>  

>      /*

>       * This means source VM is ready to resume the postcopy migration.

> -     * It's time to switch state and release the fault thread to

> -     * continue service page faults.

>       */

>      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,

>                        MIGRATION_STATUS_POSTCOPY_ACTIVE);

> -    qemu_sem_post(&mis->postcopy_pause_sem_fault);

>  

>      trace_loadvm_postcopy_handle_resume();

>  

> @@ -2095,6 +2092,14 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)

>       */

>      migrate_send_rp_req_pages_pending(mis);

>  

> +    /*

> +     * It's time to switch state and release the fault thread to continue

> +     * service page faults.  Note that this should be explicitly after the

> +     * above call to migrate_send_rp_req_pages_pending().  In short:

> +     * migrate_send_rp_message_req_pages() is not thread safe, yet.

> +     */

> +    qemu_sem_post(&mis->postcopy_pause_sem_fault);

> +

>      return 0;

>  }

>  

> -- 

> 2.26.2

> 

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert Nov. 2, 2020, 6:26 p.m. UTC | #2
* Peter Xu (peterx@redhat.com) wrote:
> This should fix intermittent hang of migration-test due to the latest update to

> postcopy recovery.

> 

> Thanks,


Queued

> 

> Peter Xu (2):

>   migration: Unify reset of last_rb on destination node when recover

>   migration: Postpone the kick of the fault thread after recover

> 

>  migration/postcopy-ram.c |  2 --

>  migration/savevm.c       | 17 ++++++++++++++---

>  2 files changed, 14 insertions(+), 5 deletions(-)

> 

> -- 

> 2.26.2

> 

> 

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert Nov. 2, 2020, 6:26 p.m. UTC | #3
* Peter Xu (peterx@redhat.com) wrote:
> This should fix intermittent hang of migration-test due to the latest update to

> postcopy recovery.

> 

> Thanks,


Queued

> 

> Peter Xu (2):

>   migration: Unify reset of last_rb on destination node when recover

>   migration: Postpone the kick of the fault thread after recover

> 

>  migration/postcopy-ram.c |  2 --

>  migration/savevm.c       | 17 ++++++++++++++---

>  2 files changed, 14 insertions(+), 5 deletions(-)

> 

> -- 

> 2.26.2

> 

> 

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK