[v3,10/11] xen: Update sched clock offset to avoid system instability in hibernation

Message ID 238e837b8d4e17925801c4e85de17bdfca4ddd00.1598042152.git.anchalag@amazon.com
State New
Series
  • [v3,01/11] xen/manage: keep track of the on-going suspend mode

Commit Message

Anchal Agarwal Aug. 21, 2020, 10:30 p.m.
Save and restore xen_sched_clock_offset in the syscore suspend/resume
callbacks during PM hibernation. Commit '867cefb4cb1012: ("xen: Fix x86
sched_clock() interface for xen")' fixed Xen guest time handling during
migration. A similar issue is seen during PM hibernation when the system
runs a CPU-intensive workload. After resume, pvclock resets the value to 0;
however, xen_sched_clock_offset is never updated, and the system becomes
unstable when resuming from hibernation under heavy CPU load. Because
xen_sched_clock_offset is not updated, the system does not see a monotonic
clock value, and the scheduler then concludes that CPU-hog tasks need more
CPU time, causing the system to freeze.

Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
---
 arch/x86/xen/suspend.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
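
A minimal user-space sketch of the offset arithmetic described above, for
illustration only. It assumes the scheduler clock is derived as "raw clock
read minus offset", which is how commit 867cefb4cb1012 describes the Xen
sched_clock() path. Every name prefixed with model_ is hypothetical and is
not a kernel function; the real xen_save_sched_clock_offset() and
xen_restore_sched_clock_offset() helpers are defined elsewhere in this
series and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the pvclock-backed raw counter (hypothetical model). */
static uint64_t raw_clock_ns;

/* Plays the role of xen_sched_clock_offset in this model. */
static uint64_t sched_clock_offset;

/* Scheduler clock value captured at suspend time (hypothetical). */
static uint64_t sched_clock_saved;

/* Scheduler clock as seen by the rest of the system: raw minus offset. */
static uint64_t model_sched_clock(void)
{
	return raw_clock_ns - sched_clock_offset;
}

/* At suspend: remember how far the scheduler clock has advanced. */
static void model_save_sched_clock_offset(void)
{
	sched_clock_saved = raw_clock_ns - sched_clock_offset;
}

/*
 * At resume: pick a new offset so that the scheduler clock continues
 * from the saved value, whatever the raw counter now reads. Unsigned
 * wraparound is intentional and cancels out in model_sched_clock().
 */
static void model_restore_sched_clock_offset(void)
{
	sched_clock_offset = raw_clock_ns - sched_clock_saved;
}

/*
 * Simulate one hibernation cycle and return the scheduler clock value
 * observed immediately after resume. The numbers are made up.
 */
static uint64_t model_hibernate_cycle(int fix_offset)
{
	raw_clock_ns = 5000;
	sched_clock_offset = 1000;

	raw_clock_ns = 9000;			/* system runs for a while */
	assert(model_sched_clock() == 8000);	/* value before suspend */

	if (fix_offset)
		model_save_sched_clock_offset();

	raw_clock_ns = 200;			/* raw clock restarted near 0 */

	if (fix_offset)
		model_restore_sched_clock_offset();

	return model_sched_clock();
}
```

In this model, model_hibernate_cycle(1) returns 8000, the value seen just
before suspend, so the clock stays monotonic. model_hibernate_cycle(0)
keeps the stale offset, and the unsigned subtraction wraps to a very large
value, which is the kind of discontinuity the commit message attributes the
freeze to.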

Comments

Boris Ostrovsky Sept. 13, 2020, 5:52 p.m. | #1
On 8/21/20 6:30 PM, Anchal Agarwal wrote:
> Save and restore xen_sched_clock_offset in the syscore suspend/resume
> callbacks during PM hibernation. Commit '867cefb4cb1012: ("xen: Fix x86
> sched_clock() interface for xen")' fixed Xen guest time handling during
> migration. A similar issue is seen during PM hibernation when the system
> runs a CPU-intensive workload. After resume, pvclock resets the value to 0;
> however, xen_sched_clock_offset is never updated, and the system becomes
> unstable when resuming from hibernation under heavy CPU load. Because
> xen_sched_clock_offset is not updated, the system does not see a monotonic
> clock value, and the scheduler then concludes that CPU-hog tasks need more
> CPU time, causing the system to freeze.


I don't think you need to explain why non-monotonic clocks are bad.
(In fact, the same applies to the commit message in patch 8.)


>
> Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
> ---
>  arch/x86/xen/suspend.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index b12db6966af6..a62e08a11681 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -98,8 +98,9 @@ static int xen_syscore_suspend(void)
>  		return 0;
>  
>  	gnttab_suspend();
> -
>  	xen_manage_runstate_time(-1);
> +	xen_save_sched_clock_offset();
> +
>  	xrfp.domid = DOMID_SELF;
>  	xrfp.gpfn = __pa(HYPERVISOR_shared_info) >> PAGE_SHIFT;
>  
> @@ -120,6 +121,12 @@ static void xen_syscore_resume(void)
>  	xen_hvm_map_shared_info();
>  
>  	pvclock_resume();
> +
> +	/*
> +	 * Restore xen_sched_clock_offset during resume to maintain
> +	 * monotonic clock value
> +	 */


I'd drop this comment, we know what the call does.


-boris


> +	xen_restore_sched_clock_offset();
>  	xen_manage_runstate_time(0);
>  	gnttab_resume();
>  }

Patch

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index b12db6966af6..a62e08a11681 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -98,8 +98,9 @@  static int xen_syscore_suspend(void)
 		return 0;
 
 	gnttab_suspend();
-
 	xen_manage_runstate_time(-1);
+	xen_save_sched_clock_offset();
+
 	xrfp.domid = DOMID_SELF;
 	xrfp.gpfn = __pa(HYPERVISOR_shared_info) >> PAGE_SHIFT;
 
@@ -120,6 +121,12 @@  static void xen_syscore_resume(void)
 	xen_hvm_map_shared_info();
 
 	pvclock_resume();
+
+	/*
+	 * Restore xen_sched_clock_offset during resume to maintain
+	 * monotonic clock value
+	 */
+	xen_restore_sched_clock_offset();
 	xen_manage_runstate_time(0);
 	gnttab_resume();
 }