Message ID | 1416939705-1272-1-git-send-email-peter.maydell@linaro.org |
---|---|
State | Accepted |
Commit | 490309fcfbed9fa1ed357541f609975016a34628 |
On Tue, Nov 25, 2014 at 06:21:45PM +0000, Peter Maydell wrote:
> In qemu_poll_ns(), when we convert an int64_t nanosecond timeout into
> a struct timespec, we may accidentally run into overflow problems if
> the timeout is very long. This happens because the tv_sec field is a
> time_t, which is signed, so we might end up setting it to a negative
> value by mistake. This will result in what was intended to be a
> near-infinite timeout turning into an instantaneous timeout, and we'll
> busy loop. Cap the maximum timeout at INT32_MAX seconds (about 68 years)
> to avoid this problem.
>
> This specifically manifested on ARM hosts as an extreme slowdown on
> guest shutdown (when the guest reprogrammed the PL031 RTC to not
> generate alarms using a very long timeout) but could happen on other
> hosts and guests too.
>
> Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> It's not quite clear why this only causes problems in some KVM
> configurations -- presumably in the others we complete the guest
> shutdown reasonably quickly without the busy-waiting QEMU thread
> interfering, but in some setups, notably on TC2 host, we go into
> an extreme slowdown printing out the final bits of the guest shutdown
> to its serial port. Given that (and given that I think this is fairly
> safe) I'd like to get this into 2.2 if possible...
>

It's visibly a cleaner shutdown on my cubieboard2 (ubuntu kernel config) than without this patch.

I've been running a VM on TC2 in a loop with shutdown for a couple of hours and it just works now, so this patch definitely solves the issue I was seeing.

I'm wondering if the timespec struct field is an unsigned long and that's why we weren't seeing the overflow on arm64?

In any case, huge thanks for chasing this down.

-Christoffer

On 25 November 2014 at 20:29, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> I'm wondering if the timespec struct field is an unsigned long and
> that's why we weren't seeing the overflow on arm64?

It's a time_t, and they're signed, but I imagine on arm64 time_t is 64 bits.

-- PMM
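
The guess above about time_t can be checked directly on a given host. The following is a minimal sketch, not taken from the thread; the helper names `time_t_is_signed` and `time_t_bits` are hypothetical and exist only for this illustration. It assumes time_t is an integer type, as it is on Linux.

```c
#include <limits.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: returns 1 if time_t is a signed type on this host.
 * Converting -1 to an unsigned type yields its maximum value, which is
 * not less than zero, so the comparison distinguishes the two cases. */
static int time_t_is_signed(void)
{
    return (time_t)-1 < (time_t)0;
}

/* Hypothetical helper: width of time_t in bits on this host. */
static size_t time_t_bits(void)
{
    return sizeof(time_t) * CHAR_BIT;
}
```

On a host where `time_t_bits()` reports 32, a seconds count above INT32_MAX cannot be stored in tv_sec; where it reports 64, the same value fits, which would be consistent with the overflow not showing up there.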

On 26 November 2014 at 03:09, Fam Zheng <famz@redhat.com> wrote:
> On Tue, 11/25 18:21, Peter Maydell wrote:
>> In qemu_poll_ns(), when we convert an int64_t nanosecond timeout into
>> a struct timespec, we may accidentally run into overflow problems if
>> the timeout is very long. This happens because the tv_sec field is a
>> time_t, which is signed, so we might end up setting it to a negative
>> value by mistake. This will result in what was intended to be a
>> near-infinite timeout turning into an instantaneous timeout, and we'll
>> busy loop. Cap the maximum timeout at INT32_MAX seconds (about 68 years)
>> to avoid this problem.
> Reviewed-by: Fam Zheng <famz@redhat.com>

Thanks. Applied to master (with a cc:stable).

-- PMM

diff --git a/qemu-timer.c b/qemu-timer.c
index 00a5d35..c77de64 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -314,7 +314,14 @@ int qemu_poll_ns(GPollFD *fds, guint nfds, int64_t timeout)
         return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
     } else {
         struct timespec ts;
-        ts.tv_sec = timeout / 1000000000LL;
+        int64_t tvsec = timeout / 1000000000LL;
+        /* Avoid possibly overflowing and specifying a negative number of
+         * seconds, which would turn a very long timeout into a busy-wait.
+         */
+        if (tvsec > (int64_t)INT32_MAX) {
+            tvsec = INT32_MAX;
+        }
+        ts.tv_sec = tvsec;
         ts.tv_nsec = timeout % 1000000000LL;
         return ppoll((struct pollfd *)fds, nfds, &ts, NULL);
     }

In qemu_poll_ns(), when we convert an int64_t nanosecond timeout into
a struct timespec, we may accidentally run into overflow problems if
the timeout is very long. This happens because the tv_sec field is a
time_t, which is signed, so we might end up setting it to a negative
value by mistake. This will result in what was intended to be a
near-infinite timeout turning into an instantaneous timeout, and we'll
busy loop. Cap the maximum timeout at INT32_MAX seconds (about 68 years)
to avoid this problem.

This specifically manifested on ARM hosts as an extreme slowdown on
guest shutdown (when the guest reprogrammed the PL031 RTC to not
generate alarms using a very long timeout) but could happen on other
hosts and guests too.

Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
It's not quite clear why this only causes problems in some KVM
configurations -- presumably in the others we complete the guest
shutdown reasonably quickly without the busy-waiting QEMU thread
interfering, but in some setups, notably on TC2 host, we go into
an extreme slowdown printing out the final bits of the guest shutdown
to its serial port. Given that (and given that I think this is fairly
safe) I'd like to get this into 2.2 if possible...

 qemu-timer.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
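
For readers who want to try the conversion outside QEMU, here is a minimal standalone sketch of the clamping described above. The helper name `timeout_ns_to_timespec` and the `NS_PER_SEC` macro are hypothetical and are not part of qemu-timer.c. The sketch assumes a non-negative timeout, since the negative (infinite) case is handled separately in qemu_poll_ns().

```c
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000LL

/* Hypothetical helper, not QEMU code: split a non-negative nanosecond
 * timeout into seconds and nanoseconds, capping the seconds at INT32_MAX
 * so the value always fits in a 32-bit signed time_t. */
static struct timespec timeout_ns_to_timespec(int64_t timeout_ns)
{
    struct timespec ts;
    int64_t tvsec = timeout_ns / NS_PER_SEC;

    /* Without this cap, a large tvsec could become negative when
     * stored in a 32-bit time_t. */
    if (tvsec > (int64_t)INT32_MAX) {
        tvsec = INT32_MAX;
    }
    ts.tv_sec = (time_t)tvsec;
    ts.tv_nsec = (long)(timeout_ns % NS_PER_SEC);
    return ts;
}
```

Passing INT64_MAX, the largest possible timeout, gives 9223372036 seconds before the cap, which is more than INT32_MAX (2147483647), so the helper returns tv_sec equal to INT32_MAX. Shorter timeouts are returned unchanged.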