From: Peter Maydell
To: qemu-devel@nongnu.org
Cc: patches@linaro.org, Paolo Bonzini, Jaap Crezee
Subject: [PATCH] cpus.c: Fix race condition in cpu_stop_current()
Date: Fri, 7 Dec 2018 15:59:11 +0000
Message-Id: <20181207155911.12710-1-peter.maydell@linaro.org>

We use cpu_stop_current() to ensure the current CPU has stopped from
places like qemu_system_reset_request(). Unfortunately its current
implementation has a race. It calls qemu_cpu_stop(), which sets
cpu->stopped to true even though the CPU hasn't actually stopped yet.
The main thread will look at the flags set by
qemu_system_reset_request() and call pause_all_vcpus(). Because
pause_all_vcpus() only waits until every CPU has cpu->stopped set, it
can return (and the system reset operation can start) before the vcpu
thread has got back to its top-level loop.

Instead, just set cpu->stop and call cpu_exit(). This causes the vcpu
to exit back to its top-level loop, and there (as part of the
wait_io_event code) it will call qemu_cpu_stop() itself.

This fixes bugs where the reset request appeared to be ignored, or the
CPU misbehaved because the reset operation started to change vcpu
state while the vcpu thread was still using it.
Signed-off-by: Peter Maydell
Tested-by: Jaap Crezee
Reviewed-by: Emilio G. Cota
---
We discussed this a little while back:
https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html
and Jaap reported a bug which I suspect is the same thing:
https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html

Annoyingly, I have lost the test case that demonstrated this race,
but I analysed it at the time and this should definitely fix it.

I have opted not to try to address any of the other possible cleanups
here (e.g. I suspect vm_stop() has a similar potential race if called
from a vcpu thread), since it gets pretty tangled.

Jaap: could you test whether this patch fixes the issue you were
seeing, please?
---
 cpus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 0ddeeefc14f..b09b7027126 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu)
 void cpu_stop_current(void)
 {
     if (current_cpu) {
-        qemu_cpu_stop(current_cpu, true);
+        current_cpu->stop = true;
+        cpu_exit(current_cpu);
     }
 }
 
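The stop handshake described in the commit message can be illustrated with a small standalone pthreads model. This is only a sketch under stated assumptions, not QEMU code: VCpu, request_stop_locked(), run_guest(), vcpu_thread() and run_reset_model() are hypothetical names, a single mutex stands in for the big QEMU lock, and an atomic flag stands in for cpu_exit(). What it shows is the patched protocol: the vCPU thread only requests a stop while executing, and cpu->stopped is set by the vCPU thread itself once it is back at its top-level loop, so the main thread's pause cannot complete while the vCPU is still using its state.

```c
#include <assert.h>     /* for checks made by callers of run_reset_model() */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified vCPU model; not QEMU's CPUState. */
typedef struct {
    pthread_mutex_t lock;     /* stands in for the big QEMU lock */
    pthread_cond_t cond;      /* broadcast whenever a flag below changes */
    bool stop;                /* request: stop at the top-level loop */
    bool stopped;             /* ack: vCPU thread is parked at top level */
    bool reset_requested;     /* set by the model's "reset request" */
    bool quit;                /* tells the vCPU thread to exit */
    atomic_bool exit_request; /* analogue of cpu_exit(): leave exec loop */
    atomic_bool in_exec;      /* true while the vCPU is using its state */
} VCpu;

/* Model of the patched cpu_stop_current(): request, don't claim stopped.
 * (The pre-patch code also set stopped = true here, which let the main
 * thread carry on while run_guest() was still executing.) */
static void request_stop_locked(VCpu *cpu)
{
    cpu->stop = true;
    atomic_store(&cpu->exit_request, true);
}

/* Simulated guest execution; issues a reset request part way through. */
static void run_guest(VCpu *cpu)
{
    atomic_store(&cpu->in_exec, true);
    for (int insn = 0; !atomic_load(&cpu->exit_request); insn++) {
        if (insn == 100) {              /* guest writes a reset register */
            pthread_mutex_lock(&cpu->lock);
            cpu->reset_requested = true;
            request_stop_locked(cpu);
            pthread_cond_broadcast(&cpu->cond);
            pthread_mutex_unlock(&cpu->lock);
        }
    }
    atomic_store(&cpu->exit_request, false);
    atomic_store(&cpu->in_exec, false); /* back at the top-level loop */
}

static void *vcpu_thread(void *opaque)
{
    VCpu *cpu = opaque;
    pthread_mutex_lock(&cpu->lock);
    while (!cpu->quit) {
        if (cpu->stop) {
            /* Only here, outside run_guest(), is the stop acknowledged. */
            cpu->stop = false;
            cpu->stopped = true;
            pthread_cond_broadcast(&cpu->cond);
            while (cpu->stopped && !cpu->quit) {
                pthread_cond_wait(&cpu->cond, &cpu->lock);
            }
            continue;
        }
        pthread_mutex_unlock(&cpu->lock);
        run_guest(cpu);
        pthread_mutex_lock(&cpu->lock);
    }
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

/* Main-thread side: wait for the reset request, pause, then shut down.
 * Returns true if the vCPU was outside its exec loop when pause finished. */
static bool run_reset_model(void)
{
    VCpu cpu = { .stop = false };       /* other members zero-initialised */
    pthread_mutex_init(&cpu.lock, NULL);
    pthread_cond_init(&cpu.cond, NULL);
    atomic_init(&cpu.exit_request, false);
    atomic_init(&cpu.in_exec, false);

    pthread_t tid;
    pthread_create(&tid, NULL, vcpu_thread, &cpu);

    pthread_mutex_lock(&cpu.lock);
    while (!cpu.reset_requested) {      /* main loop notices the request */
        pthread_cond_wait(&cpu.cond, &cpu.lock);
    }
    while (!cpu.stopped) {              /* roughly what pause_all_vcpus() does */
        pthread_cond_wait(&cpu.cond, &cpu.lock);
    }
    bool safe = !atomic_load(&cpu.in_exec); /* a reset could run here */
    cpu.quit = true;
    cpu.stopped = false;
    pthread_cond_broadcast(&cpu.cond);
    pthread_mutex_unlock(&cpu.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&cpu.cond);
    pthread_mutex_destroy(&cpu.lock);
    return safe;
}
```

Under this model run_reset_model() should always return true: the vCPU thread stores in_exec = false before it takes the mutex to set stopped, and the main thread only reads in_exec after acquiring that same mutex and seeing stopped, so the mutex hand-off orders the two. Build with -pthread.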