From patchwork Tue Jan 23 08:54:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Pavel Dovgalyuk X-Patchwork-Id: 125482 Delivered-To: patch@linaro.org Received: by 10.46.66.141 with SMTP id h13csp1632534ljf; Tue, 23 Jan 2018 01:02:48 -0800 (PST) X-Google-Smtp-Source: AH8x227eODsyjFhr/w8dIKOTxU3/vRsygn0UAHC8Cx3B0xzMPHXqqexp62NaBSNaj7Vhkgr6sQce X-Received: by 10.37.106.65 with SMTP id f62mr1650303ybc.460.1516698168628; Tue, 23 Jan 2018 01:02:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516698168; cv=none; d=google.com; s=arc-20160816; b=qpG1T7QBQrcML9/9vliniBCN91nMjJ1lBF5sGRzODPTgBmxnxDn6uDgfXalsd5tvpe lExUO+jMg9FElDzas6OYVGkjC9LrdJa/u2kTTjXIHmyoXxwseCwZVFZtSBGO9TpcRgxr V1GVYPDvZReQSLf7GwuILb+p0DvpDnURlE+Ea3WE842+eTT2pg6Zmh+Q1YjnjI776B11 BfANk3/L38tQVSmTGMtJTZFfxdgacwUkPZ8heEqQHqbvdCstpK2qAuqIl/MdzdFn1oOt B3eS8OhEdze0sychg+GphbNrxUb5OVx6H0VaPr1cnW4wlrSXfmSc188nDZZyMGcHSnPw kFFA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject :content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:to:from:arc-authentication-results; bh=G5XmA5OZXHsA0+tihtYk3aaESR9mz836Ho0ASTq6Eoc=; b=snbBseHRvAavzDtaGtytC6lkVHNMlNtdDfQXB8C930QM6r7EL9MB3J+V0obG4sY0FZ Rtygoh8cOp55OyMZ3p6gBPMp9AdUPjxtA/3YewUNZgFFsWIuBxomkpDlfmyX6khP1pZ3 OIhPJHB/hX9cjiDaEttNAYxUSEc+BXMfhbmfhM53sjWNGab3p2jlifDk21lwKgpfZtkc n4WjuFlJ0KhQrmpP+cquXKTVqAqKOw0lLBtTRak5l+/lqKkCtdv20D0UEdLhOyDaGU15 VgjLOyEWqVBZTD+7/7pGHqq8gWrlrRGpaLSdUInlaP+LLnNldS2R95aDhNxWyEjeKS3l wejA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id t9si982715ybk.393.2018.01.23.01.02.48 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 23 Jan 2018 01:02:48 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org Received: from localhost ([::1]:45972 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eduTf-0000aY-UC for patch@linaro.org; Tue, 23 Jan 2018 04:02:48 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40867) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eduM5-0002FF-8v for qemu-devel@nongnu.org; Tue, 23 Jan 2018 03:54:59 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eduM3-0000OS-1v for qemu-devel@nongnu.org; Tue, 23 Jan 2018 03:54:57 -0500 Received: from mail.ispras.ru ([83.149.199.45]:57350) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eduM2-0000No-HH for qemu-devel@nongnu.org; Tue, 23 Jan 2018 03:54:54 -0500 Received: from [127.0.1.1] (unknown [85.142.117.226]) by mail.ispras.ru (Postfix) with ESMTPSA id 88D4854006A; Tue, 23 Jan 2018 11:54:53 +0300 (MSK) From: Pavel Dovgalyuk To: qemu-devel@nongnu.org Date: Tue, 23 Jan 2018 11:54:54 +0300 Message-ID: <20180123085454.3419.80253.stgit@pasha-VirtualBox> In-Reply-To: <20180123085319.3419.97865.stgit@pasha-VirtualBox> References: <20180123085319.3419.97865.stgit@pasha-VirtualBox> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 83.149.199.45 Subject: [Qemu-devel] [RFC PATCH v5 17/24] replay: push replay_mutex_lock up the call tree X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, peter.maydell@linaro.org, pavel.dovgaluk@ispras.ru, mst@redhat.com, jasowang@redhat.com, quintela@redhat.com, zuban32s@gmail.com, maria.klimushenkova@ispras.ru, dovgaluk@ispras.ru, kraxel@redhat.com, boost.lists@gmail.com, pbonzini@redhat.com, alex.bennee@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" From: Alex Bennée Now instead of using the replay_lock to guard the output of the log we now use it to protect the whole execution section. This replaces what the BQL used to do when it was held during TCG execution. We also introduce some rules for locking order - mainly that you cannot take the replay_mutex while holding the BQL. This leads to some slight sophistry during start-up and extending the replay_mutex_destroy function to unlock the mutex without checking for the BQL condition so it can be cleanly dropped in the non-replay case. Signed-off-by: Alex Bennée Tested-by: Pavel Dovgalyuk -- v2: updated replay_mutex_lock/unlock functions as suggested by Paolo Bonzini updated docs --- cpus.c | 16 ++++++++++++++++ docs/replay.txt | 22 ++++++++++++++++++++++ include/sysemu/replay.h | 2 ++ replay/replay-char.c | 21 ++++++++------------- replay/replay-events.c | 18 +++++------------- replay/replay-internal.c | 33 +++++++++++++++++++++++---------- replay/replay-time.c | 10 +++++----- replay/replay.c | 38 ++++++++++++++++++-------------------- util/main-loop.c | 17 +++++++++++++---- vl.c | 2 ++ 10 files changed, 114 insertions(+), 65 deletions(-) diff --git a/cpus.c b/cpus.c index 577c764..a1f8808 100644 --- a/cpus.c +++ b/cpus.c @@ -1309,6 +1309,8 @@ static void prepare_icount_for_run(CPUState *cpu) insns_left = MIN(0xffff, cpu->icount_budget); cpu->icount_decr.u16.low = insns_left; cpu->icount_extra = cpu->icount_budget - insns_left; + + replay_mutex_lock(); } } @@ -1324,6 +1326,8 @@ static void process_icount_data(CPUState *cpu) cpu->icount_budget = 0; replay_account_executed_instructions(); + + replay_mutex_unlock(); } } @@ -1410,6 +1414,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg) cpu->exit_request = 1; while (1) { + replay_mutex_lock(); qemu_mutex_lock_iothread(); /* Account partial waits to QEMU_CLOCK_VIRTUAL. */ @@ -1422,6 +1427,8 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg) qemu_mutex_unlock_iothread(); + replay_mutex_unlock(); + if (!cpu) { cpu = first_cpu; } @@ -1727,12 +1734,21 @@ void pause_all_vcpus(void) } } + /* We need to drop the replay_lock so any vCPU threads woken up + * can finish their replay tasks + */ + replay_mutex_unlock(); + while (!all_vcpus_paused()) { qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex); CPU_FOREACH(cpu) { qemu_cpu_kick(cpu); } } + + qemu_mutex_unlock_iothread(); + replay_mutex_lock(); + qemu_mutex_lock_iothread(); } void cpu_resume(CPUState *cpu) diff --git a/docs/replay.txt b/docs/replay.txt index c52407f..959633e 100644 --- a/docs/replay.txt +++ b/docs/replay.txt @@ -49,6 +49,28 @@ Modifications of qemu include: * recording/replaying user input (mouse and keyboard) * adding internal checkpoints for cpu and io synchronization +Locking and thread synchronisation +---------------------------------- + +Previously the synchronisation of the main thread and the vCPU thread +was ensured by the holding of the BQL. However the trend has been to +reduce the time the BQL was held across the system including under TCG +system emulation. As it is important that batches of events are kept +in sequence (e.g. expiring timers and checkpoints in the main thread +while instruction checkpoints are written by the vCPU thread) we need +another lock to keep things in lock-step. This role is now handled by +the replay_mutex_lock. It used to be held only for each event being +written but now it is held for a whole execution period. This results +in a deterministic ping-pong between the two main threads. + +As the BQL is now a finer grained lock than the replay_lock it is almost +certainly a bug, and a source of deadlocks, to take the +replay_mutex_lock while the BQL is held. This is enforced by an assert. +While the unlocks are usually in the reverse order, this is not +necessary; you can drop the replay_lock while holding the BQL, without +doing a more complicated unlock_iothread/replay_unlock/lock_iothread +sequence. + Non-deterministic events ------------------------ diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h index 9973849..d026b28 100644 --- a/include/sysemu/replay.h +++ b/include/sysemu/replay.h @@ -63,6 +63,8 @@ bool replay_mutex_locked(void); /* Replay process control functions */ +/*! Enables and take replay locks (even if we don't use it) */ +void replay_init_locks(void); /*! Enables recording or saving event log with specified parameters */ void replay_configure(struct QemuOpts *opts); /*! Initializes timers used for snapshotting and enables events recording */ diff --git a/replay/replay-char.c b/replay/replay-char.c index cbf7c04..736cc8c 100755 --- a/replay/replay-char.c +++ b/replay/replay-char.c @@ -96,25 +96,24 @@ void *replay_event_char_read_load(void) void replay_char_write_event_save(int res, int offset) { + g_assert(replay_mutex_locked()); + replay_save_instructions(); - replay_mutex_lock(); replay_put_event(EVENT_CHAR_WRITE); replay_put_dword(res); replay_put_dword(offset); - replay_mutex_unlock(); } void replay_char_write_event_load(int *res, int *offset) { + g_assert(replay_mutex_locked()); + replay_account_executed_instructions(); - replay_mutex_lock(); if (replay_next_event_is(EVENT_CHAR_WRITE)) { *res = replay_get_dword(); *offset = replay_get_dword(); replay_finish_event(); - replay_mutex_unlock(); } else { - replay_mutex_unlock(); error_report("Missing character write event in the replay log"); exit(1); } @@ -122,23 +121,21 @@ void replay_char_write_event_load(int *res, int *offset) int replay_char_read_all_load(uint8_t *buf) { - replay_mutex_lock(); + g_assert(replay_mutex_locked()); + if (replay_next_event_is(EVENT_CHAR_READ_ALL)) { size_t size; int res; replay_get_array(buf, &size); replay_finish_event(); - replay_mutex_unlock(); res = (int)size; assert(res >= 0); return res; } else if (replay_next_event_is(EVENT_CHAR_READ_ALL_ERROR)) { int res = replay_get_dword(); replay_finish_event(); - replay_mutex_unlock(); return res; } else { - replay_mutex_unlock(); error_report("Missing character read all event in the replay log"); exit(1); } @@ -146,19 +143,17 @@ int replay_char_read_all_load(uint8_t *buf) void replay_char_read_all_save_error(int res) { + g_assert(replay_mutex_locked()); assert(res < 0); replay_save_instructions(); - replay_mutex_lock(); replay_put_event(EVENT_CHAR_READ_ALL_ERROR); replay_put_dword(res); - replay_mutex_unlock(); } void replay_char_read_all_save_buf(uint8_t *buf, int offset) { + g_assert(replay_mutex_locked()); replay_save_instructions(); - replay_mutex_lock(); replay_put_event(EVENT_CHAR_READ_ALL); replay_put_array(buf, offset); - replay_mutex_unlock(); } diff --git a/replay/replay-events.c b/replay/replay-events.c index e858254..a941efb 100644 --- a/replay/replay-events.c +++ b/replay/replay-events.c @@ -79,16 +79,14 @@ bool replay_has_events(void) void replay_flush_events(void) { - replay_mutex_lock(); + g_assert(replay_mutex_locked()); + while (!QTAILQ_EMPTY(&events_list)) { Event *event = QTAILQ_FIRST(&events_list); - replay_mutex_unlock(); replay_run_event(event); - replay_mutex_lock(); QTAILQ_REMOVE(&events_list, event, events); g_free(event); } - replay_mutex_unlock(); } void replay_disable_events(void) @@ -102,14 +100,14 @@ void replay_disable_events(void) void replay_clear_events(void) { - replay_mutex_lock(); + g_assert(replay_mutex_locked()); + while (!QTAILQ_EMPTY(&events_list)) { Event *event = QTAILQ_FIRST(&events_list); QTAILQ_REMOVE(&events_list, event, events); g_free(event); } - replay_mutex_unlock(); } /*! Adds specified async event to the queue */ @@ -136,9 +134,8 @@ void replay_add_event(ReplayAsyncEventKind event_kind, event->opaque2 = opaque2; event->id = id; - replay_mutex_lock(); + g_assert(replay_mutex_locked()); QTAILQ_INSERT_TAIL(&events_list, event, events); - replay_mutex_unlock(); } void replay_bh_schedule_event(QEMUBH *bh) @@ -210,10 +207,7 @@ void replay_save_events(int checkpoint) while (!QTAILQ_EMPTY(&events_list)) { Event *event = QTAILQ_FIRST(&events_list); replay_save_event(event, checkpoint); - - replay_mutex_unlock(); replay_run_event(event); - replay_mutex_lock(); QTAILQ_REMOVE(&events_list, event, events); g_free(event); } @@ -299,9 +293,7 @@ void replay_read_events(int checkpoint) } replay_finish_event(); read_event_kind = -1; - replay_mutex_unlock(); replay_run_event(event); - replay_mutex_lock(); g_free(event); } diff --git a/replay/replay-internal.c b/replay/replay-internal.c index a9a6a64..a1a7686 100644 --- a/replay/replay-internal.c +++ b/replay/replay-internal.c @@ -174,30 +174,43 @@ static __thread bool replay_locked; void replay_mutex_init(void) { qemu_mutex_init(&lock); + /* Hold the mutex while we start-up */ + qemu_mutex_lock(&lock); + replay_locked = true; } -void replay_mutex_destroy(void) +bool replay_mutex_locked(void) { - qemu_mutex_destroy(&lock); + return replay_locked; } -bool replay_mutex_locked(void) +void replay_mutex_destroy(void) { - return replay_locked; + if (replay_mutex_locked()) { + qemu_mutex_unlock(&lock); + } + qemu_mutex_destroy(&lock); } +/* Ordering constraints, replay_lock must be taken before BQL */ void replay_mutex_lock(void) { - g_assert(!replay_mutex_locked()); - qemu_mutex_lock(&lock); - replay_locked = true; + if (replay_mode != REPLAY_MODE_NONE) { + g_assert(!qemu_mutex_iothread_locked()); + g_assert(!replay_mutex_locked()); + qemu_mutex_lock(&lock); + replay_locked = true; + } } +/* BQL can't be held when releasing the replay_lock */ void replay_mutex_unlock(void) { - g_assert(replay_mutex_locked()); - replay_locked = false; - qemu_mutex_unlock(&lock); + if (replay_mode != REPLAY_MODE_NONE) { + g_assert(replay_mutex_locked()); + replay_locked = false; + qemu_mutex_unlock(&lock); + } } /*! Saves cached instructions. */ diff --git a/replay/replay-time.c b/replay/replay-time.c index f70382a..6a7565e 100644 --- a/replay/replay-time.c +++ b/replay/replay-time.c @@ -17,13 +17,13 @@ int64_t replay_save_clock(ReplayClockKind kind, int64_t clock) { - replay_save_instructions(); if (replay_file) { - replay_mutex_lock(); + g_assert(replay_mutex_locked()); + + replay_save_instructions(); replay_put_event(EVENT_CLOCK + kind); replay_put_qword(clock); - replay_mutex_unlock(); } return clock; @@ -46,16 +46,16 @@ void replay_read_next_clock(ReplayClockKind kind) /*! Reads next clock event from the input. */ int64_t replay_read_clock(ReplayClockKind kind) { + g_assert(replay_file && replay_mutex_locked()); + replay_account_executed_instructions(); if (replay_file) { int64_t ret; - replay_mutex_lock(); if (replay_next_event_is(EVENT_CLOCK + kind)) { replay_read_next_clock(kind); } ret = replay_state.cached_clock[kind]; - replay_mutex_unlock(); return ret; } diff --git a/replay/replay.c b/replay/replay.c index 4f24498..a3ab3bb 100644 --- a/replay/replay.c +++ b/replay/replay.c @@ -81,7 +81,7 @@ int replay_get_instructions(void) void replay_account_executed_instructions(void) { if (replay_mode == REPLAY_MODE_PLAY) { - replay_mutex_lock(); + g_assert(replay_mutex_locked()); if (replay_state.instructions_count > 0) { int count = (int)(replay_get_current_step() - replay_state.current_step); @@ -100,24 +100,22 @@ void replay_account_executed_instructions(void) qemu_notify_event(); } } - replay_mutex_unlock(); } } bool replay_exception(void) { + if (replay_mode == REPLAY_MODE_RECORD) { + g_assert(replay_mutex_locked()); replay_save_instructions(); - replay_mutex_lock(); replay_put_event(EVENT_EXCEPTION); - replay_mutex_unlock(); return true; } else if (replay_mode == REPLAY_MODE_PLAY) { + g_assert(replay_mutex_locked()); bool res = replay_has_exception(); if (res) { - replay_mutex_lock(); replay_finish_event(); - replay_mutex_unlock(); } return res; } @@ -129,10 +127,9 @@ bool replay_has_exception(void) { bool res = false; if (replay_mode == REPLAY_MODE_PLAY) { + g_assert(replay_mutex_locked()); replay_account_executed_instructions(); - replay_mutex_lock(); res = replay_next_event_is(EVENT_EXCEPTION); - replay_mutex_unlock(); } return res; @@ -141,17 +138,15 @@ bool replay_has_exception(void) bool replay_interrupt(void) { if (replay_mode == REPLAY_MODE_RECORD) { + g_assert(replay_mutex_locked()); replay_save_instructions(); - replay_mutex_lock(); replay_put_event(EVENT_INTERRUPT); - replay_mutex_unlock(); return true; } else if (replay_mode == REPLAY_MODE_PLAY) { + g_assert(replay_mutex_locked()); bool res = replay_has_interrupt(); if (res) { - replay_mutex_lock(); replay_finish_event(); - replay_mutex_unlock(); } return res; } @@ -163,10 +158,9 @@ bool replay_has_interrupt(void) { bool res = false; if (replay_mode == REPLAY_MODE_PLAY) { + g_assert(replay_mutex_locked()); replay_account_executed_instructions(); - replay_mutex_lock(); res = replay_next_event_is(EVENT_INTERRUPT); - replay_mutex_unlock(); } return res; } @@ -174,9 +168,8 @@ bool replay_has_interrupt(void) void replay_shutdown_request(ShutdownCause cause) { if (replay_mode == REPLAY_MODE_RECORD) { - replay_mutex_lock(); + g_assert(replay_mutex_locked()); replay_put_event(EVENT_SHUTDOWN + cause); - replay_mutex_unlock(); } } @@ -190,9 +183,9 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint) return true; } - replay_mutex_lock(); if (replay_mode == REPLAY_MODE_PLAY) { + g_assert(replay_mutex_locked()); if (replay_next_event_is(EVENT_CHECKPOINT + checkpoint)) { replay_finish_event(); } else if (replay_state.data_kind != EVENT_ASYNC) { @@ -205,15 +198,20 @@ bool replay_checkpoint(ReplayCheckpoint checkpoint) checkpoint were processed */ res = replay_state.data_kind != EVENT_ASYNC; } else if (replay_mode == REPLAY_MODE_RECORD) { + g_assert(replay_mutex_locked()); replay_put_event(EVENT_CHECKPOINT + checkpoint); replay_save_events(checkpoint); res = true; } out: - replay_mutex_unlock(); return res; } +void replay_init_locks(void) +{ + replay_mutex_init(); +} + static void replay_enable(const char *fname, int mode) { const char *fmode = NULL; @@ -233,8 +231,6 @@ static void replay_enable(const char *fname, int mode) atexit(replay_finish); - replay_mutex_init(); - replay_file = fopen(fname, fmode); if (replay_file == NULL) { fprintf(stderr, "Replay: open %s: %s\n", fname, strerror(errno)); @@ -274,6 +270,8 @@ void replay_configure(QemuOpts *opts) Location loc; if (!opts) { + /* we no longer need this lock */ + replay_mutex_destroy(); return; } diff --git a/util/main-loop.c b/util/main-loop.c index 7558eb5..1e928b3 100644 --- a/util/main-loop.c +++ b/util/main-loop.c @@ -29,6 +29,7 @@ #include "qemu/sockets.h" // struct in_addr needed for libslirp.h #include "sysemu/qtest.h" #include "sysemu/cpus.h" +#include "sysemu/replay.h" #include "slirp/libslirp.h" #include "qemu/main-loop.h" #include "block/aio.h" @@ -245,18 +246,21 @@ static int os_host_main_loop_wait(int64_t timeout) timeout = SCALE_MS; } + if (timeout) { spin_counter = 0; - qemu_mutex_unlock_iothread(); } else { spin_counter++; } + qemu_mutex_unlock_iothread(); + + replay_mutex_unlock(); ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout); - if (timeout) { - qemu_mutex_lock_iothread(); - } + replay_mutex_lock(); + + qemu_mutex_lock_iothread(); glib_pollfds_poll(); @@ -463,8 +467,13 @@ static int os_host_main_loop_wait(int64_t timeout) poll_timeout_ns = qemu_soonest_timeout(poll_timeout_ns, timeout); qemu_mutex_unlock_iothread(); + + replay_mutex_unlock(); + g_poll_ret = qemu_poll_ns(poll_fds, n_poll_fds + w->num, poll_timeout_ns); + replay_mutex_lock(); + qemu_mutex_lock_iothread(); if (g_poll_ret > 0) { for (i = 0; i < w->num; i++) { diff --git a/vl.c b/vl.c index 1d5ea70..287704c 100644 --- a/vl.c +++ b/vl.c @@ -3091,6 +3091,8 @@ int main(int argc, char **argv, char **envp) qemu_init_cpu_list(); qemu_init_cpu_loop(); + + replay_init_locks(); qemu_mutex_lock_iothread(); atexit(qemu_run_exit_notifiers);