From patchwork Thu May 22 23:52:15 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 891815 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C4D82D3235 for ; Thu, 22 May 2025 23:52:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957965; cv=none; b=fz1qoi0QOkJM6qxcBO9OVpNFw53PoJXrmqmqUx6Ym4fA/J3kvXtoJP46dtQd0TJT/LzI6Fz390UZFkNOv/7cL91OUExhok2+krc2Oj2fx+Kme3hNHrx0ttuY1ULqJzEMlUGlr0CvMDQu84BTWrtJ7Fl49mrWsyvv+6+2JJ4WWD8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957965; c=relaxed/simple; bh=JpleTdHHK7C1xQPDlVA5bM4aXnpZtQxgRshM++nWF7s=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NtzSvLXLRskL58wcgEVt0UPhTtU2fvYysC/YnYUiMOXfMh37pWI3FvmlhuGStE4unr4lfHQFYuNxF/MNRfxtAlX8uwm0Rx3nmo69awwnF+x7TqkPuv19kxUeGr3hymFCapiuOxcDQMzrCpTTDqef6KmdQdHDFVunEc9FsahlvFM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=2PcJ5KCd; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="2PcJ5KCd" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-231d13ac4d4so116209065ad.3 for ; Thu, 22 May 2025 16:52:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957963; x=1748562763; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=LLDErP7lwjTldw1/pS35D8l5zAYowrsUowCS37TMy6E=; b=2PcJ5KCdGsyjyNx+ja5Vu9TdeI7Bd9WN7xtEXH9ilfxGpUNejh+MWer1qCPFgL+CWm fdG/fkDL4qlAfLM5AX30YWvRBrWnVcvyNE1JPTke+1nvJul3CilDJVR6P5bfVajUGK74 nl2ySuGJ5d0wRwd/NiBNYyBd4Cxt5f6d8i8orhTtTYh87PBBqiBk3Mq2cdGqB0UTDWGO k9kM6JYhOt1OecDHRWU+KdZyEf6tnG52dZDhjp5pavVg4EcjRbshkmeuGF8UD/MXaWLM Cs6oQq3JVQH6QHxd1T6DHlX+rQRYRU23ngawT/1gI/1zWjT29yd7bwqMYYElLmVoAgj0 YQAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957963; x=1748562763; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=LLDErP7lwjTldw1/pS35D8l5zAYowrsUowCS37TMy6E=; b=fcTMJ+D9k9MdlMBzcXfb8ww3IuHgmcwH73YSVmlf9v1kRaByqlRrAvyb4WgyHR1rDj 6tQvlMxR3tNheMyqE4I85UK9Lz+VK6ki1hCtwQeZ+KIpbKwxYSvPL2S/RLkNZAX+prKt cuTiH9Q5CsqGflTHxS7OZSNgLrupTaDDGZJbbF3M/jCHVPA0xbsN8zvYOhTBRIhCRLno MgQ3ftbAjkgB9YGZssHK+Lwl4GhJke+0vpSUY3lmajy1G9MUGpgfScEc8xRhL7wLdP5O l0G8n3jCgGp4X50QOGU8WfTJYnBUO2h9xDT5Xdjss3Vek7mKra4izc9KgkXrdTwLGdZ6 C0dQ== X-Forwarded-Encrypted: i=1; AJvYcCXhvAWKegoq5gWfpPPfH/o7SXW0sayr82iVWsDV54YcHO1rFH0mcVFaV6t3Av8E6hDeUeAk7RvHL26iaFqM2m4=@vger.kernel.org X-Gm-Message-State: AOJu0YxlpHC14on3Rb1+6SvOpe2sLDt43nTPKcsM03G09bkH/e3mhWRl cG0IxmnRyFjLo6J9wJd1Yjb1K+23zPQxQ1pE6uvqG80YxNlWFO3m/1SwW2i9vGvXq2MH1f1DraZ xEiz/qA== X-Google-Smtp-Source: AGHT+IHOq6Cu35CTphJBsObc11lhP6kmwx8FMCg+YC2dNqr/wE8KG8ou9486KEEIS+EkPDFQhABlYWoiNSA= X-Received: from plhi16.prod.google.com ([2002:a17:903:2ed0:b0:22e:4a61:5545]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:e808:b0:22e:3c2:d477 with SMTP id d9443c01a7336-233f21ae905mr11694945ad.25.1747957962830; Thu, 22 May 2025 16:52:42 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 22 May 2025 16:52:15 -0700 In-Reply-To: <20250522235223.3178519-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250522235223.3178519-1-seanjc@google.com> X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog Message-ID: <20250522235223.3178519-6-seanjc@google.com> Subject: [PATCH v3 05/13] KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock From: Sean Christopherson To: "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Juergen Gross , Stefano Stabellini , Paolo Bonzini , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Shuah Khan , Marc Zyngier , Oliver Upton , Sean Christopherson Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak , David Matlack Add an irqfd to its target eventfd's waitqueue while holding irqfds.lock, which is mildly terrifying but functionally safe. irqfds.lock is taken inside the waitqueue's lock, but if and only if the eventfd is being released, i.e. that path is mutually exclusive with registration as KVM holds a reference to the eventfd (and obviously must do so to avoid UAF). This will allow using the eventfd's waitqueue to enforce KVM's requirement that eventfd is assigned to at most one irqfd, without introducing races. Signed-off-by: Sean Christopherson --- virt/kvm/eventfd.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 99274d60335d..04877b297267 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -204,6 +204,11 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key) int ret = 0; if (flags & EPOLLIN) { + /* + * WARNING: Do NOT take irqfds.lock in any path except EPOLLHUP, + * as KVM holds irqfds.lock when registering the irqfd with the + * eventfd. + */ u64 cnt; eventfd_ctx_do_read(irqfd->eventfd, &cnt); @@ -225,6 +230,11 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key) /* The eventfd is closing, detach from KVM */ unsigned long iflags; + /* + * Taking irqfds.lock is safe here, as KVM holds a reference to + * the eventfd when registering the irqfd, i.e. this path can't + * be reached while kvm_irqfd_add() is running. + */ spin_lock_irqsave(&kvm->irqfds.lock, iflags); /* @@ -296,16 +306,21 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh, list_add_tail(&irqfd->list, &kvm->irqfds.items); - spin_unlock_irq(&kvm->irqfds.lock); - /* * Add the irqfd as a priority waiter on the eventfd, with a custom * wake-up handler, so that KVM *and only KVM* is notified whenever the - * underlying eventfd is signaled. + * underlying eventfd is signaled. Temporarily lie to lockdep about + * holding irqfds.lock to avoid a false positive regarding potential + * deadlock with irqfd_wakeup() (see irqfd_wakeup() for details). */ init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); + spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_); add_wait_queue_priority(wqh, &irqfd->wait); + spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_); + + spin_unlock_irq(&kvm->irqfds.lock); + p->ret = 0; }