From patchwork Tue Jul 19 12:15:19 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 592070
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v5 1/5] vfio: Add the device features for the low power entry and exit
Date: Tue, 19 Jul 2022 17:45:19 +0530
Message-ID: <20220719121523.21396-2-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220719121523.21396-1-abhsahu@nvidia.com>
References: <20220719121523.21396-1-abhsahu@nvidia.com>
X-Mailing-List: linux-pm@vger.kernel.org

This patch adds the following new device features for the low power entry
and exit in the header file. The implementation for the same will be added
in the subsequent patches.

- VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
- VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
- VFIO_DEVICE_FEATURE_LOW_POWER_EXIT

With the standard registers, all power states cannot be achieved. The
platform-based power management needs to be involved to go into the
lowest power state. For doing low power entry and exit with
platform-based power management, these device features can be used.

The entry device feature has two variants. These two variants are mainly
to support the different behaviour for the low power entry.
If there is any access for the VFIO device on the host side, then the
device will be moved out of the low power state without the user's
guest driver involvement. Some devices (for example NVIDIA VGA or
3D controller) require the user's guest driver involvement for
each low-power entry. In the first variant, the host can move the
device into low power without any guest driver involvement while
in the second variant, the host will send a notification to the user
through eventfd and then the user's guest driver needs to move
the device into low power.

These device features only support VFIO_DEVICE_FEATURE_SET operation.

Signed-off-by: Abhishek Sahu
---
 include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 733a1cddde30..08fd3482d22b 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -986,6 +986,61 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
 };

+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state
+ * with the platform-based power management. This low power state will be
+ * internal to the VFIO driver and the user will not come to know which power
+ * state is chosen. If any device access happens (either from the host or
+ * the guest) when the device is in the low power state, then the host will
+ * move the device out of the low power state first. Once the access has been
+ * finished, then the host will move the device into the low power state again.
+ * If the user wants that the device should not go into the low power state
+ * again in this case, then the user should use the
+ * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature for the
+ * low power entry. The mmap'ed region access is not allowed till the low power
+ * exit happens through VFIO_DEVICE_FEATURE_LOW_POWER_EXIT and will
+ * generate the access fault.
+ */
+#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state
+ * with the platform-based power management and provide support for the wake-up
+ * notifications through eventfd. This low power state will be internal to the
+ * VFIO driver and the user will not come to know which power state is chosen.
+ * If any device access happens (either from the host or the guest) when the
+ * device is in the low power state, then the host will move the device out of
+ * the low power state first and a notification will be sent to the guest
+ * through eventfd. Once the access is finished, the host will not move back
+ * the device into the low power state. The guest should move the device into
+ * the low power state again upon receiving the wakeup notification. The
+ * notification will be generated only if the device physically went into the
+ * low power state. If the low power entry has been disabled from the host
+ * side, then the device will not go into the low power state even after
+ * calling this device feature and then the device access does not require
+ * wake-up. The mmap'ed region access is not allowed till the low power exit
+ * happens. The low power exit can happen either through
+ * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT or through any other access (where the
+ * wake-up notification has been generated).
+ */
+struct vfio_device_low_power_entry_with_wakeup {
+	__s32 wakeup_eventfd;
+	__u32 reserved;
+};
+
+#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device out of the low power
+ * state. This device feature should be called only if the user has previously
+ * put the device into the low power state either with
+ * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or
+ * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. If the
+ * device is not in the low power state currently, this device feature will
+ * return early with the success status.
+ */
+#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
+
 /* -------- API for Type1 VFIO IOMMU -------- */

 /**

From patchwork Tue Jul 19 12:15:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 591792
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v5 2/5] vfio: Increment the runtime PM usage count during IOCTL call
Date: Tue, 19 Jul 2022 17:45:20 +0530
Message-ID: <20220719121523.21396-3-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220719121523.21396-1-abhsahu@nvidia.com>
References: <20220719121523.21396-1-abhsahu@nvidia.com>
X-Mailing-List: linux-pm@vger.kernel.org

The vfio-pci based drivers will have runtime power management
support where the user can put the device into the low power state
and then PCI devices can go into the D3cold state. If the device is
in the low power state and the user issues any IOCTL, then the
device should be moved out of the low power state first. Once
the IOCTL is serviced, then it can go into the low power state again.
The runtime PM framework manages this with help of usage count.

One option was to add the runtime PM related API's inside vfio-pci
driver but some IOCTL (like VFIO_DEVICE_FEATURE) can follow a
different path and more IOCTL can be added in the future. Also, the
runtime PM will be added for vfio-pci based drivers variant currently,
but the other VFIO based drivers can use the same in the
future.
So, this patch adds the runtime PM calls in the top-level IOCTL
function itself.

For the VFIO drivers which do not currently have runtime power
management support, the runtime PM APIs won't be invoked. Currently,
only for the vfio-pci based drivers will the runtime PM APIs be invoked
to increment and decrement the usage count. Among the vfio-pci drivers,
the variant drivers can still opt out by incrementing the usage count
during device-open. pm_runtime_resume_and_get() checks the current
device status and returns early if the device is already in the ACTIVE
state.

Keeping the usage count incremented while an IOCTL is being serviced
makes sure that the user can't put the device into the low power state
while any other IOCTL is being serviced in parallel. Let's consider the
following scenario:

 1. Some other IOCTL is called.
 2. The user has opened another device instance and called the IOCTL for
    low power entry.
 3. The low power entry IOCTL moves the device into the low power state.
 4. The other IOCTL finishes.

If we don't keep the usage count incremented, then the device access
will happen between steps 3 and 4 while the device has already gone
into the low power state.

pm_runtime_resume_and_get() will be the first call, so its error
should not be propagated to user space directly. For example,
pm_runtime_resume_and_get() can return -EINVAL even in cases where the
user has passed the correct argument. So the
pm_runtime_resume_and_get() errors have been masked behind -EIO.
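The four-step scenario above can be sketched in plain userspace C. This
is only a toy model under stated assumptions, not kernel code: toy_dev,
toy_get(), toy_put() and toy_scenario() are hypothetical names, the open
file is assumed to hold one usage reference until low power entry drops
it, and suspend is assumed to happen synchronously when the count
reaches zero (the real runtime PM core may defer it).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device with a runtime PM usage count. */
struct toy_dev {
	int usage_count;	/* references that keep the device active */
	bool suspended;		/* true while the device is in low power */
};

/* Like a resume-and-get: take a reference and wake the device. */
static void toy_get(struct toy_dev *d)
{
	d->usage_count++;
	d->suspended = false;
}

/* Like a put: drop a reference; suspend only when nobody holds one. */
static void toy_put(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		d->suspended = true;
}

/*
 * Replays steps 1-4. Returns true if the device was still active when
 * the first IOCTL touched it at step 4.
 */
static bool toy_scenario(bool ioctl_takes_reference)
{
	/* Assumption: the reference taken at open is still held. */
	struct toy_dev d = { .usage_count = 1, .suspended = false };
	bool active_at_step4;

	/* Step 1: some other IOCTL starts. */
	if (ioctl_takes_reference)
		toy_get(&d);

	/* Steps 2-3: low power entry drops the reference held since open. */
	toy_put(&d);

	/* Step 4: the other IOCTL accesses the device and finishes. */
	active_at_step4 = !d.suspended;
	if (ioctl_takes_reference)
		toy_put(&d);

	return active_at_step4;
}
```

In this model, toy_scenario(true) reports an active device at step 4 and
the device suspends only after the in-flight IOCTL drops its reference,
while toy_scenario(false) reproduces the access to a suspended device
that the patch is meant to prevent.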

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/vfio.c | 52 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index bd84ca7c5e35..1d005a0a9d3d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 #include "vfio.h"

 #define DRIVER_VERSION "0.3"
@@ -1335,6 +1336,39 @@ static const struct file_operations vfio_group_fops = {
 	.release = vfio_group_fops_release,
 };

+/*
+ * Wrapper around pm_runtime_resume_and_get().
+ * Return error code on failure or 0 on success.
+ */
+static inline int vfio_device_pm_runtime_get(struct vfio_device *device)
+{
+	struct device *dev = device->dev;
+
+	if (dev->driver && dev->driver->pm) {
+		int ret;
+
+		ret = pm_runtime_resume_and_get(dev);
+		if (ret < 0) {
+			dev_info_ratelimited(dev,
+				"vfio: runtime resume failed %d\n", ret);
+			return -EIO;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Wrapper around pm_runtime_put().
+ */
+static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
+{
+	struct device *dev = device->dev;
+
+	if (dev->driver && dev->driver->pm)
+		pm_runtime_put(dev);
+}
+
 /*
  * VFIO Device fd
  */
@@ -1649,15 +1683,27 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
 	struct vfio_device *device = filep->private_data;
+	int ret;
+
+	ret = vfio_device_pm_runtime_get(device);
+	if (ret)
+		return ret;

 	switch (cmd) {
 	case VFIO_DEVICE_FEATURE:
-		return vfio_ioctl_device_feature(device, (void __user *)arg);
+		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
-			return -EINVAL;
-		return device->ops->ioctl(device, cmd, arg);
+			ret = -EINVAL;
+		else
+			ret = device->ops->ioctl(device, cmd, arg);
+		break;
 	}
+
+	vfio_device_pm_runtime_put(device);
+	return ret;
 }

 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,

From patchwork Tue Jul 19 12:15:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 592068
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v5 3/5] vfio/pci: Mask INTx during runtime suspend
Date: Tue, 19 Jul 2022 17:45:21 +0530
Message-ID: <20220719121523.21396-4-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220719121523.21396-1-abhsahu@nvidia.com>
References: <20220719121523.21396-1-abhsahu@nvidia.com>
X-Mailing-List: linux-pm@vger.kernel.org

This patch adds INTx handling during runtime suspend/resume.
All the suspend/resume related code for the user to put the device
into the low power state will be added in subsequent patches.

The INTx lines may be shared among devices. Whenever any INTx
interrupt comes for the VFIO devices, then vfio_intx_handler() will be
called for each device sharing the interrupt. Inside vfio_intx_handler(),
it calls pci_check_and_mask_intx() and checks if the interrupt has
been generated for the current device. Now, if the device is already
in the D3cold state, then the config space can not be read. Attempt
to read config space in D3cold state can cause system unresponsiveness
in a few systems.
To prevent this, mask INTx in the runtime suspend callback and unmask it in the runtime resume callback. If INTx has already been masked, no handling is needed in the runtime suspend/resume callbacks. 'pm_intx_masked' tracks this, and vfio_pci_intx_mask() has been updated to return true if it changes the INTx vfio_pci_irq_ctx.masked value.

For a runtime suspend that is triggered while the VFIO device has no user, is_intx() returns false and these callbacks do nothing.

MSI/MSI-X interrupts are not shared, so similar handling should not be needed for them. vfio_msihandler() triggers eventfd_signal() without doing any device-specific config access. When the user performs a config access or IOCTL after receiving the eventfd notification, the device is moved to the D0 state first, before the request is serviced.

Another option was to check the 'pm_intx_masked' flag inside vfio_intx_handler() instead of masking the interrupts. This flag is set inside the runtime_suspend callback, but the device can still be in a non-D3cold state (for example, if the user has explicitly disabled D3cold through sysfs, or D3cold is not supported on the platform). Also, where D3cold is supported, the device stays in D0 until the PCI core moves it into D3cold, and in that window the device can still generate an interrupt. A check in the IRQ handler would not clear the IRQ status, so the interrupt line would remain asserted. This can cause interrupt flooding.
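
For illustration only (not part of the patch): the 'pm_intx_masked' rule described above can be modeled in plain userspace C. This is a minimal sketch under stated assumptions. struct model_dev and the model_*() functions are hypothetical stand-ins for vfio_pci_core_device, vfio_pci_intx_mask()/vfio_pci_intx_unmask() and the runtime PM callbacks; no locking, IRQ or config space access is modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the INTx-related state of a VFIO PCI device. */
struct model_dev {
	bool intx_enabled;   /* what is_intx() would report */
	bool masked;         /* models vfio_pci_irq_ctx.masked */
	bool pm_intx_masked; /* true only if suspend did the masking */
};

/* Models vfio_pci_intx_mask(): returns true only if 'masked' changed. */
bool model_intx_mask(struct model_dev *d)
{
	if (d->masked)
		return false;
	d->masked = true;
	return true;
}

/* Models vfio_pci_intx_unmask() without any pending-interrupt handling. */
void model_intx_unmask(struct model_dev *d)
{
	d->masked = false;
}

/* Models the runtime suspend callback: mask INTx unless already masked. */
void model_runtime_suspend(struct model_dev *d)
{
	d->pm_intx_masked = d->intx_enabled && model_intx_mask(d);
}

/* Models the runtime resume callback: undo only what suspend masked. */
void model_runtime_resume(struct model_dev *d)
{
	if (d->pm_intx_masked)
		model_intx_unmask(d);
}
```

The point of the boolean return value is visible in the second case below: if the user had already masked INTx before suspend, resume must leave it masked.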

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_core.c  | 37 +++++++++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci_intrs.c |  6 ++++-
 include/linux/vfio_pci_core.h     |  3 ++-
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 2efa06b1fafa..9517645acfa6 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -259,16 +259,45 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	return ret;
 }
 
+#ifdef CONFIG_PM
+static int vfio_pci_core_runtime_suspend(struct device *dev)
+{
+	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
+
+	/*
+	 * If INTx is enabled, then mask INTx before going into the runtime
+	 * suspended state and unmask the same in the runtime resume.
+	 * If INTx has already been masked by the user, then
+	 * vfio_pci_intx_mask() will return false and in that case, INTx
+	 * should not be unmasked in the runtime resume.
+	 */
+	vdev->pm_intx_masked = (is_intx(vdev) && vfio_pci_intx_mask(vdev));
+
+	return 0;
+}
+
+static int vfio_pci_core_runtime_resume(struct device *dev)
+{
+	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
+
+	if (vdev->pm_intx_masked)
+		vfio_pci_intx_unmask(vdev);
+
+	return 0;
+}
+#endif /* CONFIG_PM */
+
 /*
- * The dev_pm_ops needs to be provided to make pci-driver runtime PM working,
- * so use structure without any callbacks.
- *
  * The pci-driver core runtime PM routines always save the device state
  * before going into suspended state. If the device is going into low power
  * state with only with runtime PM ops, then no explicit handling is needed
  * for the devices which have NoSoftRst-.
  */
-static const struct dev_pm_ops vfio_pci_core_pm_ops = { };
+static const struct dev_pm_ops vfio_pci_core_pm_ops = {
+	SET_RUNTIME_PM_OPS(vfio_pci_core_runtime_suspend,
+			   vfio_pci_core_runtime_resume,
+			   NULL)
+};
 
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 {
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6069a11fb51a..8b805d5d19e1 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -33,10 +33,12 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 		eventfd_signal(vdev->ctx[0].trigger, 1);
 }
 
-void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
+/* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
+bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned long flags;
+	bool masked_changed = false;
 
 	spin_lock_irqsave(&vdev->irqlock, flags);
 
@@ -60,9 +62,11 @@ void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 			disable_irq_nosync(pdev->irq);
 
 		vdev->ctx[0].masked = true;
+		masked_changed = true;
 	}
 
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
+	return masked_changed;
 }
 
 /*
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 22de2bce6394..e96cc3081236 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -124,6 +124,7 @@ struct vfio_pci_core_device {
 	bool			needs_reset;
 	bool			nointx;
 	bool			needs_pm_restore;
+	bool			pm_intx_masked;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	int			ioeventfds_nr;
@@ -147,7 +148,7 @@ struct vfio_pci_core_device {
 #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
 #define irq_is(vdev, type) (vdev->irq_type == type)
 
-void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
+bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
 void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
 
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev,

From patchwork Tue Jul 19 12:15:22 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 591791
From: Abhishek Sahu
To: Alex Williamson , Cornelia Huck , Yishai Hadas , Jason Gunthorpe , Shameer Kolothum , Kevin Tian , "Rafael J. Wysocki"
CC: Max Gurtovoy , Bjorn Helgaas , , , , , Abhishek Sahu
Subject: [PATCH v5 4/5] vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY/EXIT
Date: Tue, 19 Jul 2022 17:45:22 +0530
Message-ID: <20220719121523.21396-5-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220719121523.21396-1-abhsahu@nvidia.com>
References: <20220719121523.21396-1-abhsahu@nvidia.com>
X-Mailing-List: linux-pm@vger.kernel.org

Currently, if runtime power management is enabled for vfio-pci based devices in the guest OS, the guest OS writes the PCI_PM_CTRL register. This write request is handled in vfio_pm_config_write(), which performs the actual write to the PCI_PM_CTRL register. With this, D3hot is the deepest low power state that can be reached. If we can use the runtime PM framework, then we can achieve the D3cold state (on the supported systems), which helps in saving maximum power.

1. D3cold state can't be achieved by writing PCI standard PM config registers.
   This patch implements the following newly added low power related device features:
    - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
    - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT

   VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY moves the device into the low power state, and VFIO_DEVICE_FEATURE_LOW_POWER_EXIT moves the device out of the low power state.

2. The vfio-pci driver uses the runtime PM framework for low power entry and exit. On platforms where the D3cold state is supported, the runtime PM framework puts the device into D3cold; otherwise, D3hot or some other power state is used. If the user has explicitly disabled runtime PM for the device, the device stays in the power state configured by the guest OS through PCI_PM_CTRL.

3. Hypervisors can implement virtual ACPI methods. For example, in a guest Linux OS, if the PCI device ACPI node has _PR3 and _PR0 power resources with _ON/_OFF methods, the guest invokes the _OFF method during the D3cold transition and then _ON during the D0 transition. The hypervisor can tap these virtual ACPI calls and then call the low power device feature IOCTL.

4. The 'pm_runtime_engaged' flag tracks entry to and exit from runtime PM. This flag is protected by the 'memory_lock' semaphore.

5. All config and other region accesses are wrapped in pm_runtime_resume_and_get() and pm_runtime_put(). So, if any device access happens while the device is in the runtime suspended state, the device is resumed first, before the access. Once the access has finished, the device goes back into the runtime suspended state.

6. Memory region access through mmap is not allowed in the low power state. Because __vfio_pci_memory_enabled() is a common function, the check for 'pm_runtime_engaged' has been added explicitly in vfio_pci_mmap_fault() so that only mmap'ed access is blocked.
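
For illustration only (not part of the patch): the usage-count bookkeeping behind points 4 to 6 can be sketched as a small userspace model. This is a simplified sketch under stated assumptions. struct model_pm and the model_*() functions are hypothetical; the plain integer only imitates the runtime PM usage counter, the comments name the kernel call each step stands for, and 'memory_lock', the actual suspend/resume and error paths are not modeled.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical stand-in for the runtime PM state of one device. */
struct model_pm {
	int usage_count;         /* imitates the runtime PM usage counter */
	bool pm_runtime_engaged; /* models vdev->pm_runtime_engaged */
};

/* Models LOW_POWER_ENTRY: drop the reference held while the device is open. */
int model_pm_entry(struct model_pm *pm)
{
	if (pm->pm_runtime_engaged)
		return -EINVAL;
	pm->pm_runtime_engaged = true;
	pm->usage_count--; /* stands for pm_runtime_put_noidle() */
	return 0;
}

/* Models LOW_POWER_EXIT: take the reference back, at most once. */
void model_pm_exit(struct model_pm *pm)
{
	if (pm->pm_runtime_engaged) {
		pm->pm_runtime_engaged = false;
		pm->usage_count++; /* stands for pm_runtime_get_noresume() */
	}
}

/* The device may only be runtime suspended with no references held. */
bool model_may_suspend(const struct model_pm *pm)
{
	return pm->usage_count == 0;
}

/*
 * Models a config/region access: the reference taken around the access
 * keeps the device awake. Returns true if the device was guaranteed to
 * be awake during the access.
 */
bool model_region_access(struct model_pm *pm)
{
	bool awake;

	pm->usage_count++; /* stands for pm_runtime_resume_and_get() */
	awake = !model_may_suspend(pm);
	pm->usage_count--; /* stands for pm_runtime_put() */
	return awake;
}

/* Models the vfio_pci_mmap_fault() check: blocked while engaged. */
bool model_mmap_allowed(const struct model_pm *pm, bool memory_enabled)
{
	return !pm->pm_runtime_engaged && memory_enabled;
}
```

In this model an opened device starts with usage_count == 1. After entry the count is 0, so the device may suspend; a region access raises it temporarily, and exit restores it to 1.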

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_core.c | 151 +++++++++++++++++++++++++++++--
 include/linux/vfio_pci_core.h    |   1 +
 2 files changed, 144 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 9517645acfa6..726a6f282496 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -259,11 +259,98 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	return ret;
 }
 
+static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
+{
+	/*
+	 * The vdev power related flags are protected with 'memory_lock'
+	 * semaphore.
+	 */
+	vfio_pci_zap_and_down_write_memory_lock(vdev);
+	if (vdev->pm_runtime_engaged) {
+		up_write(&vdev->memory_lock);
+		return -EINVAL;
+	}
+
+	vdev->pm_runtime_engaged = true;
+	pm_runtime_put_noidle(&vdev->pdev->dev);
+	up_write(&vdev->memory_lock);
+
+	return 0;
+}
+
+static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
+				  void __user *arg, size_t argsz)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+	if (ret != 1)
+		return ret;
+
+	/*
+	 * Inside vfio_pci_runtime_pm_entry(), only the runtime PM usage count
+	 * will be decremented. The pm_runtime_put() will be invoked again
+	 * while returning from the ioctl and then the device can go into
+	 * runtime suspended state.
+	 */
+	return vfio_pci_runtime_pm_entry(vdev);
+}
+
+static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
+{
+	/*
+	 * The vdev power related flags are protected with 'memory_lock'
+	 * semaphore.
+	 */
+	down_write(&vdev->memory_lock);
+	if (vdev->pm_runtime_engaged) {
+		vdev->pm_runtime_engaged = false;
+		pm_runtime_get_noresume(&vdev->pdev->dev);
+	}
+
+	up_write(&vdev->memory_lock);
+}
+
+static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
+				 void __user *arg, size_t argsz)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+	if (ret != 1)
+		return ret;
+
+	/*
+	 * The device should already be resumed by the vfio core layer.
+	 * vfio_pci_runtime_pm_exit() will internally increment the usage
+	 * count corresponding to pm_runtime_put() called during low power
+	 * feature entry.
+	 */
+	vfio_pci_runtime_pm_exit(vdev);
+	return 0;
+}
+
 #ifdef CONFIG_PM
 static int vfio_pci_core_runtime_suspend(struct device *dev)
 {
 	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
 
+	down_write(&vdev->memory_lock);
+	/*
+	 * The user can move the device into D3hot state before invoking
+	 * power management IOCTL. Move the device into D0 state here and then
+	 * the pci-driver core runtime PM suspend function will move the device
+	 * into the low power state. Also, for the devices which have
+	 * NoSoftRst-, it will help in restoring the original state
+	 * (saved locally in 'vdev->pm_save').
+	 */
+	vfio_pci_set_power_state(vdev, PCI_D0);
+	up_write(&vdev->memory_lock);
+
 	/*
 	 * If INTx is enabled, then mask INTx before going into the runtime
 	 * suspended state and unmask the same in the runtime resume.
@@ -393,6 +480,18 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 
 	/*
 	 * This function can be invoked while the power state is non-D0.
+	 * This non-D0 power state can be with or without runtime PM.
+	 * vfio_pci_runtime_pm_exit() will internally increment the usage
+	 * count corresponding to pm_runtime_put() called during low power
+	 * feature entry and then pm_runtime_resume() will wake up the device,
+	 * if the device has already gone into the suspended state. Otherwise,
+	 * the vfio_pci_set_power_state() will change the device power state
+	 * to D0.
+	 */
+	vfio_pci_runtime_pm_exit(vdev);
+	pm_runtime_resume(&pdev->dev);
+
+	/*
 	 * This function calls __pci_reset_function_locked() which internally
 	 * can use pci_pm_reset() for the function reset. pci_pm_reset() will
 	 * fail if the power state is non-D0. Also, for the devices which
@@ -1224,6 +1323,10 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
 		return vfio_pci_core_feature_token(device, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
+		return vfio_pci_core_pm_entry(device, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
+		return vfio_pci_core_pm_exit(device, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -1234,31 +1337,47 @@ static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			   size_t count, loff_t *ppos, bool iswrite)
 {
 	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	int ret;
 
 	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
 		return -EINVAL;
 
+	ret = pm_runtime_resume_and_get(&vdev->pdev->dev);
+	if (ret < 0) {
+		pci_info_ratelimited(vdev->pdev, "runtime resume failed %d\n",
+				     ret);
+		return -EIO;
+	}
+
 	switch (index) {
 	case VFIO_PCI_CONFIG_REGION_INDEX:
-		return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite);
+		ret = vfio_pci_config_rw(vdev, buf, count, ppos, iswrite);
+		break;
 
 	case VFIO_PCI_ROM_REGION_INDEX:
 		if (iswrite)
-			return -EINVAL;
-		return vfio_pci_bar_rw(vdev, buf, count, ppos, false);
+			ret = -EINVAL;
+		else
+			ret = vfio_pci_bar_rw(vdev, buf, count, ppos, false);
+		break;
 
 	case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
-		return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite);
+		ret = vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite);
+		break;
 
 	case VFIO_PCI_VGA_REGION_INDEX:
-		return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite);
+		ret = vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite);
+		break;
+
 	default:
 		index -= VFIO_PCI_NUM_REGIONS;
-		return vdev->region[index].ops->rw(vdev, buf,
+		ret = vdev->region[index].ops->rw(vdev, buf,
 						   count, ppos, iswrite);
+		break;
 	}
 
-	return -EINVAL;
+	pm_runtime_put(&vdev->pdev->dev);
+	return ret;
 }
 
 ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
@@ -1453,7 +1572,11 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 	mutex_lock(&vdev->vma_lock);
 	down_read(&vdev->memory_lock);
 
-	if (!__vfio_pci_memory_enabled(vdev)) {
+	/*
+	 * Memory region cannot be accessed if the low power feature is engaged
+	 * or memory access is disabled.
+	 */
+	if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev)) {
 		ret = VM_FAULT_SIGBUS;
 		goto up_out;
 	}
@@ -2164,6 +2287,15 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		goto err_unlock;
 	}
 
+	/*
+	 * Some of the devices in the dev_set can be in the runtime suspended
+	 * state. Increment the usage count for all the devices in the dev_set
+	 * before reset and decrement the same after reset.
+	 */
+	ret = vfio_pci_dev_set_pm_runtime_get(dev_set);
+	if (ret)
+		goto err_unlock;
+
 	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
 		/*
 		 * Test whether all the affected devices are contained by the
@@ -2219,6 +2351,9 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		else
 			mutex_unlock(&cur->vma_lock);
 	}
+
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+		pm_runtime_put(&cur->pdev->dev);
 err_unlock:
 	mutex_unlock(&dev_set->lock);
 	return ret;
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index e96cc3081236..7ec81271bd05 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -125,6 +125,7 @@ struct vfio_pci_core_device {
 	bool			nointx;
 	bool			needs_pm_restore;
 	bool			pm_intx_masked;
+	bool			pm_runtime_engaged;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	int			ioeventfds_nr;

From patchwork Tue Jul 19 12:15:23 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 591790
From: Abhishek Sahu
To: Alex Williamson , Cornelia Huck , Yishai Hadas , Jason Gunthorpe , Shameer Kolothum , Kevin Tian , "Rafael J. Wysocki"
CC: Max Gurtovoy , Bjorn Helgaas , , , , , Abhishek Sahu
Subject: [PATCH v5 5/5] vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
Date: Tue, 19 Jul 2022 17:45:23 +0530
Message-ID: <20220719121523.21396-6-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220719121523.21396-1-abhsahu@nvidia.com>
References: <20220719121523.21396-1-abhsahu@nvidia.com>
X-Mailing-List: linux-pm@vger.kernel.org

This patch implements the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature.

With VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is any access to the VFIO device on the host side, the device is moved out of the low power state without involving the user's guest driver. Once the access has finished, the device is moved back into the low power state. With low power entry through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP, the device is not moved back into the low power state, and the user is notified through the wakeup eventfd.

vfio_pci_runtime_pm_entry() is called for both variants of low power feature entry, so add an extra argument for the wakeup eventfd context and store it locally in 'struct vfio_pci_core_device'.

For an entry without a wakeup eventfd, all the exit related handling is done by the LOW_POWER_EXIT device feature only. When LOW_POWER_EXIT is called, the vfio core layer vfio_device_pm_runtime_get() increments the usage count and resumes the device. In the driver runtime_resume callback, 'pm_wake_eventfd_ctx' is NULL, so vfio_pci_runtime_pm_exit() returns early. vfio_pci_core_pm_exit() then calls vfio_pci_runtime_pm_exit() again, and this time the exit related handling is done.

For an entry with a wakeup eventfd, the driver resume callback triggers the eventfd and does all the exit related handling. When vfio_pci_core_pm_exit() later calls vfio_pci_runtime_pm_exit(), there is nothing left for it to do. But if the user has disabled runtime PM on the host side, the device never enters the runtime suspended state; in this case, all the exit related handling is done in vfio_pci_core_pm_exit() only. Also, the eventfd is not triggered, since the device power state has not been changed by the host driver.

For vfio_pci_core_disable(), all the exit related handling also needs to be done if the user closes the device after putting it into low power. In this case the eventfd is not triggered, since the device close was initiated by the user.
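
For illustration only (not part of the patch): the exit cases described above can be traced with a small userspace model. This is a sketch under stated assumptions. struct model_wake and the model_*() functions are hypothetical; an int counter stands in for the wakeup eventfd context, incrementing it stands in for eventfd_signal(), and locking and the real suspend/resume transitions are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the low power state of one device. */
struct model_wake {
	int usage_count;         /* imitates the runtime PM usage counter */
	bool pm_runtime_engaged; /* models vdev->pm_runtime_engaged */
	int *wake_counter;       /* models pm_wake_eventfd_ctx; NULL if none */
};

/* Models low power entry; pass NULL for the variant without wakeup. */
bool model_entry(struct model_wake *w, int *wake_counter)
{
	if (w->pm_runtime_engaged)
		return false;
	w->pm_runtime_engaged = true;
	w->wake_counter = wake_counter;
	w->usage_count--; /* stands for pm_runtime_put_noidle() */
	return true;
}

/*
 * Models vfio_pci_runtime_pm_exit(). 'resume_callback' is true when the
 * call comes from the runtime resume path rather than from the user.
 */
void model_exit(struct model_wake *w, bool resume_callback)
{
	/* Entry without wakeup: leave everything to the explicit exit. */
	if (resume_callback && !w->wake_counter)
		return;

	if (w->pm_runtime_engaged) {
		w->pm_runtime_engaged = false;
		w->usage_count++; /* stands for pm_runtime_get_noresume() */
	}

	if (w->wake_counter) {
		if (resume_callback)
			(*w->wake_counter)++; /* stands for eventfd_signal() */
		w->wake_counter = NULL;   /* stands for eventfd_ctx_put() */
	}
}
```

Only a host-initiated resume of an entry made with a wakeup eventfd increments the counter; an explicit exit by the user, or a device close, releases the context without signaling.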
Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_core.c | 78 ++++++++++++++++++++++++++++++--
 include/linux/vfio_pci_core.h    |  1 +
 2 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 726a6f282496..dbe942bcaa67 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -259,7 +259,8 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	return ret;
 }
 
-static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
+static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
+				     struct eventfd_ctx *efdctx)
 {
 	/*
 	 * The vdev power related flags are protected with 'memory_lock'
@@ -272,6 +273,7 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
 	}
 
 	vdev->pm_runtime_engaged = true;
+	vdev->pm_wake_eventfd_ctx = efdctx;
 	pm_runtime_put_noidle(&vdev->pdev->dev);
 	up_write(&vdev->memory_lock);
 
@@ -295,21 +297,67 @@ static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
 	 * while returning from the ioctl and then the device can go into
 	 * runtime suspended state.
 	 */
-	return vfio_pci_runtime_pm_entry(vdev);
+	return vfio_pci_runtime_pm_entry(vdev, NULL);
 }
 
-static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
+static int
+vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
+				   void __user *arg, size_t argsz)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	struct vfio_device_low_power_entry_with_wakeup entry;
+	struct eventfd_ctx *efdctx;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+				 sizeof(entry));
+	if (ret != 1)
+		return ret;
+
+	if (copy_from_user(&entry, arg, sizeof(entry)))
+		return -EFAULT;
+
+	if (entry.wakeup_eventfd < 0)
+		return -EINVAL;
+
+	efdctx = eventfd_ctx_fdget(entry.wakeup_eventfd);
+	if (IS_ERR(efdctx))
+		return PTR_ERR(efdctx);
+
+	ret = vfio_pci_runtime_pm_entry(vdev, efdctx);
+	if (ret)
+		eventfd_ctx_put(efdctx);
+
+	return ret;
+}
+
+static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev,
+				     bool resume_callback)
 {
 	/*
 	 * The vdev power related flags are protected with 'memory_lock'
 	 * semaphore.
 	 */
 	down_write(&vdev->memory_lock);
+	if (resume_callback && !vdev->pm_wake_eventfd_ctx) {
+		up_write(&vdev->memory_lock);
+		return;
+	}
+
 	if (vdev->pm_runtime_engaged) {
 		vdev->pm_runtime_engaged = false;
 		pm_runtime_get_noresume(&vdev->pdev->dev);
 	}
 
+	if (vdev->pm_wake_eventfd_ctx) {
+		if (resume_callback)
+			eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
+
+		eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
+		vdev->pm_wake_eventfd_ctx = NULL;
+	}
+
 	up_write(&vdev->memory_lock);
 }
 
@@ -329,8 +377,18 @@ static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
 	 * vfio_pci_runtime_pm_exit() will internally increment the usage
 	 * count corresponding to pm_runtime_put() called during low power
 	 * feature entry.
+	 *
+	 * For the low power entry happened with wakeup eventfd, there will
+	 * be two cases:
+	 *
+	 * 1. The device has gone into runtime suspended state. In this case,
+	 *    the runtime resume by the vfio core layer should already have
+	 *    performed all exit related handling and the
+	 *    vfio_pci_runtime_pm_exit() will return early.
+	 * 2. The device was in runtime active state. In this case, the
+	 *    vfio_pci_runtime_pm_exit() will do all the required handling.
 	 */
-	vfio_pci_runtime_pm_exit(vdev);
+	vfio_pci_runtime_pm_exit(vdev, false);
 	return 0;
 }
 
@@ -370,6 +428,13 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
 	if (vdev->pm_intx_masked)
 		vfio_pci_intx_unmask(vdev);
 
+	/*
+	 * Only for the low power entry happened with wakeup eventfd,
+	 * the vfio_pci_runtime_pm_exit() will perform exit related handling
+	 * and will trigger eventfd. For the other cases, it will return early.
+	 */
+	vfio_pci_runtime_pm_exit(vdev, true);
+
 	return 0;
 }
 #endif /* CONFIG_PM */
@@ -488,7 +553,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	 * the vfio_pci_set_power_state() will change the device power state
 	 * to D0.
 	 */
-	vfio_pci_runtime_pm_exit(vdev);
+	vfio_pci_runtime_pm_exit(vdev, false);
 	pm_runtime_resume(&pdev->dev);
 
 	/*
@@ -1325,6 +1390,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 		return vfio_pci_core_feature_token(device, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
 		return vfio_pci_core_pm_entry(device, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP:
+		return vfio_pci_core_pm_entry_with_wakeup(device, flags,
+							  arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
 		return vfio_pci_core_pm_exit(device, flags, arg, argsz);
 	default:
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 7ec81271bd05..fb25214e85c8 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -131,6 +131,7 @@ struct vfio_pci_core_device {
 	int			ioeventfds_nr;
 	struct eventfd_ctx	*err_trigger;
 	struct eventfd_ctx	*req_trigger;
+	struct eventfd_ctx	*pm_wake_eventfd_ctx;
 	struct list_head	dummy_resources_list;
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;