From patchwork Tue Jun 15 14:13:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 460904 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 015D2C48BDF for ; Tue, 15 Jun 2021 14:16:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DA77F61459 for ; Tue, 15 Jun 2021 14:16:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231666AbhFOOS2 (ORCPT ); Tue, 15 Jun 2021 10:18:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231389AbhFOORF (ORCPT ); Tue, 15 Jun 2021 10:17:05 -0400 Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A940C06114E for ; Tue, 15 Jun 2021 07:13:58 -0700 (PDT) Received: by mail-pj1-x1029.google.com with SMTP id ei4so11861862pjb.3 for ; Tue, 15 Jun 2021 07:13:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uqLi4n/sYt1V7zlXO7PEWrtyAjrct3aaeB0JWe6pldc=; b=hdxih8MIuJMG7DQ8ZmQrXh42xZ0vXefzPJ9Is4yeGeXwK9ruXKC6EBFn8DHqD+qX4B mXWvCicjq9IAlBh5wT9cKgIXtBv2V1Jt8Ml734PbO5KZR6ZfwAOwp2HqEzKJFVZ6LEjc jpr3DR3C7jV6q+xycSeDEyiUG5tdTsrQyhBszjQhllpSC6ihPrnYO3rbVtT0URaBUp8J Cbzhv1Hxb4BR27Ug0G/lD303gI6VNhL+3mm6r0pj60g/Lc5g3jB3iCJcKlWZyawmvu9X Mql527KcPIDMqPFgJV5a0yXFc8HVrjnBAmxl4aFB4nfYUtPTjOuU1i+1kYhrIB4V+EUy 44+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uqLi4n/sYt1V7zlXO7PEWrtyAjrct3aaeB0JWe6pldc=; b=dlEumr2C/Z+K69+DJbIh2vk8Wky6KGy/oftn2uDhrQ09Qiuie4OVlDl48DkK3d5kBL ffBGEcEiYvkLxuvBYL8whm0Sw5KYzAcnIS29GcAX8/3y0eBzsvDUj6RGzDp0FKi28ANP Ppqt/F3RNQdbfvl+LfVC1u8dDlguhWo19BcLwShvngyzimQo9pPpxiIIpktWno72ue8X KcReRROAkSFLPqkQRqbPDYCAf5xBjdGGxjXKIyvCSdZLkY0fwrE0o+B3GLaPfwUqVXYd 88ZdKKR2GGctJbxVw3l4AQDdW0y9XVVRRyUPFCXhdryhrDHudMy0NVdiZe72OznRQoMb Kfcw== X-Gm-Message-State: AOAM530uaZYIyxOhT70LiMKCYRdv96HqQlXlzdyFSt5ttieiTc9M+9wq ITjtTFUke2fojrX9tvcZbw7x X-Google-Smtp-Source: ABdhPJwxO854wUtIh12JMTHVMysUOJZdmysPCLLcn5itcs7UedE9FRLaM9NJQ6d/dAmz1Dc2dO9hng== X-Received: by 2002:a17:903:152:b029:10f:f6f7:ede5 with SMTP id r18-20020a1709030152b029010ff6f7ede5mr4636909plc.20.1623766437651; Tue, 15 Jun 2021 07:13:57 -0700 (PDT) Received: from localhost ([139.177.225.241]) by smtp.gmail.com with ESMTPSA id t14sm17322236pgm.9.2021.06.15.07.13.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Jun 2021 07:13:57 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org, gregkh@linuxfoundation.org Cc: songmuchun@bytedance.com, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v8 01/10] iova: Export alloc_iova_fast() and free_iova_fast(); Date: Tue, 15 Jun 2021 22:13:22 +0800 Message-Id: <20210615141331.407-2-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210615141331.407-1-xieyongji@bytedance.com> References: <20210615141331.407-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Export alloc_iova_fast() and free_iova_fast() so that some modules can use it to improve iova allocation efficiency. Signed-off-by: Xie Yongji --- drivers/iommu/iova.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index b7ecd5b08039..59916b4b7fe9 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -518,6 +518,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size, return new_iova->pfn_lo; } +EXPORT_SYMBOL_GPL(alloc_iova_fast); /** * free_iova_fast - free iova pfn range into rcache @@ -535,6 +536,7 @@ free_iova_fast(struct iova_domain *iovad, unsigned long pfn, unsigned long size) free_iova(iovad, pfn); } +EXPORT_SYMBOL_GPL(free_iova_fast); #define fq_ring_for_each(i, fq) \ for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % IOVA_FQ_SIZE) From patchwork Tue Jun 15 14:13:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 460902 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23CEEC48BE5 for ; Tue, 15 Jun 2021 14:18:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0179861078 for ; Tue, 15 Jun 2021 14:18:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231218AbhFOOUv (ORCPT ); Tue, 15 Jun 2021 10:20:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230463AbhFOOR2 (ORCPT ); Tue, 15 Jun 2021 10:17:28 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5BC3C0613A4 for ; Tue, 15 Jun 2021 07:14:02 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id k5so11867469pjj.1 for ; Tue, 15 Jun 2021 07:14:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=x7lCOA+2uHcoDlHpnAPiyr08qHLnKknrRwhD42bW14c=; b=Q98Ihm2Nik3q5vdP0zcSt08K34vWS/rvAoSEN52J/uuTlzvnsiNq88Z5ucJKY0yf4T WFJOezngtVgbaHQTYK9IFHEeP7OPorUcohm0GpwG5OPARKnRlnDiuj+ZSJdIkZhENFhP Vq0P8aiksUKM/urNGX/K/vS0M5QV+Ai6ruEHeoTG1ZqikOBMRgtepe++5r/KmuRM0Wp7 fQBL37D40now9tM5jNX9bzTs/tFtUeKXIWuG8uRaMPZOawdA+TREx5a2jmwy7Z5WlYma oneZ7pit7q6XulGSMFYPI+9TNvoTTQj76XvP+orlujxi0oRlAXS8GnDtyHjoFs7/bu4w pIwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=x7lCOA+2uHcoDlHpnAPiyr08qHLnKknrRwhD42bW14c=; b=OMqZG/hm4LyBch9Ov2vtn1VDUTp0pVQmPshmp8WWehyzbtUoDe+5VYXnMyYaIm7rES LIcDkABgfqd5ohOzy8ydEeaSu3UFCPHWpTEzd41UYXg/hsvRDSG48xs5qljgBM4cpvcW z43M6ho2EMl3xUc/uYBJeSgp8WHc6vumR2LYZIp8bVgjqi6IS09kJ1Hry/6pOtECiBi1 YOEJsSb/cVoRJyFnuCaXCiSVqu4DiyteY0IPpPNgkxqLNKNo67EoUFH/HfOnGeIcxVwD SpAiaE4XX9BfLuHUcmSj6FOFPgySXfmh+jmWZ4kRJNhmmlFavp3j640lurnHa6YiP30Z nXFQ== X-Gm-Message-State: AOAM530SPumhY/WqBNFLoqypPTNwFY5XDIvaNS+4jJ0xNn9ZFgdsN4c/ j84jkCqlr6xJtTqWVNxcZGen X-Google-Smtp-Source: ABdhPJyMTsqXCPkyBqwNtnh8NXZihbFI2JopRQ8BKBQqq64DywQJiFaJ/V9M6qRN7G+4EeFFkHyAjg== X-Received: by 2002:a17:90b:b18:: with SMTP id bf24mr25002261pjb.220.1623766442317; Tue, 15 Jun 2021 07:14:02 -0700 (PDT) Received: from localhost ([139.177.225.241]) by smtp.gmail.com with ESMTPSA id u13sm16031239pga.64.2021.06.15.07.14.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Jun 2021 07:14:01 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org, gregkh@linuxfoundation.org Cc: songmuchun@bytedance.com, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v8 02/10] file: Export receive_fd() to modules Date: Tue, 15 Jun 2021 22:13:23 +0800 Message-Id: <20210615141331.407-3-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210615141331.407-1-xieyongji@bytedance.com> References: <20210615141331.407-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Export receive_fd() so that some modules can use it to pass file descriptor between processes without missing any security stuffs. Signed-off-by: Xie Yongji --- fs/file.c | 6 ++++++ include/linux/file.h | 7 +++---- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/file.c b/fs/file.c index 86dc9956af32..210e540672aa 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1134,6 +1134,12 @@ int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags) return new_fd; } +int receive_fd(struct file *file, unsigned int o_flags) +{ + return __receive_fd(file, NULL, o_flags); +} +EXPORT_SYMBOL_GPL(receive_fd); + static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags) { int err = -EBADF; diff --git a/include/linux/file.h b/include/linux/file.h index 2de2e4613d7b..51e830b4fe3a 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -94,6 +94,9 @@ extern void fd_install(unsigned int fd, struct file *file); extern int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags); + +extern int receive_fd(struct file *file, unsigned int o_flags); + static inline int receive_fd_user(struct file *file, int __user *ufd, unsigned int o_flags) { @@ -101,10 +104,6 @@ static inline int receive_fd_user(struct file *file, int __user *ufd, return -EFAULT; return __receive_fd(file, ufd, o_flags); } -static inline int receive_fd(struct file *file, unsigned int o_flags) -{ - return __receive_fd(file, NULL, o_flags); -} int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags); extern void flush_delayed_fput(void); From patchwork Tue Jun 15 14:13:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 460903 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC4C8C48BDF for ; Tue, 15 Jun 2021 14:17:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CDB3C61481 for ; Tue, 15 Jun 2021 14:17:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230076AbhFOOTk (ORCPT ); Tue, 15 Jun 2021 10:19:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231538AbhFOOSA (ORCPT ); Tue, 15 Jun 2021 10:18:00 -0400 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 912F6C061157 for ; Tue, 15 Jun 2021 07:14:12 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id h1so8544367plt.1 for ; Tue, 15 Jun 2021 07:14:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=WWAi/i4+n9XG5D57vjCca2tNocYxaQJOYj5KH2bJ384=; b=0kIA76yEil4BDyN3USnHD3/M8FrPXcpFiBtM5SrfGm4D/hRLIWWQevjzMrmXkvkdwU /I8AKMWCkHdw2qTuchBfN/PK2PZZLJMcsPq6LBQFkREk6RWRwnzts6osF//EZXIafzq0 annTM6mRM+CObk1LmodpcuGI18iPMdrO+f07wL2i7sL4hExnopN7djnuXypf9qHNytXJ GbZ58bvDW/a2OGejh65bmDZlKAV+279i3TTA8RA+It3vMuXrP07HIIzP9P/tUf0O5Ghx ygd5K2GEyaRP+PSFqmHdZRFnbeBqy6jOJ+pho5sg6ufX4iyz0SatX81p0eMgUcrnXIGv ohrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WWAi/i4+n9XG5D57vjCca2tNocYxaQJOYj5KH2bJ384=; b=rOWmbEbEom2fzFvItF9SwhojqQdDd/DK0YrOf/SCiOWvziK9cehOzahkFn2VdLNkLk MklOptP4RJTmluB+mfnghdoddxgJpEvcgdESHSJUXh86YgaBsQb3ZOqKanK42oYG8XB6 skMFHwvriQd3XaYZhUPJFTc40QnmKHO1v/S76zRCJCO551VCFfg6SbanfZG7UstQdYlF o8NwgvJULBl8sTIh9bz/WhN+K3onOb9F/GDiTMUcmi1t1YV+y3SJc8Y5FAsP6V3BjSBH pB3OrwLcHopDy9BDcEtNUa/Y47c3LCZeu4EhWUFXT6dCVS7vBv/uHmdBAGIvePlg/Ycv Uong== X-Gm-Message-State: AOAM530h5Wnj/XjY8i5YL7R46S+wZTNx8kKvxU3ETAHKnaZH8AH/V8Hn zE4pTtUr3kVMVuGaPUBeF5tE X-Google-Smtp-Source: ABdhPJwW+q5dHWJSz6MNIvxr3iew20vYFe9rQ1otKyLj3TsWzRln7lNFm51HBugcs9X5HKNDyYkcuA== X-Received: by 2002:a17:90a:6482:: with SMTP id h2mr5018877pjj.188.1623766451749; Tue, 15 Jun 2021 07:14:11 -0700 (PDT) Received: from localhost ([139.177.225.241]) by smtp.gmail.com with ESMTPSA id f18sm2692741pjq.48.2021.06.15.07.14.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Jun 2021 07:14:11 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org, gregkh@linuxfoundation.org Cc: songmuchun@bytedance.com, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v8 04/10] vhost-iotlb: Add an opaque pointer for vhost IOTLB Date: Tue, 15 Jun 2021 22:13:25 +0800 Message-Id: <20210615141331.407-5-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210615141331.407-1-xieyongji@bytedance.com> References: <20210615141331.407-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add an opaque pointer for vhost IOTLB. And introduce vhost_iotlb_add_range_ctx() to accept it. Suggested-by: Jason Wang Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vhost/iotlb.c | 20 ++++++++++++++++---- include/linux/vhost_iotlb.h | 3 +++ 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c index 0fd3f87e913c..5c99e1112cbb 100644 --- a/drivers/vhost/iotlb.c +++ b/drivers/vhost/iotlb.c @@ -36,19 +36,21 @@ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb, EXPORT_SYMBOL_GPL(vhost_iotlb_map_free); /** - * vhost_iotlb_add_range - add a new range to vhost IOTLB + * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB * @iotlb: the IOTLB * @start: start of the IOVA range * @last: last of IOVA range * @addr: the address that is mapped to @start * @perm: access permission of this range + * @opaque: the opaque pointer for the new mapping * * Returns an error last is smaller than start or memory allocation * fails */ -int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, - u64 start, u64 last, - u64 addr, unsigned int perm) +int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, + u64 start, u64 last, + u64 addr, unsigned int perm, + void *opaque) { struct vhost_iotlb_map *map; @@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, map->last = last; map->addr = addr; map->perm = perm; + map->opaque = opaque; iotlb->nmaps++; vhost_iotlb_itree_insert(map, &iotlb->root); @@ -80,6 +83,15 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, return 0; } +EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx); + +int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, + u64 start, u64 last, + u64 addr, unsigned int perm) +{ + return vhost_iotlb_add_range_ctx(iotlb, start, last, + addr, perm, NULL); +} EXPORT_SYMBOL_GPL(vhost_iotlb_add_range); /** diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h index 6b09b786a762..2d0e2f52f938 100644 --- a/include/linux/vhost_iotlb.h +++ b/include/linux/vhost_iotlb.h @@ -17,6 +17,7 @@ struct vhost_iotlb_map { u32 perm; u32 flags_padding; u64 __subtree_last; + void *opaque; }; #define VHOST_IOTLB_FLAG_RETIRE 0x1 @@ -29,6 +30,8 @@ struct vhost_iotlb { unsigned int flags; }; +int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last, + u64 addr, unsigned int perm, void *opaque); int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last, u64 addr, unsigned int perm); void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last); From patchwork Tue Jun 15 14:13:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 460901 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06079C49361 for ; Tue, 15 Jun 2021 14:18:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E56E461001 for ; Tue, 15 Jun 2021 14:18:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231904AbhFOOU7 (ORCPT ); Tue, 15 Jun 2021 10:20:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36798 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231783AbhFOOTE (ORCPT ); Tue, 15 Jun 2021 10:19:04 -0400 Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30E1DC061A11 for ; Tue, 15 Jun 2021 07:14:28 -0700 (PDT) Received: by mail-pf1-x432.google.com with SMTP id k15so13374455pfp.6 for ; Tue, 15 Jun 2021 07:14:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=f3K7Bwh6/88PduZAR/q+JZdxdkelTwBsutq2rLTQ9UA=; b=txIlCbZV0aH9iPOoF/jsZtqCivgZjTayj8QCSsBJDqFFwVBs1OLse3khsoJwxFBR+0 ZAsafx4drbVcC6iRcaMZwRxmN5nNKeZFOio0rBkwhuGwgxMvFfXXUg1PCyudwc/fsADb 9SUKs8/zJTEsZFeY8Hr3tbitMZwnXfkChHFwckWxgY0XKqtl0gSajY5VMqz7gQ6g8Xvt Dffjg6zfN5PnuBJkstlYnORXTTA2ksnuTvRjD6+zppp6izAJO44iFgcIiyzepxsgDhD7 nzTb9yu4fc1ASvPY47iuKXiuZi43wOH2aHbut0KUfMrLN/6AQwUbcwr9NW2hNzCIRhfi tozQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=f3K7Bwh6/88PduZAR/q+JZdxdkelTwBsutq2rLTQ9UA=; b=U6xwR0GUIj8d+CtGKR0D3EQjvIP9AprEn4IkP6MsHAmIIC/H4I7hVMU2qk2K+A7lxS soDE1ejOutkY5JP5jdqxJ29sJMiDjiSAyvK9V+DfEhhJT0m58yTERyrJrttqNVOgwe1R U/i4KUBFP+ZAHeFX+NPAB6ZgmpRdzu911x2IZbc7nBaXknBqIkrGCWbCxJYzmlwbMkOl ZaDEqKEhFs0Mv1waSiUpwnBAF1e3tI51adSXMk7NgkFqs5p6I/CQEJkcjyYquhaW+Sz7 rmceCNOMTiuoegNM8bp7BwJ7FEv3h1O9P997j9mcW+szp9cIUybA4sc2vohdYfE+QbTL AtBA== X-Gm-Message-State: AOAM530U7bpImj3yJsG99Zrvmj92VETu84pfoNjhpgfCsr3L7BOee4bl HTTHug0t3FQtmjuKss8gpIpS X-Google-Smtp-Source: ABdhPJxAhrRC4JrXECWzc6o4UmNRwBTfiaDuZAbOptwogWOI11mDdKVA3fTQArRF4jqaf3VK/quBuw== X-Received: by 2002:a65:68c8:: with SMTP id k8mr13896598pgt.130.1623766467681; Tue, 15 Jun 2021 07:14:27 -0700 (PDT) Received: from localhost ([139.177.225.241]) by smtp.gmail.com with ESMTPSA id gz14sm15339353pjb.18.2021.06.15.07.14.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Jun 2021 07:14:27 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org, gregkh@linuxfoundation.org Cc: songmuchun@bytedance.com, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v8 07/10] vdpa: Support transferring virtual addressing during DMA mapping Date: Tue, 15 Jun 2021 22:13:28 +0800 Message-Id: <20210615141331.407-8-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210615141331.407-1-xieyongji@bytedance.com> References: <20210615141331.407-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch introduces an attribute for vDPA device to indicate whether virtual address can be used. If vDPA device driver set it, vhost-vdpa bus driver will not pin user page and transfer userspace virtual address instead of physical address during DMA mapping. And corresponding vma->vm_file and offset will be also passed as an opaque pointer. Suggested-by: Jason Wang Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vdpa/ifcvf/ifcvf_main.c | 2 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- drivers/vdpa/vdpa.c | 9 +++- drivers/vdpa/vdpa_sim/vdpa_sim.c | 2 +- drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +- drivers/vhost/vdpa.c | 99 ++++++++++++++++++++++++++++++++++----- include/linux/vdpa.h | 19 ++++++-- 7 files changed, 116 insertions(+), 19 deletions(-) diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index ab0ab5cf0f6e..daf9746e51e6 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -476,7 +476,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) } adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, - dev, &ifc_vdpa_ops, NULL); + dev, &ifc_vdpa_ops, NULL, false); if (adapter == NULL) { IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); return -ENOMEM; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index dda5dc6f7737..2b7ca111f039 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -2012,7 +2012,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS); ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops, - name); + name, false); if (IS_ERR(ndev)) return PTR_ERR(ndev); diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index bb3f1d1f0422..8f01d6a7ecc5 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -71,6 +71,7 @@ static void vdpa_release_dev(struct device *d) * @config: the bus operations that is supported by this device * @size: size of the parent structure that contains private data * @name: name of the vdpa device; optional. + * @use_va: indicate whether virtual address must be used by this device * * Driver should use vdpa_alloc_device() wrapper macro instead of * using this directly. @@ -80,7 +81,8 @@ static void vdpa_release_dev(struct device *d) */ struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - size_t size, const char *name) + size_t size, const char *name, + bool use_va) { struct vdpa_device *vdev; int err = -EINVAL; @@ -91,6 +93,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, if (!!config->dma_map != !!config->dma_unmap) goto err; + /* It should only work for the device that use on-chip IOMMU */ + if (use_va && !(config->dma_map || config->set_map)) + goto err; + err = -ENOMEM; vdev = kzalloc(size, GFP_KERNEL); if (!vdev) @@ -106,6 +112,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, vdev->index = err; vdev->config = config; vdev->features_valid = false; + vdev->use_va = use_va; if (name) err = dev_set_name(&vdev->dev, "%s", name); diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index efd0cb3d964d..a43479cf57ea 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -250,7 +250,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr) ops = &vdpasim_config_ops; vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, - dev_attr->name); + dev_attr->name, false); if (!vdpasim) goto err_alloc; diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index c76ebb531212..f907f42e83bb 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -399,7 +399,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) return ret; vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, - dev, &vp_vdpa_ops, NULL); + dev, &vp_vdpa_ops, NULL, false); if (vp_vdpa == NULL) { dev_err(dev, "vp_vdpa: Failed to allocate vDPA structure\n"); return -ENOMEM; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index c5ec45b920f8..45d1c0955961 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -520,8 +520,28 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last) } } +static void vhost_vdpa_va_unmap(struct vhost_vdpa *v, u64 start, u64 last) +{ + struct vhost_dev *dev = &v->vdev; + struct vhost_iotlb *iotlb = dev->iotlb; + struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; + + while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) { + map_file = (struct vdpa_map_file *)map->opaque; + fput(map_file->file); + kfree(map_file); + vhost_iotlb_map_free(iotlb, map); + } +} + static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) { + struct vdpa_device *vdpa = v->vdpa; + + if (vdpa->use_va) + return vhost_vdpa_va_unmap(v, start, last); + return vhost_vdpa_pa_unmap(v, start, last); } @@ -556,21 +576,21 @@ static int perm_to_iommu_flags(u32 perm) return flags | IOMMU_CACHE; } -static int vhost_vdpa_map(struct vhost_vdpa *v, - u64 iova, u64 size, u64 pa, u32 perm) +static int vhost_vdpa_map(struct vhost_vdpa *v, u64 iova, + u64 size, u64 pa, u32 perm, void *opaque) { struct vhost_dev *dev = &v->vdev; struct vdpa_device *vdpa = v->vdpa; const struct vdpa_config_ops *ops = vdpa->config; int r = 0; - r = vhost_iotlb_add_range(dev->iotlb, iova, iova + size - 1, - pa, perm); + r = vhost_iotlb_add_range_ctx(dev->iotlb, iova, iova + size - 1, + pa, perm, opaque); if (r) return r; if (ops->dma_map) { - r = ops->dma_map(vdpa, iova, size, pa, perm, NULL); + r = ops->dma_map(vdpa, iova, size, pa, perm, opaque); } else if (ops->set_map) { if (!v->in_batch) r = ops->set_map(vdpa, dev->iotlb); @@ -578,13 +598,15 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, r = iommu_map(v->domain, iova, pa, size, perm_to_iommu_flags(perm)); } - - if (r) + if (r) { vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1); - else + return r; + } + + if (!vdpa->use_va) atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm); - return r; + return 0; } static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) @@ -605,6 +627,56 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) } } +static int vhost_vdpa_va_map(struct vhost_vdpa *v, + u64 iova, u64 size, u64 uaddr, u32 perm) +{ + struct vhost_dev *dev = &v->vdev; + u64 offset, map_size, map_iova = iova; + struct vdpa_map_file *map_file; + struct vm_area_struct *vma; + int ret; + + mmap_read_lock(dev->mm); + + while (size) { + vma = find_vma(dev->mm, uaddr); + if (!vma) { + ret = -EINVAL; + break; + } + map_size = min(size, vma->vm_end - uaddr); + if (!(vma->vm_file && (vma->vm_flags & VM_SHARED) && + !(vma->vm_flags & (VM_IO | VM_PFNMAP)))) + goto next; + + map_file = kzalloc(sizeof(*map_file), GFP_KERNEL); + if (!map_file) { + ret = -ENOMEM; + break; + } + offset = (vma->vm_pgoff << PAGE_SHIFT) + uaddr - vma->vm_start; + map_file->offset = offset; + map_file->file = get_file(vma->vm_file); + ret = vhost_vdpa_map(v, map_iova, map_size, uaddr, + perm, map_file); + if (ret) { + fput(map_file->file); + kfree(map_file); + break; + } +next: + size -= map_size; + uaddr += map_size; + map_iova += map_size; + } + if (ret) + vhost_vdpa_unmap(v, iova, map_iova - iova); + + mmap_read_unlock(dev->mm); + + return ret; +} + static int vhost_vdpa_pa_map(struct vhost_vdpa *v, u64 iova, u64 size, u64 uaddr, u32 perm) { @@ -671,7 +743,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT; ret = vhost_vdpa_map(v, iova, csize, map_pfn << PAGE_SHIFT, - perm); + perm, NULL); if (ret) { /* * Unpin the pages that are left unmapped @@ -700,7 +772,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, /* Pin the rest chunk */ ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT, - map_pfn << PAGE_SHIFT, perm); + map_pfn << PAGE_SHIFT, perm, NULL); out: if (ret) { if (nchunks) { @@ -733,6 +805,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, struct vhost_iotlb_msg *msg) { struct vhost_dev *dev = &v->vdev; + struct vdpa_device *vdpa = v->vdpa; struct vhost_iotlb *iotlb = dev->iotlb; if (msg->iova < v->range.first || @@ -743,6 +816,10 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, msg->iova + msg->size - 1)) return -EEXIST; + if (vdpa->use_va) + return vhost_vdpa_va_map(v, msg->iova, msg->size, + msg->uaddr, msg->perm); + return vhost_vdpa_pa_map(v, msg->iova, msg->size, msg->uaddr, msg->perm); } diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 281f768cb597..85124d197c55 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -44,6 +44,7 @@ struct vdpa_mgmt_dev; * @config: the configuration ops for this device. * @index: device index * @features_valid: were features initialized? for legacy guests + * @use_va: indicate whether virtual address must be used by this device * @nvqs: maximum number of supported virtqueues * @mdev: management device pointer; caller must setup when registering device as part * of dev_add() mgmtdev ops callback before invoking _vdpa_register_device(). @@ -54,6 +55,7 @@ struct vdpa_device { const struct vdpa_config_ops *config; unsigned int index; bool features_valid; + bool use_va; int nvqs; struct vdpa_mgmt_dev *mdev; }; @@ -69,6 +71,16 @@ struct vdpa_iova_range { }; /** + * Corresponding file area for device memory mapping + * @file: vma->vm_file for the mapping + * @offset: mapping offset in the vm_file + */ +struct vdpa_map_file { + struct file *file; + u64 offset; +}; + +/** * struct vdpa_config_ops - operations for configuring a vDPA device. * Note: vDPA device drivers are required to implement all of the * operations unless it is mentioned to be optional in the following @@ -254,14 +266,15 @@ struct vdpa_config_ops { struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - size_t size, const char *name); + size_t size, const char *name, + bool use_va); -#define vdpa_alloc_device(dev_struct, member, parent, config, name) \ +#define vdpa_alloc_device(dev_struct, member, parent, config, name, use_va) \ container_of(__vdpa_alloc_device( \ parent, config, \ sizeof(dev_struct) + \ BUILD_BUG_ON_ZERO(offsetof( \ - dev_struct, member)), name), \ + dev_struct, member)), name, use_va), \ dev_struct, member) int vdpa_register_device(struct vdpa_device *vdev, int nvqs); From patchwork Tue Jun 15 14:13:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 460900 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CD6CC48BE8 for ; Tue, 15 Jun 2021 14:19:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0495961078 for ; Tue, 15 Jun 2021 14:19:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231960AbhFOOVZ (ORCPT ); Tue, 15 Jun 2021 10:21:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231830AbhFOOTn (ORCPT ); Tue, 15 Jun 2021 10:19:43 -0400 Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 35F05C061A30 for ; Tue, 15 Jun 2021 07:14:37 -0700 (PDT) Received: by mail-pj1-x1030.google.com with SMTP id ei4so11863061pjb.3 for ; Tue, 15 Jun 2021 07:14:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=x0tUiZ8vKVxaV50NYGwSceHaq6sWnPXoNnzurgTza4Y=; b=oU4bSEqtJeNnT4rUL+xj4Pgpcn27lY6oRFo6pyPJ6qwnT1c2hBtaAeKcr//3WaqvLz LvFvU1m8Bix2nxDWrC2YMEhDpNjt2YqHRscOO6hOj0pi4QVGXAeHVZ6n+xF1AEn5/DcU xmr+fFTX0vXGCSrR9MYDoQlG1MBXaDkZhscphU5BauP5WhRSHXOPyTpd3p/aCX+sWsi8 7NQtK/bYcf4QZJLcMbYnH+fFR1x5z8Mw9xenrkk3YqWRLv1D2Scv39bydhPQ8a68J24i Wx7NCs/9Qs/fQbfPvldnFRdcKPLXAHSsdFxmnBi/nJFiWroL8CFqYuoAokuKtzUJHgsr 3p8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=x0tUiZ8vKVxaV50NYGwSceHaq6sWnPXoNnzurgTza4Y=; b=kkwshgu/2Yy8chLTR6Oo0t4fR3l/LwrdNRKadVpUKX3ySSRMRZGo7uVRKcmYSipB68 kyr9aIiRM3lFQ/AEIwTB+oQrwaMtfuuOYcs1/UrVObqX6I0e3faOnhF7eufe50E/1enp k375kB323XXkwu9SFYlI0E4pBEnMswQtePPWnVHwmZ6e3UQTGRTJ5LeckDa5uWX1PkRL yHV+1/JKPfw6DbtUYzUN8yGYpFHxCE5gBrvThcssoGS7WVB2y7SIb4DZFzikayMpx1oO l5QDdPIxDJE6Asftn/mZkao+kjMx3bUdsQh21M0EaxDT8wnwXG8vMWF4FShs2AaYNw9c 7o0Q== X-Gm-Message-State: AOAM5327DgTI26uW5bpIab7Mph6Hmg7Jj8huGv/eHw3kkoyhNWgM6Rfp V+b7fK0BJLO7jqYg6B75+pS5 X-Google-Smtp-Source: ABdhPJyn2tUKd6RERG9RWn0PfxZJJDGKtkvlN1PqTPZ8PdrPlXoE1gSjJMFaRbuxh5l4tVNcnnpamw== X-Received: by 2002:a17:90a:de15:: with SMTP id m21mr25266622pjv.87.1623766476446; Tue, 15 Jun 2021 07:14:36 -0700 (PDT) Received: from localhost ([139.177.225.241]) by smtp.gmail.com with ESMTPSA id j24sm15482243pfe.58.2021.06.15.07.14.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Jun 2021 07:14:35 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org, gregkh@linuxfoundation.org Cc: songmuchun@bytedance.com, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace Date: Tue, 15 Jun 2021 22:13:30 +0800 Message-Id: <20210615141331.407-10-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210615141331.407-1-xieyongji@bytedance.com> References: <20210615141331.407-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This VDUSE driver enables implementing vDPA devices in userspace. The vDPA device's control path is handled in kernel and the data path is handled in userspace. A message mechnism is used by VDUSE driver to forward some control messages such as starting/stopping datapath to userspace. Userspace can use read()/write() to receive/reply those control messages. And some ioctls are introduced to help userspace to implement the data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file descriptors referring to vDPA device's iova regions. Then userspace can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD ioctls can be used to inject interrupt and setup the kickfd for virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the configuration space and inject a config interrupt. Signed-off-by: Xie Yongji --- Documentation/userspace-api/ioctl/ioctl-number.rst | 1 + drivers/vdpa/Kconfig | 10 + drivers/vdpa/Makefile | 1 + drivers/vdpa/vdpa_user/Makefile | 5 + drivers/vdpa/vdpa_user/vduse_dev.c | 1453 ++++++++++++++++++++ include/uapi/linux/vduse.h | 143 ++ 6 files changed, 1613 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/Makefile create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c create mode 100644 include/uapi/linux/vduse.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 9bfc2b510c64..acd95e9dcfe7 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -300,6 +300,7 @@ Code Seq# Include File Comments 'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! '|' 00-7F linux/media.h 0x80 00-1F linux/fb.h +0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index a503c1b2bfd9..6e23bce6433a 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK vDPA block device simulator which terminates IO request in a memory buffer. +config VDPA_USER + tristate "VDUSE (vDPA Device in Userspace) support" + depends on EVENTFD && MMU && HAS_DMA + select DMA_OPS + select VHOST_IOTLB + select IOMMU_IOVA + help + With VDUSE it is possible to emulate a vDPA Device + in a userspace program. + config IFCVF tristate "Intel IFC VF vDPA driver" depends on PCI_MSI diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index 67fe7f3d6943..f02ebed33f19 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ +obj-$(CONFIG_VDPA_USER) += vdpa_user/ obj-$(CONFIG_IFCVF) += ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA) += virtio_pci/ diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile new file mode 100644 index 000000000000..260e0b26af99 --- /dev/null +++ b/drivers/vdpa/vdpa_user/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +vduse-y := vduse_dev.o iova_domain.o + +obj-$(CONFIG_VDPA_USER) += vduse.o diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c new file mode 100644 index 000000000000..5271cbd15e28 --- /dev/null +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -0,0 +1,1453 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * VDUSE: vDPA Device in Userspace + * + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +#define DRV_AUTHOR "Yongji Xie " +#define DRV_DESC "vDPA Device in Userspace" +#define DRV_LICENSE "GPL v2" + +#define VDUSE_DEV_MAX (1U << MINORBITS) +#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024) +#define VDUSE_IOVA_SIZE (128 * 1024 * 1024) +#define VDUSE_REQUEST_TIMEOUT 30 + +struct vduse_virtqueue { + u16 index; + u32 num; + u32 avail_idx; + u64 desc_addr; + u64 driver_addr; + u64 device_addr; + bool ready; + bool kicked; + spinlock_t kick_lock; + spinlock_t irq_lock; + struct eventfd_ctx *kickfd; + struct vdpa_callback cb; + struct work_struct inject; +}; + +struct vduse_dev; + +struct vduse_vdpa { + struct vdpa_device vdpa; + struct vduse_dev *dev; +}; + +struct vduse_dev { + struct vduse_vdpa *vdev; + struct device *dev; + struct vduse_virtqueue *vqs; + struct vduse_iova_domain *domain; + char *name; + struct mutex lock; + spinlock_t msg_lock; + u64 msg_unique; + wait_queue_head_t waitq; + struct list_head send_list; + struct list_head recv_list; + struct vdpa_callback config_cb; + struct work_struct inject; + spinlock_t irq_lock; + int minor; + bool connected; + bool started; + u64 api_version; + u64 user_features; + u64 features; + u32 device_id; + u32 vendor_id; + u32 generation; + u32 config_size; + void *config; + u8 status; + u16 vq_size_max; + u32 vq_num; + u32 vq_align; +}; + +struct vduse_dev_msg { + struct vduse_dev_request req; + struct vduse_dev_response resp; + struct list_head list; + wait_queue_head_t waitq; + bool completed; +}; + +struct vduse_control { + u64 api_version; +}; + +static DEFINE_MUTEX(vduse_lock); +static DEFINE_IDR(vduse_idr); + +static dev_t vduse_major; +static struct class *vduse_class; +static struct cdev vduse_ctrl_cdev; +static struct cdev vduse_cdev; +static struct workqueue_struct *vduse_irq_wq; + +static u32 allowed_device_id[] = { + VIRTIO_ID_BLOCK, +}; + +static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) +{ + struct vduse_vdpa *vdev = container_of(vdpa, struct vduse_vdpa, vdpa); + + return vdev->dev; +} + +static inline struct vduse_dev *dev_to_vduse(struct device *dev) +{ + struct vdpa_device *vdpa = dev_to_vdpa(dev); + + return vdpa_to_vduse(vdpa); +} + +static struct vduse_dev_msg *vduse_find_msg(struct list_head *head, + uint32_t request_id) +{ + struct vduse_dev_msg *msg; + + list_for_each_entry(msg, head, list) { + if (msg->req.request_id == request_id) { + list_del(&msg->list); + return msg; + } + } + + return NULL; +} + +static struct vduse_dev_msg *vduse_dequeue_msg(struct list_head *head) +{ + struct vduse_dev_msg *msg = NULL; + + if (!list_empty(head)) { + msg = list_first_entry(head, struct vduse_dev_msg, list); + list_del(&msg->list); + } + + return msg; +} + +static void vduse_enqueue_msg(struct list_head *head, + struct vduse_dev_msg *msg) +{ + list_add_tail(&msg->list, head); +} + +static int vduse_dev_msg_send(struct vduse_dev *dev, + struct vduse_dev_msg *msg, bool no_reply) +{ + init_waitqueue_head(&msg->waitq); + spin_lock(&dev->msg_lock); + msg->req.request_id = dev->msg_unique++; + vduse_enqueue_msg(&dev->send_list, msg); + wake_up(&dev->waitq); + spin_unlock(&dev->msg_lock); + if (no_reply) + return 0; + + wait_event_killable_timeout(msg->waitq, msg->completed, + VDUSE_REQUEST_TIMEOUT * HZ); + spin_lock(&dev->msg_lock); + if (!msg->completed) { + list_del(&msg->list); + msg->resp.result = VDUSE_REQ_RESULT_FAILED; + } + spin_unlock(&dev->msg_lock); + + return (msg->resp.result == VDUSE_REQ_RESULT_OK) ? 0 : -EIO; +} + +static void vduse_dev_msg_cleanup(struct vduse_dev *dev) +{ + struct vduse_dev_msg *msg; + + spin_lock(&dev->msg_lock); + while ((msg = vduse_dequeue_msg(&dev->send_list))) { + if (msg->req.flags & VDUSE_REQ_FLAGS_NO_REPLY) + kfree(msg); + else + vduse_enqueue_msg(&dev->recv_list, msg); + } + while ((msg = vduse_dequeue_msg(&dev->recv_list))) { + msg->resp.result = VDUSE_REQ_RESULT_FAILED; + msg->completed = 1; + wake_up(&msg->waitq); + } + spin_unlock(&dev->msg_lock); +} + +static void vduse_dev_start_dataplane(struct vduse_dev *dev) +{ + struct vduse_dev_msg *msg = kzalloc(sizeof(*msg), + GFP_KERNEL | __GFP_NOFAIL); + + msg->req.type = VDUSE_START_DATAPLANE; + msg->req.flags |= VDUSE_REQ_FLAGS_NO_REPLY; + vduse_dev_msg_send(dev, msg, true); +} + +static void vduse_dev_stop_dataplane(struct vduse_dev *dev) +{ + struct vduse_dev_msg *msg = kzalloc(sizeof(*msg), + GFP_KERNEL | __GFP_NOFAIL); + + msg->req.type = VDUSE_STOP_DATAPLANE; + msg->req.flags |= VDUSE_REQ_FLAGS_NO_REPLY; + vduse_dev_msg_send(dev, msg, true); +} + +static int vduse_dev_get_vq_state(struct vduse_dev *dev, + struct vduse_virtqueue *vq, + struct vdpa_vq_state *state) +{ + struct vduse_dev_msg msg = { 0 }; + int ret; + + msg.req.type = VDUSE_GET_VQ_STATE; + msg.req.vq_state.index = vq->index; + + ret = vduse_dev_msg_send(dev, &msg, false); + if (ret) + return ret; + + state->avail_index = msg.resp.vq_state.avail_idx; + return 0; +} + +static int vduse_dev_update_iotlb(struct vduse_dev *dev, + u64 start, u64 last) +{ + struct vduse_dev_msg msg = { 0 }; + + if (last < start) + return -EINVAL; + + msg.req.type = VDUSE_UPDATE_IOTLB; + msg.req.iova.start = start; + msg.req.iova.last = last; + + return vduse_dev_msg_send(dev, &msg, false); +} + +static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_msg *msg; + int size = sizeof(struct vduse_dev_request); + ssize_t ret; + + if (iov_iter_count(to) < size) + return -EINVAL; + + spin_lock(&dev->msg_lock); + while (1) { + msg = vduse_dequeue_msg(&dev->send_list); + if (msg) + break; + + ret = -EAGAIN; + if (file->f_flags & O_NONBLOCK) + goto unlock; + + spin_unlock(&dev->msg_lock); + ret = wait_event_interruptible_exclusive(dev->waitq, + !list_empty(&dev->send_list)); + if (ret) + return ret; + + spin_lock(&dev->msg_lock); + } + spin_unlock(&dev->msg_lock); + ret = copy_to_iter(&msg->req, size, to); + spin_lock(&dev->msg_lock); + if (ret != size) { + ret = -EFAULT; + vduse_enqueue_msg(&dev->send_list, msg); + goto unlock; + } + if (msg->req.flags & VDUSE_REQ_FLAGS_NO_REPLY) + kfree(msg); + else + vduse_enqueue_msg(&dev->recv_list, msg); +unlock: + spin_unlock(&dev->msg_lock); + + return ret; +} + +static ssize_t vduse_dev_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_response resp; + struct vduse_dev_msg *msg; + size_t ret; + + ret = copy_from_iter(&resp, sizeof(resp), from); + if (ret != sizeof(resp)) + return -EINVAL; + + spin_lock(&dev->msg_lock); + msg = vduse_find_msg(&dev->recv_list, resp.request_id); + if (!msg) { + ret = -ENOENT; + goto unlock; + } + + memcpy(&msg->resp, &resp, sizeof(resp)); + msg->completed = 1; + wake_up(&msg->waitq); +unlock: + spin_unlock(&dev->msg_lock); + + return ret; +} + +static __poll_t vduse_dev_poll(struct file *file, poll_table *wait) +{ + struct vduse_dev *dev = file->private_data; + __poll_t mask = 0; + + poll_wait(file, &dev->waitq, wait); + + if (!list_empty(&dev->send_list)) + mask |= EPOLLIN | EPOLLRDNORM; + if (!list_empty(&dev->recv_list)) + mask |= EPOLLOUT | EPOLLWRNORM; + + return mask; +} + +static void vduse_dev_reset(struct vduse_dev *dev) +{ + int i; + struct vduse_iova_domain *domain = dev->domain; + + /* The coherent mappings are handled in vduse_dev_free_coherent() */ + if (domain->bounce_map) + vduse_domain_reset_bounce_map(domain); + + dev->features = 0; + dev->generation++; + spin_lock(&dev->irq_lock); + dev->config_cb.callback = NULL; + dev->config_cb.private = NULL; + spin_unlock(&dev->irq_lock); + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + vq->ready = false; + vq->desc_addr = 0; + vq->driver_addr = 0; + vq->device_addr = 0; + vq->avail_idx = 0; + vq->num = 0; + + spin_lock(&vq->kick_lock); + vq->kicked = false; + if (vq->kickfd) + eventfd_ctx_put(vq->kickfd); + vq->kickfd = NULL; + spin_unlock(&vq->kick_lock); + + spin_lock(&vq->irq_lock); + vq->cb.callback = NULL; + vq->cb.private = NULL; + spin_unlock(&vq->irq_lock); + } +} + +static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx, + u64 desc_area, u64 driver_area, + u64 device_area) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->desc_addr = desc_area; + vq->driver_addr = driver_area; + vq->device_addr = device_area; + + return 0; +} + +static void vduse_vdpa_kick_vq(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + spin_lock(&vq->kick_lock); + if (!vq->ready) + goto unlock; + + if (vq->kickfd) + eventfd_signal(vq->kickfd, 1); + else + vq->kicked = true; +unlock: + spin_unlock(&vq->kick_lock); +} + +static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx, + struct vdpa_callback *cb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + spin_lock(&vq->irq_lock); + vq->cb.callback = cb->callback; + vq->cb.private = cb->private; + spin_unlock(&vq->irq_lock); +} + +static void vduse_vdpa_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->num = num; +} + +static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa, + u16 idx, bool ready) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->ready = ready; +} + +static bool vduse_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vq->ready; +} + +static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx, + const struct vdpa_vq_state *state) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->avail_idx = state->avail_index; + return 0; +} + +static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx, + struct vdpa_vq_state *state) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_get_vq_state(dev, vq, state); +} + +static u32 vduse_vdpa_get_vq_align(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_align; +} + +static u64 vduse_vdpa_get_features(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->user_features; +} + +static int vduse_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + dev->features = features; + return 0; +} + +static void vduse_vdpa_set_config_cb(struct vdpa_device *vdpa, + struct vdpa_callback *cb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + spin_lock(&dev->irq_lock); + dev->config_cb.callback = cb->callback; + dev->config_cb.private = cb->private; + spin_unlock(&dev->irq_lock); +} + +static u16 vduse_vdpa_get_vq_num_max(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_size_max; +} + +static u32 vduse_vdpa_get_device_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->device_id; +} + +static u32 vduse_vdpa_get_vendor_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vendor_id; +} + +static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->status; +} + +static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + bool started = !!(status & VIRTIO_CONFIG_S_DRIVER_OK); + + dev->status = status; + + if (dev->started == started) + return; + + dev->started = started; + if (dev->started) { + vduse_dev_start_dataplane(dev); + } else { + vduse_dev_reset(dev); + vduse_dev_stop_dataplane(dev); + } +} + +static size_t vduse_vdpa_get_config_size(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->config_size; +} + +static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + memcpy(buf, dev->config + offset, len); +} + +static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int offset, + const void *buf, unsigned int len) +{ + /* Now we only support read-only configuration space */ +} + +static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->generation; +} + +static int vduse_vdpa_set_map(struct vdpa_device *vdpa, + struct vhost_iotlb *iotlb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + int ret; + + ret = vduse_domain_set_map(dev->domain, iotlb); + if (ret) + return ret; + + ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX); + if (ret) { + vduse_domain_clear_map(dev->domain, iotlb); + return ret; + } + + return 0; +} + +static void vduse_vdpa_free(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + dev->vdev = NULL; +} + +static const struct vdpa_config_ops vduse_vdpa_config_ops = { + .set_vq_address = vduse_vdpa_set_vq_address, + .kick_vq = vduse_vdpa_kick_vq, + .set_vq_cb = vduse_vdpa_set_vq_cb, + .set_vq_num = vduse_vdpa_set_vq_num, + .set_vq_ready = vduse_vdpa_set_vq_ready, + .get_vq_ready = vduse_vdpa_get_vq_ready, + .set_vq_state = vduse_vdpa_set_vq_state, + .get_vq_state = vduse_vdpa_get_vq_state, + .get_vq_align = vduse_vdpa_get_vq_align, + .get_features = vduse_vdpa_get_features, + .set_features = vduse_vdpa_set_features, + .set_config_cb = vduse_vdpa_set_config_cb, + .get_vq_num_max = vduse_vdpa_get_vq_num_max, + .get_device_id = vduse_vdpa_get_device_id, + .get_vendor_id = vduse_vdpa_get_vendor_id, + .get_status = vduse_vdpa_get_status, + .set_status = vduse_vdpa_set_status, + .get_config_size = vduse_vdpa_get_config_size, + .get_config = vduse_vdpa_get_config, + .set_config = vduse_vdpa_set_config, + .get_generation = vduse_vdpa_get_generation, + .set_map = vduse_vdpa_set_map, + .free = vduse_vdpa_free, +}; + +static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_map_page(domain, page, offset, size, dir, attrs); +} + +static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs); +} + +static void *vduse_dev_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova; + void *addr; + + *dma_addr = DMA_MAPPING_ERROR; + addr = vduse_domain_alloc_coherent(domain, size, + (dma_addr_t *)&iova, flag, attrs); + if (!addr) + return NULL; + + *dma_addr = (dma_addr_t)iova; + + return addr; +} + +static void vduse_dev_free_coherent(struct device *dev, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs); +} + +static size_t vduse_dev_max_mapping_size(struct device *dev) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return domain->bounce_size; +} + +static const struct dma_map_ops vduse_dev_dma_ops = { + .map_page = vduse_dev_map_page, + .unmap_page = vduse_dev_unmap_page, + .alloc = vduse_dev_alloc_coherent, + .free = vduse_dev_free_coherent, + .max_mapping_size = vduse_dev_max_mapping_size, +}; + +static unsigned int perm_to_file_flags(u8 perm) +{ + unsigned int flags = 0; + + switch (perm) { + case VDUSE_ACCESS_WO: + flags |= O_WRONLY; + break; + case VDUSE_ACCESS_RO: + flags |= O_RDONLY; + break; + case VDUSE_ACCESS_RW: + flags |= O_RDWR; + break; + default: + WARN(1, "invalidate vhost IOTLB permission\n"); + break; + } + + return flags; +} + +static int vduse_kickfd_setup(struct vduse_dev *dev, + struct vduse_vq_eventfd *eventfd) +{ + struct eventfd_ctx *ctx = NULL; + struct vduse_virtqueue *vq; + u32 index; + + if (eventfd->index >= dev->vq_num) + return -EINVAL; + + index = array_index_nospec(eventfd->index, dev->vq_num); + vq = &dev->vqs[index]; + if (eventfd->fd >= 0) { + ctx = eventfd_ctx_fdget(eventfd->fd); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + } else if (eventfd->fd != VDUSE_EVENTFD_DEASSIGN) + return 0; + + spin_lock(&vq->kick_lock); + if (vq->kickfd) + eventfd_ctx_put(vq->kickfd); + vq->kickfd = ctx; + if (vq->ready && vq->kicked && vq->kickfd) { + eventfd_signal(vq->kickfd, 1); + vq->kicked = false; + } + spin_unlock(&vq->kick_lock); + + return 0; +} + +static void vduse_dev_irq_inject(struct work_struct *work) +{ + struct vduse_dev *dev = container_of(work, struct vduse_dev, inject); + + spin_lock_irq(&dev->irq_lock); + if (dev->config_cb.callback) + dev->config_cb.callback(dev->config_cb.private); + spin_unlock_irq(&dev->irq_lock); +} + +static void vduse_vq_irq_inject(struct work_struct *work) +{ + struct vduse_virtqueue *vq = container_of(work, + struct vduse_virtqueue, inject); + + spin_lock_irq(&vq->irq_lock); + if (vq->ready && vq->cb.callback) + vq->cb.callback(vq->cb.private); + spin_unlock_irq(&vq->irq_lock); +} + +static long vduse_dev_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + struct vduse_dev *dev = file->private_data; + void __user *argp = (void __user *)arg; + int ret; + + switch (cmd) { + case VDUSE_IOTLB_GET_FD: { + struct vduse_iotlb_entry entry; + struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; + struct vduse_iova_domain *domain = dev->domain; + struct file *f = NULL; + + ret = -EFAULT; + if (copy_from_user(&entry, argp, sizeof(entry))) + break; + + ret = -EINVAL; + if (entry.start > entry.last) + break; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, + entry.start, entry.last); + if (map) { + map_file = (struct vdpa_map_file *)map->opaque; + f = get_file(map_file->file); + entry.offset = map_file->offset; + entry.start = map->start; + entry.last = map->last; + entry.perm = map->perm; + } + spin_unlock(&domain->iotlb_lock); + ret = -EINVAL; + if (!f) + break; + + ret = -EFAULT; + if (copy_to_user(argp, &entry, sizeof(entry))) { + fput(f); + break; + } + ret = receive_fd(f, perm_to_file_flags(entry.perm)); + fput(f); + break; + } + case VDUSE_DEV_GET_FEATURES: + ret = put_user(dev->features, (u64 __user *)argp); + break; + case VDUSE_DEV_UPDATE_CONFIG: { + struct vduse_config_update config; + unsigned long size = offsetof(struct vduse_config_update, + buffer); + + ret = -EFAULT; + if (copy_from_user(&config, argp, size)) + break; + + ret = -EINVAL; + if (config.length == 0 || + config.length > dev->config_size - config.offset) + break; + + ret = -EFAULT; + if (copy_from_user(dev->config + config.offset, argp + size, + config.length)) + break; + + ret = 0; + queue_work(vduse_irq_wq, &dev->inject); + break; + } + case VDUSE_VQ_GET_INFO: { + struct vduse_vq_info vq_info; + u32 vq_index; + + ret = -EFAULT; + if (copy_from_user(&vq_info, argp, sizeof(vq_info))) + break; + + ret = -EINVAL; + if (vq_info.index >= dev->vq_num) + break; + + vq_index = array_index_nospec(vq_info.index, dev->vq_num); + vq_info.desc_addr = dev->vqs[vq_index].desc_addr; + vq_info.driver_addr = dev->vqs[vq_index].driver_addr; + vq_info.device_addr = dev->vqs[vq_index].device_addr; + vq_info.num = dev->vqs[vq_index].num; + vq_info.avail_idx = dev->vqs[vq_index].avail_idx; + vq_info.ready = dev->vqs[vq_index].ready; + + ret = -EFAULT; + if (copy_to_user(argp, &vq_info, sizeof(vq_info))) + break; + + ret = 0; + break; + } + case VDUSE_VQ_SETUP_KICKFD: { + struct vduse_vq_eventfd eventfd; + + ret = -EFAULT; + if (copy_from_user(&eventfd, argp, sizeof(eventfd))) + break; + + ret = vduse_kickfd_setup(dev, &eventfd); + break; + } + case VDUSE_VQ_INJECT_IRQ: { + u32 vq_index; + + ret = -EFAULT; + if (get_user(vq_index, (u32 __user *)argp)) + break; + + ret = -EINVAL; + if (vq_index >= dev->vq_num) + break; + + ret = 0; + vq_index = array_index_nospec(vq_index, dev->vq_num); + queue_work(vduse_irq_wq, &dev->vqs[vq_index].inject); + break; + } + default: + ret = -ENOIOCTLCMD; + break; + } + + return ret; +} + +static int vduse_dev_release(struct inode *inode, struct file *file) +{ + struct vduse_dev *dev = file->private_data; + + spin_lock(&dev->msg_lock); + /* Make sure the inflight messages can processed after reconncection */ + list_splice_init(&dev->recv_list, &dev->send_list); + spin_unlock(&dev->msg_lock); + dev->connected = false; + + return 0; +} + +static struct vduse_dev *vduse_dev_get_from_minor(int minor) +{ + struct vduse_dev *dev; + + mutex_lock(&vduse_lock); + dev = idr_find(&vduse_idr, minor); + mutex_unlock(&vduse_lock); + + return dev; +} + +static int vduse_dev_open(struct inode *inode, struct file *file) +{ + int ret; + struct vduse_dev *dev = vduse_dev_get_from_minor(iminor(inode)); + + if (!dev) + return -ENODEV; + + ret = -EBUSY; + mutex_lock(&dev->lock); + if (dev->connected) + goto unlock; + + ret = 0; + dev->connected = true; + file->private_data = dev; +unlock: + mutex_unlock(&dev->lock); + + return ret; +} + +static const struct file_operations vduse_dev_fops = { + .owner = THIS_MODULE, + .open = vduse_dev_open, + .release = vduse_dev_release, + .read_iter = vduse_dev_read_iter, + .write_iter = vduse_dev_write_iter, + .poll = vduse_dev_poll, + .unlocked_ioctl = vduse_dev_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .llseek = noop_llseek, +}; + +static struct vduse_dev *vduse_dev_create(void) +{ + struct vduse_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL); + + if (!dev) + return NULL; + + mutex_init(&dev->lock); + spin_lock_init(&dev->msg_lock); + INIT_LIST_HEAD(&dev->send_list); + INIT_LIST_HEAD(&dev->recv_list); + spin_lock_init(&dev->irq_lock); + + INIT_WORK(&dev->inject, vduse_dev_irq_inject); + init_waitqueue_head(&dev->waitq); + + return dev; +} + +static void vduse_dev_destroy(struct vduse_dev *dev) +{ + kfree(dev); +} + +static struct vduse_dev *vduse_find_dev(const char *name) +{ + struct vduse_dev *dev; + int id; + + idr_for_each_entry(&vduse_idr, dev, id) + if (!strcmp(dev->name, name)) + return dev; + + return NULL; +} + +static int vduse_destroy_dev(char *name) +{ + struct vduse_dev *dev = vduse_find_dev(name); + + if (!dev) + return -EINVAL; + + mutex_lock(&dev->lock); + if (dev->vdev || dev->connected) { + mutex_unlock(&dev->lock); + return -EBUSY; + } + dev->connected = true; + mutex_unlock(&dev->lock); + + vduse_dev_msg_cleanup(dev); + device_destroy(vduse_class, MKDEV(MAJOR(vduse_major), dev->minor)); + idr_remove(&vduse_idr, dev->minor); + kvfree(dev->config); + kfree(dev->vqs); + vduse_domain_destroy(dev->domain); + kfree(dev->name); + vduse_dev_destroy(dev); + module_put(THIS_MODULE); + + return 0; +} + +static bool device_is_allowed(u32 device_id) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(allowed_device_id); i++) + if (allowed_device_id[i] == device_id) + return true; + + return false; +} + +static bool features_is_valid(u64 features) +{ + if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + return false; + + /* Now we only support read-only configuration space */ + if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + return false; + + return true; +} + +static bool vduse_validate_config(struct vduse_dev_config *config) +{ + if (config->bounce_size > VDUSE_MAX_BOUNCE_SIZE) + return false; + + if (config->vq_align > PAGE_SIZE) + return false; + + if (config->config_size > PAGE_SIZE) + return false; + + if (!device_is_allowed(config->device_id)) + return false; + + if (!features_is_valid(config->features)) + return false; + + return true; +} + +static int vduse_create_dev(struct vduse_dev_config *config, + void *config_buf, u64 api_version) +{ + int i, ret; + struct vduse_dev *dev; + + ret = -EEXIST; + if (vduse_find_dev(config->name)) + goto err; + + ret = -ENOMEM; + dev = vduse_dev_create(); + if (!dev) + goto err; + + dev->api_version = api_version; + dev->user_features = config->features; + dev->device_id = config->device_id; + dev->vendor_id = config->vendor_id; + dev->name = kstrdup(config->name, GFP_KERNEL); + if (!dev->name) + goto err_str; + + dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1, + config->bounce_size); + if (!dev->domain) + goto err_domain; + + dev->config = config_buf; + dev->config_size = config->config_size; + dev->vq_align = config->vq_align; + dev->vq_size_max = config->vq_size_max; + dev->vq_num = config->vq_num; + dev->vqs = kcalloc(dev->vq_num, sizeof(*dev->vqs), GFP_KERNEL); + if (!dev->vqs) + goto err_vqs; + + for (i = 0; i < dev->vq_num; i++) { + dev->vqs[i].index = i; + INIT_WORK(&dev->vqs[i].inject, vduse_vq_irq_inject); + spin_lock_init(&dev->vqs[i].kick_lock); + spin_lock_init(&dev->vqs[i].irq_lock); + } + + ret = idr_alloc(&vduse_idr, dev, 1, VDUSE_DEV_MAX, GFP_KERNEL); + if (ret < 0) + goto err_idr; + + dev->minor = ret; + dev->dev = device_create(vduse_class, NULL, + MKDEV(MAJOR(vduse_major), dev->minor), + NULL, "%s", config->name); + if (IS_ERR(dev->dev)) { + ret = PTR_ERR(dev->dev); + goto err_dev; + } + __module_get(THIS_MODULE); + + return 0; +err_dev: + idr_remove(&vduse_idr, dev->minor); +err_idr: + kfree(dev->vqs); +err_vqs: + vduse_domain_destroy(dev->domain); +err_domain: + kfree(dev->name); +err_str: + vduse_dev_destroy(dev); +err: + kvfree(config_buf); + return ret; +} + +static long vduse_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + int ret; + void __user *argp = (void __user *)arg; + struct vduse_control *control = file->private_data; + + mutex_lock(&vduse_lock); + switch (cmd) { + case VDUSE_GET_API_VERSION: + ret = put_user(control->api_version, (u64 __user *)argp); + break; + case VDUSE_SET_API_VERSION: { + u64 api_version; + + ret = -EFAULT; + if (get_user(api_version, (u64 __user *)argp)) + break; + + ret = -EINVAL; + if (api_version > VDUSE_API_VERSION) + break; + + ret = 0; + control->api_version = api_version; + break; + } + case VDUSE_CREATE_DEV: { + struct vduse_dev_config config; + unsigned long size = offsetof(struct vduse_dev_config, config); + void *buf; + + ret = -EFAULT; + if (copy_from_user(&config, argp, size)) + break; + + ret = -EINVAL; + if (vduse_validate_config(&config) == false) + break; + + buf = vmemdup_user(argp + size, config.config_size); + if (IS_ERR(buf)) { + ret = PTR_ERR(buf); + break; + } + ret = vduse_create_dev(&config, buf, control->api_version); + break; + } + case VDUSE_DESTROY_DEV: { + char name[VDUSE_NAME_MAX]; + + ret = -EFAULT; + if (copy_from_user(name, argp, VDUSE_NAME_MAX)) + break; + + ret = vduse_destroy_dev(name); + break; + } + default: + ret = -EINVAL; + break; + } + mutex_unlock(&vduse_lock); + + return ret; +} + +static int vduse_release(struct inode *inode, struct file *file) +{ + struct vduse_control *control = file->private_data; + + kfree(control); + return 0; +} + +static int vduse_open(struct inode *inode, struct file *file) +{ + struct vduse_control *control; + + control = kmalloc(sizeof(struct vduse_control), GFP_KERNEL); + if (!control) + return -ENOMEM; + + control->api_version = VDUSE_API_VERSION; + file->private_data = control; + + return 0; +} + +static const struct file_operations vduse_ctrl_fops = { + .owner = THIS_MODULE, + .open = vduse_open, + .release = vduse_release, + .unlocked_ioctl = vduse_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .llseek = noop_llseek, +}; + +static char *vduse_devnode(struct device *dev, umode_t *mode) +{ + return kasprintf(GFP_KERNEL, "vduse/%s", dev_name(dev)); +} + +static void vduse_mgmtdev_release(struct device *dev) +{ +} + +static struct device vduse_mgmtdev = { + .init_name = "vduse", + .release = vduse_mgmtdev_release, +}; + +static struct vdpa_mgmt_dev mgmt_dev; + +static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name) +{ + struct vduse_vdpa *vdev; + int ret; + + if (dev->vdev) + return -EEXIST; + + vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev, + &vduse_vdpa_config_ops, name, true); + if (!vdev) + return -ENOMEM; + + dev->vdev = vdev; + vdev->dev = dev; + vdev->vdpa.dev.dma_mask = &vdev->vdpa.dev.coherent_dma_mask; + ret = dma_set_mask_and_coherent(&vdev->vdpa.dev, DMA_BIT_MASK(64)); + if (ret) { + put_device(&vdev->vdpa.dev); + return ret; + } + set_dma_ops(&vdev->vdpa.dev, &vduse_dev_dma_ops); + vdev->vdpa.dma_dev = &vdev->vdpa.dev; + vdev->vdpa.mdev = &mgmt_dev; + + return 0; +} + +static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name) +{ + struct vduse_dev *dev; + int ret; + + mutex_lock(&vduse_lock); + dev = vduse_find_dev(name); + if (!dev) { + mutex_unlock(&vduse_lock); + return -EINVAL; + } + ret = vduse_dev_init_vdpa(dev, name); + mutex_unlock(&vduse_lock); + if (ret) + return ret; + + ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num); + if (ret) { + put_device(&dev->vdev->vdpa.dev); + return ret; + } + + return 0; +} + +static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev) +{ + _vdpa_unregister_device(dev); +} + +static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { + .dev_add = vdpa_dev_add, + .dev_del = vdpa_dev_del, +}; + +static struct virtio_device_id id_table[] = { + { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, + { 0 }, +}; + +static struct vdpa_mgmt_dev mgmt_dev = { + .device = &vduse_mgmtdev, + .id_table = id_table, + .ops = &vdpa_dev_mgmtdev_ops, +}; + +static int vduse_mgmtdev_init(void) +{ + int ret; + + ret = device_register(&vduse_mgmtdev); + if (ret) + return ret; + + ret = vdpa_mgmtdev_register(&mgmt_dev); + if (ret) + goto err; + + return 0; +err: + device_unregister(&vduse_mgmtdev); + return ret; +} + +static void vduse_mgmtdev_exit(void) +{ + vdpa_mgmtdev_unregister(&mgmt_dev); + device_unregister(&vduse_mgmtdev); +} + +static int vduse_init(void) +{ + int ret; + struct device *dev; + + vduse_class = class_create(THIS_MODULE, "vduse"); + if (IS_ERR(vduse_class)) + return PTR_ERR(vduse_class); + + vduse_class->devnode = vduse_devnode; + + ret = alloc_chrdev_region(&vduse_major, 0, VDUSE_DEV_MAX, "vduse"); + if (ret) + goto err_chardev_region; + + /* /dev/vduse/control */ + cdev_init(&vduse_ctrl_cdev, &vduse_ctrl_fops); + vduse_ctrl_cdev.owner = THIS_MODULE; + ret = cdev_add(&vduse_ctrl_cdev, vduse_major, 1); + if (ret) + goto err_ctrl_cdev; + + dev = device_create(vduse_class, NULL, vduse_major, NULL, "control"); + if (IS_ERR(dev)) { + ret = PTR_ERR(dev); + goto err_device; + } + + /* /dev/vduse/$DEVICE */ + cdev_init(&vduse_cdev, &vduse_dev_fops); + vduse_cdev.owner = THIS_MODULE; + ret = cdev_add(&vduse_cdev, MKDEV(MAJOR(vduse_major), 1), + VDUSE_DEV_MAX - 1); + if (ret) + goto err_cdev; + + vduse_irq_wq = alloc_workqueue("vduse-irq", + WQ_HIGHPRI | WQ_SYSFS | WQ_UNBOUND, 0); + if (!vduse_irq_wq) + goto err_wq; + + ret = vduse_domain_init(); + if (ret) + goto err_domain; + + ret = vduse_mgmtdev_init(); + if (ret) + goto err_mgmtdev; + + return 0; +err_mgmtdev: + vduse_domain_exit(); +err_domain: + destroy_workqueue(vduse_irq_wq); +err_wq: + cdev_del(&vduse_cdev); +err_cdev: + device_destroy(vduse_class, vduse_major); +err_device: + cdev_del(&vduse_ctrl_cdev); +err_ctrl_cdev: + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); +err_chardev_region: + class_destroy(vduse_class); + return ret; +} +module_init(vduse_init); + +static void vduse_exit(void) +{ + vduse_mgmtdev_exit(); + vduse_domain_exit(); + destroy_workqueue(vduse_irq_wq); + cdev_del(&vduse_cdev); + device_destroy(vduse_class, vduse_major); + cdev_del(&vduse_ctrl_cdev); + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); + class_destroy(vduse_class); +} +module_exit(vduse_exit); + +MODULE_LICENSE(DRV_LICENSE); +MODULE_AUTHOR(DRV_AUTHOR); +MODULE_DESCRIPTION(DRV_DESC); diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h new file mode 100644 index 000000000000..f21b2e51b5c8 --- /dev/null +++ b/include/uapi/linux/vduse.h @@ -0,0 +1,143 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_VDUSE_H_ +#define _UAPI_VDUSE_H_ + +#include + +#define VDUSE_API_VERSION 0 + +#define VDUSE_NAME_MAX 256 + +/* the control messages definition for read/write */ + +enum vduse_req_type { + /* Get the state for virtqueue from userspace */ + VDUSE_GET_VQ_STATE, + /* Notify userspace to start the dataplane, no reply */ + VDUSE_START_DATAPLANE, + /* Notify userspace to stop the dataplane, no reply */ + VDUSE_STOP_DATAPLANE, + /* Notify userspace to update the memory mapping in device IOTLB */ + VDUSE_UPDATE_IOTLB, +}; + +struct vduse_vq_state { + __u32 index; /* virtqueue index */ + __u32 avail_idx; /* virtqueue state (last_avail_idx) */ +}; + +struct vduse_iova_range { + __u64 start; /* start of the IOVA range */ + __u64 last; /* end of the IOVA range */ +}; + +struct vduse_dev_request { + __u32 type; /* request type */ + __u32 request_id; /* request id */ +#define VDUSE_REQ_FLAGS_NO_REPLY (1 << 0) /* No need to reply */ + __u32 flags; /* request flags */ + __u32 reserved; /* for future use */ + union { + struct vduse_vq_state vq_state; /* virtqueue state */ + struct vduse_iova_range iova; /* iova range for updating */ + __u32 padding[16]; /* padding */ + }; +}; + +struct vduse_dev_response { + __u32 request_id; /* corresponding request id */ +#define VDUSE_REQ_RESULT_OK 0x00 +#define VDUSE_REQ_RESULT_FAILED 0x01 + __u32 result; /* the result of request */ + __u32 reserved[2]; /* for future use */ + union { + struct vduse_vq_state vq_state; /* virtqueue state */ + __u32 padding[16]; /* padding */ + }; +}; + +/* ioctls */ + +struct vduse_dev_config { + char name[VDUSE_NAME_MAX]; /* vduse device name */ + __u32 vendor_id; /* virtio vendor id */ + __u32 device_id; /* virtio device id */ + __u64 features; /* device features */ + __u64 bounce_size; /* bounce buffer size for iommu */ + __u16 vq_size_max; /* the max size of virtqueue */ + __u16 padding; /* padding */ + __u32 vq_num; /* the number of virtqueues */ + __u32 vq_align; /* the allocation alignment of virtqueue's metadata */ + __u32 config_size; /* the size of the configuration space */ + __u32 reserved[15]; /* for future use */ + __u8 config[0]; /* the buffer of the configuration space */ +}; + +struct vduse_iotlb_entry { + __u64 offset; /* the mmap offset on fd */ + __u64 start; /* start of the IOVA range */ + __u64 last; /* last of the IOVA range */ +#define VDUSE_ACCESS_RO 0x1 +#define VDUSE_ACCESS_WO 0x2 +#define VDUSE_ACCESS_RW 0x3 + __u8 perm; /* access permission of this range */ +}; + +struct vduse_config_update { + __u32 offset; /* offset from the beginning of configuration space */ + __u32 length; /* the length to write to configuration space */ + __u8 buffer[0]; /* buffer used to write from */ +}; + +struct vduse_vq_info { + __u32 index; /* virtqueue index */ + __u32 avail_idx; /* virtqueue state (last_avail_idx) */ + __u64 desc_addr; /* address of desc area */ + __u64 driver_addr; /* address of driver area */ + __u64 device_addr; /* address of device area */ + __u32 num; /* the size of virtqueue */ + __u8 ready; /* ready status of virtqueue */ +}; + +struct vduse_vq_eventfd { + __u32 index; /* virtqueue index */ +#define VDUSE_EVENTFD_DEASSIGN -1 + int fd; /* eventfd, -1 means de-assigning the eventfd */ +}; + +#define VDUSE_BASE 0x81 + +/* Get the version of VDUSE API. This is used for future extension */ +#define VDUSE_GET_API_VERSION _IOR(VDUSE_BASE, 0x00, __u64) + +/* Set the version of VDUSE API. */ +#define VDUSE_SET_API_VERSION _IOW(VDUSE_BASE, 0x01, __u64) + +/* Create a vduse device which is represented by a char device (/dev/vduse/) */ +#define VDUSE_CREATE_DEV _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config) + +/* Destroy a vduse device. Make sure there are no references to the char device */ +#define VDUSE_DESTROY_DEV _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX]) + +/* + * Get a file descriptor for the first overlapped iova region, + * -EINVAL means the iova region doesn't exist. + */ +#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x04, struct vduse_iotlb_entry) + +/* Get the negotiated features */ +#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x05, __u64) + +/* Update the configuration space */ +#define VDUSE_DEV_UPDATE_CONFIG _IOW(VDUSE_BASE, 0x06, struct vduse_config_update) + +/* Get the specified virtqueue's information */ +#define VDUSE_VQ_GET_INFO _IOWR(VDUSE_BASE, 0x07, struct vduse_vq_info) + +/* Setup an eventfd to receive kick for virtqueue */ +#define VDUSE_VQ_SETUP_KICKFD _IOW(VDUSE_BASE, 0x08, struct vduse_vq_eventfd) + +/* Inject an interrupt for specific virtqueue */ +#define VDUSE_VQ_INJECT_IRQ _IOW(VDUSE_BASE, 0x09, __u32) + +#endif /* _UAPI_VDUSE_H_ */